public class PotentialShapedRMax extends OOMDPPlanner implements LearningAgent
See the Model class for more information on defining your own model.
1. John Asmuth, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI. 2008.
Modifier and Type | Class and Description |
---|---|
protected class |
PotentialShapedRMax.PotentialShapedRMaxRF
This class is a special version of a potential-shaped reward function that does not remove the potential value for transitions to states with unknown action transitions that are followed.
|
class |
PotentialShapedRMax.PotentialShapedRMaxTerminal
A terminal function that treats transitions to RMax fictitious nodes as terminal states, in addition to the states that the model reports as terminal.
|
static class |
PotentialShapedRMax.RMaxPotential
A potential function for vanilla RMax; all states have a potential value of R_max/(1-gamma).
|
LearningAgent.LearningAgentBookKeeping
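The two inner classes above combine the RMax potential with standard potential-based shaping. The following is a minimal stand-alone sketch of those two ideas (an illustration only, not BURLAP source; all names here are hypothetical):

```java
// Illustrative sketch only (not BURLAP source): the RMaxPotential assigns every
// state the optimistic value R_max / (1 - gamma), and potential-based shaping
// adds gamma*phi(s') - phi(s) to the real reward.
public class RMaxShapingSketch {

    final double gamma;
    final double vMax; // R_max / (1 - gamma): the potential of every state

    public RMaxShapingSketch(double maxReward, double gamma) {
        this.gamma = gamma;
        this.vMax = maxReward / (1.0 - gamma);
    }

    /** The RMax potential: identical for all states. */
    public double potential() {
        return vMax;
    }

    /**
     * Standard potential-based shaping: r + gamma*phi(s') - phi(s).
     * (PotentialShapedRMaxRF additionally special-cases transitions into
     * states whose action transitions are still unknown, as described above.)
     */
    public double shapedReward(double realReward) {
        return realReward + gamma * potential() - potential();
    }
}
```

Note that with a constant potential the shaping term reduces every reward by (1-gamma)*R_max/(1-gamma) = R_max; the optimism comes from the special handling of unknown-state transitions, which keep the potential.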
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<EpisodeAnalysis> |
episodeHistory
The saved previous learning episodes.
|
protected int |
maxNumSteps
The maximum number of learning steps per episode before the agent gives up.
|
protected Model |
model
The model of the world that is being learned.
|
protected Domain |
modeledDomain
The modeled domain object containing the modeled actions that a planner will use.
|
protected RewardFunction |
modeledRewardFunction
The modeled reward function that is being learned.
|
protected TerminalFunction |
modeledTerminalFunction
The modeled terminal state function.
|
protected ModelPlanner |
modelPlanner
The model-adaptive planning algorithm to use.
|
protected int |
numEpisodesToStore
The number of the most recent learning episodes to store.
|
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
double maxReward,
int nConfident,
double maxVIDelta,
int maxVIPasses)
Initializes for a tabular model, VI planner, and standard RMax paradigm.
|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
PotentialFunction potential,
int nConfident,
double maxVIDelta,
int maxVIPasses)
Initializes for a tabular model, VI planner, and potential shaped function.
|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
PotentialFunction potential,
Model model,
ModelPlanner.ModelPlannerGenerator plannerGenerator)
Initializes for a given model, model planner, and potential shaped function.
|
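The `nConfident` parameter shared by the first two constructors controls when a transition counts as "known." A minimal stand-alone sketch of that bookkeeping (hypothetical illustration, not BURLAP code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the nConfident bookkeeping: a state-action pair's
// transition model counts as "known" only after nConfident observations;
// until then RMax treats it optimistically.
public class TransitionConfidenceSketch {

    private final int nConfident;
    private final Map<String, Integer> tryCounts = new HashMap<>();

    public TransitionConfidenceSketch(int nConfident) {
        this.nConfident = nConfident;
    }

    /** Record one observed execution of a state-action pair. */
    public void observe(String stateActionKey) {
        tryCounts.merge(stateActionKey, 1, Integer::sum);
    }

    /** True once the pair has been observed at least nConfident times. */
    public boolean isKnown(String stateActionKey) {
        return tryCounts.getOrDefault(stateActionKey, 0) >= nConfident;
    }
}
```

Larger `nConfident` values yield a more accurate learned model at the cost of more exploration before states become known.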
Modifier and Type | Method and Description |
---|---|
protected Policy |
createDomainMappedPolicy() |
java.util.List<EpisodeAnalysis> |
getAllStoredLearningEpisodes()
Returns all saved
EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis |
getLastLearningEpisode()
Returns the last learning episode of the agent.
|
Model |
getModel()
Returns the model learning algorithm being used.
|
Domain |
getModeledDomain()
Returns the model domain for planning.
|
RewardFunction |
getModeledRewardFunction()
Returns the model reward function.
|
TerminalFunction |
getModeledTerminalFunction()
Returns the model terminal function.
|
ModelPlanner |
getModelPlanner()
Returns the planning algorithm used on the model that can be iteratively updated as the model changes.
|
void |
planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
|
void |
resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. |
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState)
Causes the agent to perform a learning episode starting in the given initial state.
|
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState,
int maxSteps)
Causes the agent to perform a learning episode starting in the given initial state.
|
void |
setNumEpisodesToStore(int numEps)
Tells the agent how many
EpisodeAnalysis objects representing learning episodes to internally store. |
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
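The `episodeHistory` / `numEpisodesToStore` bookkeeping summarized above amounts to keeping only the most recent N episodes. A stand-alone sketch (illustration only, not BURLAP source; episodes stand in as strings here):

```java
import java.util.LinkedList;

// Stand-alone sketch (not BURLAP source) of the episodeHistory /
// numEpisodesToStore bookkeeping: keep only the most recent N episodes.
public class EpisodeHistorySketch {

    private final LinkedList<String> episodeHistory = new LinkedList<>();
    private final int numEpisodesToStore;

    public EpisodeHistorySketch(int numEpisodesToStore) {
        this.numEpisodesToStore = numEpisodesToStore;
    }

    /** Store a finished episode, evicting the oldest when over capacity. */
    public void recordEpisode(String episode) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.poll(); // drop the oldest episode
        }
        episodeHistory.offer(episode);
    }

    public int storedCount() {
        return episodeHistory.size();
    }
}
```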
protected Model model
protected Domain modeledDomain
protected RewardFunction modeledRewardFunction
protected TerminalFunction modeledTerminalFunction
protected ModelPlanner modelPlanner
protected int maxNumSteps
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int numEpisodesToStore
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - the hashing factory to use for VI and the tabular model
maxReward - the maximum possible reward
nConfident - the number of observations required for the model to be confident in a transition
maxVIDelta - the maximum change in value function for VI to terminate
maxVIPasses - the maximum number of VI iterations per replan
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - the hashing factory to use for VI and the tabular model
potential - the admissible potential function
nConfident - the number of observations required for the model to be confident in a transition
maxVIDelta - the maximum change in value function for VI to terminate
maxVIPasses - the maximum number of VI iterations per replan
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, PotentialFunction potential, Model model, ModelPlanner.ModelPlannerGenerator plannerGenerator)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - a state hashing factory for indexing states
potential - the admissible potential function
model - the model/model-learning algorithm to use
plannerGenerator - a generator for a model planner
public Model getModel()
public Domain getModeledDomain()
public ModelPlanner getModelPlanner()
public RewardFunction getModeledRewardFunction()
public TerminalFunction getModeledTerminalFunction()
public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
Returns: an EpisodeAnalysis object
public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
maxSteps - the maximum number of steps in the episode
Returns: an EpisodeAnalysis object
protected Policy createDomainMappedPolicy()
public EpisodeAnalysis getLastLearningEpisode()
Description copied from interface: LearningAgent
Returns the last learning episode of the agent.
Specified by: getLastLearningEpisode in interface LearningAgent
public void setNumEpisodesToStore(int numEps)
Description copied from interface: LearningAgent
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by: setNumEpisodesToStore in interface LearningAgent
Parameters:
numEps - the number of learning episodes to remember
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Description copied from interface: LearningAgent
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by: getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track
public void planFromState(State initialState)
Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem
public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner