public class ARTDP extends OOMDPPlanner implements QComputablePlanner, LearningAgent
setPolicy(PlannerDerivedPolicy)
). The Q-value assigned to state-action pairs for entirely untried
transitions is reported as that returned by the provided value function initializer. In general, value function initialization should always be optimistic.
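The constructors below default to a Boltzmann policy with a fixed temperature of 0.1. As a self-contained sketch of what Boltzmann (softmax) action selection does — independent of BURLAP's own Policy classes — Q-values are turned into action probabilities like this:

```java
import java.util.Arrays;

public class BoltzmannSketch {

    // Convert Q-values into Boltzmann (softmax) action probabilities.
    // Lower temperature -> closer to greedy; higher -> closer to uniform.
    static double[] boltzmann(double[] qValues, double temperature) {
        double max = Arrays.stream(qValues).max().orElse(0.0);
        double[] probs = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            // Subtract the max before exponentiating for numerical stability.
            probs[i] = Math.exp((qValues[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 0.9, 0.0};
        // Temperature 0.1, matching the default in the ARTDP constructors.
        System.out.println(Arrays.toString(boltzmann(q, 0.1)));
    }
}
```

At temperature 0.1 even a 0.1 gap in Q-values makes the better action strongly preferred, which is why the fixed default behaves nearly greedily while still allowing exploration.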
1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
Modifier and Type | Class and Description |
---|---|
protected class |
ARTDP.ARTDPPlanner
The value function planner that operates on the modeled world.
|
QComputablePlanner.QComputablePlannerHelper
LearningAgent.LearningAgentBookKeeping
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<EpisodeAnalysis> |
episodeHistory
the saved previous learning episodes
|
protected int |
maxNumSteps
The maximum number of learning steps per episode before the agent gives up
|
protected Model |
model
The model of the world that is being learned.
|
protected ValueFunctionPlanner |
modelPlanner
The planner used on the modeled world to update the value function
|
protected int |
numEpisodesToStore
The number of the most recent learning episodes to store.
|
protected Policy |
policy
the policy to follow
|
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
double vInit)
Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1.
|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
Model model,
ValueFunctionInitialization vInit)
Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1.
|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
ValueFunctionInitialization vInit)
Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1.
|
Modifier and Type | Method and Description |
---|---|
java.util.List<EpisodeAnalysis> |
getAllStoredLearningEpisodes()
Returns all saved
EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis |
getLastLearningEpisode()
Returns the last learning episode of the agent.
|
QValue |
getQ(State s,
AbstractGroundedAction a)
Returns the
QValue for the given state-action pair. |
java.util.List<QValue> |
getQs(State s)
Returns a
List of QValue objects for every permissible action for the given input state. |
void |
planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state
|
void |
resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. |
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState)
Causes the agent to perform a learning episode starting in the given initial state.
|
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState,
int maxSteps)
Causes the agent to perform a learning episode starting in the given initial state.
|
void |
setNumEpisodesToStore(int numEps)
Tells the agent how many
EpisodeAnalysis objects representing learning episodes to internally store. |
void |
setPolicy(PlannerDerivedPolicy policy)
Sets the policy to the provided one.
|
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected Model model
protected ValueFunctionPlanner modelPlanner
protected Policy policy
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int maxNumSteps
protected int numEpisodesToStore
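The episodeHistory and numEpisodesToStore fields above implement simple bookkeeping: only the N most recent learning episodes are retained. A minimal sketch of that eviction policy, using a plain LinkedList and a hypothetical EpisodeLog stand-in rather than BURLAP's EpisodeAnalysis:

```java
import java.util.LinkedList;
import java.util.List;

public class EpisodeHistorySketch {

    // Stand-in for a recorded learning episode (hypothetical type,
    // playing the role of BURLAP's EpisodeAnalysis).
    record EpisodeLog(int index) {}

    private final LinkedList<EpisodeLog> episodeHistory = new LinkedList<>();
    private int numEpisodesToStore = 1;

    void setNumEpisodesToStore(int numEps) {
        this.numEpisodesToStore = numEps;
    }

    // Record an episode, evicting the oldest once the cap is reached.
    void recordEpisode(EpisodeLog ep) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.poll(); // drop the oldest episode
        }
        episodeHistory.offer(ep);
    }

    List<EpisodeLog> getAllStoredLearningEpisodes() {
        return episodeHistory;
    }

    EpisodeLog getLastLearningEpisode() {
        return episodeHistory.getLast();
    }

    public static void main(String[] args) {
        EpisodeHistorySketch agent = new EpisodeHistorySketch();
        agent.setNumEpisodesToStore(5);
        for (int i = 0; i < 8; i++) {
            agent.recordEpisode(new EpisodeLog(i));
        }
        // After 8 episodes with a cap of 5, only the last 5 remain.
        System.out.println(agent.getAllStoredLearningEpisodes().size());
    }
}
```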
public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the constant value function initialization to use; should be optimistic.

public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the value function initialization to use; should be optimistic.

public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, Model model, ValueFunctionInitialization vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
model - the model algorithm to use
vInit - the value function initialization to use; should be optimistic.

public void setPolicy(PlannerDerivedPolicy policy)
Sets the policy to the provided one. Should be a policy that operates on a QComputablePlanner. Will automatically set its Q-source to this object.
policy - the policy to use.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
runLearningEpisodeFrom in interface LearningAgent
initialState - The initial state in which the agent will start the episode.
Returns: an EpisodeAnalysis object.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
runLearningEpisodeFrom in interface LearningAgent
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode
Returns: an EpisodeAnalysis object.

public EpisodeAnalysis getLastLearningEpisode()
getLastLearningEpisode in interface LearningAgent
public void setNumEpisodesToStore(int numEps)
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
setNumEpisodesToStore in interface LearningAgent
numEps - the number of learning episodes to remember.

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track.

public void planFromState(State initialState)
planFromState in class OOMDPPlanner
initialState - the initial state of the planning problem

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
getQs in interface QComputablePlanner
s - the state for which Q-values are to be returned.
Returns: a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
getQ in interface QComputablePlanner
s - the input state
a - the input action
Returns: the QValue for the given state-action pair.

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class OOMDPPlanner
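Pulling the pieces together: ARTDP interleaves acting, learning a tabular model from observed transitions, and Bellman backups of the value function under that learned model. The following is a minimal, self-contained sketch of that loop on a tiny deterministic chain MDP. It uses plain arrays instead of BURLAP's Domain/Model/ValueFunctionPlanner classes, and a greedy policy in place of the Boltzmann policy, so it illustrates the algorithm rather than this class's API:

```java
public class ArtdpLoopSketch {

    // Tiny deterministic chain: states 0..3, actions 0 (left) / 1 (right).
    // Entering state 3 (the terminal goal) yields reward 1; all else 0.
    static final int N_STATES = 4, N_ACTIONS = 2, GOAL = 3;

    static int step(int s, int a) {
        return a == 1 ? Math.min(s + 1, GOAL) : Math.max(s - 1, 0);
    }

    // Run ARTDP-style learning and return the learned Q(0, right).
    static double run() {
        double gamma = 0.95;
        double[][] q = new double[N_STATES][N_ACTIONS];
        // Learned tabular model: transition counts and summed rewards.
        int[][][] counts = new int[N_STATES][N_ACTIONS][N_STATES];
        double[][] rewardSum = new double[N_STATES][N_ACTIONS];
        // Optimistic value initialization, as the class description advises.
        for (double[] row : q) java.util.Arrays.fill(row, 1.0);

        for (int episode = 0; episode < 200; episode++) {
            int s = 0;
            while (s != GOAL) {
                // Greedy action selection (ARTDP proper uses Boltzmann).
                int a = q[s][1] >= q[s][0] ? 1 : 0;
                int s2 = step(s, a);
                double r = (s2 == GOAL) ? 1.0 : 0.0;

                // Update the learned model from the observed transition.
                counts[s][a][s2]++;
                rewardSum[s][a] += r;

                // Bellman backup of Q(s,a) under the learned model.
                int total = 0;
                for (int c : counts[s][a]) total += c;
                double backup = rewardSum[s][a] / total; // expected reward
                for (int sp = 0; sp < N_STATES; sp++) {
                    if (counts[s][a][sp] == 0) continue;
                    double p = (double) counts[s][a][sp] / total;
                    double vNext = (sp == GOAL) ? 0.0
                            : Math.max(q[sp][0], q[sp][1]);
                    backup += gamma * p * vNext;
                }
                q[s][a] = backup;
                s = s2;
            }
        }
        // Converges to gamma^2 = 0.9025 for this 3-step chain.
        return q[0][1];
    }

    public static void main(String[] args) {
        System.out.printf("Q(0,right)=%.4f%n", run());
    }
}
```

Note how optimistic initialization drives exploration here: untried actions keep their inflated initial value, so the greedy policy is pulled toward them until real experience lowers their estimates.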