public class ARTDP extends MDPSolver implements QFunction, LearningAgent
The policy used for action selection can be set with setPolicy(burlap.behavior.policy.SolverDerivedPolicy). The Q-value assigned to state-action pairs for entirely untried transitions is reported as that returned by the value function initializer provided. In general, value function initialization should always be optimistic.
1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
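For orientation, the following is a minimal usage sketch rather than part of this class's documentation: it pairs ARTDP with one of BURLAP's example grid world domains and a SimulatedEnvironment. The grid world setup, discount factor, episode counts, and import paths (which follow the BURLAP 2.x package layout) are illustrative assumptions, not prescribed by ARTDP.

```java
import burlap.behavior.singleagent.EpisodeAnalysis;
import burlap.behavior.singleagent.learning.modellearning.artdp.ARTDP;
import burlap.domain.singleagent.gridworld.GridWorldDomain;
import burlap.oomdp.auxiliary.common.SinglePFTF;
import burlap.oomdp.core.Domain;
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;
import burlap.oomdp.singleagent.common.UniformCostRF;
import burlap.oomdp.singleagent.environment.SimulatedEnvironment;
import burlap.oomdp.statehashing.SimpleHashableStateFactory;

public class ARTDPExample {
    public static void main(String[] args) {
        // Illustrative problem setup (assumed, not part of ARTDP itself):
        // an 11x11 four-rooms grid world with a goal location at (10, 10).
        GridWorldDomain gwd = new GridWorldDomain(11, 11);
        gwd.setMapToFourRooms();
        Domain domain = gwd.generateDomain();

        RewardFunction rf = new UniformCostRF();
        TerminalFunction tf = new SinglePFTF(
                domain.getPropFunction(GridWorldDomain.PFATLOCATION));

        State initialState = GridWorldDomain.getOneAgentOneLocationState(domain);
        GridWorldDomain.setAgent(initialState, 0, 0);
        GridWorldDomain.setLocation(initialState, 0, 10, 10);

        SimulatedEnvironment env = new SimulatedEnvironment(domain, rf, tf, initialState);

        // ARTDP with a tabular model, the default Boltzmann policy (temperature 0.1),
        // and a constant value function initialization of 0, which is optimistic here
        // because every step reward is -1.
        ARTDP agent = new ARTDP(domain, 0.99, new SimpleHashableStateFactory(), 0.);

        // Run several learning episodes against the environment.
        for (int i = 0; i < 50; i++) {
            EpisodeAnalysis ea = agent.runLearningEpisode(env, 1000);
            System.out.println("Episode " + i + ": " + ea.numTimeSteps() + " steps");
            env.resetEnvironment();
        }
    }
}
```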
Nested classes/interfaces inherited from interface QFunction: QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory - the saved previous learning episodes |
| protected int | maxNumSteps - the maximum number of learning steps per episode before the agent gives up |
| protected Model | model - the model of the world that is being learned |
| protected DynamicProgramming | modelPlanner - the DynamicProgramming planner used on the modeled world to update the value function |
| protected int | numEpisodesToStore - the number of the most recent learning episodes to store |
| protected Policy | policy - the policy to follow |
Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, double vInit) - Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
| ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, Model model, ValueFunctionInitialization vInit) - Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1. |
| ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, ValueFunctionInitialization vInit) - Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
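For example, the ValueFunctionInitialization overload is useful when a single constant is not enough. The sketch below assumes a domain and hashingFactory built elsewhere and uses ValueFunctionInitialization.ConstantValueFunctionInitialization, the constant implementation nested in that interface in this API generation; if your BURLAP version names it differently, any ValueFunctionInitialization implementation can be substituted (an optimistic one is recommended, as noted above).

```java
// Assumed available from problem setup: Domain domain, HashableStateFactory hashingFactory.
// A value of 0 is optimistic whenever all rewards are negative (e.g., UniformCostRF).
ValueFunctionInitialization vInit =
        new ValueFunctionInitialization.ConstantValueFunctionInitialization(0.);
ARTDP agent = new ARTDP(domain, 0.99, hashingFactory, vInit);
```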
| Modifier and Type | Method and Description |
|---|---|
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes() |
| EpisodeAnalysis | getLastLearningEpisode() |
| QValue | getQ(State s, AbstractGroundedAction a) - Returns the QValue for the given state-action pair. |
| java.util.List<QValue> | getQs(State s) - Returns a List of QValue objects for every permissible action for the given input state. |
| void | resetSolver() - This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
| EpisodeAnalysis | runLearningEpisode(Environment env) |
| EpisodeAnalysis | runLearningEpisode(Environment env, int maxSteps) |
| void | setNumEpisodesToStore(int numEps) |
| void | setPolicy(SolverDerivedPolicy policy) - Sets the policy to the provided one. |
| double | value(State s) - Returns the value function evaluation of the given state. |
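As a brief sketch of setPolicy and the Q-value accessors (continuing with the agent and env variables from the earlier usage sketch, imports omitted; EpsilonGreedy is one of BURLAP's SolverDerivedPolicy implementations, and the 0.1 exploration rate is arbitrary):

```java
// Replace the default Boltzmann policy with epsilon-greedy action selection.
// Because EpsilonGreedy is a SolverDerivedPolicy, setPolicy wires its Q-source
// to this agent automatically.
agent.setPolicy(new EpsilonGreedy(0.1));

// After some learning, inspect the learned Q-values and state value.
State s = env.getCurrentObservation();
for (QValue q : agent.getQs(s)) {
    System.out.println(q.a.toString() + " -> " + q.q);
}
System.out.println("V(s) = " + agent.value(s));
```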
Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

protected Model model
protected DynamicProgramming modelPlanner
protected Policy policy
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int maxNumSteps
protected int numEpisodesToStore
public ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, double vInit)
Parameters:
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the constant value function initialization to use; should be optimistic.

public ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, ValueFunctionInitialization vInit)
Parameters:
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the value function initialization to use; should be optimistic.

public ARTDP(Domain domain, double gamma, HashableStateFactory hashingFactory, Model model, ValueFunctionInitialization vInit)
Parameters:
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
model - the model algorithm to use
vInit - the value function initialization to use; should be optimistic.

public void setPolicy(SolverDerivedPolicy policy)
Sets the policy to the provided one. Should be a policy that operates on a QFunction. Will automatically set its Q-source to this object.
Parameters:
policy - the policy to use.

public EpisodeAnalysis runLearningEpisode(Environment env)
Specified by: runLearningEpisode in interface LearningAgent

public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)
Specified by: runLearningEpisode in interface LearningAgent

public EpisodeAnalysis getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
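For instance, episode bookkeeping with these methods might look like the following (again continuing with the agent and env from the earlier sketch; the history size, episode cap, and step cap are arbitrary):

```java
// Keep only the 20 most recent learning episodes in the episode history.
agent.setNumEpisodesToStore(20);

for (int i = 0; i < 100; i++) {
    agent.runLearningEpisode(env, 500);
    env.resetEnvironment();
}

// Retrieve what was recorded.
EpisodeAnalysis last = agent.getLastLearningEpisode();
System.out.println("Last episode: " + last.numTimeSteps() + " steps");
System.out.println("Episodes stored: " + agent.getAllStoredLearningEpisodes().size());
```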
public java.util.List<QValue> getQs(State s)
Specified by: getQs in interface QFunction
Returns a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Specified by: getQ in interface QFunction
Returns the QValue for the given state-action pair.

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate.

public void resetSolver()
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Specified by: resetSolver in class MDPSolver
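For example, resetSolver makes it straightforward to reuse one agent across independent learning trials (a sketch under the same assumptions as the earlier usage example; trial and episode counts are arbitrary):

```java
for (int trial = 0; trial < 5; trial++) {
    for (int episode = 0; episode < 50; episode++) {
        agent.runLearningEpisode(env, 1000);
        env.resetEnvironment();
    }
    // Discard all learned results so the next trial starts as if the MDP
    // had never been solved.
    agent.resetSolver();
}
```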