public class ARTDP extends MDPSolver implements QProvider, LearningAgent
An implementation of Adaptive Real-Time Dynamic Programming [1]. A different policy can be set through the setPolicy(burlap.behavior.policy.SolverDerivedPolicy) method. The Q-value assigned to state-action pairs for entirely untried transitions is reported as that returned by the value function initializer provided. In general, value function initialization should always be optimistic.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.

QProvider.Helper
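The untried-transition rule above (report the initializer's value until a state-action pair is actually updated) can be illustrated with a self-contained toy Q-table. The class and method names below are hypothetical, for illustration only; this is not BURLAP code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy tabular Q-table illustrating optimistic initialization: any
// state-action pair that has never been updated reports the value
// returned by the initializer, here a constant.
public class OptimisticQTable {
    static final double V_INIT = 10.0; // optimistic constant initialization
    static final Map<String, Double> q = new HashMap<>();

    static String key(String s, String a) { return s + "|" + a; }

    // Untried pairs fall back to the optimistic initializer value.
    static double qValue(String s, String a) {
        return q.getOrDefault(key(s, a), V_INIT);
    }

    static void update(String s, String a, double target) {
        q.put(key(s, a), target);
    }

    public static void main(String[] args) {
        update("s0", "left", 2.5);                 // one tried transition
        System.out.println(qValue("s0", "left"));  // learned value: 2.5
        System.out.println(qValue("s0", "right")); // untried: 10.0
    }
}
```

Optimism matters here because an agent following a greedy-ish policy is drawn toward untried pairs, which keeps exploration alive until real estimates replace the initializer.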
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<Episode> | episodeHistory: the saved previous learning episodes |
protected int | maxNumSteps: the maximum number of learning steps per episode before the agent gives up |
protected LearnedModel | model: the model of the world that is being learned |
protected DynamicProgramming | modelPlanner: the planner used on the modeled world to update the value function |
protected int | numEpisodesToStore: the number of most recent learning episodes to store |
protected Policy | policy: the policy to follow |
actionTypes, debugCode, domain, gamma, hashingFactory, usingOptionModel
Constructor and Description |
---|
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit): Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, LearnedModel model, ValueFunction vInit): Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1. |
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit): Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
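All three constructors default to a Boltzmann (softmax) policy over Q-values with a fixed temperature of 0.1. The sketch below shows that selection rule in isolation; it is a stand-alone illustrative computation with hypothetical names, not the BURLAP implementation:

```java
// Boltzmann (softmax) action selection: P(a) is proportional to
// exp(Q(a) / temperature). A low temperature such as 0.1 makes
// selection nearly greedy while keeping every action's probability
// non-zero, so all actions remain eligible for exploration.
public class BoltzmannSketch {
    static double[] probabilities(double[] qValues, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qValues) max = Math.max(max, q);
        double[] p = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            // subtract the max before exponentiating for numerical stability
            p[i] = Math.exp((qValues[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        // with temperature 0.1, even a 0.1 gap between the top two
        // Q-values already produces a strong preference for the best action
        double[] p = probabilities(new double[]{1.0, 0.9, 0.0}, 0.1);
        System.out.printf("%.3f %.3f %.3f%n", p[0], p[1], p[2]);
    }
}
```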
Modifier and Type | Method and Description |
---|---|
java.util.List<Episode> | getAllStoredLearningEpisodes() |
Episode | getLastLearningEpisode() |
double | qValue(State s, Action a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state. |
void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
Episode | runLearningEpisode(Environment env) |
Episode | runLearningEpisode(Environment env, int maxSteps) |
void | setNumEpisodesToStore(int numEps) |
void | setPolicy(SolverDerivedPolicy policy): Sets the policy to the provided one. |
double | value(State s): Returns the value function evaluation of the given state. |
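For a QProvider, value(State) is conventionally the maximum Q-value over the actions applicable in the state (cf. QProvider.Helper). A stand-alone illustration with hypothetical names, not BURLAP code:

```java
// Illustrates the conventional relationship between a state's value
// and its Q-values: value(s) = max over actions a of Q(s, a).
public class MaxQSketch {
    static double value(double[] qValuesForState) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qValuesForState) max = Math.max(max, q);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(value(new double[]{0.2, 1.5, -0.3})); // prints 1.5
    }
}
```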
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
protected LearnedModel model
protected DynamicProgramming modelPlanner
protected Policy policy
protected java.util.LinkedList<Episode> episodeHistory
protected int maxNumSteps
protected int numEpisodesToStore
public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the constant value function initialization to use; should be optimistic.

public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the value function initialization to use; should be optimistic.

public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, LearnedModel model, ValueFunction vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
model - the model algorithm to use
vInit - the value function initialization to use; should be optimistic.

public void setPolicy(SolverDerivedPolicy policy)
Sets the policy to the provided one. Should be a policy that operates on a QProvider. Will automatically set its Q-source to this object.
policy - the policy to use.

public Episode runLearningEpisode(Environment env)
runLearningEpisode in interface LearningAgent
public Episode runLearningEpisode(Environment env, int maxSteps)
runLearningEpisode in interface LearningAgent
public Episode getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<Episode> getAllStoredLearningEpisodes()
public java.util.List<QValue> qValues(State s)
Returns a List of QValue objects for every permissible action for the given input state.
qValues in interface QProvider

public double qValue(State s, Action a)
Returns the QValue for the given state-action pair.
qValue in interface QFunction

public double value(State s)
Returns the value function evaluation of the given state.
value in interface ValueFunction
s - the state to evaluate.

public void resetSolver()
Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
resetSolver in interface MDPSolverInterface
resetSolver in class MDPSolver