public class RTDP extends DynamicProgramming implements Planner
To ensure optimality, an optimistic value function initialization should be used; RTDP excels when a good value function initialization (e.g., an admissible heuristic) can be provided.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
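The snippet below is a minimal usage sketch rather than part of this class's documentation: it assumes BURLAP's GridWorldDomain example domain, GridWorldTerminalFunction, and SimpleHashableStateFactory (none of which belong to this class), and all planner parameters are illustrative.

```java
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.domain.singleagent.gridworld.GridWorldDomain;
import burlap.domain.singleagent.gridworld.GridWorldTerminalFunction;
import burlap.domain.singleagent.gridworld.state.GridAgent;
import burlap.domain.singleagent.gridworld.state.GridLocation;
import burlap.domain.singleagent.gridworld.state.GridWorldState;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class RTDPExample {
    public static void main(String[] args) {
        // Example domain (assumed): the four-rooms grid world with a goal at (10, 10).
        GridWorldDomain gwd = new GridWorldDomain(11, 11);
        gwd.setMapToFourRooms();
        gwd.setTf(new GridWorldTerminalFunction(10, 10));
        SADomain domain = gwd.generateDomain();

        State initialState = new GridWorldState(new GridAgent(0, 0), new GridLocation(10, 10, "loc0"));

        // Constant value function initialization of 0 (optimistic when all rewards are
        // non-positive), 1000 rollouts, a 0.001 max-delta threshold, and rollouts capped
        // at 100 steps. All of these values are illustrative.
        RTDP planner = new RTDP(domain, 0.99, new SimpleHashableStateFactory(),
                0., 1000, 0.001, 100);

        // Plan and obtain a policy that greedily selects the highest Q-value action.
        GreedyQPolicy policy = planner.planFromState(initialState);
        System.out.println("Bellman updates performed: " + planner.getNumberOfBellmanUpdates());
        System.out.println("First action from the start state: " + policy.action(initialState));
    }
}
```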
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
| Modifier and Type | Field and Description |
|---|---|
| protected double | maxDelta When the maximum change in the value function from a rollout is smaller than this value, planning will terminate. |
| protected int | maxDepth The maximum depth/length of a rollout before it is terminated and Bellman updates are performed. |
| protected int | minNumRolloutsWithSmallValueChange RTDP will be declared "converged" if there are this many consecutive policy rollouts in which the value function change is smaller than the maxDelta value. |
| protected int | numberOfBellmanUpdates Stores the number of Bellman updates made across all planning. |
| protected int | numRollouts The number of rollouts to perform when planning is started, unless the value function delta is small enough. |
| protected Policy | rollOutPolicy The policy to use for episode rollouts. |
| protected boolean | useBatch If set to use batch mode, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
Fields inherited from class DynamicProgramming: operator, valueFunction, valueInitializer
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
| Constructor and Description |
|---|
| RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth) Initializes. |
| RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit, int numRollouts, double maxDelta, int maxDepth) Initializes. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | batchRTDP(State initialState) Performs Bellman updates only after a rollout is complete, and in reverse order. |
| int | getNumberOfBellmanUpdates() Returns the total number of Bellman updates across all planning. |
| protected void | normalRTDP(State initialState) Runs normal RTDP, in which Bellman updates are performed after each action selection. |
| protected double | performOrderedBellmanUpdates(java.util.List<HashableState> states) Performs ordered Bellman updates on the list of (hashed) states provided to it. |
| GreedyQPolicy | planFromState(State initialState) Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly. |
| void | setMaxDelta(double delta) Sets the maximum delta state value update in a rollout that will cause planning to terminate. |
| void | setMaxDynamicDepth(int d) Sets the maximum depth of a rollout before it is prematurely terminated to update the value function. |
| void | setMinNumRolloutsWithSmallValueChange(int nRollsouts) Sets the minimum number of consecutive rollouts with a value function change less than the maxDelta value that will cause RTDP to stop. |
| void | setNumPasses(int p) Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough). |
| void | setRollOutPolicy(Policy p) Sets the rollout policy to use. |
| void | toggleBatchMode(boolean useBatch) When batch mode is set, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
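As a sketch of the setters listed above, the snippet below tunes the convergence and rollout parameters before planning. The EpsilonGreedy rollout policy and all numeric values are assumptions chosen for the example, not defaults of this class; verify the EpsilonGreedy constructor against your BURLAP version.

```java
import burlap.behavior.policy.EpsilonGreedy;
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.HashableStateFactory;

public class RTDPTuning {

    /** Builds an RTDP planner with its rollout/convergence knobs set explicitly, then plans. */
    public static GreedyQPolicy tunedPlan(SADomain domain, HashableStateFactory hf, State initialState) {
        RTDP planner = new RTDP(domain, 0.99, hf, 0., 1000, 0.001, 100);

        planner.setNumPasses(2000);                        // upper bound on the number of rollouts
        planner.setMaxDelta(0.0001);                       // tighter per-rollout convergence threshold
        planner.setMaxDynamicDepth(200);                   // allow longer rollouts before truncation
        planner.setMinNumRolloutsWithSmallValueChange(5);  // require 5 consecutive small-change rollouts
        planner.setRollOutPolicy(new EpsilonGreedy(planner, 0.1)); // exploratory rollout policy (assumed choice)
        planner.toggleBatchMode(true);                     // defer Bellman updates to the end of each rollout

        return planner.planFromState(initialState);
    }
}
```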
Methods inherited from class DynamicProgramming: computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, resetSolver, setOperator, setValueFunctionInitialization, value, value, writeValueTable
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, resetSolver, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected Policy rollOutPolicy
protected int numRollouts
protected double maxDelta
protected int maxDepth
protected int minNumRolloutsWithSmallValueChange
protected boolean useBatch
protected int numberOfBellmanUpdates
public RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)

Initializes. Use the DynamicProgramming.setValueFunctionInitialization(burlap.behavior.valuefunction.ValueFunction) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic, like VMax, to ensure convergence.

Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the value to which the value function for all states will be initialized
numRollouts - the number of rollouts to perform when planning is started
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed

public RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit, int numRollouts, double maxDelta, int maxDepth)

Initializes. Use the DynamicProgramming.setValueFunctionInitialization(burlap.behavior.valuefunction.ValueFunction) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic, like VMax, to ensure convergence.

Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the object which defines how the value function will be initialized for each individual state
numRollouts - the number of rollouts to perform when planning is started
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed
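Since both constructors recommend an optimistic initialization, the sketch below shows the ValueFunction variant with a constant upper bound. The rMax value and the rMax/(1 - gamma) bound are assumptions made for illustration; a tighter, domain-specific admissible heuristic can be substituted.

```java
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.behavior.valuefunction.ValueFunction;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class OptimisticRTDP {

    public static RTDP withOptimisticInit(SADomain domain) {
        double gamma = 0.99;
        double rMax = 100.;                       // assumed maximum one-step reward for this example
        final double vMax = rMax / (1. - gamma);  // crude optimistic upper bound on any state's value

        // Initialize every state's value to the optimistic bound.
        ValueFunction optimisticInit = new ValueFunction() {
            @Override
            public double value(State s) {
                return vMax;
            }
        };

        return new RTDP(domain, gamma, new SimpleHashableStateFactory(),
                optimisticInit, 1000, 0.001, 100);
    }
}
```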
public void setNumPasses(int p)

Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough).

Parameters:
p - the number of passes

public void setMaxDelta(double delta)

Sets the maximum delta state value update in a rollout that will cause planning to terminate.

Parameters:
delta - the max delta

public void setRollOutPolicy(Policy p)

Sets the rollout policy to use.

Parameters:
p - the rollout policy to use

public void setMaxDynamicDepth(int d)

Sets the maximum depth of a rollout before it is prematurely terminated to update the value function.

Parameters:
d - the maximum depth of a rollout

public void setMinNumRolloutsWithSmallValueChange(int nRollsouts)

Sets the minimum number of consecutive rollouts with a value function change less than the maxDelta value that will cause RTDP to stop.

Parameters:
nRollsouts - the minimum number of consecutive rollouts required

public void toggleBatchMode(boolean useBatch)

When batch mode is set, Bellman updates will be stalled until a rollout is complete and then run in reverse.

Parameters:
useBatch - whether to use batch mode RTDP or not

public int getNumberOfBellmanUpdates()

Returns the total number of Bellman updates across all planning.

public GreedyQPolicy planFromState(State initialState)

Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.

Specified by:
planFromState in interface Planner

Parameters:
initialState - the initial state of the planning problem

Returns:
a GreedyQPolicy.

protected void normalRTDP(State initialState)

Runs normal RTDP, in which Bellman updates are performed after each action selection.

Parameters:
initialState - the initial state from which to plan

protected void batchRTDP(State initialState)

Performs Bellman updates only after a rollout is complete, and in reverse order.

Parameters:
initialState - the initial state from which to plan

protected double performOrderedBellmanUpdates(java.util.List<HashableState> states)

Performs ordered Bellman updates on the list of (hashed) states provided to it.

Parameters:
states - the ordered list of states on which to perform Bellman updates.
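To make the normal/batch distinction above concrete, the sketch below runs planning once in each mode and reports the Bellman-update counts. It uses only methods documented on this page; the domain, hashing factory, start state, and parameter values are supplied by the caller or assumed for illustration.

```java
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.HashableStateFactory;

public class BatchVsNormalRTDP {

    /** Plans twice from the same start state, once per mode, and prints the update counts. */
    public static void compare(SADomain domain, HashableStateFactory hf, State s0) {
        RTDP normal = new RTDP(domain, 0.99, hf, 0., 500, 0.001, 100);
        normal.planFromState(s0); // normalRTDP: a Bellman update after each action selection

        RTDP batch = new RTDP(domain, 0.99, hf, 0., 500, 0.001, 100);
        batch.toggleBatchMode(true);
        batch.planFromState(s0);  // batchRTDP: updates deferred and applied in reverse at rollout end

        System.out.println("normal RTDP Bellman updates: " + normal.getNumberOfBellmanUpdates());
        System.out.println("batch RTDP Bellman updates:  " + batch.getNumberOfBellmanUpdates());
    }
}
```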