public class RTDP extends ValueFunctionPlanner
ValueFunctionPlanner.StaticVFPlanner
QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field | Description |
---|---|---|
protected double | maxDelta | When the maximum change in the value function from a rollout is smaller than this value, planning will terminate. |
protected int | maxDepth | The maximum depth/length of a rollout before it is terminated and Bellman updates are performed. |
protected int | minNumRolloutsWithSmallValueChange | RTDP will be declared "converged" if there are this many consecutive policy rollouts in which the value function change is smaller than the maxDelta value. |
protected int | numberOfBellmanUpdates | Stores the number of Bellman updates made across all planning. |
protected int | numRollouts | The number of rollouts to perform when planning is started, unless the value function delta is small enough. |
protected Policy | rollOutPolicy | The policy to use for episode rollouts. |
protected boolean | useBatch | If set to use batch mode, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
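The fields maxDelta, minNumRolloutsWithSmallValueChange, and numRollouts together describe when planning stops. The following is a minimal, self-contained sketch of that stopping rule as the field descriptions read. It is not taken from the RTDP source; the class RolloutStopRule and its methods are hypothetical names introduced only for illustration, and the per-rollout deltas are supplied by the caller rather than computed from a real planner.

```java
// Illustrative sketch only; RolloutStopRule is a hypothetical helper, not part of the library.
final class RolloutStopRule {
    private final double maxDelta;     // threshold on the largest value change in one rollout
    private final int minConsecutive;  // plays the role of minNumRolloutsWithSmallValueChange
    private int consecutiveSmall = 0;  // current run of rollouts whose change was below maxDelta

    RolloutStopRule(double maxDelta, int minConsecutive) {
        this.maxDelta = maxDelta;
        this.minConsecutive = minConsecutive;
    }

    /** Records the largest value change of one rollout; returns true once the run is long enough. */
    boolean record(double rolloutDelta) {
        consecutiveSmall = (rolloutDelta < maxDelta) ? consecutiveSmall + 1 : 0;
        return consecutiveSmall >= minConsecutive;
    }

    /** Counts how many rollouts would run, given their deltas and a cap of numRollouts. */
    static int rolloutsUsed(double[] deltas, int numRollouts, double maxDelta, int minConsecutive) {
        RolloutStopRule rule = new RolloutStopRule(maxDelta, minConsecutive);
        int used = 0;
        while (used < numRollouts && used < deltas.length) {
            boolean converged = rule.record(deltas[used]);
            used++;
            if (converged) {
                break; // enough consecutive small changes: treat the value function as converged
            }
        }
        return used;
    }
}
```

Under this reading, a single large change resets the count, so an isolated small rollout does not end planning when minConsecutive is greater than 1.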
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor | Description |
---|---|
RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth) | Initializes the planner. |
RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization vInit, int numRollouts, double maxDelta, int maxDepth) | Initializes the planner. |
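The constructor documentation recommends an optimistic vInit such as VMax. One common way to obtain such a value, assuming an infinite-horizon discounted problem with a known upper bound rMax on the one-step reward, is the geometric-series bound rMax / (1 - gamma). The helper below is a sketch under that assumption; OptimisticInit and vMax are hypothetical names and are not part of the library.

```java
// Illustrative sketch only; OptimisticInit is a hypothetical helper, not part of the library.
final class OptimisticInit {
    /**
     * Upper bound on any state value when every one-step reward is at most rMax
     * and 0 <= gamma < 1. A negative rMax is clamped to 0 because an episode may
     * reach a terminal state early and collect fewer negative rewards.
     */
    static double vMax(double rMax, double gamma) {
        if (gamma < 0.0 || gamma >= 1.0) {
            throw new IllegalArgumentException("gamma must be in [0, 1) for this bound");
        }
        return Math.max(0.0, rMax) / (1.0 - gamma);
    }
}
```

For example, with rMax = 1.0 and gamma = 0.5 the bound is 2.0. When all rewards are non-positive, the bound is 0.0, so initializing the value function to zero is already optimistic.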
Modifier and Type | Method | Description |
---|---|---|
protected void | batchRTDP(State initialState) | Performs Bellman updates only after a rollout is complete and in reverse order. |
int | getNumberOfBellmanUpdates() | Returns the total number of Bellman updates across all planning. |
protected void | normalRTDP(State initialState) | Runs normal RTDP in which Bellman updates are performed after each action selection. |
protected double | performOrderedBellmanUpdates(java.util.List<StateHashTuple> states) | Performs ordered Bellman updates on the list of (hashed) states provided to it. |
void | planFromState(State initialState) | This method will cause the planner to begin planning from the specified initial state. |
void | setMaxDelta(double delta) | Sets the threshold on the maximum state value change in a rollout below which planning will terminate. |
void | setMaxDynamicDepth(int d) | Sets the maximum depth of a rollout before it is prematurely terminated to update the value function. |
void | setMinNumRolloutsWithSmallValueChange(int nRollsouts) | Sets the minimum number of consecutive rollouts with a value function change less than the maxDelta value that will cause RTDP to stop. |
void | setNumPasses(int p) | Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough). |
void | setRollOutPolicy(Policy p) | Sets the rollout policy to use. |
void | toggleBatchMode(boolean useBatch) | When batch mode is set, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
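The difference between normalRTDP and batchRTDP is the point at which Bellman updates are applied. The sketch below illustrates the two orderings on a toy deterministic chain with a single action, a step reward of -1, no discounting, and a terminal last state. It is not the library's implementation; ChainRtdpSketch, its constants, and its methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; a toy chain MDP, not the library's planner.
final class ChainRtdpSketch {
    static final double GAMMA = 1.0;        // assumed: no discounting
    static final double STEP_REWARD = -1.0; // assumed: each move costs 1

    /** Bellman update for state s, whose only successor is s + 1. */
    static void backup(double[] v, int s) {
        v[s] = STEP_REWARD + GAMMA * v[s + 1];
    }

    /** Runs one rollout from state 0 to the terminal state n - 1 and returns the values. */
    static double[] rollout(int n, boolean useBatch) {
        double[] v = new double[n]; // zeros are optimistic here because all rewards are negative
        List<Integer> visited = new ArrayList<>();
        for (int s = 0; s < n - 1; s++) {
            if (useBatch) {
                visited.add(s);  // batch mode: remember the state, update later
            } else {
                backup(v, s);    // normal mode: update as soon as the state is visited
            }
        }
        // Batch mode: apply the stalled updates in reverse visiting order.
        for (int i = visited.size() - 1; i >= 0; i--) {
            backup(v, visited.get(i));
        }
        return v;
    }
}
```

On a chain of five states, one normal rollout leaves the start state at -1.0, while one batch rollout leaves it at -4.0, which is the true value. Running the updates in reverse lets the terminal value propagate along the whole trajectory in a single rollout.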
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetPlannerResults, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected Policy rollOutPolicy
protected int numRollouts
protected double maxDelta
protected int maxDepth
protected int minNumRolloutsWithSmallValueChange
protected boolean useBatch
protected int numberOfBellmanUpdates
public RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)
Initializes the planner. Use the ValueFunctionPlanner.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic like VMax to ensure convergence.
domain
- the domain in which to plan
rf
- the reward function
tf
- the terminal state function
gamma
- the discount factor
hashingFactory
- the state hashing factory to use
vInit
- the value to which the value function for all states will be initialized
numRollouts
- the number of rollouts to perform when planning is started.
maxDelta
- when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
maxDepth
- the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
public RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization vInit, int numRollouts, double maxDelta, int maxDepth)
Initializes the planner. Use the ValueFunctionPlanner.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic like VMax to ensure convergence.
domain
- the domain in which to plan
rf
- the reward function
tf
- the terminal state function
gamma
- the discount factor
hashingFactory
- the state hashing factory to use
vInit
- the object which defines how the value function will be initialized for each individual state.
numRollouts
- the number of rollouts to perform when planning is started.
maxDelta
- when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
maxDepth
- the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
public void setNumPasses(int p)
p
- the number of passes
public void setMaxDelta(double delta)
delta
- the max delta
public void setRollOutPolicy(Policy p)
p
- the rollout policy to use
public void setMaxDynamicDepth(int d)
d
- the maximum depth of a rollout.
public void setMinNumRolloutsWithSmallValueChange(int nRollsouts)
nRollsouts
- the minimum number of consecutive rollouts required.
public void toggleBatchMode(boolean useBatch)
useBatch
- whether to use batch mode RTDP or not.
public int getNumberOfBellmanUpdates()
public void planFromState(State initialState)
OOMDPPlanner
planFromState
in class ValueFunctionPlanner
initialState
- the initial state of the planning problem
protected void normalRTDP(State initialState)
initialState
- the initial state from which to plan
protected void batchRTDP(State initialState)
initialState
- the initial state from which to plan
protected double performOrderedBellmanUpdates(java.util.List<StateHashTuple> states)
states
- the ordered list of states on which to perform Bellman updates.
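
performOrderedBellmanUpdates takes an ordered list of states and returns a double. The page does not say what the returned value represents. The sketch below assumes it is the largest absolute change in value produced by the updates, which would fit the maxDelta termination test. OrderedBackupSketch and updateInOrder are hypothetical names, the value function is a plain map, and the Bellman backup is passed in by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; assumes the returned double is the maximum value change.
final class OrderedBackupSketch {
    /**
     * Applies the supplied backup to each state in list order, stores the result
     * in the value map, and returns the largest absolute change observed.
     * States missing from the map are treated as having value 0.0 (an assumption).
     */
    static <S> double updateInOrder(List<S> states, Map<S, Double> values,
                                    ToDoubleFunction<S> backup) {
        double maxChange = 0.0;
        for (S s : states) {
            double old = values.getOrDefault(s, 0.0);
            double updated = backup.applyAsDouble(s);
            values.put(s, updated); // later states in the list see this new value
            maxChange = Math.max(maxChange, Math.abs(updated - old));
        }
        return maxChange;
    }
}
```

Because each update is written back before the next state is processed, the order of the list matters: passing a rollout's states in reverse lets each backup use the freshly updated value of its successor.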