RTDP

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
  - burlap.behavior.singleagent.planning.ValueFunctionPlanner
    - burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP

All Implemented Interfaces:

QComputablePlanner, ValueFunction

Direct Known Subclasses:

BFSRTDP
```
public class RTDP
extends ValueFunctionPlanner
```
Implementation of Real-Time Dynamic Programming (RTDP) [1]. The planning algorithm uses a policy derived from Q-values to sample rollouts in the domain. At each step of a rollout, the value of the current state is updated with the Bellman operator, and the action for the current state is selected by a greedy Q policy that breaks ties randomly. Alternatively, the algorithm can be set to batch mode. In batch mode, all Bellman updates are deferred until the rollout is complete; a Bellman update is then performed on each visited state in reverse order of visitation.
To ensure optimality, an optimistic value function initialization should be used. RTDP performs best when a good value function initialization (e.g., an admissible heuristic) can be provided.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
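The rollout-and-update loop described above can be sketched as a stand-alone program. This is an illustration under stated assumptions, not BURLAP's implementation: the 1-D chain MDP (states 0 to n-1, terminal goal at n-1, reward -1 per step, moves that succeed with probability 0.8) and every class and method name are hypothetical. The parameters loosely mirror numRollouts, maxDelta, maxDepth and minNumRolloutsWithSmallValueChange, and the all-zero start is optimistic because no reward is positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of non-batch RTDP on a toy chain MDP (not BURLAP code).
// States 0..n-1; state n-1 is terminal with value 0. Every step yields reward -1.
// Action 0 = left, action 1 = right; a move succeeds with prob 0.8, otherwise the agent stays.
class RtdpSketch {
    static final double SUCCESS = 0.8;

    // Returns {successor if the move succeeds, successor if it fails}; walls keep the agent in bounds.
    static int[] next(int s, int a, int n) {
        int target = a == 1 ? Math.min(s + 1, n - 1) : Math.max(s - 1, 0);
        return new int[] {target, s};
    }

    // Q(s, a) = r + gamma * E[V(s')]
    static double q(double[] v, int s, int a, double gamma) {
        int[] ns = next(s, a, v.length);
        return -1.0 + gamma * (SUCCESS * v[ns[0]] + (1 - SUCCESS) * v[ns[1]]);
    }

    // Greedy action with ties broken uniformly at random.
    static int greedy(double[] v, int s, double gamma, Random rng) {
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> ties = new ArrayList<>();
        for (int a = 0; a < 2; a++) {
            double qa = q(v, s, a, gamma);
            if (qa > best) { best = qa; ties.clear(); ties.add(a); }
            else if (qa == best) ties.add(a);
        }
        return ties.get(rng.nextInt(ties.size()));
    }

    static double[] plan(int n, double gamma, int numRollouts, double maxDelta,
                         int maxDepth, int minSmallRollouts, Random rng) {
        double[] v = new double[n]; // all zeros: optimistic, since every reward is <= 0
        int consecutiveSmall = 0;
        for (int r = 0; r < numRollouts; r++) {
            double rolloutDelta = 0.0;
            int s = 0; // every rollout starts from the initial state
            for (int depth = 0; depth < maxDepth && s != n - 1; depth++) {
                // Bellman backup on the current state: V(s) = max_a Q(s, a) = Q(s, greedy action)
                int a = greedy(v, s, gamma, rng);
                double newV = q(v, s, a, gamma);
                rolloutDelta = Math.max(rolloutDelta, Math.abs(newV - v[s]));
                v[s] = newV;
                // Sample the next state under the greedy action.
                int[] ns = next(s, a, n);
                s = rng.nextDouble() < SUCCESS ? ns[0] : ns[1];
            }
            // Stop after enough consecutive rollouts with a small maximum value change.
            consecutiveSmall = rolloutDelta < maxDelta ? consecutiveSmall + 1 : 0;
            if (consecutiveSmall >= minSmallRollouts) break;
        }
        return v;
    }
}
```

With the all-zero start, each backup in this example can only lower a state's value toward its optimal value, which is why an optimistic initialization lets greedy rollouts avoid getting stuck.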

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
  ValueFunctionPlanner.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`maxDelta` When the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
`protected int`	`maxDepth` The maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
`protected int`	`minNumRolloutsWithSmallValueChange` RTDP will be declared "converged" if there are this many consecutive policy rollouts in which the value function change is smaller than the maxDelta value.
`protected int`	`numberOfBellmanUpdates` Stores the number of Bellman updates made across all planning.
`protected int`	`numRollouts` The number of rollouts to perform when planning is started, unless the value function delta becomes small enough first.
`protected Policy`	`rollOutPolicy` The policy to use for episode rollouts.
`protected boolean`	`useBatch` If true, batch mode is used: Bellman updates are deferred until a rollout is complete and are then performed in reverse order.

Fields inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)` Initializes the planner.
`RTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization vInit, int numRollouts, double maxDelta, int maxDepth)` Initializes the planner.

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`batchRTDP(State initialState)` Performs Bellman updates only after a rollout is complete, in reverse order.
`int`	`getNumberOfBellmanUpdates()` Returns the total number of Bellman updates across all planning.
`protected void`	`normalRTDP(State initialState)` Runs normal RTDP, in which Bellman updates are performed after each action selection.
`protected double`	`performOrderedBellmanUpdates(java.util.List<StateHashTuple> states)` Performs ordered Bellman updates on the list of (hashed) states provided to it.
`void`	`planFromState(State initialState)` This method will cause the planner to begin planning from the specified initial state.
`void`	`setMaxDelta(double delta)` Sets the threshold on the maximum change in state value during a rollout below which planning will terminate.
`void`	`setMaxDynamicDepth(int d)` Sets the maximum depth of a rollout, after which the rollout is prematurely terminated to update the value function.
`void`	`setMinNumRolloutsWithSmallValueChange(int nRollsouts)` Sets the minimum number of consecutive rollouts with a value function change smaller than the maxDelta value that will cause RTDP to stop.
`void`	`setNumPasses(int p)` Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough).
`void`	`setRollOutPolicy(Policy p)` Sets the rollout policy to use.
`void`	`toggleBatchMode(boolean useBatch)` When batch mode is set, Bellman updates will be deferred until a rollout is complete and then performed in reverse order.

Methods inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetPlannerResults, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - rollOutPolicy
```
protected Policy rollOutPolicy
```
    The policy to use for episode rollouts.
  - numRollouts
```
protected int numRollouts
```
    The number of rollouts to perform when planning is started, unless the value function delta becomes small enough first.
  - maxDelta
```
protected double maxDelta
```
    When the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
  - maxDepth
```
protected int maxDepth
```
    The maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
  - minNumRolloutsWithSmallValueChange
```
protected int minNumRolloutsWithSmallValueChange
```
    RTDP will be declared "converged" if there are this many consecutive policy rollouts in which the value function change is smaller than the maxDelta value. The default value is 10.
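    A minimal sketch of this stopping rule, taking a precomputed sequence of per-rollout maximum value changes as input. It assumes that a rollout whose change is not below maxDelta resets the count (which is what "consecutive" implies) and that numRollouts caps the total either way; the class and method names are hypothetical.

```java
// Hypothetical sketch of the documented stopping rule (not BURLAP code).
class ConvergenceSketch {
    // Returns how many rollouts run before planning stops, given each rollout's max value change.
    static int rolloutsUntilStop(double[] rolloutDeltas, int numRollouts,
                                 double maxDelta, int minConsecutive) {
        int consecutive = 0;
        int executed = 0;
        for (int r = 0; r < numRollouts && r < rolloutDeltas.length; r++) {
            executed++;
            // A rollout with a large change resets the run of "small" rollouts.
            consecutive = rolloutDeltas[r] < maxDelta ? consecutive + 1 : 0;
            if (consecutive >= minConsecutive) break; // declared converged
        }
        return executed;
    }
}
```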
  - useBatch
```
protected boolean useBatch
```
    If true, batch mode is used: Bellman updates are deferred until a rollout is complete and are then performed in reverse order.
  - numberOfBellmanUpdates
```
protected int numberOfBellmanUpdates
```
    Stores the number of Bellman updates made across all planning.
- Constructor Detail
  - RTDP
```
public RTDP(Domain domain,
    RewardFunction rf,
    TerminalFunction tf,
    double gamma,
    StateHashFactory hashingFactory,
    double vInit,
    int numRollouts,
    double maxDelta,
    int maxDepth)
```
    Initializes the planner. By default, the value function of every state will be initialized to vInit, and rollouts will use a greedy policy with random tie-breaking. Use the ValueFunctionPlanner.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy. vInit should be set to an optimistic value, such as VMax, to ensure convergence to the optimal value function.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    vInit - the value to which the value function for all states will be initialized
    numRollouts - the number of rollouts to perform when planning is started.
    maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
    maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
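    For a discounted problem whose rewards are bounded above by some R_max, one common optimistic choice is VMax = R_max / (1 - gamma). The helper below is a hypothetical sketch of that computation; clamping R_max at 0 assumes terminal states are worth 0, which keeps the bound valid when every reward is negative and episodes can end early.

```java
// Hypothetical helper for choosing an optimistic constant vInit (not part of BURLAP).
class OptimisticInit {
    // Upper bound on any discounted return when every reward is at most rMax.
    static double vMax(double rMax, double gamma) {
        if (gamma < 0 || gamma >= 1) throw new IllegalArgumentException("gamma must be in [0, 1)");
        // Clamp at 0: an episode that reaches a (0-valued) terminal state early collects nothing more.
        return Math.max(rMax, 0.0) / (1.0 - gamma);
    }
}
```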
  - RTDP
```
public RTDP(Domain domain,
    RewardFunction rf,
    TerminalFunction tf,
    double gamma,
    StateHashFactory hashingFactory,
    ValueFunctionInitialization vInit,
    int numRollouts,
    double maxDelta,
    int maxDepth)
```
    Initializes the planner. By default, the value function of each state will be initialized as specified by vInit, and rollouts will use a greedy policy with random tie-breaking. Use the ValueFunctionPlanner.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy. vInit should be optimistic (for example, VMax everywhere or an admissible heuristic) to ensure convergence to the optimal value function.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    vInit - the object which defines how the value function will be initialized for each individual state.
    numRollouts - the number of rollouts to perform when planning is started.
    maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
    maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
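    A state-dependent initialization can encode an admissible heuristic. The sketch below uses a hypothetical StateValueInit interface rather than BURLAP's actual ValueFunctionInitialization API, on a hypothetical grid where each step yields reward -1 and the goal is terminal with value 0. Because at least d = Manhattan-distance steps are needed to reach the goal, -(1 - gamma^d) / (1 - gamma) never underestimates the true value.

```java
// Hypothetical per-state initializer interface (a stand-in, not BURLAP's API).
interface StateValueInit {
    double value(int x, int y);
}

class HeuristicInit {
    // Optimistic (admissible) initializer for a grid where every step yields reward -1.
    static StateValueInit manhattan(int goalX, int goalY, double gamma) {
        return (x, y) -> {
            int d = Math.abs(x - goalX) + Math.abs(y - goalY);
            if (gamma >= 1.0) return -(double) d; // undiscounted: at least d steps of cost 1
            // Discounted sum of d rewards of -1: -(1 + gamma + ... + gamma^(d-1)).
            return -(1.0 - Math.pow(gamma, d)) / (1.0 - gamma);
        };
    }
}
```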
- Method Detail
  - setNumPasses
```
public void setNumPasses(int p)
```
    Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough).
    
    Parameters:
    p - the number of rollouts (passes) to perform
  - setMaxDelta
```
public void setMaxDelta(double delta)
```
    Sets the threshold on the maximum change in state value during a rollout below which planning will terminate.
    
    Parameters:
    delta - the max delta
  - setRollOutPolicy
```
public void setRollOutPolicy(Policy p)
```
    Sets the rollout policy to use.
    
    Parameters:
    p - the rollout policy to use
  - setMaxDynamicDepth
```
public void setMaxDynamicDepth(int d)
```
    Sets the maximum depth of a rollout, after which the rollout is prematurely terminated to update the value function.
    
    Parameters:
    d - the maximum depth of a rollout.
  - setMinNumRolloutsWithSmallValueChange
```
public void setMinNumRolloutsWithSmallValueChange(int nRollsouts)
```
    Sets the minimum number of consecutive rollouts with a value function change smaller than the maxDelta value that will cause RTDP to stop.
    
    Parameters:
    nRollsouts - the minimum number of consecutive rollouts required.
  - toggleBatchMode
```
public void toggleBatchMode(boolean useBatch)
```
    When batch mode is set, Bellman updates will be deferred until a rollout is complete and then performed in reverse order.
    
    Parameters:
    useBatch - whether to use batch-mode RTDP.
  - getNumberOfBellmanUpdates
```
public int getNumberOfBellmanUpdates()
```
    Returns the total number of Bellman updates across all planning.
    
    Returns:
    the total number of Bellman updates across all planning
  - planFromState
```
public void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state.
    
    Specified by:
    
    planFromState in class ValueFunctionPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - normalRTDP
```
protected void normalRTDP(State initialState)
```
    Runs normal RTDP, in which Bellman updates are performed after each action selection.
    
    Parameters:
    initialState - the initial state from which to plan
  - batchRTDP
```
protected void batchRTDP(State initialState)
```
    Performs Bellman updates only after a rollout is complete, in reverse order.
    
    Parameters:
    initialState - the initial state from which to plan
  - performOrderedBellmanUpdates
```
protected double performOrderedBellmanUpdates(java.util.List<StateHashTuple> states)
```
    Performs ordered Bellman updates on the list of (hashed) states provided to it.
    
    Parameters:
    states - the ordered list of states on which to perform Bellman updates.
    
    Returns:
    the maximum change in the value function for the given states
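    The benefit of batch mode's reverse ordering can be seen on a hypothetical deterministic chain in which only the transition into the terminal state yields reward 1 and all values start at 0. The sketch below (all names hypothetical, not BURLAP code) applies one Bellman backup per listed state, in the given order, and returns the largest change, analogous to what performOrderedBellmanUpdates is documented to return. After a single rollout, reverse order propagates the goal reward back to the start state, whereas forward (visitation) order, as in normal RTDP, only improves the state next to the goal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of ordered Bellman updates on a deterministic chain (not BURLAP code).
class BatchOrderSketch {
    // From state s the single action moves to s + 1; only entering terminal state n - 1 pays 1.
    static double backup(double[] v, int s, double gamma) {
        double r = (s + 1 == v.length - 1) ? 1.0 : 0.0;
        return r + gamma * v[s + 1];
    }

    // Backs up each state in the given order and returns the maximum change in value.
    static double orderedUpdates(double[] v, List<Integer> states, double gamma) {
        double maxChange = 0.0;
        for (int s : states) {
            double nv = backup(v, s, gamma);
            maxChange = Math.max(maxChange, Math.abs(nv - v[s]));
            v[s] = nv;
        }
        return maxChange;
    }

    // Simulates one rollout 0, 1, ..., n-2 and updates the visited states once.
    static double[] oneRollout(int n, double gamma, boolean reverse) {
        double[] v = new double[n];
        List<Integer> visited = new ArrayList<>();
        for (int s = 0; s < n - 1; s++) visited.add(s);
        if (reverse) Collections.reverse(visited); // batch mode: last-visited state first
        orderedUpdates(v, visited, gamma);
        return v;
    }
}
```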
