public class RTDP extends DynamicProgramming implements Planner
To ensure optimality, an optimistic value function initialization should be used; RTDP excels when a good value function initialization (e.g., an admissible heuristic) can be provided.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
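The snippet below is a minimal usage sketch rather than part of this class's documentation: it assumes BURLAP's GridWorldDomain example domain, GridWorldTerminalFunction, and SimpleHashableStateFactory (none of which belong to this class), and all planner parameters are illustrative.

```java
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.domain.singleagent.gridworld.GridWorldDomain;
import burlap.domain.singleagent.gridworld.GridWorldTerminalFunction;
import burlap.domain.singleagent.gridworld.state.GridAgent;
import burlap.domain.singleagent.gridworld.state.GridLocation;
import burlap.domain.singleagent.gridworld.state.GridWorldState;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class RTDPExample {
    public static void main(String[] args) {
        // Example domain (assumed): the four-rooms grid world with a goal at (10, 10).
        GridWorldDomain gwd = new GridWorldDomain(11, 11);
        gwd.setMapToFourRooms();
        gwd.setTf(new GridWorldTerminalFunction(10, 10));
        SADomain domain = gwd.generateDomain();

        State initialState = new GridWorldState(new GridAgent(0, 0), new GridLocation(10, 10, "loc0"));

        // Constant value function initialization of 0 (optimistic when all rewards are
        // non-positive), 1000 rollouts, a 0.001 max-delta threshold, and rollouts capped
        // at 100 steps. All of these values are illustrative.
        RTDP planner = new RTDP(domain, 0.99, new SimpleHashableStateFactory(),
                0., 1000, 0.001, 100);

        // Plan and obtain a policy that greedily selects the highest Q-value action.
        GreedyQPolicy policy = planner.planFromState(initialState);
        System.out.println("Bellman updates performed: " + planner.getNumberOfBellmanUpdates());
        System.out.println("First action from the start state: " + policy.action(initialState));
    }
}
```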
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
| Modifier and Type | Field and Description |
|---|---|
| protected double | maxDelta When the maximum change in the value function from a rollout is smaller than this value, planning will terminate. |
| protected int | maxDepth The maximum depth/length of a rollout before it is terminated and Bellman updates are performed. |
| protected int | minNumRolloutsWithSmallValueChange RTDP will be declared "converged" if there are this many consecutive policy rollouts in which the value function change is smaller than the maxDelta value. |
| protected int | numberOfBellmanUpdates Stores the number of Bellman updates made across all planning. |
| protected int | numRollouts The number of rollouts to perform when planning is started, unless the value function delta is small enough. |
| protected Policy | rollOutPolicy The policy to use for episode rollouts. |
| protected boolean | useBatch If set to use batch mode, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
Fields inherited from class DynamicProgramming: operator, valueFunction, valueInitializer
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
| Constructor and Description |
|---|
| RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth) Initializes. |
| RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit, int numRollouts, double maxDelta, int maxDepth) Initializes. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | batchRTDP(State initialState) Performs Bellman updates only after a rollout is complete, and in reverse order. |
| int | getNumberOfBellmanUpdates() Returns the total number of Bellman updates across all planning. |
| protected void | normalRTDP(State initialState) Runs normal RTDP, in which Bellman updates are performed after each action selection. |
| protected double | performOrderedBellmanUpdates(java.util.List<HashableState> states) Performs ordered Bellman updates on the list of (hashed) states provided to it. |
| GreedyQPolicy | planFromState(State initialState) Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly. |
| void | setMaxDelta(double delta) Sets the maximum delta state value update in a rollout that will cause planning to terminate. |
| void | setMaxDynamicDepth(int d) Sets the maximum depth of a rollout before it is prematurely terminated to update the value function. |
| void | setMinNumRolloutsWithSmallValueChange(int nRollsouts) Sets the minimum number of consecutive rollouts with a value function change less than the maxDelta value that will cause RTDP to stop. |
| void | setNumPasses(int p) Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough). |
| void | setRollOutPolicy(Policy p) Sets the rollout policy to use. |
| void | toggleBatchMode(boolean useBatch) When batch mode is set, Bellman updates will be stalled until a rollout is complete and then run in reverse. |
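As a sketch of the setters listed above, the snippet below tunes the convergence and rollout parameters before planning. The EpsilonGreedy rollout policy and all numeric values are assumptions chosen for the example, not defaults of this class; verify the EpsilonGreedy constructor against your BURLAP version.

```java
import burlap.behavior.policy.EpsilonGreedy;
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.HashableStateFactory;

public class RTDPTuning {

    /** Builds an RTDP planner with its rollout/convergence knobs set explicitly, then plans. */
    public static GreedyQPolicy tunedPlan(SADomain domain, HashableStateFactory hf, State initialState) {
        RTDP planner = new RTDP(domain, 0.99, hf, 0., 1000, 0.001, 100);

        planner.setNumPasses(2000);                        // upper bound on the number of rollouts
        planner.setMaxDelta(0.0001);                       // tighter per-rollout convergence threshold
        planner.setMaxDynamicDepth(200);                   // allow longer rollouts before truncation
        planner.setMinNumRolloutsWithSmallValueChange(5);  // require 5 consecutive small-change rollouts
        planner.setRollOutPolicy(new EpsilonGreedy(planner, 0.1)); // exploratory rollout policy (assumed choice)
        planner.toggleBatchMode(true);                     // defer Bellman updates to the end of each rollout

        return planner.planFromState(initialState);
    }
}
```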
Methods inherited from class DynamicProgramming: computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, resetSolver, setOperator, setValueFunctionInitialization, value, value, writeValueTable
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, resetSolver, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected Policy rollOutPolicy
protected int numRollouts
protected double maxDelta
protected int maxDepth
protected int minNumRolloutsWithSmallValueChange
protected boolean useBatch
protected int numberOfBellmanUpdates
public RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)

Initializes. Use the DynamicProgramming.setValueFunctionInitialization(burlap.behavior.valuefunction.ValueFunction) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic, like VMax, to ensure convergence.

Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the value to which the value function for all states will be initialized
numRollouts - the number of rollouts to perform when planning is started
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed

public RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit, int numRollouts, double maxDelta, int maxDepth)

Initializes. Use the DynamicProgramming.setValueFunctionInitialization(burlap.behavior.valuefunction.ValueFunction) method to change the value function initialization and the setRollOutPolicy(Policy) method to change the rollout policy to something else. vInit should be set to something optimistic, like VMax, to ensure convergence.

Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the object which defines how the value function will be initialized for each individual state
numRollouts - the number of rollouts to perform when planning is started
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed
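Since both constructors recommend an optimistic initialization, the sketch below shows the ValueFunction variant with a constant upper bound. The rMax value and the rMax/(1 - gamma) bound are assumptions made for illustration; a tighter, domain-specific admissible heuristic can be substituted.

```java
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.behavior.valuefunction.ValueFunction;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class OptimisticRTDP {

    public static RTDP withOptimisticInit(SADomain domain) {
        double gamma = 0.99;
        double rMax = 100.;                       // assumed maximum one-step reward for this example
        final double vMax = rMax / (1. - gamma);  // crude optimistic upper bound on any state's value

        // Initialize every state's value to the optimistic bound.
        ValueFunction optimisticInit = new ValueFunction() {
            @Override
            public double value(State s) {
                return vMax;
            }
        };

        return new RTDP(domain, gamma, new SimpleHashableStateFactory(),
                optimisticInit, 1000, 0.001, 100);
    }
}
```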
public void setNumPasses(int p)

Sets the number of rollouts to perform when planning is started (unless the value function delta is small enough).

Parameters:
p - the number of passes

public void setMaxDelta(double delta)

Sets the maximum delta state value update in a rollout that will cause planning to terminate.

Parameters:
delta - the max delta

public void setRollOutPolicy(Policy p)

Sets the rollout policy to use.

Parameters:
p - the rollout policy to use

public void setMaxDynamicDepth(int d)

Sets the maximum depth of a rollout before it is prematurely terminated to update the value function.

Parameters:
d - the maximum depth of a rollout

public void setMinNumRolloutsWithSmallValueChange(int nRollsouts)

Sets the minimum number of consecutive rollouts with a value function change less than the maxDelta value that will cause RTDP to stop.

Parameters:
nRollsouts - the minimum number of consecutive rollouts required

public void toggleBatchMode(boolean useBatch)

When batch mode is set, Bellman updates will be stalled until a rollout is complete and then run in reverse.

Parameters:
useBatch - whether to use batch mode RTDP or not

public int getNumberOfBellmanUpdates()

Returns the total number of Bellman updates across all planning.

public GreedyQPolicy planFromState(State initialState)

Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.

Specified by:
planFromState in interface Planner

Parameters:
initialState - the initial state of the planning problem

Returns:
a GreedyQPolicy.

protected void normalRTDP(State initialState)

Runs normal RTDP, in which Bellman updates are performed after each action selection.

Parameters:
initialState - the initial state from which to plan

protected void batchRTDP(State initialState)

Performs Bellman updates only after a rollout is complete, and in reverse order.

Parameters:
initialState - the initial state from which to plan

protected double performOrderedBellmanUpdates(java.util.List<HashableState> states)

Performs ordered Bellman updates on the list of (hashed) states provided to it.

Parameters:
states - the ordered list of states on which to perform Bellman updates.
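To make the normal/batch distinction above concrete, the sketch below runs planning once in each mode and reports the Bellman-update counts. It uses only methods documented on this page; the domain, hashing factory, start state, and parameter values are supplied by the caller or assumed for illustration.

```java
import burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.HashableStateFactory;

public class BatchVsNormalRTDP {

    /** Plans twice from the same start state, once per mode, and prints the update counts. */
    public static void compare(SADomain domain, HashableStateFactory hf, State s0) {
        RTDP normal = new RTDP(domain, 0.99, hf, 0., 500, 0.001, 100);
        normal.planFromState(s0); // normalRTDP: a Bellman update after each action selection

        RTDP batch = new RTDP(domain, 0.99, hf, 0., 500, 0.001, 100);
        batch.toggleBatchMode(true);
        batch.planFromState(s0);  // batchRTDP: updates deferred and applied in reverse at rollout end

        System.out.println("normal RTDP Bellman updates: " + normal.getNumberOfBellmanUpdates());
        System.out.println("batch RTDP Bellman updates:  " + batch.getNumberOfBellmanUpdates());
    }
}
```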