DynamicProgramming

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
- - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming

All Implemented Interfaces:

MDPSolverInterface, QFunction, ValueFunction

Direct Known Subclasses:

BoundedRTDP, DifferentiableDP, DynamicProgramming.StaticVFPlanner, PolicyEvaluation, PolicyIteration, RTDP, ValueIteration
```
public class DynamicProgramming
extends MDPSolver
implements ValueFunction, QFunction
```
A class for performing dynamic programming operations, that is, updating the value function using a Bellman backup. It defines data members for storing hashed transition dynamics (so that they can be retrieved quickly without repeated calls to the action transition generation) and a map from states to their values. It also implements QFunction, which returns Q-values computed from the transition dynamics and the stored value function.
Note that by default, DynamicProgramming instances cache the transition dynamics so that they do not have to be procedurally generated by the Action each time. Transition dynamics caching can be disabled by calling the toggleUseCachedTransitionDynamics(boolean) method. This may be desirable if the transition dynamics are expected to change over time, such as when the model is being learned in model-based RL.
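
As an illustration of the Bellman backup described above, the following is a minimal standalone sketch and not BURLAP code. It assumes states and actions are plain strings, and the `BellmanBackupSketch` class, its `Outcome` type, and `addOutcome` helper are hypothetical names introduced only for this example. It ignores terminal states, Options, and state hashing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Standalone illustration of a tabular Bellman backup; not BURLAP code. */
class BellmanBackupSketch {

    /** One possible result of an action: next state, its probability, and the reward. */
    static final class Outcome {
        final String nextState;
        final double probability;
        final double reward;

        Outcome(String nextState, double probability, double reward) {
            this.nextState = nextState;
            this.probability = probability;
            this.reward = reward;
        }
    }

    /** Stored value estimates; states without an entry are treated as 0. */
    final Map<String, Double> valueFunction = new HashMap<>();
    /** Transition model: state -> action name -> possible outcomes. */
    final Map<String, Map<String, List<Outcome>>> transitions = new HashMap<>();
    final double gamma;

    BellmanBackupSketch(double gamma) {
        this.gamma = gamma;
    }

    /** Registers one outcome of taking the given action in the given state. */
    void addOutcome(String state, String action, String next, double p, double r) {
        transitions.computeIfAbsent(state, k -> new HashMap<>())
                   .computeIfAbsent(action, k -> new ArrayList<>())
                   .add(new Outcome(next, p, r));
    }

    /** Q(s,a) = sum over outcomes of p * (r + gamma * V(s')). */
    double computeQ(List<Outcome> outcomes) {
        double q = 0.0;
        for (Outcome o : outcomes) {
            double nextValue = valueFunction.getOrDefault(o.nextState, 0.0);
            q += o.probability * (o.reward + gamma * nextValue);
        }
        return q;
    }

    /** V(s) <- max over actions of Q(s,a); stores and returns the new value. */
    double performBellmanUpdateOn(String state) {
        Map<String, List<Outcome>> actions = transitions.getOrDefault(state, new HashMap<>());
        // A state with no known actions keeps a value of 0 in this sketch.
        double best = actions.isEmpty() ? 0.0 : Double.NEGATIVE_INFINITY;
        for (List<Outcome> outcomes : actions.values()) {
            best = Math.max(best, computeQ(outcomes));
        }
        valueFunction.put(state, best);
        return best;
    }

    public static void main(String[] args) {
        BellmanBackupSketch dp = new BellmanBackupSketch(0.9);
        dp.addOutcome("s0", "left", "s1", 1.0, 0.0);
        dp.addOutcome("s0", "right", "goal", 0.8, 1.0);
        dp.addOutcome("s0", "right", "s0", 0.2, 0.0);
        System.out.println(dp.performBellmanUpdateOn("s0")); // prints 0.8
    }
}
```

Repeating the update on the same state uses the value stored by the previous update, which is how repeated sweeps move the estimates toward a fixed point.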

Author:

James MacGlashan

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`DynamicProgramming.StaticVFPlanner` This class is used to store tabular value function values that can be manipulated with the `DynamicProgramming` methods.

Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QFunction
QFunction.QFunctionHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.Map<HashableState,java.util.List<ActionTransitions>>`	`transitionDynamics` A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them.
`protected boolean`	`useCachedTransitions` A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods.
`protected java.util.Map<HashableState,java.lang.Double>`	`valueFunction` A map for storing the current value function estimate for each state.
`protected ValueFunctionInitialization`	`valueInitializer` The value function initialization to use; defaulted to an initialization of 0 everywhere.

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description

`DynamicProgramming()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected double`	`computeQ(HashableState sh, GroundedAction ga)` Computes the Q-value using the uncached transition dynamics produced by the Action object methods.
`protected double`	`computeQ(State s, ActionTransitions trans)` Returns the Q-value for a given state and the possible transitions from it for a given action.
`void`	`DPPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory)` Common init method for `DynamicProgramming` instances.
`protected java.util.List<ActionTransitions>`	`getActionsTransitions(HashableState sh)` Returns the stored action transitions for the given state.
`java.util.List<State>`	`getAllStates()` This method will return all states that are stored in this planner's value function.
`DynamicProgramming`	`getCopyOfValueFunction()`
`protected double`	`getDefaultValue(State s)` Returns the default V-value to use for the state.
`protected QValue`	`getQ(HashableState sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching)` Gets a Q-value for a hashed state, grounded action, and object instance matching, using the internally stored hashed transition dynamics.
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`ValueFunctionInitialization`	`getValueFunctionInitialization()` Returns the value initialization function used.
`boolean`	`hasComputedValueFor(State s)` Returns whether a value for the given state has been computed previously.
`protected void`	`initializeOptionsForExpectationComputations()` Options need to have their transition probabilities computed and to keep track of the possible termination states using a hashed data structure.
`protected double`	`performBellmanUpdateOn(HashableState sh)` Performs a Bellman value function update on the provided (hashed) state.
`double`	`performBellmanUpdateOn(State s)` Performs a Bellman value function update on the provided state.
`protected double`	`performFixedPolicyBellmanUpdateOn(HashableState sh, Policy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state.
`double`	`performFixedPolicyBellmanUpdateOn(State s, Policy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP.
`void`	`setValueFunctionInitialization(ValueFunctionInitialization vfInit)` Sets the value function initialization to use.
`void`	`toggleUseCachedTransitionDynamics(boolean useCachedTransitions)` Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or procedurally generate the transition dynamics as needed from the `Action` objects.
`double`	`value(HashableState sh)` Returns the value function evaluation of the given hashed state.
`double`	`value(State s)` Returns the value function evaluation of the given state.

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - useCachedTransitions
```
protected boolean useCachedTransitions
```
    A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods. The default is true, which caches the transition dynamics. However, this value should be set to false if the transition dynamics are expected to change over time, which might be the case in model learning scenarios.
  - transitionDynamics
```
protected java.util.Map<HashableState,java.util.List<ActionTransitions>> transitionDynamics
```
    A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them.
  - valueFunction
```
protected java.util.Map<HashableState,java.lang.Double> valueFunction
```
    A map for storing the current value function estimate for each state.
  - valueInitializer
```
protected ValueFunctionInitialization valueInitializer
```
    The value function initialization to use; defaulted to an initialization of 0 everywhere.
- Constructor Detail
  - DynamicProgramming
```
public DynamicProgramming()
```
- Method Detail
  - DPPInit
```
public void DPPInit(Domain domain,
           RewardFunction rf,
           TerminalFunction tf,
           double gamma,
           HashableStateFactory hashingFactory)
```
    Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(burlap.oomdp.core.Domain, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, double, burlap.oomdp.statehashing.HashableStateFactory) method.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Specified by:
    
    resetSolver in class MDPSolver
  - setValueFunctionInitialization
```
public void setValueFunctionInitialization(ValueFunctionInitialization vfInit)
```
    Sets the value function initialization to use.
    
    Parameters:
    vfInit - the object that defines how to initialize the value function.
  - getValueFunctionInitialization
```
public ValueFunctionInitialization getValueFunctionInitialization()
```
    Returns the value initialization function used.
    
    Returns:
    the value initialization function used.
  - hasComputedValueFor
```
public boolean hasComputedValueFor(State s)
```
    Returns whether a value for the given state has been computed previously.
    
    Parameters:
    s - the state to check
    
    Returns:
    true if the value for the given state has already been computed; false otherwise.
  - value
```
public double value(State s)
```
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    s - the state to evaluate.
    
    Returns:
    the value function evaluation of the given state.
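    
    The fallback to a default value can be pictured with the following standalone sketch, which is not BURLAP code. It assumes string states and uses a `ToDoubleFunction` as a stand-in for the ValueFunctionInitialization object; the `ValueLookupSketch` class and its `store` helper are hypothetical names for this example. In this sketch, looking up an unseen state does not add it to the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Standalone illustration of a value lookup with a default; not BURLAP code. */
class ValueLookupSketch {

    /** Values that have actually been computed and stored. */
    private final Map<String, Double> valueFunction = new HashMap<>();
    /** Stand-in for the value initializer; returns 0 everywhere by default. */
    private ToDoubleFunction<String> valueInitializer = s -> 0.0;

    void setValueFunctionInitialization(ToDoubleFunction<String> vfInit) {
        this.valueInitializer = vfInit;
    }

    /** True only if a value has been stored for the state. */
    boolean hasComputedValueFor(String s) {
        return valueFunction.containsKey(s);
    }

    /** Returns the stored value, or the initializer's value if none is stored. */
    double value(String s) {
        Double stored = valueFunction.get(s);
        return stored != null ? stored : valueInitializer.applyAsDouble(s);
    }

    /** Hypothetical helper that records a computed value. */
    void store(String s, double v) {
        valueFunction.put(s, v);
    }
}
```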
  - value
```
public double value(HashableState sh)
```
    Returns the value function evaluation of the given hashed state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Parameters:
    sh - the hashed state to evaluate.
    
    Returns:
    the value function evaluation of the given state.
  - toggleUseCachedTransitionDynamics
```
public void toggleUseCachedTransitionDynamics(boolean useCachedTransitions)
```
    Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or procedurally generate the transition dynamics as needed from the Action objects. Letting the transition dynamics be procedurally generated may be useful if the transition dynamics can change over time, such as when using a learned model.
    
    Parameters:
    useCachedTransitions - true if the transition dynamics should be cached and stored; false if they should always be procedurally generated from the Action objects.
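    
    The trade-off between caching and procedural generation can be sketched as follows. This is a standalone illustration and not BURLAP code: states are strings, transitions are represented as plain lists of strings, and the `TransitionCacheSketch` class, its `generator` function, and the `generatorCalls` counter are hypothetical names introduced for this example. Whether the real class clears its cache when caching is turned off is not stated here, so the sketch simply leaves existing entries in place.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Standalone illustration of cached vs. regenerated transitions; not BURLAP code. */
class TransitionCacheSketch {

    /** Cache of previously generated transitions, keyed by state. */
    private final Map<String, List<String>> transitionDynamics = new HashMap<>();
    /** Stand-in for the (possibly expensive) transition generation of the actions. */
    private final Function<String, List<String>> generator;
    private boolean useCachedTransitions = true;
    /** Counts how often the generator was invoked, for demonstration only. */
    int generatorCalls = 0;

    TransitionCacheSketch(Function<String, List<String>> generator) {
        this.generator = generator;
    }

    void toggleUseCachedTransitionDynamics(boolean useCachedTransitions) {
        this.useCachedTransitions = useCachedTransitions;
    }

    /** Returns transitions for the state, generating them only when needed. */
    List<String> getActionsTransitions(String state) {
        if (!useCachedTransitions) {
            return generate(state); // always regenerate, e.g. for a changing model
        }
        List<String> cached = transitionDynamics.get(state);
        if (cached == null) {
            cached = generate(state);
            transitionDynamics.put(state, cached);
        }
        return cached;
    }

    private List<String> generate(String state) {
        generatorCalls++;
        return generator.apply(state);
    }
}
```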
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QFunction
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QFunction
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QFunction
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QFunction
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
  - getAllStates
```
public java.util.List<State> getAllStates()
```
    This method will return all states that are stored in this planner's value function.
    
    Returns:
    all states that are stored in this planner's value function.
  - getCopyOfValueFunction
```
public DynamicProgramming getCopyOfValueFunction()
```
  - getQ
```
protected QValue getQ(HashableState sh,
          GroundedAction a,
          java.util.Map<java.lang.String,java.lang.String> matching)
```
    Gets a Q-value for a hashed state, grounded action, and object instance matching, using the internally stored hashed transition dynamics. If the input state is a terminal state, then the value 0 is returned.
    
    Parameters:
    sh - the input state
    a - the action to get the Q-value for
    matching - the object instance matching from sh to the corresponding state stored in the value function
    
    Returns:
    the Q-value
  - getActionsTransitions
```
protected java.util.List<ActionTransitions> getActionsTransitions(HashableState sh)
```
    Returns the stored action transitions for the given state. If the action transitions are not already cached and this object is set to use caching, then they will be cached.
    
    Parameters:
    sh - the input state from which to get the transitions
    
    Returns:
    the stored action transitions for the given state
  - performBellmanUpdateOn
```
public double performBellmanUpdateOn(State s)
```
    Performs a Bellman value function update on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    s - the state on which to perform the Bellman update.
    
    Returns:
    the new value of the state.
  - performFixedPolicyBellmanUpdateOn
```
public double performFixedPolicyBellmanUpdateOn(State s,
                                       Policy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    s - the state on which to perform the Bellman update.
    p - the policy that is being evaluated
    
    Returns:
    the new value of the state
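    
    A fixed-policy update replaces the maximum over actions with an expectation under the policy being evaluated. The following standalone sketch is not BURLAP code; it assumes the policy's action probabilities and the Q-values for the state are already available as maps keyed by action name, and the `FixedPolicyUpdateSketch` class is a hypothetical name for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone illustration of a fixed-policy Bellman update; not BURLAP code. */
class FixedPolicyUpdateSketch {

    /** Stored value estimates, keyed by state. */
    final Map<String, Double> valueFunction = new HashMap<>();

    /**
     * V(s) <- sum over actions of pi(a|s) * Q(s,a).
     * Stores the result for the state and returns it.
     */
    double performFixedPolicyBellmanUpdateOn(String state,
                                             Map<String, Double> actionProbabilities,
                                             Map<String, Double> qValues) {
        double v = 0.0;
        for (Map.Entry<String, Double> e : actionProbabilities.entrySet()) {
            Double q = qValues.get(e.getKey());
            if (q == null) {
                throw new IllegalArgumentException("No Q-value for action " + e.getKey());
            }
            v += e.getValue() * q;
        }
        valueFunction.put(state, v);
        return v;
    }
}
```

    With a deterministic policy, the action probability map has a single entry with probability 1, and the update reduces to the Q-value of the selected action.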
  - performBellmanUpdateOn
```
protected double performBellmanUpdateOn(HashableState sh)
```
    Performs a Bellman value function update on the provided (hashed) state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    sh - the hashed state on which to perform the Bellman update.
    
    Returns:
    the new value of the state.
  - performFixedPolicyBellmanUpdateOn
```
protected double performFixedPolicyBellmanUpdateOn(HashableState sh,
                                       Policy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    sh - the hashed state on which to perform the Bellman update.
    p - the policy that is being evaluated
    
    Returns:
    the new value of the state
  - computeQ
```
protected double computeQ(State s,
              ActionTransitions trans)
```
    Returns the Q-value for a given state and the possible transitions from it for a given action. This computation *is* compatible with Option objects.
    
    Parameters:
    s - the given state
    trans - the given action transitions
    
    Returns:
    the double value of a Q-value
  - computeQ
```
protected double computeQ(HashableState sh,
              GroundedAction ga)
```
    Computes the Q-value using the uncached transition dynamics produced by the Action object methods. This computation *is* compatible with Option objects.
    
    Parameters:
    sh - the given state
    ga - the given action
    
    Returns:
    the double value of a Q-value for the given state-action pair.
  - getDefaultValue
```
protected double getDefaultValue(State s)
```
    Returns the default V-value to use for the state.
    
    Parameters:
    s - the input state to get the default V-value for
    
    Returns:
    the default V-value in double form.
  - initializeOptionsForExpectationComputations
```
protected void initializeOptionsForExpectationComputations()
```
    Options need to have their transition probabilities computed and to keep track of the possible termination states using a hashed data structure. This method tells each option which state hashing factory to use.
