DynamicProgramming

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
- - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming

All Implemented Interfaces:

MDPSolverInterface, QFunction, QProvider, ValueFunction

Direct Known Subclasses:

BoundedRTDP, DifferentiableDP, PolicyEvaluation, PolicyIteration, RTDP, ValueIteration
```
public class DynamicProgramming
extends MDPSolver
implements ValueFunction, QProvider
```
A class for performing dynamic programming operations: updating the value function using a Bellman backup.
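For reference, the two backups named on this page can be written in their standard textbook form. Here T is the transition function, R the reward function, γ the discount factor (gamma), and π the policy being evaluated. This notation is conventional and is not taken from the class's source. The first line is the max backup of the default BellmanOperator. The second is the fixed-policy (policy evaluation) backup.

$$
\begin{aligned}
V(s) &\leftarrow \max_{a} \sum_{s'} T(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V(s')\bigr] \\
V^{\pi}(s) &\leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} T(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]
\end{aligned}
$$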

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected DPOperator`	`operator`
`protected java.util.Map<HashableState,java.lang.Double>`	`valueFunction` A map for storing the current value function estimate for each state.
`protected ValueFunction`	`valueInitializer` The value function initialization to use; defaulted to an initialization of 0 everywhere.

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`DynamicProgramming()`

Method Summary

Modifier and Type	Method and Description
`protected double`	`computeQ(State s, Action ga)` Computes the Q-value. This computation is compatible with `Option` objects.
`void`	`DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory)` Common init method for `DynamicProgramming` instances.
`java.util.List<State>`	`getAllStates()` This method will return all states that are stored in this planner's value function.
`DynamicProgramming`	`getCopyOfValueFunction()`
`protected double`	`getDefaultValue(State s)` Returns the default V-value to use for the state.
`SampleModel`	`getModel()` Returns the model being used by this solver.
`DPOperator`	`getOperator()` Returns the dynamic programming operator used.
`ValueFunction`	`getValueFunctionInitialization()` Returns the value initialization function used.
`boolean`	`hasComputedValueFor(State s)` Returns whether a value for the given state has been computed previously.
`void`	`loadValueTable(java.lang.String path)` Loads the value function table located on disk at the specified path.
`protected double`	`performBellmanUpdateOn(HashableState sh)` Performs a Bellman value function update on the provided (hashed) state.
`double`	`performBellmanUpdateOn(State s)` Performs a Bellman value function update on the provided state.
`protected double`	`performFixedPolicyBellmanUpdateOn(HashableState sh, EnumerablePolicy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state.
`double`	`performFixedPolicyBellmanUpdateOn(State s, EnumerablePolicy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state.
`double`	`qValue(State s, Action a)` Returns the `QValue` for the given state-action pair.
`java.util.List<QValue>`	`qValues(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`setOperator(DPOperator operator)` Sets the dynamic programming operator to use.
`void`	`setValueFunctionInitialization(ValueFunction vfInit)` Sets the value function initialization to use.
`double`	`value(HashableState sh)` Returns the value function evaluation of the given hashed state.
`double`	`value(State s)` Returns the value function evaluation of the given state.
`void`	`writeValueTable(java.lang.String path)` Writes the value function table stored in this object to the specified file path.

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - valueFunction
```
protected java.util.Map<HashableState,java.lang.Double> valueFunction
```
    A map for storing the current value function estimate for each state.
  - valueInitializer
```
protected ValueFunction valueInitializer
```
    The value function initialization to use; defaulted to an initialization of 0 everywhere.
  - operator
```
protected DPOperator operator
```
- Constructor Detail
  - DynamicProgramming
```
public DynamicProgramming()
```
- Method Detail
  - DPPInit
```
public void DPPInit(SADomain domain,
                    double gamma,
                    HashableStateFactory hashingFactory)
```
    Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(SADomain, double, HashableStateFactory) method.
    
    Parameters:
    
    domain - the domain in which to plan
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory
  - getModel
```
public SampleModel getModel()
```
    Description copied from interface: MDPSolverInterface
    
    Returns the model being used by this solver.
    
    Specified by:
    
    getModel in interface MDPSolverInterface
    
    Overrides:
    
    getModel in class MDPSolver
    
    Returns:
    
    a SampleModel
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Specified by:
    
    resetSolver in class MDPSolver
  - setValueFunctionInitialization
```
public void setValueFunctionInitialization(ValueFunction vfInit)
```
    Sets the value function initialization to use.
    
    Parameters:
    
    vfInit - the object that defines how to initialize the value function.
  - getValueFunctionInitialization
```
public ValueFunction getValueFunctionInitialization()
```
    Returns the value initialization function used.
    
    Returns:
    
    the value initialization function used.
  - getOperator
```
public DPOperator getOperator()
```
    Returns the dynamic programming operator used.
    
    Returns:
    
    the dynamic programming operator used
  - setOperator
```
public void setOperator(DPOperator operator)
```
    Sets the dynamic programming operator to use. Note that the default setting is BellmanOperator (max).
    
    Parameters:
    
    operator - the dynamic programming operator to use.
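    The operator is what turns the Q-values of a state into its new value. The sketch below contrasts a max backup with a Boltzmann-weighted (softmax-style) alternative. It is a self-contained illustration, not BURLAP code: the BackupOperator interface and both implementations are hypothetical. The Boltzmann formula is one common choice, not a claim about how any BURLAP operator other than the documented max default is defined.

```java
import java.util.Arrays;

// Hypothetical operator interface (not BURLAP's DPOperator signature):
// it reduces the Q-values of one state to a single state value.
interface BackupOperator {
    double apply(double[] qs);
}

// Max backup: the behavior the documentation describes for the default BellmanOperator.
final class MaxOperator implements BackupOperator {
    @Override
    public double apply(double[] qs) {
        return Arrays.stream(qs).max().getAsDouble();
    }
}

// Illustrative Boltzmann-weighted backup: each Q-value is weighted by exp(beta * q).
// beta = 0 gives the plain mean; large beta approaches the max.
final class BoltzmannOperator implements BackupOperator {
    private final double beta; // inverse temperature

    BoltzmannOperator(double beta) {
        this.beta = beta;
    }

    @Override
    public double apply(double[] qs) {
        double max = Arrays.stream(qs).max().getAsDouble();
        double norm = 0.0;
        double weighted = 0.0;
        for (double q : qs) {
            double w = Math.exp(beta * (q - max)); // shift by max to avoid overflow
            norm += w;
            weighted += w * q;
        }
        return weighted / norm;
    }
}

final class OperatorDemo {
    public static void main(String[] args) {
        double[] qs = {1.0, 2.0, 4.0};
        BackupOperator[] ops = {new MaxOperator(), new BoltzmannOperator(0.0), new BoltzmannOperator(5.0)};
        for (BackupOperator op : ops) {
            System.out.println(op.getClass().getSimpleName() + ": " + op.apply(qs));
        }
    }
}
```

    With beta = 0 every action is weighted equally, giving the mean of the Q-values. As beta grows the result approaches the max. This is why a softmax-style operator is sometimes used where a smooth stand-in for max is wanted.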
  - hasComputedValueFor
```
public boolean hasComputedValueFor(State s)
```
    Returns whether a value for the given state has been computed previously.
    
    Parameters:
    
    s - the state to check
    
    Returns:
    
    true if the value for the given state has already been computed; false otherwise.
  - value
```
public double value(State s)
```
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by this class's value function initialization (valueInitializer) is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    
    s - the state to evaluate.
    
    Returns:
    
    the value function evaluation of the given state.
  - value
```
public double value(HashableState sh)
```
    Returns the value function evaluation of the given hashed state. If the value is not stored, then the default value specified by this class's value function initialization (valueInitializer) is returned.
    
    Parameters:
    
    sh - the hashed state to evaluate.
    
    Returns:
    
    the value function evaluation of the given state.
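    A minimal sketch of the lookup rule described for both value methods. It assumes string keys in place of HashableState and a ToDoubleFunction in place of ValueFunction; ValueTable and store are hypothetical names. It also assumes, as the description suggests, that the fallback value is returned without being written into the table. Under that assumption hasComputedValueFor stays false until a value is actually stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical tabular value store; generic keys stand in for HashableState.
final class ValueTable<S> {
    private final Map<S, Double> values = new HashMap<>();
    private final ToDoubleFunction<S> initializer; // stands in for the valueInitializer field

    ValueTable(ToDoubleFunction<S> initializer) {
        this.initializer = initializer;
    }

    // Stored estimate if present; otherwise the initializer's default, which is not stored.
    double value(S s) {
        Double v = values.get(s);
        return v != null ? v : initializer.applyAsDouble(s);
    }

    // True only once an estimate has been explicitly stored for s.
    boolean hasComputedValueFor(S s) {
        return values.containsKey(s);
    }

    // Hypothetical setter standing in for the solver writing a backed-up value.
    void store(S s, double v) {
        values.put(s, v);
    }
}

final class ValueTableDemo {
    public static void main(String[] args) {
        ValueTable<String> table = new ValueTable<>(s -> 0.0); // 0 everywhere, like the documented default
        System.out.println(table.value("goal"));               // 0.0, supplied by the initializer
        System.out.println(table.hasComputedValueFor("goal")); // false: the lookup did not store anything
        table.store("goal", 10.0);
        System.out.println(table.value("goal"));               // 10.0
    }
}
```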
  - qValues
```
public java.util.List<QValue> qValues(State s)
```
    Description copied from interface: QProvider
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    qValues in interface QProvider
    
    Parameters:
    
    s - the state for which Q-values are to be returned.
    
    Returns:
    
    a List of QValue objects for every permissible action for the given input state.
  - qValue
```
public double qValue(State s,
                     Action a)
```
    Description copied from interface: QFunction
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    qValue in interface QFunction
    
    Parameters:
    
    s - the input state
    
    a - the input action
    
    Returns:
    
    the QValue for the given state-action pair.
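    For a tabular MDP, a Q-value can be computed by a one-step lookahead over the current value estimates. The sketch below is a stand-alone illustration, not the BURLAP implementation. TabularQ is a hypothetical class, states and actions are integer indices, and the model is given as dense arrays. It returns a plain array indexed by action rather than a List of QValue objects.

```java
import java.util.Arrays;

// Hypothetical tabular model: transitions[s][a][s2] = T(s2 | s, a),
// rewards[s][a][s2] = R(s, a, s2). Integer indices stand in for State and Action objects.
final class TabularQ {
    private final double[][][] transitions;
    private final double[][][] rewards;
    private final double gamma;

    TabularQ(double[][][] transitions, double[][][] rewards, double gamma) {
        this.transitions = transitions;
        this.rewards = rewards;
        this.gamma = gamma;
    }

    // One-step lookahead: Q(s, a) = sum over s2 of T(s2 | s, a) * (R(s, a, s2) + gamma * V(s2)).
    double qValue(int s, int a, double[] v) {
        double q = 0.0;
        for (int s2 = 0; s2 < v.length; s2++) {
            q += transitions[s][a][s2] * (rewards[s][a][s2] + gamma * v[s2]);
        }
        return q;
    }

    // Q-values for every action defined in state s, indexed by action.
    double[] qValues(int s, double[] v) {
        double[] qs = new double[transitions[s].length];
        for (int a = 0; a < qs.length; a++) {
            qs[a] = qValue(s, a, v);
        }
        return qs;
    }

    public static void main(String[] args) {
        // State 0: action 0 stays put (reward 0); action 1 reaches state 1 with
        // probability 0.8 (reward 1) and otherwise stays. State 1 is absorbing.
        double[][][] t = {{{1.0, 0.0}, {0.2, 0.8}}, {{0.0, 1.0}, {0.0, 1.0}}};
        double[][][] r = {{{0.0, 0.0}, {0.0, 1.0}}, {{0.0, 0.0}, {0.0, 0.0}}};
        TabularQ model = new TabularQ(t, r, 0.9);
        // Q(0,0) = 0 and Q(0,1) = 0.8 * (1 + 0.9 * 5) = 4.4
        System.out.println(Arrays.toString(model.qValues(0, new double[] {0.0, 5.0})));
    }
}
```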
  - getAllStates
```
public java.util.List<State> getAllStates()
```
    This method will return all states that are stored in this planner's value function.
    
    Returns:
    
    all states that are stored in this planner's value function.
  - getCopyOfValueFunction
```
public DynamicProgramming getCopyOfValueFunction()
```
  - performBellmanUpdateOn
```
public double performBellmanUpdateOn(State s)
```
    Performs a Bellman value function update on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    
    s - the state on which to perform the Bellman update.
    
    Returns:
    
    the new value of the state.
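    A stand-alone sketch of what a single Bellman update on one state involves: compute each action's Q-value from the stored estimates, take the max, store it, and return it. The class name, integer state indices, dense array model, and default of 0 for states not yet updated are assumptions for illustration; caching of transition dynamics is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tabular solver: transitions[s][a][s2] = T(s2 | s, a),
// rewards[s][a][s2] = R(s, a, s2); integer indices stand in for states and actions.
final class BellmanUpdateSketch {
    private final double[][][] transitions;
    private final double[][][] rewards;
    private final double gamma;
    private final Map<Integer, Double> valueFunction = new HashMap<>();

    BellmanUpdateSketch(double[][][] transitions, double[][][] rewards, double gamma) {
        this.transitions = transitions;
        this.rewards = rewards;
        this.gamma = gamma;
    }

    // Stored estimate, or 0 for states that have not been updated yet.
    double value(int s) {
        return valueFunction.getOrDefault(s, 0.0);
    }

    // Max backup over the actions of s; the new value is stored and returned.
    double performBellmanUpdateOn(int s) {
        double best = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < transitions[s].length; a++) {
            double q = 0.0;
            for (int s2 = 0; s2 < transitions[s][a].length; s2++) {
                q += transitions[s][a][s2] * (rewards[s][a][s2] + gamma * value(s2));
            }
            best = Math.max(best, q);
        }
        valueFunction.put(s, best);
        return best;
    }

    public static void main(String[] args) {
        // State 0: action 0 loops with reward 1; action 1 moves to absorbing state 1 with reward 2.
        double[][][] t = {{{1.0, 0.0}, {0.0, 1.0}}, {{0.0, 1.0}}};
        double[][][] r = {{{1.0, 0.0}, {0.0, 2.0}}, {{0.0, 0.0}}};
        BellmanUpdateSketch dp = new BellmanUpdateSketch(t, r, 0.9);
        for (int i = 0; i < 3; i++) {
            System.out.println(dp.performBellmanUpdateOn(0)); // 2, then about 2.8, then about 3.52
        }
    }
}
```

    Each call backs up only the given state. Repeating it on state 0 moves the estimate from 2 toward 1 / (1 - 0.9) = 10, the value of looping forever; sweeping such updates over many states until they stop changing is the idea behind iterative subclasses such as ValueIteration.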
  - performFixedPolicyBellmanUpdateOn
```
public double performFixedPolicyBellmanUpdateOn(State s,
                                                EnumerablePolicy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    
    s - the state on which to perform the Bellman update.
    
    p - the policy that is being evaluated
    
    Returns:
    
    the new value of the state
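    The fixed-policy update differs from the Bellman update only in replacing the max over actions with an expectation under the policy. In this hypothetical sketch the policy is a probability table policy[s][a] standing in for EnumerablePolicy, and states and actions are integer indices; the names are illustrative, not BURLAP's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tabular setup: transitions[s][a][s2] = T(s2 | s, a),
// rewards[s][a][s2] = R(s, a, s2), policy[s][a] = pi(a | s).
final class PolicyEvaluationSketch {
    private final double[][][] transitions;
    private final double[][][] rewards;
    private final double gamma;
    private final Map<Integer, Double> valueFunction = new HashMap<>();

    PolicyEvaluationSketch(double[][][] transitions, double[][][] rewards, double gamma) {
        this.transitions = transitions;
        this.rewards = rewards;
        this.gamma = gamma;
    }

    // Stored estimate, or 0 for states that have not been updated yet.
    double value(int s) {
        return valueFunction.getOrDefault(s, 0.0);
    }

    // Expected backup under a fixed policy: the policy-weighted sum of Q-values, no max.
    double performFixedPolicyBellmanUpdateOn(int s, double[][] policy) {
        double v = 0.0;
        for (int a = 0; a < transitions[s].length; a++) {
            double prob = policy[s][a];
            if (prob == 0.0) {
                continue; // actions the policy never selects contribute nothing
            }
            double q = 0.0;
            for (int s2 = 0; s2 < transitions[s][a].length; s2++) {
                q += transitions[s][a][s2] * (rewards[s][a][s2] + gamma * value(s2));
            }
            v += prob * q;
        }
        valueFunction.put(s, v);
        return v;
    }

    public static void main(String[] args) {
        // State 0: action 0 loops with reward 1; action 1 moves to absorbing state 1 with reward 2.
        double[][][] t = {{{1.0, 0.0}, {0.0, 1.0}}, {{0.0, 1.0}}};
        double[][][] r = {{{1.0, 0.0}, {0.0, 2.0}}, {{0.0, 0.0}}};
        double[][] uniform = {{0.5, 0.5}, {1.0}};
        PolicyEvaluationSketch pe = new PolicyEvaluationSketch(t, r, 0.9);
        for (int i = 0; i < 100; i++) {
            pe.performFixedPolicyBellmanUpdateOn(0, uniform);
        }
        System.out.println(pe.value(0)); // close to 1.5 / 0.55
    }
}
```

    Under the uniform policy in this example, repeated updates on state 0 converge to 1.5 / 0.55 ≈ 2.73, below the 1 / (1 - 0.9) = 10 obtained by a policy that always loops.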
  - writeValueTable
```
public void writeValueTable(java.lang.String path)
```
    Writes the value function table stored in this object to the specified file path. Uses a standard YAML approach, which means the HashableState and underlying domain states must have JavaBean-like properties; i.e., they must have a default constructor and getters and setters (or public data members) for all relevant fields.
    
    Parameters:
    
    path - the path to write the value function
  - loadValueTable
```
public void loadValueTable(java.lang.String path)
```
    Loads the value function table located on disk at the specified path. Expects the file to be a YAML representation of a Java Map from HashableState to Double.
    
    Parameters:
    
    path - the path to the saved value function table
  - performBellmanUpdateOn
```
protected double performBellmanUpdateOn(HashableState sh)
```
    Performs a Bellman value function update on the provided (hashed) state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not cached, then they will be created and cached.
    
    Parameters:
    
    sh - the hashed state on which to perform the Bellman update.
    
    Returns:
    
    the new value of the state.
  - performFixedPolicyBellmanUpdateOn
```
protected double performFixedPolicyBellmanUpdateOn(HashableState sh,
                                                   EnumerablePolicy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. Results are stored in the value function map as well as returned.
    
    Parameters:
    
    sh - the hashed state on which to perform the Bellman update.
    
    p - the policy that is being evaluated
    
    Returns:
    
    the new value of the state
  - computeQ
```
protected double computeQ(State s,
                          Action ga)
```
    Computes the Q-value. This computation *is* compatible with Option objects.
    
    Parameters:
    
    s - the given state
    
    ga - the given action
    
    Returns:
    
    the double value of a Q-value for the given state-action pair.
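    Supporting options in a Q-value computation typically means using the multi-time model: when an outcome takes k steps, the value of the resulting state is discounted by gamma^k instead of gamma. The sketch illustrates that rule under its own assumptions. The Outcome class is hypothetical; each outcome is assumed to carry its probability, next state, step count, and a reward already discounted within the option, and states are integer indices. It is not BURLAP's computeQ.

```java
import java.util.List;

final class OptionQSketch {
    // Hypothetical outcome: executing an action or option ends in nextState after
    // `steps` time steps, having accrued cumulativeReward (discounted to the start time).
    static final class Outcome {
        final double probability;
        final int nextState;
        final double cumulativeReward;
        final int steps; // 1 for a primitive action

        Outcome(double probability, int nextState, double cumulativeReward, int steps) {
            this.probability = probability;
            this.nextState = nextState;
            this.cumulativeReward = cumulativeReward;
            this.steps = steps;
        }
    }

    // Q(s, a) = sum over outcomes of p * (r + gamma^k * V(s')), with k = steps.
    static double computeQ(List<Outcome> outcomes, double[] v, double gamma) {
        double q = 0.0;
        for (Outcome o : outcomes) {
            q += o.probability * (o.cumulativeReward + Math.pow(gamma, o.steps) * v[o.nextState]);
        }
        return q;
    }

    public static void main(String[] args) {
        double[] v = {0.0, 8.0};
        double gamma = 0.5;
        // Primitive action: one step, reward 1, lands in state 1; Q = 1 + 0.5 * 8 = 5.
        List<Outcome> primitive = List.of(new Outcome(1.0, 1, 1.0, 1));
        // Option: three steps of reward 1, i.e. 1 + 0.5 + 0.25 = 1.75; Q = 1.75 + 0.5^3 * 8 = 2.75.
        List<Outcome> option = List.of(new Outcome(1.0, 1, 1.75, 3));
        System.out.println(computeQ(primitive, v, gamma));
        System.out.println(computeQ(option, v, gamma));
    }
}
```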
  - getDefaultValue
```
protected double getDefaultValue(State s)
```
    Returns the default V-value to use for the state.
    
    Parameters:
    
    s - the input state to get the default V-value for
    
    Returns:
    
    the default V-value in double form.