ValueFunctionPlanner

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
- - burlap.behavior.singleagent.planning.ValueFunctionPlanner

All Implemented Interfaces:

QComputablePlanner, ValueFunction

Direct Known Subclasses:

ARTDP.ARTDPPlanner, BoundedRTDP, DifferentiableVFPlanner, PolicyIteration, RTDP, ValueFunctionPlanner.StaticVFPlanner, ValueIteration
```
public abstract class ValueFunctionPlanner
extends OOMDPPlanner
implements ValueFunction, QComputablePlanner
```
This class extends the OOMDP planner to define a class of planners that compute state value functions using the tabular Bellman update, such as ValueIteration. It defines data members for storing hashed transition dynamics (so that they can be retrieved quickly without repeated calls to the action's transition generation) and a map from states to their values. It also implements the QComputablePlanner interface, returning Q-values computed from the transition dynamics and the stored value function.
Note that by default ValueFunctionPlanner instances cache the transition dynamics so that they do not have to be procedurally generated by the Action each time. Transition dynamics caching can be disabled by calling the toggleUseCachedTransitionDynamics(boolean) method. This may be desirable if the transition dynamics are expected to change over time, such as when the model is being learned in model-based RL.
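The caching behavior described above amounts to memoizing transition generation per state. The following is a minimal hypothetical sketch, not BURLAP's implementation: states are plain strings and a state's "dynamics" are a list of successor strings, standing in for StateHashTuple and ActionTransitions, and the `generator` function stands in for the Action objects' procedural transition generation.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Hypothetical sketch of optional transition-dynamics caching.
 * Not BURLAP code: String states and List<String> successors stand in
 * for StateHashTuple and ActionTransitions.
 */
class TransitionCache {
    private final Function<String, List<String>> generator; // procedural model (stand-in for Action methods)
    private final Map<String, List<String>> cached = new HashMap<>();
    private boolean useCachedTransitions = true; // caching is on by default
    int generatorCalls = 0; // how many times the procedural model was queried

    TransitionCache(Function<String, List<String>> generator) {
        this.generator = generator;
    }

    void toggleUseCachedTransitionDynamics(boolean useCachedTransitions) {
        this.useCachedTransitions = useCachedTransitions;
    }

    List<String> transitionsFrom(String state) {
        if (!useCachedTransitions) {
            // Regenerate on every call, so changes to a learned model are picked up.
            generatorCalls++;
            return generator.apply(state);
        }
        // Generate once per state; later lookups are served from the map.
        return cached.computeIfAbsent(state, s -> {
            generatorCalls++;
            return generator.apply(s);
        });
    }
}
```

With caching on, repeated lookups of the same state query the model once; with caching off, every lookup queries it again. The sketch keeps whatever was cached before caching was turned off, so re-enabling it could serve stale entries if the model changed in the meantime.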

Author:

James MacGlashan

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ValueFunctionPlanner.StaticVFPlanner` This class is used to store tabular value function values that can be manipulated with the `ValueFunctionPlanner` methods.

Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.Map<StateHashTuple,java.util.List<ActionTransitions>>`	`transitionDynamics` A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them.
`protected boolean`	`useCachedTransitions` A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods.
`protected java.util.Map<StateHashTuple,java.lang.Double>`	`valueFunction` A map for storing the current value function estimate for each state.
`protected ValueFunctionInitialization`	`valueInitializer` The value function initialization to use; defaulted to an initialization of 0 everywhere.

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description

ValueFunctionPlanner()


Method Summary

Methods
Modifier and Type	Method and Description
`protected double`	`computeQ(State s, ActionTransitions trans)` Returns the Q-value for a given state and the possible transitions from it for a given action.
`protected double`	`computeQ(StateHashTuple sh, GroundedAction ga)` Computes the Q-value using the uncached transition dynamics produced by the Action object methods.
`protected java.util.List<ActionTransitions>`	`getActionsTransitions(StateHashTuple sh)` Returns the stored action transitions for the given state.
`java.util.List<State>`	`getAllStates()` This method will return all states that are stored in this planners value function.
`ValueFunctionPlanner.StaticVFPlanner`	`getCopyOfValueFunction()`
`protected double`	`getDefaultValue(State s)` Returns the default V-value to use for the state.
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`protected QValue`	`getQ(StateHashTuple sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching)` Gets a Q-value for a hashed state, grounded action, and object instance matching, using the hashed states and the internally stored hashed transition dynamics.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`ValueFunctionInitialization`	`getValueFunctionInitialization()` Returns the value initialization function used.
`boolean`	`hasComputedValueFor(State s)` Returns whether a value for the given state has been computed previously.
`protected void`	`initializeOptionsForExpectationComputations()` Options need to have their transition probabilities computed and to keep track of their possible termination states using a hashed data structure.
`double`	`performBellmanUpdateOn(State s)` Performs a Bellman value function update on the provided state.
`protected double`	`performBellmanUpdateOn(StateHashTuple sh)` Performs a Bellman value function update on the provided (hashed) state.
`protected double`	`performFixedPolicyBellmanUpdateOn(StateHashTuple sh, Policy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state.
`double`	`performFixedPolicyBellmanUpdateOn(State s, Policy p)` Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state.
`abstract void`	`planFromState(State initialState)` This method will cause the planner to begin planning from the specified initial state.
`void`	`resetPlannerResults()` Use this method to reset all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)` as if no planning had ever been performed before.
`void`	`setValueFunctionInitialization(ValueFunctionInitialization vfInit)` Sets the value function initialization to use.
`void`	`toggleUseCachedTransitionDynamics(boolean useCachedTransitions)` Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the `Action` objects.
`double`	`value(State s)` Returns the value function evaluation of the given state.
`double`	`value(StateHashTuple sh)` Returns the value function evaluation of the given hashed state.
`void`	`VFPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory)` Common init method for ValueFunction Planners.

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - useCachedTransitions
```
protected boolean useCachedTransitions
```
    A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods. The default is true, which caches the transition dynamics. However, this value should be set to false if the transition dynamics are expected to change over time, as may be the case in model-learning scenarios.
  - transitionDynamics
```
protected java.util.Map<StateHashTuple,java.util.List<ActionTransitions>> transitionDynamics
```
    A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them.
  - valueFunction
```
protected java.util.Map<StateHashTuple,java.lang.Double> valueFunction
```
    A map for storing the current value function estimate for each state.
  - valueInitializer
```
protected ValueFunctionInitialization valueInitializer
```
    The value function initialization to use; defaulted to an initialization of 0 everywhere.
- Constructor Detail
  - ValueFunctionPlanner
```
public ValueFunctionPlanner()
```
- Method Detail
  - planFromState
```
public abstract void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state.
    
    Specified by:
    
    planFromState in class OOMDPPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - VFPInit
```
public void VFPInit(Domain domain,
           RewardFunction rf,
           TerminalFunction tf,
           double gamma,
           StateHashFactory hashingFactory)
```
    Common init method for ValueFunction Planners. This will automatically call the OOMDPPlanner init method.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to the OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
    
    Specified by:
    
    resetPlannerResults in class OOMDPPlanner
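    For a tabular planner, the planning results are the stored value estimates and, when caching is on, the cached transition dynamics. A hypothetical sketch, assuming both maps count as planning results, of a reset that clears them while leaving settings such as the discount factor and the caching toggle untouched (not BURLAP's code; states are plain strings):

```java
import java.util.*;

/** Hypothetical sketch: separates planner settings from planning results. */
class ResettablePlanner {
    // Settings: unchanged by a reset.
    final double gamma;
    boolean useCachedTransitions = true;

    // Results produced by planning: cleared by a reset.
    final Map<String, Double> valueFunction = new HashMap<>();
    final Map<String, List<String>> transitionDynamics = new HashMap<>();

    ResettablePlanner(double gamma) {
        this.gamma = gamma;
    }

    /** Stand-in for real planning: just records a value and cached successors. */
    void planFromState(String initialState) {
        valueFunction.put(initialState, 1.0);
        transitionDynamics.put(initialState, Arrays.asList(initialState));
    }

    void resetPlannerResults() {
        valueFunction.clear();
        transitionDynamics.clear();
    }
}
```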
  - setValueFunctionInitialization
```
public void setValueFunctionInitialization(ValueFunctionInitialization vfInit)
```
    Sets the value function initialization to use.
    
    Parameters:
    vfInit - the object that defines how to initialize the value function.
  - getValueFunctionInitialization
```
public ValueFunctionInitialization getValueFunctionInitialization()
```
    Returns the value initialization function used.
    
    Returns:
    the value initialization function used.
  - hasComputedValueFor
```
public boolean hasComputedValueFor(State s)
```
    Returns whether a value for the given state has been computed previously.
    
    Parameters:
    s - the state to check
    
    Returns:
    true if the value for the given state has already been computed; false otherwise.
  - value
```
public double value(State s)
```
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    s - the state to evaluate.
    
    Returns:
    the value function evaluation of the given state.
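    The lookup rule above (the stored value if present, otherwise the initializer's default) can be sketched as follows. This is a hypothetical illustration, not BURLAP's code: a `ToDoubleFunction<String>` stands in for ValueFunctionInitialization, strings stand in for states, and `store` is an added helper that is not part of the documented API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of value lookup with a fallback initializer. */
class DefaultingValueFunction {
    private final Map<String, Double> valueFunction = new HashMap<>();
    private ToDoubleFunction<String> valueInitializer = s -> 0.0; // 0 everywhere by default

    void setValueFunctionInitialization(ToDoubleFunction<String> vfInit) {
        this.valueInitializer = vfInit;
    }

    /** True only if a value was explicitly stored for the state. */
    boolean hasComputedValueFor(String s) {
        return valueFunction.containsKey(s);
    }

    double value(String s) {
        Double stored = valueFunction.get(s);
        // Unstored states fall back to the initializer; nothing is written to the map.
        return stored != null ? stored : valueInitializer.applyAsDouble(s);
    }

    /** Hypothetical helper for recording a computed value. */
    void store(String s, double v) {
        valueFunction.put(s, v);
    }
}
```

    Because the fallback does not write to the map, a default lookup does not make hasComputedValueFor return true in this sketch.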
  - value
```
public double value(StateHashTuple sh)
```
    Returns the value function evaluation of the given hashed state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Parameters:
    sh - the hashed state to evaluate.
    
    Returns:
    the value function evaluation of the given state.
  - toggleUseCachedTransitionDynamics
```
public void toggleUseCachedTransitionDynamics(boolean useCachedTransitions)
```
    Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the Action objects. Letting the transition dynamics be procedurally generated may be useful if the transition dynamics can change over time, such as when using a learned model.
    
    Parameters:
    useCachedTransitions - true if the transition dynamics should be cached and stored; false if they should always be procedurally generated from the Action objects.
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QComputablePlanner
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QComputablePlanner
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QComputablePlanner
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QComputablePlanner
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
  - getAllStates
```
public java.util.List<State> getAllStates()
```
    This method will return all states that are stored in this planner's value function.
    
    Returns:
    all states that are stored in this planner's value function.
  - getCopyOfValueFunction
```
public ValueFunctionPlanner.StaticVFPlanner getCopyOfValueFunction()
```
  - getQ
```
protected QValue getQ(StateHashTuple sh,
          GroundedAction a,
          java.util.Map<java.lang.String,java.lang.String> matching)
```
    Gets a Q-value for a hashed state, grounded action, and object instance matching, using the hashed states and the internally stored hashed transition dynamics. If the input state is a terminal state, then the value 0 is returned.
    
    Parameters:
    sh - the input state
    a - the action to get the Q-value for
    matching - the object instance matching from sh to the corresponding state stored in the value function
    
    Returns:
    the Q-value
  - getActionsTransitions
```
protected java.util.List<ActionTransitions> getActionsTransitions(StateHashTuple sh)
```
    Returns the stored action transitions for the given state. If the action transitions are not already cached and this object is set to use caching, then they will be cached.
    
    Parameters:
    sh - the input state from which to get the transitions
    
    Returns:
    the stored action transitions for the given state
  - performBellmanUpdateOn
```
public double performBellmanUpdateOn(State s)
```
    Performs a Bellman value function update on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not yet cached, then they will be created and cached.
    
    Parameters:
    s - the state on which to perform the Bellman update.
    
    Returns:
    the new value of the state.
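    As a concrete reference point, a tabular Bellman update replaces V(s) with the largest one-step Q-value over the available actions. Below is a hypothetical sketch, not BURLAP's implementation: states and actions are plain strings, the nested `Outcome` type stands in for ActionTransitions, unseen states default to 0, and terminal states are assigned value 0 (an assumption consistent with getQ returning 0 for terminal states).

```java
import java.util.*;

/**
 * Hypothetical sketch of a tabular Bellman optimality backup:
 *   V(s) <- max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
 */
class BellmanSketch {
    /** One possible result of an action: successor state, probability, reward. */
    static final class Outcome {
        final String next;
        final double p;
        final double r;
        Outcome(String next, double p, double r) { this.next = next; this.p = p; this.r = r; }
    }

    final Map<String, Map<String, List<Outcome>>> dynamics; // state -> action -> outcomes
    final Set<String> terminals;
    final double gamma;
    final Map<String, Double> valueFunction = new HashMap<>();

    BellmanSketch(Map<String, Map<String, List<Outcome>>> dynamics, Set<String> terminals, double gamma) {
        this.dynamics = dynamics;
        this.terminals = terminals;
        this.gamma = gamma;
    }

    double value(String s) {
        return valueFunction.getOrDefault(s, 0.0); // 0 initialization for unseen states
    }

    /** Assumes every non-terminal state has an entry in dynamics with at least one action. */
    double performBellmanUpdateOn(String s) {
        double v = 0.0; // terminal states keep value 0
        if (!terminals.contains(s)) {
            v = Double.NEGATIVE_INFINITY;
            for (List<Outcome> outcomes : dynamics.get(s).values()) {
                double q = 0.0;
                for (Outcome o : outcomes) {
                    q += o.p * (o.r + gamma * value(o.next)); // expected one-step return
                }
                v = Math.max(v, q); // keep the best action's Q-value
            }
        }
        valueFunction.put(s, v); // stored in the map and returned
        return v;
    }
}
```

    Sweeping this update repeatedly over all states until the values stop changing is the core loop of value iteration.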
  - performFixedPolicyBellmanUpdateOn
```
public double performFixedPolicyBellmanUpdateOn(State s,
                                       Policy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not yet cached, then they will be created and cached.
    
    Parameters:
    s - the state on which to perform the Bellman update.
    p - the policy that is being evaluated
    
    Returns:
    the new value of the state
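    The fixed-policy variant replaces the maximum over actions with an expectation under the policy being evaluated. A hypothetical sketch, not BURLAP's code: states and actions are plain strings, the nested `Outcome` type stands in for ActionTransitions, unseen and terminal states are valued at 0, and the policy is a map from state to action probabilities rather than BURLAP's Policy class.

```java
import java.util.*;

/**
 * Hypothetical sketch of a fixed-policy (policy evaluation) backup:
 *   V(s) <- sum_a pi(a|s) * sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
 */
class PolicyEvalSketch {
    /** One possible result of an action: successor state, probability, reward. */
    static final class Outcome {
        final String next;
        final double p;
        final double r;
        Outcome(String next, double p, double r) { this.next = next; this.p = p; this.r = r; }
    }

    final Map<String, Map<String, List<Outcome>>> dynamics; // state -> action -> outcomes
    final Set<String> terminals;
    final double gamma;
    final Map<String, Double> valueFunction = new HashMap<>();

    PolicyEvalSketch(Map<String, Map<String, List<Outcome>>> dynamics, Set<String> terminals, double gamma) {
        this.dynamics = dynamics;
        this.terminals = terminals;
        this.gamma = gamma;
    }

    double value(String s) {
        return valueFunction.getOrDefault(s, 0.0); // 0 initialization for unseen states
    }

    /** policy: state -> (action -> probability); assumes each action appears in dynamics. */
    double performFixedPolicyBellmanUpdateOn(String s, Map<String, Map<String, Double>> policy) {
        double v = 0.0; // terminal states keep value 0
        if (!terminals.contains(s)) {
            for (Map.Entry<String, Double> choice : policy.get(s).entrySet()) {
                double q = 0.0;
                for (Outcome o : dynamics.get(s).get(choice.getKey())) {
                    q += o.p * (o.r + gamma * value(o.next));
                }
                v += choice.getValue() * q; // weight each action's Q-value by pi(a|s)
            }
        }
        valueFunction.put(s, v); // stored in the map and returned
        return v;
    }
}
```

    Repeating this update until convergence approximates the value of the given policy, which is the evaluation step that policy iteration alternates with policy improvement.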
  - performBellmanUpdateOn
```
protected double performBellmanUpdateOn(StateHashTuple sh)
```
    Performs a Bellman value function update on the provided (hashed) state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not yet cached, then they will be created and cached.
    
    Parameters:
    sh - the hashed state on which to perform the Bellman update.
    
    Returns:
    the new value of the state.
  - performFixedPolicyBellmanUpdateOn
```
protected double performFixedPolicyBellmanUpdateOn(StateHashTuple sh,
                                       Policy p)
```
    Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state. Results are stored in the value function map as well as returned. If this object is set to use cached transition dynamics and the transition dynamics for this state are not yet cached, then they will be created and cached.
    
    Parameters:
    sh - the hashed state on which to perform the Bellman update.
    p - the policy that is being evaluated
    
    Returns:
    the new value of the state
  - computeQ
```
protected double computeQ(State s,
              ActionTransitions trans)
```
    Returns the Q-value for a given state and the possible transitions from it for a given action. This computation *is* compatible with Option objects.
    
    Parameters:
    s - the given state
    trans - the given action transitions
    
    Returns:
    the double value of a Q-value
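    For a primitive action, the Q-value is an expectation over the action's outcomes of the immediate reward plus the discounted value of the successor state. The sketch below is hypothetical (not BURLAP's code): states are plain strings, successors missing from the value map default to 0, Options are not modeled even though the documented method supports them, and `getQ` mirrors the documented rule that a terminal input state yields 0.

```java
import java.util.*;

/**
 * Hypothetical sketch of a one-step Q-value for a primitive action:
 *   Q(s,a) = sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
 */
final class QSketch {
    /** One possible result of an action: successor state, probability, reward. */
    static final class Outcome {
        final String next;
        final double p;
        final double r;
        Outcome(String next, double p, double r) { this.next = next; this.p = p; this.r = r; }
    }

    private QSketch() {}

    static double computeQ(List<Outcome> transitions, Map<String, Double> values, double gamma) {
        double q = 0.0;
        for (Outcome o : transitions) {
            double vNext = values.getOrDefault(o.next, 0.0); // unseen successors default to 0
            q += o.p * (o.r + gamma * vNext);
        }
        return q;
    }

    /** A terminal input state yields 0, as in the documented getQ. */
    static double getQ(String s, List<Outcome> transitions, Set<String> terminals,
                       Map<String, Double> values, double gamma) {
        if (terminals.contains(s)) {
            return 0.0;
        }
        return computeQ(transitions, values, gamma);
    }
}
```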
  - computeQ
```
protected double computeQ(StateHashTuple sh,
              GroundedAction ga)
```
    Computes the Q-value using the uncached transition dynamics produced by the Action object methods. This computation *is* compatible with Option objects.
    
    Parameters:
    sh - the given state
    ga - the given action
    
    Returns:
    the double value of a Q-value for the given state-action pair.
  - getDefaultValue
```
protected double getDefaultValue(State s)
```
    Returns the default V-value to use for the state.
    
    Parameters:
    s - the input state to get the default V-value for
    
    Returns:
    the default V-value in double form.
  - initializeOptionsForExpectationComputations
```
protected void initializeOptionsForExpectationComputations()
```
    Options need to have their transition probabilities computed and to keep track of their possible termination states using a hashed data structure. This method tells each option which state hashing factory to use.
