DifferentiableVI

- java.lang.Object
  - burlap.behavior.singleagent.planning.OOMDPPlanner
    - burlap.behavior.singleagent.planning.ValueFunctionPlanner
      - burlap.behavior.singleagent.learnbydemo.mlirl.differentiableplanners.DifferentiableVFPlanner
        - burlap.behavior.singleagent.learnbydemo.mlirl.differentiableplanners.DifferentiableVI

All Implemented Interfaces:

QGradientPlanner, QComputablePlanner, ValueFunction
```
public class DifferentiableVI
extends DifferentiableVFPlanner
```
Performs differentiable value iteration using the Boltzmann backup operator and a DifferentiableRF. This class behaves like the standard ValueIteration planner, except that it operates on a differentiable value function, so gradients of the value function with respect to the reward function's parameters can also be computed.
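
A minimal, self-contained sketch of what a Boltzmann backup and its derivative look like for a single state, assuming the common form V(s) = Σ_a w_a Q(s,a) with w_a = exp(β Q(s,a)) / Σ_b exp(β Q(s,b)). Differentiating gives dV/dQ_a = w_a (1 + β (Q_a − V)); a differentiable planner would chain this with dQ_a/dθ to obtain gradients with respect to the reward parameters θ. The class `BoltzmannBackup` and its methods are hypothetical illustrations, not part of the BURLAP API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not BURLAP code): Boltzmann backup over one state's Q-values
 * and the gradient of the backed-up value with respect to each Q-value.
 */
public class BoltzmannBackup {

    /** Softmax weights w_a = exp(beta*Q_a) / sum_b exp(beta*Q_b), shifted by the max for stability. */
    static double[] weights(double[] qs, double beta) {
        double shift = Double.NEGATIVE_INFINITY;
        for (double q : qs) shift = Math.max(shift, beta * q);
        double[] w = new double[qs.length];
        double z = 0.0;
        for (int i = 0; i < qs.length; i++) {
            w[i] = Math.exp(beta * qs[i] - shift);
            z += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= z;
        return w;
    }

    /** Backed-up value V = sum_a w_a * Q_a. */
    static double value(double[] qs, double beta) {
        double[] w = weights(qs, beta);
        double v = 0.0;
        for (int i = 0; i < qs.length; i++) v += w[i] * qs[i];
        return v;
    }

    /** dV/dQ_a = w_a * (1 + beta * (Q_a - V)); chain with dQ_a/dtheta for reward-parameter gradients. */
    static double[] valueGradientWrtQ(double[] qs, double beta) {
        double[] w = weights(qs, beta);
        double v = value(qs, beta);
        double[] g = new double[qs.length];
        for (int i = 0; i < qs.length; i++) g[i] = w[i] * (1.0 + beta * (qs[i] - v));
        return g;
    }

    public static void main(String[] args) {
        double[] qs = {1.0, 2.0, 0.5};
        System.out.println("V = " + value(qs, 2.0));
        System.out.println("dV/dQ = " + Arrays.toString(valueGradientWrtQ(qs, 2.0)));
    }
}
```

Because the weights sum to 1 and Σ_a w_a (Q_a − V) = 0, the per-action gradient components always sum to 1, which makes a convenient sanity check.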

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
  ValueFunctionPlanner.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `foundReachableStates` Indicates whether the reachable states have been computed yet. |
| `protected boolean` | `hasRunVI` Indicates whether VI has been run. |
| `protected double` | `maxDelta` When the maximum change in the value function is smaller than this value, VI will terminate. |
| `protected int` | `maxIterations` When the number of VI iterations exceeds this value, VI will terminate. |
| `protected boolean` | `stopReachabilityFromTerminalStates` Whether the breadth-first-search-like reachability pass that finds the state space (spreading over all stochastic transitions) is pruned at terminal states. |

Fields inherited from class burlap.behavior.singleagent.learnbydemo.mlirl.differentiableplanners.DifferentiableVFPlanner
boltzBeta, valueGradient

Fields inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
| Constructor and Description |
|---|
| `DifferentiableVI(Domain domain, DifferentiableRF rf, TerminalFunction tf, double gamma, double boltzBeta, StateHashFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner. |

Method Summary

Methods
| Modifier and Type | Method and Description |
|---|---|
| `void` | `addStatesToStateSpace(java.util.Collection<State> states)` Adds a `Collection` of states over which VI will iterate. |
| `void` | `addStateToStateSpace(State s)` Adds the given state to the state space over which VI iterates. |
| `boolean` | `performReachabilityFrom(State si)` Finds all states reachable from the given state, for use by the `runVI()` method, and caches all the transition dynamics. |
| `void` | `planFromState(State initialState)` Causes the planner to begin planning from the specified initial state. |
| `void` | `recomputeReachableStates()` Forces the planner to recompute the reachable states the next time the `planFromState(burlap.oomdp.core.State)` method is called. |
| `void` | `resetPlannerResults()` Resets all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)`, as if no planning had ever been performed. |
| `void` | `runVI()` Runs VI until the specified termination conditions are met. |
| `void` | `toggleReachabiltiyTerminalStatePruning(boolean toggle)` Sets whether the reachability search that generates the state space prunes the search at terminal states. |

Methods inherited from class burlap.behavior.singleagent.learnbydemo.mlirl.differentiableplanners.DifferentiableVFPlanner
computeQGradient, getAllQGradients, getQGradient, getValueGradient, performBellmanUpdateOn, performDPValueGradientUpdateOn, setBoltzmannBetaParameter

Methods inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
getQ, getQs

- Field Detail
  - maxDelta
```
protected double maxDelta
```
    When the maximum change in the value function is smaller than this value, VI will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of VI iterations exceeds this value, VI will terminate.
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
  - stopReachabilityFromTerminalStates
```
protected boolean stopReachabilityFromTerminalStates
```
    When the reachability analysis that finds the state space is performed, it makes a breadth-first-search-like pass that spreads over all stochastic transitions. Setting this value to true prunes the search at terminal states. By default it is false, and the full reachable state space is found.
  - hasRunVI
```
protected boolean hasRunVI
```
    Indicates whether VI has been run.
- Constructor Detail
  - DifferentiableVI
```
public DifferentiableVI(Domain domain,
                DifferentiableRF rf,
                TerminalFunction tf,
                double gamma,
                double boltzBeta,
                StateHashFactory hashingFactory,
                double maxDelta,
                int maxIterations)
```
    Initializes the planner.
    
    Parameters:
    domain - the domain in which to plan
    rf - the differentiable reward function that will be used
    tf - the terminal state function
    gamma - the discount factor
    boltzBeta - the scaling factor in the Boltzmann distribution used for the state value function. The larger the value, the more deterministic the distribution.
    hashingFactory - the state hashing factory to use
    maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
    maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
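
To make "the larger the value, the more deterministic" concrete, here is a small sketch, assuming the backup is the Boltzmann-weighted average of Q-values with weights proportional to exp(β Q). At β = 0 the weights are uniform and the backup is the plain mean; as β grows the weight concentrates on the best action and the backup approaches the max. `BetaEffect` and `boltzmannValue` are hypothetical names used only for illustration.

```java
/**
 * Hypothetical illustration (not BURLAP code) of how beta controls a Boltzmann-weighted backup.
 */
public class BetaEffect {

    /** Boltzmann-weighted average of qs; subtracting the largest exponent avoids overflow. */
    static double boltzmannValue(double[] qs, double beta) {
        double shift = Double.NEGATIVE_INFINITY;
        for (double q : qs) shift = Math.max(shift, beta * q);
        double num = 0.0;
        double den = 0.0;
        for (double q : qs) {
            double e = Math.exp(beta * q - shift);
            num += e * q;
            den += e;
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] qs = {1.0, 2.0, 0.5};
        // beta = 0 gives the plain mean of the Q-values; large beta approaches their maximum.
        for (double beta : new double[] {0.0, 1.0, 10.0, 100.0}) {
            System.out.printf("beta=%6.1f  V=%.4f%n", beta, boltzmannValue(qs, beta));
        }
    }
}
```

The backed-up value never decreases as β increases (its derivative with respect to β is the Boltzmann-weighted variance of the Q-values), so β trades smoothness of the backup against closeness to a hard max.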
- Method Detail
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the planner to recompute the reachable states when the planFromState(burlap.oomdp.core.State) method is called next. This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
  - toggleReachabiltiyTerminalStatePruning
```
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
```
    Sets whether the reachability search that generates the state space prunes the search at terminal states. The default is not to prune.
    
    Parameters:
    toggle - true if the search should prune the search at terminal states; false if the search should find all reachable states regardless of terminal states.
  - planFromState
```
public void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state.
    
    Specified by:
    
    planFromState in class ValueFunctionPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced by calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If this planner was given other objects that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy whose temperature parameter decreases over time, you will need to reset the policy's temperature yourself.
    
    Overrides:
    
    resetPlannerResults in class DifferentiableVFPlanner
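
The interplay between planFromState, recomputeReachableStates, and resetPlannerResults can be modeled with a few flags. The following is a hypothetical sketch based only on the behavior described on this page: states are plain strings, the reachability search and VI are stubbed out, and re-running VI when reachability reports new states is an assumption of the sketch. `PlannerFlagsSketch` is not a BURLAP class.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model (not BURLAP code) of the bookkeeping flags described on this page.
 * States are plain strings, reachability is stubbed, and VI is a counter.
 */
public class PlannerFlagsSketch {
    final Set<String> stateSpace = new HashSet<>();
    boolean foundReachableStates = false;
    boolean hasRunVI = false;
    int viRuns = 0; // how many times the stubbed VI was executed

    /** Returns false if the start state is already covered and no recomputation was requested. */
    boolean performReachabilityFrom(String start) {
        if (foundReachableStates && stateSpace.contains(start)) {
            return false;
        }
        stateSpace.add(start); // a real planner would search all reachable states here
        foundReachableStates = true;
        return true;
    }

    void runVI() {
        viRuns++;
        hasRunVI = true;
    }

    /** Runs VI if reachability found something new (assumption) or VI has never been run. */
    void planFromState(String initialState) {
        if (performReachabilityFrom(initialState) || !hasRunVI) {
            runVI();
        }
    }

    /** Forces the next planFromState call to redo the reachability search. */
    void recomputeReachableStates() {
        foundReachableStates = false;
    }

    /** Clears computed results; configuration (not modeled here) would be kept. */
    void resetPlannerResults() {
        stateSpace.clear();
        foundReachableStates = false;
        hasRunVI = false;
    }

    public static void main(String[] args) {
        PlannerFlagsSketch p = new PlannerFlagsSketch();
        p.planFromState("s0");
        p.planFromState("s0"); // nothing new and VI already ran: no second run
        p.resetPlannerResults();
        p.planFromState("s0"); // results were cleared, so VI runs again
        System.out.println("VI runs: " + p.viRuns);
    }
}
```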
  - runVI
```
public void runVI()
```
    Runs VI until the specified termination conditions are met. In general, this method should only be called indirectly through the planFromState(State) method. The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method automatically calls performReachabilityFrom(State) first and then calls this method if VI has not yet been run.
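
A tabular sketch of the loop that the maxDelta and maxIterations fields control, assuming in-place sweeps over an explicit state space, a Boltzmann backup at each non-terminal state, and a value of 0 for terminal states. The array-based MDP encoding and the names `BoltzmannVISketch`, `runVI`, and `boltzmann` are hypothetical; the real planner works over BURLAP states, cached transition dynamics, and a DifferentiableRF.

```java
/**
 * Hypothetical tabular sketch (not BURLAP code) of value iteration with a Boltzmann backup
 * and two termination conditions: maxDelta and maxIterations.
 */
public class BoltzmannVISketch {

    /**
     * p[s][a][s2] is the transition probability, r[s][a] the expected reward.
     * Updates v in place (Gauss-Seidel style) and returns the number of sweeps performed.
     */
    static int runVI(double[][][] p, double[][] r, boolean[] terminal, double gamma,
                     double beta, double maxDelta, int maxIterations, double[] v) {
        int sweeps = 0;
        while (sweeps < maxIterations) {
            double delta = 0.0; // largest value change seen during this sweep
            for (int s = 0; s < v.length; s++) {
                if (terminal[s]) {
                    v[s] = 0.0; // terminal states are given value 0 here
                    continue;
                }
                double[] q = new double[r[s].length];
                for (int a = 0; a < q.length; a++) {
                    double expectedNext = 0.0;
                    for (int s2 = 0; s2 < v.length; s2++) expectedNext += p[s][a][s2] * v[s2];
                    q[a] = r[s][a] + gamma * expectedNext;
                }
                double updated = boltzmann(q, beta);
                delta = Math.max(delta, Math.abs(updated - v[s]));
                v[s] = updated;
            }
            sweeps++;
            if (delta < maxDelta) break; // converged within tolerance
        }
        return sweeps;
    }

    /** Boltzmann-weighted average of q with inverse temperature beta. */
    static double boltzmann(double[] q, double beta) {
        double shift = Double.NEGATIVE_INFINITY;
        for (double x : q) shift = Math.max(shift, beta * x);
        double num = 0.0, den = 0.0;
        for (double x : q) {
            double e = Math.exp(beta * x - shift);
            num += e * x;
            den += e;
        }
        return num / den;
    }

    public static void main(String[] args) {
        // State 0: action 0 stays put with reward 0, action 1 moves to terminal state 1 with reward 1.
        double[][][] p = {{{1.0, 0.0}, {0.0, 1.0}}, {{0.0, 1.0}, {0.0, 1.0}}};
        double[][] r = {{0.0, 1.0}, {0.0, 0.0}};
        boolean[] terminal = {false, true};
        double[] v = new double[2];
        int sweeps = runVI(p, r, terminal, 0.9, 5.0, 1e-6, 100, v);
        System.out.println("sweeps=" + sweeps + " V(s0)=" + v[0]);
    }
}
```

Whichever condition is hit first ends the loop, so maxIterations acts as a safety cap when maxDelta is set very small.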
  - addStateToStateSpace
```
public void addStateToStateSpace(State s)
```
    Adds the given state to the state space over which VI iterates.
    
    Parameters:
    s - the state to add
  - addStatesToStateSpace
```
public void addStatesToStateSpace(java.util.Collection<State> states)
```
    Adds a Collection of states over which VI will iterate.
    
    Parameters:
    states - the collection of states.
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    si - the source state from which all reachable states will be found
    
    Returns:
    true if a reachability analysis had never been performed from this state; false otherwise.
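
The reachability pass and the optional terminal-state pruning described above can be sketched as a breadth-first search, assuming the state space is a graph in which `successors[s]` lists every state that any action can reach from `s` with nonzero probability. In this sketch a terminal state is still added to the state space when pruning is on; it is simply not expanded. `ReachabilitySketch` is a hypothetical name, not BURLAP code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not BURLAP code) of a breadth-first reachability pass with optional
 * pruning at terminal states.
 */
public class ReachabilitySketch {

    static Set<Integer> reachable(int start, int[][] successors, boolean[] terminal,
                                  boolean pruneAtTerminals) {
        Set<Integer> found = new LinkedHashSet<>(); // insertion order = discovery order
        Deque<Integer> frontier = new ArrayDeque<>();
        found.add(start);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int s = frontier.poll();
            if (pruneAtTerminals && terminal[s]) {
                continue; // terminal state is kept in the state space but not expanded
            }
            for (int next : successors[s]) {
                if (found.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // 0 -> 1, 1 -> 2, 2 -> 0, where state 1 is terminal.
        int[][] successors = {{1}, {2}, {0}};
        boolean[] terminal = {false, true, false};
        System.out.println(reachable(0, successors, terminal, false)); // full reachable set
        System.out.println(reachable(0, successors, terminal, true));  // pruned at terminal state 1
    }
}
```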
