DifferentiableVI

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
- - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
  - - burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableDP
    - - burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableVI

All Implemented Interfaces:

DifferentiableQFunction, DifferentiableValueFunction, MDPSolverInterface, Planner, QFunction, QProvider, ValueFunction
```
public class DifferentiableVI
extends DifferentiableDP
implements Planner
```
Performs Differentiable Value Iteration using the Boltzmann backup operator and a DifferentiableRF. This class behaves like the standard ValueIteration planner, except that it operates on a differentiable value function.
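
As a rough illustration of what a Boltzmann backup does, the sketch below computes a state value as the Boltzmann-weighted average of its Q-values. This form is an assumption based on common MLIRL-style formulations and is not taken from this page; the class and method names (`BoltzmannBackupSketch`, `boltzmannProbs`, `boltzmannValue`) are hypothetical and are not part of BURLAP.

```java
public class BoltzmannBackupSketch {

    /** Boltzmann (softmax) action probabilities for the given Q-values. */
    public static double[] boltzmannProbs(double[] q, double beta) {
        // Subtract the max Q-value before exponentiating for numerical stability.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * (q[i] - max));
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** State value as the Boltzmann-weighted average of the Q-values (assumed form). */
    public static double boltzmannValue(double[] q, double beta) {
        double[] p = boltzmannProbs(q, beta);
        double v = 0.0;
        for (int i = 0; i < q.length; i++) {
            v += p[i] * q[i];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        // A small beta weights actions almost uniformly; a large beta approaches the max.
        System.out.println(boltzmannValue(q, 0.5));
        System.out.println(boltzmannValue(q, 50.0));
    }
}
```

With `beta` equal to 0 the result is the plain mean of the Q-values, and as `beta` grows the result approaches the maximum Q-value, which matches the constructor's note that larger values are more deterministic.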

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`boltzBeta`
`protected boolean`	`foundReachableStates` Indicates whether the reachable states have been computed yet.
`protected boolean`	`hasRunVI` Indicates whether VI has been run or not.
`protected double`	`maxDelta` When the maximum change in the value function is smaller than this value, VI will terminate.
`protected int`	`maxIterations` When the number of VI iterations exceeds this value, VI will terminate.
`protected boolean`	`stopReachabilityFromTerminalStates` When the reachability analysis to find the state space is performed, a breadth-first-search-like pass (spreading over all stochastic transitions) is performed.

Fields inherited from class burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableDP
rf, valueGradient

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`DifferentiableVI(SADomain domain, DifferentiableRF rf, double gamma, double boltzBeta, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`addStatesToStateSpace(java.util.Collection<State> states)` Adds a `Collection` of states over which VI will iterate.
`void`	`addStateToStateSpace(State s)` Adds the given state to the state space over which VI iterates.
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used by the `runVI()` method and will cache all the transition dynamics.
`BoltzmannQPolicy`	`planFromState(State initialState)` Plans from the input state and returns a `BoltzmannQPolicy` that uses the same Boltzmann parameter as the Boltzmann value backups in this planner.
`void`	`recomputeReachableStates()` Calling this method will force the planner to recompute the reachable states when the `planFromState(State)` method is called next.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`runVI()` Runs VI until the specified termination conditions are met.
`void`	`toggleReachabiltiyTerminalStatePruning(boolean toggle)` Sets whether the state reachability search that generates the state space will prune the search at terminal states.

Methods inherited from class burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableDP
combinedNonZeroPDParameters, computeQGradient, DPPInit, getOperator, performDPValueGradientUpdateOn, qGradient, setOperator, valueGradient

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting

Methods inherited from interface burlap.behavior.valuefunction.QFunction
qValue

Methods inherited from interface burlap.behavior.valuefunction.ValueFunction
value

- Field Detail
  - maxDelta
```
protected double maxDelta
```
    When the maximum change in the value function is smaller than this value, VI will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of VI iterations exceeds this value, VI will terminate.
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
  - stopReachabilityFromTerminalStates
```
protected boolean stopReachabilityFromTerminalStates
```
    When the reachability analysis to find the state space is performed, a breadth-first-search-like pass (spreading over all stochastic transitions) is performed. The search can optionally be pruned at terminal states by setting this value to true. By default, it is false and the full reachable state space is found.
  - hasRunVI
```
protected boolean hasRunVI
```
    Indicates whether VI has been run or not.
  - boltzBeta
```
protected double boltzBeta
```
- Constructor Detail
  - DifferentiableVI
```
public DifferentiableVI(SADomain domain,
                        DifferentiableRF rf,
                        double gamma,
                        double boltzBeta,
                        HashableStateFactory hashingFactory,
                        double maxDelta,
                        int maxIterations)
```
    Initializes the planner.
    
    Parameters:
    
    domain - the domain in which to plan
    
    rf - the differentiable reward function that will be used
    
    gamma - the discount factor
    
    boltzBeta - the scaling factor in the Boltzmann distribution used for the state value function. The larger the value, the more deterministic the distribution.
    
    hashingFactory - the state hashing factory to use
    
    maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
    
    maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
- Method Detail
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the planner to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics have changed since the last planning call and planning needs to be restarted as a result.
  - toggleReachabiltiyTerminalStatePruning
```
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
```
    Sets whether the state reachability search that generates the state space will prune the search at terminal states. The default is not to prune.
    
    Parameters:
    
    toggle - true if the search should prune the search at terminal states; false if the search should find all reachable states regardless of terminal states.
  - planFromState
```
public BoltzmannQPolicy planFromState(State initialState)
```
    Plans from the input state and returns a BoltzmannQPolicy that uses the same Boltzmann parameter as the Boltzmann value backups in this planner.
    
    Specified by:
    
    planFromState in interface Planner
    
    Parameters:
    
    initialState - the initial state of the planning problem
    
    Returns:
    
    a BoltzmannQPolicy
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Overrides:
    
    resetSolver in class DifferentiableDP
  - runVI
```
public void runVI()
```
    Runs VI until the specified termination conditions are met. In general, this method should only be called indirectly through the planFromState(State) method. The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method automatically calls performReachabilityFrom(State) first and then this method, if VI has not already been run.
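
The two termination conditions (maxDelta and maxIterations) can be sketched on a toy problem as follows. This is a minimal, self-contained illustration and not BURLAP's implementation: the class `ViTerminationSketch`, the chain MDP, and the Boltzmann-weighted backup in `softValue` are all assumptions made for the example.

```java
public class ViTerminationSketch {

    /** Boltzmann-weighted value of two Q-values (numerically stabilized). */
    public static double softValue(double q0, double q1, double beta) {
        double m = Math.max(q0, q1);
        double w0 = Math.exp(beta * (q0 - m));
        double w1 = Math.exp(beta * (q1 - m));
        return (w0 * q0 + w1 * q1) / (w0 + w1);
    }

    /**
     * Sweeps a small chain MDP until the largest value change in a sweep is
     * below maxDelta or maxIterations sweeps have run. Returns the number of
     * sweeps performed. The last state is terminal (its value stays 0); the
     * actions are "stay" (reward 0) and "right" (reward 1 only when entering
     * the terminal state).
     */
    public static int runSweeps(double[] v, double gamma, double beta,
                                double maxDelta, int maxIterations) {
        int n = v.length;
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < n - 1; s++) {
                double qStay = gamma * v[s];
                double rewardRight = (s + 1 == n - 1) ? 1.0 : 0.0;
                double qRight = rewardRight + gamma * v[s + 1];
                double updated = softValue(qStay, qRight, beta);
                delta = Math.max(delta, Math.abs(updated - v[s]));
                v[s] = updated;
            }
            sweeps++;
            if (delta < maxDelta) {
                break; // value function changed by less than maxDelta
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        double[] v = new double[3];
        int sweeps = runSweeps(v, 0.9, 5.0, 1e-6, 10000);
        System.out.println("sweeps: " + sweeps + ", v[0]=" + v[0] + ", v[1]=" + v[1]);
    }
}
```

Setting a very small maxDelta together with a small maxIterations makes the iteration cap the binding condition, which is a convenient way to bound planning time at the cost of a less converged value function.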
  - addStateToStateSpace
```
public void addStateToStateSpace(State s)
```
    Adds the given state to the state space over which VI iterates.
    
    Parameters:
    
    s - the state to add
  - addStatesToStateSpace
```
public void addStatesToStateSpace(java.util.Collection<State> states)
```
    Adds a Collection of states over which VI will iterate.
    
    Parameters:
    
    states - the collection of states.
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    
    si - the source state from which all reachable states will be found
    
    Returns:
    
    true if a reachability analysis had never been performed from this state; false otherwise.
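
The reachability pass, its optional pruning at terminal states, and the boolean return value can be sketched with a plain breadth-first search over integer state ids. This is an illustrative assumption about the general technique, not BURLAP's code: `ReachabilitySketch`, the successor map, and the terminal predicate are hypothetical stand-ins for the model, the hashed states, and the terminal function.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class ReachabilitySketch {

    // Every state that any action can lead to, keyed by source state.
    private final Map<Integer, List<Integer>> successors;
    private final Predicate<Integer> isTerminal;
    private final boolean pruneAtTerminals;
    // States discovered so far, across all calls.
    private final Set<Integer> known = new HashSet<>();

    public ReachabilitySketch(Map<Integer, List<Integer>> successors,
                              Predicate<Integer> isTerminal,
                              boolean pruneAtTerminals) {
        this.successors = successors;
        this.isTerminal = isTerminal;
        this.pruneAtTerminals = pruneAtTerminals;
    }

    /**
     * Returns true if the source state was not yet known and a search was
     * performed; returns false (doing nothing) if it was already discovered.
     */
    public boolean performReachabilityFrom(int source) {
        if (known.contains(source)) {
            return false;
        }
        Deque<Integer> open = new ArrayDeque<>();
        open.add(source);
        known.add(source);
        while (!open.isEmpty()) {
            int s = open.poll();
            if (pruneAtTerminals && isTerminal.test(s)) {
                continue; // do not expand past terminal states
            }
            for (int next : successors.getOrDefault(s, Collections.<Integer>emptyList())) {
                if (known.add(next)) {
                    open.add(next);
                }
            }
        }
        return true;
    }

    public Set<Integer> states() {
        return known;
    }
}
```

For example, with transitions 0 → 1 → 2 and state 1 marked terminal, a pruned search from state 0 collects states 0 and 1 only, while an unpruned search also collects state 2.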
