DifferentiableVFPlanner

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
  - burlap.behavior.singleagent.planning.ValueFunctionPlanner
    - burlap.behavior.singleagent.learnbydemo.mlirl.differentiableplanners.DifferentiableVFPlanner

All Implemented Interfaces:

QGradientPlanner, QComputablePlanner, ValueFunction

Direct Known Subclasses:

DifferentiableVI
```
public abstract class DifferentiableVFPlanner
extends ValueFunctionPlanner
implements QGradientPlanner
```
A class for performing dynamic-programming-based planning with a differentiable value backup operator. Specifically, all subclasses are assumed to use a Boltzmann backup operator, and the reward function must be differentiable, which is achieved by subclassing the DifferentiableRF class. The normal performBellmanUpdateOn(burlap.behavior.statehashing.StateHashTuple) method of the ValueFunctionPlanner class is overridden with a method that uses the Boltzmann backup operator.
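
To make the idea of a Boltzmann backup concrete, here is a minimal, self-contained sketch. It assumes one common form of the operator, in which the backed-up value is the expectation of the Q-values under a softmax distribution with inverse temperature beta. The class and method names (`BoltzmannBackupSketch`, `boltzmannProbs`, `boltzmannValue`) are hypothetical and are not part of BURLAP; the library's actual implementation may differ in detail.

```java
// Illustrative sketch only; names are hypothetical and not part of BURLAP.
final class BoltzmannBackupSketch {

    /** Softmax action probabilities: pi(a) is proportional to exp(beta * q[a]). */
    static double[] boltzmannProbs(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            // Shifting by the max keeps the exponent non-positive when beta >= 0.
            p[i] = Math.exp(beta * (q[i] - max));
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Backed-up value under the assumed operator: V = sum_a pi(a) * q[a]. */
    static double boltzmannValue(double[] q, double beta) {
        double[] p = boltzmannProbs(q, beta);
        double v = 0.0;
        for (int i = 0; i < q.length; i++) {
            v += p[i] * q[i];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        System.out.println(boltzmannValue(q, 0.5));
        System.out.println(boltzmannValue(q, 50.0));
    }
}
```

Unlike the max in a standard Bellman backup, this expression is smooth in the Q-values, which is what allows gradients with respect to reward function parameters to be propagated through planning.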

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
  ValueFunctionPlanner.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`boltzBeta` The Boltzmann backup operator beta parameter.
`protected java.util.Map<StateHashTuple,double[]>`	`valueGradient` The value function gradient for each state.

Fields inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description

`DifferentiableVFPlanner()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected double[]`	`computeQGradient(State s, GroundedAction ga)` Computes the Q-value gradient for the given `State` and `GroundedAction`.
`java.util.List<QGradientTuple>`	`getAllQGradients(State s)` Returns the list of Q-value gradients (returned as `QGradientTuple` objects) for each action permissible in the given state.
`QGradientTuple`	`getQGradient(State s, GroundedAction a)` Returns the Q-value gradient (`QGradientTuple`) for the given state and action.
`double[]`	`getValueGradient(State s)` Returns the value function gradient for the given `State`.
`protected double`	`performBellmanUpdateOn(StateHashTuple sh)` Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator.
`protected double[]`	`performDPValueGradientUpdateOn(StateHashTuple sh)` Performs the Boltzmann value function gradient backup for the given `StateHashTuple`.
`void`	`resetPlannerResults()` Use this method to reset all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)` as if no planning had ever been performed before.
`void`	`setBoltzmannBetaParameter(double beta)` Sets this planner's Boltzmann beta parameter used to compute gradients.

Methods inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, planFromState, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
getQ, getQs

- Field Detail
  - valueGradient
```
protected java.util.Map<StateHashTuple,double[]> valueGradient
```
    The value function gradient for each state.
  - boltzBeta
```
protected double boltzBeta
```
    The Boltzmann backup operator beta parameter. The larger the beta, the more deterministic the backup.
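
    The effect of beta can be illustrated with a small standalone sketch. It assumes a softmax distribution over Q-values with inverse temperature beta and reports the probability assigned to a highest-valued action. The class `BetaSharpnessSketch` and its method are hypothetical names used only for illustration.

```java
// Illustrative sketch only; names are hypothetical and not part of BURLAP.
final class BetaSharpnessSketch {

    /**
     * Probability that a softmax distribution over q (inverse temperature beta)
     * assigns to one highest-valued action: 1 / sum_i exp(beta * (q[i] - max)).
     */
    static double greedyProbability(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double denom = 0.0;
        for (double v : q) {
            denom += Math.exp(beta * (v - max));
        }
        return 1.0 / denom;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        for (double beta : new double[] {0.0, 1.0, 10.0, 100.0}) {
            System.out.println("beta=" + beta + " greedy probability=" + greedyProbability(q, beta));
        }
    }
}
```

    With beta equal to 0 every action is equally likely, and as beta grows the probability mass concentrates on the best action, so the backup approaches the usual max-based Bellman backup.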
- Constructor Detail
  - DifferentiableVFPlanner
```
public DifferentiableVFPlanner()
```
- Method Detail
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
    
    Overrides:
    
    resetPlannerResults in class ValueFunctionPlanner
  - performBellmanUpdateOn
```
protected double performBellmanUpdateOn(StateHashTuple sh)
```
    Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator. Results are stored in this planner's internal map.
    
    Overrides:
    
    performBellmanUpdateOn in class ValueFunctionPlanner
    
    Parameters:
    sh - the hashed state on which to perform the Boltzmann update.
    
    Returns:
    the new value
  - performDPValueGradientUpdateOn
```
protected double[] performDPValueGradientUpdateOn(StateHashTuple sh)
```
    Performs the Boltzmann value function gradient backup for the given StateHashTuple. Results are stored in this planner's internal map.
    
    Parameters:
    sh - the hashed state on which to perform the Boltzmann gradient update.
    
    Returns:
    the gradient.
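
    As a rough illustration of what a Boltzmann value gradient backup involves, the sketch below assumes the value is V = sum_a pi(a) Q(a) with pi a softmax over beta * Q. Differentiating with respect to a parameter vector then gives grad V = sum_a pi(a) * (1 + beta * (Q(a) - V)) * grad Q(a). The class `BoltzmannValueGradientSketch` and its methods are hypothetical; BURLAP's own computation may be organized differently.

```java
// Illustrative sketch only; names are hypothetical and not part of BURLAP.
final class BoltzmannValueGradientSketch {

    /** Softmax probabilities over q with inverse temperature beta. */
    static double[] probs(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int a = 0; a < q.length; a++) {
            p[a] = Math.exp(beta * (q[a] - max));
            sum += p[a];
        }
        for (int a = 0; a < q.length; a++) {
            p[a] /= sum;
        }
        return p;
    }

    /** Assumed Boltzmann value: V = sum_a pi(a) * q[a]. */
    static double value(double[] q, double beta) {
        double[] p = probs(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) {
            v += p[a] * q[a];
        }
        return v;
    }

    /**
     * Gradient of V given per-action Q-value gradients qGrad[a][k]:
     * grad V = sum_a pi(a) * (1 + beta * (q[a] - V)) * qGrad[a].
     */
    static double[] valueGradient(double[] q, double[][] qGrad, double beta) {
        double[] p = probs(q, beta);
        double v = value(q, beta);
        double[] grad = new double[qGrad[0].length];
        for (int a = 0; a < q.length; a++) {
            double weight = p[a] * (1.0 + beta * (q[a] - v));
            for (int k = 0; k < grad.length; k++) {
                grad[k] += weight * qGrad[a][k];
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 0.5};
        double[][] qGrad = {{1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}};
        System.out.println(java.util.Arrays.toString(valueGradient(q, qGrad, 1.5)));
    }
}
```

    A simple sanity check for a formula like this is to compare it against a central finite difference of the value with respect to each Q-value.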
  - getValueGradient
```
public double[] getValueGradient(State s)
```
    Returns the value function gradient for the given State.
    
    Parameters:
    s - the state for which the gradient is to be returned.
    
    Returns:
    the value function gradient for the given State.
  - getAllQGradients
```
public java.util.List<QGradientTuple> getAllQGradients(State s)
```
    Description copied from interface: QGradientPlanner
    
    Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
    
    Specified by:
    
    getAllQGradients in interface QGradientPlanner
    
    Parameters:
    s - the state for which Q-value gradients are to be returned.
    
    Returns:
    the list of Q-value gradients for each action permissible in the given state.
  - getQGradient
```
public QGradientTuple getQGradient(State s,
                          GroundedAction a)
```
    Description copied from interface: QGradientPlanner
    
    Returns the Q-value gradient (QGradientTuple) for the given state and action.
    
    Specified by:
    
    getQGradient in interface QGradientPlanner
    
    Parameters:
    s - the state for which the Q-value gradient is to be returned.
    a - the action for which the Q-value gradient is to be returned.
    
    Returns:
    the Q-value gradient for the given state and action.
  - setBoltzmannBetaParameter
```
public void setBoltzmannBetaParameter(double beta)
```
    Description copied from interface: QGradientPlanner
    
    Sets this planner's Boltzmann beta parameter used to compute gradients. As beta gets larger, the policy becomes more deterministic.
    
    Specified by:
    
    setBoltzmannBetaParameter in interface QGradientPlanner
    
    Parameters:
    beta - the value to which this planner's Boltzmann beta parameter will be set.
  - computeQGradient
```
protected double[] computeQGradient(State s,
                        GroundedAction ga)
```
    Computes the Q-value gradient for the given State and GroundedAction.
    
    Parameters:
    s - the state
    ga - the grounded action.
    
    Returns:
    the Q-value gradient that was computed.
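
    For intuition, a Q-value gradient in this setting can be sketched as the expected reward gradient plus the discounted expected gradient of the next state's value. The sketch below assumes transition probabilities that do not depend on the reward parameters and ignores terminal-state handling. The class `QGradientSketch` and its parameter layout are hypothetical and do not mirror BURLAP's types.

```java
// Illustrative sketch only; names are hypothetical and not part of BURLAP.
final class QGradientSketch {

    /**
     * grad Q(s,a) = sum_j T(s_j | s,a) * (grad R(s,a,s_j) + gamma * grad V(s_j)).
     *
     * @param transitionProbs probability of each successor state s_j
     * @param rewardGrads     rewardGrads[j][k]: reward gradient for successor j, parameter k
     * @param nextValueGrads  nextValueGrads[j][k]: value gradient of successor j, parameter k
     * @param gamma           discount factor
     */
    static double[] qGradient(double[] transitionProbs, double[][] rewardGrads,
                              double[][] nextValueGrads, double gamma) {
        double[] grad = new double[rewardGrads[0].length];
        for (int j = 0; j < transitionProbs.length; j++) {
            for (int k = 0; k < grad.length; k++) {
                grad[k] += transitionProbs[j] * (rewardGrads[j][k] + gamma * nextValueGrads[j][k]);
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] probs = {0.25, 0.75};
        double[][] rewardGrads = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] nextValueGrads = {{2.0, 2.0}, {4.0, 0.0}};
        System.out.println(java.util.Arrays.toString(qGradient(probs, rewardGrads, nextValueGrads, 0.5)));
    }
}
```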
