DifferentiableDP

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableDP

All Implemented Interfaces:

QGradientPlanner, MDPSolverInterface, QFunction, ValueFunction

Direct Known Subclasses:

DifferentiableVI
```
public abstract class DifferentiableDP
extends DynamicProgramming
implements QGradientPlanner
```
A class for performing dynamic programming with a differentiable value backup operator. All subclasses are assumed to use a Boltzmann backup operator, and the reward function must be differentiable, which is achieved by subclassing the DifferentiableRF class. The standard performBellmanUpdateOn(burlap.oomdp.statehashing.HashableState) method of the DynamicProgramming class is overridden with a method that uses the Boltzmann backup operator.
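
As a rough illustration of what a Boltzmann backup computes, the sketch below assumes the common form in which the state value is the expectation of the Q-values under a Boltzmann (softmax) distribution with inverse temperature beta. The class and method names (`BoltzmannBackupSketch`, `boltzmannPolicy`, `boltzmannBackup`) are hypothetical and are not part of BURLAP; the implementation in this class may differ in detail.

```java
// Hypothetical, self-contained sketch; not BURLAP code.
final class BoltzmannBackupSketch {

    /** Boltzmann action probabilities: p[i] is proportional to exp(beta * q[i]). */
    static double[] boltzmannPolicy(double[] q, double beta) {
        // Subtract the largest exponent before exponentiating to avoid overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, beta * v);
        }
        double[] p = new double[q.length];
        double z = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * q[i] - max);
            z += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= z;
        }
        return p;
    }

    /** Assumed Boltzmann backup: V(s) = sum_a p(a|s) * Q(s,a). */
    static double boltzmannBackup(double[] q, double beta) {
        double[] p = boltzmannPolicy(q, beta);
        double v = 0.0;
        for (int i = 0; i < q.length; i++) {
            v += p[i] * q[i];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        // beta = 0 weights all actions equally, so the backup is the mean Q-value.
        System.out.println(boltzmannBackup(q, 0.0));
        // A large beta puts almost all weight on the best action (the max Q-value).
        System.out.println(boltzmannBackup(q, 50.0));
    }
}
```

Unlike the max in a Bellman backup, this weighted sum is smooth in the Q-values, which is what makes a value gradient well defined.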

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
  DynamicProgramming.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QFunction
  QFunction.QFunctionHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`boltzBeta` The Boltzmann backup operator beta parameter.
`protected java.util.Map<HashableState,FunctionGradient>`	`valueGradient` The value function gradient for each state.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`DifferentiableDP()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected java.util.Set<java.lang.Integer>`	`combinedNonZeroPDParameters(FunctionGradient... gradients)`
`protected FunctionGradient`	`computeQGradient(State s, GroundedAction ga)` Computes the Q-value gradient for the given `State` and `GroundedAction`.
`java.util.List<QGradientTuple>`	`getAllQGradients(State s)` Returns the list of Q-value gradients (returned as `QGradientTuple` objects) for each action permissible in the given state.
`QGradientTuple`	`getQGradient(State s, GroundedAction a)` Returns the Q-value gradient (`QGradientTuple`) for the given state and action.
`FunctionGradient`	`getValueGradient(State s)` Returns the value function gradient for the given `State`.
`protected double`	`performBellmanUpdateOn(HashableState sh)` Overrides the superclass method to perform a Boltzmann backup operator instead of a Bellman backup operator.
`protected FunctionGradient`	`performDPValueGradientUpdateOn(HashableState sh)` Performs the Boltzmann value function gradient backup for the given `HashableState`.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`setBoltzmannBetaParameter(double beta)` Sets this valueFunction's Boltzmann beta parameter used to compute gradients.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.valuefunction.QFunction
getQ, getQs

Methods inherited from interface burlap.behavior.valuefunction.ValueFunction
value

- Field Detail
  - valueGradient
```
protected java.util.Map<HashableState,FunctionGradient> valueGradient
```
    The value function gradient for each state.
  - boltzBeta
```
protected double boltzBeta
```
    The Boltzmann backup operator beta parameter. The larger the beta, the more deterministic the backup.
- Constructor Detail
  - DifferentiableDP
```
public DifferentiableDP()
```
- Method Detail
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Overrides:
    
    resetSolver in class DynamicProgramming
  - performBellmanUpdateOn
```
protected double performBellmanUpdateOn(HashableState sh)
```
    Overrides the superclass method to perform a Boltzmann backup operator instead of a Bellman backup operator. Results are stored in this valueFunction's internal map.
    
    Overrides:
    
    performBellmanUpdateOn in class DynamicProgramming
    
    Parameters:
    sh - the hashed state on which to perform the Boltzmann update.
    
    Returns:
    the new value
  - performDPValueGradientUpdateOn
```
protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
```
    Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this valueFunction's internal map.
    
    Parameters:
    sh - the hashed state on which to perform the Boltzmann gradient update.
    
    Returns:
    the gradient.
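
For orientation, the gradient backup can be written out under the assumption that the Boltzmann backup takes the common form $V(s) = \sum_a \pi(a \mid s)\, Q(s,a)$ with a softmax policy. The symbols $\theta$ (reward function parameters), $\pi$, and $\beta$ (the `boltzBeta` field) are notation introduced here for illustration, and the exact update used by this method is not confirmed by this page.

```latex
\pi(a \mid s) = \frac{\exp\big(\beta\, Q(s,a)\big)}{\sum_{b} \exp\big(\beta\, Q(s,b)\big)},
\qquad
\nabla_\theta \pi(a \mid s) = \beta\, \pi(a \mid s) \Big( \nabla_\theta Q(s,a) - \sum_{b} \pi(b \mid s)\, \nabla_\theta Q(s,b) \Big)

\nabla_\theta V(s)
  = \sum_{a} \Big( \pi(a \mid s)\, \nabla_\theta Q(s,a) + Q(s,a)\, \nabla_\theta \pi(a \mid s) \Big)
  = \sum_{a} \pi(a \mid s) \Big( 1 + \beta \big( Q(s,a) - V(s) \big) \Big) \nabla_\theta Q(s,a)
```

The second equality follows by substituting the policy gradient and collecting terms, using $\sum_a \pi(a \mid s)\, Q(s,a) = V(s)$.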
  - getValueGradient
```
public FunctionGradient getValueGradient(State s)
```
    Returns the value function gradient for the given State.
    
    Parameters:
    s - the state for which the gradient is to be returned.
    
    Returns:
    the value function gradient for the given State
  - getAllQGradients
```
public java.util.List<QGradientTuple> getAllQGradients(State s)
```
    Description copied from interface: QGradientPlanner
    
    Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
    
    Specified by:
    
    getAllQGradients in interface QGradientPlanner
    
    Parameters:
    s - the state for which Q-value gradients are to be returned.
    
    Returns:
    the list of Q-value gradients for each action permissible in the given state.
  - getQGradient
```
public QGradientTuple getQGradient(State s,
                          GroundedAction a)
```
    Description copied from interface: QGradientPlanner
    
    Returns the Q-value gradient (QGradientTuple) for the given state and action.
    
    Specified by:
    
    getQGradient in interface QGradientPlanner
    
    Parameters:
    s - the state for which the Q-value gradient is to be returned.
    a - the action for which the Q-value gradient is to be returned.
    
    Returns:
    the Q-value gradient for the given state and action.
  - setBoltzmannBetaParameter
```
public void setBoltzmannBetaParameter(double beta)
```
    Description copied from interface: QGradientPlanner
    
    Sets this valueFunction's Boltzmann beta parameter used to compute gradients. As beta gets larger, the policy becomes more deterministic.
    
    Specified by:
    
    setBoltzmannBetaParameter in interface QGradientPlanner
    
    Parameters:
    beta - the value to which this valueFunction's Boltzmann beta parameter will be set
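
To make the effect of beta concrete, the following sketch prints Boltzmann action probabilities for a few beta values. It assumes the usual softmax form with probabilities proportional to exp(beta * Q); the class `BetaEffectSketch` and its method are hypothetical helpers written for this illustration, not BURLAP API.

```java
import java.util.Arrays;

// Hypothetical, self-contained sketch; not BURLAP code.
final class BetaEffectSketch {

    /** Softmax over beta * q, shifted by the largest exponent for numerical stability. */
    static double[] boltzmannPolicy(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, beta * v);
        }
        double[] p = new double[q.length];
        double z = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * q[i] - max);
            z += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= z;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] q = {0.0, 0.5, 1.0}; // illustrative Q-values for three actions
        for (double beta : new double[] {0.1, 1.0, 10.0, 100.0}) {
            // As beta grows, probability mass concentrates on the last (best) action.
            System.out.println("beta=" + beta + " -> " + Arrays.toString(boltzmannPolicy(q, beta)));
        }
    }
}
```

With a small beta the three probabilities are close to uniform; with a large beta nearly all of the mass sits on the action with the highest Q-value.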
  - computeQGradient
```
protected FunctionGradient computeQGradient(State s,
                                GroundedAction ga)
```
    Computes the Q-value gradient for the given State and GroundedAction.
    
    Parameters:
    s - the state
    ga - the grounded action.
    
    Returns:
    the Q-value gradient that was computed.
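
As a reference point, if the Q-value follows the standard definition $Q(s,a) = \sum_{s'} T(s' \mid s,a)\big[ R_\theta(s,a,s') + \gamma V(s') \big]$ and the transition dynamics $T$ do not depend on the reward parameters $\theta$, its gradient is as shown below. This is a sketch under those stated assumptions; the notation $T$, $R_\theta$, and $\theta$ is introduced here, and this page does not confirm the exact computation performed by the method.

```latex
\nabla_\theta Q(s,a) = \sum_{s'} T(s' \mid s,a) \Big( \nabla_\theta R_\theta(s,a,s') + \gamma\, \nabla_\theta V(s') \Big)
```

Here $\gamma$ corresponds to the inherited `gamma` field, and $\nabla_\theta V(s')$ to the per-state entries held in `valueGradient`.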
  - combinedNonZeroPDParameters
```
protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
```
