DifferentiableDP

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableDP

All Implemented Interfaces:

DifferentiableQFunction, DifferentiableValueFunction, MDPSolverInterface, QFunction, QProvider, ValueFunction

Direct Known Subclasses:

DifferentiableVI
```
public abstract class DifferentiableDP
extends DynamicProgramming
implements DifferentiableQFunction, DifferentiableValueFunction
```
A class for performing dynamic programming with a differentiable value backup operator. All subclasses are assumed to use a Boltzmann backup operator, and the reward function must be differentiable, which means it must be supplied as a DifferentiableRF. The normal DynamicProgramming.performBellmanUpdateOn(burlap.statehashing.HashableState) method is overridden with a method that uses the Boltzmann backup operator.
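As a rough illustration of what a Boltzmann backup computes, the sketch below assumes the backup is the expected Q-value under a Boltzmann (softmax) distribution over actions with temperature parameter beta. The class and method names are hypothetical and are not part of BURLAP; the actual behavior is determined by the configured DifferentiableDPOperator.

```java
// Hypothetical stand-alone sketch; not BURLAP code.
final class BoltzmannBackupSketch {

    // Boltzmann action probabilities: pi(a) proportional to exp(beta * Q(a)).
    static double[] boltzmannProbs(double[] q, double beta) {
        // Shifting by the max leaves the probabilities unchanged and
        // avoids overflow in exp for non-negative beta.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] pi = new double[q.length];
        double z = 0.0;
        for (int a = 0; a < q.length; a++) {
            pi[a] = Math.exp(beta * (q[a] - max));
            z += pi[a];
        }
        for (int a = 0; a < q.length; a++) {
            pi[a] /= z;
        }
        return pi;
    }

    // Assumed backup: V(s) = sum_a pi(a) * Q(s, a).
    static double backup(double[] q, double beta) {
        double[] pi = boltzmannProbs(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) {
            v += pi[a] * q[a];
        }
        return v;
    }
}
```

Under this assumption, beta = 0 yields the plain average of the Q-values, and large beta approaches the max used by the standard Bellman backup while remaining differentiable.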

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected DifferentiableRF`	`rf` The differentiable reward function (RF).
`protected java.util.Map<HashableState,FunctionGradient>`	`valueGradient` The value function gradient for each state.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`DifferentiableDP()`

Method Summary

Modifier and Type	Method and Description
`protected java.util.Set<java.lang.Integer>`	`combinedNonZeroPDParameters(FunctionGradient... gradients)`
`protected FunctionGradient`	`computeQGradient(State s, Action ga)` Computes the Q-value gradient for the given `State` and `Action`.
`void`	`DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory)` Common init method for `DynamicProgramming` instances.
`DifferentiableDPOperator`	`getOperator()` Returns the dynamic programming operator used
`protected FunctionGradient`	`performDPValueGradientUpdateOn(HashableState sh)` Performs the Boltzmann value function gradient backup for the given `HashableState`.
`FunctionGradient`	`qGradient(State s, Action a)` Returns the Q-value gradient (`QGradientTuple`) for the given state and action.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`setOperator(DPOperator operator)` Sets the dynamic programming operator to use.
`FunctionGradient`	`valueGradient(State s)` Returns the gradient of this value function

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.valuefunction.QFunction
qValue

Methods inherited from interface burlap.behavior.valuefunction.ValueFunction
value

- Field Detail
  - valueGradient
```
protected java.util.Map<HashableState,FunctionGradient> valueGradient
```
    The value function gradient for each state.
  - rf
```
protected DifferentiableRF rf
```
    The differentiable reward function (RF).
- Constructor Detail
  - DifferentiableDP
```
public DifferentiableDP()
```
- Method Detail
  - DPPInit
```
public void DPPInit(SADomain domain,
                    double gamma,
                    HashableStateFactory hashingFactory)
```
    Description copied from class: DynamicProgramming
    
    Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(SADomain, double, HashableStateFactory) method.
    
    Overrides:
    
    DPPInit in class DynamicProgramming
    
    Parameters:
    
    domain - the domain in which to plan
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Overrides:
    
    resetSolver in class DynamicProgramming
  - setOperator
```
public void setOperator(DPOperator operator)
```
    Description copied from class: DynamicProgramming
    
    Sets the dynamic programming operator to use. Note that the default setting is BellmanOperator (max).
    
    Overrides:
    
    setOperator in class DynamicProgramming
    
    Parameters:
    
    operator - the dynamic programming operator to use.
  - getOperator
```
public DifferentiableDPOperator getOperator()
```
    Description copied from class: DynamicProgramming
    
    Returns the dynamic programming operator used
    
    Overrides:
    
    getOperator in class DynamicProgramming
    
    Returns:
    
    the dynamic programming operator used
  - performDPValueGradientUpdateOn
```
protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
```
    Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this valueFunction's internal map.
    
    Parameters:
    
    sh - the hashed state on which to perform the Boltzmann gradient update.
    
    Returns:
    
    the gradient.
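
    The gradient backup described above can be sketched outside BURLAP as follows. This is a minimal illustration, assuming the backed-up value is the Boltzmann-weighted expectation of the Q-values and that each Q-value gradient is given as a dense array over reward parameters. The class name, method name, and array layout are hypothetical; BURLAP itself represents gradients with FunctionGradient.

```java
// Hypothetical stand-alone sketch; not BURLAP code.
final class BoltzmannValueGradientSketch {

    // q[a]     : Q-value of action a
    // dq[a][k] : partial derivative of Q(s, a) w.r.t. parameter k
    // Returns the gradient of V = sum_a pi(a) * Q(a), where
    // pi(a) is proportional to exp(beta * Q(a)).
    static double[] valueGradient(double[] q, double[][] dq, double beta) {
        int nActions = q.length;
        int nParams = dq[0].length;

        // Boltzmann probabilities (shifted by the max for stability).
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] pi = new double[nActions];
        double z = 0.0;
        for (int a = 0; a < nActions; a++) {
            pi[a] = Math.exp(beta * (q[a] - max));
            z += pi[a];
        }
        for (int a = 0; a < nActions; a++) {
            pi[a] /= z;
        }

        // Expected Q-gradient under pi: sum_b pi(b) * dQ(b)/dtheta_k.
        double[] meanDq = new double[nParams];
        for (int a = 0; a < nActions; a++) {
            for (int k = 0; k < nParams; k++) {
                meanDq[k] += pi[a] * dq[a][k];
            }
        }

        // Product rule: dV = sum_a (dpi(a) * Q(a) + pi(a) * dQ(a)),
        // with dpi(a) = beta * pi(a) * (dQ(a) - meanDq).
        double[] grad = new double[nParams];
        for (int a = 0; a < nActions; a++) {
            for (int k = 0; k < nParams; k++) {
                double dPi = beta * pi[a] * (dq[a][k] - meanDq[k]);
                grad[k] += dPi * q[a] + pi[a] * dq[a][k];
            }
        }
        return grad;
    }
}
```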
  - valueGradient
```
public FunctionGradient valueGradient(State s)
```
    Description copied from interface: DifferentiableValueFunction
    
    Returns the gradient of this value function
    
    Specified by:
    
    valueGradient in interface DifferentiableValueFunction
    
    Parameters:
    
    s - the state on which the function is to be evaluated
    
    Returns:
    
    the gradient of this value function
  - qGradient
```
public FunctionGradient qGradient(State s,
                                  Action a)
```
    Description copied from interface: DifferentiableQFunction
    
    Returns the Q-value gradient (QGradientTuple) for the given state and action.
    
    Specified by:
    
    qGradient in interface DifferentiableQFunction
    
    Parameters:
    
    s - the state for which the Q-value gradient is to be returned
    
    a - the action for which the Q-value gradient is to be returned.
    
    Returns:
    
    the Q-value gradient for the given state and action.
  - computeQGradient
```
protected FunctionGradient computeQGradient(State s,
                                            Action ga)
```
    Computes the Q-value gradient for the given State and Action.
    
    Parameters:
    
    s - the state
    
    ga - the grounded action.
    
    Returns:
    
    the Q-value gradient that was computed.
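
    One plausible way to compute such a Q-value gradient is to differentiate the Bellman expectation term by term, combining the reward gradient with the discounted gradient of the next state's value over all outcomes. The sketch below assumes that form and uses dense arrays in place of FunctionGradient; all names in it are hypothetical and it is not taken from the BURLAP source.

```java
// Hypothetical stand-alone sketch; not BURLAP code.
final class QGradientSketch {

    // probs[i]             : probability of outcome i for (s, a)
    // rewardGrads[i][k]    : dR(s, a, s'_i) / dtheta_k
    // nextValueGrads[i][k] : dV(s'_i) / dtheta_k
    // Returns dQ(s, a)/dtheta_k = sum_i probs[i] *
    //         (rewardGrads[i][k] + gamma * nextValueGrads[i][k]).
    static double[] qGradient(double[] probs,
                              double[][] rewardGrads,
                              double[][] nextValueGrads,
                              double gamma) {
        int nParams = rewardGrads[0].length;
        double[] grad = new double[nParams];
        for (int i = 0; i < probs.length; i++) {
            for (int k = 0; k < nParams; k++) {
                grad[k] += probs[i]
                        * (rewardGrads[i][k] + gamma * nextValueGrads[i][k]);
            }
        }
        return grad;
    }
}
```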
  - combinedNonZeroPDParameters
```
protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
```
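
    This method has no description here. Judging only from its name and signature, it plausibly returns the union of the parameter indices that have a non-zero partial derivative in any of the given gradients. The sketch below illustrates that reading with sparse gradients modeled as maps from parameter index to partial derivative; the class, the method name, and the map representation are assumptions for illustration and do not use FunctionGradient.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; not BURLAP code.
final class NonZeroParamsSketch {

    // Each gradient maps a parameter index to its partial derivative.
    // Returns every index whose partial derivative is non-zero in at
    // least one of the gradients.
    @SafeVarargs
    static Set<Integer> combinedNonZero(Map<Integer, Double>... gradients) {
        Set<Integer> ids = new TreeSet<>();
        for (Map<Integer, Double> g : gradients) {
            for (Map.Entry<Integer, Double> e : g.entrySet()) {
                if (e.getValue() != 0.0) {
                    ids.add(e.getKey());
                }
            }
        }
        return ids;
    }
}
```

    Collecting the indices first lets a caller iterate only over parameters that can contribute to a combined gradient, which matters when the reward function has many parameters but each gradient is sparse.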
