DifferentiableSparseSampling

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
- - burlap.behavior.singleagent.learnfromdemo.mlirl.differentiableplanners.DifferentiableSparseSampling

All Implemented Interfaces:

QGradientPlanner, MDPSolverInterface, Planner, QFunction, ValueFunction
```
public class DifferentiableSparseSampling
extends MDPSolver
implements QGradientPlanner, Planner
```
A differentiable finite-horizon valueFunction that can also use sparse sampling over the transition dynamics when the set of possible transitions is very large or infinite. This valueFunction can be used to perform Receding Horizon Inverse Reinforcement Learning [1] with BURLAP's implementation of maximum likelihood inverse reinforcement learning (MLIRL) [2]. Additionally, the value of the leaf nodes of this valueFunction may be parametrized using a DifferentiableVInit object and learned with MLIRL, which cleanly separates shaping features/rewards from the learned (or known) reward function.

1. MacGlashan, J. and Littman, M., "Between Imitation and Intention Learning," Proceedings of IJCAI 15, 2015.
2. Babes, M., Marivate, V., Subramanian, K., and Littman, M., "Apprenticeship learning about multiple intentions," Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011.
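To make the class description more concrete, here is a minimal standalone sketch of finite-horizon sparse sampling with a Boltzmann-weighted backup. It does not use BURLAP: `ToyModel`, `ChainModel`, and `SparseSamplingSketch` are hypothetical names introduced for illustration, leaf values are assumed to be zero, terminal states are ignored, and the backup form V(s) = sum_a pi(a|s) Q(s,a) with pi(a|s) proportional to exp(beta * Q(s,a)) is an assumption about the differentiable Bellman equation rather than a copy of the BURLAP source.

```java
import java.util.Random;

/** Hypothetical generative model of a small MDP (not a BURLAP type). */
interface ToyModel {
    int numActions();
    int sampleNextState(int state, int action, Random rng);
    double reward(int state, int action, int nextState);
}

/** Toy chain: action 0 stays in place for reward 0, action 1 advances for reward 1. */
final class ChainModel implements ToyModel {
    public int numActions() { return 2; }
    public int sampleNextState(int state, int action, Random rng) { return state + action; }
    public double reward(int state, int action, int nextState) { return action; }
}

final class SparseSamplingSketch {

    /** Boltzmann-weighted average of Q-values: sum_a pi(a) * q[a], pi(a) proportional to exp(beta * q[a]). */
    static double boltzmannValue(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) max = Math.max(max, v);
        double norm = 0.0;
        double weighted = 0.0;
        for (double v : q) {
            double w = Math.exp(beta * (v - max)); // shift by the max for numerical stability
            norm += w;
            weighted += w * v;
        }
        return weighted / norm;
    }

    /** Estimates the value of a state that sits `height` steps above the leaves, using c samples per action. */
    static double value(ToyModel m, int state, int height, int c,
                        double gamma, double beta, Random rng) {
        if (height == 0) return 0.0; // assumed leaf value of zero
        double[] q = new double[m.numActions()];
        for (int a = 0; a < q.length; a++) {
            double sum = 0.0;
            for (int i = 0; i < c; i++) {
                int next = m.sampleNextState(state, a, rng);
                sum += m.reward(state, a, next)
                        + gamma * value(m, next, height - 1, c, gamma, beta, rng);
            }
            q[a] = sum / c; // Monte Carlo estimate of Q(state, a)
        }
        return boltzmannValue(q, beta);
    }

    public static void main(String[] args) {
        // Height 3, two samples per action, undiscounted, fairly sharp softmax.
        double v = value(new ChainModel(), 0, 3, 2, 1.0, 50.0, new Random(0));
        System.out.println("Estimated root value: " + v);
    }
}
```

The cost of this recursion grows as (numActions * c)^height, which is why the real class exposes both the tree height and the sample count as tunable parameters.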

Author:

James MacGlashan.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`class`	`DifferentiableSparseSampling.DiffStateNode` A class for value differentiable state nodes.
`protected static class`	`DifferentiableSparseSampling.QAndQGradient` A tuple for storing Q-values and their gradients.
`protected static class`	`DifferentiableSparseSampling.VAndVGradient` A tuple for storing a state value and its gradient.

Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QFunction
QFunction.QFunctionHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`boltzBeta` The Boltzmann beta parameter that defines the differentiable Bellman equation.
`protected int`	`c` The number of transition dynamics samples (for the root if depth-variable C is used)
`protected boolean`	`forgetPreviousPlanResults` Whether previous planning results should be forgotten or reused; default is reused (false).
`protected int`	`h` The height of the tree
`protected java.util.Map<SparseSampling.HashedHeightState,DifferentiableSparseSampling.DiffStateNode>`	`nodesByHeight` The tree nodes indexed by state and height.
`protected int`	`numUpdates` The total number of pseudo-Bellman updates
`protected int`	`rfDim` The dimensionality of the differentiable reward function
`protected java.util.Map<HashableState,DifferentiableSparseSampling.QAndQGradient>`	`rootLevelQValues` The root state node Q-values that have been estimated by previous planning calls.
`protected boolean`	`useVariableC` Whether the number of transition dynamics samples should scale with the depth of the node.
`protected DifferentiableVInit`	`vinit` The state value used for leaf nodes; default is zero.

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`DifferentiableSparseSampling(Domain domain, DifferentiableRF rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, int h, int c, double boltzBeta)` Initializes.

Method Summary

Methods
Modifier and Type	Method and Description
`protected java.util.Set<java.lang.Integer>`	`combinedNonZeroPDParameters(FunctionGradient... gradients)`
`java.util.List<QGradientTuple>`	`getAllQGradients(State s)` Returns the list of Q-value gradients (returned as `QGradientTuple` objects) for each action permissible in the given state.
`int`	`getC()` Returns the number of state transition samples
`protected int`	`getCAtHeight(int height)` Returns the value of C for a node at the given height (height from a leaf node).
`int`	`getDebugCode()` Returns the debug code used for logging plan results with `DPrint`.
`int`	`getH()` Returns the height of the tree
`int`	`getNumberOfValueEsitmates()` Returns the total number of state value estimates performed since the `resetSolver()` call.
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`QGradientTuple`	`getQGradient(State s, GroundedAction a)` Returns the Q-value gradient (`QGradientTuple`) for the given state and action.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`protected DifferentiableSparseSampling.DiffStateNode`	`getStateNode(State s, int height)` Either returns, or creates, indexes, and returns, the state node for the given state at the given height in the tree
`BoltzmannQPolicy`	`planFromState(State initialState)` Plans from the input state and returns a `BoltzmannQPolicy` that uses the same Boltzmann parameter as the Boltzmann value backups in this planner.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`setBoltzmannBetaParameter(double beta)` Sets this valueFunction's Boltzmann beta parameter used to compute gradients.
`void`	`setC(int c)` Sets the number of state transition samples used.
`void`	`setDebugCode(int debugCode)` Sets the debug code used for logging plan results with `DPrint`.
`void`	`setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)` Sets whether previous planning results should be forgotten or reused in subsequent planning.
`void`	`setH(int h)` Sets the height of the tree.
`void`	`setUseVariableCSize(boolean useVariableC)` Sets whether the number of state transition samples (C) should be variable with respect to the depth of the node.
`void`	`setValueForLeafNodes(ValueFunctionInitialization vinit)` Sets the `ValueFunctionInitialization` object to use for setting the value of leaf nodes.
`double`	`value(State s)` Returns the value function evaluation of the given state.

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addNonDomainReferencedAction, getActions, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

- Field Detail
  - h
```
protected int h
```
    The height of the tree
  - c
```
protected int c
```
    The number of transition dynamics samples (for the root if depth-variable C is used)
  - useVariableC
```
protected boolean useVariableC
```
    Whether the number of transition dynamics samples should scale with the depth of the node. Default is false.
  - forgetPreviousPlanResults
```
protected boolean forgetPreviousPlanResults
```
    Whether previous planning results should be forgotten or reused; default is reused (false).
  - vinit
```
protected DifferentiableVInit vinit
```
    The state value used for leaf nodes; default is zero.
  - nodesByHeight
```
protected java.util.Map<SparseSampling.HashedHeightState,DifferentiableSparseSampling.DiffStateNode> nodesByHeight
```
    The tree nodes indexed by state and height.
  - rootLevelQValues
```
protected java.util.Map<HashableState,DifferentiableSparseSampling.QAndQGradient> rootLevelQValues
```
    The root state node Q-values that have been estimated by previous planning calls.
  - boltzBeta
```
protected double boltzBeta
```
    The Boltzmann beta parameter that defines the differentiable Bellman equation. The larger the value, the more deterministic the soft max operator is.
  - rfDim
```
protected int rfDim
```
    The dimensionality of the differentiable reward function
  - numUpdates
```
protected int numUpdates
```
    The total number of pseudo-Bellman updates
- Constructor Detail
  - DifferentiableSparseSampling
```
public DifferentiableSparseSampling(Domain domain,
                            DifferentiableRF rf,
                            TerminalFunction tf,
                            double gamma,
                            HashableStateFactory hashingFactory,
                            int h,
                            int c,
                            double boltzBeta)
```
    Initializes.
    
    Parameters:
    domain - the problem domain
    rf - the differentiable reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the hashing factory used to compare state equality
    h - the planning horizon
    c - how many samples from the transition dynamics to use. Set to -1 to use the full (unsampled) transition dynamics.
    boltzBeta - the Boltzmann beta parameter for the differentiable Boltzmann (softmax) backup equation. The larger the value, the more deterministic the backup; the closer to 1, the softer.
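    As an illustration of how the boltzBeta parameter shapes action selection, the sketch below computes a Boltzmann (softmax) distribution over a small array of Q-values. It is standalone Java and does not call BURLAP; `BoltzmannSketch` and `probabilities` are hypothetical names, and the form exp(beta * Q) / sum exp(beta * Q) is assumed rather than taken from the BURLAP source.

```java
import java.util.Arrays;

final class BoltzmannSketch {

    /** Returns pi(a) = exp(beta * q[a]) / sum_b exp(beta * q[b]). */
    static double[] probabilities(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) max = Math.max(max, v);
        double[] p = new double[q.length];
        double norm = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * (q[i] - max)); // shift by the max for numerical stability
            norm += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= norm;
        return p;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        for (double beta : new double[]{0.5, 1.0, 10.0}) {
            System.out.println("beta=" + beta + " -> " + Arrays.toString(probabilities(q, beta)));
        }
    }
}
```

    Running main prints one distribution per beta value; larger beta values concentrate more probability on the action with the highest Q-value.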
- Method Detail
  - setUseVariableCSize
```
public void setUseVariableCSize(boolean useVariableC)
```
    Sets whether the number of state transition samples (C) should be variable with respect to the depth of the node. If set to true, then the samples will be defined using C_i = C_0 * gamma^(2i), where i is the depth of the node from the root, gamma is the discount factor and C_0 is the normal C value set for this object.
    
    Parameters:
    useVariableC - if true, then depth-variable C will be used; if false, all state nodes use the same number of samples.
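    The schedule C_i = C_0 * gamma^(2i) can be worked through with a few lines of standalone Java. This is a sketch, not the BURLAP implementation: `VariableCSketch` and its method names are hypothetical, truncating to an int and keeping at least one sample are assumptions, and the sketch expects a positive C_0 (it does not handle the -1 "full dynamics" setting).

```java
final class VariableCSketch {

    /** C_i = C_0 * gamma^(2i), where i is the depth of the node from the root. */
    static int samplesAtDepth(int c0, double gamma, int depth) {
        int ci = (int) (c0 * Math.pow(gamma, 2.0 * depth)); // assumption: truncate toward zero
        return Math.max(1, ci);                             // assumption: never drop below one sample
    }

    /** Converts a height (distance from the leaves) into a depth for a tree of the given total height. */
    static int samplesAtHeight(int c0, double gamma, int treeHeight, int height) {
        return samplesAtDepth(c0, gamma, treeHeight - height);
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 3; depth++) {
            System.out.println("depth " + depth + ": " + samplesAtDepth(20, 0.9, depth));
        }
    }
}
```

    Because gamma is at most 1, the sample count shrinks with depth, so deeper nodes, whose contribution to the root value is discounted more heavily, receive fewer samples.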
  - setC
```
public void setC(int c)
```
    Sets the number of state transition samples used.
    
    Parameters:
    c - the number of state transition samples used.
  - setH
```
public void setH(int h)
```
    Sets the height of the tree.
    
    Parameters:
    h - the height of the tree.
  - getC
```
public int getC()
```
    Returns the number of state transition samples
    
    Returns:
    the number of state transition samples
  - getH
```
public int getH()
```
    Returns the height of the tree
    
    Returns:
    the height of the tree
  - setForgetPreviousPlanResults
```
public void setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)
```
    Sets whether previous planning results should be forgotten or reused in subsequent planning. Forgetting results is more memory efficient, but less CPU efficient.
    
    Parameters:
    forgetPreviousPlanResults - if true, then previous planning results will be forgotten; if false, they will be remembered and reused in subsequent planning.
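    The memory/CPU trade-off described above can be sketched with a small cache. This is an illustration only: `PlanCacheSketch` is a hypothetical class, and clearing the cache at the start of each planning call is an assumption about when results are discarded, not a description of BURLAP's internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class PlanCacheSketch<S, R> {
    private final Map<S, R> cache = new HashMap<>();
    private final Function<S, R> planner;
    private boolean forgetPreviousPlanResults = false; // default: reuse earlier results
    int planCalls = 0;                                 // counts how often real planning ran

    PlanCacheSketch(Function<S, R> planner) {
        this.planner = planner;
    }

    void setForgetPreviousPlanResults(boolean forget) {
        this.forgetPreviousPlanResults = forget;
    }

    R plan(S state) {
        if (forgetPreviousPlanResults) {
            cache.clear(); // bounded memory, but every call replans
        }
        R cached = cache.get(state);
        if (cached != null) {
            return cached; // reuse: no extra computation
        }
        planCalls++;
        R result = planner.apply(state);
        cache.put(state, result);
        return result;
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

    With the flag left at false, repeated calls for the same state run the planner once; with it set to true, every call replans and the cache never holds more than the latest result.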
  - setValueForLeafNodes
```
public void setValueForLeafNodes(ValueFunctionInitialization vinit)
```
    Sets the ValueFunctionInitialization object to use for setting the value of leaf nodes.
    
    Parameters:
    vinit - the ValueFunctionInitialization object to use for setting the value of leaf nodes.
  - getDebugCode
```
public int getDebugCode()
```
    Returns the debug code used for logging plan results with DPrint.
    
    Specified by:
    
    getDebugCode in interface MDPSolverInterface
    
    Overrides:
    
    getDebugCode in class MDPSolver
    
    Returns:
    the debug code used for logging plan results with DPrint.
  - setDebugCode
```
public void setDebugCode(int debugCode)
```
    Sets the debug code used for logging plan results with DPrint.
    
    Specified by:
    
    setDebugCode in interface MDPSolverInterface
    
    Overrides:
    
    setDebugCode in class MDPSolver
    
    Parameters:
    debugCode - the debugCode to use.
  - getNumberOfValueEsitmates
```
public int getNumberOfValueEsitmates()
```
    Returns the total number of state value estimates performed since the resetSolver() call.
    
    Returns:
    the total number of state value estimates performed since the resetSolver() call.
  - setBoltzmannBetaParameter
```
public void setBoltzmannBetaParameter(double beta)
```
    Description copied from interface: QGradientPlanner
    
    Sets this valueFunction's Boltzmann beta parameter used to compute gradients. As beta gets larger, the policy becomes more deterministic.
    
    Specified by:
    
    setBoltzmannBetaParameter in interface QGradientPlanner
    
    Parameters:
    beta - the value to which this valueFunction's Boltzmann beta parameter will be set
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QFunction
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QFunction
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QFunction
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QFunction
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
  - value
```
public double value(State s)
```
    Description copied from interface: ValueFunction
    
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    s - the state to evaluate.
    
    Returns:
    the value function evaluation of the given state.
  - getAllQGradients
```
public java.util.List<QGradientTuple> getAllQGradients(State s)
```
    Description copied from interface: QGradientPlanner
    
    Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
    
    Specified by:
    
    getAllQGradients in interface QGradientPlanner
    
    Parameters:
    s - the state for which Q-value gradients are to be returned.
    
    Returns:
    the list of Q-value gradients for each action permissible in the given state.
  - getQGradient
```
public QGradientTuple getQGradient(State s,
                          GroundedAction a)
```
    Description copied from interface: QGradientPlanner
    
    Returns the Q-value gradient (QGradientTuple) for the given state and action.
    
    Specified by:
    
    getQGradient in interface QGradientPlanner
    
    Parameters:
    s - the state for which the Q-value gradient is to be returned
    a - the action for which the Q-value gradient is to be returned.
    
    Returns:
    the Q-value gradient for the given state and action.
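    To show how Q-value gradients can feed a differentiable backup, the sketch below propagates per-action Q-gradients through a Boltzmann-weighted state value. It is standalone and hypothetical: `BoltzmannGradientSketch` is not a BURLAP class, gradients are plain double arrays rather than QGradientTuple objects, and the value form V = sum_a pi(a) Q(a) with pi(a) proportional to exp(beta * Q(a)) is an assumption.

```java
final class BoltzmannGradientSketch {

    /** pi(a) = exp(beta * q[a]) / sum_b exp(beta * q[b]). */
    static double[] probabilities(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) max = Math.max(max, v);
        double[] p = new double[q.length];
        double norm = 0.0;
        for (int a = 0; a < q.length; a++) {
            p[a] = Math.exp(beta * (q[a] - max));
            norm += p[a];
        }
        for (int a = 0; a < p.length; a++) p[a] /= norm;
        return p;
    }

    /** V = sum_a pi(a) * q[a]. */
    static double value(double[] q, double beta) {
        double[] p = probabilities(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) v += p[a] * q[a];
        return v;
    }

    /**
     * Given qGrad[a][k] = dQ(a)/dtheta_k, returns dV/dtheta_k using
     * dpi(a)/dtheta_k = beta * pi(a) * (qGrad[a][k] - sum_b pi(b) * qGrad[b][k]).
     */
    static double[] valueGradient(double[] q, double[][] qGrad, double beta) {
        double[] p = probabilities(q, beta);
        int dim = qGrad[0].length;
        double[] expectedQGrad = new double[dim];
        for (int a = 0; a < q.length; a++) {
            for (int k = 0; k < dim; k++) expectedQGrad[k] += p[a] * qGrad[a][k];
        }
        double[] grad = new double[dim];
        for (int a = 0; a < q.length; a++) {
            for (int k = 0; k < dim; k++) {
                double dPi = beta * p[a] * (qGrad[a][k] - expectedQGrad[k]);
                grad[k] += p[a] * qGrad[a][k] + q[a] * dPi; // product rule on pi(a) * Q(a)
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] q = {0.3, -0.2, 0.1};
        double[][] qGrad = {{1, 0}, {0, 1}, {1, 1}}; // e.g. features of a linear reward
        System.out.println(java.util.Arrays.toString(valueGradient(q, qGrad, 2.0)));
    }
}
```

    A finite-difference check against `value` is a convenient way to validate this kind of gradient code before plugging it into a learner.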
  - planFromState
```
public BoltzmannQPolicy planFromState(State initialState)
```
    Plans from the input state and returns a BoltzmannQPolicy that uses the same Boltzmann parameter as the Boltzmann value backups in this planner.
    
    Specified by:
    
    planFromState in interface Planner
    
    Parameters:
    initialState - the initial state of the planning problem
    
    Returns:
    a BoltzmannQPolicy
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Specified by:
    
    resetSolver in class MDPSolver
  - getCAtHeight
```
protected int getCAtHeight(int height)
```
    Returns the value of C for a node at the given height (height from a leaf node).
    
    Parameters:
    height - the height from a leaf node.
    
    Returns:
    the value of C to use.
  - getStateNode
```
protected DifferentiableSparseSampling.DiffStateNode getStateNode(State s,
                                                      int height)
```
    Either returns, or creates, indexes, and returns, the state node for the given state at the given height in the tree
    
    Parameters:
    s - the state
    height - the height (distance from leaf node) of the node.
    
    Returns:
    the state node for the given state at the given height in the tree
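    The "return, or create, index, and return" behaviour is a get-or-create lookup keyed by state and height. The sketch below shows one way to write that pattern in standalone Java; `NodeIndexSketch`, `HeightKey`, and `Node` are hypothetical stand-ins for the real map and node types, and states are represented as plain objects with their own equals/hashCode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class NodeIndexSketch {

    /** Hypothetical key pairing a state with its height in the tree. */
    static final class HeightKey {
        final Object state;
        final int height;

        HeightKey(Object state, int height) {
            this.state = state;
            this.height = height;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HeightKey)) return false;
            HeightKey other = (HeightKey) o;
            return height == other.height && Objects.equals(state, other.state);
        }

        @Override
        public int hashCode() {
            return Objects.hash(state, height);
        }
    }

    /** Hypothetical tree node; a real node would also hold values and gradients. */
    static final class Node {
        final Object state;
        final int height;

        Node(Object state, int height) {
            this.state = state;
            this.height = height;
        }
    }

    private final Map<HeightKey, Node> nodesByHeight = new HashMap<>();

    /** Returns the indexed node if present; otherwise creates, indexes, and returns a new one. */
    Node getStateNode(Object state, int height) {
        return nodesByHeight.computeIfAbsent(new HeightKey(state, height),
                key -> new Node(state, height));
    }

    int size() {
        return nodesByHeight.size();
    }
}
```

    Including the height in the key matters because the same state can appear at several levels of a finite-horizon tree and has a different value at each one.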
  - combinedNonZeroPDParameters
```
protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
```
