public abstract class DifferentiableDP extends DynamicProgramming implements DifferentiableQFunction, DifferentiableValueFunction
DifferentiableRF class. The normal DynamicProgramming.performBellmanUpdateOn(burlap.statehashing.HashableState) method of the DynamicProgramming class is overridden with a method that uses the Boltzmann backup operator.
QProvider.Helper
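As a rough illustration of the Boltzmann backup operator named in the class description, the sketch below computes a state value as the Boltzmann-weighted (softmax) expectation of that state's Q-values. It is plain Java with no BURLAP dependency. The class name BoltzmannBackupSketch, the method names, and the beta (inverse temperature) parameter are assumptions made for this example, and the library's own operator may differ in detail.

```java
final class BoltzmannBackupSketch {

    /** Boltzmann (softmax) probabilities over actions; beta is the inverse temperature. */
    static double[] boltzmannProbabilities(double[] qs, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, q);
        }
        double[] probs = new double[qs.length];
        double sum = 0.0;
        for (int a = 0; a < qs.length; a++) {
            // Subtracting the max keeps the exponent from overflowing.
            probs[a] = Math.exp(beta * (qs[a] - max));
            sum += probs[a];
        }
        for (int a = 0; a < qs.length; a++) {
            probs[a] /= sum;
        }
        return probs;
    }

    /** Boltzmann backup: the expectation of Q under the Boltzmann distribution. */
    static double boltzmannBackup(double[] qs, double beta) {
        double[] probs = boltzmannProbabilities(qs, beta);
        double v = 0.0;
        for (int a = 0; a < qs.length; a++) {
            v += probs[a] * qs[a];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] qs = {0.0, 2.0}; // hypothetical Q-values for two actions
        System.out.println(boltzmannBackup(qs, 0.0));  // uniform weighting
        System.out.println(boltzmannBackup(qs, 50.0)); // close to the max backup
    }
}
```

With beta at 0 the backup reduces to the mean of the Q-values, and as beta grows it approaches the max used by the standard Bellman operator. Unlike the max, it is smooth in the Q-values, which is what makes a value gradient well defined.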
Modifier and Type | Field and Description |
---|---|
protected DifferentiableRF | rf: The differentiable RF |
protected java.util.Map<HashableState,FunctionGradient> | valueGradient: The value function gradient for each state. |
Fields inherited from class DynamicProgramming: operator, valueFunction, valueInitializer
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
DifferentiableDP() |
Modifier and Type | Method and Description |
---|---|
protected java.util.Set<java.lang.Integer> | combinedNonZeroPDParameters(FunctionGradient... gradients) |
protected FunctionGradient | computeQGradient(State s, Action ga) |
void | DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory): Common init method for DynamicProgramming instances. |
DifferentiableDPOperator | getOperator(): Returns the dynamic programming operator used. |
protected FunctionGradient | performDPValueGradientUpdateOn(HashableState sh): Performs the Boltzmann value function gradient backup for the given HashableState. |
FunctionGradient | qGradient(State s, Action a): Returns the Q-value gradient (QGradientTuple) for the given state and action. |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
void | setOperator(DPOperator operator): Sets the dynamic programming operator to use. |
FunctionGradient | valueGradient(State s): Returns the gradient of this value function. |
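The Boltzmann value function gradient backup listed for performDPValueGradientUpdateOn can be sketched, under stated assumptions, as follows. If the state value is the Boltzmann-weighted expectation of the Q-values, V = sum_a p_a Q_a with p_a proportional to exp(beta Q_a), then the chain rule gives dV/dtheta = sum_a p_a (1 + beta (Q_a - V)) dQ_a/dtheta. The code uses dense double arrays in place of the library's FunctionGradient type. All names here (BoltzmannValueGradientSketch, valueGradient, beta) are hypothetical, and this is not claimed to be the library's implementation.

```java
final class BoltzmannValueGradientSketch {

    /** Boltzmann (softmax) probabilities over actions; beta is the inverse temperature. */
    static double[] boltzmannProbabilities(double[] qs, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, q);
        }
        double[] probs = new double[qs.length];
        double sum = 0.0;
        for (int a = 0; a < qs.length; a++) {
            probs[a] = Math.exp(beta * (qs[a] - max)); // max-shift for numerical stability
            sum += probs[a];
        }
        for (int a = 0; a < qs.length; a++) {
            probs[a] /= sum;
        }
        return probs;
    }

    /** State value as the Boltzmann-weighted expectation of the Q-values. */
    static double boltzmannValue(double[] qs, double beta) {
        double[] probs = boltzmannProbabilities(qs, beta);
        double v = 0.0;
        for (int a = 0; a < qs.length; a++) {
            v += probs[a] * qs[a];
        }
        return v;
    }

    /**
     * Gradient of the Boltzmann value with respect to the parameters.
     * qGradients[a][p] holds dQ_a/dtheta_p; every row must have the same length.
     */
    static double[] valueGradient(double[] qs, double[][] qGradients, double beta) {
        double[] probs = boltzmannProbabilities(qs, beta);
        double v = 0.0;
        for (int a = 0; a < qs.length; a++) {
            v += probs[a] * qs[a];
        }
        double[] grad = new double[qGradients[0].length];
        for (int a = 0; a < qs.length; a++) {
            // Weight combines the direct term p_a and the policy-shift term beta * p_a * (Q_a - V).
            double weight = probs[a] * (1.0 + beta * (qs[a] - v));
            for (int p = 0; p < grad.length; p++) {
                grad[p] += weight * qGradients[a][p];
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] qs = {0.3, 1.1};                          // hypothetical Q-values
        double[][] qGradients = {{1.0, 0.0}, {0.0, 1.0}};  // hypothetical Q-gradients
        System.out.println(java.util.Arrays.toString(valueGradient(qs, qGradients, 2.0)));
    }
}
```

A quick sanity check for a formula like this is to compare it against a central finite difference of boltzmannValue, using Q-values that depend on the parameters in a known way.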
Methods inherited from class DynamicProgramming: computeQ, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setValueFunctionInitialization, value, value, writeValueTable
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
value
protected java.util.Map<HashableState,FunctionGradient> valueGradient
protected DifferentiableRF rf
public void DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory)
Description copied from class: DynamicProgramming
Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(SADomain, double, HashableStateFactory) method.
Overrides: DPPInit in class DynamicProgramming
Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory

public void resetSolver()
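Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class DynamicProgramming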
public void setOperator(DPOperator operator)
Description copied from class: DynamicProgramming
BellmanOperator (max)
Overrides: setOperator in class DynamicProgramming
Parameters:
operator - the dynamic programming operator to use.

public DifferentiableDPOperator getOperator()
Description copied from class: DynamicProgramming
Overrides: getOperator in class DynamicProgramming
protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this valueFunction's internal map.
Parameters:
sh - the hashed state on which to perform the Boltzmann gradient update.

public FunctionGradient valueGradient(State s)
Description copied from interface: DifferentiableValueFunction
Specified by: valueGradient in interface DifferentiableValueFunction
Parameters:
s - the state on which the function is to be evaluated

public FunctionGradient qGradient(State s, Action a)
Description copied from interface: DifferentiableQFunction
Returns the Q-value gradient (QGradientTuple) for the given state and action.
Specified by: qGradient in interface DifferentiableQFunction
Parameters:
s - the state for which the Q-value gradient is to be returned
a - the action for which the Q-value gradient is to be returned.

protected FunctionGradient computeQGradient(State s, Action ga)
Parameters:
s - the state
ga - the grounded action.

protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
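
The page gives no description for combinedNonZeroPDParameters. Judging only by its name and signature, it presumably collects the indices of all parameters that have a non-zero partial derivative in at least one of the supplied gradients. The sketch below illustrates that reading with sparse gradients modelled as Map<Integer, Double> (parameter index to partial derivative). That representation, along with the class and method names, is an assumption for illustration and is not the library's FunctionGradient API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CombinedNonZeroParamsSketch {

    /**
     * Returns the union of parameter indices whose partial derivative is
     * non-zero in at least one of the given sparse gradients.
     */
    @SafeVarargs
    static Set<Integer> combinedNonZeroParameters(Map<Integer, Double>... gradients) {
        Set<Integer> ids = new HashSet<>();
        for (Map<Integer, Double> gradient : gradients) {
            for (Map.Entry<Integer, Double> pd : gradient.entrySet()) {
                if (pd.getValue() != 0.0) { // skip explicitly stored zeros
                    ids.add(pd.getKey());
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Double> g1 = new HashMap<>(); // hypothetical gradient 1
        g1.put(0, 1.5);
        Map<Integer, Double> g2 = new HashMap<>(); // hypothetical gradient 2
        g2.put(2, -0.5);
        System.out.println(combinedNonZeroParameters(g1, g2));
    }
}
```

Iterating only over this combined set, rather than over every parameter, keeps a gradient backup proportional to the number of parameters that actually contribute.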