public abstract class DifferentiableDP extends DynamicProgramming implements DifferentiableQFunction, DifferentiableValueFunction
The reward function must be a DifferentiableRF. The normal DynamicProgramming.performBellmanUpdateOn(burlap.statehashing.HashableState) method of the DynamicProgramming class is overridden with a method that uses the Boltzmann backup operator.

Nested classes/interfaces inherited from interface QProvider: QProvider.Helper

| Modifier and Type | Field and Description |
|---|---|
| protected DifferentiableRF | rf: The differentiable RF. |
| protected java.util.Map<HashableState,FunctionGradient> | valueGradient: The value function gradient for each state. |
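The valueGradient field caches a value-function gradient for each state. Assume the transition model does not depend on the reward parameters θ. Under that assumption, a Q-value gradient can be built from such a cache with the one-step backup ∇Q(s,a) = Σ_{s'} T(s'|s,a) [∇R(s,a,s') + γ ∇V(s')]. The sketch below only illustrates that backup. It uses dense double[] gradients and String state ids. QGradientSketch and qGradient are hypothetical names: they are not BURLAP's FunctionGradient API and do not reproduce how computeQGradient is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not BURLAP's API): assembles dQ(s,a)/dtheta from a cache of
 * per-state value gradients, using dense double[] gradients and String state ids.
 */
final class QGradientSketch {

    /**
     * grad Q(s,a) = sum over s' of T(s'|s,a) * (grad R(s,a,s') + gamma * grad V(s')).
     *
     * @param nextStates    successor state ids reachable from (s, a)
     * @param probs         T(s'|s,a) for each successor, in the same order as nextStates
     * @param rewardGrads   dR(s,a,s')/dtheta for each successor
     * @param valueGradient cached dV(s')/dtheta per state id (plays the role of a valueGradient map)
     * @param gamma         discount factor
     * @param dim           number of reward parameters
     */
    static double[] qGradient(String[] nextStates, double[] probs, double[][] rewardGrads,
                              Map<String, double[]> valueGradient, double gamma, int dim) {
        double[] grad = new double[dim];
        for (int i = 0; i < nextStates.length; i++) {
            // A successor with no cached entry is treated as having a zero value gradient.
            double[] vGrad = valueGradient.getOrDefault(nextStates[i], new double[dim]);
            for (int j = 0; j < dim; j++) {
                grad[j] += probs[i] * (rewardGrads[i][j] + gamma * vGrad[j]);
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        Map<String, double[]> cache = new HashMap<>();
        cache.put("s1", new double[]{1.0, 0.0});
        cache.put("s2", new double[]{0.0, 2.0});
        double[] g = qGradient(new String[]{"s1", "s2"}, new double[]{0.5, 0.5},
                new double[][]{{1.0, 0.0}, {0.0, 1.0}}, cache, 0.9, 2);
        System.out.println(java.util.Arrays.toString(g));
    }
}
```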
Fields inherited from class DynamicProgramming: operator, valueFunction, valueInitializer
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

| Constructor and Description |
|---|
| DifferentiableDP() |
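The class description says that the standard max backup is replaced by a Boltzmann backup operator. This sketch assumes one common form of that operator, which weights each Q-value by a softmax over the state's Q-values: V(s) = Σ_a Q(s,a) e^{βQ(s,a)} / Σ_b e^{βQ(s,b)}, with inverse temperature β ≥ 0. The source does not give the exact formula. The sketch below implements this assumed form, and BoltzmannBackupSketch is a hypothetical name.

```java
/** Hypothetical sketch (not BURLAP's API) of an assumed Boltzmann-weighted backup over one state's Q-values. */
final class BoltzmannBackupSketch {

    /** Returns sum_a Q(a) * exp(beta*Q(a)) / sum_b exp(beta*Q(b)); assumes beta >= 0 and a non-empty qs. */
    static double backup(double[] qs, double beta) {
        // Shift by the largest Q-value before exponentiating; the shift cancels in the ratio
        // but keeps exp() from overflowing for large beta.
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, q);
        }
        double norm = 0.0;
        double weighted = 0.0;
        for (double q : qs) {
            double w = Math.exp(beta * (q - max));
            norm += w;
            weighted += w * q;
        }
        return weighted / norm;
    }

    public static void main(String[] args) {
        double[] qs = {1.0, 2.0, 3.0};
        for (double beta : new double[]{0.0, 1.0, 100.0}) {
            System.out.println("beta=" + beta + " -> " + backup(qs, beta));
        }
    }
}
```

At β = 0 the result is the plain average of the Q-values. As β grows, the result approaches the max used by the standard Bellman update.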
| Modifier and Type | Method and Description |
|---|---|
| protected java.util.Set<java.lang.Integer> | combinedNonZeroPDParameters(FunctionGradient... gradients) |
| protected FunctionGradient | computeQGradient(State s, Action ga) |
| void | DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory): Common init method for DynamicProgramming instances. |
| DifferentiableDPOperator | getOperator(): Returns the dynamic programming operator used. |
| protected FunctionGradient | performDPValueGradientUpdateOn(HashableState sh): Performs the Boltzmann value function gradient backup for the given HashableState. |
| FunctionGradient | qGradient(State s, Action a): Returns the Q-value gradient (QGradientTuple) for the given state and action. |
| void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP. |
| void | setOperator(DPOperator operator): Sets the dynamic programming operator to use. |
| FunctionGradient | valueGradient(State s): Returns the gradient of this value function. |
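The method summary lists performDPValueGradientUpdateOn as the method that performs the Boltzmann value-function gradient backup. The source does not give the formula. Assume the backup has the form V = Σ_a π_a Q_a with π_a = e^{βQ_a} / Σ_b e^{βQ_b}. The weights then also depend on θ, with ∇π_a = β π_a (∇Q_a − Σ_b π_b ∇Q_b). Applying the product rule gives ∇V = Σ_a π_a [1 + β(Q_a − V)] ∇Q_a. The sketch below implements that expression with dense gradients. It checks the result against central finite differences on a toy model in which each Q-value is linear in θ. BoltzmannGradientSketch and its methods are hypothetical.

```java
/**
 * Hypothetical sketch (not BURLAP's API) of the gradient of an assumed Boltzmann-weighted backup
 * V = sum_a pi_a * Q_a, where pi_a = exp(beta * Q_a) / sum_b exp(beta * Q_b) and beta >= 0.
 */
final class BoltzmannGradientSketch {

    /** Boltzmann weights, shifted by the max Q-value so exp() cannot overflow for beta >= 0. */
    static double[] weights(double[] qs, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, q);
        }
        double[] pi = new double[qs.length];
        double norm = 0.0;
        for (int a = 0; a < qs.length; a++) {
            pi[a] = Math.exp(beta * (qs[a] - max));
            norm += pi[a];
        }
        for (int a = 0; a < qs.length; a++) {
            pi[a] /= norm;
        }
        return pi;
    }

    /** The backed-up value V = sum_a pi_a * Q_a. */
    static double value(double[] qs, double beta) {
        double[] pi = weights(qs, beta);
        double v = 0.0;
        for (int a = 0; a < qs.length; a++) {
            v += pi[a] * qs[a];
        }
        return v;
    }

    /** dV/dtheta = sum_a pi_a * (1 + beta * (Q_a - V)) * dQ_a/dtheta; qGrads[a] is the dense gradient of Q_a. */
    static double[] valueGradient(double[] qs, double[][] qGrads, double beta) {
        double[] pi = weights(qs, beta);
        double v = value(qs, beta);
        double[] grad = new double[qGrads[0].length];
        for (int a = 0; a < qs.length; a++) {
            double coef = pi[a] * (1.0 + beta * (qs[a] - v));
            for (int j = 0; j < grad.length; j++) {
                grad[j] += coef * qGrads[a][j];
            }
        }
        return grad;
    }

    /** Toy Q-values Q_a = c_a . theta, so that dQ_a/dtheta = c_a. */
    static double[] qValues(double[][] c, double[] theta) {
        double[] qs = new double[c.length];
        for (int a = 0; a < c.length; a++) {
            for (int j = 0; j < theta.length; j++) {
                qs[a] += c[a][j] * theta[j];
            }
        }
        return qs;
    }

    public static void main(String[] args) {
        double[][] c = {{1.0, 0.5}, {-0.3, 2.0}, {0.7, -1.0}};
        double[] theta = {0.4, -0.2};
        double beta = 2.0;
        double[] analytic = valueGradient(qValues(c, theta), c, beta);
        double h = 1e-6;
        for (int j = 0; j < theta.length; j++) {
            double[] plus = theta.clone();
            double[] minus = theta.clone();
            plus[j] += h;
            minus[j] -= h;
            // Central finite difference of V with respect to theta_j.
            double numeric = (value(qValues(c, plus), beta) - value(qValues(c, minus), beta)) / (2 * h);
            System.out.printf("dV/dtheta_%d: analytic=%.6f numeric=%.6f%n", j, analytic[j], numeric);
        }
    }
}
```

At β = 0 the factor 1 + β(Q_a − V) equals 1, so the gradient reduces to the plain average of the Q-gradients.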
Methods inherited from class DynamicProgramming: computeQ, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setValueFunctionInitialization, value, value, writeValueTable
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from an implemented interface: value

protected java.util.Map<HashableState,FunctionGradient> valueGradient
protected DifferentiableRF rf
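The rf field holds the differentiable reward function. The gradients in this class are presumably taken with respect to that function's parameters. As an assumed example, a reward that is linear in state features, R(s) = θ·φ(s), has the gradient ∇_θ R(s) = φ(s). The sketch below stores only the non-zero partial derivatives, in a map from parameter index to value. LinearRewardSketch is hypothetical and does not reproduce BURLAP's DifferentiableRF or FunctionGradient types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical linear reward R(s) = theta . phi(s); not BURLAP's DifferentiableRF type. */
final class LinearRewardSketch {
    private final double[] theta;

    LinearRewardSketch(double[] theta) {
        this.theta = theta.clone();
    }

    /** R(s) = sum_i theta_i * phi_i(s), where phi holds the state's feature values. */
    double reward(double[] phi) {
        double r = 0.0;
        for (int i = 0; i < theta.length; i++) {
            r += theta[i] * phi[i];
        }
        return r;
    }

    /** dR/dtheta_i = phi_i; only non-zero partial derivatives are stored (a sparse gradient). */
    Map<Integer, Double> gradient(double[] phi) {
        Map<Integer, Double> grad = new HashMap<>();
        for (int i = 0; i < phi.length; i++) {
            if (phi[i] != 0.0) {
                grad.put(i, phi[i]);
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        LinearRewardSketch rf = new LinearRewardSketch(new double[]{0.5, -1.0, 2.0});
        double[] phi = {1.0, 0.0, 3.0};
        System.out.println("R = " + rf.reward(phi) + ", grad = " + rf.gradient(phi));
    }
}
```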
public void DPPInit(SADomain domain, double gamma, HashableStateFactory hashingFactory)
Description copied from class: DynamicProgramming
Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(SADomain, double, HashableStateFactory) method.
Overrides: DPPInit in class DynamicProgramming
Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory

public void resetSolver()
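Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class DynamicProgramming

public void setOperator(DPOperator operator)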
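Description copied from class: DynamicProgramming
Sets the dynamic programming operator to use. The default is BellmanOperator (max).
Overrides: setOperator in class DynamicProgramming
Parameters:
operator - the dynamic programming operator to use.

public DifferentiableDPOperator getOperator()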
Description copied from class: DynamicProgramming
Returns the dynamic programming operator used.
Overrides: getOperator in class DynamicProgramming

protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this value function's internal map.
Parameters:
sh - the hashed state on which to perform the Boltzmann gradient update.

public FunctionGradient valueGradient(State s)
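Description copied from interface: DifferentiableValueFunction
Returns the gradient of this value function.
Specified by: valueGradient in interface DifferentiableValueFunction
Parameters:
s - the state on which the function is to be evaluated

public FunctionGradient qGradient(State s, Action a)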
Description copied from interface: DifferentiableQFunction
Returns the Q-value gradient (QGradientTuple) for the given state and action.
Specified by: qGradient in interface DifferentiableQFunction
Parameters:
s - the state for which the Q-value gradient is to be returned
a - the action for which the Q-value gradient is to be returned.

protected FunctionGradient computeQGradient(State s, Action ga)
Parameters:
s - the state
ga - the grounded action.

protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
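The source gives no description for combinedNonZeroPDParameters. Judging only from its name and signature, it appears to return the set of parameter indices that have a non-zero partial derivative in at least one of the given gradients. The sketch below assumes that behavior and models each gradient as a hypothetical Map<Integer, Double> rather than BURLAP's FunctionGradient.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not BURLAP's API): a gradient is modeled as a map from
 * parameter index to partial derivative.
 */
final class NonZeroParamsSketch {

    /** Returns the sorted union of parameter indices with a non-zero partial derivative in any input. */
    @SafeVarargs
    static Set<Integer> combinedNonZeroParameters(Map<Integer, Double>... gradients) {
        Set<Integer> params = new TreeSet<>();
        for (Map<Integer, Double> g : gradients) {
            for (Map.Entry<Integer, Double> e : g.entrySet()) {
                // Explicitly stored zeros are skipped, so only contributing parameters are kept.
                if (e.getValue() != 0.0) {
                    params.add(e.getKey());
                }
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<Integer, Double> g1 = new HashMap<>();
        g1.put(0, 1.5);
        g1.put(3, 0.0);
        Map<Integer, Double> g2 = new HashMap<>();
        g2.put(2, -0.4);
        g2.put(0, 0.2);
        System.out.println(combinedNonZeroParameters(g1, g2)); // prints [0, 2]
    }
}
```

Some callers combine several sparse gradients, for example in a weighted sum of per-action Q-gradients. Computing this union first lets such a caller iterate only over the parameters that can contribute.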