public abstract class DifferentiableDP extends DynamicProgramming implements QGradientPlanner
DifferentiableRF class. The normal performBellmanUpdateOn(burlap.oomdp.statehashing.HashableState) method of the DynamicProgramming class is overridden with a method that uses the Boltzmann backup operator.

Nested classes inherited: DynamicProgramming.StaticVFPlanner, QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected double | boltzBeta: The Boltzmann backup operator beta parameter. |
| protected java.util.Map<HashableState,FunctionGradient> | valueGradient: The value function gradient for each state. |

Inherited fields: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer, actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| DifferentiableDP() |

| Modifier and Type | Method and Description |
|---|---|
| protected java.util.Set<java.lang.Integer> | combinedNonZeroPDParameters(FunctionGradient... gradients) |
| protected FunctionGradient | computeQGradient(State s, GroundedAction ga): Computes the Q-value gradient for the given State and GroundedAction. |
| java.util.List<QGradientTuple> | getAllQGradients(State s): Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state. |
| QGradientTuple | getQGradient(State s, GroundedAction a): Returns the Q-value gradient (QGradientTuple) for the given state and action. |
| FunctionGradient | getValueGradient(State s): Returns the value function gradient for the given State. |
| protected double | performBellmanUpdateOn(HashableState sh): Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator. |
| protected FunctionGradient | performDPValueGradientUpdateOn(HashableState sh): Performs the Boltzmann value function gradient backup for the given HashableState. |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| void | setBoltzmannBetaParameter(double beta): Sets this valueFunction's Boltzmann beta parameter used to compute gradients. |
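
The page does not spell out the Boltzmann backup that performBellmanUpdateOn applies. A common form, assumed here, weights each Q-value by its softmax probability under beta and sums the result. The sketch below is stand-alone plain Java: the class and method names are hypothetical and do not belong to BURLAP, and Q-values are passed as a bare array rather than through the library's types.

```java
/** Hypothetical stand-alone sketch; none of these names belong to BURLAP. */
class BoltzmannBackupSketch {

    /** Returns probabilities proportional to exp(beta * q[i]). Assumes q is non-empty. */
    static double[] boltzmannProbabilities(double[] q, double beta) {
        // Subtract the largest scaled Q-value before exponentiating to avoid overflow.
        double maxScaled = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            maxScaled = Math.max(maxScaled, beta * v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * q[i] - maxScaled);
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Assumed Boltzmann backup: V = sum over actions of p(a) * Q(a). */
    static double boltzmannValue(double[] q, double beta) {
        double[] p = boltzmannProbabilities(q, beta);
        double value = 0.0;
        for (int i = 0; i < q.length; i++) {
            value += p[i] * q[i];
        }
        return value;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        // beta = 0 weights all actions equally; a large beta approaches the max backup.
        System.out.println(boltzmannValue(q, 0.0));
        System.out.println(boltzmannValue(q, 50.0));
    }
}
```

Under this assumed form, beta controls how closely the backup tracks the usual Bellman max: small values average over actions, large values concentrate on the best action while keeping the value differentiable.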
Methods inherited from class DynamicProgramming: computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value

Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface ValueFunction: value

protected java.util.Map<HashableState,FunctionGradient> valueGradient

protected double boltzBeta

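public void resetSolver()
Description copied from interface: MDPSolverInterface
Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class DynamicProgramming

protected double performBellmanUpdateOn(HashableState sh)
Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator.
Overrides: performBellmanUpdateOn in class DynamicProgramming
Parameters: sh - the hashed state on which to perform the Boltzmann update.

protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this valueFunction's internal map.
Parameters: sh - the hashed state on which to perform the Boltzmann gradient update.

public FunctionGradient getValueGradient(State s)
Returns the value function gradient for the given State.
Parameters: s - the state for which the gradient is to be returned.
Returns: the value function gradient for the given State

public java.util.List<QGradientTuple> getAllQGradients(State s)
Description copied from interface: QGradientPlanner
Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
Specified by: getAllQGradients in interface QGradientPlanner
Parameters: s - the state for which Q-value gradients are to be returned.

public QGradientTuple getQGradient(State s, GroundedAction a)
Description copied from interface: QGradientPlanner
Returns the Q-value gradient (QGradientTuple) for the given state and action.
Specified by: getQGradient in interface QGradientPlanner
Parameters: s - the state for which the Q-value gradient is to be returned
a - the action for which the Q-value gradient is to be returned.

public void setBoltzmannBetaParameter(double beta)
Description copied from interface: QGradientPlanner
Sets this valueFunction's Boltzmann beta parameter used to compute gradients.
Specified by: setBoltzmannBetaParameter in interface QGradientPlanner
Parameters: beta - the value to which this valueFunction's Boltzmann beta parameter will be set

protected FunctionGradient computeQGradient(State s, GroundedAction ga)
Computes the Q-value gradient for the given State and GroundedAction.
Parameters: s - the state
ga - the grounded action.

protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)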
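
The value function gradient backup in performDPValueGradientUpdateOn is only named on this page, not defined. As a rough illustration, assume the Boltzmann value is V = sum_a p(a) Q(a) with p(a) proportional to exp(beta * Q(a)). The product rule then gives dV/dtheta = sum_a p(a) (1 + beta (Q(a) - V)) dQ(a)/dtheta. The sketch below implements that formula on plain arrays; the class and method names are hypothetical, and the dense `dq[a][k]` layout is a stand-in for the library's FunctionGradient objects, not its actual representation.

```java
/** Hypothetical stand-alone sketch; none of these names belong to BURLAP. */
class BoltzmannValueGradientSketch {

    /** Softmax probabilities proportional to exp(beta * q[a]). Assumes q is non-empty. */
    static double[] probabilities(double[] q, double beta) {
        double maxScaled = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            maxScaled = Math.max(maxScaled, beta * v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int a = 0; a < q.length; a++) {
            p[a] = Math.exp(beta * q[a] - maxScaled); // shifted for numerical stability
            sum += p[a];
        }
        for (int a = 0; a < q.length; a++) {
            p[a] /= sum;
        }
        return p;
    }

    /** Assumed Boltzmann value: V = sum_a p(a) * Q(a). */
    static double value(double[] q, double beta) {
        double[] p = probabilities(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) {
            v += p[a] * q[a];
        }
        return v;
    }

    /**
     * Gradient of the assumed Boltzmann value with respect to the parameters.
     * dq[a][k] is the partial derivative of Q(a) with respect to parameter k.
     */
    static double[] valueGradient(double[] q, double[][] dq, double beta) {
        double[] p = probabilities(q, beta);
        double v = value(q, beta);
        double[] grad = new double[dq[0].length];
        for (int a = 0; a < q.length; a++) {
            // Combines the direct term p(a) and the policy term beta * p(a) * (Q(a) - V).
            double weight = p[a] * (1.0 + beta * (q[a] - v));
            for (int k = 0; k < grad.length; k++) {
                grad[k] += weight * dq[a][k];
            }
        }
        return grad;
    }
}
```

A finite-difference comparison against `value` is a simple way to sanity-check a gradient like this one before relying on it.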