public abstract class DifferentiableDP extends DynamicProgramming implements QGradientPlanner
DifferentiableRF class. The normal performBellmanUpdateOn(burlap.oomdp.statehashing.HashableState) method of the DynamicProgramming class is overridden with a method that uses the Boltzmann backup operator.
DynamicProgramming.StaticVFPlanner
QFunction.QFunctionHelper
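The Boltzmann backup operator named in the class description can be illustrated with a minimal, self-contained sketch. It assumes the common definition in which a state's value is the expectation of its Q-values under a softmax distribution, with beta acting as an inverse temperature; this page does not confirm that exact convention. The class and method names below are hypothetical and are not part of BURLAP.

```java
/** Illustrative sketch only; not BURLAP source code. All names are hypothetical. */
final class BoltzmannBackupSketch {

    /** Softmax action probabilities for Q-values q, assuming beta is an inverse temperature. */
    static double[] boltzmannProbabilities(double[] q, double beta) {
        // Subtract the largest Q-value before exponentiating for numerical stability.
        double max = q[0];
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp(beta * (q[i] - max));
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Boltzmann backup: the expected Q-value under the softmax distribution. */
    static double boltzmannBackup(double[] q, double beta) {
        double[] p = boltzmannProbabilities(q, beta);
        double value = 0.0;
        for (int i = 0; i < q.length; i++) {
            value += p[i] * q[i];
        }
        return value;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 3.0};
        // beta = 0 weights all actions equally, so the result is the mean Q-value.
        System.out.println(boltzmannBackup(q, 0.0));
        // A large beta concentrates weight on the best action, approaching max(q).
        System.out.println(boltzmannBackup(q, 50.0));
    }
}
```

Unlike the max in a standard Bellman backup, this weighted average is smooth in the Q-values, which is what makes a value function gradient well defined.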
Modifier and Type | Field and Description |
---|---|
protected double | boltzBeta: The Boltzmann backup operator beta parameter. |
protected java.util.Map<HashableState,FunctionGradient> | valueGradient: The value function gradient for each state. |
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
DifferentiableDP() |
Modifier and Type | Method and Description |
---|---|
protected java.util.Set<java.lang.Integer> | combinedNonZeroPDParameters(FunctionGradient... gradients) |
protected FunctionGradient | computeQGradient(State s, GroundedAction ga): Computes the Q-value gradient for the given State and GroundedAction. |
java.util.List<QGradientTuple> | getAllQGradients(State s): Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state. |
QGradientTuple | getQGradient(State s, GroundedAction a): Returns the Q-value gradient (QGradientTuple) for the given state and action. |
FunctionGradient | getValueGradient(State s): Returns the value function gradient for the given State. |
protected double | performBellmanUpdateOn(HashableState sh): Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator. |
protected FunctionGradient | performDPValueGradientUpdateOn(HashableState sh): Performs the Boltzmann value function gradient backup for the given HashableState. |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
void | setBoltzmannBetaParameter(double beta): Sets this valueFunction's Boltzmann beta parameter used to compute gradients. |
computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
value
protected java.util.Map<HashableState,FunctionGradient> valueGradient
The value function gradient for each state.
protected double boltzBeta
The Boltzmann backup operator beta parameter.
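public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class DynamicProgramming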
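protected double performBellmanUpdateOn(HashableState sh)
Overrides the superclass method to apply a Boltzmann backup operator instead of a Bellman backup operator.
Overrides: performBellmanUpdateOn in class DynamicProgramming
Parameters:
sh - the hashed state on which to perform the Boltzmann update.
protected FunctionGradient performDPValueGradientUpdateOn(HashableState sh)
Performs the Boltzmann value function gradient backup for the given HashableState. Results are stored in this valueFunction's internal map.
Parameters:
sh - the hashed state on which to perform the Boltzmann gradient update.
public FunctionGradient getValueGradient(State s)
Returns the value function gradient for the given State.
Parameters:
s - the state for which the gradient is to be returned.
Returns: the value function gradient for the given State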
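The value gradient backup described above can be sketched independently of BURLAP. The sketch assumes the value is the softmax-weighted average of Q-values, V = Σ_a π_a Q_a with π_a ∝ exp(beta · Q_a), and applies the product rule using ∂π_a/∂θ = beta · π_a (∂Q_a/∂θ − Σ_b π_b ∂Q_b/∂θ). The dense array representation and all names are hypothetical; the library's FunctionGradient type and its actual update may differ.

```java
/** Illustrative sketch only; not BURLAP source code. All names are hypothetical. */
final class BoltzmannValueGradientSketch {

    /** Softmax probabilities over Q-values, assuming beta is an inverse temperature. */
    static double[] softmax(double[] q, double beta) {
        double max = q[0];
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int a = 0; a < q.length; a++) {
            p[a] = Math.exp(beta * (q[a] - max)); // shifted by max for stability
            sum += p[a];
        }
        for (int a = 0; a < q.length; a++) {
            p[a] /= sum;
        }
        return p;
    }

    /** Boltzmann value: sum over actions of pi_a * Q_a. */
    static double value(double[] q, double beta) {
        double[] pi = softmax(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) {
            v += pi[a] * q[a];
        }
        return v;
    }

    /**
     * Gradient of the Boltzmann value with respect to the parameters.
     * qGrad[a][k] holds dQ_a/dtheta_k for action a and parameter k.
     */
    static double[] valueGradient(double[] q, double[][] qGrad, double beta) {
        int nParams = qGrad[0].length;
        double[] pi = softmax(q, beta);

        // Expected Q-gradient under pi: sum_b pi_b * dQ_b/dtheta_k.
        double[] expectedQGrad = new double[nParams];
        for (int a = 0; a < q.length; a++) {
            for (int k = 0; k < nParams; k++) {
                expectedQGrad[k] += pi[a] * qGrad[a][k];
            }
        }

        // Product rule: dV/dtheta_k = sum_a (pi_a * dQ_a + Q_a * dpi_a).
        double[] grad = new double[nParams];
        for (int a = 0; a < q.length; a++) {
            for (int k = 0; k < nParams; k++) {
                double dPi = beta * pi[a] * (qGrad[a][k] - expectedQGrad[k]);
                grad[k] += pi[a] * qGrad[a][k] + q[a] * dPi;
            }
        }
        return grad;
    }
}
```

A finite-difference comparison against value(...) is a cheap way to check an analytic gradient like this one before relying on it.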
public java.util.List<QGradientTuple> getAllQGradients(State s)
Description copied from interface: QGradientPlanner
Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
Specified by: getAllQGradients in interface QGradientPlanner
Parameters:
s - the state for which Q-value gradients are to be returned.
public QGradientTuple getQGradient(State s, GroundedAction a)
Description copied from interface: QGradientPlanner
Returns the Q-value gradient (QGradientTuple) for the given state and action.
Specified by: getQGradient in interface QGradientPlanner
Parameters:
s - the state for which the Q-value gradient is to be returned.
a - the action for which the Q-value gradient is to be returned.
public void setBoltzmannBetaParameter(double beta)
Description copied from interface: QGradientPlanner
Sets this valueFunction's Boltzmann beta parameter used to compute gradients.
Specified by: setBoltzmannBetaParameter in interface QGradientPlanner
Parameters:
beta - the value to which this valueFunction's Boltzmann beta parameter will be set.
protected FunctionGradient computeQGradient(State s, GroundedAction ga)
Computes the Q-value gradient for the given State and GroundedAction.
Parameters:
s - the state
ga - the grounded action.
protected java.util.Set<java.lang.Integer> combinedNonZeroPDParameters(FunctionGradient... gradients)
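This page gives no description for combinedNonZeroPDParameters. Judging only from its name and signature, it plausibly collects the indices of all parameters that have a non-zero partial derivative in at least one of the supplied gradients. The sketch below illustrates that reading under the assumption that a sparse gradient can be modeled as a map from parameter index to partial derivative; the map representation and all names are hypothetical stand-ins, not BURLAP's FunctionGradient API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; not BURLAP source code. All names are hypothetical. */
final class NonZeroParameterUnionSketch {

    /**
     * Returns the union of parameter indices whose partial derivative is
     * non-zero in at least one of the given sparse gradients.
     */
    @SafeVarargs
    static Set<Integer> combinedNonZeroParameters(Map<Integer, Double>... gradients) {
        Set<Integer> ids = new HashSet<>();
        for (Map<Integer, Double> gradient : gradients) {
            for (Map.Entry<Integer, Double> entry : gradient.entrySet()) {
                // Explicitly stored zeros are skipped so they do not count as non-zero.
                if (entry.getValue() != 0.0) {
                    ids.add(entry.getKey());
                }
            }
        }
        return ids;
    }
}
```

Iterating only over this union lets a gradient update avoid visiting parameters that cannot contribute to the result.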