public abstract class DifferentiableVFPlanner extends ValueFunctionPlanner implements QGradientPlanner
DifferentiableRF class. The normal performBellmanUpdateOn(burlap.behavior.statehashing.StateHashTuple) method of the ValueFunctionPlanner class is overridden with a method that uses the Boltzmann backup operator.
Nested classes inherited from class ValueFunctionPlanner: ValueFunctionPlanner.StaticVFPlanner
Nested classes inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper
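For orientation, the Boltzmann backup operator named in the class description is commonly written as a softmax-weighted average of Q-values. The form below is a sketch under that assumption: it treats beta as an inverse temperature multiplying the Q-values, which this page does not state explicitly, and the symbols are introduced here for illustration only.

```latex
\pi_\beta(a \mid s) = \frac{\exp\big(\beta\, Q(s,a)\big)}{\sum_{a'} \exp\big(\beta\, Q(s,a')\big)},
\qquad
V(s) = \sum_{a} \pi_\beta(a \mid s)\, Q(s,a)
```

Under this form, beta = 0 gives a uniform average over actions, and as beta grows the backup approaches the max used by the ordinary Bellman backup. Unlike the max, it is differentiable everywhere.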
Modifier and Type | Field and Description |
---|---|
protected double | boltzBeta: The Boltzmann backup operator beta parameter. |
protected java.util.Map<StateHashTuple,double[]> | valueGradient: The value function gradient for each state. |
Fields inherited from class ValueFunctionPlanner: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
DifferentiableVFPlanner() |
Modifier and Type | Method and Description |
---|---|
protected double[] | computeQGradient(State s, GroundedAction ga): Computes the Q-value gradient for the given State and GroundedAction. |
java.util.List<QGradientTuple> | getAllQGradients(State s): Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state. |
QGradientTuple | getQGradient(State s, GroundedAction a): Returns the Q-value gradient (QGradientTuple) for the given state and action. |
double[] | getValueGradient(State s): Returns the value function gradient for the given State. |
protected double | performBellmanUpdateOn(StateHashTuple sh): Overrides the superclass method to perform a Boltzmann backup instead of a Bellman backup. |
protected double[] | performDPValueGradientUpdateOn(StateHashTuple sh): Performs the Boltzmann value function gradient backup for the given StateHashTuple. |
void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
void | setBoltzmannBetaParameter(double beta): Sets this planner's Boltzmann beta parameter used to compute gradients. |
Methods inherited from class ValueFunctionPlanner: computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, planFromState, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface QComputablePlanner: getQ, getQs
protected java.util.Map<StateHashTuple,double[]> valueGradient
The value function gradient for each state.

protected double boltzBeta
The Boltzmann backup operator beta parameter.
public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If this planner was given other objects that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy whose temperature parameter decreases with time, you will need to reset the policy's temperature yourself.
Overrides: resetPlannerResults in class ValueFunctionPlanner
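
protected double performBellmanUpdateOn(StateHashTuple sh)
Overrides the superclass method to perform a Boltzmann backup instead of a Bellman backup.
Overrides: performBellmanUpdateOn in class ValueFunctionPlanner
Parameters:
sh - the hashed state on which to perform the Boltzmann update.

protected double[] performDPValueGradientUpdateOn(StateHashTuple sh)
Performs the Boltzmann value function gradient backup for the given StateHashTuple. Results are stored in this planner's internal map.
Parameters:
sh - the hashed state on which to perform the Boltzmann gradient update.

public double[] getValueGradient(State s)
Returns the value function gradient for the given State.
Parameters:
s - the state for which the gradient is to be returned.
Returns: the value function gradient for the given State.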
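
The two backups described above (the Boltzmann value backup and its gradient backup) can be sketched on plain arrays. This is a minimal, self-contained illustration and not the planner's actual code: the class and method names are hypothetical, beta is assumed to act as an inverse temperature multiplying the Q-values, and the Q-value gradients are assumed to be supplied as a matrix qGrad[a][k] holding the derivative of Q(a) with respect to parameter k.

```java
// Hypothetical stand-alone sketch; none of these names belong to the planner's API.
final class BoltzmannBackupSketch {

    /** Softmax probabilities pi(a) proportional to exp(beta * q[a]) (beta assumed to be an inverse temperature). */
    static double[] boltzmannProbs(double[] q, double beta) {
        double max = q[0];
        for (double v : q) max = Math.max(max, v);
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int a = 0; a < q.length; a++) {
            p[a] = Math.exp(beta * (q[a] - max)); // shift by the max for numerical stability
            sum += p[a];
        }
        for (int a = 0; a < q.length; a++) p[a] /= sum;
        return p;
    }

    /** Boltzmann value backup: V = sum_a pi(a) * Q(a). */
    static double boltzmannValue(double[] q, double beta) {
        double[] p = boltzmannProbs(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) v += p[a] * q[a];
        return v;
    }

    /**
     * Gradient of the Boltzmann value with respect to the parameters.
     * Differentiating V = sum_a pi(a) Q(a) gives
     * dV/dtheta_k = sum_a pi(a) * (1 + beta * (Q(a) - V)) * dQ(a)/dtheta_k.
     */
    static double[] boltzmannValueGradient(double[] q, double[][] qGrad, double beta) {
        double[] p = boltzmannProbs(q, beta);
        double v = 0.0;
        for (int a = 0; a < q.length; a++) v += p[a] * q[a];
        int dim = qGrad[0].length;
        double[] grad = new double[dim];
        for (int a = 0; a < q.length; a++) {
            double weight = p[a] * (1.0 + beta * (q[a] - v));
            for (int k = 0; k < dim; k++) grad[k] += weight * qGrad[a][k];
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 0.5};
        double[][] qGrad = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}; // each Q-value is its own parameter
        System.out.println(boltzmannValue(q, 2.0));
        System.out.println(java.util.Arrays.toString(boltzmannValueGradient(q, qGrad, 2.0)));
    }
}
```

In the planner itself, the resulting gradient vector would be stored per hashed state (the valueGradient map) so that getValueGradient can return it later. The sketch leaves that bookkeeping out.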

public java.util.List<QGradientTuple> getAllQGradients(State s)
Description copied from interface: QGradientPlanner
Returns the list of Q-value gradients (returned as QGradientTuple objects) for each action permissible in the given state.
Specified by: getAllQGradients in interface QGradientPlanner
Parameters:
s - the state for which Q-value gradients are to be returned.

public QGradientTuple getQGradient(State s, GroundedAction a)
Description copied from interface: QGradientPlanner
Returns the Q-value gradient (QGradientTuple) for the given state and action.
Specified by: getQGradient in interface QGradientPlanner
Parameters:
s - the state for which the Q-value gradient is to be returned.
a - the action for which the Q-value gradient is to be returned.

public void setBoltzmannBetaParameter(double beta)
Description copied from interface: QGradientPlanner
Sets this planner's Boltzmann beta parameter used to compute gradients.
Specified by: setBoltzmannBetaParameter in interface QGradientPlanner
Parameters:
beta - the value to which this planner's Boltzmann beta parameter will be set.

protected double[] computeQGradient(State s, GroundedAction ga)
Computes the Q-value gradient for the given State and GroundedAction.
Parameters:
s - the state.
ga - the grounded action.
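
A Q-value gradient of the kind computeQGradient returns can be illustrated with the standard one-step expansion: the gradient of Q(s,a) is the transition-probability-weighted sum of the reward gradient and the discounted value gradient of each successor state. The sketch below assumes that form; the class name, method name, and array layout are hypothetical and chosen only to keep the example self-contained.

```java
// Hypothetical stand-alone sketch; not the planner's implementation.
final class QGradientSketch {

    /**
     * dQ(s,a)/dtheta = sum_j T(s_j | s,a) * ( dR(s,a,s_j)/dtheta + gamma * dV(s_j)/dtheta )
     *
     * @param transitionProbs probability of each successor state s_j
     * @param rewardGrads     rewardGrads[j][k]: derivative of the reward for successor j w.r.t. parameter k
     * @param nextValueGrads  nextValueGrads[j][k]: derivative of V(s_j) w.r.t. parameter k
     * @param gamma           discount factor
     */
    static double[] qGradient(double[] transitionProbs, double[][] rewardGrads,
                              double[][] nextValueGrads, double gamma) {
        int dim = rewardGrads[0].length;
        double[] grad = new double[dim];
        for (int j = 0; j < transitionProbs.length; j++) {
            for (int k = 0; k < dim; k++) {
                grad[k] += transitionProbs[j] * (rewardGrads[j][k] + gamma * nextValueGrads[j][k]);
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.5};
        double[][] rewardGrads = {{1, 0}, {0, 1}};
        double[][] nextValueGrads = {{2, 2}, {4, 0}};
        System.out.println(java.util.Arrays.toString(qGradient(probs, rewardGrads, nextValueGrads, 0.5)));
    }
}
```

The successor value gradients fed in here play the role of the per-state entries that getValueGradient exposes, which is why the value gradient backup and the Q gradient computation are typically interleaved during planning.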