public class DifferentiableVI extends DifferentiableVFPlanner
Performs value iteration with a DifferentiableRF. This class behaves the same as the normal ValueIteration planner, except that it operates in the differentiable value function case.
Nested classes inherited: ValueFunctionPlanner.StaticVFPlanner, QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected boolean | foundReachableStates - Indicates whether the reachable states have been computed yet. |
protected boolean | hasRunVI - Indicates whether VI has been run or not. |
protected double | maxDelta - When the maximum change in the value function is smaller than this value, VI will terminate. |
protected int | maxIterations - When the number of VI iterations exceeds this value, VI will terminate. |
protected boolean | stopReachabilityFromTerminalStates - When the reachability analysis to find the state space is performed, a breadth first search-like pass (spreading over all stochastic transitions) is performed. |
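Fields inherited from class DifferentiableVFPlanner: boltzBeta, valueGradient
Fields inherited from class ValueFunctionPlanner: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf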
Constructor and Description |
---|
DifferentiableVI(Domain domain, DifferentiableRF rf, TerminalFunction tf, double gamma, double boltzBeta, StateHashFactory hashingFactory, double maxDelta, int maxIterations) - Initializes the planner. |
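The boltzBeta argument scales a Boltzmann distribution over actions, and larger values make that distribution more deterministic. The following is a minimal standalone sketch of that effect, not BURLAP code: the class BoltzmannValueSketch and its methods are hypothetical names, and defining the state value as the probability-weighted average of the Q-values is an assumption made for illustration.

```java
// Hypothetical standalone illustration; none of these names come from BURLAP.
final class BoltzmannValueSketch {

    /** Boltzmann action probabilities: p[a] is proportional to exp(beta * q[a]). */
    static double[] boltzmannPolicy(double[] q, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            // Subtracting the maximum keeps the exponent from overflowing.
            p[i] = Math.exp(beta * (q[i] - max));
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Assumed state value: the Boltzmann-weighted average of the Q-values. */
    static double boltzmannValue(double[] q, double beta) {
        double[] p = boltzmannPolicy(q, beta);
        double value = 0.0;
        for (int i = 0; i < q.length; i++) {
            value += p[i] * q[i];
        }
        return value;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 2.0, 4.0};
        for (double beta : new double[] {0.0, 1.0, 10.0}) {
            System.out.printf("beta=%.1f value=%.4f%n", beta, boltzmannValue(q, beta));
        }
    }
}
```

With beta at 0 the distribution is uniform and the value is the plain average of the Q-values; as beta grows, the value approaches the largest Q-value, which is the behavior of an ordinary max backup.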
Modifier and Type | Method and Description |
---|---|
void | addStatesToStateSpace(java.util.Collection<State> states) - Adds a Collection of states over which VI will iterate. |
void | addStateToStateSpace(State s) - Adds the given state to the state space over which VI iterates. |
boolean | performReachabilityFrom(State si) - This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. |
void | planFromState(State initialState) - This method will cause the planner to begin planning from the specified initial state. |
void | recomputeReachableStates() - Calling this method will force the planner to recompute the reachable states when the planFromState(burlap.oomdp.core.State) method is called next. |
void | resetPlannerResults() - Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
void | runVI() - Runs VI until the specified termination conditions are met. |
void | toggleReachabiltiyTerminalStatePruning(boolean toggle) - Sets whether the state reachability search that generates the state space will prune the search at terminal states. |
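The reachability pass and its terminal-state pruning toggle can be pictured with a small breadth-first search. This is a hedged sketch over integer state IDs rather than BURLAP State objects: ReachabilitySketch, reachableFrom, and the successor map are hypothetical, and it assumes that a pruned terminal state is still added to the state space but is not expanded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical standalone illustration; states are plain integers here.
final class ReachabilitySketch {

    /**
     * Breadth-first search over every possible successor of each state.
     * When pruneAtTerminals is true, terminal states are recorded but not expanded.
     */
    static Set<Integer> reachableFrom(int start,
                                      Map<Integer, List<Integer>> successors,
                                      Set<Integer> terminals,
                                      boolean pruneAtTerminals) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        seen.add(start);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int s = frontier.poll();
            if (pruneAtTerminals && terminals.contains(s)) {
                continue; // keep the terminal state itself, skip its successors
            }
            for (int next : successors.getOrDefault(s, List.of())) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> successors = Map.of(
                0, List.of(1, 2),
                1, List.of(3),
                2, List.of(4),
                4, List.of(5));
        Set<Integer> terminals = Set.of(2);
        System.out.println("pruned:   " + reachableFrom(0, successors, terminals, true));
        System.out.println("unpruned: " + reachableFrom(0, successors, terminals, false));
    }
}
```

In this toy graph, pruning at terminal state 2 leaves states 4 and 5 out of the state space, which is the trade-off the toggle controls: a smaller state space versus finding every reachable state regardless of terminal states.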
Methods inherited from class DifferentiableVFPlanner: computeQGradient, getAllQGradients, getQGradient, getValueGradient, performBellmanUpdateOn, performDPValueGradientUpdateOn, setBoltzmannBetaParameter
Methods inherited from class ValueFunctionPlanner: computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface QComputablePlanner: getQ, getQs
protected double maxDelta
protected int maxIterations
protected boolean foundReachableStates
protected boolean stopReachabilityFromTerminalStates
protected boolean hasRunVI
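The foundReachableStates and hasRunVI flags suggest a simple control flow between planFromState, performReachabilityFrom, runVI, and recomputeReachableStates. The sketch below is one plausible reading of the documented behavior, not the library's source: PlanningFlowSketch and its counters are hypothetical, the meaning of the boolean returned by performReachabilityFrom (true when a new pass was performed) is an assumption, and IllegalStateException stands in for the unspecified runtime exception.

```java
// Hypothetical standalone illustration of the flag-driven planning flow.
final class PlanningFlowSketch {
    private boolean foundReachableStates = false;
    private boolean hasRunVI = false;

    // Counters exist only so the behavior can be observed in this sketch.
    int reachabilityPasses = 0;
    int viRuns = 0;

    /** Assumed to return true only when a new reachability pass was performed. */
    boolean performReachabilityFrom(int start) {
        if (foundReachableStates) {
            return false; // reachable states already discovered; do nothing
        }
        reachabilityPasses++;
        foundReachableStates = true;
        return true;
    }

    /** Fails fast when no reachability pass has been performed yet. */
    void runVI() {
        if (!foundReachableStates) {
            throw new IllegalStateException("performReachabilityFrom must be called before runVI");
        }
        viRuns++;
        hasRunVI = true;
    }

    /** Runs reachability first, then VI if anything changed or VI never ran. */
    void planFromState(int start) {
        if (performReachabilityFrom(start) || !hasRunVI) {
            runVI();
        }
    }

    /** Forces the next planFromState call to repeat the reachability pass. */
    void recomputeReachableStates() {
        foundReachableStates = false;
    }

    public static void main(String[] args) {
        PlanningFlowSketch planner = new PlanningFlowSketch();
        planner.planFromState(0);
        planner.planFromState(0); // no extra work the second time
        planner.recomputeReachableStates();
        planner.planFromState(0); // reachability and VI run again
        System.out.println(planner.reachabilityPasses + " reachability passes, " + planner.viRuns + " VI runs");
    }
}
```

Under these assumptions, repeated planning calls are cheap until recomputeReachableStates is invoked, for example after the transition dynamics have changed.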
public DifferentiableVI(Domain domain, DifferentiableRF rf, TerminalFunction tf, double gamma, double boltzBeta, StateHashFactory hashingFactory, double maxDelta, int maxIterations)
domain - the domain in which to plan
rf - the differentiable reward function that will be used
tf - the terminal state function
gamma - the discount factor
boltzBeta - the scaling factor in the Boltzmann distribution used for the state value function. The larger the value, the more deterministic the distribution.
hashingFactory - the state hashing factory to use
maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
public void recomputeReachableStates()
Calling this method will force the planner to recompute the reachable states when the planFromState(burlap.oomdp.core.State) method is called next.
This may be useful if the transition dynamics have changed since the last planning call and planning needs to be restarted as a result.
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
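Sets whether the state reachability search that generates the state space will prune the search at terminal states.
toggle - true if the search should prune at terminal states; false if the search should find all reachable states regardless of terminal states.
public void planFromState(State initialState)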
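Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
planFromState in class ValueFunctionPlanner
initialState - the initial state of the planning problem
public void resetPlannerResults()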
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced by calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same.
This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If the planner was given other objects that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy whose temperature parameter decreases with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class DifferentiableVFPlanner
public void runVI()
Runs VI until the specified termination conditions are met. This method is normally invoked through the planFromState(State) method.
The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method will automatically call the performReachabilityFrom(State) method first and then this method if it hasn't been run.
public void addStateToStateSpace(State s)
Adds the given state to the state space over which VI iterates.
s - the state to add
public void addStatesToStateSpace(java.util.Collection<State> states)
Adds a Collection of states over which VI will iterate.
states - the collection of states.
public boolean performReachabilityFrom(State si)
This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics.
This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
si - the source state from which all reachable states will be found
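The two termination conditions, maxDelta and maxIterations, can be illustrated with a generic sweep loop over a fixed state space. This is a sketch under stated assumptions rather than the planner's implementation: ViTerminationSketch and the Backup interface are hypothetical, the backup operator is passed in so that any update rule can be plugged in, and the toy chain in main uses a plain deterministic update instead of the Boltzmann backup.

```java
// Hypothetical standalone illustration of the two VI stopping rules.
final class ViTerminationSketch {

    /** Hypothetical backup operator: new value of a state given the current values. */
    interface Backup {
        double apply(int state, double[] values);
    }

    /**
     * Sweeps over all states in place until the largest change in a sweep is
     * below maxDelta or maxIterations sweeps have run. Returns the sweep count.
     */
    static int runVI(double[] values, Backup backup, double maxDelta, int maxIterations) {
        int sweeps = 0;
        while (sweeps < maxIterations) {
            double delta = 0.0;
            for (int s = 0; s < values.length; s++) {
                double updated = backup.apply(s, values);
                delta = Math.max(delta, Math.abs(updated - values[s]));
                values[s] = updated;
            }
            sweeps++;
            if (delta < maxDelta) {
                break; // value function has stopped changing
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        double gamma = 0.9;
        // Chain 0 -> 1 -> 2: state 2 is terminal with value 0, every step earns reward 1.
        Backup chain = (s, v) -> s == 2 ? 0.0 : 1.0 + gamma * v[s + 1];
        double[] values = new double[3];
        int sweeps = runVI(values, chain, 1e-6, 100);
        System.out.println(sweeps + " sweeps, V(0)=" + values[0]);
    }
}
```

A small maxDelta trades extra sweeps for a tighter value function, while maxIterations bounds the run time when convergence is slow.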