public class ValueIteration extends DynamicProgramming implements Planner
QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected boolean | foundReachableStates: Indicates whether the reachable states have been computed yet. |
protected boolean | hasRunVI |
protected double | maxDelta: When the maximum change in the value function is smaller than this value, VI will terminate. |
protected int | maxIterations: When the number of VI iterations exceeds this value, VI will terminate. |
protected boolean | stopReachabilityFromTerminalStates: When the reachability analysis to find the state space is performed, a breadth first search-like pass (spreading over all stochastic transitions) is performed. |
operator, valueFunction, valueInitializer
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
ValueIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes the valueFunction. |
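The roles of maxDelta and maxIterations can be illustrated with a minimal, self-contained tabular sketch. This is not BURLAP's implementation: the class ViSketch, its array-based MDP encoding (p, r, terminal), and the Result holder are hypothetical names introduced only for this example, and the in-place sweep order is an assumption.

```java
/** Minimal tabular value iteration; an illustrative sketch, not BURLAP code. */
final class ViSketch {

    /** Hypothetical holder for the value table and the number of sweeps run. */
    static final class Result {
        final double[] values;
        final int sweeps;
        Result(double[] values, int sweeps) { this.values = values; this.sweeps = sweeps; }
    }

    /**
     * p[s][a][s2] is a transition probability and r[s][a] an expected reward.
     * Terminal states are never updated, so their value stays at 0.
     */
    static Result run(double[][][] p, double[][] r, boolean[] terminal,
                      double gamma, double maxDelta, int maxIterations) {
        int n = p.length;
        double[] v = new double[n];
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {      // iteration cap
            double delta = 0.0;
            for (int s = 0; s < n; s++) {
                if (terminal[s]) continue;
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < p[s].length; a++) {
                    double q = r[s][a];
                    for (int s2 = 0; s2 < n; s2++) {
                        q += gamma * p[s][a][s2] * v[s2];
                    }
                    best = Math.max(best, q);           // Bellman max over actions
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
            sweeps++;
            if (delta < maxDelta) break;               // value function has settled
        }
        return new Result(v, sweeps);
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2 (terminal), one action per state, reward -1 per step.
        double[][][] p = { { {0.0, 1.0, 0.0} }, { {0.0, 0.0, 1.0} }, {} };
        double[][] r = { {-1.0}, {-1.0}, {} };
        boolean[] terminal = { false, false, true };
        Result res = run(p, r, terminal, 0.5, 0.001, 100);
        System.out.println("sweeps=" + res.sweeps + " v0=" + res.values[0]);
    }
}
```

Whichever condition is met first ends the loop, so a small maxIterations can stop planning before the value function has converged to within maxDelta.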
Modifier and Type | Method and Description |
---|---|
boolean | performReachabilityFrom(State si): This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. |
GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random. |
void | recomputeReachableStates(): Calling this method will force the valueFunction to recompute the reachable states when the planFromState(State) method is called next. |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
void | runVI(): Runs VI until the specified termination conditions are met. |
void | toggleReachabiltiyTerminalStatePruning(boolean toggle): Sets whether the state reachability search to generate the state space will prune the search from terminal states. |
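The policy returned by planFromState(State) is described as choosing the highest Q-value and breaking ties uniformly at random. A minimal sketch of that selection rule follows. It is not the GreedyQPolicy source: the class GreedyTieBreakSketch and the method selectAction are hypothetical, and Q-values are assumed to be supplied as a non-empty array with no NaN entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Greedy action selection with uniform random tie-breaking; illustrative only. */
final class GreedyTieBreakSketch {

    /** Returns the index of a maximal Q-value, chosen uniformly among all maximal indices. */
    static int selectAction(double[] qValues, Random rng) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> ties = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] > best) {
                best = qValues[a];      // new strict maximum: discard earlier candidates
                ties.clear();
                ties.add(a);
            } else if (qValues[a] == best) {
                ties.add(a);            // equal to the current maximum: keep as a candidate
            }
        }
        return ties.get(rng.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println(selectAction(new double[] {5.0, 5.0, 1.0}, rng));
    }
}
```

Collecting all maximal indices before sampling gives each tied action the same probability, which a simple "keep the first maximum" loop would not.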
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected double maxDelta
protected int maxIterations
protected boolean foundReachableStates
protected boolean stopReachabilityFromTerminalStates
protected boolean hasRunVI
public ValueIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
Initializes the valueFunction.
Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
public void recomputeReachableStates()
Calling this method will force the valueFunction to recompute the reachable states when the planFromState(State) method is called next.
This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
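Sets whether the state reachability search to generate the state space will prune the search from terminal states.
Parameters:
toggle - true if the search should be pruned at terminal states; false if the search should find all reachable states regardless of terminal states.
public GreedyQPolicy planFromState(State initialState)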
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random.
Specified by:
planFromState in interface Planner
Parameters:
initialState - the initial state of the planning problem
Returns:
a GreedyQPolicy.
public void resetSolver()
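Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by:
resetSolver in interface MDPSolverInterface
Overrides:
resetSolver in class DynamicProgramming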
public void runVI()
Runs VI until the specified termination conditions are met. This method is typically invoked through the planFromState(State) method.
The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method automatically calls performReachabilityFrom(State) first and then this method if it has not already been run.
public boolean performReachabilityFrom(State si)
This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics.
This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
Parameters:
si - the source state from which all reachable states will be found
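The reachability pass and the effect of terminal-state pruning can be sketched with a small breadth-first search. This is an assumption-laden illustration rather than the library's code: the class ReachabilitySketch, the String state labels, and the successors map (every state with nonzero transition probability under any action) are hypothetical stand-ins for hashed states and the domain model.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first reachability with optional pruning at terminal states; illustrative only. */
final class ReachabilitySketch {

    /**
     * Returns every state reachable from source. When stopAtTerminals is true,
     * terminal states are recorded but their successors are not explored.
     */
    static Set<String> reachable(String source,
                                 Map<String, List<String>> successors,
                                 Set<String> terminals,
                                 boolean stopAtTerminals) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> open = new ArrayDeque<>();
        seen.add(source);
        open.add(source);
        while (!open.isEmpty()) {
            String s = open.poll();
            if (stopAtTerminals && terminals.contains(s)) {
                continue;                       // keep the state, skip its expansion
            }
            List<String> next = successors.getOrDefault(s, Collections.<String>emptyList());
            for (String s2 : next) {
                if (seen.add(s2)) {             // add() returns false for known states
                    open.add(s2);
                }
            }
        }
        return seen;
    }
}
```

With pruning enabled, states that can only be reached by passing through a terminal state never enter the state space, which keeps the value table smaller.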