ValueIteration

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.planning.stochastic.valueiteration.ValueIteration

All Implemented Interfaces:

MDPSolverInterface, Planner, QFunction, QProvider, ValueFunction

Direct Known Subclasses:

PrioritizedSweeping, VIModelLearningPlanner
```
public class ValueIteration
extends DynamicProgramming
implements Planner
```
An implementation of asynchronous value iteration. Values of states are updated in place using the Bellman operator in an arbitrary order, and a complete pass over the state space is performed on each iteration. VI terminates under either of two conditions: when the maximum change in the value function during an iteration is smaller than a threshold (`maxDelta`), or when a maximum number of iterations (`maxIterations`) has been performed. This implementation first determines the state space by finding all states reachable from a source state. The worst-case time complexity of this reachability analysis is equivalent to that of a single VI iteration, and it has the added benefit that VI never passes over unreachable states. This implementation is compatible with options.
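The sweep-and-terminate loop described above can be sketched in plain Java. This is a minimal, self-contained illustration, not BURLAP's implementation: the three-state MDP, the array encoding `T[s][a][s']` / `R[s][a]`, and the method name `valueIteration` are hypothetical. Values are overwritten in place during each sweep (the asynchronous part), the loop stops once the largest change in a sweep falls below `maxDelta`, and at most `maxIterations` sweeps are run.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of asynchronous value iteration on a tiny hand-coded MDP.
 * Not BURLAP code; the array encoding and all names are illustrative assumptions.
 */
public class AsyncVISketch {

    // T[s][a][s'] = transition probability; state 2 is terminal (absorbing).
    static final double[][][] T = {
        {{0, 1, 0}, {1, 0, 0}},   // state 0: action 0 moves to 1, action 1 stays
        {{0, 0, 1}, {0, 1, 0}},   // state 1: action 0 moves to 2, action 1 stays
        {{0, 0, 1}, {0, 0, 1}}    // state 2: absorbing
    };
    // R[s][a] = expected immediate reward; moving from state 1 to state 2 pays 1.
    static final double[][] R = {{0, 0}, {1, 0}, {0, 0}};
    static final boolean[] TERMINAL = {false, false, true};

    /**
     * Sweeps over all states, overwriting values[s] in place, until the largest
     * change in a sweep is below maxDelta or maxIterations sweeps have run.
     * Returns the number of sweeps performed.
     */
    static int valueIteration(double[][][] transitions, double[][] rewards,
                              boolean[] terminal, double gamma, double maxDelta,
                              int maxIterations, double[] values) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < values.length; s++) {
                if (terminal[s]) {
                    continue; // terminal states keep their value (0 here)
                }
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < rewards[s].length; a++) {
                    double q = rewards[s][a];
                    for (int next = 0; next < values.length; next++) {
                        // values[next] may already hold this sweep's updated value
                        q += gamma * transitions[s][a][next] * values[next];
                    }
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - values[s]));
                values[s] = best; // Bellman backup written immediately
            }
            sweeps++;
            if (delta < maxDelta) {
                break;
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        double[] values = new double[3];
        int sweeps = valueIteration(T, R, TERMINAL, 0.9, 1e-6, 100, values);
        System.out.println(sweeps + " sweeps, V = " + Arrays.toString(values));
        // prints: 3 sweeps, V = [0.9, 1.0, 0.0]
    }
}
```

With `maxIterations = 1` the same call stops after a single sweep with `V = [0.0, 1.0, 0.0]`, which shows the iteration cap taking effect before the values have converged.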

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `foundReachableStates` Indicates whether the reachable states have been computed yet. |
| `protected boolean` | `hasRunVI` |
| `protected double` | `maxDelta` When the maximum change in the value function is smaller than this value, VI will terminate. |
| `protected int` | `maxIterations` When the number of VI iterations exceeds this value, VI will terminate. |
| `protected boolean` | `stopReachabilityFromTerminalStates` When the reachability analysis to find the state space is performed, a breadth-first-search-like pass (spreading over all stochastic transitions) is performed. |

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
| Constructor and Description |
|---|
| `ValueIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner. |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `performReachabilityFrom(State si)` This method will find all reachable states that will be used by the `runVI()` method and will cache all the transition dynamics. |
| `GreedyQPolicy` | `planFromState(State initialState)` Plans from the input state and then returns a `GreedyQPolicy` that greedily selects the action with the highest Q-value and breaks ties uniformly at random. |
| `void` | `recomputeReachableStates()` Calling this method will force the planner to recompute the reachable states the next time the `planFromState(State)` method is called. |
| `void` | `resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
| `void` | `runVI()` Runs VI until the specified termination conditions are met. |
| `void` | `toggleReachabiltiyTerminalStatePruning(boolean toggle)` Sets whether the state reachability search that generates the state space will prune the search at terminal states. |

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting

- Field Detail
  - maxDelta
```
protected double maxDelta
```
    When the maximum change in the value function is smaller than this value, VI will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of VI iterations exceeds this value, VI will terminate.
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
  - stopReachabilityFromTerminalStates
```
protected boolean stopReachabilityFromTerminalStates
```
    When the reachability analysis to find the state space is performed, a breadth-first-search-like pass (spreading over all stochastic transitions) is performed. The search can optionally be pruned at terminal states by setting this value to true. By default, it is false, and the full reachable state space is found.
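    A minimal sketch of a breadth-first reachability pass with optional terminal-state pruning follows. It is not BURLAP's code: the integer state encoding, the `successors` map (listing every state any action can reach with nonzero probability), and the `demo` graph are hypothetical. In this sketch a terminal state is still recorded as reachable; pruning only stops the search from expanding its successors.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of breadth-first reachability with optional terminal pruning. */
public class ReachabilitySketch {

    /**
     * Returns every state reachable from source. successors.get(s) lists all states
     * reachable from s in one step under any action (all stochastic outcomes).
     */
    static Set<Integer> reachable(Map<Integer, List<Integer>> successors,
                                  Set<Integer> terminals, int source,
                                  boolean pruneAtTerminals) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(source);
        queue.add(source);
        while (!queue.isEmpty()) {
            int s = queue.poll();
            if (pruneAtTerminals && terminals.contains(s)) {
                continue; // recorded as reachable, but not expanded
            }
            for (int next : successors.getOrDefault(s, Collections.emptyList())) {
                if (visited.add(next)) { // add() returns false for already-seen states
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    /** Example graph: 0 -> {1, 2}, 1 -> {0}, 2 -> {3}, 3 -> {4}; state 3 is terminal. */
    static Set<Integer> demo(boolean pruneAtTerminals) {
        Map<Integer, List<Integer>> successors = new HashMap<>();
        successors.put(0, Arrays.asList(1, 2)); // e.g. one stochastic action, two outcomes
        successors.put(1, Arrays.asList(0));
        successors.put(2, Arrays.asList(3));
        successors.put(3, Arrays.asList(4));
        return reachable(successors, Collections.singleton(3), 0, pruneAtTerminals);
    }

    public static void main(String[] args) {
        System.out.println("full search:   " + demo(false)); // [0, 1, 2, 3, 4]
        System.out.println("pruned search: " + demo(true));  // [0, 1, 2, 3]
    }
}
```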
  - hasRunVI
```
protected boolean hasRunVI
```
- Constructor Detail
  - ValueIteration
```
public ValueIteration(SADomain domain,
                      double gamma,
                      HashableStateFactory hashingFactory,
                      double maxDelta,
                      int maxIterations)
```
    Initializes the planner.
    
    Parameters:
    
    domain - the domain in which to plan
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory to use
    
    maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
    
    maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
- Method Detail
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the planner to recompute the reachable states the next time the planFromState(State) method is called. This is useful if the transition dynamics have changed since the last planning call and planning needs to be restarted as a result.
  - toggleReachabiltiyTerminalStatePruning
```
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
```
    Sets whether the state reachability search that generates the state space will prune the search at terminal states. The default is not to prune.
    
    Parameters:
    
    toggle - true if the search should be pruned at terminal states; false if the search should find all reachable states regardless of terminal states.
  - planFromState
```
public GreedyQPolicy planFromState(State initialState)
```
    Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random.
    
    Specified by:
    
    planFromState in interface Planner
    
    Parameters:
    
    initialState - the initial state of the planning problem
    
    Returns:
    
    a GreedyQPolicy.
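    The tie-breaking rule can be sketched as follows. This is a hypothetical helper over a plain array of Q-values, not the `GreedyQPolicy` source: it collects every action whose Q-value equals the maximum and samples one of them uniformly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of greedy action selection with uniform random tie-breaking. */
public class GreedyTieBreakSketch {

    /** Returns the index of an action with maximal Q-value, sampled uniformly among ties. */
    static int selectAction(double[] qValues, Random rng) {
        double maxQ = Double.NEGATIVE_INFINITY;
        List<Integer> bestActions = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] > maxQ) {
                maxQ = qValues[a];   // new strict maximum: restart the tie list
                bestActions.clear();
                bestActions.add(a);
            } else if (qValues[a] == maxQ) {
                bestActions.add(a);  // tied with the current maximum
            }
        }
        return bestActions.get(rng.nextInt(bestActions.size()));
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        double[] q = {5.0, 1.0, 5.0}; // actions 0 and 2 are tied
        for (int i = 0; i < 5; i++) {
            System.out.print(selectAction(q, rng) + " ");
        }
        System.out.println();
    }
}
```

    Ties are detected with exact `==` here for simplicity; a tolerance-based comparison is a reasonable alternative when Q-values come out of floating-point sweeps.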
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Overrides:
    
    resetSolver in class DynamicProgramming
  - runVI
```
public void runVI()
```
    Runs VI until the specified termination conditions are met. In general, this method should only be called indirectly through the planFromState(State) method. The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method automatically calls performReachabilityFrom(State) first and then calls this method if it has not yet been run.
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    
    si - the source state from which all reachable states will be found
    
    Returns:
    
    true if a reachability analysis had never been performed from this state; false otherwise.
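    The call order described for planFromState(State), runVI(), and performReachabilityFrom(State) can be sketched with a small hypothetical class. It is not BURLAP's code: the integer states, the successor map, and the `viRuns` counter are assumptions made for illustration, and the Bellman sweeps themselves are omitted. The sketch assumes planning reruns VI only when the reachability search discovered a new source state or VI has not yet run.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the reachability-then-VI call protocol (not BURLAP code). */
public class PlanningProtocolSketch {
    private final Map<Integer, List<Integer>> successors; // hypothetical successor map
    private final Set<Integer> stateSpace = new HashSet<>();
    private boolean foundReachableStates = false;
    private boolean hasRunVI = false;
    int viRuns = 0; // illustration only: how many times runVI() executed

    PlanningProtocolSketch(Map<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    /** Returns true if a new search was performed, false if the source was already covered. */
    boolean performReachabilityFrom(int source) {
        if (foundReachableStates && stateSpace.contains(source)) {
            return false; // nothing new to discover
        }
        Deque<Integer> queue = new ArrayDeque<>();
        if (stateSpace.add(source)) {
            queue.add(source);
        }
        while (!queue.isEmpty()) {
            int s = queue.poll();
            for (int next : successors.getOrDefault(s, Collections.emptyList())) {
                if (stateSpace.add(next)) { // only newly discovered states are expanded
                    queue.add(next);
                }
            }
        }
        foundReachableStates = true;
        hasRunVI = false; // the state space may have grown, so values are stale
        return true;
    }

    void runVI() {
        if (!foundReachableStates) {
            throw new IllegalStateException("call performReachabilityFrom first");
        }
        // Bellman sweeps over stateSpace would go here.
        viRuns++;
        hasRunVI = true;
    }

    void planFromState(int source) {
        if (performReachabilityFrom(source) || !hasRunVI) {
            runVI();
        }
    }

    int numStates() {
        return stateSpace.size();
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> successors = new HashMap<>();
        successors.put(0, Arrays.asList(1, 2));
        successors.put(1, Arrays.asList(2));
        PlanningProtocolSketch planner = new PlanningProtocolSketch(successors);
        planner.planFromState(0); // search + VI
        planner.planFromState(2); // already covered: no new search, no new VI run
        System.out.println(planner.numStates() + " states, VI runs: " + planner.viRuns);
        // prints: 3 states, VI runs: 1
    }
}
```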
