ValueIteration

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
  - burlap.behavior.singleagent.planning.ValueFunctionPlanner
    - burlap.behavior.singleagent.planning.stochastic.valueiteration.ValueIteration

All Implemented Interfaces:

QComputablePlanner, ValueFunction

Direct Known Subclasses:

PrioritizedSweeping
```
public class ValueIteration
extends ValueFunctionPlanner
```
An implementation of asynchronous value iteration. State values are updated with the Bellman operator in an arbitrary order, and each iteration performs a complete pass over the state space. VI can be set to terminate under two conditions: when the maximum change in the value function falls below a threshold, or when a maximum number of iterations is exceeded. This implementation first determines the state space by finding all states reachable from a source state. The worst-case time complexity of this reachability operation is equivalent to that of one VI iteration, and it has the added benefit that VI does not pass over non-reachable states. This implementation is compatible with options.
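The update loop and the two termination conditions described above can be sketched without any BURLAP types. The following is a minimal, self-contained illustration and not the library's implementation: the class `ViSketch`, the array-based MDP encoding (`next`, `reward`, `terminal`), and the choice of deterministic transitions are all assumptions made for brevity.

```java
/** Hypothetical, BURLAP-independent sketch of in-place value iteration. */
final class ViSketch {

    /**
     * next[s][a]   : successor of state s under action a (deterministic for brevity)
     * reward[s][a] : immediate reward for taking action a in state s
     * terminal[s]  : terminal states keep their initial value and are never updated
     * v            : value function, updated in place
     * Returns the number of complete sweeps that were performed.
     */
    static int runVI(int[][] next, double[][] reward, boolean[] terminal,
                     double gamma, double maxDelta, int maxIterations, double[] v) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            // One complete pass over the state space; the order is arbitrary.
            for (int s = 0; s < next.length; s++) {
                if (terminal[s]) {
                    continue;
                }
                // Bellman backup: maximize over actions.
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    best = Math.max(best, reward[s][a] + gamma * v[next[s][a]]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best; // later states in this sweep already see the new value
            }
            sweeps++;
            // Stop early once the largest change in this sweep is below the threshold.
            if (delta < maxDelta) {
                break;
            }
        }
        return sweeps;
    }
}
```

Because values are overwritten in place, states visited later in a sweep already use the updated values of states visited earlier, which is what distinguishes this asynchronous style from a synchronous update that reads from a frozen copy of the previous value function.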

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
  ValueFunctionPlanner.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`foundReachableStates` Indicates whether the reachable states have been computed yet.
`protected boolean`	`hasRunVI`
`protected double`	`maxDelta` When the maximum change in the value function is smaller than this value, VI will terminate.
`protected int`	`maxIterations` When the number of VI iterations exceeds this value, VI will terminate.
`protected boolean`	`stopReachabilityFromTerminalStates` When the reachability analysis to find the state space is performed, a breadth first search-like pass (spreading over all stochastic transitions) is performed.

Fields inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`ValueIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner.

Method Summary

Methods
Modifier and Type	Method and Description
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used by the `runVI()` method and will cache all the transition dynamics.
`void`	`planFromState(State initialState)` This method will cause the planner to begin planning from the specified initial state.
`void`	`recomputeReachableStates()` Calling this method will force the planner to recompute the reachable states when the `planFromState(State)` method is called next.
`void`	`resetPlannerResults()` Use this method to reset all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)` as if no planning had ever been performed before.
`void`	`runVI()` Runs VI until the specified termination conditions are met.
`void`	`toggleReachabiltiyTerminalStatePruning(boolean toggle)` Sets whether the state reachability search used to generate the state space will prune the search at terminal states.

Methods inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - maxDelta
```
protected double maxDelta
```
    When the maximum change in the value function is smaller than this value, VI will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of VI iterations exceeds this value, VI will terminate.
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
  - stopReachabilityFromTerminalStates
```
protected boolean stopReachabilityFromTerminalStates
```
    When the reachability analysis to find the state space is performed, a breadth first search-like pass (spreading over all stochastic transitions) is performed. The search can optionally be pruned at terminal states by setting this value to true. By default, it is false and the full reachable state space is found.
  - hasRunVI
```
protected boolean hasRunVI
```
- Constructor Detail
  - ValueIteration
```
public ValueIteration(Domain domain,
              RewardFunction rf,
              TerminalFunction tf,
              double gamma,
              StateHashFactory hashingFactory,
              double maxDelta,
              int maxIterations)
```
    Initializes the planner.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    maxDelta - when the maximum change in the value function is smaller than this value, VI will terminate.
    maxIterations - when the number of VI iterations exceeds this value, VI will terminate.
- Method Detail
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the planner to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
  - toggleReachabiltiyTerminalStatePruning
```
public void toggleReachabiltiyTerminalStatePruning(boolean toggle)
```
    Sets whether the state reachability search used to generate the state space will prune the search at terminal states. The default is not to prune.
    
    Parameters:
    toggle - true if the search should prune the search at terminal states; false if the search should find all reachable states regardless of terminal states.
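The effect of this toggle can be illustrated with a small breadth-first search over integer state ids. This is only a sketch under stated assumptions: `ReachabilitySketch`, the `successors` function, and the `isTerminal` predicate are hypothetical stand-ins for the planner's transition dynamics and terminal function, and the choice to keep a terminal state in the result while not expanding it is an assumption of this example rather than a documented guarantee.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a reachability search with optional terminal-state pruning. */
final class ReachabilitySketch {

    /**
     * successors.apply(s) lists every state that can follow s under any action
     * and any stochastic outcome. When stopAtTerminals is true, terminal states
     * are recorded but their successors are not explored.
     */
    static Set<Integer> reachable(int source, IntFunction<int[]> successors,
                                  IntPredicate isTerminal, boolean stopAtTerminals) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> open = new ArrayDeque<>();
        seen.add(source);
        open.add(source);
        while (!open.isEmpty()) {
            int s = open.poll();
            if (stopAtTerminals && isTerminal.test(s)) {
                continue; // prune: do not expand past a terminal state
            }
            for (int next : successors.apply(s)) {
                if (seen.add(next)) { // add() returns true only for unseen states
                    open.add(next);
                }
            }
        }
        return seen;
    }
}
```

With a chain 0 -> 1 -> 2 in which state 1 is terminal, pruning yields the set {0, 1}, while the default (no pruning) also includes state 2.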
  - planFromState
```
public void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state.
    
    Specified by:
    
    planFromState in class ValueFunctionPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to the OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
    
    Overrides:
    
    resetPlannerResults in class ValueFunctionPlanner
  - runVI
```
public void runVI()
```
    Runs VI until the specified termination conditions are met. In general, this method should only be called indirectly through the planFromState(State) method. The performReachabilityFrom(State) method must have been called at least once beforehand, or a runtime exception will be thrown. The planFromState(State) method automatically calls performReachabilityFrom(State) first and then calls this method if VI has not yet been run.
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used by the runVI() method and will cache all the transition dynamics. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    si - the source state from which all reachable states will be found
    
    Returns:
    true if a reachability analysis had never been performed from this state; false otherwise.
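The call ordering among planFromState(State), performReachabilityFrom(State), runVI(), and recomputeReachableStates() can be summarized with a toy model. This is a hedged sketch of the documented contract only: `PlannerLifecycleSketch`, its integer state ids, and its counters are hypothetical, and the real planner's internal bookkeeping may differ.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the planner's call-ordering contract; not BURLAP code. */
final class PlannerLifecycleSketch {
    private final Set<Integer> knownStates = new HashSet<>();
    private boolean foundReachableStates = false;
    private boolean hasRunVI = false;
    int reachabilityPasses = 0; // counters exist only to make the behavior observable
    int viRuns = 0;

    /** Returns true only if a new reachability pass was actually performed. */
    boolean performReachabilityFrom(int source) {
        if (foundReachableStates && knownStates.contains(source)) {
            return false; // nothing to do: this source was already analyzed
        }
        knownStates.add(source); // a real search would also add every successor
        foundReachableStates = true;
        reachabilityPasses++;
        return true;
    }

    /** Fails fast if the state space has never been enumerated. */
    void runVI() {
        if (!foundReachableStates) {
            throw new IllegalStateException("perform reachability analysis before running VI");
        }
        viRuns++;
        hasRunVI = true;
    }

    /** Runs reachability first, then VI if new states were found or VI never ran. */
    void planFromState(int initialState) {
        if (performReachabilityFrom(initialState) || !hasRunVI) {
            runVI();
        }
    }

    /** Forces the next planFromState call to repeat the reachability pass. */
    void recomputeReachableStates() {
        foundReachableStates = false;
    }
}
```

In this model, a second planFromState call from the same state is a no-op, whereas calling recomputeReachableStates() first causes both the reachability pass and VI to run again.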
