PolicyIteration

- java.lang.Object
  - burlap.behavior.singleagent.MDPSolver
    - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
      - burlap.behavior.singleagent.planning.stochastic.policyiteration.PolicyIteration

All Implemented Interfaces:

MDPSolverInterface, Planner, QFunction, QProvider, ValueFunction
```
public class PolicyIteration
extends DynamicProgramming
implements Planner
```

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected EnumerablePolicy`	`evaluativePolicy` The current policy to be evaluated
`protected boolean`	`foundReachableStates` Indicates whether the reachable states have been computed yet.
`protected boolean`	`hasRunPlanning` Indicates whether planning has been run at least once.
`protected double`	`maxEvalDelta` When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
`protected int`	`maxIterations` When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
`protected double`	`maxPIDelta` When the maximum change between policy evaluations is smaller than this value, planning will terminate.
`protected int`	`maxPolicyIterations` When the number of policy iterations passes this value, planning will terminate.
`protected int`	`totalPolicyIterations` The total number of policy iterations performed
`protected int`	`totalValueIterations` The total number of value iterations used to evaluate policies.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations)` Initializes the valueFunction.
`PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations)` Initializes the valueFunction.

Method Summary

Modifier and Type	Method and Description
`protected double`	`evaluatePolicy()` Computes the value function obtained by following the current evaluative policy.
`Policy`	`getComputedPolicy()` Returns the policy that was last computed (or the initial policy if no planning has been performed).
`int`	`getTotalPolicyIterations()` Returns the total number of policy iterations that have been performed.
`int`	`getTotalValueIterations()` Returns the total number of value iterations used to evaluate policies.
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used when computing the value function.
`GreedyQPolicy`	`planFromState(State initialState)` Plans from the input state and then returns a `GreedyQPolicy` that greedily selects the action with the highest Q-value and breaks ties uniformly at random.
`void`	`recomputeReachableStates()` Calling this method will force the valueFunction to recompute the reachable states when the `planFromState(State)` method is called next.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
`void`	`setPolicyToEvaluate(EnumerablePolicy p)` Sets the initial policy that will be evaluated when planning with policy iteration begins.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting

- Field Detail
  - maxEvalDelta
```
protected double maxEvalDelta
```
    When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
  - maxPIDelta
```
protected double maxPIDelta
```
    When the maximum change between policy evaluations is smaller than this value, planning will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
  - maxPolicyIterations
```
protected int maxPolicyIterations
```
    When the number of policy iterations passes this value, planning will terminate.
  - evaluativePolicy
```
protected EnumerablePolicy evaluativePolicy
```
    The current policy to be evaluated
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
  - totalPolicyIterations
```
protected int totalPolicyIterations
```
    The total number of policy iterations performed
  - totalValueIterations
```
protected int totalValueIterations
```
    The total number of value iterations used to evaluate policies.
  - hasRunPlanning
```
protected boolean hasRunPlanning
```
    Indicates whether planning has been run at least once.
- Constructor Detail
  - PolicyIteration
```
public PolicyIteration(SADomain domain,
                       double gamma,
                       HashableStateFactory hashingFactory,
                       double maxDelta,
                       int maxEvaluationIterations,
                       int maxPolicyIterations)
```
    Initializes the valueFunction.
    
    Parameters:
    
    domain - the domain in which to plan
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory to use
    
    maxDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate. Similarly, when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
    
    maxEvaluationIterations - when the number of value iteration sweeps used to evaluate a policy exceeds this value, policy evaluation will terminate.
    
    maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
  - PolicyIteration
```
public PolicyIteration(SADomain domain,
                       double gamma,
                       HashableStateFactory hashingFactory,
                       double maxPIDelta,
                       double maxEvalDelta,
                       int maxEvaluationIterations,
                       int maxPolicyIterations)
```
    Initializes the valueFunction.
    
    Parameters:
    
    domain - the domain in which to plan
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory to use
    
    maxPIDelta - when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
    
    maxEvalDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate.
    
    maxEvaluationIterations - when the number of value iteration sweeps used to evaluate a policy exceeds this value, policy evaluation will terminate.
    
    maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
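    
    The four termination parameters interact across two nested loops: an inner policy evaluation loop and an outer policy improvement loop. The following is a minimal standalone sketch of that interaction, not BURLAP's implementation. The three-state chain MDP, the class name PolicyIterationSketch, and the helpers next, q, improvePolicy and plan are all hypothetical, and the outer loop uses the largest single-sweep change from evaluation as its stopping signal, which is one plausible reading of the parameter descriptions above.

```java
import java.util.Arrays;

// Standalone illustration only; none of these names are BURLAP API.
class PolicyIterationSketch {
    static final int NUM_STATES = 3;   // states 0 and 1 are non-terminal
    static final int TERMINAL = 2;     // state 2 is absorbing with value 0
    static final int NUM_ACTIONS = 2;  // action 0 = left, action 1 = right

    final double gamma, maxPIDelta, maxEvalDelta;
    final int maxEvaluationIterations, maxPolicyIterations;
    final double[] value = new double[NUM_STATES];
    final int[] policy = new int[NUM_STATES]; // initial policy: always left
    int totalPolicyIterations = 0;
    int totalValueIterations = 0;

    PolicyIterationSketch(double gamma, double maxPIDelta, double maxEvalDelta,
                          int maxEvaluationIterations, int maxPolicyIterations) {
        this.gamma = gamma;
        this.maxPIDelta = maxPIDelta;
        this.maxEvalDelta = maxEvalDelta;
        this.maxEvaluationIterations = maxEvaluationIterations;
        this.maxPolicyIterations = maxPolicyIterations;
    }

    // Assumed deterministic chain dynamics: left stays at 0 or moves down, right moves up.
    static int next(int s, int a) {
        return a == 0 ? Math.max(0, s - 1) : s + 1;
    }

    // Q-value with an assumed reward of -1 per step.
    double q(int s, int a) {
        return -1.0 + gamma * value[next(s, a)];
    }

    // Inner loop: sweeps until the change drops below maxEvalDelta or the sweep cap is hit.
    // Returns the largest single-sweep change seen during this evaluation.
    double evaluatePolicy() {
        double maxChange = 0.0;
        for (int i = 0; i < maxEvaluationIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < TERMINAL; s++) {
                double v = q(s, policy[s]);
                delta = Math.max(delta, Math.abs(v - value[s]));
                value[s] = v;
            }
            totalValueIterations++;
            maxChange = Math.max(maxChange, delta);
            if (delta < maxEvalDelta) break;
        }
        return maxChange;
    }

    // Greedy improvement; ties go to the lowest action index to keep the sketch deterministic.
    void improvePolicy() {
        for (int s = 0; s < TERMINAL; s++) {
            int best = 0;
            for (int a = 1; a < NUM_ACTIONS; a++) {
                if (q(s, a) > q(s, best)) best = a;
            }
            policy[s] = best;
        }
    }

    // Outer loop: stops when the evaluation change is below maxPIDelta or the iteration cap is hit.
    void plan() {
        double delta;
        do {
            delta = evaluatePolicy();
            improvePolicy();
            totalPolicyIterations++;
        } while (delta >= maxPIDelta && totalPolicyIterations < maxPolicyIterations);
    }

    public static void main(String[] args) {
        PolicyIterationSketch pi = new PolicyIterationSketch(0.9, 1e-3, 1e-6, 1000, 100);
        pi.plan();
        System.out.println("policy = " + Arrays.toString(pi.policy));
        System.out.println("values = " + Arrays.toString(pi.value));
        System.out.println("policy iterations = " + pi.totalPolicyIterations);
    }
}
```

    In this toy problem the sketch settles on the "right" action in both non-terminal states. Passing 1 as the last argument stops planning after a single policy iteration regardless of the deltas, which mirrors the role described for maxPolicyIterations.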
- Method Detail
  - setPolicyToEvaluate
```
public void setPolicyToEvaluate(EnumerablePolicy p)
```
    Sets the initial policy that will be evaluated when planning with policy iteration begins. After the first policy iteration, the evaluative policy will be a GreedyQPolicy over the newly evaluated value function.
    
    Parameters:
    
    p - the initial policy to evaluate when planning begins.
  - getComputedPolicy
```
public Policy getComputedPolicy()
```
    Returns the policy that was last computed (or the initial policy if no planning has been performed).
    
    Returns:
    
    the policy that was last computed.
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the valueFunction to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
  - getTotalPolicyIterations
```
public int getTotalPolicyIterations()
```
    Returns the total number of policy iterations that have been performed.
    
    Returns:
    
    the total number of policy iterations that have been performed.
  - getTotalValueIterations
```
public int getTotalValueIterations()
```
    Returns the total number of value iterations used to evaluate policies.
    
    Returns:
    
    the total number of value iterations used to evaluate policies.
  - planFromState
```
public GreedyQPolicy planFromState(State initialState)
```
    Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random.
    
    Specified by:
    
    planFromState in interface Planner
    
    Parameters:
    
    initialState - the initial state of the planning problem
    
    Returns:
    
    a GreedyQPolicy.
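    
    Greedy selection with uniform random tie-breaking can be sketched as below. This is an illustration of the selection rule only, not the GreedyQPolicy source; the class GreedyTieBreakSketch, the method selectGreedy, and the use of a plain double array of Q-values indexed by action are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Standalone illustration only; none of these names are BURLAP API.
class GreedyTieBreakSketch {

    // Returns the index of a maximal Q-value, chosen uniformly among all maximal indices.
    static int selectGreedy(double[] qValues, Random rng) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> bestActions = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] > best) {
                // Strictly better: discard the previous candidates.
                best = qValues[a];
                bestActions.clear();
                bestActions.add(a);
            } else if (qValues[a] == best) {
                // Exact tie: keep as an additional candidate.
                bestActions.add(a);
            }
        }
        return bestActions.get(rng.nextInt(bestActions.size()));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[] q = {1.0, 3.0, 3.0, 2.0}; // actions 1 and 2 tie for the maximum
        System.out.println("selected action index: " + selectGreedy(q, rng));
    }
}
```

    Because ties are resolved at selection time, repeated queries for the same state may return different actions when several Q-values are exactly equal.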
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Overrides:
    
    resetSolver in class DynamicProgramming
  - evaluatePolicy
```
protected double evaluatePolicy()
```
    Computes the value function obtained by following the current evaluative policy.
    
    Returns:
    
    the maximum single iteration change in the value function
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used when computing the value function. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    
    si - the source state from which all reachable states will be found
    
    Returns:
    
    true if a reachability analysis had never been performed from this state; false otherwise.
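    
    The contract described above (expand once, then do nothing on repeat calls) can be sketched as a breadth-first search over a successor map with a persistent set of known states. This is a hedged illustration, not the BURLAP source: states are plain integers, and ReachabilitySketch, its successor map, and numKnownStates are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Standalone illustration only; none of these names are BURLAP API.
class ReachabilitySketch {
    private final Map<Integer, List<Integer>> successors;
    private final Set<Integer> knownStates = new HashSet<>();

    ReachabilitySketch(Map<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    // Returns true if a new search was run, false if the source state was already known.
    boolean performReachabilityFrom(int source) {
        if (knownStates.contains(source)) {
            return false; // everything reachable from here was found by an earlier call
        }
        Deque<Integer> open = new ArrayDeque<>();
        open.add(source);
        knownStates.add(source);
        while (!open.isEmpty()) {
            int s = open.poll();
            for (int next : successors.getOrDefault(s, Collections.<Integer>emptyList())) {
                // Set.add returns true only for states not seen before.
                if (knownStates.add(next)) {
                    open.add(next);
                }
            }
        }
        return true;
    }

    int numKnownStates() {
        return knownStates.size();
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1));
        graph.put(1, Arrays.asList(2));
        graph.put(3, Arrays.asList(0));
        ReachabilitySketch r = new ReachabilitySketch(graph);
        System.out.println(r.performReachabilityFrom(0));
        System.out.println(r.performReachabilityFrom(1));
        System.out.println(r.numKnownStates());
    }
}
```

    Since every state added to the known set has already had its successors expanded, a repeat call from any known state has nothing new to find, which is why the early return is safe in this sketch.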
