PolicyIteration

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
  - burlap.behavior.singleagent.planning.ValueFunctionPlanner
    - burlap.behavior.singleagent.planning.stochastic.policyiteration.PolicyIteration

All Implemented Interfaces: QComputablePlanner, ValueFunction

public class PolicyIteration
extends ValueFunctionPlanner

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
  ValueFunctionPlanner.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected PlannerDerivedPolicy`	`evaluativePolicy` The current policy to be evaluated.
`protected boolean`	`foundReachableStates` Indicates whether the reachable states have been computed yet.
`protected double`	`maxEvalDelta` When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
`protected int`	`maxIterations` When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
`protected double`	`maxPIDelta` When the maximum change between policy evaluations is smaller than this value, planning will terminate.
`protected int`	`maxPolicyIterations` When the number of policy iterations passes this value, planning will terminate.

Fields inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations)` Initializes the planner.
`PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations)` Initializes the planner.

Method Summary

Methods
Modifier and Type	Method and Description
`protected double`	`evaluatePolicy()` Computes the value function obtained by following the current evaluative policy.
`Policy`	`getComputedPolicy()` Returns the policy that was last computed.
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used when computing the value function.
`void`	`planFromState(State initialState)` This method will cause the planner to begin planning from the specified initial state.
`void`	`recomputeReachableStates()` Calling this method will force the planner to recompute the reachable states when the `planFromState(State)` method is called next.
`void`	`resetPlannerResults()` Use this method to reset all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)` as if no planning had ever been performed before.
`void`	`setPolicyClassToEvaluate(PlannerDerivedPolicy p)` Sets which kind of policy to use whenever the policy is updated.

Methods inherited from class burlap.behavior.singleagent.planning.ValueFunctionPlanner
computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - maxEvalDelta
```
protected double maxEvalDelta
```
    When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
  - maxPIDelta
```
protected double maxPIDelta
```
    When the maximum change between policy evaluations is smaller than this value, planning will terminate.
  - maxIterations
```
protected int maxIterations
```
    When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
  - maxPolicyIterations
```
protected int maxPolicyIterations
```
    When the number of policy iterations passes this value, planning will terminate.
  - evaluativePolicy
```
protected PlannerDerivedPolicy evaluativePolicy
```
    The current policy to be evaluated.
  - foundReachableStates
```
protected boolean foundReachableStates
```
    Indicates whether the reachable states have been computed yet.
- Constructor Detail
  - PolicyIteration
```
public PolicyIteration(Domain domain,
               RewardFunction rf,
               TerminalFunction tf,
               double gamma,
               StateHashFactory hashingFactory,
               double maxDelta,
               int maxEvaluationIterations,
               int maxPolicyIterations)
```
    Initializes the planner.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    maxDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate. Similarly, when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
    maxEvaluationIterations - when the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
    maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
  - PolicyIteration
```
public PolicyIteration(Domain domain,
               RewardFunction rf,
               TerminalFunction tf,
               double gamma,
               StateHashFactory hashingFactory,
               double maxPIDelta,
               double maxEvalDelta,
               int maxEvaluationIterations,
               int maxPolicyIterations)
```
    Initializes the planner.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    maxPIDelta - when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
    maxEvalDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate.
    maxEvaluationIterations - when the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
    maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
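    To illustrate how the four termination parameters interact, the following is a minimal, self-contained sketch of tabular policy iteration. It is not BURLAP's implementation: the class `PolicyIterationSketch`, the array-based MDP (`next`, `reward`, `terminal`), and the methods `q`, `improvePolicy`, and `plan` are hypothetical names chosen for illustration, and deterministic transitions are assumed to keep the example short.

```java
/** Tabular policy iteration sketch; all names are illustrative, not BURLAP's API. */
final class PolicyIterationSketch {
    final int[][] next;        // next[s][a]: deterministic successor of s under a (assumption)
    final double[][] reward;   // reward[s][a]: reward for taking a in s
    final boolean[] terminal;  // terminal[s]: terminal states keep a value of 0
    final double gamma, maxPIDelta, maxEvalDelta;
    final int maxEvaluationIterations, maxPolicyIterations;
    final double[] value;      // current value function estimate
    final int[] policy;        // policy[s]: action currently chosen in s
    int policyIterationsRun = 0;

    PolicyIterationSketch(int[][] next, double[][] reward, boolean[] terminal, double gamma,
                          double maxPIDelta, double maxEvalDelta,
                          int maxEvaluationIterations, int maxPolicyIterations) {
        this.next = next;
        this.reward = reward;
        this.terminal = terminal;
        this.gamma = gamma;
        this.maxPIDelta = maxPIDelta;
        this.maxEvalDelta = maxEvalDelta;
        this.maxEvaluationIterations = maxEvaluationIterations;
        this.maxPolicyIterations = maxPolicyIterations;
        this.value = new double[next.length];
        this.policy = new int[next.length];
    }

    /** Q-value of taking action a in state s under the current value estimate. */
    double q(int s, int a) {
        return reward[s][a] + gamma * value[next[s][a]];
    }

    /** Sweeps until the largest change falls below maxEvalDelta or the sweep cap is reached. */
    double evaluatePolicy() {
        double maxChange = 0.0;
        for (int i = 0; i < maxEvaluationIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < value.length; s++) {
                if (terminal[s]) continue;
                double v = q(s, policy[s]);
                delta = Math.max(delta, Math.abs(v - value[s]));
                value[s] = v;
            }
            maxChange = Math.max(maxChange, delta);
            if (delta < maxEvalDelta) break; // evaluation has converged well enough
        }
        return maxChange;
    }

    /** Makes the policy greedy with respect to the current value estimate. */
    void improvePolicy() {
        for (int s = 0; s < policy.length; s++) {
            if (terminal[s]) continue;
            int best = 0;
            for (int a = 1; a < next[s].length; a++) {
                if (q(s, a) > q(s, best)) best = a;
            }
            policy[s] = best;
        }
    }

    /** Alternates evaluation and improvement until either outer termination condition holds. */
    void plan() {
        double delta;
        do {
            delta = evaluatePolicy();
            improvePolicy();
            policyIterationsRun++;
        } while (delta >= maxPIDelta && policyIterationsRun < maxPolicyIterations);
    }

    public static void main(String[] args) {
        // Three-state chain: action 0 stays, action 1 moves right; state 2 is terminal.
        int[][] next = {{0, 1}, {1, 2}, {2, 2}};
        double[][] reward = {{-1, -1}, {-1, 10}, {0, 0}};
        boolean[] terminal = {false, false, true};
        PolicyIterationSketch pi =
                new PolicyIterationSketch(next, reward, terminal, 0.9, 1e-3, 1e-6, 1000, 100);
        pi.plan();
        System.out.println(java.util.Arrays.toString(pi.policy));
        System.out.println(java.util.Arrays.toString(pi.value));
    }
}
```

    In this sketch, passing the same number for `maxPIDelta` and `maxEvalDelta` mirrors what the shorter constructor's single `maxDelta` parameter is documented to do.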
- Method Detail
  - setPolicyClassToEvaluate
```
public void setPolicyClassToEvaluate(PlannerDerivedPolicy p)
```
    Sets which kind of policy to use whenever the policy is updated. The default is a deterministic greedy policy (GreedyDeterministicQPolicy).
    
    Parameters:
    p - the policy to use when updating to the new evaluated value function.
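    As a rough illustration of what a deterministic greedy policy does with a state's Q-values, the sketch below picks the action with the largest value. The class and method names are hypothetical, and breaking ties in favor of the lowest index is an assumption made for this example rather than a statement about GreedyDeterministicQPolicy.

```java
/** Illustrative only: deterministic greedy action selection over an array of Q-values. */
final class GreedyDeterministicSketch {
    /** Returns the index of the largest Q-value; ties go to the lowest index (assumption). */
    static int greedyAction(double[] qValues) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) {
                best = a; // strictly better, so earlier actions win ties
            }
        }
        return best;
    }
}
```

    Because the choice depends only on the Q-values, repeated calls with the same input always select the same action, which is what makes such a policy deterministic.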
  - getComputedPolicy
```
public Policy getComputedPolicy()
```
    Returns the policy that was last computed.
    
    Returns:
    the policy that was last computed.
  - recomputeReachableStates
```
public void recomputeReachableStates()
```
    Calling this method will force the planner to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
  - planFromState
```
public void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state.
    
    Specified by:
    planFromState in class ValueFunctionPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed. Specifically, data produced by calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If other objects provided to this planner may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy whose temperature parameter decreases with time, you will need to reset the policy's temperature yourself.
    
    Overrides:
    resetPlannerResults in class ValueFunctionPlanner
  - evaluatePolicy
```
protected double evaluatePolicy()
```
    Computes the value function obtained by following the current evaluative policy.
    
    Returns:
    the maximum single iteration change in the value function
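    One way to read this return value, written in standard MDP notation, is given below. The symbols are assumptions introduced here and do not appear in the class: \pi is the evaluative policy, T the transition dynamics, R the reward function, \gamma the discount factor, and K the number of evaluation sweeps actually performed.

```latex
% Sweep k of policy evaluation under the fixed policy \pi
V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s' \mid s, a)
             \bigl[ R(s, a, s') + \gamma V_k(s') \bigr]

% Largest change produced by sweep k, and the value returned after K sweeps
\Delta_k = \max_{s} \bigl\lvert V_{k+1}(s) - V_k(s) \bigr\rvert,
\qquad
\text{returned value} = \max_{0 \le k < K} \Delta_k
```

    Under this reading, evaluation stops at the first sweep with \Delta_k smaller than maxEvalDelta, or once the number of sweeps reaches maxIterations.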
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used when computing the value function. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    si - the source state from which all reachable states will be found
    
    Returns:
    true if a reachability analysis had never been performed from this state; false otherwise.
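    The behavior described above can be sketched as a breadth-first search that remembers every state it has already discovered. This is an illustration under stated assumptions, not the class's actual code: states are modeled as integers, and `ReachabilitySketch`, its `successors` function, and `knownStates()` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Illustrative reachability analysis over integer states with a caller-supplied successor function. */
final class ReachabilitySketch {
    private final Set<Integer> knownStates = new HashSet<>();
    private final Function<Integer, List<Integer>> successors;

    ReachabilitySketch(Function<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    /**
     * Expands all states reachable from source.
     * Returns false without doing any work if source was already discovered,
     * since every state reachable from it was then discovered in an earlier call.
     */
    boolean performReachabilityFrom(int source) {
        if (knownStates.contains(source)) {
            return false;
        }
        Deque<Integer> frontier = new ArrayDeque<>();
        knownStates.add(source);
        frontier.add(source);
        while (!frontier.isEmpty()) {
            int state = frontier.poll();
            for (int nextState : successors.apply(state)) {
                // Set.add returns true only for states not seen before.
                if (knownStates.add(nextState)) {
                    frontier.add(nextState);
                }
            }
        }
        return true;
    }

    /** Read-only view of every state discovered so far. */
    Set<Integer> knownStates() {
        return Collections.unmodifiableSet(knownStates);
    }
}
```

    The early return is safe in this sketch because the set of known states is always closed under the successor function after a completed search.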
