PolicyEvaluation

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.planning.stochastic.policyiteration.PolicyEvaluation

All Implemented Interfaces:

MDPSolverInterface, QFunction, QProvider, ValueFunction
```
public class PolicyEvaluation
extends DynamicProgramming
```
This class computes the value function under a specified policy. The value function is computed using tabular Value Iteration with the Bellman operator fixed to the specified policy. After constructing an instance, use the evaluatePolicy(EnumerablePolicy, State) method to evaluate a policy from an initial seed state. You can reuse an instance to evaluate subsequent policies, but each evaluation overwrites the previous value function. To keep the value function that was computed for a policy, use the DynamicProgramming.getCopyOfValueFunction() method.
Alternatively, you can evaluate a policy with the evaluatePolicy(EnumerablePolicy) method, but you must already have seeded the state space by calling the evaluatePolicy(EnumerablePolicy, State) method or the performReachabilityFrom(State) method at least once; otherwise, a runtime exception will be thrown.
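
The procedure described above can be sketched without BURLAP as a plain tabular routine. The following is a minimal, self-contained illustration and not BURLAP's implementation: the class name `FixedPolicyEvaluationSketch`, the array-based model (`pi`, `T`, `R`), the two-state example, the in-place sweep order, and the `int` iteration cap are all assumptions made for this sketch.

```java
import java.util.Arrays;

// Hypothetical, BURLAP-free sketch of tabular policy evaluation.
final class FixedPolicyEvaluationSketch {

    /**
     * Evaluates a fixed stochastic policy on a small tabular MDP.
     * pi[s][a]    = probability that the policy picks action a in state s
     * T[s][a][s2] = probability of moving from s to s2 under action a
     * R[s][a][s2] = reward received for that transition
     */
    static double[] evaluate(double[][] pi, double[][][] T, double[][][] R,
                             double gamma, double maxEvalDelta, int maxEvalIterations) {
        int nStates = pi.length;
        double[] v = new double[nStates]; // value function starts at 0 (assumption)
        for (int i = 0; i < maxEvalIterations; i++) {
            double delta = 0.0; // largest change seen in this sweep
            for (int s = 0; s < nStates; s++) {
                // Bellman backup with the action choice fixed to the policy.
                double newV = 0.0;
                for (int a = 0; a < pi[s].length; a++) {
                    for (int s2 = 0; s2 < nStates; s2++) {
                        newV += pi[s][a] * T[s][a][s2] * (R[s][a][s2] + gamma * v[s2]);
                    }
                }
                delta = Math.max(delta, Math.abs(newV - v[s]));
                v[s] = newV; // in-place update (assumption of this sketch)
            }
            if (delta < maxEvalDelta) {
                break; // value function has stopped changing appreciably
            }
        }
        return v;
    }

    /** Two states, actions "stay" (0) and "switch" (1), reward 1 for landing in state 1. */
    static double[] demo(int maxEvalIterations) {
        double[][] pi = {{0.5, 0.5}, {0.5, 0.5}}; // uniform random policy
        double[][][] T = new double[2][2][2];
        T[0][0][0] = 1.0; // stay in state 0
        T[1][0][1] = 1.0; // stay in state 1
        T[0][1][1] = 1.0; // switch 0 -> 1
        T[1][1][0] = 1.0; // switch 1 -> 0
        double[][][] R = new double[2][2][2];
        for (int s = 0; s < 2; s++) {
            for (int a = 0; a < 2; a++) {
                R[s][a][1] = 1.0;
            }
        }
        return evaluate(pi, T, R, 0.9, 1e-6, maxEvalIterations);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo(10000)));
    }
}
```

For this example the analytic value of both states is 0.5 / (1 - 0.9) = 5, which the sweeps approach as the iteration cap grows. Because the routine overwrites `v` on every call in the same way the class overwrites its value function, a caller that wants to keep a result would copy the returned array before evaluating another policy.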

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`maxEvalDelta` When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
`protected double`	`maxEvalIterations` When the number of evaluation iterations exceeds this number, policy evaluation will terminate.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`PolicyEvaluation(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxEvalDelta, double maxEvalIterations)` Initializes the solver with the given domain, discount factor, hashing factory, and termination criteria.

Method Summary

Modifier and Type	Method and Description
`void`	`evaluatePolicy(EnumerablePolicy policy)` Computes the value function for the given policy over the states that have already been discovered.
`void`	`evaluatePolicy(EnumerablePolicy policy, State s)` Computes the value function for the given policy after finding all states reachable from seed state s.
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used when computing the value function.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, resetSolver, setOperator, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - maxEvalDelta
```
protected double maxEvalDelta
```
    When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
  - maxEvalIterations
```
protected double maxEvalIterations
```
    When the number of evaluation iterations exceeds this number, policy evaluation will terminate.
- Constructor Detail
  - PolicyEvaluation
```
public PolicyEvaluation(SADomain domain,
                        double gamma,
                        HashableStateFactory hashingFactory,
                        double maxEvalDelta,
                        double maxEvalIterations)
```
    Initializes the solver with the given domain, discount factor, hashing factory, and termination criteria.
    
    Parameters:
    
    domain - the domain on which to evaluate a policy
    
    gamma - the discount factor
    
    hashingFactory - the HashableStateFactory used to index states and perform state equality
    
    maxEvalDelta - the threshold on the maximum change in the value function; policy evaluation terminates when the change falls below this value
    
    maxEvalIterations - the maximum number of evaluation iterations to perform before terminating policy evaluation
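
The roles of gamma and maxEvalDelta can be stated as a formula. This is a sketch in conventional MDP notation rather than notation taken from BURLAP: the symbols \pi (the evaluated policy), T (transition probabilities), and R (rewards) are assumptions, and the update is written in synchronous form, whereas an in-place sweep would use the most recent values on the right-hand side.

```latex
V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s' \mid s, a)\,
             \bigl[ R(s, a, s') + \gamma\, V_k(s') \bigr]

\text{terminate when}\quad
\max_{s} \bigl| V_{k+1}(s) - V_k(s) \bigr| < \texttt{maxEvalDelta}
```

The maxEvalIterations parameter bounds the number of sweeps k independently of this test, so evaluation also ends when the value function has not yet settled within the cap.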
- Method Detail
  - evaluatePolicy
```
public void evaluatePolicy(EnumerablePolicy policy,
                           State s)
```
    Computes the value function for the given policy after finding all states reachable from seed state s.
    
    Parameters:
    
    policy - the policy to evaluate
    
    s - the initial seed state from which to find all reachable states
  - evaluatePolicy
```
public void evaluatePolicy(EnumerablePolicy policy)
```
    Computes the value function for the given policy over the states that have already been discovered.
    
    Parameters:
    
    policy - the Policy to evaluate
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used when computing the value function. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    
    si - the source state from which all reachable states will be found
    
    Returns:
    
    true if a reachability analysis had never been performed from this state; false otherwise.
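
The reachability contract described above can be illustrated with a plain breadth-first search. This is a hypothetical, BURLAP-free sketch: the class `ReachabilitySketch`, the integer state ids, the successor map, and the `requireSeeded` helper are assumptions made for illustration, and treating an already discovered source state as "nothing to do" is one possible reading of the documented behavior.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a reachability analysis over integer state ids.
final class ReachabilitySketch {
    private final Map<Integer, List<Integer>> successors;
    private final Set<Integer> discovered = new HashSet<>();

    ReachabilitySketch(Map<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    /** Returns true if a new search was run; false if the source was already discovered. */
    boolean performReachabilityFrom(int source) {
        if (discovered.contains(source)) {
            return false; // everything reachable from here was found by an earlier call
        }
        Deque<Integer> open = new ArrayDeque<>();
        open.add(source);
        discovered.add(source);
        while (!open.isEmpty()) {
            int s = open.poll();
            for (int next : successors.getOrDefault(s, Collections.<Integer>emptyList())) {
                if (discovered.add(next)) { // add() returns true only for unseen states
                    open.add(next);
                }
            }
        }
        return true;
    }

    Set<Integer> discoveredStates() {
        return Collections.unmodifiableSet(discovered);
    }

    /** Mirrors the documented precondition for evaluating without a seed state. */
    void requireSeeded() {
        if (discovered.isEmpty()) {
            throw new IllegalStateException("No states discovered; run a reachability analysis first");
        }
    }

    /** Small example graph: 0 -> 1 -> 2, and 3 -> 0. */
    static Map<Integer, List<Integer>> demoGraph() {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(0, Arrays.asList(1));
        g.put(1, Arrays.asList(2));
        g.put(3, Arrays.asList(0));
        return g;
    }

    public static void main(String[] args) {
        ReachabilitySketch r = new ReachabilitySketch(demoGraph());
        System.out.println(r.performReachabilityFrom(0));
        System.out.println(r.performReachabilityFrom(1));
        System.out.println(r.discoveredStates());
    }
}
```

In this sketch a second call from a state that an earlier search already covered returns false without doing any work, while a call from a state outside the discovered set extends it.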
