PolicyEvaluation

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.planning.stochastic.policyiteration.PolicyEvaluation

All Implemented Interfaces:

MDPSolverInterface, QFunction, ValueFunction
```
public class PolicyEvaluation
extends DynamicProgramming
```
This class computes the value function under a specified policy. The value function is computed using tabular Value Iteration with the Bellman operator fixed to the specified policy. After constructing an instance, use the evaluatePolicy(burlap.behavior.policy.Policy, burlap.oomdp.core.states.State) method to evaluate a policy from some initial seed state. You can reuse an instance to evaluate different policies in turn, but each evaluation overwrites the previously computed value function. To keep the value function that was computed for some policy, call the DynamicProgramming.getCopyOfValueFunction() method before evaluating the next policy.

Alternatively, you can evaluate a policy with the evaluatePolicy(burlap.behavior.policy.Policy) method, but you must have already seeded the state space by calling the evaluatePolicy(burlap.behavior.policy.Policy, burlap.oomdp.core.states.State) method or the performReachabilityFrom(burlap.oomdp.core.states.State) method at least once; otherwise, a runtime exception will be thrown.
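
The evaluation procedure described above can be illustrated with a minimal, self-contained sketch. It does not use BURLAP and is not the library's implementation: the class name FixedPolicyEvaluationSketch, the array-based MDP encoding, the zero initialization, and the in-place sweep order are all assumptions made for illustration. Only the two stopping rules (maxEvalDelta and maxEvalIterations) mirror the fields documented on this page.

```java
/**
 * Illustrative sketch only (not BURLAP code): tabular policy evaluation
 * with the Bellman backup fixed to a deterministic policy.
 */
final class FixedPolicyEvaluationSketch {

    /**
     * Hypothetical encoding: transition[s][a][s2] is the probability of
     * moving from s to s2 under action a, reward[s][a][s2] is the reward
     * for that transition, policy[s] is the action chosen in s, and
     * terminal[s] marks terminal states, whose value stays at zero.
     */
    public static double[] evaluate(double[][][] transition, double[][][] reward,
                                    int[] policy, boolean[] terminal, double gamma,
                                    double maxEvalDelta, double maxEvalIterations) {
        int n = transition.length;
        double[] v = new double[n]; // assumed initialization: all zeros

        for (int i = 0; i < maxEvalIterations; i++) {
            double delta = 0.0; // largest value change seen in this sweep
            for (int s = 0; s < n; s++) {
                if (terminal[s]) {
                    continue;
                }
                int a = policy[s]; // the backup uses only the policy's action
                double newV = 0.0;
                for (int s2 = 0; s2 < n; s2++) {
                    double p = transition[s][a][s2];
                    if (p > 0.0) {
                        newV += p * (reward[s][a][s2] + gamma * v[s2]);
                    }
                }
                delta = Math.max(delta, Math.abs(newV - v[s]));
                v[s] = newV; // in-place update
            }
            if (delta < maxEvalDelta) {
                break; // the value function changed by less than the threshold
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2 (terminal), one action, reward -1 per step.
        double[][][] t = {{{0, 1, 0}}, {{0, 0, 1}}, {{0, 0, 1}}};
        double[][][] r = {{{0, -1, 0}}, {{0, 0, -1}}, {{0, 0, 0}}};
        double[] v = evaluate(t, r, new int[]{0, 0, 0},
                new boolean[]{false, false, true}, 0.9, 0.001, 100);
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

In this sketch a sweep ends early once the largest per-state change drops below maxEvalDelta, and the loop gives up after maxEvalIterations sweeps regardless of convergence, so the returned values may be only approximate when the iteration cap is small.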

Author:

James MacGlashan.

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
  DynamicProgramming.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QFunction
  QFunction.QFunctionHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`maxEvalDelta` When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
`protected double`	`maxEvalIterations` When the number of evaluation iterations exceeds this number, policy evaluation will terminate.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
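`PolicyEvaluation(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxEvalDelta, double maxEvalIterations)` Initializes the solver with the domain, reward and terminal functions, discount factor, state hashing factory, and termination thresholds.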

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`evaluatePolicy(Policy policy)` Computes the value function for the given policy over the states that have been discovered.
`void`	`evaluatePolicy(Policy policy, State s)` Computes the value function for the given policy after finding all reachable states from the seed state s.
`boolean`	`performReachabilityFrom(State si)` This method will find all reachable states that will be used when computing the value function.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetSolver, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - maxEvalDelta
```
protected double maxEvalDelta
```
    When the maximum change in the value function is smaller than this value, policy evaluation will terminate.
  - maxEvalIterations
```
protected double maxEvalIterations
```
    When the number of evaluation iterations exceeds this number, policy evaluation will terminate.
- Constructor Detail
  - PolicyEvaluation
```
public PolicyEvaluation(Domain domain,
                RewardFunction rf,
                TerminalFunction tf,
                double gamma,
                HashableStateFactory hashingFactory,
                double maxEvalDelta,
                double maxEvalIterations)
```
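    Initializes the solver with the domain, reward and terminal functions, discount factor, state hashing factory, and termination thresholds.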
    
    Parameters:
    domain - the domain on which to evaluate a policy
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the HashableStateFactory used to index states and perform state equality
    maxEvalDelta - the threshold on the maximum change in the value function; policy evaluation terminates once the maximum change is smaller than this value
    maxEvalIterations - the maximum number of evaluation iterations to perform before terminating policy evaluation
- Method Detail
  - evaluatePolicy
```
public void evaluatePolicy(Policy policy,
                  State s)
```
    Computes the value function for the given policy after finding all reachable states from the seed state s.
    
    Parameters:
    policy - the Policy to evaluate
    s - the initial seed state from which to find all reachable states
  - evaluatePolicy
```
public void evaluatePolicy(Policy policy)
```
    Computes the value function for the given policy over the states that have been discovered.
    
    Parameters:
    policy - the Policy to evaluate
  - performReachabilityFrom
```
public boolean performReachabilityFrom(State si)
```
    This method will find all reachable states that will be used when computing the value function. This method will not do anything if all reachable states from the input state have been discovered from previous calls to this method.
    
    Parameters:
    si - the source state from which all reachable states will be found
    
    Returns:
    true if a reachability analysis had never been performed from this state; false otherwise.
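
    The reachability behavior described above can be sketched as a breadth-first expansion over successor states. This is a hedged illustration, not BURLAP's implementation: the class ReachabilitySketch, the String state labels, and the successors map are hypothetical stand-ins for hashed states and the domain's transition dynamics.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only (not BURLAP code): seed-based state discovery. */
final class ReachabilitySketch {

    // Hypothetical stand-in for transition dynamics: state -> possible next states.
    private final Map<String, List<String>> successors;

    // States found so far; later evaluations would sweep over this set.
    private final Set<String> discovered = new HashSet<>();

    ReachabilitySketch(Map<String, List<String>> successors) {
        this.successors = successors;
    }

    /**
     * Expands all states reachable from the seed.
     * Returns false without doing any work if the seed was already discovered,
     * and true if a new expansion was performed.
     */
    boolean performReachabilityFrom(String seed) {
        if (discovered.contains(seed)) {
            return false;
        }
        Deque<String> open = new ArrayDeque<>();
        open.add(seed);
        discovered.add(seed);
        while (!open.isEmpty()) {
            String s = open.poll();
            List<String> next = successors.getOrDefault(s, Collections.<String>emptyList());
            for (String s2 : next) {
                if (discovered.add(s2)) { // add returns true only for unseen states
                    open.add(s2);
                }
            }
        }
        return true;
    }

    Set<String> getAllStates() {
        return Collections.unmodifiableSet(discovered);
    }
}
```

    In this sketch, a second call with a seed that an earlier expansion already reached returns false immediately, which mirrors the documented behavior of skipping redundant reachability analyses.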
