public class PolicyEvaluation extends DynamicProgramming
Use the evaluatePolicy(burlap.behavior.policy.Policy, burlap.oomdp.core.states.State) method to evaluate a policy from some initial seed state. You can reuse this class to evaluate different subsequent policies, but doing so will overwrite the value function. If you want to save the value function that was computed for some policy, use the DynamicProgramming.getCopyOfValueFunction() method.
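The overwrite behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the library: the class `ValueFunctionCopySketch`, its `evaluate` method, and the integer state ids are hypothetical stand-ins, and the only point is that a snapshot taken before the next evaluation keeps the earlier values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an evaluator that keeps one internal value table.
class ValueFunctionCopySketch {

    // Internal value table (state id -> value); reused across evaluations.
    private final Map<Integer, Double> valueFunction = new HashMap<>();

    /** Pretend evaluation: stores one value per state, overwriting earlier results. */
    void evaluate(double[] valuesUnderPolicy) {
        for (int s = 0; s < valuesUnderPolicy.length; s++) {
            valueFunction.put(s, valuesUnderPolicy[s]);
        }
    }

    /** Returns an independent snapshot that later evaluations cannot modify. */
    Map<Integer, Double> getCopyOfValueFunction() {
        return new HashMap<>(valueFunction);
    }

    public static void main(String[] args) {
        ValueFunctionCopySketch evaluator = new ValueFunctionCopySketch();

        evaluator.evaluate(new double[] {0.5, 1.0});      // first policy
        Map<Integer, Double> saved = evaluator.getCopyOfValueFunction();

        evaluator.evaluate(new double[] {2.0, 3.0});      // second policy overwrites the table

        System.out.println("saved:   " + saved);
        System.out.println("current: " + evaluator.getCopyOfValueFunction());
    }
}
```

Taking the copy before the second evaluation is what preserves the first policy's values; holding only a reference to the internal table would not.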
You can also call the evaluatePolicy(burlap.behavior.policy.Policy) method, but you should have already seeded the state space by calling the evaluatePolicy(burlap.behavior.policy.Policy, burlap.oomdp.core.states.State) method or the performReachabilityFrom(burlap.oomdp.core.states.State) method at least once previously; otherwise, a runtime exception will be thrown.
Nested classes inherited from DynamicProgramming: DynamicProgramming.StaticVFPlanner
Nested classes inherited from QFunction: QFunction.QFunctionHelper
Modifier and Type | Field and Description |
---|---|
protected double | maxEvalDelta: When the maximum change in the value function is smaller than this value, policy evaluation will terminate. |
protected double | maxEvalIterations: When the number of evaluation iterations exceeds this value, policy evaluation will terminate. |
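How the two termination fields interact can be sketched with a small tabular example. This is an illustration under stated assumptions, not the library's implementation: the class `PolicyEvaluationSketch`, the array-based model (`next`, `reward`, `terminal`), and the deterministic policy are all hypothetical. The loop stops as soon as either the largest value change in a sweep drops below `maxEvalDelta` or the sweep count reaches `maxEvalIterations`.

```java
import java.util.Arrays;

// Hypothetical, library-free sketch of iterative policy evaluation.
class PolicyEvaluationSketch {

    /**
     * Evaluates a fixed deterministic policy on a tabular model.
     * next[s] is the state reached from s under the policy, reward[s] is the
     * reward for that step, and terminal[s] marks states whose value stays 0.
     * Values are updated in place; the number of sweeps performed is returned.
     */
    static int evaluate(int[] next, double[] reward, boolean[] terminal,
                        double gamma, double maxEvalDelta, double maxEvalIterations,
                        double[] values) {
        int sweeps = 0;
        while (sweeps < maxEvalIterations) {
            double maxChange = 0.0;
            for (int s = 0; s < values.length; s++) {
                if (terminal[s]) {
                    continue; // terminal states are never updated
                }
                // Fixed-policy Bellman update for a deterministic transition.
                double updated = reward[s] + gamma * values[next[s]];
                maxChange = Math.max(maxChange, Math.abs(updated - values[s]));
                values[s] = updated;
            }
            sweeps++;
            if (maxChange < maxEvalDelta) {
                break; // converged: largest change fell below the threshold
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2 (terminal); reward 1.0 for the step from state 1 into state 2.
        int[] next = {1, 2, 2};
        double[] reward = {0.0, 1.0, 0.0};
        boolean[] terminal = {false, false, true};
        double[] values = new double[3];

        int sweeps = evaluate(next, reward, terminal, 0.9, 1e-6, 100, values);
        // prints: 3 sweeps, values = [0.9, 1.0, 0.0]
        System.out.println(sweeps + " sweeps, values = " + Arrays.toString(values));
    }
}
```

Setting the iteration cap to 1 in this example would stop after a single sweep with state 0 still at its initial value, which shows why the cap bounds run time but not accuracy.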
Fields inherited from DynamicProgramming: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
Fields inherited from MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
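PolicyEvaluation(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxEvalDelta, double maxEvalIterations): Initializes the evaluator with the domain, reward function, terminal function, discount factor, state hashing factory, and termination thresholds. |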
Modifier and Type | Method and Description |
---|---|
void | evaluatePolicy(Policy policy): Computes the value function for the given policy over the states that have been discovered. |
void | evaluatePolicy(Policy policy, State s): Computes the value function for the given policy after finding all reachable states from seed state s. |
boolean | performReachabilityFrom(State si): Finds all reachable states that will be used when computing the value function. |
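The seeding requirement behind these methods can be sketched as a breadth-first search over a transition model, followed by a guard that rejects evaluation when no states are known. Everything here is a hypothetical stand-in: the class `ReachabilitySketch`, the `successors` array, integer state ids, and the meaning given to the boolean return value (true when the seed was not already known) are assumptions of this sketch, not statements about the library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, library-free sketch of state-space seeding.
class ReachabilitySketch {

    // States discovered so far; empty until a seed state is supplied.
    private final Set<Integer> knownStates = new LinkedHashSet<>();
    // successors[s] lists the states reachable from s in one step (assumed model).
    private final int[][] successors;

    ReachabilitySketch(int[][] successors) {
        this.successors = successors;
    }

    /** Breadth-first search from the seed; returns true if the seed was not already known. */
    boolean performReachabilityFrom(int seed) {
        if (knownStates.contains(seed)) {
            return false; // assumption: a known seed was fully expanded by an earlier search
        }
        Deque<Integer> frontier = new ArrayDeque<>();
        knownStates.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            int s = frontier.poll();
            for (int sp : successors[s]) {
                if (knownStates.add(sp)) { // add() is true only for newly seen states
                    frontier.add(sp);
                }
            }
        }
        return true;
    }

    /** Mirrors the documented precondition: evaluating before seeding is an error. */
    void evaluatePolicy() {
        if (knownStates.isEmpty()) {
            throw new RuntimeException("No states known; seed the state space first.");
        }
        // Value-function sweeps over knownStates would go here.
    }

    Set<Integer> getAllStates() {
        return knownStates;
    }

    public static void main(String[] args) {
        int[][] successors = { {1}, {2}, {2}, {0} }; // state 3 cannot be reached from 0
        ReachabilitySketch sketch = new ReachabilitySketch(successors);
        System.out.println(sketch.performReachabilityFrom(0)); // prints: true
        System.out.println(sketch.getAllStates());             // prints: [0, 1, 2]
        sketch.evaluatePolicy();                               // allowed once seeded
    }
}
```

Calling `evaluatePolicy()` on a fresh instance throws, which matches the order of operations the class description asks for: seed first, then evaluate.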
Methods inherited from DynamicProgramming: computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetSolver, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value
Methods inherited from MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
protected double maxEvalDelta
protected double maxEvalIterations
public PolicyEvaluation(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxEvalDelta, double maxEvalIterations)
domain - the domain on which to evaluate a policy
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the HashableStateFactory used to index states and perform state equality
maxEvalDelta - the threshold on the maximum change in the value function; policy evaluation terminates once the change falls below this value
maxEvalIterations - the maximum number of evaluation iterations to perform before terminating policy evaluation
public void evaluatePolicy(Policy policy, State s)
policy - the Policy to evaluate
s - the initial seed state from which to find all reachable states
public void evaluatePolicy(Policy policy)
policy - the Policy to evaluate
public boolean performReachabilityFrom(State si)
si - the source state from which all reachable states will be found