public class PolicyIteration extends ValueFunctionPlanner
ValueFunctionPlanner.StaticVFPlanner
QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected PlannerDerivedPolicy | evaluativePolicy: The current policy to be evaluated. |
protected boolean | foundReachableStates: Indicates whether the reachable states have been computed yet. |
protected double | maxEvalDelta: When the maximum change in the value function is smaller than this value, policy evaluation will terminate. |
protected int | maxIterations: When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate. |
protected double | maxPIDelta: When the maximum change between policy evaluations is smaller than this value, planning will terminate. |
protected int | maxPolicyIterations: When the number of policy iterations passes this value, planning will terminate. |
Fields inherited from class ValueFunctionPlanner: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations): Initializes the planner with separate termination thresholds for policy iteration (maxPIDelta) and policy evaluation (maxEvalDelta). |
PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations): Initializes the planner with a single threshold, maxDelta, used for both policy iteration and policy evaluation. |
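The four termination parameters taken by the constructors can be read as the bounds of two nested loops: an inner policy evaluation loop (maxEvalDelta, maxEvaluationIterations) and an outer policy iteration loop (maxPIDelta, maxPolicyIterations). The following is a minimal, self-contained sketch of that structure on a toy three-state problem. The class TinyPolicyIteration, its fields, and the toy transition table are hypothetical and are not part of this library; in particular, measuring the outer-loop change as the largest difference between successive evaluated value functions is an assumption based on the parameter descriptions, not a statement about how this class is implemented.

```java
/** Hypothetical toy planner for illustration only; not this library's implementation. */
final class TinyPolicyIteration {
    // Toy deterministic MDP (an assumption of this sketch): NEXT[s][a] is the successor state.
    // Action 0 stays in place, action 1 advances; state 2 is terminal with value 0.
    static final int[][] NEXT = {{0, 1}, {1, 2}, {2, 2}};
    static final int TERMINAL = 2;
    static final double STEP_REWARD = -1.0; // reward for every step taken from a non-terminal state

    final double gamma, maxPIDelta, maxEvalDelta;
    final int maxEvaluationIterations, maxPolicyIterations;
    final double[] value = new double[NEXT.length];
    final int[] policy = new int[NEXT.length]; // starts as "stay" (action 0) everywhere
    int policyIterationsRun = 0;

    TinyPolicyIteration(double gamma, double maxPIDelta, double maxEvalDelta,
                        int maxEvaluationIterations, int maxPolicyIterations) {
        this.gamma = gamma;
        this.maxPIDelta = maxPIDelta;
        this.maxEvalDelta = maxEvalDelta;
        this.maxEvaluationIterations = maxEvaluationIterations;
        this.maxPolicyIterations = maxPolicyIterations;
    }

    /** Single-threshold variant: the same maxDelta bounds both loops. */
    TinyPolicyIteration(double gamma, double maxDelta,
                        int maxEvaluationIterations, int maxPolicyIterations) {
        this(gamma, maxDelta, maxDelta, maxEvaluationIterations, maxPolicyIterations);
    }

    double q(int s, int a) {
        return STEP_REWARD + gamma * value[NEXT[s][a]];
    }

    /** Inner loop: sweeps under the fixed policy until a sweep changes every value by less than maxEvalDelta. */
    void evaluatePolicy() {
        for (int i = 0; i < maxEvaluationIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < NEXT.length; s++) {
                if (s == TERMINAL) continue;
                double updated = q(s, policy[s]);
                delta = Math.max(delta, Math.abs(updated - value[s]));
                value[s] = updated;
            }
            if (delta < maxEvalDelta) break;
        }
    }

    /** Greedy improvement: switch to an action only if it is strictly better than the current one. */
    void improvePolicy() {
        for (int s = 0; s < NEXT.length; s++) {
            if (s == TERMINAL) continue;
            for (int a = 0; a < NEXT[s].length; a++) {
                if (q(s, a) > q(s, policy[s])) policy[s] = a;
            }
        }
    }

    /** Outer loop: alternate evaluation and improvement until the value function settles. */
    void plan() {
        double delta;
        do {
            double[] before = value.clone();
            evaluatePolicy();
            improvePolicy();
            policyIterationsRun++;
            delta = 0.0;
            for (int s = 0; s < value.length; s++) {
                delta = Math.max(delta, Math.abs(value[s] - before[s]));
            }
        } while (delta >= maxPIDelta && policyIterationsRun < maxPolicyIterations);
    }
}
```

With a discount of 0.9, calling plan() on this toy problem should settle on the "advance" action in both non-terminal states. Setting maxPolicyIterations to 1 stops the outer loop after a single evaluation and improvement pass regardless of the delta thresholds.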
Modifier and Type | Method and Description |
---|---|
protected double | evaluatePolicy(): Computes the value function under the current evaluative policy. |
Policy | getComputedPolicy(): Returns the policy that was last computed. |
boolean | performReachabilityFrom(State si): Finds all reachable states that will be used when computing the value function. |
void | planFromState(State initialState): Causes the planner to begin planning from the specified initial state. |
void | recomputeReachableStates(): Forces the planner to recompute the reachable states the next time planFromState(State) is called. |
void | resetPlannerResults(): Resets all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. |
void | setPolicyClassToEvaluate(PlannerDerivedPolicy p): Sets which kind of policy to use whenever the policy is updated. |
Methods inherited from class ValueFunctionPlanner: computeQ, computeQ, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value, VFPInit
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected double maxEvalDelta
protected double maxPIDelta
protected int maxIterations
protected int maxPolicyIterations
protected PlannerDerivedPolicy evaluativePolicy
protected boolean foundReachableStates
public PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations)
Initializes the planner.
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use
maxDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate. Similarly, when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
maxEvaluationIterations - when the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
public PolicyIteration(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations)
Initializes the planner.
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use
maxPIDelta - when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
maxEvalDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate.
maxEvaluationIterations - when the number of policy evaluation iterations exceeds this value, policy evaluation will terminate.
maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
public void setPolicyClassToEvaluate(PlannerDerivedPolicy p)
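Sets which kind of policy to use whenever the policy is updated (see GreedyDeterministicQPolicy).
p - the policy to use when updating to the new evaluated value function.
public Policy getComputedPolicy()
Returns the policy that was last computed.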
public void recomputeReachableStates()
Calling this method will force the planner to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics have changed since the last planning call and planning needs to be restarted as a result.
public void planFromState(State initialState)
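Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
planFromState in class ValueFunctionPlanner
initialState - the initial state of the planning problem
public void resetPlannerResults()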
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If other objects provided to this planner may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy whose temperature parameter decreases with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class ValueFunctionPlanner
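The distinction drawn above between planner results (cleared), planner settings (kept), and externally supplied objects (left for the caller to reset) can be sketched as follows. Both classes are hypothetical stand-ins written for this illustration: AnnealingPolicy, ResettablePlanner, and their fields do not exist in this library, and clearing the reachability flag on reset is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical policy whose temperature decreases each time it is used. */
final class AnnealingPolicy {
    final double initialTemperature;
    double temperature;

    AnnealingPolicy(double initialTemperature) {
        this.initialTemperature = initialTemperature;
        this.temperature = initialTemperature;
    }

    void step() { temperature *= 0.5; }              // cool down a little with every use
    void reset() { temperature = initialTemperature; }
}

/** Hypothetical planner skeleton; names are illustrative only. */
final class ResettablePlanner {
    final double gamma;                               // a setting: survives a reset
    final AnnealingPolicy policy;                     // supplied by the caller: not reset by the planner
    final Map<String, Double> valueFunction = new HashMap<>(); // a result: cleared on reset
    boolean foundReachableStates = false;             // a result flag: cleared on reset (assumption)

    ResettablePlanner(double gamma, AnnealingPolicy policy) {
        this.gamma = gamma;
        this.policy = policy;
    }

    void planFromState(String initialState) {
        foundReachableStates = true;
        valueFunction.put(initialState, 0.0);         // placeholder for real planning output
        policy.step();                                // planning consumes the external policy
    }

    void resetPlannerResults() {
        valueFunction.clear();
        foundReachableStates = false;
        // Deliberately leaves gamma and the external policy untouched.
    }
}
```

After a reset, the caller would invoke policy.reset() explicitly before planning again if the original temperature is wanted.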
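protected double evaluatePolicy()
Computes the value function under the current evaluative policy.
public boolean performReachabilityFrom(State si)
This method will find all reachable states that will be used when computing the value function.
si - the source state from which all reachable states will be found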
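One way to picture how performReachabilityFrom, the foundReachableStates flag, and recomputeReachableStates might interact is a breadth-first search whose result is cached until the flag is cleared. The sketch below is self-contained and hypothetical: ReachabilitySketch, the integer state encoding, and the successor map are assumptions made for illustration, and the meaning given to the boolean return value (true when a new search was actually run) is a guess, since it is not stated here.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of cached reachability analysis; not this library's code. */
final class ReachabilitySketch {
    final Map<Integer, List<Integer>> successors;     // hypothetical transition graph: state -> next states
    final Set<Integer> reachable = new HashSet<>();
    boolean foundReachableStates = false;

    ReachabilitySketch(Map<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    /** Returns true if a new search was run, false if the cached result was reused (assumed semantics). */
    boolean performReachabilityFrom(int si) {
        if (foundReachableStates) {
            return false;
        }
        reachable.clear();                            // discard states found under older dynamics
        Deque<Integer> open = new ArrayDeque<>();
        open.add(si);
        reachable.add(si);
        while (!open.isEmpty()) {
            int s = open.poll();
            for (int next : successors.getOrDefault(s, Collections.emptyList())) {
                if (reachable.add(next)) {            // add() is true only for newly seen states
                    open.add(next);
                }
            }
        }
        foundReachableStates = true;
        return true;
    }

    /** Forces the next performReachabilityFrom call to search again. */
    void recomputeReachableStates() {
        foundReachableStates = false;
    }
}
```

In this sketch, a change to the successor map has no effect on the cached set until recomputeReachableStates() is called, which mirrors the situation described for changed transition dynamics.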