public class PolicyIteration extends DynamicProgramming implements Planner
QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected EnumerablePolicy | evaluativePolicy: The current policy to be evaluated. |
protected boolean | foundReachableStates: Indicates whether the reachable states have been computed yet. |
protected boolean | hasRunPlanning: Indicates whether planning has been run at least once. |
protected double | maxEvalDelta: When the maximum change in the value function is smaller than this value, policy evaluation will terminate. |
protected int | maxIterations: When the number of policy evaluation iterations exceeds this value, policy evaluation will terminate. |
protected double | maxPIDelta: When the maximum change between policy evaluations is smaller than this value, planning will terminate. |
protected int | maxPolicyIterations: When the number of policy iterations passes this value, planning will terminate. |
protected int | totalPolicyIterations: The total number of policy iterations performed. |
protected int | totalValueIterations: The total number of value iterations performed to evaluate policies. |
operator, valueFunction, valueInitializer
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
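The roles of maxEvalDelta and maxIterations can be illustrated with a small, self-contained sketch of iterative policy evaluation. This is not BURLAP code: the class PolicyEvaluationSketch, its Result holder, and the array-based Markov chain are hypothetical, and treating maxIterations as a hard cap on the number of sweeps is an assumption about how "exceeds" is interpreted.

```java
import java.util.Arrays;

/** Hypothetical, self-contained sketch; not part of BURLAP. */
final class PolicyEvaluationSketch {

    /** Holds the final value estimates and the number of sweeps performed. */
    static final class Result {
        final double[] values;
        final int sweeps;

        Result(double[] values, int sweeps) {
            this.values = values;
            this.sweeps = sweeps;
        }
    }

    /**
     * Evaluates a fixed policy on a small Markov chain.
     * transition[s][t] is the probability of moving from s to t under the policy;
     * reward[s] is the expected immediate reward received in s.
     */
    static Result evaluate(double[][] transition, double[] reward, double gamma,
                           double maxEvalDelta, int maxIterations) {
        int n = reward.length;
        double[] v = new double[n];
        int sweeps = 0;
        double delta;
        do {
            delta = 0.0;
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                double expected = 0.0;
                for (int t = 0; t < n; t++) {
                    expected += transition[s][t] * v[t];
                }
                next[s] = reward[s] + gamma * expected;
                // Track the largest change of any state value in this sweep.
                delta = Math.max(delta, Math.abs(next[s] - v[s]));
            }
            v = next;
            sweeps++;
            // Stop once the change is below the threshold or the sweep cap is reached.
        } while (delta >= maxEvalDelta && sweeps < maxIterations);
        return new Result(v, sweeps);
    }

    public static void main(String[] args) {
        // State 0 always moves to state 1; state 1 loops on itself with reward 1.
        double[][] p = { {0.0, 1.0}, {0.0, 1.0} };
        double[] r = { 0.0, 1.0 };
        Result res = evaluate(p, r, 0.9, 1e-6, 1000);
        System.out.println(Arrays.toString(res.values) + " after " + res.sweeps + " sweeps");
    }
}
```

With a small threshold the values approach the exact solution (10 for the looping state, 9 for its predecessor); with a low sweep cap the loop stops early and returns a rougher estimate.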
Constructor and Description |
---|
PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations): Initializes the valueFunction. |
PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations): Initializes the valueFunction. |
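The two constructors differ only in whether one threshold or two are supplied: per the parameter descriptions, maxDelta serves as both the policy evaluation threshold and the policy iteration threshold. The sketch below shows one way such a relationship could be expressed. PlannerSettingsSketch is a hypothetical class, and the constructor delegation is an assumption, not a claim about how BURLAP implements it.

```java
/** Hypothetical holder for the termination settings; not part of BURLAP. */
final class PlannerSettingsSketch {
    final double maxPIDelta;
    final double maxEvalDelta;
    final int maxEvaluationIterations;
    final int maxPolicyIterations;

    /** Separate thresholds for the outer (policy) loop and the inner (evaluation) loop. */
    PlannerSettingsSketch(double maxPIDelta, double maxEvalDelta,
                          int maxEvaluationIterations, int maxPolicyIterations) {
        this.maxPIDelta = maxPIDelta;
        this.maxEvalDelta = maxEvalDelta;
        this.maxEvaluationIterations = maxEvaluationIterations;
        this.maxPolicyIterations = maxPolicyIterations;
    }

    /** Single threshold: maxDelta is applied to both loops. */
    PlannerSettingsSketch(double maxDelta, int maxEvaluationIterations, int maxPolicyIterations) {
        this(maxDelta, maxDelta, maxEvaluationIterations, maxPolicyIterations);
    }

    public static void main(String[] args) {
        PlannerSettingsSketch shared = new PlannerSettingsSketch(0.001, 100, 50);
        System.out.println(shared.maxPIDelta + " " + shared.maxEvalDelta);
    }
}
```

Supplying separate thresholds is useful when a coarse inner evaluation is acceptable but the outer loop should run until the value function has settled.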
Modifier and Type | Method and Description |
---|---|
protected double | evaluatePolicy(): Computes the value function obtained by following the current evaluative policy. |
Policy | getComputedPolicy(): Returns the policy that was last computed (or the initial policy if no planning has been performed). |
int | getTotalPolicyIterations(): Returns the total number of policy iterations that have been performed. |
int | getTotalValueIterations(): Returns the total number of value iterations used to evaluate policies. |
boolean | performReachabilityFrom(State si): This method will find all reachable states that will be used when computing the value function. |
GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly. |
void | recomputeReachableStates(): Calling this method will force the valueFunction to recompute the reachable states when the planFromState(State) method is called next. |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
void | setPolicyToEvaluate(EnumerablePolicy p): Sets the initial policy that will be evaluated when planning with policy iteration begins. |
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
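To show how the pieces summarized above fit together, here is a minimal, self-contained sketch of policy iteration on a deterministic toy MDP. It is not BURLAP's implementation: the class PolicyIterationSketch, its array-based model, and the exact stopping tests are assumptions chosen to mirror the field descriptions (maxEvalDelta and maxIterations bound the inner evaluation loop, maxPIDelta and maxPolicyIterations bound the outer loop). For reproducibility the improvement step picks the lowest-indexed best action instead of breaking ties randomly.

```java
import java.util.Arrays;

/** Hypothetical, self-contained sketch; not part of BURLAP. */
final class PolicyIterationSketch {
    // Deterministic toy MDP: next[s][a] is the successor state, reward[s][a] the reward.
    final int[][] next;
    final double[][] reward;
    final double gamma;

    double[] values;
    int[] policy;
    int totalPolicyIterations = 0;
    int totalValueIterations = 0;

    PolicyIterationSketch(int[][] next, double[][] reward, double gamma) {
        this.next = next;
        this.reward = reward;
        this.gamma = gamma;
        this.values = new double[next.length];
        this.policy = new int[next.length]; // initial policy: action 0 in every state
    }

    double q(int s, int a) {
        return reward[s][a] + gamma * values[next[s][a]];
    }

    /** Inner loop: sweep until the largest change is below maxEvalDelta or the cap is hit. */
    void evaluatePolicy(double maxEvalDelta, int maxIterations) {
        double delta;
        int sweeps = 0;
        do {
            delta = 0.0;
            double[] updated = new double[values.length];
            for (int s = 0; s < values.length; s++) {
                updated[s] = q(s, policy[s]);
                delta = Math.max(delta, Math.abs(updated[s] - values[s]));
            }
            values = updated;
            sweeps++;
            totalValueIterations++;
        } while (delta >= maxEvalDelta && sweeps < maxIterations);
    }

    /** Makes the policy greedy with respect to the current values (lowest index wins ties). */
    void improvePolicy() {
        for (int s = 0; s < policy.length; s++) {
            int best = 0;
            for (int a = 1; a < next[s].length; a++) {
                if (q(s, a) > q(s, best)) {
                    best = a;
                }
            }
            policy[s] = best;
        }
    }

    /** Outer loop: alternate evaluation and improvement until the values stop changing. */
    void plan(double maxPIDelta, double maxEvalDelta, int maxIterations, int maxPolicyIterations) {
        double piDelta;
        int iterations = 0;
        do {
            double[] before = values.clone();
            evaluatePolicy(maxEvalDelta, maxIterations);
            piDelta = 0.0;
            for (int s = 0; s < values.length; s++) {
                piDelta = Math.max(piDelta, Math.abs(values[s] - before[s]));
            }
            improvePolicy();
            iterations++;
            totalPolicyIterations++;
        } while (piDelta >= maxPIDelta && iterations < maxPolicyIterations);
    }

    public static void main(String[] args) {
        // Action 0 stays in place, action 1 switches state; only staying in state 1 pays 1.
        int[][] next = { {0, 1}, {1, 0} };
        double[][] reward = { {0.0, 0.0}, {1.0, 0.0} };
        PolicyIterationSketch pi = new PolicyIterationSketch(next, reward, 0.9);
        pi.plan(1e-6, 1e-6, 1000, 10);
        System.out.println(Arrays.toString(pi.policy) + " " + Arrays.toString(pi.values));
        System.out.println(pi.totalPolicyIterations + " policy iterations, "
                + pi.totalValueIterations + " value iterations");
    }
}
```

The two counters play the same role as totalPolicyIterations and totalValueIterations: the first counts outer-loop passes, the second accumulates every evaluation sweep across all passes.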
protected double maxEvalDelta
protected double maxPIDelta
protected int maxIterations
protected int maxPolicyIterations
protected EnumerablePolicy evaluativePolicy
protected boolean foundReachableStates
protected int totalPolicyIterations
protected int totalValueIterations
protected boolean hasRunPlanning
public PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxEvaluationIterations, int maxPolicyIterations)
Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
maxDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate. Similarly, when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
maxEvaluationIterations - when the number of iterations of value iteration used to evaluate a policy exceeds this value, policy evaluation will terminate.
maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
public PolicyIteration(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxPIDelta, double maxEvalDelta, int maxEvaluationIterations, int maxPolicyIterations)
Parameters:
domain - the domain in which to plan
gamma - the discount factor
hashingFactory - the state hashing factory to use
maxPIDelta - when the maximum value function change between policy iterations is smaller than this value, planning will terminate.
maxEvalDelta - when the maximum change in the value function is smaller than this value, policy evaluation will terminate.
maxEvaluationIterations - when the number of iterations of value iteration used to evaluate a policy exceeds this value, policy evaluation will terminate.
maxPolicyIterations - when the number of policy iterations passes this value, planning will terminate.
public void setPolicyToEvaluate(EnumerablePolicy p)
GreedyQPolicy on the function evaluation.
Parameters:
p - the initial policy to evaluate when planning begins.
public Policy getComputedPolicy()
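The GreedyQPolicy referred to for setPolicyToEvaluate and planFromState is described as selecting the action with the highest Q-value and breaking ties uniformly at random. The following self-contained sketch illustrates that selection rule on a plain array of Q-values. GreedyTieBreakSketch and its methods are hypothetical names, and comparing Q-values with exact equality is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical, self-contained sketch; not part of BURLAP. */
final class GreedyTieBreakSketch {

    /** Returns the indices of all actions whose Q-value equals the maximum. */
    static List<Integer> maximizers(double[] qValues) {
        double max = Double.NEGATIVE_INFINITY;
        List<Integer> best = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] > max) {
                // Strictly better action found: discard earlier candidates.
                max = qValues[a];
                best.clear();
                best.add(a);
            } else if (qValues[a] == max) {
                // Tie with the current best: keep both.
                best.add(a);
            }
        }
        return best;
    }

    /** Picks one of the maximizing actions uniformly at random. */
    static int select(double[] qValues, Random rng) {
        List<Integer> best = maximizers(qValues);
        return best.get(rng.nextInt(best.size()));
    }

    public static void main(String[] args) {
        double[] q = { 1.0, 3.0, 3.0, 2.0 };
        Random rng = new Random();
        System.out.println("tied best actions: " + maximizers(q));
        System.out.println("selected action: " + select(q, rng));
    }
}
```

Random tie-breaking avoids a systematic bias toward low-indexed actions, at the cost of making the resulting behavior non-deterministic unless the random source is seeded.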
public void recomputeReachableStates()
Calling this method will force the valueFunction to recompute the reachable states when the planFromState(State) method is called next. This may be useful if the transition dynamics from the last planning call have changed and if planning needs to be restarted as a result.
public int getTotalPolicyIterations()
public int getTotalValueIterations()
public GreedyQPolicy planFromState(State initialState)
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
Specified by:
planFromState in interface Planner
Parameters:
initialState - the initial state of the planning problem
Returns:
a GreedyQPolicy.
public void resetSolver()
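Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by:
resetSolver in interface MDPSolverInterface
Overrides:
resetSolver in class DynamicProgramming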
protected double evaluatePolicy()
public boolean performReachabilityFrom(State si)
Parameters:
si - the source state from which all reachable states will be found
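Finding every state reachable from a source state amounts to a graph search over the transition model. The sketch below shows one common way to do this, a breadth-first search over an explicit successor map. It is an illustration only: ReachabilitySketch, the String state labels, and the successor map are hypothetical, and the source does not state which search strategy the class uses or what its boolean return value signifies.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, self-contained sketch; not part of BURLAP. */
final class ReachabilitySketch {

    /**
     * Returns every state reachable from source, including source itself.
     * successors maps a state to the states it can transition to under any action.
     */
    static Set<String> reachableFrom(String source, Map<String, List<String>> successors) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(source);
        frontier.add(source);
        while (!frontier.isEmpty()) {
            String s = frontier.poll();
            List<String> nextStates = successors.getOrDefault(s, Collections.<String>emptyList());
            for (String t : nextStates) {
                // Set.add returns true only for states not seen before.
                if (seen.add(t)) {
                    frontier.add(t);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> successors = new HashMap<>();
        successors.put("a", Arrays.asList("b", "c"));
        successors.put("b", Arrays.asList("c"));
        successors.put("c", Arrays.asList("a"));
        successors.put("d", Arrays.asList("a")); // d can reach a, but nothing reaches d
        System.out.println(reachableFrom("a", successors));
    }
}
```

Tracking visited states in a set keeps the search finite on cyclic models and means each state is expanded only once.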