public class VIModelLearningPlanner extends ValueIteration implements ModelLearningPlanner

Nested classes inherited from QProvider: QProvider.Helper

| Modifier and Type | Field and Description |
|---|---|
| protected State | initialState: The last initial state of an episode. |
| protected Policy | modelPolicy: The greedy policy that results from VI. |
| protected java.util.Set<HashableState> | observedStates: States the agent has observed during learning. |

Fields inherited from ValueIteration: foundReachableStates, hasRunVI, maxDelta, maxIterations, stopReachabilityFromTerminalStates

Fields inherited from DynamicProgramming: operator, valueFunction, valueInitializer

Fields inherited from MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

| Constructor and Description |
|---|
| VIModelLearningPlanner(SADomain domain, FullModel model, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes |

| Modifier and Type | Method and Description |
|---|---|
| void | initializePlannerIn(State s): This method is expected to be called at the beginning of any new learning episode. |
| void | modelChanged(State changedState): Tells the valueFunction that the model has changed and that it will need to replan accordingly. |
| Policy | modelPlannedPolicy(): Returns a policy encoding the planner's results. |
| protected void | rerunVI(): Reruns VI on the new updated model. |
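
The lifecycle these methods describe (start an episode, learn a transition, signal the change, replan, read off a greedy policy) can be sketched without the library. The class below is a hypothetical stand-in, not BURLAP code: `ReplanningPlannerSketch`, `Outcome`, `learn`, `greedyAction`, `value`, and `replanCount` are invented for illustration, states and actions are plain strings, the learned model is deterministic, and clearing the value table on every replan is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in, not BURLAP code: replans whenever the learned model changes. */
final class ReplanningPlannerSketch {
    /** One learned, deterministic transition outcome (illustrative type). */
    static final class Outcome {
        final String nextState;
        final double reward;
        Outcome(String nextState, double reward) { this.nextState = nextState; this.reward = reward; }
    }

    private final Map<String, Map<String, Outcome>> learnedModel = new HashMap<>();
    private final Set<String> observedStates = new HashSet<>();
    private final Map<String, Double> values = new HashMap<>();
    private final double gamma;
    private final double maxDelta;
    private final int maxIterations;
    private String initialState;
    int replanCount = 0; // exposed only so the sketch can be inspected

    ReplanningPlannerSketch(double gamma, double maxDelta, int maxIterations) {
        this.gamma = gamma;
        this.maxDelta = maxDelta;
        this.maxIterations = maxIterations;
    }

    /** Called at the start of an episode: remember the initial state. */
    void initializePlannerIn(String s) {
        initialState = s;
        observedStates.add(s);
    }

    /** Record a learned transition, then signal that the model changed. */
    void learn(String s, String action, String next, double reward) {
        learnedModel.computeIfAbsent(s, k -> new HashMap<>()).put(action, new Outcome(next, reward));
        modelChanged(s);
    }

    /** Track the changed state and replan over everything observed so far. */
    void modelChanged(String changedState) {
        observedStates.add(changedState);
        rerunVI();
    }

    /** Value iteration over observed states; restarting from zero is an assumption. */
    private void rerunVI() {
        replanCount++;
        values.clear();
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (String s : observedStates) {
                Map<String, Outcome> actions = learnedModel.get(s);
                if (actions == null || actions.isEmpty()) continue; // nothing learned yet
                double best = Double.NEGATIVE_INFINITY;
                for (Outcome o : actions.values()) best = Math.max(best, q(o));
                delta = Math.max(delta, Math.abs(best - value(s)));
                values.put(s, best);
            }
            if (delta < maxDelta) break;
        }
    }

    private double q(Outcome o) { return o.reward + gamma * value(o.nextState); }

    double value(String s) { return values.getOrDefault(s, 0.0); }

    /** Greedy action under the current values, or null if the state has no learned actions. */
    String greedyAction(String s) {
        Map<String, Outcome> actions = learnedModel.get(s);
        if (actions == null) return null;
        String bestAction = null;
        double best = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Outcome> e : actions.entrySet()) {
            double q = q(e.getValue());
            if (q > best) { best = q; bestAction = e.getKey(); }
        }
        return bestAction;
    }
}
```

In this sketch, calling `learn("s0", "right", "goal", 1.0)` after `learn("s0", "left", "s0", 0.0)` triggers a second replan, after which `greedyAction("s0")` prefers `right`.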

Methods inherited from ValueIteration: performReachabilityFrom, planFromState, recomputeReachableStates, resetSolver, runVI, toggleReachabiltiyTerminalStatePruning

Methods inherited from DynamicProgramming: computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from Planner: planFromState

Methods inherited from MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, resetSolver, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting

protected java.util.Set<HashableState> observedStates
protected Policy modelPolicy
protected State initialState
public VIModelLearningPlanner(SADomain domain, FullModel model, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
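Parameters:
domain - model domain
model - the learned model to use for planning
gamma - discount factor
hashingFactory - the hashing factory
maxDelta - max value function delta in VI
maxIterations - max iterations of VI

public void initializePlannerIn(State s)
Description copied from interface: ModelLearningPlanner
This method is expected to be called at the beginning of any new learning episode.
Specified by: initializePlannerIn in interface ModelLearningPlanner
Parameters:
s - the input state

public void modelChanged(State changedState)
Description copied from interface: ModelLearningPlanner
Tells the valueFunction that the model has changed and that it will need to replan accordingly.
Specified by: modelChanged in interface ModelLearningPlanner
Parameters:
changedState - the source state that caused a change in the model.

public Policy modelPlannedPolicy()
Description copied from interface: ModelLearningPlanner
Returns a policy encoding the planner's results.
Specified by: modelPlannedPolicy in interface ModelLearningPlanner

protected void rerunVI()
Reruns VI on the new updated model.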
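
To make the roles of gamma, maxDelta, and maxIterations concrete, here is a minimal tabular value iteration sketch. It is not BURLAP's implementation: `ValueIterationSketch`, the array-based deterministic model, and the in-place sweep order are assumptions chosen for brevity. It also assumes a run stops once the largest value change in a sweep falls below maxDelta, or after maxIterations sweeps, whichever comes first.

```java
/** Hypothetical stand-in, not BURLAP code: tabular value iteration with two stopping rules. */
final class ValueIterationSketch {

    /**
     * next[s][a] is the successor of state s under action a and reward[s][a] is the
     * reward for that step; a state with no actions is treated as terminal.
     * Values are updated in place. Returns the number of sweeps performed.
     */
    static int runVI(int[][] next, double[][] reward, double gamma,
                     double maxDelta, int maxIterations, double[] values) {
        int sweeps = 0;
        while (sweeps < maxIterations) {
            double delta = 0.0; // largest value change seen in this sweep
            for (int s = 0; s < next.length; s++) {
                if (next[s].length == 0) continue; // terminal state keeps its value
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    // Bellman backup for a deterministic transition
                    best = Math.max(best, reward[s][a] + gamma * values[next[s][a]]);
                }
                delta = Math.max(delta, Math.abs(best - values[s]));
                values[s] = best;
            }
            sweeps++;
            if (delta < maxDelta) break; // converged to within maxDelta
        }
        return sweeps;
    }
}
```

A small maxDelta gives tighter values at the cost of more sweeps, while maxIterations bounds the cost of each replan, which matters when planning is repeated every time the learned model changes.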