public class VIModelLearningPlanner extends ValueIteration implements ModelLearningPlanner
Nested classes/interfaces inherited: DynamicProgramming.StaticVFPlanner, QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected State | initialState: The last initial state of an episode. |
| protected Policy | modelPolicy: The greedy policy that results from VI. |
| protected java.util.Set<HashableState> | observedStates: States the agent has observed during learning. |

Fields inherited from class ValueIteration: foundReachableStates, hasRunVI, maxDelta, maxIterations, stopReachabilityFromTerminalStates

Fields inherited from class DynamicProgramming: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| VIModelLearningPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes |
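
The maxDelta and maxIterations constructor arguments bound how long value iteration runs. The block below is a minimal, self-contained sketch of that kind of stopping rule on a toy three-state chain. It does not use any BURLAP classes: the class name `ViStoppingSketch`, the array-based model, and the `lastSweeps` counter are assumptions made for illustration, and the library's actual loop may differ in detail.

```java
public class ViStoppingSketch {

    /** Number of sweeps performed by the most recent call to runVI. */
    static int lastSweeps = 0;

    /**
     * Runs in-place value iteration on a deterministic toy MDP (hypothetical model layout).
     * next[s][a] is the successor state, reward[s][a] the immediate reward,
     * and terminal[s] marks states whose value is held at 0.
     */
    static double[] runVI(int[][] next, double[][] reward, boolean[] terminal,
                          double gamma, double maxDelta, int maxIterations) {
        double[] v = new double[next.length];
        lastSweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0; // largest value change seen in this sweep
            for (int s = 0; s < next.length; s++) {
                if (terminal[s]) {
                    continue;
                }
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    best = Math.max(best, reward[s][a] + gamma * v[next[s][a]]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
            lastSweeps++;
            if (delta < maxDelta) {
                break; // values changed by less than maxDelta: treat as converged
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2 (terminal); action 0 stays put, action 1 moves right.
        int[][] next = {{0, 1}, {1, 2}, {2, 2}};
        double[][] reward = {{-1, -1}, {-1, 10}, {0, 0}};
        boolean[] terminal = {false, false, true};
        double[] v = runVI(next, reward, terminal, 0.9, 1e-6, 100);
        System.out.println(v[0] + " " + v[1] + " sweeps=" + lastSweeps);
    }
}
```

A small maxDelta asks for tighter convergence at the cost of more sweeps, while maxIterations caps the work even if the values have not settled. Calling `runVI` with `maxIterations` set to 1 in the sketch stops after a single sweep regardless of the remaining error.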

| Modifier and Type | Method and Description |
|---|---|
| void | initializePlannerIn(State s): This method is expected to be called at the beginning of any new learning episode. |
| void | modelChanged(State changedState): Tells the valueFunction that the model has changed and that it will need to replan accordingly. |
| Policy | modelPlannedPolicy(): Returns a policy encoding the planner's results. |
| protected void | rerunVI(): Reruns VI on the newly updated model. |

Methods inherited from class ValueIteration: performReachabilityFrom, planFromState, recomputeReachableStates, resetSolver, runVI, toggleReachabiltiyTerminalStatePruning

Methods inherited from class DynamicProgramming: computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value

Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Planner: planFromState

Methods inherited from interface MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, resetSolver, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

protected java.util.Set<HashableState> observedStates
protected Policy modelPolicy
protected State initialState
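
public VIModelLearningPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
Parameters:
domain - model domain
rf - model reward function
tf - model termination function
gamma - discount factor
hashingFactory - the hashing factory
maxDelta - max value function delta in VI
maxIterations - max iterations of VI

public void initializePlannerIn(State s)
Description copied from interface: ModelLearningPlanner
This method is expected to be called at the beginning of any new learning episode.
Specified by: initializePlannerIn in interface ModelLearningPlanner
Parameters:
s - the input state

public void modelChanged(State changedState)
Description copied from interface: ModelLearningPlanner
Tells the valueFunction that the model has changed and that it will need to replan accordingly.
Specified by: modelChanged in interface ModelLearningPlanner
Parameters:
changedState - the source state that caused a change in the model.

public Policy modelPlannedPolicy()
Description copied from interface: ModelLearningPlanner
Returns a policy encoding the planner's results.
Specified by: modelPlannedPolicy in interface ModelLearningPlanner

protected void rerunVI()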
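
The methods above describe a replanning cycle: an episode starts, the learned model changes, VI is rerun, and a greedy policy is read off the result. The following self-contained sketch illustrates that cycle under stated assumptions. It does not use BURLAP types: `ReplanningSketch`, its integer states and actions, the `learn` method, and `greedyAction` are hypothetical stand-ins, and restricting the sweep to observed states is inferred from the field description rather than confirmed by the source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReplanningSketch {
    // Hypothetical learned model: state -> action -> {next state, reward}.
    private final Map<Integer, Map<Integer, double[]>> model = new HashMap<>();
    private final Set<Integer> observedStates = new HashSet<>();
    private final Map<Integer, Double> values = new HashMap<>();
    private final double gamma;
    private final double maxDelta;
    private final int maxIterations;
    private int initialState = -1; // kept only to mirror the documented field

    public ReplanningSketch(double gamma, double maxDelta, int maxIterations) {
        this.gamma = gamma;
        this.maxDelta = maxDelta;
        this.maxIterations = maxIterations;
    }

    /** Called at the start of an episode; remembers the starting state. */
    public void initializePlannerIn(int s) {
        initialState = s;
        observedStates.add(s);
    }

    /** Stand-in for whatever component updates the learned model (an assumption). */
    public void learn(int s, int a, int next, double r) {
        model.computeIfAbsent(s, k -> new HashMap<>()).put(a, new double[] {next, r});
    }

    /** Signals that the model changed at changedState and triggers replanning. */
    public void modelChanged(int changedState) {
        observedStates.add(changedState);
        rerunVI();
    }

    /** Current value estimate; states without an estimate default to 0. */
    public double value(int s) {
        return values.getOrDefault(s, 0.0);
    }

    private double q(double[] transition) {
        return transition[1] + gamma * value((int) transition[0]);
    }

    /** Sweeps the observed states until the largest change drops below maxDelta. */
    private void rerunVI() {
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s : observedStates) {
                Map<Integer, double[]> actions = model.get(s);
                if (actions == null) {
                    continue; // nothing learned for this state yet
                }
                double best = Double.NEGATIVE_INFINITY;
                for (double[] t : actions.values()) {
                    best = Math.max(best, q(t));
                }
                delta = Math.max(delta, Math.abs(best - value(s)));
                values.put(s, best);
            }
            if (delta < maxDelta) {
                break;
            }
        }
    }

    /** Greedy action under the current values, or -1 if no action is modeled for s. */
    public int greedyAction(int s) {
        Map<Integer, double[]> actions = model.get(s);
        int bestAction = -1;
        double best = Double.NEGATIVE_INFINITY;
        if (actions != null) {
            for (Map.Entry<Integer, double[]> e : actions.entrySet()) {
                double qValue = q(e.getValue());
                if (qValue > best) {
                    best = qValue;
                    bestAction = e.getKey();
                }
            }
        }
        return bestAction;
    }

    public static void main(String[] args) {
        ReplanningSketch planner = new ReplanningSketch(0.9, 1e-9, 1000);
        planner.initializePlannerIn(0);
        planner.learn(0, 0, 0, -1.0); // action 0 loops in state 0 with reward -1
        planner.modelChanged(0);
        planner.learn(0, 1, 1, 5.0);  // action 1 reaches state 1 with reward 5
        planner.modelChanged(0);
        System.out.println("greedy action in state 0: " + planner.greedyAction(0));
    }
}
```

In this sketch the greedy choice in state 0 switches from action 0 to action 1 once the better transition is learned and `modelChanged` is called again, which is the behavior the replanning contract is meant to provide.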