public class VIModelLearningPlanner extends ValueIteration implements ModelLearningPlanner
DynamicProgramming.StaticVFPlanner
QFunction.QFunctionHelper
Modifier and Type | Field and Description |
---|---|
protected State | initialState: The last initial state of an episode. |
protected Policy | modelPolicy: The greedy policy that results from VI. |
protected java.util.Set<HashableState> | observedStates: States the agent has observed during learning. |
foundReachableStates, hasRunVI, maxDelta, maxIterations, stopReachabilityFromTerminalStates
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
VIModelLearningPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes |
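The constructor's maxDelta and maxIterations arguments are the two stopping criteria for value iteration. The following is a minimal, self-contained sketch of how such criteria typically interact. It is not BURLAP's implementation: the class name ValueIterationStoppingSketch, the method runVI, and the array-based MDP encoding are all assumptions made for illustration.

```java
class ValueIterationStoppingSketch {

    /**
     * Runs value iteration on a small tabular MDP (hypothetical encoding).
     * transition[s][a][s2] is P(s2 | s, a); reward[s][a] is the expected reward.
     * Values are written into v; the return value is the number of sweeps performed.
     */
    static int runVI(double[][][] transition, double[][] reward, double gamma,
                     double maxDelta, int maxIterations, double[] v) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {      // hard cap on the number of sweeps
            double delta = 0.0;                        // largest value change in this sweep
            for (int s = 0; s < v.length; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < reward[s].length; a++) {
                    double q = reward[s][a];
                    for (int s2 = 0; s2 < v.length; s2++) {
                        q += gamma * transition[s][a][s2] * v[s2];
                    }
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
            sweeps++;
            if (delta < maxDelta) {                    // assumed convergence test
                break;
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        // One state with a single self-loop action that pays reward 1 on every step.
        double[][][] transition = {{{1.0}}};
        double[][] reward = {{1.0}};

        double[] converged = new double[1];
        int sweeps = runVI(transition, reward, 0.9, 0.001, 1000, converged);
        System.out.println("delta-limited: " + sweeps + " sweeps, V = " + converged[0]);

        double[] capped = new double[1];
        int cappedSweeps = runVI(transition, reward, 0.9, 0.001, 5, capped);
        System.out.println("iteration-limited: " + cappedSweeps + " sweeps, V = " + capped[0]);
    }
}
```

In this sketch a small maxDelta with a generous maxIterations lets the values approach the fixed point (10 for the self-loop example with gamma 0.9), while a small maxIterations cuts planning short and leaves a coarser estimate.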
Modifier and Type | Method and Description |
---|---|
void | initializePlannerIn(State s): This method is expected to be called at the beginning of any new learning episode. |
void | modelChanged(State changedState): Tells the valueFunction that the model has changed and that it will need to replan accordingly. |
Policy | modelPlannedPolicy(): Returns a policy encoding the planner's results. |
protected void | rerunVI(): Reruns VI on the updated model. |
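The methods above suggest a lifecycle: mark the start of an episode, report each model change, then query the resulting greedy policy. The sketch below imitates that lifecycle on a toy tabular model so it can run without BURLAP. Everything in it is an assumption for illustration: the class ReplanOnModelChangeSketch, the helper recordTransition, the integer state encoding, the deterministic model, and the choice to value untried state-action pairs at 0. Only the method names initializePlannerIn, modelChanged, and rerunVI are borrowed from this page, and their bodies here are not the library's.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ReplanOnModelChangeSketch {
    // Learned deterministic model (hypothetical): next[s][a] is -1 until (s, a) has been tried.
    private final int[][] next;
    private final double[][] reward;
    private final double gamma;
    private final double maxDelta;
    private final int maxIterations;
    private final double[] value;
    private final Set<Integer> observedStates = new HashSet<>();
    private int initialState = -1;

    ReplanOnModelChangeSketch(int numStates, int numActions,
                              double gamma, double maxDelta, int maxIterations) {
        this.next = new int[numStates][numActions];
        this.reward = new double[numStates][numActions];
        for (int[] row : next) {
            Arrays.fill(row, -1);
        }
        this.value = new double[numStates];
        this.gamma = gamma;
        this.maxDelta = maxDelta;
        this.maxIterations = maxIterations;
    }

    /** Called at the start of an episode: remembers the initial state. */
    void initializePlannerIn(int s) {
        initialState = s;
        observedStates.add(s);
    }

    /** Hypothetical helper: stores one observed transition, then reports the change. */
    void recordTransition(int s, int a, double r, int s2) {
        next[s][a] = s2;
        reward[s][a] = r;
        modelChanged(s);
    }

    /** The model changed at changedState, so replan from scratch. */
    void modelChanged(int changedState) {
        observedStates.add(changedState);
        rerunVI();
    }

    /** Q-value under the learned model; untried pairs default to 0 (an assumption). */
    private double q(int s, int a) {
        return next[s][a] < 0 ? 0.0 : reward[s][a] + gamma * value[next[s][a]];
    }

    /** Value iteration restricted to the states observed so far. */
    private void rerunVI() {
        Arrays.fill(value, 0.0);
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s : observedStates) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    best = Math.max(best, q(s, a));
                }
                delta = Math.max(delta, Math.abs(best - value[s]));
                value[s] = best;
            }
            if (delta < maxDelta) {
                break;
            }
        }
    }

    /** Greedy action under the current value function; ties go to the lowest index. */
    int greedyAction(int s) {
        int bestA = 0;
        for (int a = 1; a < next[s].length; a++) {
            if (q(s, a) > q(s, bestA)) {
                bestA = a;
            }
        }
        return bestA;
    }

    double value(int s) {
        return value[s];
    }

    public static void main(String[] args) {
        ReplanOnModelChangeSketch planner = new ReplanOnModelChangeSketch(3, 2, 0.9, 1e-6, 100);
        planner.initializePlannerIn(0);
        planner.recordTransition(0, 0, 0.0, 1);   // state 0, action 0 leads to state 1
        planner.recordTransition(1, 1, 1.0, 2);   // state 1, action 1 pays reward 1
        System.out.println("greedy action in state 0: " + planner.greedyAction(0));
        planner.recordTransition(0, 1, 5.0, 2);   // a better option appears in state 0
        System.out.println("greedy action in state 0 after update: " + planner.greedyAction(0));
    }
}
```

Replanning from scratch on every change keeps the sketch simple; a real planner may seed or reuse earlier values, which this example does not attempt to show.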
performReachabilityFrom, planFromState, recomputeReachableStates, resetSolver, runVI, toggleReachabiltiyTerminalStatePruning
computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
planFromState
addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, resetSolver, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting
protected java.util.Set<HashableState> observedStates
protected Policy modelPolicy
protected State initialState
public VIModelLearningPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
domain - model domain
rf - model reward function
tf - model termination function
gamma - discount factor
hashingFactory - the hashing factory
maxDelta - max value function delta in VI
maxIterations - max iterations of VI
public void initializePlannerIn(State s)
initializePlannerIn in interface ModelLearningPlanner
s - the input state
public void modelChanged(State changedState)
modelChanged in interface ModelLearningPlanner
changedState - the source state that caused a change in the model.
public Policy modelPlannedPolicy()
modelPlannedPolicy in interface ModelLearningPlanner
protected void rerunVI()