public class VIModelLearningPlanner extends ValueIteration implements ModelLearningPlanner
QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected State | initialState - The last initial state of an episode |
protected Policy | modelPolicy - The greedy policy that results from VI |
protected java.util.Set<HashableState> | observedStates - States the agent has observed during learning. |
foundReachableStates, hasRunVI, maxDelta, maxIterations, stopReachabilityFromTerminalStates
operator, valueFunction, valueInitializer
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
VIModelLearningPlanner(SADomain domain, FullModel model, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations) - Initializes the planner. |
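The gamma, maxDelta, and maxIterations constructor arguments are the usual value iteration controls. The following is a minimal, self-contained sketch of how such stopping criteria typically interact; it is not BURLAP code. The class `ToyValueIteration`, its array-based deterministic model, and the exact comparison `delta < maxDelta` are assumptions made for illustration.

```java
/** Toy tabular value iteration. Hypothetical illustration, not part of BURLAP. */
final class ToyValueIteration {

    /**
     * Runs in-place value iteration on a small deterministic MDP.
     *
     * @param next          next[s][a] is the successor of state s under action a
     * @param reward        reward[s][a] is the reward for taking action a in state s
     * @param terminal      terminal[s] is true for states whose value stays fixed
     * @param gamma         discount factor
     * @param maxDelta      stop once the largest value change in a sweep falls below this
     * @param maxIterations hard cap on the number of sweeps
     * @param values        value table, updated in place
     * @return the number of sweeps actually performed
     */
    static int run(int[][] next, double[][] reward, boolean[] terminal,
                   double gamma, double maxDelta, int maxIterations, double[] values) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < next.length; s++) {
                if (terminal[s]) {
                    continue; // terminal states are never backed up
                }
                // Bellman backup: best one-step return over all actions.
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    best = Math.max(best, reward[s][a] + gamma * values[next[s][a]]);
                }
                delta = Math.max(delta, Math.abs(best - values[s]));
                values[s] = best;
            }
            sweeps++;
            if (delta < maxDelta) {
                break; // converged within tolerance before hitting the cap
            }
        }
        return sweeps;
    }
}
```

In this sketch a small maxDelta asks for a tighter value function at the cost of more sweeps, while maxIterations bounds the work done when the tolerance is never reached.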
Modifier and Type | Method and Description |
---|---|
void | initializePlannerIn(State s) - This method is expected to be called at the beginning of any new learning episode. |
void | modelChanged(State changedState) - Tells the valueFunction that the model has changed and that it will need to replan accordingly. |
Policy | modelPlannedPolicy() - Returns a policy encoding the planner's results. |
protected void | rerunVI() - Reruns VI on the updated model. |
performReachabilityFrom, planFromState, recomputeReachableStates, resetSolver, runVI, toggleReachabiltiyTerminalStatePruning
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
planFromState
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, resetSolver, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected java.util.Set<HashableState> observedStates
protected Policy modelPolicy
protected State initialState
public VIModelLearningPlanner(SADomain domain, FullModel model, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
Parameters:
domain - model domain
model - the learned model to use for planning
gamma - discount factor
hashingFactory - the hashing factory
maxDelta - max value function delta in VI
maxIterations - max iterations of VI
public void initializePlannerIn(State s)
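Description copied from interface: ModelLearningPlanner
This method is expected to be called at the beginning of any new learning episode.
Specified by:
initializePlannerIn in interface ModelLearningPlanner
Parameters:
s - the input state
public void modelChanged(State changedState)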
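Description copied from interface: ModelLearningPlanner
Tells the valueFunction that the model has changed and that it will need to replan accordingly.
Specified by:
modelChanged in interface ModelLearningPlanner
Parameters:
changedState - the source state that caused a change in the model.
public Policy modelPlannedPolicy()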
Description copied from interface: ModelLearningPlanner
Returns a policy encoding the planner's results.
Specified by:
modelPlannedPolicy in interface ModelLearningPlanner
protected void rerunVI()
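
The methods above describe a replan-on-change cycle: record the episode's initial state, track observed states, and rerun VI whenever the learned model changes. The sketch below imitates that cycle in plain Java with integer states so it runs without BURLAP. It is not the library's implementation. `ToyModelLearningPlanner`, `recordTransition`, `greedyAction`, the `replans` counter, the deterministic map-based model, and the default value of 0 for unknown transitions are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy replanning loop over a learned model. Hypothetical, not the BURLAP class. */
final class ToyModelLearningPlanner {
    private final int numActions;
    private final double gamma;
    private final double maxDelta;
    private final int maxIterations;

    // Learned deterministic model, keyed by state * numActions + action.
    private final Map<Integer, Integer> nextState = new HashMap<>();
    private final Map<Integer, Double> reward = new HashMap<>();
    // States seen as the source of a model change, in first-seen order.
    private final Set<Integer> observedStates = new LinkedHashSet<>();
    private final Map<Integer, Double> values = new HashMap<>();

    int initialState = -1; // last initial state of an episode
    int replans = 0;       // how many times VI has been rerun

    ToyModelLearningPlanner(int numActions, double gamma, double maxDelta, int maxIterations) {
        this.numActions = numActions;
        this.gamma = gamma;
        this.maxDelta = maxDelta;
        this.maxIterations = maxIterations;
    }

    /** Called at the start of an episode; this sketch only remembers the state. */
    void initializePlannerIn(int s) {
        initialState = s;
    }

    /** Stores one learned transition, then signals that the model changed. */
    void recordTransition(int s, int a, double r, int sPrime) {
        int key = s * numActions + a;
        nextState.put(key, sPrime);
        reward.put(key, r);
        modelChanged(s);
    }

    /** Marks the state as observed and replans from scratch. */
    void modelChanged(int changedState) {
        observedStates.add(changedState);
        rerunVI();
    }

    /** Greedy action under the current value table; ties go to the lowest index. */
    int greedyAction(int s) {
        int best = 0;
        for (int a = 1; a < numActions; a++) {
            if (q(s, a) > q(s, best)) {
                best = a;
            }
        }
        return best;
    }

    double value(int s) {
        return values.getOrDefault(s, 0.0);
    }

    private double q(int s, int a) {
        int key = s * numActions + a;
        Integer sPrime = nextState.get(key);
        if (sPrime == null) {
            return 0.0; // assumption: unknown transitions get a default value of 0
        }
        return reward.get(key) + gamma * value(sPrime);
    }

    /** Clears the value table and sweeps the observed states until converged or capped. */
    private void rerunVI() {
        replans++;
        values.clear();
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s : observedStates) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < numActions; a++) {
                    best = Math.max(best, q(s, a));
                }
                delta = Math.max(delta, Math.abs(best - value(s)));
                values.put(s, best);
            }
            if (delta < maxDelta) {
                break;
            }
        }
    }
}
```

A learning agent using this sketch would call `initializePlannerIn` once per episode, `recordTransition` after each model update, and `greedyAction` to choose its next move. Replanning from scratch on every change is simple but costly; it is a reasonable fit only when the set of observed states stays small.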