public class VIModelPlanner extends java.lang.Object implements ModelPlanner, QComputablePlanner
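Modifier and Type | Class and Description |
---|---|
static class | VIModelPlanner.VIModelPlannerGenerator |

Nested classes/interfaces inherited from interface ModelPlanner: ModelPlanner.ModelPlannerGenerator
Nested classes/interfaces inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper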
Modifier and Type | Field and Description |
---|---|
protected Domain | domain: The model domain |
protected double | gamma: The model planning discount factor |
protected StateHashFactory | hashingFactory: The hashing factory to use |
protected State | initialState: The last initial state of an episode |
protected double | maxDelta: The maximum VI delta |
protected int | maxIterations: The maximum number of VI iterations |
protected Policy | modelPolicy: The greedy policy that results from VI |
protected java.util.Set<StateHashTuple> | observedStates: States the agent has observed during learning. |
protected RewardFunction | rf: The model reward function |
protected TerminalFunction | tf: The model termination function |
protected ValueIteration | vi: The value iteration planning object |
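
The roles of gamma, maxDelta, and maxIterations can be illustrated with a minimal, plain-Java sketch of value iteration on a tiny deterministic MDP. This is not the BURLAP ValueIteration implementation: the class name ValueIterationSketch, the array-based model (next, reward, terminal), and the in-place sweep order are all assumptions made for illustration. The sweep loop stops when the largest value change in a sweep falls below maxDelta, or after maxIterations sweeps, whichever comes first.

```java
import java.util.Arrays;

/** Plain-Java value iteration on a tiny deterministic MDP (illustrative only, not BURLAP code). */
final class ValueIterationSketch {

    /**
     * Runs value iteration until the largest value change in a sweep drops below
     * maxDelta, or until maxIterations sweeps have been performed.
     *
     * next[s][a]   = successor state of action a in state s (hypothetical deterministic model)
     * reward[s][a] = reward for taking action a in state s
     * terminal[s]  = true if s is terminal (its value stays 0)
     */
    static double[] run(int[][] next, double[][] reward, boolean[] terminal,
                        double gamma, double maxDelta, int maxIterations) {
        double[] v = new double[next.length];
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < next.length; s++) {
                if (terminal[s]) {
                    continue;
                }
                // Bellman backup: best one-step return over all actions in s.
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < next[s].length; a++) {
                    best = Math.max(best, reward[s][a] + gamma * v[next[s][a]]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best; // in-place update
            }
            if (delta < maxDelta) {
                break; // converged under the maxDelta threshold
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Three-state chain 0 -> 1 -> 2 (terminal). Action 0 stays put, action 1 moves right.
        int[][] next = {{0, 1}, {1, 2}, {2, 2}};
        double[][] reward = {{0, 0}, {0, 1}, {0, 0}};
        boolean[] terminal = {false, false, true};
        double[] v = run(next, reward, terminal, 0.9, 1e-6, 100);
        System.out.println(Arrays.toString(v));
    }
}
```

Lowering maxIterations caps the planning cost per model update at the price of a less accurate value function, which matters when planning is rerun every time the model changes.
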
Constructor and Description |
---|
VIModelPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxIterations): Initializes the planner. |
Modifier and Type | Method and Description |
---|---|
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
ValueIteration | getValueIterationPlanner(): Returns the value iteration object used for planning whenever the model updates. |
void | initializePlannerIn(State s): This method is expected to be called at the beginning of any new learning episode. |
void | modelChanged(State changedState): Tells the planner that the model has changed and that it will need to replan accordingly. |
Policy | modelPlannedPolicy(): Returns a policy encoding the planner's results. |
protected void | rerunVI(): Reruns VI on the newly updated model. |
void | resetPlanner(): Resets the planner as if planning had never been called. |
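
To make the relationship between getQ, getQs, and the greedy policy concrete, the following plain-Java sketch derives Q-values and a greedy action from a value table. It does not use the BURLAP classes: QValueSketch, QEntry, the array-based model, and the value table v are hypothetical stand-ins, and the sketch assumes a deterministic model in which every state has at least one action.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Q-values and a greedy action derived from a value table. */
final class QValueSketch {

    /** Hypothetical stand-in for a Q-value entry; not BURLAP's QValue class. */
    static final class QEntry {
        final int state;
        final int action;
        final double q;

        QEntry(int state, int action, double q) {
            this.state = state;
            this.action = action;
            this.q = q;
        }
    }

    /** Q(s, a) = r(s, a) + gamma * V(next(s, a)) for a deterministic model. */
    static QEntry getQ(int s, int a, int[][] next, double[][] reward, double[] v, double gamma) {
        return new QEntry(s, a, reward[s][a] + gamma * v[next[s][a]]);
    }

    /** One Q-value entry for every action available in state s. */
    static List<QEntry> getQs(int s, int[][] next, double[][] reward, double[] v, double gamma) {
        List<QEntry> qs = new ArrayList<>();
        for (int a = 0; a < next[s].length; a++) {
            qs.add(getQ(s, a, next, reward, v, gamma));
        }
        return qs;
    }

    /** Greedy choice: the action with the largest Q-value (ties go to the lowest index). */
    static int greedyAction(int s, int[][] next, double[][] reward, double[] v, double gamma) {
        QEntry best = null;
        for (QEntry e : getQs(s, next, reward, v, gamma)) {
            if (best == null || e.q > best.q) {
                best = e;
            }
        }
        return best.action; // assumes state s has at least one action
    }

    public static void main(String[] args) {
        int[][] next = {{0, 1}, {1, 2}, {2, 2}};
        double[][] reward = {{0, 0}, {0, 1}, {0, 0}};
        double[] v = {0.9, 1.0, 0.0}; // value table assumed to come from an earlier planning run
        for (QEntry e : getQs(0, next, reward, v, 0.9)) {
            System.out.println("Q(" + e.state + "," + e.action + ") = " + e.q);
        }
        System.out.println("greedy action in state 0: " + greedyAction(0, next, reward, v, 0.9));
    }
}
```
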
protected ValueIteration vi
protected Policy modelPolicy
protected State initialState
protected Domain domain
protected RewardFunction rf
protected TerminalFunction tf
protected double gamma
protected StateHashFactory hashingFactory
protected double maxDelta
protected int maxIterations
protected java.util.Set<StateHashTuple> observedStates
public VIModelPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxIterations)
Parameters:
domain - model domain
rf - model reward function
tf - model termination function
gamma - discount factor
hashingFactory - the hashing factory
maxDelta - max value function delta in VI
maxIterations - max iterations of VI
public void initializePlannerIn(State s)
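Description copied from interface: ModelPlanner
This method is expected to be called at the beginning of any new learning episode.
Specified by:
initializePlannerIn in interface ModelPlanner
Parameters:
s - the input state
public void modelChanged(State changedState)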
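Description copied from interface: ModelPlanner
Tells the planner that the model has changed and that it will need to replan accordingly.
Specified by:
modelChanged in interface ModelPlanner
Parameters:
changedState - the source state that caused a change in the model.
public Policy modelPlannedPolicy()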
Description copied from interface: ModelPlanner
Returns a policy encoding the planner's results.
Specified by:
modelPlannedPolicy in interface ModelPlanner
public void resetPlanner()
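Description copied from interface: ModelPlanner
Resets the planner as if planning had never been called.
Specified by:
resetPlanner in interface ModelPlanner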
public java.util.List<QValue> getQs(State s)
Description copied from interface: QComputablePlanner
Returns a List of QValue objects for every permissible action for the given input state.
Specified by:
getQs in interface QComputablePlanner
Parameters:
s - the state for which Q-values are to be returned.
Returns:
a List of QValue objects for every permissible action for the given input state.
public QValue getQ(State s, AbstractGroundedAction a)
Description copied from interface: QComputablePlanner
Returns the QValue for the given state-action pair.
Specified by:
getQ in interface QComputablePlanner
Parameters:
s - the input state
a - the input action
Returns:
the QValue for the given state-action pair.
public ValueIteration getValueIterationPlanner()
protected void rerunVI()
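
The lifecycle described by initializePlannerIn, modelChanged, rerunVI, and resetPlanner can be sketched in plain Java as a planner that records observed states and replans whenever it is told the model changed. This is a hedged illustration rather than the actual VIModelPlanner logic: ReplanningSketch, its integer state encoding, the pluggable planning function, and the choice to keep observed states across a reset are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Illustrative only: replanning from scratch each time the learned model changes. */
final class ReplanningSketch {

    private final Set<Integer> observedStates = new LinkedHashSet<>();
    // Hypothetical planning routine: maps the observed states to a value table.
    private final Function<Set<Integer>, Map<Integer, Double>> planningRoutine;
    private Map<Integer, Double> values = new HashMap<>();
    private Integer initialState;
    private int planRuns;

    ReplanningSketch(Function<Set<Integer>, Map<Integer, Double>> planningRoutine) {
        this.planningRoutine = planningRoutine;
    }

    /** Called at the start of each learning episode; remembers the episode's initial state. */
    void initializePlannerIn(int s) {
        initialState = s;
    }

    /** Records the state whose model entry changed, then replans. */
    void modelChanged(int changedState) {
        observedStates.add(changedState);
        rerunPlanning();
    }

    /** Discards planning results. Assumption: the observed states are kept. */
    void resetPlanner() {
        values = new HashMap<>();
    }

    private void rerunPlanning() {
        values = planningRoutine.apply(Collections.unmodifiableSet(observedStates));
        planRuns++;
    }

    double value(int s) {
        return values.getOrDefault(s, 0.0);
    }

    int planRuns() {
        return planRuns;
    }

    int observedCount() {
        return observedStates.size();
    }

    Integer initialState() {
        return initialState;
    }

    public static void main(String[] args) {
        // Placeholder planning routine that assigns value 1.0 to every observed state.
        ReplanningSketch planner = new ReplanningSketch(states -> {
            Map<Integer, Double> table = new HashMap<>();
            for (int s : states) {
                table.put(s, 1.0);
            }
            return table;
        });
        planner.initializePlannerIn(0);
        planner.modelChanged(1);
        planner.modelChanged(2);
        System.out.println("planning runs: " + planner.planRuns());
        System.out.println("observed states: " + planner.observedCount());
    }
}
```

Replanning on every model change keeps the policy consistent with the latest model, but each call costs a full planning run, so the maxDelta and maxIterations settings bound the work done per update.
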