public class FittedVI extends OOMDPPlanner implements ValueFunction, QComputablePlanner
The value function approximation is initially zero-valued; it can be changed with the setVInit(burlap.behavior.singleagent.ValueFunctionInitialization) method. For each state sample, a new value for the state is computed by applying the Bellman operator (using the model
of the world and the current value function approximation). The newly computed values for each
state are then used as supervised instances to train the next iteration of the value function.
After setting the state samples with the setSamples(java.util.List) method,
you can perform planning with the runVI() method. You can also use the standard planFromState(burlap.oomdp.core.State) method,
but specifying the state does not change behavior; the method simply calls runVI() itself.
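The loop described above (sample states, Bellman backups at each sample, supervised refit) can be sketched generically. The following is a minimal, self-contained Java illustration, not BURLAP code: the MDP is a hypothetical 11-state chain with a -1 step cost and a terminal goal, and the supervised trainer is a simple nearest-neighbor regressor standing in for SupervisedVFA.

```java
public class FittedVISketch {
    static final int GOAL = 10;            // terminal state of the toy chain MDP
    static final double GAMMA = 1.0;       // discount factor

    static boolean terminal(int s) { return s == GOAL; }
    static int next(int s, int a) { return Math.max(0, Math.min(GOAL, s + a)); }

    // Stand-in for a SupervisedVFA trainer: 1-nearest-neighbor regression
    // over the sampled states and their fitted values.
    static double predict(int[] xs, double[] ys, int query) {
        int best = 0;
        for (int i = 1; i < xs.length; i++)
            if (Math.abs(xs[i] - query) < Math.abs(xs[best] - query)) best = i;
        return ys[best];
    }

    static double[] solve(int[] samples, int iterations) {
        double[] values = new double[samples.length];         // initial V = 0
        for (int iter = 0; iter < iterations; iter++) {
            double[] targets = new double[samples.length];
            for (int i = 0; i < samples.length; i++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a : new int[]{-1, +1}) {             // Bellman operator
                    int s2 = next(samples[i], a);
                    double v2 = terminal(s2) ? 0.0 : predict(samples, values, s2);
                    best = Math.max(best, -1.0 + GAMMA * v2); // step cost of -1
                }
                targets[i] = best;
            }
            values = targets;  // refit the approximation to the new targets
        }
        return values;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};       // state samples
        double[] v = solve(samples, 15);
        System.out.println(v[0]);   // -10.0: ten steps from the goal
        System.out.println(v[9]);   // -1.0: one step from the goal
    }
}
```

Here the samples happen to cover every nonterminal state, so the nearest-neighbor fit is exact; with a coarser sample set the regressor would generalize the Bellman targets to unsampled states, which is the point of the fitted approach.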
Bellman operators are computed with an instance of the SparseSampling class. This enables FittedVI to
perform an approximate Bellman operator with sparse samples from the transition dynamics, which is useful when
the number of possible next state transitions is infinite or very large. Furthermore, it allows you to set the
sparse sampling tree depth to a value larger than one to get a more accurate estimate of the target state value. The depth
of the tree can be set independently for planning (that is, running value iteration) and for control (that is,
the depth used to return the Q-values). See the setPlanningDepth(int), setControlDepth(int), and
setPlanningAndControlDepth(int) methods for controlling the depth. By default, the depth is 1.
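To make the depth parameter concrete, here is a self-contained sketch of a depth-limited Bellman backup in the spirit of SparseSampling (illustrative Java, not the BURLAP class; the toy MDP and its numbers are assumptions). Leaf nodes at the depth limit are scored by the current value function approximation, so a larger depth looks further ahead before falling back on the approximation. A real sparse-sampling backup would average over several sampled next states per node; this toy model is deterministic, so one sample suffices.

```java
import java.util.function.IntToDoubleFunction;

public class SparseSamplingSketch {
    static final double GAMMA = 0.9;

    // Deterministic toy chain: action a in {-1, +1} shifts the state;
    // reward is 1 when the goal state 5 is entered, else 0.
    static int next(int s, int a) { return Math.max(0, Math.min(5, s + a)); }
    static double reward(int s, int a) { return next(s, a) == 5 ? 1.0 : 0.0; }

    // Depth-d estimate of Q(s, a): recurse until the depth limit, then
    // evaluate the leaf with the current value function approximation.
    static double q(int s, int a, int d, IntToDoubleFunction leafValue) {
        int s2 = next(s, a);
        double v2 = (d <= 1) ? leafValue.applyAsDouble(s2) : v(s2, d - 1, leafValue);
        return reward(s, a) + GAMMA * v2;
    }

    static double v(int s, int d, IntToDoubleFunction leafValue) {
        return Math.max(q(s, -1, d, leafValue), q(s, +1, d, leafValue));
    }

    public static void main(String[] args) {
        IntToDoubleFunction zeroLeaf = s -> 0.0;  // zero-valued approximation
        // With depth 1 from state 3 the reward is out of reach; depth 2 sees it.
        System.out.println(v(3, 1, zeroLeaf));    // 0.0
        System.out.println(v(3, 2, zeroLeaf));    // 0.9
    }
}
```

This mirrors why FittedVI lets planning and control depths differ: a shallow depth is cheap but leans heavily on the approximation, while a deeper tree trades computation for a more accurate backup.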
Modifier and Type | Class and Description |
---|---|
class | FittedVI.VFAVInit: A class for ValueFunctionInitialization that always points to the outer class's current value function approximation. |
QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected int | controlDepth: The SparseSampling depth used when computing Q-values for the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods used for control. |
protected FittedVI.VFAVInit | leafNodeInit: This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf nodes' values to the current value function approximation. |
protected double | maxDelta: The maximum change in the value function that will cause planning to terminate. |
protected int | maxIterations: The maximum number of iterations to run. |
protected int | planningDepth: The SparseSampling planning depth used for computing Bellman operators during value iteration. |
protected java.util.List<State> | samples: The set of samples on which to perform value iteration. |
protected int | transitionSamples: The number of transition samples used when computing the Bellman operator. |
protected ValueFunction | valueFunction: The current value function approximation. |
protected SupervisedVFA | valueFunctionTrainer: The SupervisedVFA instance used to train the value function on each iteration. |
protected ValueFunctionInitialization | vinit: The initial value function to use. |
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
Modifier and Type | Method and Description |
---|---|
int | getControlDepth(): Returns the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
int | getPlanningDepth(): Returns the Bellman operator depth used during planning. |
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
java.util.List<State> | getSamples(): Returns the state samples to which the value function will be fit. |
ValueFunctionInitialization | getVInit(): Returns the value function initialization used at the start of planning. |
void | planFromState(State initialState): This method will cause the planner to begin planning from the specified initial state. |
void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
double | runIteration(): Runs a single iteration of value iteration. |
void | runVI(): Runs value iteration. |
void | setControlDepth(int controlDepth): Sets the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
void | setPlanningAndControlDepth(int depth): Sets the Bellman operator depth used both during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
void | setPlanningDepth(int planningDepth): Sets the Bellman operator depth used during planning. |
void | setSamples(java.util.List<State> samples): Sets the state samples to which the value function will be fit. |
void | setVInit(ValueFunctionInitialization vinit): Sets the value function initialization used at the start of planning. |
double | value(State s): Returns the value function evaluation of the given state. |
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected java.util.List<State> samples
The set of samples on which to perform value iteration.

protected ValueFunction valueFunction
The current value function approximation.

protected SupervisedVFA valueFunctionTrainer
The SupervisedVFA instance used to train the value function on each iteration.

protected ValueFunctionInitialization vinit
The initial value function to use.

protected FittedVI.VFAVInit leafNodeInit
This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf nodes' values to the current value function approximation. This value function initialization points to the current value function approximation for it to use.

protected int planningDepth
The SparseSampling planning depth used for computing Bellman operators during value iteration.

protected int controlDepth
The SparseSampling depth used when computing Q-values for the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods used for control.

protected int transitionSamples
The number of transition samples used when computing the Bellman operator.

protected int maxIterations
The maximum number of iterations to run.

protected double maxDelta
The maximum change in the value function that will cause planning to terminate.
public FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations)
Initializes. Note that the state samples must be set with the setSamples(java.util.List) method before
calling planFromState(burlap.oomdp.core.State), runIteration(), or runVI(); otherwise a runtime exception
will be thrown.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal function
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations)
Initializes. The state samples can be reset with the setSamples(java.util.List) method before
calling planFromState(burlap.oomdp.core.State), runIteration(), or runVI(); a runtime exception
will be thrown if planning is attempted without any samples set.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal function
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
samples - the set of state samples to use for planning
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public ValueFunctionInitialization getVInit()
Returns the value function initialization used at the start of planning.
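As the constructor documentation states, planning before samples are provided triggers a runtime exception. A minimal sketch of that fail-fast guard pattern (hypothetical stand-in code, not the actual BURLAP implementation; the class and field names here are invented for illustration):

```java
import java.util.List;

public class SampleGuard {
    private List<Integer> samples;  // stand-in for the List<State> samples field

    public void setSamples(List<Integer> samples) { this.samples = samples; }

    // Mirrors the documented contract: planning methods fail fast when no
    // state samples have been provided.
    public void runVI() {
        if (this.samples == null) {
            throw new RuntimeException(
                "Cannot run value iteration until state samples are set with setSamples.");
        }
        // ... value iteration over this.samples would run here ...
    }
}
```

With the first constructor form (no samples argument), calling runVI() immediately would throw; calling setSamples first satisfies the guard.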
public void setVInit(ValueFunctionInitialization vinit)
Sets the value function initialization used at the start of planning.
Parameters:
vinit - the value function initialization used at the start of planning

public int getPlanningDepth()
Returns the Bellman operator depth used during planning.

public void setPlanningDepth(int planningDepth)
Sets the Bellman operator depth used during planning.
Parameters:
planningDepth - the Bellman operator depth used during planning

public int getControlDepth()
Returns the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).

public void setControlDepth(int controlDepth)
Sets the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).
Parameters:
controlDepth - the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods)

public void setPlanningAndControlDepth(int depth)
Sets the Bellman operator depth used both during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).
Parameters:
depth - the Bellman operator depth used during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods)

public java.util.List<State> getSamples()
Returns the state samples to which the value function will be fit.

public void setSamples(java.util.List<State> samples)
Sets the state samples to which the value function will be fit.
Parameters:
samples - the state samples to which the value function will be fit

public void runVI()
Runs value iteration.

public double runIteration()
Runs a single iteration of value iteration.
public void planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. Specifically, data produced from calls to
OOMDPPlanner.planFromState(State)
will be cleared, but all other planner settings should remain the same.
This is useful if the reward function or transition dynamics have changed, thereby
requiring new results to be computed. If there were other objects this planner was provided that may have changed
and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy
that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QComputablePlanner
Parameters:
s - the state for which Q-values are to be returned
Returns: a List of QValue objects for every permissible action for the given input state

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
Specified by: getQ in interface QComputablePlanner
Parameters:
s - the input state
a - the input action
Returns: the QValue for the given state-action pair

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate