public class FittedVI extends MDPSolver implements ValueFunction, QProvider, Planner
This class implements fitted value iteration [1] over a set of state samples. The value function approximation starts as zero-valued unless a different initialization is set (with the setVInit(burlap.behavior.valuefunction.ValueFunction) method). For each state sample, a new value for the state is computed by applying the Bellman operator (using the model of the world and the current value function approximation). The newly computed values for each state are then used as supervised training instances to fit the next iteration of the value function.
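Conceptually, each iteration performs a sweep like the following sketch. This is illustrative only, not the class's actual implementation: the bellmanBackup argument is a hypothetical stand-in for the SparseSampling-based Bellman operator the class uses internally, and the SupervisedVFA.train(List<SupervisedVFAInstance>) signature and import paths are assumed from BURLAP's interfaces.

```java
import burlap.behavior.functionapproximation.supervised.SupervisedVFA;
import burlap.behavior.valuefunction.ValueFunction;
import burlap.mdp.core.state.State;

import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class FittedSweepSketch {

    // One fitted VI sweep: compute a Bellman target for every state sample,
    // then fit a fresh value function to the resulting (state, value) pairs.
    static ValueFunction sweep(List<State> samples,
                               SupervisedVFA trainer,
                               ToDoubleFunction<State> bellmanBackup) {
        List<SupervisedVFA.SupervisedVFAInstance> targets = new ArrayList<>();
        for (State s : samples) {
            // target is approximately max_a E[ r + gamma * V(s') ],
            // estimated with sparse transition samples
            double v = bellmanBackup.applyAsDouble(s);
            targets.add(new SupervisedVFA.SupervisedVFAInstance(s, v));
        }
        // supervised regression on the targets yields the next approximation
        return trainer.train(targets);
    }
}
```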
After specifying the state samples to use (either in the constructor or with the setSamples(java.util.List) method), you can perform planning with the runVI() method. You can also use the standard planFromState(State) method, but the specified state does not change the behavior; that method simply calls runVI() itself.
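For example, a minimal planning workflow might look like the following sketch. The import paths are assumed from BURLAP 3's package layout, and the constructor arguments (any SupervisedVFA implementation, sampled states, and so on) are assumed to be built elsewhere.

```java
import burlap.behavior.functionapproximation.supervised.SupervisedVFA;
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.vfa.fittedvi.FittedVI;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;

import java.util.List;

public class FittedVIExample {

    // All arguments are assumed to be constructed elsewhere.
    public static GreedyQPolicy plan(SADomain domain, SupervisedVFA trainer,
                                     List<State> stateSamples, State initialState) {
        FittedVI fvi = new FittedVI(domain,
                0.99,    // gamma: discount factor
                trainer, // any SupervisedVFA implementation
                10,      // transition samples per Bellman backup
                0.01,    // maxDelta: value-change threshold for termination
                100);    // maxIterations
        fvi.setSamples(stateSamples); // required before planning with this constructor

        // planFromState simply calls runVI() and returns a greedy policy over the
        // resulting Q-values; the input state does not change the behavior.
        return fvi.planFromState(initialState);
    }
}
```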
To compute the value of a state sample under the current value function approximation, this class invokes the SparseSampling class. This enables it to perform an approximate Bellman operator with sparse samples from the transition dynamics, which is useful when the number of possible next-state transitions is very large or infinite. Furthermore, it allows you to set the sparse sampling tree depth to a value larger than one for a more accurate estimate of the target state value. The depth of the tree can be set independently for planning (that is, running value iteration) and for control (that is, the depth used to return Q-values). See the setPlanningDepth(int), setControlDepth(int), and setPlanningAndControlDepth(int) methods for controlling the depth. By default, the depth is 1.
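For example, continuing with the fvi instance from the sketch above, you might back up more deeply during planning than during control:

```java
fvi.setPlanningDepth(2);           // depth for Bellman backups during runVI()
fvi.setControlDepth(1);            // depth for answering qValue/qValues queries
// or set both at once:
fvi.setPlanningAndControlDepth(2);
```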
1. Gordon, Geoffrey J. "Stable function approximation in dynamic programming." Proceedings of the Twelfth International Conference on Machine Learning. 1995.
| Modifier and Type | Class and Description |
|---|---|
| class | FittedVI.VFAVInit: A QFunction that always points to the outer class's current value function approximation. |

Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
| Modifier and Type | Field and Description |
|---|---|
| protected int | controlDepth: The SparseSampling depth used when computing Q-values for the qValues(State) and qValue(State, Action) methods used for control. |
| protected FittedVI.VFAVInit | leafNodeInit: The value function initialization used to set the leaf node values of the SparseSampling instance with which the Bellman operator is computed; it points to the current value function approximation. |
| protected double | maxDelta: The maximum change in the value function that will cause planning to terminate. |
| protected int | maxIterations: The maximum number of iterations to run. |
| protected int | planningDepth: The SparseSampling planning depth used for computing Bellman operators during value iteration. |
| protected java.util.List<State> | samples: The set of samples on which to perform value iteration. |
| protected int | transitionSamples: The number of transition samples used when computing the Bellman operator. |
| protected ValueFunction | valueFunction: The current value function approximation. |
| protected SupervisedVFA | valueFunctionTrainer: The SupervisedVFA instance used to train the value function on each iteration. |
| protected ValueFunction | vinit: The initial value function to use. |
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
| Constructor and Description |
|---|
| FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
| FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
| Modifier and Type | Method and Description |
|---|---|
| int | getControlDepth(): Returns the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| int | getPlanningDepth(): Returns the Bellman operator depth used during planning. |
| java.util.List<State> | getSamples(): Returns the state samples to which the value function will be fit. |
| ValueFunction | getVInit(): Returns the value function initialization used at the start of planning. |
| GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly. |
| double | qValue(State s, Action a): Returns the QValue for the given state-action pair. |
| java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| double | runIteration(): Runs a single iteration of value iteration. |
| void | runVI(): Runs value iteration. |
| void | setControlDepth(int controlDepth): Sets the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| void | setPlanningAndControlDepth(int depth): Sets the Bellman operator depth used both during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| void | setPlanningDepth(int planningDepth): Sets the Bellman operator depth used during planning. |
| void | setSamples(java.util.List<State> samples): Sets the state samples to which the value function will be fit. |
| void | setVInit(ValueFunction vinit): Sets the value function initialization used at the start of planning. |
| double | value(State s): Returns the value function evaluation of the given state. |
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected java.util.List<State> samples
The set of samples on which to perform value iteration.

protected ValueFunction valueFunction
The current value function approximation.

protected SupervisedVFA valueFunctionTrainer
The SupervisedVFA instance used to train the value function on each iteration.

protected ValueFunction vinit
The initial value function to use.

protected FittedVI.VFAVInit leafNodeInit
This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf node values to the current value function approximation. This value function initialization points to the current value function approximation for it to use.

protected int planningDepth
The SparseSampling planning depth used for computing Bellman operators during value iteration.

protected int controlDepth
The SparseSampling depth used when computing Q-values for the qValues(State) and qValue(State, Action) methods used for control.

protected int transitionSamples
The number of transition samples used when computing the Bellman operator.

protected int maxIterations
The maximum number of iterations to run.

protected double maxDelta
The maximum change in the value function that will cause planning to terminate.
public FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations)
Initializes. Note that you will need to set the state samples with the setSamples(java.util.List) method before calling planFromState(State), runIteration(), or runVI(); otherwise a runtime exception will be thrown.
domain - the domain in which to plan
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations)
Initializes. State samples must be set, here via the samples parameter or later with the setSamples(java.util.List) method, before calling planFromState(State), runIteration(), or runVI(); otherwise a runtime exception will be thrown.
domain - the domain in which to plan
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
samples - the set of state samples to use for planning
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public ValueFunction getVInit()
Returns the value function initialization used at the start of planning.
public void setVInit(ValueFunction vinit)
Sets the value function initialization used at the start of planning.
vinit - the value function initialization used at the start of planning.

public int getPlanningDepth()
Returns the Bellman operator depth used during planning.
public void setPlanningDepth(int planningDepth)
Sets the Bellman operator depth used during planning.
planningDepth - the Bellman operator depth used during planning.

public int getControlDepth()
Returns the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).
public void setControlDepth(int controlDepth)
Sets the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).
controlDepth - the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).

public void setPlanningAndControlDepth(int depth)
Sets the Bellman operator depth used both during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods).
depth - the Bellman operator depth used during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods).

public java.util.List<State> getSamples()
Returns the state samples to which the value function will be fit.
public void setSamples(java.util.List<State> samples)
Sets the state samples to which the value function will be fit.
samples - the state samples to which the value function will be fit.

public void runVI()
Runs value iteration. The state samples must have been set (in the constructor or via setSamples(java.util.List)) or a runtime exception will be thrown.
public double runIteration()
Runs a single iteration of value iteration.
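Because runIteration() returns a double, and maxDelta terminates planning on the maximum value change, the returned value can reasonably be read as that iteration's maximum change; under that assumption, you can drive the loop manually (continuing the fvi sketch from above):

```java
// Manual iteration loop, mirroring the termination logic that runVI() applies.
for (int i = 0; i < 100; i++) {
    double delta = fvi.runIteration();  // one fitted VI sweep over the samples
    System.out.println("iteration " + i + ": max value change " + delta);
    if (delta < 0.01) {
        break; // converged to within our tolerance
    }
}
```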
public GreedyQPolicy planFromState(State initialState)
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
Specified by: planFromState in interface Planner
initialState - the initial state of the planning problem
Returns: a GreedyQPolicy.
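A short sketch of using the returned policy (GreedyQPolicy implements BURLAP's Policy interface; variable names are illustrative):

```java
GreedyQPolicy policy = fvi.planFromState(initialState);
Action chosen = policy.action(initialState);  // greedy action under the learned Q-values
double stateValue = fvi.value(initialState);  // value estimate from the fitted V
```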
public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Specified by: resetSolver in class MDPSolver
public java.util.List<QValue> qValues(State s)
Description copied from interface: QProvider
Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Description copied from interface: QFunction
Returns the QValue for the given state-action pair.

public double value(State s)
Description copied from interface: ValueFunction
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
s - the state to evaluate.