public class FittedVI extends MDPSolver implements ValueFunction, QProvider, Planner
This class implements fitted value iteration [1] over a set of state samples. The value function approximation starts as zero-valued unless a different initialization is set (with the setVInit(burlap.behavior.valuefunction.ValueFunction) method). For each state sample, a new value for the state is computed by applying the Bellman operator (using the model of the world and the current value function approximation). The newly computed values for each state are then used as supervised training instances to fit the next iteration of the value function.
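Conceptually, each iteration performs a sweep like the following sketch. This is illustrative only, not the class's actual implementation: the bellmanBackup argument is a hypothetical stand-in for the SparseSampling-based Bellman operator the class uses internally, and the SupervisedVFA.train(List<SupervisedVFAInstance>) signature and import paths are assumed from BURLAP's interfaces.

```java
import burlap.behavior.functionapproximation.supervised.SupervisedVFA;
import burlap.behavior.valuefunction.ValueFunction;
import burlap.mdp.core.state.State;

import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class FittedSweepSketch {

    // One fitted VI sweep: compute a Bellman target for every state sample,
    // then fit a fresh value function to the resulting (state, value) pairs.
    static ValueFunction sweep(List<State> samples,
                               SupervisedVFA trainer,
                               ToDoubleFunction<State> bellmanBackup) {
        List<SupervisedVFA.SupervisedVFAInstance> targets = new ArrayList<>();
        for (State s : samples) {
            // target is approximately max_a E[ r + gamma * V(s') ],
            // estimated with sparse transition samples
            double v = bellmanBackup.applyAsDouble(s);
            targets.add(new SupervisedVFA.SupervisedVFAInstance(s, v));
        }
        // supervised regression on the targets yields the next approximation
        return trainer.train(targets);
    }
}
```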
After specifying the state samples to use (either in the constructor or with the setSamples(java.util.List) method), you can perform planning with the runVI() method. You can also use the standard planFromState(State) method, but the specified state does not change the behavior; that method simply calls runVI() itself.
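For example, a minimal planning workflow might look like the following sketch. The import paths are assumed from BURLAP 3's package layout, and the constructor arguments (any SupervisedVFA implementation, sampled states, and so on) are assumed to be built elsewhere.

```java
import burlap.behavior.functionapproximation.supervised.SupervisedVFA;
import burlap.behavior.policy.GreedyQPolicy;
import burlap.behavior.singleagent.planning.vfa.fittedvi.FittedVI;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;

import java.util.List;

public class FittedVIExample {

    // All arguments are assumed to be constructed elsewhere.
    public static GreedyQPolicy plan(SADomain domain, SupervisedVFA trainer,
                                     List<State> stateSamples, State initialState) {
        FittedVI fvi = new FittedVI(domain,
                0.99,    // gamma: discount factor
                trainer, // any SupervisedVFA implementation
                10,      // transition samples per Bellman backup
                0.01,    // maxDelta: value-change threshold for termination
                100);    // maxIterations
        fvi.setSamples(stateSamples); // required before planning with this constructor

        // planFromState simply calls runVI() and returns a greedy policy over the
        // resulting Q-values; the input state does not change the behavior.
        return fvi.planFromState(initialState);
    }
}
```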
To compute the value of a state sample under the current value function approximation, this class invokes the SparseSampling class. This enables it to perform an approximate Bellman operator with sparse samples from the transition dynamics, which is useful when the number of possible next-state transitions is very large or infinite. Furthermore, it allows you to set the sparse sampling tree depth to a value larger than one for a more accurate estimate of the target state value. The depth of the tree can be set independently for planning (that is, running value iteration) and for control (that is, the depth used to return Q-values). See the setPlanningDepth(int), setControlDepth(int), and setPlanningAndControlDepth(int) methods for controlling the depth. By default, the depth is 1.
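For example, continuing with the fvi instance from the sketch above, you might back up more deeply during planning than during control:

```java
fvi.setPlanningDepth(2);           // depth for Bellman backups during runVI()
fvi.setControlDepth(1);            // depth for answering qValue/qValues queries
// or set both at once:
fvi.setPlanningAndControlDepth(2);
```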
1. Gordon, Geoffrey J. "Stable function approximation in dynamic programming." Proceedings of the Twelfth International Conference on Machine Learning. 1995.
| Modifier and Type | Class and Description |
|---|---|
| class | FittedVI.VFAVInit: A QFunction that always points to the outer class's current value function approximation. |

Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
| Modifier and Type | Field and Description |
|---|---|
| protected int | controlDepth: The SparseSampling depth used when computing Q-values for the qValues(State) and qValue(State, Action) methods used for control. |
| protected FittedVI.VFAVInit | leafNodeInit: The value function initialization used to set the leaf node values of the SparseSampling instance with which the Bellman operator is computed; it points to the current value function approximation. |
| protected double | maxDelta: The maximum change in the value function that will cause planning to terminate. |
| protected int | maxIterations: The maximum number of iterations to run. |
| protected int | planningDepth: The SparseSampling planning depth used for computing Bellman operators during value iteration. |
| protected java.util.List<State> | samples: The set of samples on which to perform value iteration. |
| protected int | transitionSamples: The number of transition samples used when computing the Bellman operator. |
| protected ValueFunction | valueFunction: The current value function approximation. |
| protected SupervisedVFA | valueFunctionTrainer: The SupervisedVFA instance used to train the value function on each iteration. |
| protected ValueFunction | vinit: The initial value function to use. |
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
| Constructor and Description |
|---|
| FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
| FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
| Modifier and Type | Method and Description |
|---|---|
| int | getControlDepth(): Returns the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| int | getPlanningDepth(): Returns the Bellman operator depth used during planning. |
| java.util.List<State> | getSamples(): Returns the state samples to which the value function will be fit. |
| ValueFunction | getVInit(): Returns the value function initialization used at the start of planning. |
| GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly. |
| double | qValue(State s, Action a): Returns the QValue for the given state-action pair. |
| java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| double | runIteration(): Runs a single iteration of value iteration. |
| void | runVI(): Runs value iteration. |
| void | setControlDepth(int controlDepth): Sets the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| void | setPlanningAndControlDepth(int depth): Sets the Bellman operator depth used both during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods). |
| void | setPlanningDepth(int planningDepth): Sets the Bellman operator depth used during planning. |
| void | setSamples(java.util.List<State> samples): Sets the state samples to which the value function will be fit. |
| void | setVInit(ValueFunction vinit): Sets the value function initialization used at the start of planning. |
| double | value(State s): Returns the value function evaluation of the given state. |
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected java.util.List<State> samples
The set of samples on which to perform value iteration.

protected ValueFunction valueFunction
The current value function approximation.

protected SupervisedVFA valueFunctionTrainer
The SupervisedVFA instance used to train the value function on each iteration.

protected ValueFunction vinit
The initial value function to use.

protected FittedVI.VFAVInit leafNodeInit
This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf node values to the current value function approximation. This value function initialization points to the current value function approximation for it to use.

protected int planningDepth
The SparseSampling planning depth used for computing Bellman operators during value iteration.

protected int controlDepth
The SparseSampling depth used when computing Q-values for the qValues(State) and qValue(State, Action) methods used for control.

protected int transitionSamples
The number of transition samples used when computing the Bellman operator.

protected int maxIterations
The maximum number of iterations to run.

protected double maxDelta
The maximum change in the value function that will cause planning to terminate.
public FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations)
Initializes. Note that you will need to set the state samples with the setSamples(java.util.List) method before calling planFromState(State), runIteration(), or runVI(); otherwise a runtime exception will be thrown.
domain - the domain in which to plan
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public FittedVI(SADomain domain, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations)
Initializes. State samples must be set, here via the samples parameter or later with the setSamples(java.util.List) method, before calling planFromState(State), runIteration(), or runVI(); otherwise a runtime exception will be thrown.
domain - the domain in which to plan
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
samples - the set of state samples to use for planning
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public ValueFunction getVInit()
Returns the value function initialization used at the start of planning.
public void setVInit(ValueFunction vinit)
Sets the value function initialization used at the start of planning.
vinit - the value function initialization used at the start of planning.

public int getPlanningDepth()
Returns the Bellman operator depth used during planning.
public void setPlanningDepth(int planningDepth)
Sets the Bellman operator depth used during planning.
planningDepth - the Bellman operator depth used during planning.

public int getControlDepth()
Returns the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).
public void setControlDepth(int controlDepth)
Sets the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).
controlDepth - the Bellman operator depth used for computing Q-values (the qValues(State) and qValue(State, Action) methods).

public void setPlanningAndControlDepth(int depth)
Sets the Bellman operator depth used both during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods).
depth - the Bellman operator depth used during planning and for computing Q-values (the qValues(State) and qValue(State, Action) methods).

public java.util.List<State> getSamples()
Returns the state samples to which the value function will be fit.
public void setSamples(java.util.List<State> samples)
Sets the state samples to which the value function will be fit.
samples - the state samples to which the value function will be fit.

public void runVI()
Runs value iteration. The state samples must have been set (in the constructor or via setSamples(java.util.List)) or a runtime exception will be thrown.
public double runIteration()
Runs a single iteration of value iteration.
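Because runIteration() returns a double, and maxDelta terminates planning on the maximum value change, the returned value can reasonably be read as that iteration's maximum change; under that assumption, you can drive the loop manually (continuing the fvi sketch from above):

```java
// Manual iteration loop, mirroring the termination logic that runVI() applies.
for (int i = 0; i < 100; i++) {
    double delta = fvi.runIteration();  // one fitted VI sweep over the samples
    System.out.println("iteration " + i + ": max value change " + delta);
    if (delta < 0.01) {
        break; // converged to within our tolerance
    }
}
```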
public GreedyQPolicy planFromState(State initialState)
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
Specified by: planFromState in interface Planner
initialState - the initial state of the planning problem
Returns: a GreedyQPolicy.
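A short sketch of using the returned policy (GreedyQPolicy implements BURLAP's Policy interface; variable names are illustrative):

```java
GreedyQPolicy policy = fvi.planFromState(initialState);
Action chosen = policy.action(initialState);  // greedy action under the learned Q-values
double stateValue = fvi.value(initialState);  // value estimate from the fitted V
```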
public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Specified by: resetSolver in class MDPSolver
public java.util.List<QValue> qValues(State s)
Description copied from interface: QProvider
Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Description copied from interface: QFunction
Returns the QValue for the given state-action pair.

public double value(State s)
Description copied from interface: ValueFunction
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
s - the state to evaluate.