public class FittedVI extends OOMDPPlanner implements ValueFunction, QComputablePlanner
The value function approximation is initially zero-valued; it can be changed with the setVInit(burlap.behavior.singleagent.ValueFunctionInitialization) method. For each state sample, a new value for the state is computed by applying the Bellman operator (using the model
of the world and the current value function approximation). The newly computed values for each
state are then used as supervised instances to train the next iteration of the value function.
After setting the state samples with the setSamples(java.util.List) method,
you can perform planning with the runVI() method. You can also use the standard planFromState(burlap.oomdp.core.State) method,
but specifying the state does not change behavior; the method simply calls runVI() itself.
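The loop described above (sample states, Bellman backups at each sample, supervised refit) can be sketched generically. The following is a minimal, self-contained Java illustration, not BURLAP code: the MDP is a hypothetical 11-state chain with a -1 step cost and a terminal goal, and the supervised trainer is a simple nearest-neighbor regressor standing in for SupervisedVFA.

```java
public class FittedVISketch {
    static final int GOAL = 10;            // terminal state of the toy chain MDP
    static final double GAMMA = 1.0;       // discount factor

    static boolean terminal(int s) { return s == GOAL; }
    static int next(int s, int a) { return Math.max(0, Math.min(GOAL, s + a)); }

    // Stand-in for a SupervisedVFA trainer: 1-nearest-neighbor regression
    // over the sampled states and their fitted values.
    static double predict(int[] xs, double[] ys, int query) {
        int best = 0;
        for (int i = 1; i < xs.length; i++)
            if (Math.abs(xs[i] - query) < Math.abs(xs[best] - query)) best = i;
        return ys[best];
    }

    static double[] solve(int[] samples, int iterations) {
        double[] values = new double[samples.length];         // initial V = 0
        for (int iter = 0; iter < iterations; iter++) {
            double[] targets = new double[samples.length];
            for (int i = 0; i < samples.length; i++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a : new int[]{-1, +1}) {             // Bellman operator
                    int s2 = next(samples[i], a);
                    double v2 = terminal(s2) ? 0.0 : predict(samples, values, s2);
                    best = Math.max(best, -1.0 + GAMMA * v2); // step cost of -1
                }
                targets[i] = best;
            }
            values = targets;  // refit the approximation to the new targets
        }
        return values;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};       // state samples
        double[] v = solve(samples, 15);
        System.out.println(v[0]);   // -10.0: ten steps from the goal
        System.out.println(v[9]);   // -1.0: one step from the goal
    }
}
```

Here the samples happen to cover every nonterminal state, so the nearest-neighbor fit is exact; with a coarser sample set the regressor would generalize the Bellman targets to unsampled states, which is the point of the fitted approach.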
Bellman operators are computed with an instance of the SparseSampling class. This enables FittedVI to
perform an approximate Bellman operator with sparse samples from the transition dynamics, which is useful when
the number of possible next state transitions is infinite or very large. Furthermore, it allows you to set the
sparse sampling tree depth to a value larger than one to get a more accurate estimate of the target state value. The depth
of the tree can be set independently for planning (that is, running value iteration) and for control (that is,
the depth used to return the Q-values). See the setPlanningDepth(int), setControlDepth(int), and
setPlanningAndControlDepth(int) methods for controlling the depth. By default, the depth is 1.
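To make the depth parameter concrete, here is a self-contained sketch of a depth-limited Bellman backup in the spirit of SparseSampling (illustrative Java, not the BURLAP class; the toy MDP and its numbers are assumptions). Leaf nodes at the depth limit are scored by the current value function approximation, so a larger depth looks further ahead before falling back on the approximation. A real sparse-sampling backup would average over several sampled next states per node; this toy model is deterministic, so one sample suffices.

```java
import java.util.function.IntToDoubleFunction;

public class SparseSamplingSketch {
    static final double GAMMA = 0.9;

    // Deterministic toy chain: action a in {-1, +1} shifts the state;
    // reward is 1 when the goal state 5 is entered, else 0.
    static int next(int s, int a) { return Math.max(0, Math.min(5, s + a)); }
    static double reward(int s, int a) { return next(s, a) == 5 ? 1.0 : 0.0; }

    // Depth-d estimate of Q(s, a): recurse until the depth limit, then
    // evaluate the leaf with the current value function approximation.
    static double q(int s, int a, int d, IntToDoubleFunction leafValue) {
        int s2 = next(s, a);
        double v2 = (d <= 1) ? leafValue.applyAsDouble(s2) : v(s2, d - 1, leafValue);
        return reward(s, a) + GAMMA * v2;
    }

    static double v(int s, int d, IntToDoubleFunction leafValue) {
        return Math.max(q(s, -1, d, leafValue), q(s, +1, d, leafValue));
    }

    public static void main(String[] args) {
        IntToDoubleFunction zeroLeaf = s -> 0.0;  // zero-valued approximation
        // With depth 1 from state 3 the reward is out of reach; depth 2 sees it.
        System.out.println(v(3, 1, zeroLeaf));    // 0.0
        System.out.println(v(3, 2, zeroLeaf));    // 0.9
    }
}
```

This mirrors why FittedVI lets planning and control depths differ: a shallow depth is cheap but leans heavily on the approximation, while a deeper tree trades computation for a more accurate backup.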
Modifier and Type | Class and Description |
---|---|
class | FittedVI.VFAVInit: A class for ValueFunctionInitialization that always points to the outer class's current value function approximation. |
QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected int | controlDepth: The SparseSampling depth used when computing Q-values for the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods used for control. |
protected FittedVI.VFAVInit | leafNodeInit: This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf nodes' values to the current value function approximation. |
protected double | maxDelta: The maximum change in the value function that will cause planning to terminate. |
protected int | maxIterations: The maximum number of iterations to run. |
protected int | planningDepth: The SparseSampling planning depth used for computing Bellman operators during value iteration. |
protected java.util.List<State> | samples: The set of samples on which to perform value iteration. |
protected int | transitionSamples: The number of transition samples used when computing the Bellman operator. |
protected ValueFunction | valueFunction: The current value function approximation. |
protected SupervisedVFA | valueFunctionTrainer: The SupervisedVFA instance used to train the value function on each iteration. |
protected ValueFunctionInitialization | vinit: The initial value function to use. |
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations): Initializes. |
Modifier and Type | Method and Description |
---|---|
int | getControlDepth(): Returns the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
int | getPlanningDepth(): Returns the Bellman operator depth used during planning. |
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
java.util.List<State> | getSamples(): Returns the state samples to which the value function will be fit. |
ValueFunctionInitialization | getVInit(): Returns the value function initialization used at the start of planning. |
void | planFromState(State initialState): This method will cause the planner to begin planning from the specified initial state. |
void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
double | runIteration(): Runs a single iteration of value iteration. |
void | runVI(): Runs value iteration. |
void | setControlDepth(int controlDepth): Sets the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
void | setPlanningAndControlDepth(int depth): Sets the Bellman operator depth used both during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods). |
void | setPlanningDepth(int planningDepth): Sets the Bellman operator depth used during planning. |
void | setSamples(java.util.List<State> samples): Sets the state samples to which the value function will be fit. |
void | setVInit(ValueFunctionInitialization vinit): Sets the value function initialization used at the start of planning. |
double | value(State s): Returns the value function evaluation of the given state. |
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected java.util.List<State> samples
The set of samples on which to perform value iteration.

protected ValueFunction valueFunction
The current value function approximation.

protected SupervisedVFA valueFunctionTrainer
The SupervisedVFA instance used to train the value function on each iteration.

protected ValueFunctionInitialization vinit
The initial value function to use.

protected FittedVI.VFAVInit leafNodeInit
This class computes the Bellman operator by using an instance of SparseSampling and setting its leaf nodes' values to the current value function approximation. This value function initialization points to the current value function approximation for it to use.

protected int planningDepth
The SparseSampling planning depth used for computing Bellman operators during value iteration.

protected int controlDepth
The SparseSampling depth used when computing Q-values for the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods used for control.

protected int transitionSamples
The number of transition samples used when computing the Bellman operator.

protected int maxIterations
The maximum number of iterations to run.

protected double maxDelta
The maximum change in the value function that will cause planning to terminate.
public FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, int transitionSamples, double maxDelta, int maxIterations)
Initializes. Note that the state samples must be set with the setSamples(java.util.List) method before
calling planFromState(burlap.oomdp.core.State), runIteration(), or runVI(); otherwise a runtime exception
will be thrown.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal function
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public FittedVI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, SupervisedVFA valueFunctionTrainer, java.util.List<State> samples, int transitionSamples, double maxDelta, int maxIterations)
Initializes. The state samples can be reset with the setSamples(java.util.List) method before
calling planFromState(burlap.oomdp.core.State), runIteration(), or runVI(); a runtime exception
will be thrown if planning is attempted without any samples set.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal function
gamma - the discount factor
valueFunctionTrainer - the supervised learning algorithm to use for each value iteration
samples - the set of state samples to use for planning
transitionSamples - the number of transition samples to use when computing the Bellman operator; set to -1 to use the full transition dynamics without sampling
maxDelta - the maximum change in the value function that will cause planning to terminate
maxIterations - the maximum number of iterations to run

public ValueFunctionInitialization getVInit()
Returns the value function initialization used at the start of planning.
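As the constructor documentation states, planning before samples are provided triggers a runtime exception. A minimal sketch of that fail-fast guard pattern (hypothetical stand-in code, not the actual BURLAP implementation; the class and field names here are invented for illustration):

```java
import java.util.List;

public class SampleGuard {
    private List<Integer> samples;  // stand-in for the List<State> samples field

    public void setSamples(List<Integer> samples) { this.samples = samples; }

    // Mirrors the documented contract: planning methods fail fast when no
    // state samples have been provided.
    public void runVI() {
        if (this.samples == null) {
            throw new RuntimeException(
                "Cannot run value iteration until state samples are set with setSamples.");
        }
        // ... value iteration over this.samples would run here ...
    }
}
```

With the first constructor form (no samples argument), calling runVI() immediately would throw; calling setSamples first satisfies the guard.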
public void setVInit(ValueFunctionInitialization vinit)
Sets the value function initialization used at the start of planning.
Parameters:
vinit - the value function initialization used at the start of planning

public int getPlanningDepth()
Returns the Bellman operator depth used during planning.

public void setPlanningDepth(int planningDepth)
Sets the Bellman operator depth used during planning.
Parameters:
planningDepth - the Bellman operator depth used during planning

public int getControlDepth()
Returns the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).

public void setControlDepth(int controlDepth)
Sets the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).
Parameters:
controlDepth - the Bellman operator depth used for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods)

public void setPlanningAndControlDepth(int depth)
Sets the Bellman operator depth used both during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods).
Parameters:
depth - the Bellman operator depth used during planning and for computing Q-values (the getQs(burlap.oomdp.core.State) and getQ(burlap.oomdp.core.State, burlap.oomdp.core.AbstractGroundedAction) methods)

public java.util.List<State> getSamples()
Returns the state samples to which the value function will be fit.

public void setSamples(java.util.List<State> samples)
Sets the state samples to which the value function will be fit.
Parameters:
samples - the state samples to which the value function will be fit

public void runVI()
Runs value iteration.

public double runIteration()
Runs a single iteration of value iteration.
public void planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. Specifically, data produced from calls to
OOMDPPlanner.planFromState(State)
will be cleared, but all other planner settings should remain the same.
This is useful if the reward function or transition dynamics have changed, thereby
requiring new results to be computed. If there were other objects this planner was provided that may have changed
and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy
that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QComputablePlanner
Parameters:
s - the state for which Q-values are to be returned
Returns: a List of QValue objects for every permissible action for the given input state

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
Specified by: getQ in interface QComputablePlanner
Parameters:
s - the input state
a - the input action
Returns: the QValue for the given state-action pair

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate