VIModelPlanner

java.lang.Object
- burlap.behavior.singleagent.learning.modellearning.modelplanners.VIModelPlanner

All Implemented Interfaces:

ModelPlanner, QComputablePlanner
```
public class VIModelPlanner
extends java.lang.Object
implements ModelPlanner, QComputablePlanner
```
A model learning interface wrapper for value iteration (VI) that reruns VI every time the model is updated or whenever a novel state is seen that was not previously expected to be reachable. When the model changes, planning is always performed from the initial state of an episode as well as from the last changed state.
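
The two replanning triggers described above can be sketched in plain Java. This is a hypothetical illustration, not BURLAP source code: the class `ReplanTriggerSketch`, its integer state ids, and its method names are assumptions made for the example, and the counter stands in for an actual VI run.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the replanning triggers; not BURLAP source code. */
final class ReplanTriggerSketch {
    // States seen so far, represented as plain integer ids (an assumption of this sketch).
    private final Set<Integer> observedStates = new HashSet<>();
    private int replanCount = 0;

    /** Called at the start of an episode: replan only if the start state is novel. */
    void episodeStarted(int initialState) {
        if (observedStates.add(initialState)) {
            replan();
        }
    }

    /** Called when the learned model changes: always replan. */
    void modelChanged(int changedState) {
        observedStates.add(changedState);
        replan();
    }

    private void replan() {
        replanCount++; // a real planner would rerun value iteration here
    }

    int replanCount() {
        return replanCount;
    }

    public static void main(String[] args) {
        ReplanTriggerSketch planner = new ReplanTriggerSketch();
        planner.episodeStarted(0); // novel start state: replans
        planner.episodeStarted(0); // already observed: no replan
        planner.modelChanged(3);   // model update: replans
        System.out.println(planner.replanCount());
    }
}
```

In this sketch a repeated start state costs nothing, while every model update pays for a full replan, which is the trade-off the class description implies.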

Author:

James MacGlashan

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class VIModelPlanner.VIModelPlannerGenerator
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.learning.modellearning.ModelPlanner
  ModelPlanner.ModelPlannerGenerator
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected Domain`	`domain` The model domain
`protected double`	`gamma` The model planning discount factor
`protected StateHashFactory`	`hashingFactory` The hashing factory to use
`protected State`	`initialState` The last initial state of an episode
`protected double`	`maxDelta` The maximum VI delta
`protected int`	`maxIterations` The maximum number of VI iterations
`protected Policy`	`modelPolicy` The greedy policy that results from VI
`protected java.util.Set<StateHashTuple>`	`observedStates` States the agent has observed during learning.
`protected RewardFunction`	`rf` The model reward function
`protected TerminalFunction`	`tf` The model termination function
`protected ValueIteration`	`vi` The value iteration planning object

Constructor Summary

Constructors
Constructor and Description
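`VIModelPlanner(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner with the model domain, reward function, termination function, discount factor, hashing factory, and VI stopping parameters.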

Method Summary

Methods
Modifier and Type	Method and Description
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`ValueIteration`	`getValueIterationPlanner()` Returns the value iteration object used for planning whenever the model updates.
`void`	`initializePlannerIn(State s)` This method is expected to be called at the beginning of any new learning episode.
`void`	`modelChanged(State changedState)` Tells the planner that the model has changed and that it will need to replan accordingly.
`Policy`	`modelPlannedPolicy()` Returns a policy encoding the planner's results.
`protected void`	`rerunVI()` Reruns VI on the new updated model.
`void`	`resetPlanner()` Resets the planner as if planning had never been called.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - vi
```
protected ValueIteration vi
```
    The value iteration planning object
  - modelPolicy
```
protected Policy modelPolicy
```
    The greedy policy that results from VI
  - initialState
```
protected State initialState
```
    The last initial state of an episode
  - domain
```
protected Domain domain
```
    The model domain
  - rf
```
protected RewardFunction rf
```
    The model reward function
  - tf
```
protected TerminalFunction tf
```
    The model termination function
  - gamma
```
protected double gamma
```
    The model planning discount factor
  - hashingFactory
```
protected StateHashFactory hashingFactory
```
    The hashing factory to use
  - maxDelta
```
protected double maxDelta
```
    The maximum VI delta
  - maxIterations
```
protected int maxIterations
```
    The maximum number of VI iterations
  - observedStates
```
protected java.util.Set<StateHashTuple> observedStates
```
    States the agent has observed during learning.
- Constructor Detail
  - VIModelPlanner
```
public VIModelPlanner(Domain domain,
              RewardFunction rf,
              TerminalFunction tf,
              double gamma,
              StateHashFactory hashingFactory,
              double maxDelta,
              int maxIterations)
```
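    Initializes the planner with the model domain, reward function, termination function, discount factor, hashing factory, and VI stopping parameters.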
    
    Parameters:
    domain - model domain
    rf - model reward function
    tf - model termination function
    gamma - discount factor
    hashingFactory - the hashing factory
    maxDelta - max value function delta in VI
    maxIterations - max iterations of VI
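    
    To make the roles of `gamma`, `maxDelta`, and `maxIterations` concrete, here is a minimal tabular value iteration in plain Java. It is a sketch under stated assumptions, not BURLAP's `ValueIteration`: the class `TabularVISketch`, the array-based deterministic model, and the in-place sweep order are all hypothetical choices made for the example.

```java
import java.util.Arrays;

/** Hypothetical tabular value iteration; not BURLAP's ValueIteration class. */
final class TabularVISketch {

    /**
     * transitions[s][a] is the (deterministic) successor of state s under action a,
     * rewards[s][a] is the reward for that transition, and terminal[s] marks states
     * whose value stays at 0. The values array is updated in place.
     * Returns the number of sweeps performed.
     */
    static int runVI(int[][] transitions, double[][] rewards, boolean[] terminal,
                     double gamma, double maxDelta, int maxIterations, double[] values) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0; // largest value change seen in this sweep
            for (int s = 0; s < values.length; s++) {
                if (terminal[s]) {
                    continue;
                }
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < transitions[s].length; a++) {
                    double q = rewards[s][a] + gamma * values[transitions[s][a]];
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - values[s]));
                values[s] = best;
            }
            sweeps++;
            if (delta < maxDelta) {
                break; // converged before reaching the iteration cap
            }
        }
        return sweeps;
    }

    public static void main(String[] args) {
        // Three-state chain 0 -> 1 -> 2, where state 2 is terminal and entering it pays 1.
        int[][] transitions = {{1}, {2}, {2}};
        double[][] rewards = {{0.0}, {1.0}, {0.0}};
        boolean[] terminal = {false, false, true};
        double[] values = new double[3];
        int sweeps = runVI(transitions, rewards, terminal, 0.9, 0.001, 100, values);
        System.out.println(sweeps + " sweeps, values = " + Arrays.toString(values));
    }
}
```

    In this sketch `maxDelta` ends planning once a full sweep changes no value by at least that amount, and `maxIterations` bounds the work when that threshold is not reached.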
- Method Detail
  - initializePlannerIn
```
public void initializePlannerIn(State s)
```
    Description copied from interface: ModelPlanner
    
    This method is expected to be called at the beginning of any new learning episode. This may be useful for planning algorithms that do not solve the policy for every state, since new episodes may start in states the planning algorithm had not previously considered.
    
    Specified by:
    
    initializePlannerIn in interface ModelPlanner
    
    Parameters:
    s - the input state
  - modelChanged
```
public void modelChanged(State changedState)
```
    Description copied from interface: ModelPlanner
    
    Tells the planner that the model has changed and that it will need to replan accordingly.
    
    Specified by:
    
    modelChanged in interface ModelPlanner
    
    Parameters:
    changedState - the source state that caused a change in the model.
  - modelPlannedPolicy
```
public Policy modelPlannedPolicy()
```
    Description copied from interface: ModelPlanner
    
    Returns a policy encoding the planner's results.
    
    Specified by:
    
    modelPlannedPolicy in interface ModelPlanner
    
    Returns:
    a policy object
  - resetPlanner
```
public void resetPlanner()
```
    Description copied from interface: ModelPlanner
    
    Resets the planner as if planning had never been called.
    
    Specified by:
    
    resetPlanner in interface ModelPlanner
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QComputablePlanner
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QComputablePlanner
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QComputablePlanner
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QComputablePlanner
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
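    
    For readers unfamiliar with how Q-values relate to a planned value function, the sketch below computes them with the standard one-step backup, Q(s, a) = sum over outcomes of p * (r + gamma * V(s')), and then picks the greedy action. It is a plain-Java illustration, not BURLAP code: `QFromValuesSketch`, `Outcome`, and the list-based model are hypothetical names, and whether `VIModelPlanner` computes its `QValue` objects this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-step Q-value backup from a value table; not BURLAP source code. */
final class QFromValuesSketch {

    /** One possible result of an action: successor state index, probability, reward. */
    static final class Outcome {
        final int next;
        final double prob;
        final double reward;

        Outcome(int next, double prob, double reward) {
            this.next = next;
            this.prob = prob;
            this.reward = reward;
        }
    }

    /** Q-value of a single action: expected reward plus discounted successor value. */
    static double q(List<Outcome> outcomes, double[] values, double gamma) {
        double sum = 0.0;
        for (Outcome o : outcomes) {
            sum += o.prob * (o.reward + gamma * values[o.next]);
        }
        return sum;
    }

    /** Q-values for every action available in one state, in action order. */
    static List<Double> qs(List<List<Outcome>> outcomesPerAction, double[] values, double gamma) {
        List<Double> result = new ArrayList<>();
        for (List<Outcome> outcomes : outcomesPerAction) {
            result.add(q(outcomes, values, gamma));
        }
        return result;
    }

    /** Index of the action with the highest Q-value; ties go to the lowest index. */
    static int greedyAction(List<Double> qValues) {
        int best = 0;
        for (int a = 1; a < qValues.size(); a++) {
            if (qValues.get(a) > qValues.get(best)) {
                best = a;
            }
        }
        return best;
    }
}
```

    A greedy policy such as the one held in `modelPolicy` can be read off these values by choosing, in each state, the action returned by `greedyAction`.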
  - getValueIterationPlanner
```
public ValueIteration getValueIterationPlanner()
```
    Returns the value iteration object used for planning whenever the model updates.
    
    Returns:
    the value iteration object used for planning whenever the model updates.
  - rerunVI
```
protected void rerunVI()
```
    Reruns VI on the new updated model. It will force VI to consider all states the agent has ever previously observed, even though not all may be connected by the current unknown transition model.
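    
    The idea of seeding planning from every observed state, even ones the current model does not connect, can be sketched as a reachability pass run once per seed. This is a plain-Java illustration under assumptions, not BURLAP source code: `ObservedStateSeedingSketch`, the integer state ids, and the successor-list model are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of seeding planning from all observed states; not BURLAP source code. */
final class ObservedStateSeedingSketch {

    /**
     * Returns every state reachable, under the current learned model, from any observed state.
     * The model maps a state id to the successors currently believed possible; states with
     * no entry are treated as having no known transitions.
     */
    static Set<Integer> statesToPlanOver(Set<Integer> observed, Map<Integer, List<Integer>> model) {
        Set<Integer> planned = new HashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        for (int seed : observed) {
            if (planned.add(seed)) {
                frontier.push(seed);
            }
            // Expand everything reachable from this seed before moving to the next one.
            while (!frontier.isEmpty()) {
                int s = frontier.pop();
                for (int next : model.getOrDefault(s, Collections.<Integer>emptyList())) {
                    if (planned.add(next)) {
                        frontier.push(next);
                    }
                }
            }
        }
        return planned;
    }
}
```

    With observed states {0, 5} and a model that only knows 0 -> 1 -> 2, the result is {0, 1, 2, 5}: state 5 is still planned over even though nothing in the current model leads to it.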
