VIModelLearningPlanner

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.planning.stochastic.valueiteration.ValueIteration
      - burlap.behavior.singleagent.learning.modellearning.modelplanners.VIModelLearningPlanner

All Implemented Interfaces:

ModelLearningPlanner, MDPSolverInterface, Planner, QFunction, QProvider, ValueFunction
```
public class VIModelLearningPlanner
extends ValueIteration
implements ModelLearningPlanner
```
A model-learning interface wrapper around value iteration (VI) that reruns VI every time the model is updated or whenever a novel state is seen that was not previously expected to be reachable. When the model changes, planning is always performed from the initial state of an episode as well as from the state whose model last changed.
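
The replan-on-change behavior described above can be illustrated with a small, self-contained sketch. It does not use BURLAP: `ReplanOnChangeSketch`, `Outcome`, integer states and actions, and the deterministic learned model are all hypothetical simplifications. In particular, the sketch's `modelChanged` receives the learned transition directly, whereas the real method only receives the changed state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are BURLAP classes.
class ReplanOnChangeSketch {
    /** One learned deterministic outcome: where an action leads and what it pays. */
    static final class Outcome {
        final int nextState;
        final double reward;
        Outcome(int nextState, double reward) {
            this.nextState = nextState;
            this.reward = reward;
        }
    }

    final Map<Integer, Map<Integer, Outcome>> learnedModel = new HashMap<>();
    final Set<Integer> observedStates = new HashSet<>();
    final Map<Integer, Double> values = new HashMap<>();
    final double gamma;
    int replanCount = 0;

    ReplanOnChangeSketch(double gamma) { this.gamma = gamma; }

    /** Start of an episode: remember the initial state, then replan. */
    void initializePlannerIn(int s) {
        observedStates.add(s);
        replan();
    }

    /** Record a newly learned transition for state s, then replan. */
    void modelChanged(int s, int action, int nextState, double reward) {
        learnedModel.computeIfAbsent(s, k -> new HashMap<>())
                    .put(action, new Outcome(nextState, reward));
        observedStates.add(s);
        replan();
    }

    /** Value iteration from scratch over every observed state. */
    void replan() {
        replanCount++;
        values.clear();
        for (int sweep = 0; sweep < 1000; sweep++) {
            double delta = 0.0;
            for (int s : observedStates) {
                Map<Integer, Outcome> actions = learnedModel.get(s);
                if (actions == null || actions.isEmpty()) continue; // nothing learned yet
                double best = Double.NEGATIVE_INFINITY;
                for (Outcome o : actions.values()) {
                    best = Math.max(best, o.reward + gamma * value(o.nextState));
                }
                delta = Math.max(delta, Math.abs(best - value(s)));
                values.put(s, best);
            }
            if (delta < 1e-9) break; // converged
        }
    }

    /** States with no computed value default to 0. */
    double value(int s) { return values.getOrDefault(s, 0.0); }

    public static void main(String[] args) {
        ReplanOnChangeSketch p = new ReplanOnChangeSketch(0.9);
        p.initializePlannerIn(0);
        p.modelChanged(0, 0, 1, 0.0); // learned: state 0 leads to state 1, reward 0
        p.modelChanged(1, 0, 2, 1.0); // learned: state 1 leads to state 2, reward 1
        System.out.println("V(0) = " + p.value(0) + ", replans = " + p.replanCount);
    }
}
```

Every call that touches the model triggers a full replan, so this design trades computation for a value function that always reflects the latest learned model.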

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected State`	`initialState` The last initial state of an episode
`protected Policy`	`modelPolicy` The greedy policy that results from VI
`protected java.util.Set<HashableState>`	`observedStates` States the agent has observed during learning.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.valueiteration.ValueIteration
foundReachableStates, hasRunVI, maxDelta, maxIterations, stopReachabilityFromTerminalStates

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
operator, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`VIModelLearningPlanner(SADomain domain, FullModel model, double gamma, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)` Initializes the planner.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`initializePlannerIn(State s)` This method is expected to be called at the beginning of any new learning episode.
`void`	`modelChanged(State changedState)` Tells the valueFunction that the model has changed and that it will need to replan accordingly
`Policy`	`modelPlannedPolicy()` Returns a policy encoding the planner's results.
`protected void`	`rerunVI()` Reruns VI on the new updated model.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.valueiteration.ValueIteration
performReachabilityFrom, planFromState, recomputeReachableStates, resetSolver, runVI, toggleReachabiltiyTerminalStatePruning

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, DPPInit, getAllStates, getCopyOfValueFunction, getDefaultValue, getModel, getOperator, getValueFunctionInitialization, hasComputedValueFor, loadValueTable, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, qValue, qValues, setOperator, setValueFunctionInitialization, value, value, writeValueTable

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.planning.Planner
planFromState

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, resetSolver, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting

- Field Detail
  - observedStates
```
protected java.util.Set<HashableState> observedStates
```
    States the agent has observed during learning.
  - modelPolicy
```
protected Policy modelPolicy
```
    The greedy policy that results from VI
  - initialState
```
protected State initialState
```
    The last initial state of an episode
- Constructor Detail
  - VIModelLearningPlanner
```
public VIModelLearningPlanner(SADomain domain,
                              FullModel model,
                              double gamma,
                              HashableStateFactory hashingFactory,
                              double maxDelta,
                              int maxIterations)
```
    Initializes the planner.
    
    Parameters:
    
    domain - model domain
    
    model - the learned model to use for planning
    
    gamma - discount factor
    
    hashingFactory - the hashing factory
    
    maxDelta - max value function delta in VI
    
    maxIterations - max iterations of VI
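    
    As a rough illustration of how the two stopping parameters interact, the sketch below runs value iteration on a hypothetical one-state problem (a self-loop with reward 1). It is not BURLAP code, and `StoppingCriteriaSketch`, `Result`, and `runVI` here are illustrative names only; it assumes VI stops as soon as the largest value change in a sweep falls below `maxDelta`, or after `maxIterations` sweeps, whichever comes first.

```java
// Hypothetical illustration only; not the BURLAP implementation.
class StoppingCriteriaSketch {
    /** Final value estimate and the number of sweeps actually performed. */
    static final class Result {
        final double value;
        final int sweeps;
        Result(double value, int sweeps) {
            this.value = value;
            this.sweeps = sweeps;
        }
    }

    /**
     * Value iteration on a single state whose only action loops back to
     * itself with reward 1, so the true value is 1 / (1 - gamma).
     */
    static Result runVI(double gamma, double maxDelta, int maxIterations) {
        double v = 0.0;
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double updated = 1.0 + gamma * v;      // Bellman update
            double delta = Math.abs(updated - v);  // change in this sweep
            v = updated;
            sweeps++;
            if (delta < maxDelta) break;           // approximated well enough
        }
        return new Result(v, sweeps);
    }

    public static void main(String[] args) {
        Result tight = runVI(0.5, 0.01, 100); // stops on the maxDelta test
        Result capped = runVI(0.5, 0.01, 3);  // stops on the maxIterations cap
        System.out.println(tight.sweeps + " sweeps, value " + tight.value);
        System.out.println(capped.sweeps + " sweeps, value " + capped.value);
    }
}
```

    A smaller `maxDelta` gives a more accurate value function at the cost of more sweeps, while `maxIterations` bounds the cost of each replan, which matters here because planning is repeated whenever the model changes.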
- Method Detail
  - initializePlannerIn
```
public void initializePlannerIn(State s)
```
    Description copied from interface: ModelLearningPlanner
    
    This method is expected to be called at the beginning of any new learning episode. This may be useful for planning algorithms that do not solve the policy for every state, since new episodes may start in states the planning algorithm had not previously considered.
    
    Specified by:
    
    initializePlannerIn in interface ModelLearningPlanner
    
    Parameters:
    
    s - the input state
  - modelChanged
```
public void modelChanged(State changedState)
```
    Description copied from interface: ModelLearningPlanner
    
    Tells the valueFunction that the model has changed and that it will need to replan accordingly
    
    Specified by:
    
    modelChanged in interface ModelLearningPlanner
    
    Parameters:
    
    changedState - the source state that caused a change in the model.
  - modelPlannedPolicy
```
public Policy modelPlannedPolicy()
```
    Description copied from interface: ModelLearningPlanner
    
    Returns a policy encoding the planner's results.
    
    Specified by:
    
    modelPlannedPolicy in interface ModelLearningPlanner
    
    Returns:
    
    a policy object
  - rerunVI
```
protected void rerunVI()
```
    Reruns VI on the new updated model. It will force VI to consider all states the agent has ever previously observed, even though not all may be connected by the current unknown transition model.
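    
    One way to picture "consider all states the agent has ever previously observed" is as a reachability search seeded from every observed state rather than from a single start state. The sketch below is a hypothetical, BURLAP-free illustration of that idea: `ObservedReachabilitySketch` and `statesToPlanOver` are invented names, states are plain integers, and the learned model is reduced to a successor map.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are BURLAP classes.
class ObservedReachabilitySketch {
    /**
     * Breadth-first search seeded with every observed state. States in
     * components the learned model does not yet connect are still included,
     * because each observed state is its own starting point.
     */
    static Set<Integer> statesToPlanOver(Set<Integer> observed,
                                         Map<Integer, List<Integer>> successors) {
        Set<Integer> found = new HashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        for (Integer seed : observed) {
            if (found.add(seed)) frontier.add(seed);
        }
        while (!frontier.isEmpty()) {
            Integer s = frontier.poll();
            List<Integer> next = successors.getOrDefault(s, Collections.emptyList());
            for (Integer n : next) {
                if (found.add(n)) frontier.add(n); // enqueue each state once
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> successors = new HashMap<>();
        successors.put(0, Arrays.asList(1));
        successors.put(1, Arrays.asList(2));
        successors.put(10, Arrays.asList(11)); // not connected to state 0 yet
        Set<Integer> observed = new HashSet<>(Arrays.asList(0, 10));
        System.out.println(statesToPlanOver(observed, successors));
    }
}
```

    Seeding the search from every observed state keeps values available for states the agent has visited even when the partially learned model cannot yet explain how to reach them from the episode's initial state.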
