public class PotentialShapedRMax extends MDPSolver implements LearningAgent

See the KWIKModel class for more information on defining your own model.

1. John Asmuth, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI 2008.

## Nested Class Summary

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `PotentialShapedRMax.RMaxPotential` A potential function for vanilla RMax; all states have a potential value of R_max/(1-gamma). |
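The `RMaxPotential` entry above assigns every state the constant optimistic value R_max/(1-gamma), the largest discounted return achievable when no single-step reward exceeds R_max. A minimal self-contained sketch of that arithmetic (an illustration only, not BURLAP's actual `RMaxPotential` class):

```java
public class RMaxPotentialSketch {
    private final double vMax;

    public RMaxPotentialSketch(double maxReward, double gamma) {
        // Geometric-series bound: sum of maxReward * gamma^t over t >= 0.
        this.vMax = maxReward / (1.0 - gamma);
    }

    // Every state receives the same optimistic potential value.
    public double potentialValue() {
        return this.vMax;
    }

    public static void main(String[] args) {
        RMaxPotentialSketch potential = new RMaxPotentialSketch(2.0, 0.5);
        System.out.println(potential.potentialValue()); // 2.0 / 0.5 = 4.0
    }
}
```

Because every state shares this upper-bound value, the potential never underestimates the optimal value, which is the optimism that drives RMax-style exploration toward unknown states.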
## Field Summary

| Modifier and Type | Field and Description |
|---|---|
| `protected java.util.LinkedList<Episode>` | `episodeHistory`: the saved previous learning episodes |
| `protected int` | `maxNumSteps`: the maximum number of learning steps per episode before the agent gives up |
| `protected RMaxModel` | `model`: the model of the world that is being learned |
| `protected RewardFunction` | `modeledRewardFunction`: the modeled reward function that is being learned |
| `protected TerminalFunction` | `modeledTerminalFunction`: the modeled terminal state function |
| `protected ModelLearningPlanner` | `modelPlanner`: the model-adaptive planning algorithm to use |
| `protected int` | `numEpisodesToStore`: the number of the most recent learning episodes to store |

Fields inherited from class `MDPSolver`: `actionTypes`, `debugCode`, `domain`, `gamma`, `hashingFactory`, `usingOptionModel`
## Constructor Summary

| Constructor and Description |
|---|
| `PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and the standard RMax paradigm. |
| `PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and a potential shaped function. |
| `PotentialShapedRMax(SADomain domain, HashableStateFactory hashingFactory, PotentialFunction potential, KWIKModel model, ModelLearningPlanner plannerGenerator)` Initializes for a given model, model learning planner, and potential shaped function. |
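Taken together with `runLearningEpisode(Environment)`, the first constructor suggests a learning loop like the sketch below. This is a hedged sketch, not a verbatim recipe: `buildDomain()` and `buildEnvironment()` are hypothetical stand-ins for whatever `SADomain` and `Environment` you already have, `SimpleHashableStateFactory` and `resetEnvironment()` are assumed from BURLAP's wider API, and only the `PotentialShapedRMax` constructor and `runLearningEpisode` come from this page.

```java
// Hypothetical helpers; substitute your own domain and environment setup.
SADomain domain = buildDomain();
Environment env = buildEnvironment();

// Standard RMax paradigm: maxReward 1.0, a transition becomes "known" after
// 5 observations, VI stops at a max delta of 0.01 or after 100 passes.
PotentialShapedRMax agent = new PotentialShapedRMax(
        domain, 0.99, new SimpleHashableStateFactory(),
        1.0, 5, 0.01, 100);

for (int i = 0; i < 50; i++) {
    Episode episode = agent.runLearningEpisode(env);
    env.resetEnvironment(); // begin the next episode from a fresh initial state
}
```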
## Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `protected Policy` | `createUnmodeledFavoredPolicy()` |
| `java.util.List<Episode>` | `getAllStoredLearningEpisodes()` |
| `Episode` | `getLastLearningEpisode()` |
| `RMaxModel` | `getModel()` Returns the model learning algorithm being used. |
| `RewardFunction` | `getModeledRewardFunction()` Returns the model reward function. |
| `TerminalFunction` | `getModeledTerminalFunction()` Returns the model terminal function. |
| `ModelLearningPlanner` | `getModelPlanner()` Returns the planning algorithm used on the model, which can be iteratively updated as the model changes. |
| `void` | `resetSolver()` Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| `Episode` | `runLearningEpisode(Environment env)` |
| `Episode` | `runLearningEpisode(Environment env, int maxSteps)` |
| `void` | `setNumEpisodesToStore(int numEps)` |

Methods inherited from class `MDPSolver`: `addActionType`, `applicableActions`, `getActionTypes`, `getDebugCode`, `getDomain`, `getGamma`, `getHashingFactory`, `setActionTypes`, `setDebugCode`, `setDomain`, `setGamma`, `setHashingFactory`, `setModel`, `solverInit`, `stateHash`, `toggleDebugPrinting`
## Field Detail

protected RMaxModel model
protected RewardFunction modeledRewardFunction
protected TerminalFunction modeledTerminalFunction
protected ModelLearningPlanner modelPlanner
protected int maxNumSteps
protected java.util.LinkedList<Episode> episodeHistory
protected int numEpisodesToStore
## Constructor Detail

public PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)

Parameters:
- `domain` - the real world domain
- `gamma` - the discount factor
- `hashingFactory` - the hashing factory to use for VI and the tabular model
- `maxReward` - the maximum possible reward
- `nConfident` - the number of observations required for the model to be confident in a transition
- `maxVIDelta` - the maximum change in the value function for VI to terminate
- `maxVIPasses` - the maximum number of VI iterations per replan

public PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
- `domain` - the real world domain
- `gamma` - the discount factor
- `hashingFactory` - the hashing factory to use for VI and the tabular model
- `potential` - the admissible potential function
- `nConfident` - the number of observations required for the model to be confident in a transition
- `maxVIDelta` - the maximum change in the value function for VI to terminate
- `maxVIPasses` - the maximum number of VI iterations per replan

public PotentialShapedRMax(SADomain domain, HashableStateFactory hashingFactory, PotentialFunction potential, KWIKModel model, ModelLearningPlanner plannerGenerator)
Parameters:
- `domain` - the real world domain
- `hashingFactory` - a state hashing factory for indexing states
- `potential` - the admissible potential function
- `model` - the model/model-learning algorithm to use
- `plannerGenerator` - a generator for a model valueFunction

## Method Detail

public RMaxModel getModel()
Specified by:
- `getModel` in interface `MDPSolverInterface`

Overrides:
- `getModel` in class `MDPSolver`
public ModelLearningPlanner getModelPlanner()
public RewardFunction getModeledRewardFunction()
public TerminalFunction getModeledTerminalFunction()
public Episode runLearningEpisode(Environment env)
Specified by:
- `runLearningEpisode` in interface `LearningAgent`
public Episode runLearningEpisode(Environment env, int maxSteps)
Specified by:
- `runLearningEpisode` in interface `LearningAgent`
protected Policy createUnmodeledFavoredPolicy()
public Episode getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<Episode> getAllStoredLearningEpisodes()
public void resetSolver()
Description copied from interface: `MDPSolverInterface`. This method resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.

Specified by:
- `resetSolver` in interface `MDPSolverInterface`

Overrides:
- `resetSolver` in class `MDPSolver`
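For intuition on the role of the potential function in the cited Asmuth, Littman, and Zinkov paper: potential-based shaping replaces the reward r for a transition s -> s' with r + gamma * Phi(s') - Phi(s), which leaves optimal policies unchanged. The following self-contained sketch shows just that arithmetic (an illustration of the shaping formula, not BURLAP code):

```java
public class ShapingSketch {

    // Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    static double shapedReward(double r, double gamma,
                               double phiS, double phiSPrime) {
        return r + gamma * phiSPrime - phiS;
    }

    public static void main(String[] args) {
        double gamma = 0.5;
        // With the constant vanilla-RMax potential Phi = Rmax / (1 - gamma),
        // Phi(s) == Phi(s') everywhere, so shaping shifts every reward by the
        // same constant (gamma - 1) * Phi and cannot change the best policy.
        double phi = 1.0 / (1.0 - gamma); // 2.0
        System.out.println(shapedReward(1.0, gamma, phi, phi)); // 1.0 + 1.0 - 2.0 = 0.0
    }
}
```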