public class PotentialShapedRMax extends OOMDPPlanner implements LearningAgent
See the Model class for more information on defining your own model.
1. John Asmuth, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI. 2008.
Modifier and Type | Class and Description |
---|---|
protected class |
PotentialShapedRMax.PotentialShapedRMaxRF
This class is a special version of a potential-shaped reward function that does not remove the potential value for transitions to states with unknown action transitions that are followed.
|
class |
PotentialShapedRMax.PotentialShapedRMaxTerminal
A terminal function that treats transitions to RMax fictitious nodes as terminal states, in addition to the states that the model reports as terminal.
|
static class |
PotentialShapedRMax.RMaxPotential
A potential function for vanilla RMax; all states have a potential value of R_max/(1-gamma).
|
LearningAgent.LearningAgentBookKeeping
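The two inner classes above combine the RMax potential with standard potential-based shaping. The following is a minimal stand-alone sketch of those two ideas (an illustration only, not BURLAP source; all names here are hypothetical):

```java
// Illustrative sketch only (not BURLAP source): the RMaxPotential assigns every
// state the optimistic value R_max / (1 - gamma), and potential-based shaping
// adds gamma*phi(s') - phi(s) to the real reward.
public class RMaxShapingSketch {

    final double gamma;
    final double vMax; // R_max / (1 - gamma): the potential of every state

    public RMaxShapingSketch(double maxReward, double gamma) {
        this.gamma = gamma;
        this.vMax = maxReward / (1.0 - gamma);
    }

    /** The RMax potential: identical for all states. */
    public double potential() {
        return vMax;
    }

    /**
     * Standard potential-based shaping: r + gamma*phi(s') - phi(s).
     * (PotentialShapedRMaxRF additionally special-cases transitions into
     * states whose action transitions are still unknown, as described above.)
     */
    public double shapedReward(double realReward) {
        return realReward + gamma * potential() - potential();
    }
}
```

Note that with a constant potential the shaping term reduces every reward by (1-gamma)*R_max/(1-gamma) = R_max; the optimism comes from the special handling of unknown-state transitions, which keep the potential.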
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<EpisodeAnalysis> |
episodeHistory
The saved previous learning episodes.
|
protected int |
maxNumSteps
The maximum number of learning steps per episode before the agent gives up.
|
protected Model |
model
The model of the world that is being learned.
|
protected Domain |
modeledDomain
The modeled domain object containing the modeled actions that a planner will use.
|
protected RewardFunction |
modeledRewardFunction
The modeled reward function that is being learned.
|
protected TerminalFunction |
modeledTerminalFunction
The modeled terminal state function.
|
protected ModelPlanner |
modelPlanner
The model-adaptive planning algorithm to use.
|
protected int |
numEpisodesToStore
The number of the most recent learning episodes to store.
|
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
double maxReward,
int nConfident,
double maxVIDelta,
int maxVIPasses)
Initializes for a tabular model, VI planner, and standard RMax paradigm.
|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
PotentialFunction potential,
int nConfident,
double maxVIDelta,
int maxVIPasses)
Initializes for a tabular model, VI planner, and potential shaped function.
|
PotentialShapedRMax(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
PotentialFunction potential,
Model model,
ModelPlanner.ModelPlannerGenerator plannerGenerator)
Initializes for a given model, model planner, and potential shaped function.
|
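The `nConfident` parameter shared by the first two constructors controls when a transition counts as "known." A minimal stand-alone sketch of that bookkeeping (hypothetical illustration, not BURLAP code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the nConfident bookkeeping: a state-action pair's
// transition model counts as "known" only after nConfident observations;
// until then RMax treats it optimistically.
public class TransitionConfidenceSketch {

    private final int nConfident;
    private final Map<String, Integer> tryCounts = new HashMap<>();

    public TransitionConfidenceSketch(int nConfident) {
        this.nConfident = nConfident;
    }

    /** Record one observed execution of a state-action pair. */
    public void observe(String stateActionKey) {
        tryCounts.merge(stateActionKey, 1, Integer::sum);
    }

    /** True once the pair has been observed at least nConfident times. */
    public boolean isKnown(String stateActionKey) {
        return tryCounts.getOrDefault(stateActionKey, 0) >= nConfident;
    }
}
```

Larger `nConfident` values yield a more accurate learned model at the cost of more exploration before states become known.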
Modifier and Type | Method and Description |
---|---|
protected Policy |
createDomainMappedPolicy() |
java.util.List<EpisodeAnalysis> |
getAllStoredLearningEpisodes()
Returns all saved
EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis |
getLastLearningEpisode()
Returns the last learning episode of the agent.
|
Model |
getModel()
Returns the model learning algorithm being used.
|
Domain |
getModeledDomain()
Returns the model domain for planning.
|
RewardFunction |
getModeledRewardFunction()
Returns the model reward function.
|
TerminalFunction |
getModeledTerminalFunction()
Returns the model terminal function.
|
ModelPlanner |
getModelPlanner()
Returns the planning algorithm used on the model that can be iteratively updated as the model changes.
|
void |
planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
|
void |
resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. |
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState)
Causes the agent to perform a learning episode starting in the given initial state.
|
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState,
int maxSteps)
Causes the agent to perform a learning episode starting in the given initial state.
|
void |
setNumEpisodesToStore(int numEps)
Tells the agent how many
EpisodeAnalysis objects representing learning episodes to internally store. |
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
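The `episodeHistory` / `numEpisodesToStore` bookkeeping summarized above amounts to keeping only the most recent N episodes. A stand-alone sketch (illustration only, not BURLAP source; episodes stand in as strings here):

```java
import java.util.LinkedList;

// Stand-alone sketch (not BURLAP source) of the episodeHistory /
// numEpisodesToStore bookkeeping: keep only the most recent N episodes.
public class EpisodeHistorySketch {

    private final LinkedList<String> episodeHistory = new LinkedList<>();
    private final int numEpisodesToStore;

    public EpisodeHistorySketch(int numEpisodesToStore) {
        this.numEpisodesToStore = numEpisodesToStore;
    }

    /** Store a finished episode, evicting the oldest when over capacity. */
    public void recordEpisode(String episode) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.poll(); // drop the oldest episode
        }
        episodeHistory.offer(episode);
    }

    public int storedCount() {
        return episodeHistory.size();
    }
}
```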
protected Model model
protected Domain modeledDomain
protected RewardFunction modeledRewardFunction
protected TerminalFunction modeledTerminalFunction
protected ModelPlanner modelPlanner
protected int maxNumSteps
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int numEpisodesToStore
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - the hashing factory to use for VI and the tabular model
maxReward - the maximum possible reward
nConfident - the number of observations required for the model to be confident in a transition
maxVIDelta - the maximum change in value function for VI to terminate
maxVIPasses - the maximum number of VI iterations per replan
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - the hashing factory to use for VI and the tabular model
potential - the admissible potential function
nConfident - the number of observations required for the model to be confident in a transition
maxVIDelta - the maximum change in value function for VI to terminate
maxVIPasses - the maximum number of VI iterations per replan
public PotentialShapedRMax(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, PotentialFunction potential, Model model, ModelPlanner.ModelPlannerGenerator plannerGenerator)
Parameters:
domain - the real world domain
rf - the real world reward function
tf - the real world terminal function
gamma - the discount factor
hashingFactory - a state hashing factory for indexing states
potential - the admissible potential function
model - the model/model-learning algorithm to use
plannerGenerator - a generator for a model planner
public Model getModel()
public Domain getModeledDomain()
public ModelPlanner getModelPlanner()
public RewardFunction getModeledRewardFunction()
public TerminalFunction getModeledTerminalFunction()
public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
Returns: an EpisodeAnalysis object
public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
maxSteps - the maximum number of steps in the episode
Returns: an EpisodeAnalysis object
protected Policy createDomainMappedPolicy()
public EpisodeAnalysis getLastLearningEpisode()
Description copied from interface: LearningAgent
Returns the last learning episode of the agent.
Specified by: getLastLearningEpisode in interface LearningAgent
public void setNumEpisodesToStore(int numEps)
Description copied from interface: LearningAgent
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by: setNumEpisodesToStore in interface LearningAgent
Parameters:
numEps - the number of learning episodes to remember
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Description copied from interface: LearningAgent
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by: getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track
public void planFromState(State initialState)
Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem
public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner