public class SarsaLam extends QLearning
Modifier and Type | Class and Description |
---|---|
static class | SarsaLam.EligibilityTrace: A data structure for maintaining eligibility trace values |

Nested classes/interfaces inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper

Nested classes/interfaces inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping
Modifier and Type | Field and Description |
---|---|
protected double | lambda: the strength of eligibility traces (0 for one step, 1 for full propagation) |

Fields inherited from class QLearning: episodeHistory, eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, numEpisodesToStore, qIndex, qInitFunction, shouldAnnotateOptions, shouldDecomposeOptions, totalNumberOfSteps

Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, double lambda): Initializes SARSA(λ) with a 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with the same Q-value initialization everywhere. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ). |
Modifier and Type | Method and Description |
---|---|
EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state. |
protected void | sarsalamInit(double lambda) |

Methods inherited from class QLearning: getAllStoredLearningEpisodes, getLastLearningEpisode, getLastNumSteps, getMaxQ, getQ, getQ, getQs, getQs, getStateNode, planFromState, QLInit, resetPlannerResults, runLearningEpisodeFrom, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setNumEpisodesToStore, setQInitFunction, toggleShouldAnnotateOptionDecomposition, toggleShouldDecomposeOption

Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected double lambda

the strength of eligibility traces (0 for one step, 1 for full propagation)
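The lambda field controls how far a TD error propagates back along the recently visited states: each stored eligibility trace is scaled by gamma * lambda after every step, so lambda = 0 kills all traces after one step (one-step SARSA) and lambda = 1 lets them decay only by the discount factor (full propagation). A minimal illustrative sketch of that decay rule, in plain Java rather than BURLAP's actual EligibilityTrace class:

```java
// Illustrative sketch (not BURLAP code): how a single eligibility trace
// value decays by gamma * lambda per step in tabular SARSA(lambda).
public class TraceDecay {

    /** Returns the trace value after the given number of decay steps. */
    public static double decayedTrace(double initial, double gamma, double lambda, int steps) {
        double trace = initial;
        for (int i = 0; i < steps; i++) {
            trace *= gamma * lambda; // standard accumulating-trace decay
        }
        return trace;
    }

    public static void main(String[] args) {
        // lambda = 0: the trace vanishes after one step -> one-step backup
        System.out.println(decayedTrace(1.0, 0.9, 0.0, 1));
        // lambda = 1: the trace decays only by gamma -> full propagation
        System.out.println(decayedTrace(1.0, 0.9, 1.0, 1));
    }
}
```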
public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with the same Q-value initialization everywhere. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ). Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)

Causes the agent to perform a learning episode starting in the given initial state.

Specified by: runLearningEpisodeFrom in interface LearningAgent
Overrides: runLearningEpisodeFrom in class QLearning

Parameters:
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode

Returns: an EpisodeAnalysis object.
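To make concrete what the lambda parameter changes during a learning episode, here is a self-contained sketch of a tabular SARSA(λ) update on a tiny deterministic chain (illustrative plain Java, independent of BURLAP's classes; the single-action chain MDP and all names are invented for the example). With lambda = 1 and gamma = 1, one episode backs the terminal reward up to every visited state, while with lambda = 0 only the state adjacent to the reward is updated:

```java
// Illustrative sketch: tabular SARSA(lambda) on a 4-state deterministic
// chain 0 -> 1 -> 2 -> 3, with reward 1.0 on reaching terminal state 3.
// Each state has a single "advance" action, so Q is one value per state.
public class SarsaLamSketch {

    /** Runs SARSA(lambda) episodes on the chain and returns the Q-values. */
    public static double[] learn(double gamma, double lambda, double alpha, int episodes) {
        final int n = 4;
        double[] q = new double[n];
        for (int ep = 0; ep < episodes; ep++) {
            double[] e = new double[n]; // eligibility traces, reset per episode
            int s = 0;
            while (s < n - 1) {
                int sPrime = s + 1;
                double r = (sPrime == n - 1) ? 1.0 : 0.0;      // reward at goal
                double qNext = (sPrime == n - 1) ? 0.0 : q[sPrime]; // terminal Q = 0
                double delta = r + gamma * qNext - q[s];       // TD error
                e[s] += 1.0;                                   // accumulating trace
                for (int i = 0; i < n; i++) {
                    q[i] += alpha * delta * e[i];              // update all traced states
                    e[i] *= gamma * lambda;                    // decay all traces
                }
                s = sPrime;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[] full = learn(1.0, 1.0, 1.0, 1);    // lambda = 1: full propagation
        double[] oneStep = learn(1.0, 0.0, 1.0, 1); // lambda = 0: one-step backup
        System.out.println(full[0] + " " + oneStep[0]);
    }
}
```

After a single episode, full propagation gives every visited state the backed-up value 1.0, whereas the one-step variant leaves the start state untouched; this is the trade-off the lambda constructor parameter exposes.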