public class GradientDescentSarsaLam extends OOMDPPlanner implements QComputablePlanner, LearningAgent

Gradient-descent SARSA(λ) implementation that approximates Q-values through the ValueFunctionApproximation interface provided to it. The implementation can either be used for learning or planning, the latter of which is performed by running many learning episodes in succession. The number of episodes used for planning can be determined by a threshold maximum number of episodes, or by a maximum change in the VFA weight threshold.
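For orientation, the sketch below shows the learning use case. It assumes that domain, rf, tf, vfa, and initialState have already been constructed for your own problem; the numeric parameters are illustrative values, not recommendations, and imports are omitted.

```java
// Sketch: using GradientDescentSarsaLam as a learning agent.
// domain, rf, tf, vfa, and initialState are assumed to be built elsewhere.
GradientDescentSarsaLam agent = new GradientDescentSarsaLam(
        domain, rf, tf,
        0.99,   // gamma: discount factor
        vfa,    // value function approximation used to estimate Q-values
        0.02,   // learning rate
        0.5);   // lambda: eligibility trace strength

agent.setNumEpisodesToStore(100); // keep the most recent 100 learning episodes

for (int i = 0; i < 100; i++) {
    agent.runLearningEpisodeFrom(initialState);
    System.out.println("Episode " + i + ": " + agent.getLastNumSteps() + " steps");
}
```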
Modifier and Type | Class and Description
---|---
static class | GradientDescentSarsaLam.EligibilityTraceVector - An object for keeping track of the eligibility traces within an episode for each VFA weight

Nested classes/interfaces inherited from the implemented interfaces: QComputablePlanner.QComputablePlannerHelper, LearningAgent.LearningAgentBookKeeping
Modifier and Type | Field and Description
---|---
protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory - The saved previous learning episodes
protected int | eStepCounter - A counter for the number of steps taken so far in the current episode
protected double | lambda - The strength of eligibility traces (0 for one step, 1 for full propagation)
protected Policy | learningPolicy - The learning policy to use
protected LearningRate | learningRate - The learning rate function to use
protected int | maxEpisodeSize - The maximum number of steps that will be taken in an episode before the agent terminates a learning episode
protected double | maxWeightChangeForPlanningTermination - The maximum allowable change in the VFA weights during an episode before the planning method terminates
protected double | maxWeightChangeInLastEpisode - The maximum VFA weight change that occurred in the last learning episode
protected double | minEligibityForUpdate - The minimum eligibility value of a trace that will cause it to be updated
protected int | numEpisodesForPlanning - The maximum number of episodes to use for planning
protected int | numEpisodesToStore - The number of the most recent learning episodes to store
protected boolean | shouldAnnotateOptions - Whether decomposed options should have their primitive actions annotated with the option's name in the returned EpisodeAnalysis objects
protected boolean | shouldDecomposeOptions - Whether options should be decomposed into primitive actions in the returned EpisodeAnalysis objects
protected int | totalNumberOfSteps - The total number of learning steps performed by this agent
protected boolean | useFeatureWiseLearningRate - Whether learning rate polls should be based on the VFA state features or the OO-MDP state
protected boolean | useReplacingTraces - Whether to use replacing eligibility traces rather than accumulating traces
protected ValueFunctionApproximation | vfa - The object that performs value function approximation
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description
---
GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, double lambda) - Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and places no limit on the number of steps the agent can take in an episode.
GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, int maxEpisodeSize, double lambda) - Initializes SARSA(λ) with a 0.1 epsilon-greedy policy.
GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda) - Initializes SARSA(λ). By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
Modifier and Type | Method and Description
---|---
protected void | GDSLInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda) - Initializes SARSA(λ). By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
protected ActionApproximationResult | getActionApproximation(State s, GroundedAction ga) - Returns the VFA Q-value approximation for the given state and action.
protected java.util.List<ActionApproximationResult> | getAllActionApproximations(State s) - Gets all Q-value VFA results for each action for a given state.
java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes() - Returns all saved EpisodeAnalysis objects of which the agent has kept track.
EpisodeAnalysis | getLastLearningEpisode() - Returns the last learning episode of the agent.
int | getLastNumSteps() - Returns the number of steps taken in the last episode.
QValue | getQ(State s, AbstractGroundedAction a) - Returns the QValue for the given state-action pair.
protected QValue | getQFromFeaturesFor(java.util.List<ActionApproximationResult> results, State s, GroundedAction ga) - Creates a Q-value object in which the Q-value is determined from VFA.
java.util.List<QValue> | getQs(State s) - Returns a List of QValue objects for every permissible action for the given input state.
void | planFromState(State initialState) - This method will cause the planner to begin planning from the specified initial state.
void | resetPlannerResults() - Resets all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before.
EpisodeAnalysis | runLearningEpisodeFrom(State initialState) - Causes the agent to perform a learning episode starting in the given initial state.
EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps) - Causes the agent to perform a learning episode starting in the given initial state.
void | setLearningPolicy(Policy p) - Sets which policy this agent should use for learning.
void | setLearningRate(LearningRate lr) - Sets the learning rate function to use.
void | setMaximumEpisodesForPlanning(int n) - Sets the maximum number of episodes that will be performed when the planFromState(State) method is called.
void | setMaxVFAWeightChangeForPlanningTerminaiton(double m) - Sets a maximum change in the VFA weight threshold that will cause planFromState(State) to stop planning when it is achieved.
void | setNumEpisodesToStore(int numEps) - Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store.
void | setUseFeatureWiseLearningRate(boolean useFeatureWiseLearningRate) - Sets whether learning rate polls should be based on the VFA state feature ids or the OO-MDP state.
void | setUseReplaceTraces(boolean toggle) - Sets whether to use replacing eligibility traces rather than accumulating traces.
void | toggleShouldAnnotateOptionDecomposition(boolean toggle) - Sets whether options that are decomposed into primitives will have the option that produced them annotated.
void | toggleShouldDecomposeOption(boolean toggle) - Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects.
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected ValueFunctionApproximation vfa
protected LearningRate learningRate
protected Policy learningPolicy
protected double lambda
protected int maxEpisodeSize
protected int eStepCounter
protected int numEpisodesForPlanning
protected double maxWeightChangeForPlanningTermination
protected double maxWeightChangeInLastEpisode
protected boolean useFeatureWiseLearningRate
protected double minEligibityForUpdate
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int numEpisodesToStore
protected boolean useReplacingTraces
protected boolean shouldDecomposeOptions
protected boolean shouldAnnotateOptions
protected int totalNumberOfSteps
public GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and places no limit on the number of steps the agent can take in an episode. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
vfa - the value function approximation method to use for estimating Q-values
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon-greedy policy. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
vfa - the value function approximation method to use for estimating Q-values
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in an episode before terminating
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public GradientDescentSarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ). By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
vfa - the value function approximation method to use for estimating Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in an episode before terminating
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void GDSLInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, ValueFunctionApproximation vfa, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ). By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
vfa - the value function approximation method to use for estimating Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in an episode before terminating
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)
public void setLearningRate(LearningRate lr)

Sets the learning rate function to use.

Parameters:
lr - the learning rate function to use

public void setUseFeatureWiseLearningRate(boolean useFeatureWiseLearningRate)

Sets whether learning rate polls should be based on the VFA state feature ids or the OO-MDP state.

Parameters:
useFeatureWiseLearningRate - if true, learning rate polls are based on VFA state feature ids; if false, they are based on the OO-MDP state object

public void setLearningPolicy(Policy p)

Sets which policy this agent should use for learning.

Parameters:
p - the policy to use for learning

public void setMaximumEpisodesForPlanning(int n)

Sets the maximum number of episodes that will be performed when the planFromState(State) method is called.

Parameters:
n - the maximum number of episodes that will be performed when the planFromState(State) method is called

public void setMaxVFAWeightChangeForPlanningTerminaiton(double m)

Sets a maximum change in the VFA weight threshold that will cause planFromState(State) to stop planning when it is achieved.

Parameters:
m - the maximum allowable change in the VFA weights before planning stops
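A sketch of the planning use case with these termination thresholds follows; as in the earlier sketch, domain, rf, tf, vfa, and initialState are assumed to exist and the threshold values are illustrative.

```java
// Sketch: using GradientDescentSarsaLam as a planner. Planning runs learning
// episodes until the episode budget is exhausted or the VFA weight change in
// an episode falls below the given threshold.
GradientDescentSarsaLam planner = new GradientDescentSarsaLam(
        domain, rf, tf, 0.99, vfa, 0.02, 0.5);

planner.setMaximumEpisodesForPlanning(1000);                 // episode budget
planner.setMaxVFAWeightChangeForPlanningTerminaiton(0.001);  // weight-change threshold

planner.planFromState(initialState);
```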
public int getLastNumSteps()

Returns the number of steps taken in the last episode.

public void setUseReplaceTraces(boolean toggle)

Sets whether to use replacing eligibility traces rather than accumulating traces.

Parameters:
toggle - whether to use replacing eligibility traces rather than accumulating traces

public void toggleShouldDecomposeOption(boolean toggle)

Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects.

Parameters:
toggle - whether to decompose options into the primitive actions taken by them or not

public void toggleShouldAnnotateOptionDecomposition(boolean toggle)

Sets whether options that are decomposed into primitives will have the option that produced them annotated.

Parameters:
toggle - whether to annotate the primitive actions of options with the calling option's name
public EpisodeAnalysis runLearningEpisodeFrom(State initialState)

Causes the agent to perform a learning episode starting in the given initial state.

Specified by:
runLearningEpisodeFrom in interface LearningAgent

Parameters:
initialState - the initial state in which the agent will start the episode

Returns:
the resulting EpisodeAnalysis object

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)

Causes the agent to perform a learning episode starting in the given initial state.

Specified by:
runLearningEpisodeFrom in interface LearningAgent

Parameters:
initialState - the initial state in which the agent will start the episode
maxSteps - the maximum number of steps in the episode

Returns:
the resulting EpisodeAnalysis object

public EpisodeAnalysis getLastLearningEpisode()

Returns the last learning episode of the agent.

Specified by:
getLastLearningEpisode in interface LearningAgent

public void setNumEpisodesToStore(int numEps)

Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.

Specified by:
setNumEpisodesToStore in interface LearningAgent

Parameters:
numEps - the number of learning episodes to remember

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()

Returns all saved EpisodeAnalysis objects of which the agent has kept track.

Specified by:
getAllStoredLearningEpisodes in interface LearningAgent

Returns:
all saved EpisodeAnalysis objects of which the agent has kept track
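These bookkeeping methods can be combined roughly as in the sketch below, which reuses the agent and initialState assumed in the earlier sketches; the episode counts and step cap are illustrative.

```java
// Sketch: capping episode length and inspecting stored episodes.
agent.setNumEpisodesToStore(5);                      // remember the last 5 episodes
for (int i = 0; i < 20; i++) {
    agent.runLearningEpisodeFrom(initialState, 500); // at most 500 steps per episode
}
EpisodeAnalysis last = agent.getLastLearningEpisode();
java.util.List<EpisodeAnalysis> saved = agent.getAllStoredLearningEpisodes(); // at most 5 entries
```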
public java.util.List<QValue> getQs(State s)

Returns a List of QValue objects for every permissible action for the given input state.

Specified by:
getQs in interface QComputablePlanner

Parameters:
s - the state for which Q-values are to be returned

Returns:
a List of QValue objects for every permissible action for the given input state

public QValue getQ(State s, AbstractGroundedAction a)

Returns the QValue for the given state-action pair.

Specified by:
getQ in interface QComputablePlanner

Parameters:
s - the input state
a - the input action

Returns:
the QValue for the given state-action pair
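Because the class implements QComputablePlanner, the learned value function can be inspected directly. The sketch below assumes the agent from the earlier sketches, a state s of interest, and that QValue exposes its action and value as public fields a and q.

```java
// Sketch: inspecting the learned Q-values for a state of interest.
java.util.List<QValue> qs = agent.getQs(s);
for (QValue qv : qs) {
    System.out.println(qv.a.toString() + " -> " + qv.q);
}
```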
protected QValue getQFromFeaturesFor(java.util.List<ActionApproximationResult> results, State s, GroundedAction ga)

Creates a Q-value object in which the Q-value is determined from VFA.

Parameters:
results - the VFA prediction results for each action
s - the state of the Q-value
ga - the action taken

protected java.util.List<ActionApproximationResult> getAllActionApproximations(State s)

Gets all Q-value VFA results for each action for a given state.

Parameters:
s - the state for which the Q-value VFA results should be returned

protected ActionApproximationResult getActionApproximation(State s, GroundedAction ga)

Returns the VFA Q-value approximation for the given state and action.

Parameters:
s - the state for which the VFA result should be returned
ga - the action for which the VFA result should be returned
public void planFromState(State initialState)

This method will cause the planner to begin planning from the specified initial state.

Specified by:
planFromState in class OOMDPPlanner

Parameters:
initialState - the initial state of the planning problem

public void resetPlannerResults()

Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.

Specified by:
resetPlannerResults in class OOMDPPlanner