public class SarsaLam extends QLearning

Planning is performed by running many learning episodes in succession in a SimulatedEnvironment.
If you are going to use this algorithm for planning, call the QLearning.initializeForPlanning(int)
method before calling QLearning.planFromState(State).
The number of episodes used for planning can be determined
by a threshold maximum number of episodes, or by a maximum change in the Q-function threshold.
By default, this agent will use an epsilon-greedy policy with epsilon = 0.1. You can change the learning policy to
anything with the QLearning.setLearningPolicy(burlap.behavior.policy.Policy)
method.
If you
want to use a custom learning rate decay schedule rather than a constant learning rate, use the
QLearning.setLearningRateFunction(burlap.behavior.learningrate.LearningRate)
method.
1. Rummery, Gavin A., and Mahesan Niranjan. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering, 1994.
2. Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112.1 (1999): 181-211.
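The SARSA(λ) update from [1] can be sketched in plain Java, independent of BURLAP (the class, method, and key names below are illustrative, not part of the library): every visited state-action pair keeps an eligibility trace e(s,a); on each step the TD error δ = r + γQ(s',a') − Q(s,a) is applied to all traced pairs, and every trace then decays by γλ.

```java
import java.util.HashMap;
import java.util.Map;

public class SarsaLambdaSketch {
    // Hypothetical minimal tabular SARSA(lambda); not the BURLAP implementation.
    final Map<String, Double> q = new HashMap<>();      // Q(s,a), keyed "state|action"
    final Map<String, Double> traces = new HashMap<>(); // eligibility traces e(s,a)
    final double gamma, lambda, alpha;

    SarsaLambdaSketch(double gamma, double lambda, double alpha) {
        this.gamma = gamma; this.lambda = lambda; this.alpha = alpha;
    }

    double q(String sa) { return q.getOrDefault(sa, 0.0); }

    // One SARSA(lambda) step for the transition (s,a) -r-> (s',a').
    void update(String sa, double r, String saPrime) {
        double delta = r + gamma * q(saPrime) - q(sa);   // TD error
        traces.merge(sa, 1.0, Double::sum);              // accumulate trace for (s,a)
        for (Map.Entry<String, Double> e : traces.entrySet()) {
            q.merge(e.getKey(), alpha * delta * e.getValue(), Double::sum);
            e.setValue(gamma * lambda * e.getValue());   // decay; lambda=0 -> one-step SARSA
        }
    }
}
```

With lambda = 0 every trace dies immediately after its own update, recovering one-step SARSA; with lambda = 1 credit propagates back to all earlier steps of the episode, discounted only by gamma.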
Modifier and Type  Class and Description 

static class 
SarsaLam.EligibilityTrace
A data structure for maintaining eligibility trace values

QProvider.Helper
Modifier and Type  Field and Description 

protected double 
lambda
the strength of eligibility traces (0 for one step, 1 for full propagation)

eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, qFunction, qInitFunction, shouldDecomposeOptions, totalNumberOfSteps
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description 

SarsaLam(SADomain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
double lambda)
Initializes SARSA(λ) with a 0.1 epsilon-greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the
agent can take in an episode.

SarsaLam(SADomain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
int maxEpisodeSize,
double lambda)
Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere.

SarsaLam(SADomain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
Policy learningPolicy,
int maxEpisodeSize,
double lambda)
Initializes SARSA(λ) with the same Q-value initialization everywhere.

SarsaLam(SADomain domain,
double gamma,
HashableStateFactory hashingFactory,
QFunction qInit,
double learningRate,
Policy learningPolicy,
int maxEpisodeSize,
double lambda)
Initializes SARSA(λ).

Modifier and Type  Method and Description 

Episode 
runLearningEpisode(Environment env,
int maxSteps) 
protected void 
sarsalamInit(double lambda) 
getLastNumSteps, getMaxQ, getQ, getQs, getStateNode, initializeForPlanning, loadQTable, planFromState, QLInit, qValue, qValues, resetSolver, runLearningEpisode, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setQInitFunction, toggleShouldDecomposeOption, value, writeQTable
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected double lambda
public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda)

By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)

By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before it stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before it stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, QFunction qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a QFunction object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before it stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public Episode runLearningEpisode(Environment env, int maxSteps)

Specified by:
runLearningEpisode in interface LearningAgent
Overrides:
runLearningEpisode in class QLearning
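The runLearningEpisode contract — hand the agent a live environment and a step cap, get back one episode — can be mimicked with a self-contained toy (the Corridor environment, its method names, and the random policy below are illustrative stand-ins, not BURLAP's Environment interface):

```java
import java.util.Random;

public class EpisodeLoopSketch {
    // Toy stand-in for an environment: a 1-D corridor; reward 1 on reaching the goal.
    static class Corridor {
        int pos = 0;
        final int goal = 3;
        double step(int action) {          // action: +1 right, -1 left
            pos = Math.max(0, pos + action);
            return pos == goal ? 1.0 : 0.0;
        }
        boolean inTerminalState() { return pos == goal; }
    }

    // Mimics the learning-agent pattern: run one episode, capped at maxSteps.
    static int runLearningEpisode(Corridor env, int maxSteps, Random rng) {
        int steps = 0;
        while (!env.inTerminalState() && steps < maxSteps) {
            int action = rng.nextBoolean() ? 1 : -1;   // random policy, for the sketch only
            env.step(action);
            steps++;
        }
        return steps;
    }
}
```

In BURLAP the analogous loop lives inside SarsaLam.runLearningEpisode(Environment, int), which additionally applies the SARSA(λ) update at each step and records the trajectory in the returned Episode.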