public class SarsaLam extends QLearning

Tabular SARSA(λ) implementation [1]. This implementation will work correctly with options [2]. The algorithm can be used either for learning or for planning; planning is performed by running many learning episodes in succession in a SimulatedEnvironment.

If you are going to use this algorithm for planning, call the QLearning.initializeForPlanning(int)
method before calling QLearning.planFromState(State). The number of episodes used for planning can be determined
either by a threshold maximum number of episodes or by a maximum change in the Q-function threshold.
By default, this agent will use an epsilon-greedy policy with epsilon = 0.1. You can change the learning policy to
anything else with the QLearning.setLearningPolicy(burlap.behavior.policy.Policy) method.

If you want to use a custom learning rate decay schedule rather than a constant learning rate, use the
QLearning.setLearningRateFunction(burlap.behavior.learningrate.LearningRate) method.
1. Rummery, Gavin A., and Mahesan Niranjan. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering, 1994.
2. Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial intelligence 112.1 (1999): 181-211.
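The following is a minimal sketch of how these configuration points fit together, assuming the BURLAP 3 package layout and that a SADomain (domain) and an initial State (initialState) have already been constructed elsewhere. SimpleHashableStateFactory, EpsilonGreedy, and ExponentialDecayLR are standard BURLAP classes, but their use here and all of the specific parameter values are illustrative assumptions rather than anything this page prescribes.

```java
import burlap.behavior.learningrate.ExponentialDecayLR;
import burlap.behavior.policy.EpsilonGreedy;
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.HashableStateFactory;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SarsaLamConfigurationSketch {

    public static void configureAndPlan(SADomain domain, State initialState) {
        HashableStateFactory hashingFactory = new SimpleHashableStateFactory();

        // discount 0.99, zero Q-value initialization, learning rate 0.1, lambda 0.5
        SarsaLam sarsa = new SarsaLam(domain, 0.99, hashingFactory, 0.0, 0.1, 0.5);

        // optional: replace the default 0.1 epsilon-greedy learning policy
        sarsa.setLearningPolicy(new EpsilonGreedy(sarsa, 0.05));

        // optional: decay the learning rate instead of keeping it constant
        sarsa.setLearningRateFunction(new ExponentialDecayLR(0.1, 0.999));

        // use the agent as a planner: run up to 200 simulated learning episodes
        sarsa.initializeForPlanning(200);
        sarsa.planFromState(initialState);
    }
}
```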
Modifier and Type | Class and Description |
---|---|
static class | SarsaLam.EligibilityTrace: A data structure for maintaining eligibility trace values |

Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected double | lambda: the strength of eligibility traces (0 for one step, 1 for full propagation) |

Fields inherited from class QLearning: eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, qFunction, qInitFunction, shouldDecomposeOptions, totalNumberOfSteps

Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
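For intuition about what the lambda field does, the following is a conceptual sketch of the standard accumulating-trace SARSA(λ) backup, not BURLAP's actual implementation; the flat arrays and integer state/action indices are hypothetical stand-ins for the hashed states and SarsaLam.EligibilityTrace entries the class really uses.

```java
/**
 * Conceptual sketch of the accumulating-trace SARSA(lambda) backup that the
 * lambda field controls. Arrays and indices are hypothetical; BURLAP stores
 * this data in hashed state nodes and SarsaLam.EligibilityTrace entries.
 */
public class SarsaLambdaUpdateSketch {

    static final int NUM_STATES = 100;
    static final int NUM_ACTIONS = 4;

    double[][] q = new double[NUM_STATES][NUM_ACTIONS]; // Q-value table
    double[][] e = new double[NUM_STATES][NUM_ACTIONS]; // eligibility traces

    double alpha = 0.1;   // learning rate
    double gamma = 0.99;  // discount factor
    double lambda = 0.5;  // 0: one-step SARSA; 1: full propagation along the episode

    /** One backup for the transition (s, a) with reward r to (sPrime, aPrime). */
    void update(int s, int a, double r, int sPrime, int aPrime) {
        double delta = r + gamma * q[sPrime][aPrime] - q[s][a]; // TD error
        e[s][a] += 1.0; // the pair just taken becomes fully eligible

        for (int si = 0; si < NUM_STATES; si++) {
            for (int ai = 0; ai < NUM_ACTIONS; ai++) {
                q[si][ai] += alpha * delta * e[si][ai]; // recently visited pairs absorb more of the error
                e[si][ai] *= gamma * lambda;            // traces decay geometrically each step
            }
        }
    }
}
```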
Constructor and Description |
---|
SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda): Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere, and places no limit on the number of steps the agent can take in an episode. |
SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere. |
SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with the same Q-value initialization everywhere. |
SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, QFunction qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ). |
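As a sketch of the most heavily parameterized constructor, the example below supplies an explicit QFunction for Q-value initialization and an explicit learning policy. ConstantValueFunction and EpsilonGreedy are standard BURLAP classes, but the constructors used, the setSolver hookup, and all parameter values are assumptions made for illustration.

```java
import burlap.behavior.policy.EpsilonGreedy;
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.behavior.valuefunction.ConstantValueFunction;
import burlap.behavior.valuefunction.QFunction;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SarsaLamConstructionSketch {

    public static SarsaLam build(SADomain domain) {
        // optimistic initialization: every Q-value starts at 1.0 (hypothetical choice)
        QFunction qInit = new ConstantValueFunction(1.0);

        // explicit epsilon-greedy exploration policy (epsilon = 0.1, the class default)
        EpsilonGreedy learningPolicy = new EpsilonGreedy(0.1);

        // discount 0.99, learning rate 0.1, at most 500 steps per episode, lambda 0.9
        SarsaLam sarsa = new SarsaLam(domain, 0.99, new SimpleHashableStateFactory(),
                qInit, 0.1, learningPolicy, 500, 0.9);

        // an epsilon-greedy policy ranks actions by Q-values, so point it back at the agent
        learningPolicy.setSolver(sarsa);

        return sarsa;
    }
}
```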
Modifier and Type | Method and Description |
---|---|
Episode | runLearningEpisode(Environment env, int maxSteps) |
protected void | sarsalamInit(double lambda) |

Methods inherited from class QLearning: getLastNumSteps, getMaxQ, getQ, getQs, getStateNode, initializeForPlanning, loadQTable, planFromState, QLInit, qValue, qValues, resetSolver, runLearningEpisode, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setQInitFunction, toggleShouldDecomposeOption, value, writeQTable

Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
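Because SarsaLam is a LearningAgent, the usual learning workflow is to call runLearningEpisode repeatedly against an Environment, for example the SimulatedEnvironment mentioned in the class description. A minimal sketch follows, assuming domain and initialState are provided elsewhere; the episode count and step cap are arbitrary illustrative values.

```java
import java.util.ArrayList;
import java.util.List;

import burlap.behavior.singleagent.Episode;
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.mdp.singleagent.environment.SimulatedEnvironment;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SarsaLamLearningLoopSketch {

    public static List<Episode> learn(SADomain domain, State initialState) {
        SimulatedEnvironment env = new SimulatedEnvironment(domain, initialState);
        SarsaLam sarsa = new SarsaLam(domain, 0.99, new SimpleHashableStateFactory(),
                0.0, 0.1, 0.9);

        List<Episode> episodes = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // cap each episode at 1000 steps in case the task has no terminal state
            episodes.add(sarsa.runLearningEpisode(env, 1000));
            env.resetEnvironment(); // start the next episode from the initial state
        }
        return episodes;
    }
}
```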
protected double lambda
public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon-greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm (see the planning sketch at the end of this page).

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with the same Q-value initialization everywhere. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(SADomain domain, double gamma, HashableStateFactory hashingFactory, QFunction qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ). By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a QFunction object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public Episode runLearningEpisode(Environment env, int maxSteps)

Specified by:
runLearningEpisode in interface LearningAgent
Overrides:
runLearningEpisode in class QLearning
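The constructor notes above warn that calling QLearning.planFromState(State) without further configuration uses only a single learning episode for planning. The sketch below shows one way the two planning-termination thresholds described in the class overview might be configured; the specific threshold values are illustrative assumptions, and setMaxQChangeForPlanningTerminaiton is spelled exactly as it appears in the inherited method list above.

```java
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SarsaLamPlanningThresholdsSketch {

    public static void plan(SADomain domain, State initialState) {
        SarsaLam sarsa = new SarsaLam(domain, 0.99, new SimpleHashableStateFactory(),
                0.0, 0.1, 0.9);

        // allow up to 500 simulated learning episodes for planning instead of the default 1
        sarsa.initializeForPlanning(500);

        // additionally stop early once the largest Q-value change within an
        // episode falls below 0.01
        sarsa.setMaxQChangeForPlanningTerminaiton(0.01);

        sarsa.planFromState(initialState);
    }
}
```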