public class SarsaLam extends QLearning
SARSA(λ) can be used either for learning or for planning; the latter is performed by running many learning episodes in succession in a SimulatedEnvironment. If you are going to use this algorithm for planning, call the QLearning.initializeForPlanning(burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int) method before calling QLearning.planFromState(burlap.oomdp.core.states.State). The number of episodes used for planning can be determined by a threshold maximum number of episodes, or by a maximum change in the Q-function threshold.

By default, the agent follows a 0.1 epsilon greedy learning policy; you can change it to any other policy with the QLearning.setLearningPolicy(burlap.behavior.policy.Policy) method. To use a learning rate that changes over time rather than a constant one, set a learning rate function with QLearning.setLearningRateFunction(burlap.behavior.learningrate.LearningRate).
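For example, a minimal agent setup following this description might look like the sketch below. The constructor arguments are arbitrary illustration values, and the EpsilonGreedy and ExponentialDecayLR constructors are assumptions about the surrounding BURLAP API rather than something documented on this page:

```java
import burlap.behavior.learningrate.ExponentialDecayLR;
import burlap.behavior.policy.EpsilonGreedy;
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.oomdp.core.Domain;
import burlap.oomdp.statehashing.SimpleHashableStateFactory;

public class SarsaLamSetup {

    /** Builds a SARSA(lambda) agent for a caller-supplied domain. */
    public static SarsaLam makeAgent(Domain domain) {
        // gamma = 0.99, qInit = 0., learningRate = 0.5, lambda = 0.9;
        // all of these values are arbitrary illustration choices.
        SarsaLam agent = new SarsaLam(domain, 0.99,
                new SimpleHashableStateFactory(), 0., 0.5, 0.9);

        // Replace the default 0.1 epsilon greedy policy and the constant
        // learning rate, as described above. The EpsilonGreedy and
        // ExponentialDecayLR constructors used here are assumptions about
        // the broader BURLAP API, not taken from this page.
        agent.setLearningPolicy(new EpsilonGreedy(agent, 0.05));
        agent.setLearningRateFunction(new ExponentialDecayLR(0.5, 0.999));
        return agent;
    }
}
```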
Modifier and Type | Class and Description |
---|---|
static class | SarsaLam.EligibilityTrace: A data structure for maintaining eligibility trace values (illustrated by the sketch below) |
Nested classes/interfaces inherited from interface QFunction: QFunction.QFunctionHelper
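To make the trace mechanism concrete, here is a self-contained, illustrative-only sketch of how a single TD error is propagated through decaying eligibility traces; the Trace class and its fields are hypothetical and do not reflect the actual SarsaLam.EligibilityTrace API:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the actual SarsaLam.EligibilityTrace API. */
public class TraceSketch {

    static class Trace {
        double q;           // Q-value estimate for the (state, action) pair on this trace
        double eligibility; // current eligibility weight of that pair
        Trace(double q, double eligibility) { this.q = q; this.eligibility = eligibility; }
    }

    public static void main(String[] args) {
        double gamma = 0.99, lambda = 0.9, learningRate = 0.1;
        double delta = 1.0; // TD error: r + gamma * Q(s', a') - Q(s, a)

        // Traces for the two most recent state-action pairs; the older pair
        // starts with an already-decayed eligibility.
        List<Trace> traces = new ArrayList<>();
        traces.add(new Trace(0.0, 1.0));
        traces.add(new Trace(0.0, gamma * lambda));

        // One TD error updates every traced pair in proportion to its
        // eligibility; each trace then decays by gamma * lambda, so
        // lambda = 0 reduces to one-step SARSA and lambda = 1 propagates fully.
        for (Trace t : traces) {
            t.q += learningRate * delta * t.eligibility;
            t.eligibility *= gamma * lambda;
        }
        System.out.println("Q update for most recent pair: " + traces.get(0).q);
    }
}
```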
Modifier and Type | Field and Description |
---|---|
protected double | lambda: the strength of eligibility traces (0 for one step, 1 for full propagation) |
Fields inherited from class QLearning: episodeHistory, eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, numEpisodesToStore, qIndex, qInitFunction, shouldAnnotateOptions, shouldDecomposeOptions, totalNumberOfSteps
Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
SarsaLam(Domain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
double lambda)
Initializes SARSA(\lambda) with 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and places no limit on the number of steps the
agent can take in an episode.
|
SarsaLam(Domain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
int maxEpisodeSize,
double lambda)
Initializes SARSA(\lambda) with 0.1 epsilon greedy policy, the same Q-value initialization everywhere.
|
SarsaLam(Domain domain,
double gamma,
HashableStateFactory hashingFactory,
double qInit,
double learningRate,
Policy learningPolicy,
int maxEpisodeSize,
double lambda)
Initializes SARSA(\lambda) with the same Q-value initialization everywhere.
|
SarsaLam(Domain domain,
double gamma,
HashableStateFactory hashingFactory,
ValueFunctionInitialization qInit,
double learningRate,
Policy learningPolicy,
int maxEpisodeSize,
double lambda)
Initializes SARSA(\lambda).
|
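To illustrate when to prefer the maxEpisodeSize variants, the sketch below contrasts the six-argument and seven-argument constructors; the numeric arguments are arbitrary and SimpleHashableStateFactory is an assumed HashableStateFactory implementation:

```java
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.oomdp.core.Domain;
import burlap.oomdp.statehashing.SimpleHashableStateFactory;

public class ConstructorChoice {

    /** No episode cap: fine when every episode is guaranteed to terminate. */
    public static SarsaLam unbounded(Domain domain) {
        return new SarsaLam(domain, 0.99, new SimpleHashableStateFactory(),
                0., 0.5, 0.9);
    }

    /** Episode cap: the agent stops an episode after 2000 steps, which
     *  guards against learning episodes that would otherwise never end. */
    public static SarsaLam capped(Domain domain) {
        return new SarsaLam(domain, 0.99, new SimpleHashableStateFactory(),
                0., 0.5, 2000, 0.9);
    }
}
```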
Modifier and Type | Method and Description |
---|---|
EpisodeAnalysis | runLearningEpisode(Environment env, int maxSteps) |
protected void | sarsalamInit(double lambda) |
Methods inherited from class QLearning: getAllStoredLearningEpisodes, getLastLearningEpisode, getLastNumSteps, getMaxQ, getQ, getQ, getQs, getQs, getStateNode, initializeForPlanning, planFromState, QLInit, resetSolver, runLearningEpisode, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setNumEpisodesToStore, setQInitFunction, toggleShouldAnnotateOptionDecomposition, toggleShouldDecomposeOption, value
Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting
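Putting the inherited planning methods together, a planning workflow might look like the following sketch; the episode count and Q-change threshold are arbitrary, and the method signatures are taken from the inherited lists above (note that the spelling of setMaxQChangeForPlanningTerminaiton is as it appears in that list):

```java
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.oomdp.core.Domain;
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;
import burlap.oomdp.statehashing.SimpleHashableStateFactory;

public class SarsaLamPlanningSketch {

    /**
     * Uses SARSA(lambda) as a planner by running simulated episodes.
     * The domain, rf, tf, and initialState come from the caller's own
     * problem definition.
     */
    public static void plan(Domain domain, RewardFunction rf,
                            TerminalFunction tf, State initialState) {
        SarsaLam planner = new SarsaLam(domain, 0.99,
                new SimpleHashableStateFactory(), 0., 0.5, 0.9);

        // Required before planFromState, per the class description above;
        // 500 is the maximum number of planning episodes (arbitrary choice).
        planner.initializeForPlanning(rf, tf, 500);

        // Optionally stop earlier once the largest Q-value change in an
        // episode falls below a threshold.
        planner.setMaxQChangeForPlanningTerminaiton(0.001);

        planner.planFromState(initialState);
    }
}
```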
protected double lambda
The strength of eligibility traces (0 for one step, 1 for full propagation).
public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda)
Initializes SARSA(λ) with a 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
Parameters:
- domain - the domain in which to learn
- gamma - the discount factor
- hashingFactory - the state hashing factory to use for Q-lookups
- qInit - the initial Q-value to use everywhere
- learningRate - the learning rate
- lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)
Initializes SARSA(λ) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
Parameters:
- domain - the domain in which to learn
- gamma - the discount factor
- hashingFactory - the state hashing factory to use for Q-lookups
- qInit - the initial Q-value to use everywhere
- learningRate - the learning rate
- maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
- lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)
Initializes SARSA(λ) with the same Q-value initialization everywhere. By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
Parameters:
- domain - the domain in which to learn
- gamma - the discount factor
- hashingFactory - the state hashing factory to use for Q-lookups
- qInit - the initial Q-value to use everywhere
- learningRate - the learning rate
- learningPolicy - the learning policy to follow during a learning episode
- maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
- lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)
Initializes SARSA(λ). By default, a call to the QLearning.planFromState(State) method will cause the valueFunction to use only one episode for planning; this should be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
Parameters:
- domain - the domain in which to learn
- gamma - the discount factor
- hashingFactory - the state hashing factory to use for Q-lookups
- qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values
- learningRate - the learning rate
- learningPolicy - the learning policy to follow during a learning episode
- maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
- lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)
Specified by: runLearningEpisode in interface LearningAgent
Overrides: runLearningEpisode in class QLearning
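A typical learning loop around this method might look like the sketch below; Environment.resetEnvironment() and EpisodeAnalysis.numTimeSteps() are assumptions about the surrounding BURLAP API rather than something documented on this page:

```java
import burlap.behavior.singleagent.EpisodeAnalysis;
import burlap.behavior.singleagent.learning.tdmethods.SarsaLam;
import burlap.oomdp.singleagent.environment.Environment;

public class LearningLoopSketch {

    /** Runs 100 learning episodes, capping each at 1000 steps. */
    public static void learn(SarsaLam agent, Environment env) {
        for (int i = 0; i < 100; i++) {
            EpisodeAnalysis ea = agent.runLearningEpisode(env, 1000);

            // numTimeSteps() is assumed from the standard EpisodeAnalysis API.
            System.out.println("episode " + i + ": " + ea.numTimeSteps() + " steps");

            // Reset to the environment's initial state for the next episode.
            env.resetEnvironment();
        }
    }
}
```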