public class SarsaLam extends QLearning
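An implementation of SARSA(\lambda) that can be used for learning or for planning; planning is performed by running learning episodes in a SimulatedEnvironment.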
 If you are going to use this algorithm for planning, call the QLearning.initializeForPlanning(burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int)
 method before calling QLearning.planFromState(burlap.oomdp.core.states.State). Planning stops once a maximum number of episodes has been run,
 or once the maximum change in the Q-function falls below a threshold.
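 The learning policy (0.1 epsilon greedy by default) can be changed with QLearning.setLearningPolicy(burlap.behavior.policy.Policy).
 The learning rate function can be set with QLearning.setLearningRateFunction(burlap.behavior.learningrate.LearningRate).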
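The two planning stopping rules described above can be sketched in isolation. This is a minimal illustration, not BURLAP's code: `PlanningLoopSketch`, `episodesUntilStop`, and the `maxQChangeOfEpisode` callback are hypothetical names, and the callback stands in for "run one learning episode and report the largest absolute Q-value change". The sketch assumes that at least one episode is always run.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of the two planning stopping rules; not part of BURLAP.
final class PlanningLoopSketch {

    /**
     * Runs episodes until the episode cap is reached or the largest Q-value
     * change of the last episode is no longer above the threshold.
     *
     * @param maxQChangeOfEpisode stand-in for running episode i and returning its largest |delta Q|
     * @param maxEpisodes         cap on the number of episodes
     * @param maxQChangeThreshold stop once the last episode changed Q by no more than this
     * @return the number of episodes that were run (always at least 1)
     */
    static int episodesUntilStop(IntToDoubleFunction maxQChangeOfEpisode,
                                 int maxEpisodes,
                                 double maxQChangeThreshold) {
        int episodes = 0;
        double maxChange;
        do {
            maxChange = maxQChangeOfEpisode.applyAsDouble(episodes);
            episodes++;
        } while (episodes < maxEpisodes && maxChange > maxQChangeThreshold);
        return episodes;
    }

    public static void main(String[] args) {
        // Pretend the Q-function change halves every episode: 1, 0.5, 0.25, ...
        IntToDoubleFunction halving = i -> 1.0 / (1 << i);
        System.out.println(episodesUntilStop(halving, 100, 0.1)); // stopped by the threshold
        System.out.println(episodesUntilStop(halving, 3, 0.0));   // stopped by the episode cap
    }
}
```

With a cap of one episode the loop exits after the first pass regardless of the threshold, which is why a larger cap matters when the algorithm is used for planning.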
| Modifier and Type | Class and Description |
|---|---|
| static class | SarsaLam.EligibilityTrace - A data structure for maintaining eligibility trace values |
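To make the idea of an eligibility trace store concrete, here is a minimal sketch. It does not reproduce SarsaLam.EligibilityTrace: the class `TraceTableSketch`, its string keys, the choice of replacing traces (a visited pair is reset to 1), and the pruning cutoff are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trace table keyed by a state-action label; not BURLAP's EligibilityTrace.
final class TraceTableSketch {
    private static final double MIN_TRACE = 1e-9; // assumed cutoff for dropping negligible traces
    private final Map<String, Double> eligibility = new HashMap<>();

    /** Marks a state-action pair as just visited (replacing trace: eligibility becomes 1). */
    void visit(String stateActionKey) {
        eligibility.put(stateActionKey, 1.0);
    }

    /** Decays every trace by gamma * lambda and drops the ones that became negligible. */
    void decay(double gamma, double lambda) {
        final double factor = gamma * lambda;
        eligibility.replaceAll((key, e) -> e * factor);
        eligibility.values().removeIf(e -> e < MIN_TRACE);
    }

    /** Returns the current eligibility of a pair, or 0 if it has no trace. */
    double get(String stateActionKey) {
        return eligibility.getOrDefault(stateActionKey, 0.0);
    }

    public static void main(String[] args) {
        TraceTableSketch traces = new TraceTableSketch();
        traces.visit("s0/right");
        traces.decay(0.9, 0.5);
        System.out.println(traces.get("s0/right"));
    }
}
```

With lambda set to 0 every trace vanishes after a single decay, so only the most recent pair is updated (one step); with lambda set to 1 the traces shrink only by the discount factor, so credit propagates back along the whole episode.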
Nested classes/interfaces inherited from QFunction: QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected double | lambda - the strength of eligibility traces (0 for one step, 1 for full propagation) |
Fields inherited from QLearning: episodeHistory, eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, numEpisodesToStore, qIndex, qInitFunction, shouldAnnotateOptions, shouldDecomposeOptions, totalNumberOfSteps

Fields inherited from MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor | Description |
|---|---|
| SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda) | Initializes SARSA(\lambda) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere, and places no limit on the number of steps the agent can take in an episode. |
| SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda) | Initializes SARSA(\lambda) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. |
| SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda) | Initializes SARSA(\lambda) with the same Q-value initialization everywhere. |
| SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda) | Initializes SARSA(\lambda). |
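The default learning policy named in the constructors above is epsilon greedy with epsilon 0.1. The following is a generic sketch of that selection rule, not BURLAP's policy class: `EpsilonGreedySketch`, `selectAction`, and the plain `double[]` of Q-values are hypothetical, and ties are assumed to go to the lowest action index.

```java
import java.util.Random;

// Hypothetical epsilon-greedy action selection over an array of Q-values; not BURLAP's Policy.
final class EpsilonGreedySketch {

    /**
     * With probability epsilon returns a uniformly random action index,
     * otherwise returns the index of the largest Q-value (first one on ties).
     */
    static int selectAction(double[] qValues, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length); // explore
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) {
                best = a;
            }
        }
        return best; // exploit
    }

    public static void main(String[] args) {
        double[] qValues = {0.1, 0.7, 0.3};
        Random rng = new Random(0L);
        int greedyPicks = 0;
        for (int i = 0; i < 1000; i++) {
            if (selectAction(qValues, 0.1, rng) == 1) {
                greedyPicks++;
            }
        }
        // Most picks should be the greedy action (index 1) when epsilon is 0.1.
        System.out.println("greedy picks out of 1000: " + greedyPicks);
    }
}
```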
| Modifier and Type | Method and Description |
|---|---|
| EpisodeAnalysis | runLearningEpisode(Environment env, int maxSteps) |
| protected void | sarsalamInit(double lambda) |

Methods inherited from QLearning: getAllStoredLearningEpisodes, getLastLearningEpisode, getLastNumSteps, getMaxQ, getQ, getQ, getQs, getQs, getStateNode, initializeForPlanning, planFromState, QLInit, resetSolver, runLearningEpisode, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setNumEpisodesToStore, setQInitFunction, toggleShouldAnnotateOptionDecomposition, toggleShouldDecomposeOption, value

Methods inherited from MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

protected double lambda
the strength of eligibility traces (0 for one step, 1 for full propagation)
public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, double lambda)
By default, a call to the QLearning.planFromState(State) method
 will cause the valueFunction to use only one episode for planning; the number of planning episodes should probably be changed to a much larger value if you plan on using this
 algorithm as a planning algorithm.
Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)
By default, a call to the QLearning.planFromState(State) method
 will cause the valueFunction to use only one episode for planning; the number of planning episodes should probably be changed to a much larger value if you plan on using this
 algorithm as a planning algorithm.
Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)
By default, a call to the QLearning.planFromState(State) method
 will cause the valueFunction to use only one episode for planning; the number of planning episodes should probably be changed to a much larger value if you plan on using this
 algorithm as a planning algorithm.
Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, double gamma, HashableStateFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)
By default, a call to the QLearning.planFromState(State) method
 will cause the valueFunction to use only one episode for planning; the number of planning episodes should probably be changed to a much larger value if you plan on using this
 algorithm as a planning algorithm.
Parameters:
domain - the domain in which to learn
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)
Specified by: runLearningEpisode in interface LearningAgent
Overrides: runLearningEpisode in class QLearning
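As a rough picture of what a step-capped SARSA(\lambda) learning episode does, the sketch below runs the textbook update on a toy chain. Everything in it is an assumption made for illustration and none of it is taken from this class's source: the `SarsaLambdaChainSketch` class, the chain world (states 0 to n-1, reward 1 on reaching the last state), replacing traces, and a purely greedy policy with ties going right in place of the learning policy.

```java
// Hypothetical tabular SARSA(lambda) on a toy chain; not BURLAP's runLearningEpisode.
final class SarsaLambdaChainSketch {
    static final int LEFT = 0;
    static final int RIGHT = 1;

    final int numStates;  // states 0..numStates-1; the last state is terminal
    final double gamma;   // discount factor
    final double alpha;   // constant learning rate
    final double lambda;  // trace strength (0 for one step, 1 for full propagation)
    final double[][] q;   // q[state][action], initialised to 0

    SarsaLambdaChainSketch(int numStates, double gamma, double alpha, double lambda) {
        this.numStates = numStates;
        this.gamma = gamma;
        this.alpha = alpha;
        this.lambda = lambda;
        this.q = new double[numStates][2];
    }

    /** Greedy stand-in for the learning policy; ties go RIGHT. */
    int greedyAction(int s) {
        return q[s][LEFT] > q[s][RIGHT] ? LEFT : RIGHT;
    }

    /** Runs one episode from state 0 and returns the number of steps taken (at most maxSteps). */
    int runEpisode(int maxSteps) {
        double[][] e = new double[numStates][2]; // eligibility traces, reset every episode
        int goal = numStates - 1;
        int s = 0;
        int a = greedyAction(s);
        int steps = 0;
        while (s != goal && steps < maxSteps) {
            int next = (a == RIGHT) ? s + 1 : Math.max(0, s - 1);
            double reward = (next == goal) ? 1.0 : 0.0;
            int nextA = greedyAction(next);
            double nextQ = (next == goal) ? 0.0 : q[next][nextA]; // terminal states have value 0
            double delta = reward + gamma * nextQ - q[s][a];      // on-policy TD error
            e[s][a] = 1.0;                                        // replacing trace for the visited pair
            for (int i = 0; i < numStates; i++) {
                for (int j = 0; j < 2; j++) {
                    q[i][j] += alpha * delta * e[i][j];           // credit every traced pair
                    e[i][j] *= gamma * lambda;                    // then decay its trace
                }
            }
            s = next;
            a = nextA;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        SarsaLambdaChainSketch agent = new SarsaLambdaChainSketch(3, 0.9, 0.5, 0.5);
        System.out.println("steps = " + agent.runEpisode(100));
        System.out.println("Q(0, RIGHT) = " + agent.q[0][RIGHT]);
        System.out.println("Q(1, RIGHT) = " + agent.q[1][RIGHT]);
    }
}
```

On the three-state chain with these settings, the first episode takes two steps; the goal reward raises Q(1, RIGHT) by alpha and Q(0, RIGHT) by alpha times gamma times lambda, which is the "propagation" that lambda controls. With lambda set to 0 the earlier pair receives nothing from that final step.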