public class SarsaLam extends QLearning
Modifier and Type | Class and Description |
---|---|
static class | SarsaLam.EligibilityTrace: A data structure for maintaining eligibility trace values |

Nested classes/interfaces inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper

Nested classes/interfaces inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping
Modifier and Type | Field and Description |
---|---|
protected double | lambda: the strength of eligibility traces (0 for one step, 1 for full propagation) |

Fields inherited from class QLearning: episodeHistory, eStepCounter, learningPolicy, learningRate, maxEpisodeSize, maxQChangeForPlanningTermination, maxQChangeInLastEpisode, numEpisodesForPlanning, numEpisodesToStore, qIndex, qInitFunction, shouldAnnotateOptions, shouldDecomposeOptions, totalNumberOfSteps

Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, double lambda): Initializes SARSA(λ) with a 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ) with the same Q-value initialization everywhere. |
SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda): Initializes SARSA(λ). |
Modifier and Type | Method and Description |
---|---|
EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state. |
protected void | sarsalamInit(double lambda) |

Methods inherited from class QLearning: getAllStoredLearningEpisodes, getLastLearningEpisode, getLastNumSteps, getMaxQ, getQ, getQ, getQs, getQs, getStateNode, planFromState, QLInit, resetPlannerResults, runLearningEpisodeFrom, setLearningPolicy, setLearningRateFunction, setMaximumEpisodesForPlanning, setMaxQChangeForPlanningTerminaiton, setNumEpisodesToStore, setQInitFunction, toggleShouldAnnotateOptionDecomposition, toggleShouldDecomposeOption

Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected double lambda

the strength of eligibility traces (0 for one step, 1 for full propagation)
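The lambda field controls how far a TD error propagates back along the recently visited states: each stored eligibility trace is scaled by gamma * lambda after every step, so lambda = 0 kills all traces after one step (one-step SARSA) and lambda = 1 lets them decay only by the discount factor (full propagation). A minimal illustrative sketch of that decay rule, in plain Java rather than BURLAP's actual EligibilityTrace class:

```java
// Illustrative sketch (not BURLAP code): how a single eligibility trace
// value decays by gamma * lambda per step in tabular SARSA(lambda).
public class TraceDecay {

    /** Returns the trace value after the given number of decay steps. */
    public static double decayedTrace(double initial, double gamma, double lambda, int steps) {
        double trace = initial;
        for (int i = 0; i < steps; i++) {
            trace *= gamma * lambda; // standard accumulating-trace decay
        }
        return trace;
    }

    public static void main(String[] args) {
        // lambda = 0: the trace vanishes after one step -> one-step backup
        System.out.println(decayedTrace(1.0, 0.9, 0.0, 1));
        // lambda = 1: the trace decays only by gamma -> full propagation
        System.out.println(decayedTrace(1.0, 0.9, 1.0, 1));
    }
}
```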
public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with a 0.1 epsilon greedy policy and the same Q-value initialization everywhere. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ) with the same Q-value initialization everywhere. Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

public SarsaLam(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize, double lambda)

Initializes SARSA(λ). Note that a call to the QLearning.planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
lambda - specifies the strength of eligibility traces (0 for one step, 1 for full propagation)

protected void sarsalamInit(double lambda)
public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)

Causes the agent to perform a learning episode starting in the given initial state.

Specified by: runLearningEpisodeFrom in interface LearningAgent
Overrides: runLearningEpisodeFrom in class QLearning

Parameters:
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode

Returns: an EpisodeAnalysis object.
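To make concrete what the lambda parameter changes during a learning episode, here is a self-contained sketch of a tabular SARSA(λ) update on a tiny deterministic chain (illustrative plain Java, independent of BURLAP's classes; the single-action chain MDP and all names are invented for the example). With lambda = 1 and gamma = 1, one episode backs the terminal reward up to every visited state, while with lambda = 0 only the state adjacent to the reward is updated:

```java
// Illustrative sketch: tabular SARSA(lambda) on a 4-state deterministic
// chain 0 -> 1 -> 2 -> 3, with reward 1.0 on reaching terminal state 3.
// Each state has a single "advance" action, so Q is one value per state.
public class SarsaLamSketch {

    /** Runs SARSA(lambda) episodes on the chain and returns the Q-values. */
    public static double[] learn(double gamma, double lambda, double alpha, int episodes) {
        final int n = 4;
        double[] q = new double[n];
        for (int ep = 0; ep < episodes; ep++) {
            double[] e = new double[n]; // eligibility traces, reset per episode
            int s = 0;
            while (s < n - 1) {
                int sPrime = s + 1;
                double r = (sPrime == n - 1) ? 1.0 : 0.0;      // reward at goal
                double qNext = (sPrime == n - 1) ? 0.0 : q[sPrime]; // terminal Q = 0
                double delta = r + gamma * qNext - q[s];       // TD error
                e[s] += 1.0;                                   // accumulating trace
                for (int i = 0; i < n; i++) {
                    q[i] += alpha * delta * e[i];              // update all traced states
                    e[i] *= gamma * lambda;                    // decay all traces
                }
                s = sPrime;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[] full = learn(1.0, 1.0, 1.0, 1);    // lambda = 1: full propagation
        double[] oneStep = learn(1.0, 0.0, 1.0, 1); // lambda = 0: one-step backup
        System.out.println(full[0] + " " + oneStep[0]);
    }
}
```

After a single episode, full propagation gives every visited state the backed-up value 1.0, whereas the one-step variant leaves the start state untouched; this is the trade-off the lambda constructor parameter exposes.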