public class QLearning extends OOMDPPlanner implements QComputablePlanner, LearningAgent
Nested classes/interfaces inherited from implemented interfaces: QComputablePlanner.QComputablePlannerHelper, LearningAgent.LearningAgentBookKeeping

| Modifier and Type | Field and Description |
|---|---|
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved previous learning episodes. |
| protected int | eStepCounter: A counter for the number of steps taken so far in the current episode. |
| protected Policy | learningPolicy: The learning policy to use. |
| protected LearningRate | learningRate: The learning rate function used. |
| protected int | maxEpisodeSize: The maximum number of steps that will be taken in an episode before the agent terminates a learning episode. |
| protected double | maxQChangeForPlanningTermination: The maximum allowable change in the Q-function during an episode before the planning method terminates. |
| protected double | maxQChangeInLastEpisode: The maximum Q-value change that occurred in the last learning episode. |
| protected int | numEpisodesForPlanning: The maximum number of episodes to use for planning. |
| protected int | numEpisodesToStore: The number of the most recent learning episodes to store. |
| protected java.util.Map<StateHashTuple,QLearningStateNode> | qIndex: The tabular mapping from states to Q-values. |
| protected ValueFunctionInitialization | qInitFunction: The object that defines how Q-values are initialized. |
| protected boolean | shouldAnnotateOptions: Whether decomposed options should have their primitive actions annotated with the option's name in the returned EpisodeAnalysis objects. |
| protected boolean | shouldDecomposeOptions: Whether options should be decomposed into primitive actions in the returned EpisodeAnalysis objects. |
| protected int | totalNumberOfSteps: The total number of learning steps performed by this agent. |
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate): Initializes Q-learning with a 0.1 epsilon-greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. |
| QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize): Initializes Q-learning with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere. |
| QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize): Initializes Q-learning with the same Q-value initialization everywhere. |
| QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize): Initializes the algorithm. |
| Modifier and Type | Method and Description |
|---|---|
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes(): Returns all saved EpisodeAnalysis objects of which the agent has kept track. |
| EpisodeAnalysis | getLastLearningEpisode(): Returns the last learning episode of the agent. |
| int | getLastNumSteps(): Returns the number of steps taken in the last episode. |
| protected double | getMaxQ(StateHashTuple s): Returns the maximum Q-value in the hashed state. |
| QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
| protected QValue | getQ(StateHashTuple s, GroundedAction a): Returns the Q-value for a given hashed state and action. |
| java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| protected java.util.List<QValue> | getQs(StateHashTuple s): Returns the possible Q-values for a given hashed state. |
| protected QLearningStateNode | getStateNode(StateHashTuple s): Returns the QLearningStateNode object stored for the given hashed state. |
| void | planFromState(State initialState): Causes the planner to begin planning from the specified initial state. |
| protected void | QLInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInitFunction, double learningRate, Policy learningPolicy, int maxEpisodeSize): Initializes the algorithm. |
| void | resetPlannerResults(): Resets all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState): Causes the agent to perform a learning episode starting in the given initial state. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state, taking at most maxSteps steps. |
| void | setLearningPolicy(Policy p): Sets which policy this agent should use for learning. |
| void | setLearningRateFunction(LearningRate lr): Sets the learning rate function to use. |
| void | setMaximumEpisodesForPlanning(int n): Sets the maximum number of episodes that will be performed when the planFromState(State) method is called. |
| void | setMaxQChangeForPlanningTerminaiton(double m): Sets a threshold on the change in the Q-function; once the largest Q-value change in a learning episode falls below it, planFromState(State) stops planning. |
| void | setNumEpisodesToStore(int numEps): Tells the agent how many EpisodeAnalysis objects representing learning episodes to store internally. |
| void | setQInitFunction(ValueFunctionInitialization qInit): Sets how to initialize Q-values for previously unexperienced state-action pairs. |
| void | toggleShouldAnnotateOptionDecomposition(boolean toggle): Sets whether the primitive actions of decomposed options will be annotated with the name of the option that produced them. |
| void | toggleShouldDecomposeOption(boolean toggle): Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects. |
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

protected java.util.Map<StateHashTuple,QLearningStateNode> qIndex
protected ValueFunctionInitialization qInitFunction
protected LearningRate learningRate
protected Policy learningPolicy
protected int maxEpisodeSize
protected int eStepCounter
protected int numEpisodesForPlanning
protected double maxQChangeForPlanningTermination
protected double maxQChangeInLastEpisode
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int numEpisodesToStore
protected boolean shouldDecomposeOptions
protected boolean shouldAnnotateOptions
protected int totalNumberOfSteps
public QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate)

Initializes Q-learning with a 0.1 epsilon-greedy policy, the same Q-value initialization everywhere, and no limit on the number of steps the agent can take in an episode. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate

public QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize)

Initializes Q-learning with a 0.1 epsilon-greedy policy and the same Q-value initialization everywhere. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying

public QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize)

Initializes Q-learning with the same Q-value initialization everywhere. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - the initial Q-value to use everywhere
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying

public QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize)

Initializes the algorithm. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
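As a rough illustration of the construction arguments, the sketch below builds a QLearning agent for a grid world task. The GridWorldDomain, UniformCostRF, SinglePFTF, and DiscreteStateHashFactory classes and their setup calls are not part of this page; they are assumptions based on typical BURLAP 1.x usage and may need adjusting for your BURLAP release and domain (imports omitted since package locations vary between releases).

```java
// Assumed BURLAP 1.x-style domain setup; adjust class names and packages for your version.
GridWorldDomain gwd = new GridWorldDomain(11, 11);
gwd.setMapToFourRooms();
Domain domain = gwd.generateDomain();

// Reward of -1 per step; episodes end when the agent reaches the goal location.
RewardFunction rf = new UniformCostRF();
TerminalFunction tf = new SinglePFTF(domain.getPropFunction(GridWorldDomain.PFATLOCATION));

// Hash states on the agent object's attributes so tabular Q-values can be looked up.
DiscreteStateHashFactory hashingFactory = new DiscreteStateHashFactory();
hashingFactory.setAttributesForClass(GridWorldDomain.CLASSAGENT,
        domain.getObjectClass(GridWorldDomain.CLASSAGENT).attributeList);

// An initial state with the agent at (0,0) and the goal location at (10,10).
State initialState = GridWorldDomain.getOneAgentOneLocationState(domain);
GridWorldDomain.setAgent(initialState, 0, 0);
GridWorldDomain.setLocation(initialState, 0, 10, 10);

// gamma = 0.99, Q-values initialized to 0 everywhere, learning rate 0.1,
// default 0.1 epsilon-greedy learning policy, no episode step limit.
QLearning agent = new QLearning(domain, rf, tf, 0.99, hashingFactory, 0., 0.1);
```

The `agent`, `initialState`, `rf`, and `tf` variables are reused by the later sketches on this page.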
protected void QLInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInitFunction, double learningRate, Policy learningPolicy, int maxEpisodeSize)

Initializes the algorithm. By default the agent will only save the last learning episode, and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for Q-lookups
qInitFunction - a ValueFunctionInitialization object that can be used to initialize the Q-values
learningRate - the learning rate
learningPolicy - the learning policy to follow during a learning episode
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying
public void setLearningRateFunction(LearningRate lr)

Sets the learning rate function to use.

Parameters:
lr - the learning rate function to use

public void setQInitFunction(ValueFunctionInitialization qInit)

Sets how to initialize Q-values for previously unexperienced state-action pairs.

Parameters:
qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values

public void setLearningPolicy(Policy p)

Sets which policy this agent should use for learning.

Parameters:
p - the policy to use for learning
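A hedged sketch of customizing those three settings follows. ExponentialDecayLR, ValueFunctionInitialization.ConstantValueFunctionInitialization, and EpsilonGreedy are the BURLAP classes typically used for this, but their constructors and package locations are assumptions here and should be checked against your BURLAP version.

```java
// Assumes the `agent` constructed in the earlier sketch.

// Learning rate that starts at 0.1 and decays multiplicatively per update
// (assumed ExponentialDecayLR(initialLearningRate, decayRate) constructor).
agent.setLearningRateFunction(new ExponentialDecayLR(0.1, 0.999));

// Optimistic initialization: start every Q-value at 1.0 to encourage exploration
// (assumed ConstantValueFunctionInitialization nested class).
agent.setQInitFunction(
        new ValueFunctionInitialization.ConstantValueFunctionInitialization(1.0));

// Replace the default 0.1 epsilon-greedy learning policy with a 0.05 epsilon-greedy one.
agent.setLearningPolicy(new EpsilonGreedy(agent, 0.05));
```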
public void setMaximumEpisodesForPlanning(int n)

Sets the maximum number of episodes that will be performed when the planFromState(State) method is called.

Parameters:
n - the maximum number of episodes that will be performed when the planFromState(State) method is called

public void setMaxQChangeForPlanningTerminaiton(double m)

Sets a threshold on the change in the Q-function; once the largest Q-value change in a learning episode falls below this threshold, planFromState(State) stops planning.

Parameters:
m - the maximum allowable change in the Q-function before planning stops

public int getLastNumSteps()

Returns the number of steps taken in the last episode.
public void toggleShouldDecomposeOption(boolean toggle)

Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects.

Parameters:
toggle - whether to decompose options into the primitive actions taken by them or not

public void toggleShouldAnnotateOptionDecomposition(boolean toggle)

Sets whether the primitive actions of decomposed options will be annotated with the name of the option that produced them.

Parameters:
toggle - whether to annotate the primitive actions of options with the calling option's name
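When the learning policy selects temporally extended options, the two toggles above control how those options are recorded. A brief sketch, reusing the agent from the earlier examples:

```java
// Record each executed option as its individual primitive steps in the returned
// EpisodeAnalysis objects...
agent.toggleShouldDecomposeOption(true);
// ...and tag each of those primitive steps with the name of the option that produced it.
agent.toggleShouldAnnotateOptionDecomposition(true);
```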
public java.util.List<QValue> getQs(State s)

Description copied from interface: QComputablePlanner. Returns a List of QValue objects for every permissible action for the given input state.

Specified by: getQs in interface QComputablePlanner
Parameters:
s - the state for which Q-values are to be returned
Returns: a List of QValue objects for every permissible action for the given input state

public QValue getQ(State s, AbstractGroundedAction a)

Description copied from interface: QComputablePlanner. Returns the QValue for the given state-action pair.

Specified by: getQ in interface QComputablePlanner
Parameters:
s - the input state
a - the input action
Returns: the QValue for the given state-action pair

protected java.util.List<QValue> getQs(StateHashTuple s)

Returns the possible Q-values for a given hashed state.

Parameters:
s - the hashed state for which to get the Q-values

protected QValue getQ(StateHashTuple s, GroundedAction a)

Returns the Q-value for a given hashed state and action.

Parameters:
s - the hashed state
a - the action

protected QLearningStateNode getStateNode(StateHashTuple s)

Returns the QLearningStateNode object stored for the given hashed state. If no QLearningStateNode object is stored, then it is created and its Q-values are initialized using this object's ValueFunctionInitialization data member.

Parameters:
s - the hashed state for which to get the QLearningStateNode object
Returns: the QLearningStateNode object stored for the given hashed state

protected double getMaxQ(StateHashTuple s)

Returns the maximum Q-value in the hashed state.

Parameters:
s - the state for which to get the maximum Q-value
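Because QLearning implements QComputablePlanner, the learned Q-function can be inspected directly. The sketch below assumes QValue exposes public `a` (action) and `q` (value) fields, as in BURLAP 1.x, and reuses the agent and initialState from the construction sketch.

```java
// Inspect the learned Q-values for a state of interest.
java.util.List<QValue> qs = agent.getQs(initialState);
for (QValue qv : qs) {
    System.out.println(qv.a.toString() + " -> " + qv.q);
}
```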
public void planFromState(State initialState)

Description copied from class: OOMDPPlanner. This method will cause the planner to begin planning from the specified initial state. For QLearning, planning is performed by repeatedly running learning episodes from the initial state until either the maximum number of planning episodes has been run or the maximum Q-value change in the last episode falls below the planning termination threshold.

Specified by: planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem
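A hedged sketch of using QLearning in planning mode follows. GreedyQPolicy and Policy.evaluateBehavior are the usual BURLAP 1.x way to extract and evaluate the resulting behavior, but treat their exact signatures as assumptions; the agent, initialState, rf, and tf come from the construction sketch.

```java
// Planning mode: run up to 1000 learning episodes from the initial state, stopping
// early once the largest Q-value change in an episode drops below 0.01.
agent.setMaximumEpisodesForPlanning(1000);
agent.setMaxQChangeForPlanningTerminaiton(0.01); // method name spelling is as in the API
agent.planFromState(initialState);

// Extract and evaluate the greedy policy over the learned Q-function.
Policy greedy = new GreedyQPolicy(agent);
EpisodeAnalysis result = greedy.evaluateBehavior(initialState, rf, tf);
System.out.println("Greedy policy finished in " + result.numTimeSteps() + " time steps.");
```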
public EpisodeAnalysis runLearningEpisodeFrom(State initialState)

Description copied from interface: LearningAgent. Causes the agent to perform a learning episode starting in the given initial state.

Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
Returns: the learning episode stored in an EpisodeAnalysis object

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)

Description copied from interface: LearningAgent. Causes the agent to perform a learning episode starting in the given initial state, taking at most maxSteps steps.

Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - the initial state in which the agent will start the episode
maxSteps - the maximum number of steps in the episode
Returns: the learning episode stored in an EpisodeAnalysis object

public EpisodeAnalysis getLastLearningEpisode()

Description copied from interface: LearningAgent. Returns the last learning episode of the agent.

Specified by: getLastLearningEpisode in interface LearningAgent
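Putting the learning methods together, a minimal learning-loop sketch, reusing the agent and initialState from the construction sketch and using only the methods documented on this page:

```java
// Run 100 learning episodes from the same initial state, capping each at 500 steps.
for (int i = 0; i < 100; i++) {
    agent.runLearningEpisodeFrom(initialState, 500);
    System.out.println("Episode " + i + ": " + agent.getLastNumSteps() + " steps taken.");
}

// The most recent episode is also retrievable after the fact.
EpisodeAnalysis last = agent.getLastLearningEpisode();
```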
public void setNumEpisodesToStore(int numEps)

Description copied from interface: LearningAgent. Tells the agent how many EpisodeAnalysis objects representing learning episodes to store internally. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.

Specified by: setNumEpisodesToStore in interface LearningAgent
Parameters:
numEps - the number of learning episodes to remember

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()

Description copied from interface: LearningAgent. Returns all saved EpisodeAnalysis objects of which the agent has kept track.

Specified by: getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track
public void resetPlannerResults()

Description copied from class: OOMDPPlanner. Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.

Specified by: resetPlannerResults in class OOMDPPlanner
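Finally, a hedged sketch combining episode bookkeeping with resetting the learner. It reuses the agent and initialState from the construction sketch; setRf is one of the inherited OOMDPPlanner methods listed above, and newRewardFunction is a hypothetical replacement reward function introduced only for illustration.

```java
// Keep the last 20 learning episodes for later analysis instead of only the most recent one.
agent.setNumEpisodesToStore(20);
for (int i = 0; i < 20; i++) {
    agent.runLearningEpisodeFrom(initialState, 500);
}
java.util.List<EpisodeAnalysis> history = agent.getAllStoredLearningEpisodes();
System.out.println("Stored " + history.size() + " episodes.");

// If the task changes, e.g. a new reward function is installed via the inherited setRf
// method, wipe the learned Q-values before learning or planning again.
agent.setRf(newRewardFunction); // `newRewardFunction` is a hypothetical replacement
agent.resetPlannerResults();
```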