QLearning

java.lang.Object
- burlap.behavior.singleagent.planning.OOMDPPlanner
- - burlap.behavior.singleagent.learning.tdmethods.QLearning

All Implemented Interfaces:

LearningAgent, QComputablePlanner

Direct Known Subclasses:

SarsaLam
```
public class QLearning
extends OOMDPPlanner
implements QComputablePlanner, LearningAgent
```
Tabular Q-learning algorithm [1]. This implementation works correctly with Options [2]. It can be used either for learning or for planning; planning is performed by running many learning episodes in succession. The number of episodes used for planning is bounded by a maximum number of episodes and by a threshold on the maximum change in the Q-function during an episode.
1. Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.
2. Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial intelligence 112.1 (1999): 181-211.
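
For orientation, the textbook one-step Q-learning backup from [1] can be sketched in plain Java as follows. This is an illustration only and not this class's code: the class name `TabularQUpdateSketch`, the `update` helper, and the use of `int` indices for states and actions are assumptions made for the example, and the handling of Options (multi-step discounting) is left out.

```java
import java.util.Arrays;

// Illustrative sketch only; not BURLAP code. States and actions are plain int indices.
class TabularQUpdateSketch {

    /**
     * Applies one Q-learning backup to q[s][a] and returns the absolute change.
     * If sPrime is terminal, no bootstrapped future value is added.
     */
    static double update(double[][] q, int s, int a, double r, int sPrime,
                         boolean terminal, double alpha, double gamma) {
        double maxNext = terminal ? 0.0 : Arrays.stream(q[sPrime]).max().getAsDouble();
        double old = q[s][a];
        // Move the old estimate toward the target r + gamma * max_a' Q(s', a').
        q[s][a] = old + alpha * (r + gamma * maxNext - old);
        return Math.abs(q[s][a] - old);
    }

    public static void main(String[] args) {
        double[][] q = new double[2][2];   // two states, two actions, all zeros
        q[1][0] = 1.0;
        q[1][1] = 3.0;
        double change = update(q, 0, 1, 1.0, 1, false, 0.5, 0.9);
        System.out.println("Q(0,1) = " + q[0][1] + ", change = " + change);
    }
}
```

The returned absolute change is the kind of quantity a planner could track per episode to decide when the Q-function has stopped moving.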

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.learning.LearningAgent
  LearningAgent.LearningAgentBookKeeping

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.LinkedList<EpisodeAnalysis>`	`episodeHistory` The saved previous learning episodes.
`protected int`	`eStepCounter` A counter for counting the number of steps in an episode that have been taken thus far
`protected Policy`	`learningPolicy` The learning policy to use.
`protected LearningRate`	`learningRate` The learning rate function used.
`protected int`	`maxEpisodeSize` The maximum number of steps that will be taken in an episode before the agent terminates a learning episode
`protected double`	`maxQChangeForPlanningTermination` The maximum allowable change in the Q-function during an episode before the planning method terminates.
`protected double`	`maxQChangeInLastEpisode` The maximum Q-value change that occurred in the last learning episode.
`protected int`	`numEpisodesForPlanning` The maximum number of episodes to use for planning
`protected int`	`numEpisodesToStore` The number of the most recent learning episodes to store.
`protected java.util.Map<StateHashTuple,QLearningStateNode>`	`qIndex` The tabular mapping from states to Q-values
`protected ValueFunctionInitialization`	`qInitFunction` The object that defines how Q-values are initialized.
`protected boolean`	`shouldAnnotateOptions` Whether decomposed options should have their primitive actions annotated with the option's name in the returned `EpisodeAnalysis` objects.
`protected boolean`	`shouldDecomposeOptions` Whether options should be decomposed into actions in the returned `EpisodeAnalysis` objects.
`protected int`	`totalNumberOfSteps` The total number of learning steps performed by this agent.

Fields inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate)` Initializes Q-learning with 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and places no limit on the number of steps the agent can take in an episode.
`QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, int maxEpisodeSize)` Initializes Q-learning with 0.1 epsilon greedy policy, the same Q-value initialization everywhere.
`QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize)` Initializes Q-learning with the same Q-value initialization everywhere.
`QLearning(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInit, double learningRate, Policy learningPolicy, int maxEpisodeSize)` Initializes the algorithm.

Method Summary

Methods
Modifier and Type	Method and Description
`java.util.List<EpisodeAnalysis>`	`getAllStoredLearningEpisodes()` Returns all saved `EpisodeAnalysis` objects of which the agent has kept track.
`EpisodeAnalysis`	`getLastLearningEpisode()` Returns the last learning episode of the agent.
`int`	`getLastNumSteps()` Returns the number of steps taken in the last episode.
`protected double`	`getMaxQ(StateHashTuple s)` Returns the maximum Q-value in the hashed state.
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`protected QValue`	`getQ(StateHashTuple s, GroundedAction a)` Returns the Q-value for a given hashed state and action.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`protected java.util.List<QValue>`	`getQs(StateHashTuple s)` Returns the possible Q-values for a given hashed state.
`protected QLearningStateNode`	`getStateNode(StateHashTuple s)` Returns the `QLearningStateNode` object stored for the given hashed state.
`void`	`planFromState(State initialState)` This method will cause the planner to begin planning from the specified initial state
`protected void`	`QLInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization qInitFunction, double learningRate, Policy learningPolicy, int maxEpisodeSize)` Initializes the algorithm.
`void`	`resetPlannerResults()` Use this method to reset all planner results so that planning can be started fresh with a call to `OOMDPPlanner.planFromState(State)` as if no planning had ever been performed before.
`EpisodeAnalysis`	`runLearningEpisodeFrom(State initialState)` Causes the agent to perform a learning episode starting in the given initial state.
`EpisodeAnalysis`	`runLearningEpisodeFrom(State initialState, int maxSteps)` Causes the agent to perform a learning episode starting in the given initial state.
`void`	`setLearningPolicy(Policy p)` Sets which policy this agent should use for learning.
`void`	`setLearningRateFunction(LearningRate lr)` Sets the learning rate function to use
`void`	`setMaximumEpisodesForPlanning(int n)` Sets the maximum number of episodes that will be performed when the `planFromState(State)` method is called.
`void`	`setMaxQChangeForPlanningTerminaiton(double m)` Sets a max change in the Q-function threshold that will cause the `planFromState(State)` to stop planning when it is achieved.
`void`	`setNumEpisodesToStore(int numEps)` Tells the agent how many `EpisodeAnalysis` objects representing learning episodes to internally store.
`void`	`setQInitFunction(ValueFunctionInitialization qInit)` Sets how to initialize Q-values for previously unexperienced state-action pairs.
`void`	`toggleShouldAnnotateOptionDecomposition(boolean toggle)` Sets whether primitive actions produced by decomposed options are annotated with the option that produced them.
`void`	`toggleShouldDecomposeOption(boolean toggle)` Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects.

Methods inherited from class burlap.behavior.singleagent.planning.OOMDPPlanner
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - qIndex
```
protected java.util.Map<StateHashTuple,QLearningStateNode> qIndex
```
    The tabular mapping from states to Q-values
  - qInitFunction
```
protected ValueFunctionInitialization qInitFunction
```
    The object that defines how Q-values are initialized.
  - learningRate
```
protected LearningRate learningRate
```
    The learning rate function used.
  - learningPolicy
```
protected Policy learningPolicy
```
    The learning policy to use. Typically these will be policies that link back to this object so that they change as the Q-value estimates change.
  - maxEpisodeSize
```
protected int maxEpisodeSize
```
    The maximum number of steps that will be taken in an episode before the agent terminates a learning episode
  - eStepCounter
```
protected int eStepCounter
```
    A counter for counting the number of steps in an episode that have been taken thus far
  - numEpisodesForPlanning
```
protected int numEpisodesForPlanning
```
    The maximum number of episodes to use for planning
  - maxQChangeForPlanningTermination
```
protected double maxQChangeForPlanningTermination
```
    The maximum allowable change in the Q-function during an episode before the planning method terminates.
  - maxQChangeInLastEpisode
```
protected double maxQChangeInLastEpisode
```
    The maximum Q-value change that occurred in the last learning episode.
  - episodeHistory
```
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
```
    The saved previous learning episodes.
  - numEpisodesToStore
```
protected int numEpisodesToStore
```
    The number of the most recent learning episodes to store.
  - shouldDecomposeOptions
```
protected boolean shouldDecomposeOptions
```
    Whether options should be decomposed into actions in the returned EpisodeAnalysis objects.
  - shouldAnnotateOptions
```
protected boolean shouldAnnotateOptions
```
    Whether decomposed options should have their primitive actions annotated with the option's name in the returned EpisodeAnalysis objects.
  - totalNumberOfSteps
```
protected int totalNumberOfSteps
```
    The total number of learning steps performed by this agent.
- Constructor Detail
  - QLearning
```
public QLearning(Domain domain,
         RewardFunction rf,
         TerminalFunction tf,
         double gamma,
         StateHashFactory hashingFactory,
         double qInit,
         double learningRate)
```
    Initializes Q-learning with 0.1 epsilon greedy policy, the same Q-value initialization everywhere, and places no limit on the number of steps the agent can take in an episode. By default the agent will only save the last learning episode and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
    
    Parameters:
    domain - the domain in which to learn
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use for Q-lookups
    qInit - the initial Q-value to use everywhere
    learningRate - the learning rate
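
    The 0.1 epsilon greedy default mentioned above means that the agent picks a uniformly random action with probability 0.1 and otherwise picks an action with the highest Q-value. A minimal standalone sketch of that selection rule follows; `EpsilonGreedySketch` and `selectAction` are hypothetical names for this example, and breaking ties by taking the first maximal action is an assumption of the sketch, not a statement about the library's policy class.

```java
import java.util.Random;

// Illustrative sketch only; not BURLAP code.
class EpsilonGreedySketch {

    /** Returns an action index: random with probability epsilon, greedy otherwise. */
    static int selectAction(double[] qValues, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);   // explore
        }
        int best = 0;                             // exploit; first maximum wins ties (assumption)
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) {
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] qValues = {0.1, 0.5, 0.9};
        Random rng = new Random(42);
        int greedyPicks = 0;
        for (int i = 0; i < 1000; i++) {
            if (selectAction(qValues, 0.1, rng) == 2) {
                greedyPicks++;
            }
        }
        System.out.println("greedy action chosen " + greedyPicks + " times out of 1000");
    }
}
```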
  - QLearning
```
public QLearning(Domain domain,
         RewardFunction rf,
         TerminalFunction tf,
         double gamma,
         StateHashFactory hashingFactory,
         double qInit,
         double learningRate,
         int maxEpisodeSize)
```
    Initializes Q-learning with 0.1 epsilon greedy policy, the same Q-value initialization everywhere. By default the agent will only save the last learning episode and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
    
    Parameters:
    domain - the domain in which to learn
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use for Q-lookups
    qInit - the initial Q-value to use everywhere
    learningRate - the learning rate
    maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying.
  - QLearning
```
public QLearning(Domain domain,
         RewardFunction rf,
         TerminalFunction tf,
         double gamma,
         StateHashFactory hashingFactory,
         double qInit,
         double learningRate,
         Policy learningPolicy,
         int maxEpisodeSize)
```
    Initializes Q-learning with the same Q-value initialization everywhere. Note that if the provided policy is derived from the Q-values of this learning agent (as it should be), you may need to set the policy to point to this object after calling this constructor; the constructor will not do this automatically, in case the policy was deliberately learned in some other domain. By default the agent will only save the last learning episode and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
    
    Parameters:
    domain - the domain in which to learn
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use for Q-lookups
    qInit - the initial Q-value to use everywhere
    learningRate - the learning rate
    learningPolicy - the learning policy to follow during a learning episode.
    maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying.
  - QLearning
```
public QLearning(Domain domain,
         RewardFunction rf,
         TerminalFunction tf,
         double gamma,
         StateHashFactory hashingFactory,
         ValueFunctionInitialization qInit,
         double learningRate,
         Policy learningPolicy,
         int maxEpisodeSize)
```
    Initializes the algorithm. Note that if the provided policy is derived from the Q-values of this learning agent (as it should be), you may need to set the policy to point to this object after calling this constructor; the constructor will not do this automatically, in case the policy was deliberately learned in some other domain. By default the agent will only save the last learning episode and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
    
    Parameters:
    domain - the domain in which to learn
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use for Q-lookups
    qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values.
    learningRate - the learning rate
    learningPolicy - the learning policy to follow during a learning episode.
    maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying.
- Method Detail
  - QLInit
```
protected void QLInit(Domain domain,
          RewardFunction rf,
          TerminalFunction tf,
          double gamma,
          StateHashFactory hashingFactory,
          ValueFunctionInitialization qInitFunction,
          double learningRate,
          Policy learningPolicy,
          int maxEpisodeSize)
```
    Initializes the algorithm. By default the agent will only save the last learning episode and a call to the planFromState(State) method will cause the planner to use only one episode for planning; this should probably be changed to a much larger value if you plan on using this algorithm as a planning algorithm.
    
    Parameters:
    domain - the domain in which to learn
    rf - the reward function
    tf - the terminal function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use for Q-lookups
    qInitFunction - a ValueFunctionInitialization object that can be used to initialize the Q-values.
    learningRate - the learning rate
    learningPolicy - the learning policy to follow during a learning episode.
    maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent stops trying.
  - setLearningRateFunction
```
public void setLearningRateFunction(LearningRate lr)
```
    Sets the learning rate function to use
    
    Parameters:
    lr - the learning rate function to use
  - setQInitFunction
```
public void setQInitFunction(ValueFunctionInitialization qInit)
```
    Sets how to initialize Q-values for previously unexperienced state-action pairs.
    
    Parameters:
    qInit - a ValueFunctionInitialization object that can be used to initialize the Q-values.
  - setLearningPolicy
```
public void setLearningPolicy(Policy p)
```
    Sets which policy this agent should use for learning.
    
    Parameters:
    p - the policy to use for learning.
  - setMaximumEpisodesForPlanning
```
public void setMaximumEpisodesForPlanning(int n)
```
    Sets the maximum number of episodes that will be performed when the planFromState(State) method is called.
    
    Parameters:
    n - the maximum number of episodes that will be performed when the planFromState(State) method is called.
  - setMaxQChangeForPlanningTerminaiton
```
public void setMaxQChangeForPlanningTerminaiton(double m)
```
    Sets a max change in the Q-function threshold that will cause the planFromState(State) to stop planning when it is achieved.
    
    Parameters:
    m - the maximum allowable change in the Q-function before planning stops
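
    The two planning limits (maximum number of episodes and maximum Q-function change) can be pictured as a loop that keeps running learning episodes until either limit is hit. The sketch below is a standalone illustration, not this class's source: `PlanningLoopSketch`, `plan`, and the `DoubleSupplier` standing in for a learning episode are hypothetical, and the choices that at least one episode always runs and that the comparison is strict are assumptions.

```java
import java.util.function.DoubleSupplier;

// Illustrative sketch only; not BURLAP code.
class PlanningLoopSketch {

    /**
     * Runs episodes until maxEpisodes is reached or the largest Q-value change
     * reported by an episode is no longer above the threshold.
     * runEpisode stands in for a learning episode and returns that episode's
     * maximum Q-value change. Returns the number of episodes run.
     */
    static int plan(DoubleSupplier runEpisode, int maxEpisodes, double maxQChangeThreshold) {
        int episodes = 0;
        double lastMaxChange;
        do {
            lastMaxChange = runEpisode.getAsDouble();
            episodes++;
        } while (episodes < maxEpisodes && lastMaxChange > maxQChangeThreshold);
        return episodes;
    }

    public static void main(String[] args) {
        // Fake episodes whose maximum Q-change halves each time: 1.0, 0.5, 0.25, ...
        double[] change = {1.0};
        DoubleSupplier halving = () -> {
            double current = change[0];
            change[0] /= 2;
            return current;
        };
        System.out.println("episodes run: " + plan(halving, 100, 0.3));
    }
}
```

    With a maximum of one episode, as in the documented default, the loop in this sketch stops after a single episode regardless of the threshold.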
  - getLastNumSteps
```
public int getLastNumSteps()
```
    Returns the number of steps taken in the last episode.
    
    Returns:
    the number of steps taken in the last episode.
  - toggleShouldDecomposeOption
```
public void toggleShouldDecomposeOption(boolean toggle)
```
    Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects. The default value is true. If this is set to false, then EpisodeAnalysis objects returned from a learning episode will record options as a single "action" and the steps taken by the option will be hidden.
    
    Parameters:
    toggle - whether to decompose options into the primitive actions taken by them or not.
  - toggleShouldAnnotateOptionDecomposition
```
public void toggleShouldAnnotateOptionDecomposition(boolean toggle)
```
    Sets whether primitive actions produced by decomposed options are annotated with the option that produced them. The default value is true. If option decomposition is not enabled, changing this value has no effect. When decomposition is enabled and this is set to true, primitive actions taken by an option are recorded in EpisodeAnalysis objects with a special action name that indicates which option was called to produce the primitive action and which step of the option the primitive action is. When set to false, only the primitive action's name is recorded, so it will be unclear which option generated it.
    
    Parameters:
    toggle - whether to annotate the primitive actions of options with the calling option's name.
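
    The interaction between the two option toggles can be sketched as follows. This is a standalone illustration only: `OptionRecordingSketch` and `record` are hypothetical, and the annotated name format `option(step)-primitive` is invented for the example, since the actual format is not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not BURLAP code. The annotation format is hypothetical.
class OptionRecordingSketch {

    /** Returns the action names that would be recorded for one option execution. */
    static List<String> record(String optionName, List<String> primitives,
                               boolean decompose, boolean annotate) {
        List<String> steps = new ArrayList<>();
        if (!decompose) {
            steps.add(optionName);            // the option appears as a single "action"
            return steps;                     // annotate has no effect in this case
        }
        for (int i = 0; i < primitives.size(); i++) {
            String primitive = primitives.get(i);
            // Annotated form names the calling option and the step index within it.
            steps.add(annotate ? optionName + "(" + i + ")-" + primitive : primitive);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> primitives = Arrays.asList("north", "east");
        System.out.println(record("toDoor", primitives, false, true));
        System.out.println(record("toDoor", primitives, true, false));
        System.out.println(record("toDoor", primitives, true, true));
    }
}
```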
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QComputablePlanner
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QComputablePlanner
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QComputablePlanner
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QComputablePlanner
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
  - getQs
```
protected java.util.List<QValue> getQs(StateHashTuple s)
```
    Returns the possible Q-values for a given hashed state.
    
    Parameters:
    s - the hashed state for which to get the Q-values.
    
    Returns:
    the possible Q-values for a given hashed state.
  - getQ
```
protected QValue getQ(StateHashTuple s,
          GroundedAction a)
```
    Returns the Q-value for a given hashed state and action.
    
    Parameters:
    s - the hashed state
    a - the action
    
    Returns:
    the Q-value for a given hashed state and action; null is returned if there is no Q-value currently stored.
  - getStateNode
```
protected QLearningStateNode getStateNode(StateHashTuple s)
```
    Returns the QLearningStateNode object stored for the given hashed state. If no QLearningStateNode object is stored, then one is created and its Q-values are initialized using this object's ValueFunctionInitialization data member.
    
    Parameters:
    s - the hashed state for which to get the QLearningStateNode object
    
    Returns:
    the QLearningStateNode object stored for the given hashed state; if none is stored, a newly created and initialized one.
  - getMaxQ
```
protected double getMaxQ(StateHashTuple s)
```
    Returns the maximum Q-value in the hashed state.
    
    Parameters:
    s - the state for which to get the maximum Q-value
    
    Returns:
    the maximum Q-value in the hashed state.
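
    The lazy node creation described for getStateNode and the maximum lookup described for getMaxQ can be sketched together in plain Java. This is a simplified stand-in, not this class's code: `LazyQTableSketch`, the `String` state keys (in place of hashed states), the `double[]` arrays (in place of state nodes), and the constant initial value (in place of a ValueFunctionInitialization object) are all assumptions of the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not BURLAP code.
class LazyQTableSketch {
    private final Map<String, double[]> qIndex = new HashMap<>();
    private final int numActions;
    private final double qInit;

    LazyQTableSketch(int numActions, double qInit) {
        this.numActions = numActions;
        this.qInit = qInit;
    }

    /** Returns the stored Q-values for the state, creating and initializing them on first access. */
    double[] getStateNode(String stateKey) {
        return qIndex.computeIfAbsent(stateKey, key -> {
            double[] qs = new double[numActions];
            Arrays.fill(qs, qInit);
            return qs;
        });
    }

    /** Returns the largest Q-value stored for the state. */
    double getMaxQ(String stateKey) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : getStateNode(stateKey)) {
            max = Math.max(max, q);
        }
        return max;
    }

    int size() {
        return qIndex.size();
    }

    public static void main(String[] args) {
        LazyQTableSketch table = new LazyQTableSketch(3, 0.5);
        System.out.println("max Q of unseen state: " + table.getMaxQ("s0"));
        table.getStateNode("s0")[2] = 2.0;
        System.out.println("max Q after an update: " + table.getMaxQ("s0"));
    }
}
```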
  - planFromState
```
public void planFromState(State initialState)
```
    Description copied from class: OOMDPPlanner
    
    This method will cause the planner to begin planning from the specified initial state
    
    Specified by:
    
    planFromState in class OOMDPPlanner
    
    Parameters:
    initialState - the initial state of the planning problem
  - runLearningEpisodeFrom
```
public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
```
    Description copied from interface: LearningAgent
    
    Causes the agent to perform a learning episode starting in the given initial state. The episode terminates when a terminal state is reached or if the agent decides to terminate the episode (e.g., by having an internal parameter set for a maximum number of steps in an episode).
    
    Specified by:
    
    runLearningEpisodeFrom in interface LearningAgent
    
    Parameters:
    initialState - The initial state in which the agent will start the episode.
    
    Returns:
    The events of the learning episode that was performed, stored in an EpisodeAnalysis object.
  - runLearningEpisodeFrom
```
public EpisodeAnalysis runLearningEpisodeFrom(State initialState,
                                     int maxSteps)
```
    Description copied from interface: LearningAgent
    
    Causes the agent to perform a learning episode starting in the given initial state. The episode terminates when a terminal state is reached, if the agent decides to terminate the episode, or if the number of steps reaches the provided threshold.
    
    Specified by:
    
    runLearningEpisodeFrom in interface LearningAgent
    
    Parameters:
    initialState - The initial state in which the agent will start the episode.
    maxSteps - the maximum number of steps in the episode
    
    Returns:
    The events of the learning episode that was performed, stored in an EpisodeAnalysis object.
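
    The termination conditions for an episode (a terminal state is reached, or the step limit is hit) can be illustrated with a toy corridor world. The sketch is standalone and deliberately omits learning: `EpisodeLoopSketch`, `runEpisode`, and the always-move-right behavior are hypothetical simplifications.

```java
// Illustrative sketch only; not BURLAP code.
class EpisodeLoopSketch {

    /**
     * Walks right along a corridor whose last cell is terminal.
     * Stops at the terminal cell or after maxSteps steps, whichever comes first.
     * Returns the number of steps taken.
     */
    static int runEpisode(int start, int corridorLength, int maxSteps) {
        int terminal = corridorLength - 1;
        int state = start;
        int steps = 0;
        while (state != terminal && steps < maxSteps) {
            state++;   // a learning agent would pick an action and update Q here
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("uncapped in practice: " + runEpisode(0, 6, 100) + " steps");
        System.out.println("capped at 3: " + runEpisode(0, 6, 3) + " steps");
    }
}
```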
  - getLastLearningEpisode
```
public EpisodeAnalysis getLastLearningEpisode()
```
    Description copied from interface: LearningAgent
    
    Returns the last learning episode of the agent.
    
    Specified by:
    
    getLastLearningEpisode in interface LearningAgent
    
    Returns:
    the last learning episode of the agent.
  - setNumEpisodesToStore
```
public void setNumEpisodesToStore(int numEps)
```
    Description copied from interface: LearningAgent
    
    Tells the agent how many EpisodeAnalysis objects representing learning episodes to store internally. For instance, if the number is set to 5, then the agent should save the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for gathering performance data.
    
    Specified by:
    
    setNumEpisodesToStore in interface LearningAgent
    
    Parameters:
    numEps - the number of learning episodes to remember.
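
    Keeping only the most recent episodes amounts to a bounded first-in, first-out history. A minimal standalone sketch follows; `EpisodeHistorySketch` and its methods are hypothetical, and a generic element type stands in for EpisodeAnalysis.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only; not BURLAP code.
class EpisodeHistorySketch<E> {
    private final LinkedList<E> history = new LinkedList<>();
    private final int numToStore;

    EpisodeHistorySketch(int numToStore) {
        this.numToStore = numToStore;
    }

    /** Appends an episode and discards the oldest ones beyond the storage limit. */
    void record(E episode) {
        history.addLast(episode);
        while (history.size() > numToStore) {
            history.removeFirst();
        }
    }

    /** Returns the most recent stored episode, or null if none is stored. */
    E last() {
        return history.peekLast();
    }

    /** Returns a copy of all stored episodes, oldest first. */
    List<E> all() {
        return new ArrayList<>(history);
    }

    public static void main(String[] args) {
        EpisodeHistorySketch<String> h = new EpisodeHistorySketch<>(2);
        h.record("episode 1");
        h.record("episode 2");
        h.record("episode 3");
        System.out.println(h.all() + ", last = " + h.last());
    }
}
```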
  - getAllStoredLearningEpisodes
```
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
```
    Description copied from interface: LearningAgent
    
    Returns all saved EpisodeAnalysis objects of which the agent has kept track.
    
    Specified by:
    
    getAllStoredLearningEpisodes in interface LearningAgent
    
    Returns:
    all saved EpisodeAnalysis objects of which the agent has kept track.
  - resetPlannerResults
```
public void resetPlannerResults()
```
    Description copied from class: OOMDPPlanner
    
    Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to the OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
    
    Specified by:
    
    resetPlannerResults in class OOMDPPlanner
