public class MultiAgentQLearning extends Agent implements MultiAgentQSourceProvider
Q-value updates are delayed until the next time this agent is queried for an action (getAction(State)) or until the gameTerminated()
message is sent.
Updates are delayed in this way because, when each agent's Q-values are shared and distributed among the agents, the delay ensures
that the Q-values are all updated only after the next Q-value has been determined for each agent.
In general, the learning policy followed by this agent should reflect the needs of the solution concept being learned. For instance,
CoCo-Q should use some variant of a maximum welfare joint policy.
The learning policy and its underlying joint policy will automatically be told that this agent is its target agent, given the agent definitions
in the world, and told that this agent is the Q-source provider of the joint policy (MAQSourcePolicy). If the set joint policy
is not an instance of MAQSourcePolicy, then an exception will be thrown.

Modifier and Type | Field and Description |
---|---|
protected SGBackupOperator | backupOperator: The backup operator that defines the solution concept being learned |
protected double | discount: The discount factor |
protected StateHashFactory | hashingFactory: The state hashing factory used to index Q-values by state |
protected PolicyFromJointPolicy | learningPolicy: The learning policy to be followed |
protected LearningRate | learningRate: The learning rate used for updating Q-values |
protected QSourceForSingleAgent | myQSource: This agent's Q-value source |
protected boolean | needsToUpdateQValue: Whether the agent needs to update its Q-values from a recent experience |
protected double | nextQValue: The new value to which the last Q-value needs to be updated |
protected ValueFunctionInitialization | qInit: The Q-value initialization to use |
protected AgentQSourceMap | qSourceMap: The object that maps to the other agents' Q-value sources |
protected JAQValue | qToUpdate: The Q-value object that needs to be updated |
protected boolean | queryOtherAgentsQSource: Whether this agent uses the Q-values stored by the other agents in the world rather than keeping its own copy of each agent's Q-values |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent |
Fields inherited from class Agent: agentType, domain, internalRewardFunction, world, worldAgentName
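The interplay of the backupOperator, discount, and learningRate fields can be illustrated with a minimal, self-contained sketch. The names below (BackupOperator, update) are plain-Java stand-ins for illustration, not the actual BURLAP types:

```java
import java.util.Arrays;

public class BackupSketch {

    /** Stand-in for a backup operator: reduces the next state's joint Q-values to one scalar. */
    interface BackupOperator {
        double backup(double[] nextJointQValues);
    }

    /** One Q-update: blend reward + discount * backup(s') into the old estimate at the learning rate. */
    static double update(double oldQ, double reward, double discount,
                         double learningRate, BackupOperator op, double[] nextJointQValues) {
        double target = reward + discount * op.backup(nextJointQValues);
        return oldQ + learningRate * (target - oldQ);
    }

    public static void main(String[] args) {
        // A cooperative "max" operator; other operators would yield other solution concepts.
        BackupOperator max = qs -> Arrays.stream(qs).max().orElse(0.0);
        double q = update(0.0, 1.0, 0.9, 0.1, max, new double[] {0.2, 0.8, 0.5});
        System.out.println(q); // 0.1 * (1.0 + 0.9 * 0.8), approximately 0.172
    }
}
```

Swapping the operator while keeping the update rule fixed is the design idea behind parameterizing this class by an SGBackupOperator.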
Constructor and Description |
---|
MultiAgentQLearning(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues): Initializes this Q-learning agent. |
MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, StateHashFactory hashFactory, ValueFunctionInitialization qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues): Initializes this Q-learning agent. |
Modifier and Type | Method and Description |
---|---|
void | gameStarting(): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
GroundedSingleAction | getAction(State s): This method is called by the world when it needs the agent to choose an action. |
QSourceForSingleAgent | getMyQSource(): Returns this agent's individual Q-value source. |
AgentQSourceMap | getQSources(): Returns an object that can provide Q-value sources for each agent. |
void | joinWorld(World w, AgentType as): Causes this agent instance to join a world. |
void | observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
void | setLearningPolicy(PolicyFromJointPolicy p): Sets the learning policy to be followed by the agent. |
protected void | updateLatestQValue(): Updates the Q-value for the most recent observation if it has not already been updated. |
Methods inherited from class Agent: getAgentName, getAgentType, getInternalRewardFunction, init, setInternalRewardFunction
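Action selection in these methods is driven by the agent's learning policy, which per the constructor documentation defaults to an epsilon-greedy maximum welfare policy (EGreedyMaxWellfare). A hypothetical, self-contained sketch of that selection rule in plain Java (the class and method names here are illustrative stand-ins, not BURLAP's):

```java
import java.util.Random;

public class EGreedyMaxWelfareSketch {

    /**
     * Epsilon-greedy over joint actions: with probability epsilon pick a random joint
     * action; otherwise pick the one maximizing the *sum* of all agents' Q-values
     * (the "maximum welfare" criterion).
     */
    static int selectJointAction(double[][] jointQ, double epsilon, Random rng) {
        // jointQ[a][i] = Q-value of joint action a for agent i
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(jointQ.length);
        }
        int best = 0;
        double bestWelfare = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < jointQ.length; a++) {
            double welfare = 0;
            for (double q : jointQ[a]) welfare += q;
            if (welfare > bestWelfare) { bestWelfare = welfare; best = a; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] jointQ = {
            {1.0, 0.0},  // joint action 0: welfare 1.0
            {0.6, 0.6},  // joint action 1: welfare 1.2 (maximum welfare)
            {0.0, 1.0},  // joint action 2: welfare 1.0
        };
        // epsilon = 0 makes the choice deterministic for the demo
        System.out.println(selectJointAction(jointQ, 0.0, new Random(0))); // prints 1
    }
}
```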
protected double discount
protected QSourceForSingleAgent myQSource
protected AgentQSourceMap qSourceMap
protected PolicyFromJointPolicy learningPolicy
protected LearningRate learningRate
protected ValueFunctionInitialization qInit
protected StateHashFactory hashingFactory
protected SGBackupOperator backupOperator
protected boolean queryOtherAgentsQSource
protected boolean needsToUpdateQValue
protected double nextQValue
protected JAQValue qToUpdate
protected int totalNumberOfSteps
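The learningRate field is a schedule object rather than a bare constant (the double-argument constructor simply wraps a constant). A small self-contained sketch of why the schedule matters, with a stand-in Rate interface in place of BURLAP's LearningRate:

```java
public class LearningRateSketch {

    /** Stand-in for a learning-rate schedule, polled at each time step. */
    interface Rate {
        double pollRate(int timeStep);
    }

    /** Repeatedly move a single Q-value toward a fixed target under the given schedule. */
    static double run(Rate rate, int steps, double target) {
        double q = 0.0;
        for (int t = 0; t < steps; t++) {
            q += rate.pollRate(t) * (target - q);
        }
        return q;
    }

    public static void main(String[] args) {
        Rate constant = t -> 0.1;           // a constant rate, as the double constructor argument sets up
        Rate decaying = t -> 1.0 / (t + 1); // a classic 1/n schedule
        System.out.println(run(constant, 50, 1.0)); // approaches 1.0 geometrically
        System.out.println(run(decaying, 50, 1.0)); // jumps to 1.0 on the first step
    }
}
```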
public MultiAgentQLearning(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues)
Initializes this Q-learning agent with a default QSourceForSingleAgent.HashBackedQSource Q-source; the learning policy is defaulted
to an epsilon = 0.1 maximum welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will
only store its own Q-values and will use the other agents' stored Q-values to determine theirs.
Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the constant learning rate
hashFactory - the hashing factory used to index states and Q-values
qInit - the default value to which all initial Q-values will be initialized
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, then the agent uses the Q-values for other agents that are stored by them; if false, then the agent stores a Q-value for each other agent in the world
public MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, StateHashFactory hashFactory, ValueFunctionInitialization qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues)
Initializes this Q-learning agent with a default QSourceForSingleAgent.HashBackedQSource Q-source; the learning policy is defaulted
to an epsilon = 0.1 maximum welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will
only store its own Q-values and will use the other agents' stored Q-values to determine theirs.
Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the learning rate function to use
hashFactory - the hashing factory used to index states and Q-values
qInit - the Q-value initialization to use
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, then the agent uses the Q-values for other agents that are stored by them; if false, then the agent stores a Q-value for each other agent in the world
public void joinWorld(World w, AgentType as)
Causes this agent instance to join a world.
Overrides: joinWorld in class Agent
public QSourceForSingleAgent getMyQSource()
Returns this agent's individual Q-value source.
public AgentQSourceMap getQSources()
Returns an object that can provide Q-value sources for each agent.
Specified by: getQSources in interface MultiAgentQSourceProvider
Returns: an AgentQSourceMap object
public void setLearningPolicy(PolicyFromJointPolicy p)
Sets the learning policy to be followed by the agent. The underlying joint policy must be an instance of MAQSourcePolicy so that it can be provided the Q-values learned by this MultiAgentQLearning agent, or a runtime exception will be thrown.
Parameters:
p - the learning policy to follow
public void gameStarting()
This method is called by the world when a new game is starting.
Specified by: gameStarting in class Agent
public GroundedSingleAction getAction(State s)
This method is called by the world when it needs the agent to choose an action.
Specified by: getAction in class Agent
public void observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal)
This method is called by the world when every agent in the world has taken their action.
Specified by: observeOutcome in class Agent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state
public void gameTerminated()
This method is called by the world when a game has ended.
Specified by: gameTerminated in class Agent
protected void updateLatestQValue()
Updates the Q-value for the most recent observation if it has not already been updated.
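The delayed-update cycle described above, in which observeOutcome records an experience and the update is applied at the next getAction or gameTerminated call, can be sketched with plain-Java stand-ins (none of the names below are the BURLAP types; the String state-action key and fixed constants are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class DelayedUpdateSketch {
    private final Map<String, Double> qTable = new HashMap<>();
    private final double discount = 0.9, learningRate = 0.1;

    // Pending experience, applied lazily (mirrors needsToUpdateQValue / nextQValue).
    private String pendingKey;
    private double pendingTarget;
    private boolean needsToUpdateQValue;

    double q(String stateAction) { return qTable.getOrDefault(stateAction, 0.0); }

    /** Analogue of observeOutcome: record the update target but do not apply it yet. */
    void observeOutcome(String stateAction, double reward, double nextStateValue, boolean terminal) {
        pendingKey = stateAction;
        pendingTarget = terminal ? reward : reward + discount * nextStateValue;
        needsToUpdateQValue = true;
    }

    /** Analogue of updateLatestQValue: apply the pending update exactly once. */
    void updateLatestQValue() {
        if (!needsToUpdateQValue) return;
        double old = q(pendingKey);
        qTable.put(pendingKey, old + learningRate * (pendingTarget - old));
        needsToUpdateQValue = false;
    }

    /** Analogues of getAction/gameTerminated: both flush the pending update first. */
    String getAction() { updateLatestQValue(); return "someAction"; }
    void gameTerminated() { updateLatestQValue(); }

    public static void main(String[] args) {
        DelayedUpdateSketch agent = new DelayedUpdateSketch();
        agent.observeOutcome("s0:a0", 1.0, 0.0, true);
        System.out.println(agent.q("s0:a0")); // prints 0.0: the update is still pending
        agent.getAction();                    // the next query triggers the update
        System.out.println(agent.q("s0:a0")); // prints 0.1
    }
}
```

The deferral is what lets all agents finish computing their next Q-values before any shared value is overwritten, as explained in the class description.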