public class MultiAgentQLearning extends SGAgentBase implements MultiAgentQSourceProvider
In this class, each agent stores its own Q-values and an object that provides a source for the Q-values of the other agents. This allows the storage of Q-values to vary: an agent can store the Q-values for all other agents itself, or the map can provide access to the Q-values stored by the other MultiAgentQLearning agents in the world so that only one copy of each agent's Q-values is ever stored. In the latter case, all agents should be implementing the same solution-concept learning algorithm; otherwise, each agent should maintain its own set of Q-values.
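As a rough sketch of the shared-storage mode (assuming BURLAP 3 package paths, the CoCoQ backup operator, and an already-constructed SGDomain and World; adapt names to your version), two agents sharing Q-value sources could be set up as follows. Passing false instead of true would make each agent keep its own copy of the other agent's Q-values.

```java
import burlap.behavior.stochasticgames.agents.maql.MultiAgentQLearning;
import burlap.behavior.stochasticgames.madynamicprogramming.backupOperators.CoCoQ;
import burlap.mdp.stochasticgames.SGDomain;
import burlap.mdp.stochasticgames.agent.SGAgentType;
import burlap.mdp.stochasticgames.world.World;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SharedQExample {
    /** Joins two CoCo-Q learners to an existing world; both use the
     *  shared-storage mode (queryOtherAgentsForTheirQValues = true). */
    public static void joinLearners(SGDomain domain, World world) {
        SGAgentType type = new SGAgentType("agent", domain.getActionTypes());
        // true => each agent stores only its own Q-values and reads the
        // other agent's values from that agent's own Q-source
        MultiAgentQLearning a0 = new MultiAgentQLearning(domain, 0.95, 0.1,
                new SimpleHashableStateFactory(), 0.0, new CoCoQ(), true, "agent0", type);
        MultiAgentQLearning a1 = new MultiAgentQLearning(domain, 0.95, 0.1,
                new SimpleHashableStateFactory(), 0.0, new CoCoQ(), true, "agent1", type);
        world.join(a0);
        world.join(a1);
        world.runGame(100); // learn from one game capped at 100 stages
    }
}
```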
After an agent observes an outcome, it determines the change in Q-value. However, the agent will not actually update its Q-value to the new value until it is asked for its next action (action(State)) or until the gameTerminated() message is sent. Q-value updates are delayed in this way because, when Q-values are shared and distributed among the agents, this ensures that all Q-values are written back only after the next Q-value has been determined for each agent.
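The deferred write can be pictured with a small self-contained toy (a sketch, not BURLAP's actual implementation; the field names mirror the protected fields listed in the field summary below):

```java
/** Toy illustration of the deferred Q-update pattern described above. */
class DeferredQUpdate {
    private final double[] qTable = new double[16]; // toy joint Q-table
    private int qToUpdate = -1;          // index of the entry to rewrite
    private double nextQValue;           // staged value, not yet written
    private boolean needsToUpdateQValue; // pending-write flag

    /** Observing an outcome stages the new value but does not write it. */
    void observeOutcome(int stateActionIndex, double backedUpValue) {
        qToUpdate = stateActionIndex;
        nextQValue = backedUpValue;
        needsToUpdateQValue = true;
    }

    /** Called before choosing the next action, and at game termination. */
    void updateLatestQValue() {
        if (needsToUpdateQValue) {
            qTable[qToUpdate] = nextQValue;
            needsToUpdateQValue = false;
        }
    }
}
```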
In general, the learning policy followed by this agent should reflect the needs of the solution concept being learned. For instance, CoCo-Q should use some variant of a maximum-welfare joint policy.
The learning policy and its underlying joint policy will automatically be told that this agent is their target agent, will be given the agent definitions in the world, and will be given this agent as the Q-source provider of the joint policy (MAQSourcePolicy). If the set joint policy is not an instance of MAQSourcePolicy, then an exception will be thrown.
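For example, the default learning policy can be replaced with one using a different epsilon. This is a sketch assuming BURLAP's EGreedyMaxWellfare joint policy (an MAQSourcePolicy, so the instanceof requirement above is satisfied) and an already-constructed MultiAgentQLearning instance named `agent`:

```java
// epsilon = 0.05 greedy maximum-welfare joint policy; `agent` is a
// hypothetical, previously constructed MultiAgentQLearning instance
EGreedyMaxWellfare jointPolicy = new EGreedyMaxWellfare(0.05);
agent.setLearningPolicy(new PolicyFromJointPolicy(jointPolicy));
```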
Acknowledgements: Esha Gosh, John Meehan, Michalis Michaelidis for code on which this was based.
Modifier and Type | Field and Description |
---|---|
protected int | agentNum |
protected SGBackupOperator | backupOperator: The backup operator that defines the solution concept being learned. |
protected double | discount: The discount factor. |
protected HashableStateFactory | hashingFactory: The state hashing factory used to index Q-values by state. |
protected PolicyFromJointPolicy | learningPolicy: The learning policy to be followed. |
protected LearningRate | learningRate: The learning rate for updating Q-values. |
protected QSourceForSingleAgent | myQSource: This agent's Q-value source. |
protected boolean | needsToUpdateQValue: Whether the agent needs to update its Q-values from a recent experience. |
protected double | nextQValue: The new Q-value to which the last Q-value needs to be updated. |
protected QFunction | qInit: The Q-value initialization to use. |
protected AgentQSourceMap | qSourceMap: The object that maps to the other agents' Q-value sources. |
protected JAQValue | qToUpdate: The Q-value object that needs to be updated. |
protected boolean | queryOtherAgentsQSource: Whether this agent uses the Q-values stored by the other agents in the world rather than keeping its own copy of every agent's Q-values. |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent. |
Fields inherited from class SGAgentBase: agentType, domain, internalRewardFunction, world, worldAgentName
Constructor and Description |
---|
MultiAgentQLearning(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType): Initializes this Q-learning agent. |
MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, HashableStateFactory hashFactory, QFunction qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType): Initializes this Q-learning agent. |
Modifier and Type | Method and Description |
---|---|
Action | action(State s): This method is called by the world when it needs the agent to choose an action. |
void | gameStarting(World w, int agentNum): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
QSourceForSingleAgent | getMyQSource(): Returns this agent's individual Q-value source. |
AgentQSourceMap | getQSources(): Returns an object that can provide Q-value sources for each agent. |
void | observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
void | setLearningPolicy(PolicyFromJointPolicy p): Sets the learning policy to be followed by the agent. |
protected void | updateLatestQValue(): Updates the Q-value for the most recent observation if it has not already been updated. |
Methods inherited from class SGAgentBase: agentName, agentType, getInternalRewardFunction, init, init, setAgentDetails, setInternalRewardFunction
public MultiAgentQLearning(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType)

Initializes this Q-learning agent. This agent's Q-source will be a QSourceForSingleAgent.HashBackedQSource, and the learning policy defaults to an epsilon = 0.1 maximum-welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will only store its own Q-values and will use the other agents' stored Q-values to determine theirs.

Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the constant learning rate
hashFactory - the hashing factory used to index states and Q-values
qInit - the default value to which all initial Q-values will be initialized
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, the agent uses the Q-values for other agents that are stored by them; if false, the agent stores a Q-value for each other agent in the world
agentName - the name of the agent
agentType - the SGAgentType for the agent, defining its action space

public MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, HashableStateFactory hashFactory, QFunction qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType)

Initializes this Q-learning agent. This agent's Q-source will be a QSourceForSingleAgent.HashBackedQSource, and the learning policy defaults to an epsilon = 0.1 maximum-welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will only store its own Q-values and will use the other agents' stored Q-values to determine theirs.

Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the learning rate function to use
hashFactory - the hashing factory used to index states and Q-values
qInit - the Q-value initialization to use
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, the agent uses the Q-values for other agents that are stored by them; if false, the agent stores a Q-value for each other agent in the world
agentName - the name of the agent
agentType - the SGAgentType for the agent, defining its action space

public QSourceForSingleAgent getMyQSource()

Returns this agent's individual Q-value source.
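A hypothetical use of this accessor after some games have been played. It assumes QSourceForSingleAgent's getQValueFor(State, JointAction) lookup and a public q field on the returned JAQValue, following BURLAP's Q-value object conventions; `agent`, `someState`, and `someJointAction` are placeholders:

```java
// Inspect the learned value this agent holds for one state/joint action
QSourceForSingleAgent qs = agent.getMyQSource();
JAQValue entry = qs.getQValueFor(someState, someJointAction);
System.out.println("Q(s, ja) for this agent = " + entry.q);
```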
public AgentQSourceMap getQSources()

Returns an object that can provide Q-value sources for each agent.
Specified by: getQSources in interface MultiAgentQSourceProvider
Returns: an AgentQSourceMap object.

public void setLearningPolicy(PolicyFromJointPolicy p)
Sets the learning policy to be followed by the agent. The underlying joint policy of the set learning policy must be an instance of MAQSourcePolicy, or a runtime exception will be thrown.
Parameters:
p - the learning policy to follow

public void gameStarting(World w, int agentNum)
This method is called by the world when a new game is starting.
Specified by: gameStarting in interface SGAgent
Parameters:
w - the world in which the game is starting
agentNum - the agent number of the agent in the world

public Action action(State s)
This method is called by the world when it needs the agent to choose an action.
Specified by: action in interface SGAgent
public void observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)

This method is called by the world when every agent in the world has taken their action.
Specified by: observeOutcome in interface SGAgent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state

public void gameTerminated()
This method is called by the world when a game has ended.
Specified by: gameTerminated in interface SGAgent
protected void updateLatestQValue()

Updates the Q-value for the most recent observation if it has not already been updated.