public class MultiAgentQLearning extends SGAgentBase implements MultiAgentQSourceProvider
In this class, each agent stores its own Q-values and an object that provides a source for the Q-values of the other agents. This allows the storage of Q-values to vary: an agent can store the Q-values for all other agents itself, or the map can provide access to the Q-values stored by the other MultiAgentQLearning agents in the world so that only one copy of each agent's Q-values is ever stored. In the latter case, all agents should be implementing the same solution-concept learning algorithm; otherwise, each agent should maintain its own set of Q-values.
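As a rough sketch of the shared-storage mode (assuming BURLAP 3 package paths, the CoCoQ backup operator, and an already-constructed SGDomain and World; adapt names to your version), two agents sharing Q-value sources could be set up as follows. Passing false instead of true would make each agent keep its own copy of the other agent's Q-values.

```java
import burlap.behavior.stochasticgames.agents.maql.MultiAgentQLearning;
import burlap.behavior.stochasticgames.madynamicprogramming.backupOperators.CoCoQ;
import burlap.mdp.stochasticgames.SGDomain;
import burlap.mdp.stochasticgames.agent.SGAgentType;
import burlap.mdp.stochasticgames.world.World;
import burlap.statehashing.simple.SimpleHashableStateFactory;

public class SharedQExample {
    /** Joins two CoCo-Q learners to an existing world; both use the
     *  shared-storage mode (queryOtherAgentsForTheirQValues = true). */
    public static void joinLearners(SGDomain domain, World world) {
        SGAgentType type = new SGAgentType("agent", domain.getActionTypes());
        // true => each agent stores only its own Q-values and reads the
        // other agent's values from that agent's own Q-source
        MultiAgentQLearning a0 = new MultiAgentQLearning(domain, 0.95, 0.1,
                new SimpleHashableStateFactory(), 0.0, new CoCoQ(), true, "agent0", type);
        MultiAgentQLearning a1 = new MultiAgentQLearning(domain, 0.95, 0.1,
                new SimpleHashableStateFactory(), 0.0, new CoCoQ(), true, "agent1", type);
        world.join(a0);
        world.join(a1);
        world.runGame(100); // learn from one game capped at 100 stages
    }
}
```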
After an agent observes an outcome, it determines the change in Q-value. However, the agent will not actually update its Q-value to the new value until it is asked for its next action (action(State)) or until the gameTerminated() message is sent. Q-value updates are delayed in this way because, when Q-values are shared and distributed among the agents, this ensures that all Q-values are written back only after the next Q-value has been determined for each agent.
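The deferred write can be pictured with a small self-contained toy (a sketch, not BURLAP's actual implementation; the field names mirror the protected fields listed in the field summary below):

```java
/** Toy illustration of the deferred Q-update pattern described above. */
class DeferredQUpdate {
    private final double[] qTable = new double[16]; // toy joint Q-table
    private int qToUpdate = -1;          // index of the entry to rewrite
    private double nextQValue;           // staged value, not yet written
    private boolean needsToUpdateQValue; // pending-write flag

    /** Observing an outcome stages the new value but does not write it. */
    void observeOutcome(int stateActionIndex, double backedUpValue) {
        qToUpdate = stateActionIndex;
        nextQValue = backedUpValue;
        needsToUpdateQValue = true;
    }

    /** Called before choosing the next action, and at game termination. */
    void updateLatestQValue() {
        if (needsToUpdateQValue) {
            qTable[qToUpdate] = nextQValue;
            needsToUpdateQValue = false;
        }
    }
}
```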
In general, the learning policy followed by this agent should reflect the needs of the solution concept being learned. For instance, CoCo-Q should use some variant of a maximum-welfare joint policy.
The learning policy and its underlying joint policy will automatically be told that this agent is their target agent, will be given the agent definitions in the world, and will be given this agent as the Q-source provider of the joint policy (MAQSourcePolicy). If the set joint policy is not an instance of MAQSourcePolicy, then an exception will be thrown.
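For example, the default learning policy can be replaced with one using a different epsilon. This is a sketch assuming BURLAP's EGreedyMaxWellfare joint policy (an MAQSourcePolicy, so the instanceof requirement above is satisfied) and an already-constructed MultiAgentQLearning instance named `agent`:

```java
// epsilon = 0.05 greedy maximum-welfare joint policy; `agent` is a
// hypothetical, previously constructed MultiAgentQLearning instance
EGreedyMaxWellfare jointPolicy = new EGreedyMaxWellfare(0.05);
agent.setLearningPolicy(new PolicyFromJointPolicy(jointPolicy));
```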
Acknowledgements: Esha Gosh, John Meehan, Michalis Michaelidis for code on which this was based.
Modifier and Type | Field and Description |
---|---|
protected int | agentNum |
protected SGBackupOperator | backupOperator: The backup operator that defines the solution concept being learned. |
protected double | discount: The discount factor. |
protected HashableStateFactory | hashingFactory: The state hashing factory used to index Q-values by state. |
protected PolicyFromJointPolicy | learningPolicy: The learning policy to be followed. |
protected LearningRate | learningRate: The learning rate for updating Q-values. |
protected QSourceForSingleAgent | myQSource: This agent's Q-value source. |
protected boolean | needsToUpdateQValue: Whether the agent needs to update its Q-values from a recent experience. |
protected double | nextQValue: The new Q-value to which the last Q-value needs to be updated. |
protected QFunction | qInit: The Q-value initialization to use. |
protected AgentQSourceMap | qSourceMap: The object that maps to the other agents' Q-value sources. |
protected JAQValue | qToUpdate: The Q-value object that needs to be updated. |
protected boolean | queryOtherAgentsQSource: Whether this agent uses the Q-values stored by the other agents in the world rather than keeping its own copy of every agent's Q-values. |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent. |
Fields inherited from class SGAgentBase: agentType, domain, internalRewardFunction, world, worldAgentName
Constructor and Description |
---|
MultiAgentQLearning(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType): Initializes this Q-learning agent. |
MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, HashableStateFactory hashFactory, QFunction qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType): Initializes this Q-learning agent. |
Modifier and Type | Method and Description |
---|---|
Action | action(State s): This method is called by the world when it needs the agent to choose an action. |
void | gameStarting(World w, int agentNum): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
QSourceForSingleAgent | getMyQSource(): Returns this agent's individual Q-value source. |
AgentQSourceMap | getQSources(): Returns an object that can provide Q-value sources for each agent. |
void | observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
void | setLearningPolicy(PolicyFromJointPolicy p): Sets the learning policy to be followed by the agent. |
protected void | updateLatestQValue(): Updates the Q-value for the most recent observation if it has not already been updated. |
Methods inherited from class SGAgentBase: agentName, agentType, getInternalRewardFunction, init, init, setAgentDetails, setInternalRewardFunction
public MultiAgentQLearning(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType)

Initializes this Q-learning agent. This agent's Q-source will be a QSourceForSingleAgent.HashBackedQSource, and the learning policy defaults to an epsilon = 0.1 maximum-welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will only store its own Q-values and will use the other agents' stored Q-values to determine theirs.

Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the constant learning rate
hashFactory - the hashing factory used to index states and Q-values
qInit - the default value to which all initial Q-values will be initialized
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, the agent uses the Q-values for other agents that are stored by them; if false, the agent stores a Q-value for each other agent in the world
agentName - the name of the agent
agentType - the SGAgentType for the agent, defining its action space

public MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, HashableStateFactory hashFactory, QFunction qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues, java.lang.String agentName, SGAgentType agentType)

Initializes this Q-learning agent. This agent's Q-source will be a QSourceForSingleAgent.HashBackedQSource, and the learning policy defaults to an epsilon = 0.1 maximum-welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will only store its own Q-values and will use the other agents' stored Q-values to determine theirs.

Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the learning rate function to use
hashFactory - the hashing factory used to index states and Q-values
qInit - the Q-value initialization to use
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, the agent uses the Q-values for other agents that are stored by them; if false, the agent stores a Q-value for each other agent in the world
agentName - the name of the agent
agentType - the SGAgentType for the agent, defining its action space

public QSourceForSingleAgent getMyQSource()

Returns this agent's individual Q-value source.
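A hypothetical use of this accessor after some games have been played. It assumes QSourceForSingleAgent's getQValueFor(State, JointAction) lookup and a public q field on the returned JAQValue, following BURLAP's Q-value object conventions; `agent`, `someState`, and `someJointAction` are placeholders:

```java
// Inspect the learned value this agent holds for one state/joint action
QSourceForSingleAgent qs = agent.getMyQSource();
JAQValue entry = qs.getQValueFor(someState, someJointAction);
System.out.println("Q(s, ja) for this agent = " + entry.q);
```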
public AgentQSourceMap getQSources()

Returns an object that can provide Q-value sources for each agent.
Specified by: getQSources in interface MultiAgentQSourceProvider
Returns: an AgentQSourceMap object.

public void setLearningPolicy(PolicyFromJointPolicy p)
Sets the learning policy to be followed by the agent. The underlying joint policy of the set learning policy must be an instance of MAQSourcePolicy, or a runtime exception will be thrown.
Parameters:
p - the learning policy to follow

public void gameStarting(World w, int agentNum)
This method is called by the world when a new game is starting.
Specified by: gameStarting in interface SGAgent
Parameters:
w - the world in which the game is starting
agentNum - the agent number of the agent in the world

public Action action(State s)
This method is called by the world when it needs the agent to choose an action.
Specified by: action in interface SGAgent
public void observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)

This method is called by the world when every agent in the world has taken their action.
Specified by: observeOutcome in interface SGAgent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state

public void gameTerminated()
This method is called by the world when a game has ended.
Specified by: gameTerminated in interface SGAgent
protected void updateLatestQValue()

Updates the Q-value for the most recent observation if it has not already been updated.