public class MultiAgentQLearning extends Agent implements MultiAgentQSourceProvider
Q-value updates are delayed until the next time this agent is queried for an action (getAction(State)) or until the gameTerminated()
message is sent.
Updates are delayed in this way because, when each agent's Q-values are shared and distributed among the agents, the delay ensures
that the Q-values are all updated only after the next Q-value has been determined for each agent.
In general, the learning policy followed by this agent should reflect the needs of the solution concept being learned. For instance,
CoCo-Q should use some variant of a maximum welfare joint policy.
The learning policy and its underlying joint policy will automatically be told that this agent is its target agent, given the agent definitions
in the world, and told that this agent is the Q-source provider of the joint policy (MAQSourcePolicy). If the set joint policy
is not an instance of MAQSourcePolicy, then an exception will be thrown.

Modifier and Type | Field and Description |
---|---|
protected SGBackupOperator | backupOperator: The backup operator that defines the solution concept being learned |
protected double | discount: The discount factor |
protected StateHashFactory | hashingFactory: The state hashing factory used to index Q-values by state |
protected PolicyFromJointPolicy | learningPolicy: The learning policy to be followed |
protected LearningRate | learningRate: The learning rate used for updating Q-values |
protected QSourceForSingleAgent | myQSource: This agent's Q-value source |
protected boolean | needsToUpdateQValue: Whether the agent needs to update its Q-values from a recent experience |
protected double | nextQValue: The new value to which the last Q-value needs to be updated |
protected ValueFunctionInitialization | qInit: The Q-value initialization to use |
protected AgentQSourceMap | qSourceMap: The object that maps to the other agents' Q-value sources |
protected JAQValue | qToUpdate: The Q-value object that needs to be updated |
protected boolean | queryOtherAgentsQSource: Whether this agent uses the Q-values stored by the other agents in the world rather than keeping its own copy of each agent's Q-values |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent |
Fields inherited from class Agent: agentType, domain, internalRewardFunction, world, worldAgentName
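The interplay of the backupOperator, discount, and learningRate fields can be illustrated with a minimal, self-contained sketch. The names below (BackupOperator, update) are plain-Java stand-ins for illustration, not the actual BURLAP types:

```java
import java.util.Arrays;

public class BackupSketch {

    /** Stand-in for a backup operator: reduces the next state's joint Q-values to one scalar. */
    interface BackupOperator {
        double backup(double[] nextJointQValues);
    }

    /** One Q-update: blend reward + discount * backup(s') into the old estimate at the learning rate. */
    static double update(double oldQ, double reward, double discount,
                         double learningRate, BackupOperator op, double[] nextJointQValues) {
        double target = reward + discount * op.backup(nextJointQValues);
        return oldQ + learningRate * (target - oldQ);
    }

    public static void main(String[] args) {
        // A cooperative "max" operator; other operators would yield other solution concepts.
        BackupOperator max = qs -> Arrays.stream(qs).max().orElse(0.0);
        double q = update(0.0, 1.0, 0.9, 0.1, max, new double[] {0.2, 0.8, 0.5});
        System.out.println(q); // 0.1 * (1.0 + 0.9 * 0.8), approximately 0.172
    }
}
```

Swapping the operator while keeping the update rule fixed is the design idea behind parameterizing this class by an SGBackupOperator.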
Constructor and Description |
---|
MultiAgentQLearning(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues): Initializes this Q-learning agent. |
MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, StateHashFactory hashFactory, ValueFunctionInitialization qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues): Initializes this Q-learning agent. |
Modifier and Type | Method and Description |
---|---|
void | gameStarting(): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
GroundedSingleAction | getAction(State s): This method is called by the world when it needs the agent to choose an action. |
QSourceForSingleAgent | getMyQSource(): Returns this agent's individual Q-value source. |
AgentQSourceMap | getQSources(): Returns an object that can provide Q-value sources for each agent. |
void | joinWorld(World w, AgentType as): Causes this agent instance to join a world. |
void | observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
void | setLearningPolicy(PolicyFromJointPolicy p): Sets the learning policy to be followed by the agent. |
protected void | updateLatestQValue(): Updates the Q-value for the most recent observation if it has not already been updated. |
Methods inherited from class Agent: getAgentName, getAgentType, getInternalRewardFunction, init, setInternalRewardFunction
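Action selection in these methods is driven by the agent's learning policy, which per the constructor documentation defaults to an epsilon-greedy maximum welfare policy (EGreedyMaxWellfare). A hypothetical, self-contained sketch of that selection rule in plain Java (the class and method names here are illustrative stand-ins, not BURLAP's):

```java
import java.util.Random;

public class EGreedyMaxWelfareSketch {

    /**
     * Epsilon-greedy over joint actions: with probability epsilon pick a random joint
     * action; otherwise pick the one maximizing the *sum* of all agents' Q-values
     * (the "maximum welfare" criterion).
     */
    static int selectJointAction(double[][] jointQ, double epsilon, Random rng) {
        // jointQ[a][i] = Q-value of joint action a for agent i
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(jointQ.length);
        }
        int best = 0;
        double bestWelfare = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < jointQ.length; a++) {
            double welfare = 0;
            for (double q : jointQ[a]) welfare += q;
            if (welfare > bestWelfare) { bestWelfare = welfare; best = a; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] jointQ = {
            {1.0, 0.0},  // joint action 0: welfare 1.0
            {0.6, 0.6},  // joint action 1: welfare 1.2 (maximum welfare)
            {0.0, 1.0},  // joint action 2: welfare 1.0
        };
        // epsilon = 0 makes the choice deterministic for the demo
        System.out.println(selectJointAction(jointQ, 0.0, new Random(0))); // prints 1
    }
}
```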
protected double discount
protected QSourceForSingleAgent myQSource
protected AgentQSourceMap qSourceMap
protected PolicyFromJointPolicy learningPolicy
protected LearningRate learningRate
protected ValueFunctionInitialization qInit
protected StateHashFactory hashingFactory
protected SGBackupOperator backupOperator
protected boolean queryOtherAgentsQSource
protected boolean needsToUpdateQValue
protected double nextQValue
protected JAQValue qToUpdate
protected int totalNumberOfSteps
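The learningRate field is a schedule object rather than a bare constant (the double-argument constructor simply wraps a constant). A small self-contained sketch of why the schedule matters, with a stand-in Rate interface in place of BURLAP's LearningRate:

```java
public class LearningRateSketch {

    /** Stand-in for a learning-rate schedule, polled at each time step. */
    interface Rate {
        double pollRate(int timeStep);
    }

    /** Repeatedly move a single Q-value toward a fixed target under the given schedule. */
    static double run(Rate rate, int steps, double target) {
        double q = 0.0;
        for (int t = 0; t < steps; t++) {
            q += rate.pollRate(t) * (target - q);
        }
        return q;
    }

    public static void main(String[] args) {
        Rate constant = t -> 0.1;           // a constant rate, as the double constructor argument sets up
        Rate decaying = t -> 1.0 / (t + 1); // a classic 1/n schedule
        System.out.println(run(constant, 50, 1.0)); // approaches 1.0 geometrically
        System.out.println(run(decaying, 50, 1.0)); // jumps to 1.0 on the first step
    }
}
```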
public MultiAgentQLearning(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, double qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues)
Initializes this Q-learning agent with a default QSourceForSingleAgent.HashBackedQSource Q-source; the learning policy is defaulted
to an epsilon = 0.1 maximum welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will
only store its own Q-values and will use the other agents' stored Q-values to determine theirs.
Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the constant learning rate
hashFactory - the hashing factory used to index states and Q-values
qInit - the default value to which all initial Q-values will be initialized
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, then the agent uses the Q-values for other agents that are stored by them; if false, then the agent stores a Q-value for each other agent in the world
public MultiAgentQLearning(SGDomain d, double discount, LearningRate learningRate, StateHashFactory hashFactory, ValueFunctionInitialization qInit, SGBackupOperator backupOperator, boolean queryOtherAgentsForTheirQValues)
Initializes this Q-learning agent with a default QSourceForSingleAgent.HashBackedQSource Q-source; the learning policy is defaulted
to an epsilon = 0.1 maximum welfare (EGreedyMaxWellfare) derived policy. If queryOtherAgentsForTheirQValues is set to true, then this agent will
only store its own Q-values and will use the other agents' stored Q-values to determine theirs.
Parameters:
d - the domain in which to perform learning
discount - the discount factor
learningRate - the learning rate function to use
hashFactory - the hashing factory used to index states and Q-values
qInit - the Q-value initialization to use
backupOperator - the backup operator to use that defines the solution concept being learned
queryOtherAgentsForTheirQValues - if true, then the agent uses the Q-values for other agents that are stored by them; if false, then the agent stores a Q-value for each other agent in the world
public void joinWorld(World w, AgentType as)
Causes this agent instance to join a world.
Overrides: joinWorld in class Agent
public QSourceForSingleAgent getMyQSource()
Returns this agent's individual Q-value source.
public AgentQSourceMap getQSources()
Returns an object that can provide Q-value sources for each agent.
Specified by: getQSources in interface MultiAgentQSourceProvider
Returns: an AgentQSourceMap object
public void setLearningPolicy(PolicyFromJointPolicy p)
Sets the learning policy to be followed by the agent. The underlying joint policy must be an instance of MAQSourcePolicy so that it can be provided the Q-values learned by this MultiAgentQLearning agent, or a runtime exception will be thrown.
Parameters:
p - the learning policy to follow
public void gameStarting()
This method is called by the world when a new game is starting.
Specified by: gameStarting in class Agent
public GroundedSingleAction getAction(State s)
This method is called by the world when it needs the agent to choose an action.
Specified by: getAction in class Agent
public void observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal)
This method is called by the world when every agent in the world has taken their action.
Specified by: observeOutcome in class Agent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state
public void gameTerminated()
This method is called by the world when a game has ended.
Specified by: gameTerminated in class Agent
protected void updateLatestQValue()
Updates the Q-value for the most recent observation if it has not already been updated.
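The delayed-update cycle described above, in which observeOutcome records an experience and the update is applied at the next getAction or gameTerminated call, can be sketched with plain-Java stand-ins (none of the names below are the BURLAP types; the String state-action key and fixed constants are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class DelayedUpdateSketch {
    private final Map<String, Double> qTable = new HashMap<>();
    private final double discount = 0.9, learningRate = 0.1;

    // Pending experience, applied lazily (mirrors needsToUpdateQValue / nextQValue).
    private String pendingKey;
    private double pendingTarget;
    private boolean needsToUpdateQValue;

    double q(String stateAction) { return qTable.getOrDefault(stateAction, 0.0); }

    /** Analogue of observeOutcome: record the update target but do not apply it yet. */
    void observeOutcome(String stateAction, double reward, double nextStateValue, boolean terminal) {
        pendingKey = stateAction;
        pendingTarget = terminal ? reward : reward + discount * nextStateValue;
        needsToUpdateQValue = true;
    }

    /** Analogue of updateLatestQValue: apply the pending update exactly once. */
    void updateLatestQValue() {
        if (!needsToUpdateQValue) return;
        double old = q(pendingKey);
        qTable.put(pendingKey, old + learningRate * (pendingTarget - old));
        needsToUpdateQValue = false;
    }

    /** Analogues of getAction/gameTerminated: both flush the pending update first. */
    String getAction() { updateLatestQValue(); return "someAction"; }
    void gameTerminated() { updateLatestQValue(); }

    public static void main(String[] args) {
        DelayedUpdateSketch agent = new DelayedUpdateSketch();
        agent.observeOutcome("s0:a0", 1.0, 0.0, true);
        System.out.println(agent.q("s0:a0")); // prints 0.0: the update is still pending
        agent.getAction();                    // the next query triggers the update
        System.out.println(agent.q("s0:a0")); // prints 0.1
    }
}
```

The deferral is what lets all agents finish computing their next Q-values before any shared value is overwritten, as explained in the class description.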