EGreedyMaxWellfare

java.lang.Object
- burlap.behavior.policy.Policy
- - burlap.behavior.stochasticgames.JointPolicy
  - - burlap.behavior.stochasticgames.madynamicprogramming.MAQSourcePolicy
    - - burlap.behavior.stochasticgames.madynamicprogramming.policies.EGreedyMaxWellfare

```
public class EGreedyMaxWellfare
extends MAQSourcePolicy
```
An epsilon-greedy joint policy, in which the joint action with the highest aggregate Q-value over all agents is returned a 1-epsilon fraction of the time and a random joint action is returned an epsilon fraction of the time. Ties are broken deterministically (the first joint action with the maximum value is selected) unless the policy is set to break them randomly. Deterministic tie-breaking is useful for maintaining consistency between agents that select their actions independently of each other. This policy is typically used for CoCo-Q agents.
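The selection rule described above can be sketched without any BURLAP types. This is a minimal illustration, not the library's implementation: the class `MaxWelfareSketch`, the method `select`, and the `qValues[jointAction][agent]` table are all hypothetical names introduced here, and "aggregate" is assumed to mean the sum of the agents' Q-values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class MaxWelfareSketch {

    /**
     * Returns the index of the selected joint action.
     * qValues[i][a] is a hypothetical table holding the Q-value that agent a
     * assigns to joint action i (an assumption made for this sketch).
     */
    static int select(double[][] qValues, double epsilon,
                      boolean breakTiesRandomly, Random rand) {
        int n = qValues.length;

        // With probability epsilon, return a uniformly random joint action.
        if (rand.nextDouble() < epsilon) {
            return rand.nextInt(n);
        }

        // Otherwise collect the joint action(s) with the largest summed Q-value.
        List<Integer> best = new ArrayList<>();
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (double q : qValues[i]) {
                sum += q;
            }
            if (sum > max) {
                max = sum;
                best.clear();
                best.add(i);
            } else if (sum == max) {
                best.add(i);
            }
        }

        // Deterministic tie-breaking keeps the first maximizer encountered.
        return breakTiesRandomly ? best.get(rand.nextInt(best.size())) : best.get(0);
    }

    public static void main(String[] args) {
        // Three joint actions, two agents; joint actions 1 and 2 tie at a sum of 5.
        double[][] q = {{1.0, 1.0}, {3.0, 2.0}, {2.0, 3.0}};
        int chosen = select(q, 0.1, false, new Random());
        System.out.println("selected joint action index: " + chosen);
    }
}
```

With `epsilon` set to 0 and deterministic tie-breaking, the sketch always returns the first joint action that attains the maximum sum.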

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.policy.Policy
  Policy.ActionProb, Policy.GroundedAnnotatedAction, Policy.PolicyUndefinedException

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`breakTiesRandomly` Whether ties should be broken randomly or not.
`protected double`	`epsilon` The epsilon parameter specifying how often random joint actions are returned
`protected MultiAgentQSourceProvider`	`qSourceProvider` The multi-agent q-source provider
`protected java.util.Random`	`rand` A random object used for sampling

Fields inherited from class burlap.behavior.stochasticgames.JointPolicy
agentsInJointPolicy, agentsSyncrhonizedSoFar, lastSyncedState, lastSynchronizedJointAction

Fields inherited from class burlap.behavior.policy.Policy
annotateOptionDecomposition, evaluateDecomposesOptions

Constructor Summary

Constructors
Constructor and Description
`EGreedyMaxWellfare(double epsilon)` Initializes for a given epsilon value.
`EGreedyMaxWellfare(double epsilon, boolean breakTiesRandomly)` Initializes for a given epsilon value and whether to break ties randomly.
`EGreedyMaxWellfare(MultiAgentQLearning actingAgent, double epsilon)` Initializes for a multi-agent Q-learning object and epsilon value.
`EGreedyMaxWellfare(MultiAgentQLearning actingAgent, double epsilon, boolean breakTiesRandomly)` Initializes for a multi-agent Q-learning object, epsilon value, and whether to break ties randomly.

Method Summary

Methods
Modifier and Type	Method and Description
`JointPolicy`	`copy()` Creates a copy of this joint policy and returns it.
`AbstractGroundedAction`	`getAction(State s)` This method will return an action sampled by the policy for the given state.
`java.util.List<Policy.ActionProb>`	`getActionDistributionForState(State s)` This method will return the action probability distribution defined by the policy.
`boolean`	`isDefinedFor(State s)` Specifies whether this policy is defined for the input state.
`boolean`	`isStochastic()` Indicates whether the policy is stochastic or deterministic.
`void`	`setBreakTiesRandomly(boolean breakTiesRandomly)` Whether to break ties randomly or deterministically.
`void`	`setQSourceProvider(MultiAgentQSourceProvider provider)` Sets the `MultiAgentQSourceProvider` that will be used to define this object's joint policy.
`void`	`setTargetAgent(java.lang.String agentName)` Sets the target privileged agent from which this joint policy is defined.
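The effect of the `setBreakTiesRandomly` setting on agents that choose independently can be illustrated with a small, library-free sketch. Everything here is hypothetical (the class `TieBreakSyncSketch`, the helpers `firstMax` and `randomMax`, and the `welfare` array of summed Q-values), and exploration is ignored: the sketch only compares the two tie-breaking modes on the greedy choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class TieBreakSyncSketch {

    // Deterministic tie-breaking: index of the first maximal entry.
    static int firstMax(double[] welfare) {
        int best = 0;
        for (int i = 1; i < welfare.length; i++) {
            if (welfare[i] > welfare[best]) {
                best = i;
            }
        }
        return best;
    }

    // Random tie-breaking: uniformly random index among the maximal entries.
    static int randomMax(double[] welfare, Random rand) {
        double max = welfare[firstMax(welfare)];
        List<Integer> ties = new ArrayList<>();
        for (int i = 0; i < welfare.length; i++) {
            if (welfare[i] == max) {
                ties.add(i);
            }
        }
        return ties.get(rand.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        // Summed Q-values for three joint actions; indices 1 and 2 tie.
        double[] welfare = {4.0, 7.0, 7.0};

        // Two agents evaluating the same values reach the same greedy choice.
        System.out.println("deterministic: " + firstMax(welfare) + " vs " + firstMax(welfare));

        // With separate random generators, the two agents may disagree.
        Random agentA = new Random();
        Random agentB = new Random();
        System.out.println("random: " + randomMax(welfare, agentA)
                + " vs " + randomMax(welfare, agentB));
    }
}
```

Under these assumptions, deterministic tie-breaking needs no shared random state for the agents to agree, which is the consistency property the class description refers to.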

Methods inherited from class burlap.behavior.stochasticgames.JointPolicy
getAgentsInJointPolicy, getAgentSynchronizedActionSelection, getAllJointActions, setAgentsInJointPolicy, setAgentsInJointPolicy, setAgentsInJointPolicyFromWorld

Methods inherited from class burlap.behavior.policy.Policy
evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateMethodsShouldAnnotateOptionDecomposition, evaluateMethodsShouldDecomposeOption, followAndRecordPolicy, followAndRecordPolicy, getDeterministicPolicy, getProbOfAction, getProbOfActionGivenDistribution, getProbOfActionGivenDistribution, sampleFromActionDistribution

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - qSourceProvider
```
protected MultiAgentQSourceProvider qSourceProvider
```
    The multi-agent q-source provider
  - epsilon
```
protected double epsilon
```
    The epsilon parameter specifying how often random joint actions are returned
  - rand
```
protected java.util.Random rand
```
    A random object used for sampling
  - breakTiesRandomly
```
protected boolean breakTiesRandomly
```
    Whether ties should be broken randomly or not.
- Constructor Detail
  - EGreedyMaxWellfare
```
public EGreedyMaxWellfare(double epsilon)
```
    Initializes for a given epsilon value. The set of agents for which joint actions are returned and the multi-agent Q-source provider must be set manually with JointPolicy.setAgentsInJointPolicy(java.util.Map) and setQSourceProvider(MultiAgentQSourceProvider) before the policy can be queried. Note that MultiAgentQLearning and MultiAgentDPPlanningAgent agents may do this themselves; consult their documentation to check.
    
    Parameters:
    epsilon - the fraction of the time, in [0, 1], that the agent selects random actions.
  - EGreedyMaxWellfare
```
public EGreedyMaxWellfare(double epsilon,
                  boolean breakTiesRandomly)
```
    Initializes for a given epsilon value and whether to break ties randomly. The set of agents for which joint actions are returned and the multi-agent Q-source provider must be set manually with JointPolicy.setAgentsInJointPolicy(java.util.Map) and setQSourceProvider(MultiAgentQSourceProvider) before the policy can be queried. Note that MultiAgentQLearning and MultiAgentDPPlanningAgent agents may do this themselves; consult their documentation to check.
    
    Parameters:
    epsilon - the fraction of the time, in [0, 1], that the agent selects random actions.
    breakTiesRandomly - whether ties should be broken randomly (true) or not (false)
  - EGreedyMaxWellfare
```
public EGreedyMaxWellfare(MultiAgentQLearning actingAgent,
                  double epsilon)
```
    Initializes for a multi-agent Q-learning object and epsilon value. The set of agents for which joint actions are to be returned must subsequently be defined with JointPolicy.setAgentsInJointPolicy(java.util.Map). Note that MultiAgentQLearning and MultiAgentDPPlanningAgent agents may do this themselves; consult their documentation to check.
    
    Parameters:
    actingAgent - the agent who will use this policy.
    epsilon - the fraction of the time, in [0, 1], that the agent selects random actions.
  - EGreedyMaxWellfare
```
public EGreedyMaxWellfare(MultiAgentQLearning actingAgent,
                  double epsilon,
                  boolean breakTiesRandomly)
```
    Initializes for a multi-agent Q-learning object, epsilon value, and whether to break ties randomly. The set of agents for which joint actions are to be returned must subsequently be defined with JointPolicy.setAgentsInJointPolicy(java.util.Map). Note that MultiAgentQLearning and MultiAgentDPPlanningAgent agents may do this themselves; consult their documentation to check.
    
    Parameters:
    actingAgent - the agent who will use this policy.
    epsilon - the fraction of the time, in [0, 1], that the agent selects random actions.
    breakTiesRandomly - whether ties should be broken randomly (true) or not (false)
- Method Detail
  - setBreakTiesRandomly
```
public void setBreakTiesRandomly(boolean breakTiesRandomly)
```
    Sets whether to break ties randomly or deterministically. Random tie-breaking is useful for exploration during learning. Deterministic tie-breaking is useful for synchronizing action selection among agents that must each select an action independently from the same joint policy.
    
    Parameters:
    breakTiesRandomly - true if ties will be broken randomly; false if ties will be broken deterministically.
  - setQSourceProvider
```
public void setQSourceProvider(MultiAgentQSourceProvider provider)
```
    Description copied from class: MAQSourcePolicy
    
    Sets the MultiAgentQSourceProvider that will be used to define this object's joint policy.
    
    Specified by:
    
    setQSourceProvider in class MAQSourcePolicy
    
    Parameters:
    provider - the MultiAgentQSourceProvider that will be used to define this object's joint policy.
  - getAction
```
public AbstractGroundedAction getAction(State s)
```
    Description copied from class: Policy
    
    This method will return an action sampled by the policy for the given state. If the defined policy is stochastic, then multiple calls to this method for the same state may return different actions. The sampling should be with respect to the action distribution that is returned by getActionDistributionForState.
    
    Specified by:
    
    getAction in class Policy
    
    Parameters:
    s - the state for which an action should be returned
    
    Returns:
    a sample action from the action distribution; null if the policy is undefined for s
  - getActionDistributionForState
```
public java.util.List<Policy.ActionProb> getActionDistributionForState(State s)
```
    Description copied from class: Policy
    
    This method will return the action probability distribution defined by the policy. The action distribution is represented by a list of ActionProb objects, each of which specifies a grounded action and the probability of that grounded action being taken. The returned list does not have to include actions with probability 0.
    
    Specified by:
    
    getActionDistributionForState in class Policy
    
    Parameters:
    s - the state for which an action distribution should be returned
    
    Returns:
    a list of possible actions taken by the policy and their probability.
  - isStochastic
```
public boolean isStochastic()
```
    Description copied from class: Policy
    
    Indicates whether the policy is stochastic or deterministic.
    
    Specified by:
    
    isStochastic in class Policy
    
    Returns:
    true when the policy is stochastic; false when it is deterministic.
  - isDefinedFor
```
public boolean isDefinedFor(State s)
```
    Description copied from class: Policy
    
    Specifies whether this policy is defined for the input state.
    
    Specified by:
    
    isDefinedFor in class Policy
    
    Parameters:
    s - the input state to test for whether this policy is defined
    
    Returns:
    true if this policy is defined for State s, false otherwise.
  - setTargetAgent
```
public void setTargetAgent(java.lang.String agentName)
```
    Description copied from class: JointPolicy
    
    Sets the target privileged agent from which this joint policy is defined.
    
    Specified by:
    
    setTargetAgent in class JointPolicy
    
    Parameters:
    agentName - the name of the target agent.
  - copy
```
public JointPolicy copy()
```
    Description copied from class: JointPolicy
    
    Creates a copy of this joint policy and returns it. This is useful when generating different agents that use the same kind of policy but have different target agents evaluating it.
    
    Specified by:
    
    copy in class JointPolicy
    
    Returns:
    a copy of this joint policy.
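Because getAction samples from the distribution that getActionDistributionForState returns, it can help to see what an epsilon-greedy distribution over joint actions might look like. The sketch below is an assumption-laden illustration, not the library's code: the class `EGreedyDistributionSketch`, the method `distribution`, and the `welfare` array of summed Q-values are hypothetical. It assumes every joint action receives epsilon / n of the probability mass and that the remaining 1 - epsilon goes to the first maximizer (deterministic ties) or is split evenly among all maximizers (random ties).

```java
import java.util.Arrays;

class EGreedyDistributionSketch {

    /**
     * Returns one probability per joint action.
     * welfare[i] is the assumed summed Q-value of joint action i.
     */
    static double[] distribution(double[] welfare, double epsilon,
                                 boolean breakTiesRandomly) {
        int n = welfare.length;

        // Find the maximum summed Q-value.
        double max = Double.NEGATIVE_INFINITY;
        for (double w : welfare) {
            max = Math.max(max, w);
        }

        // Locate the first maximizer and count how many entries tie with it.
        int first = -1;
        int numMax = 0;
        for (int i = 0; i < n; i++) {
            if (welfare[i] == max) {
                if (first < 0) {
                    first = i;
                }
                numMax++;
            }
        }

        // Exploration mass: epsilon spread uniformly over all joint actions.
        double[] probs = new double[n];
        Arrays.fill(probs, epsilon / n);

        // Greedy mass: 1 - epsilon assigned according to the tie-breaking mode.
        for (int i = 0; i < n; i++) {
            if (welfare[i] != max) {
                continue;
            }
            if (breakTiesRandomly) {
                probs[i] += (1.0 - epsilon) / numMax;
            } else if (i == first) {
                probs[i] += 1.0 - epsilon;
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] welfare = {2.0, 5.0, 5.0, 1.0};
        System.out.println(Arrays.toString(distribution(welfare, 0.2, false)));
        System.out.println(Arrays.toString(distribution(welfare, 0.2, true)));
    }
}
```

In both modes the probabilities sum to 1, and any positive epsilon leaves every joint action with nonzero probability.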
