EMinMaxPolicy

java.lang.Object
- burlap.behavior.stochasticgames.JointPolicy
  - burlap.behavior.stochasticgames.madynamicprogramming.MAQSourcePolicy
    - burlap.behavior.stochasticgames.madynamicprogramming.policies.EMinMaxPolicy

All Implemented Interfaces:

EnumerablePolicy, Policy
```
public class EMinMaxPolicy
extends MAQSourcePolicy
implements EnumerablePolicy
```
Class for following a minmax joint policy. Given some target agent, a minmax joint policy is computed over the space of joint action Q-values. A fraction epsilon of the time, however, a random joint action is selected instead. If the input Q-values are not zero-sum, they are forced to be zero-sum from the perspective of the target agent.
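
The selection rule described above can be illustrated with a minimal, self-contained sketch. It does not use BURLAP: the class `EpsilonMinMaxSketch` and all of its methods are hypothetical names introduced here, and the Q-values are a plain two-agent matrix given from the target agent's perspective (rows are the target agent's actions, columns are the opponent's). The sketch also assumes that the random joint action is drawn uniformly and considers only pure strategies, which is a simplification, since a full minmax solution may be a mixed strategy.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of epsilon-minmax selection; not part of BURLAP. */
final class EpsilonMinMaxSketch {

    /** Returns the row whose worst-case (minimum over columns) Q-value is largest. */
    static int maximinRow(double[][] q) {
        int best = 0;
        double bestWorst = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < q.length; r++) {
            double worst = Double.POSITIVE_INFINITY;
            for (double v : q[r]) {
                worst = Math.min(worst, v);
            }
            if (worst > bestWorst) {
                bestWorst = worst;
                best = r;
            }
        }
        return best;
    }

    /** Returns the opponent column that minimizes the target agent's Q-value in the given row. */
    static int minimizingColumn(double[][] q, int row) {
        int best = 0;
        for (int c = 1; c < q[row].length; c++) {
            if (q[row][c] < q[row][best]) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Returns a joint action {row, column}: uniformly random with probability
     * epsilon (an assumption of this sketch), otherwise the pure-strategy
     * maximin choice. Only the target agent's Q-values are used; the opponent
     * is treated as receiving their negation, which is the zero-sum assumption.
     */
    static int[] selectJointAction(double[][] q, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return new int[] {rng.nextInt(q.length), rng.nextInt(q[0].length)};
        }
        int row = maximinRow(q);
        return new int[] {row, minimizingColumn(q, row)};
    }

    public static void main(String[] args) {
        double[][] q = {{3.0, -1.0}, {1.0, 2.0}};
        // With epsilon = 0 the choice is always the maximin joint action.
        System.out.println(Arrays.toString(selectJointAction(q, 0.0, new Random(0)))); // prints [1, 0]
    }
}
```

In the example matrix, row 0 has a worst case of -1 and row 1 has a worst case of 1, so the target agent prefers row 1, and the opponent responds with column 0.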

Author:

James MacGlashan

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`epsilon` The epsilon parameter specifying how often random joint actions are returned
`protected MultiAgentQSourceProvider`	`qSourceProvider` The multi-agent q-source provider
`protected int`	`targetAgent` The target agent who is maximizing action selection

Fields inherited from class burlap.behavior.stochasticgames.JointPolicy
agentsInJointPolicy, agentsSynchronizedSoFar, lastSyncedState, lastSynchronizedJointAction

Constructor Summary

Constructors
Constructor and Description
`EMinMaxPolicy(double epsilon)` Initializes for a given epsilon value; the fraction of the time a random joint action is selected
`EMinMaxPolicy(MultiAgentQLearning actingAgent, double epsilon, int targetAgentNum)` Initializes for a given Q-learning agent and epsilon value.

Method Summary

Modifier and Type	Method and Description
`Action`	`action(State s)` This method will return an action sampled by the policy for the given state.
`double`	`actionProb(State s, Action a)` Returns the probability/probability density that the given action will be taken in the given state.
`JointPolicy`	`copy()` Creates a copy of this joint policy and returns it.
`boolean`	`definedFor(State s)` Specifies whether this policy is defined for the input state.
`java.util.List<ActionProb>`	`policyDistribution(State s)` This method will return the action probability distribution defined by the policy.
`void`	`setQSourceProvider(MultiAgentQSourceProvider provider)` Sets the `MultiAgentQSourceProvider` that will be used to define this object's joint policy.
`void`	`setTargetAgent(int agentNum)` Sets the target privileged agent from which this joint policy is defined.

Methods inherited from class burlap.behavior.stochasticgames.JointPolicy
getAgentsInJointPolicy, getAgentSynchronizedActionSelection, getAllJointActions, setAgentsInJointPolicy, setAgentsInJointPolicyFromWorld, setAgentTypesInJointPolicy

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - qSourceProvider
```
protected MultiAgentQSourceProvider qSourceProvider
```
    The multi-agent q-source provider
  - epsilon
```
protected double epsilon
```
    The epsilon parameter specifying how often random joint actions are returned
  - targetAgent
```
protected int targetAgent
```
    The target agent who is maximizing action selection
- Constructor Detail
  - EMinMaxPolicy
```
public EMinMaxPolicy(double epsilon)
```
    Initializes for a given epsilon value; the fraction of the time a random joint action is selected
    
    Parameters:
    
    epsilon - the epsilon parameter
  - EMinMaxPolicy
```
public EMinMaxPolicy(MultiAgentQLearning actingAgent,
                     double epsilon,
                     int targetAgentNum)
```
    Initializes for a given Q-learning agent and epsilon value. The Q-learning agent is set as the Q-source, and the agent identified by targetAgentNum is set as the target agent. Epsilon is the fraction of the time a random joint action is selected.
    
    Parameters:
    
    actingAgent - the Q-learning agent
    
    epsilon - the epsilon parameter
    
    targetAgentNum - the agent number of the target agent
- Method Detail
  - setTargetAgent
```
public void setTargetAgent(int agentNum)
```
    Description copied from class: JointPolicy
    
    Sets the target privileged agent from which this joint policy is defined.
    
    Specified by:
    
    setTargetAgent in class JointPolicy
    
    Parameters:
    
    agentNum - the target agent.
  - copy
```
public JointPolicy copy()
```
    Description copied from class: JointPolicy
    
    Creates a copy of this joint policy and returns it. This is useful when generating different agents that use the same kind of policy but have different target agents evaluating it.
    
    Specified by:
    
    copy in class JointPolicy
    
    Returns:
    
    a copy of this joint policy.
  - action
```
public Action action(State s)
```
    Description copied from interface: Policy
    
    This method will return an action sampled by the policy for the given state. If the defined policy is stochastic, then multiple calls to this method for the same state may return different actions. The sampling should be with respect to the defined action distribution, which for this class is the one returned by policyDistribution.
    
    Specified by:
    
    action in interface Policy
    
    Parameters:
    
    s - the state for which an action should be returned
    
    Returns:
    
    a sample action from the action distribution; null if the policy is undefined for s
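
    Sampling an action with respect to a distribution can be sketched as follows. This is a generic, self-contained illustration and not BURLAP's implementation: the class `DistributionSamplingSketch` and its `sample` method are hypothetical names, and the distribution is assumed to be a plain array of probabilities indexed by action.

```java
import java.util.Random;

/** Hypothetical illustration of sampling an index from a discrete distribution. */
final class DistributionSamplingSketch {

    /**
     * Draws one index by walking the cumulative distribution until it
     * exceeds a uniform random roll in [0, 1).
     */
    static int sample(double[] probs, Random rng) {
        double roll = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (roll < cumulative) {
                return i;
            }
        }
        // Guards against floating-point rounding when the probabilities sum to slightly less than 1.
        return probs.length - 1;
    }

    public static void main(String[] args) {
        double[] probs = {0.0, 1.0, 0.0};
        // All probability mass is on index 1, so the result does not depend on the seed.
        System.out.println(sample(probs, new Random())); // prints 1
    }
}
```

    Repeated calls with a non-degenerate distribution may return different indices, which mirrors the note above about stochastic policies.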
  - actionProb
```
public double actionProb(State s,
                         Action a)
```
    Description copied from interface: Policy
    
    Returns the probability/probability density that the given action will be taken in the given state.
    
    Specified by:
    
    actionProb in interface Policy
    
    Parameters:
    
    s - the state of interest
    
    a - the action that may be taken in the state
    
    Returns:
    
    the probability/probability density
  - policyDistribution
```
public java.util.List<ActionProb> policyDistribution(State s)
```
    Description copied from interface: EnumerablePolicy
    
    This method will return the action probability distribution defined by the policy. The action distribution is represented by a list of ActionProb objects, each of which specifies a grounded action and the probability of that grounded action being taken. The returned list does not have to include actions with probability 0.
    
    Specified by:
    
    policyDistribution in interface EnumerablePolicy
    
    Parameters:
    
    s - the state for which an action distribution should be returned
    
    Returns:
    
    a list of possible actions taken by the policy and their probability.
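
    For an epsilon-style policy such as this one, the resulting distribution can be thought of as a mixture of the minmax distribution and a random choice. The sketch below shows that arithmetic under the assumption that the random joint action is uniform over all joint actions, which the description above does not state explicitly. The class `EpsilonMixtureSketch` and its `mix` method are hypothetical and independent of BURLAP.

```java
/** Hypothetical illustration of mixing a base distribution with a uniform one. */
final class EpsilonMixtureSketch {

    /**
     * Returns (1 - epsilon) * base[i] + epsilon / n for each of the n joint
     * actions, where base is the minmax distribution over joint actions.
     */
    static double[] mix(double[] base, double epsilon) {
        int n = base.length;
        double[] mixed = new double[n];
        for (int i = 0; i < n; i++) {
            mixed[i] = (1.0 - epsilon) * base[i] + epsilon / n;
        }
        return mixed;
    }

    public static void main(String[] args) {
        // A deterministic minmax choice of joint action 0 among four joint actions.
        double[] base = {1.0, 0.0, 0.0, 0.0};
        for (double p : mix(base, 0.2)) {
            System.out.printf("%.2f%n", p);
        }
    }
}
```

    With epsilon = 0.2 and four joint actions, the minmax joint action ends up with probability 0.8 + 0.05 and each other joint action with 0.05, so the mixed probabilities still sum to 1.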
  - definedFor
```
public boolean definedFor(State s)
```
    Description copied from interface: Policy
    
    Specifies whether this policy is defined for the input state.
    
    Specified by:
    
    definedFor in interface Policy
    
    Parameters:
    
    s - the input state to test for whether this policy is defined
    
    Returns:
    
    true if this policy is defined for State s, false otherwise.
  - setQSourceProvider
```
public void setQSourceProvider(MultiAgentQSourceProvider provider)
```
    Description copied from class: MAQSourcePolicy
    
    Sets the MultiAgentQSourceProvider that will be used to define this object's joint policy.
    
    Specified by:
    
    setQSourceProvider in class MAQSourcePolicy
    
    Parameters:
    
    provider - the MultiAgentQSourceProvider that will be used to define this object's joint policy.
