SGNaiveQLAgent

java.lang.Object
- burlap.mdp.stochasticgames.agent.SGAgentBase
- - burlap.behavior.stochasticgames.agents.naiveq.SGNaiveQLAgent

All Implemented Interfaces:

QFunction, QProvider, ValueFunction, SGAgent

Direct Known Subclasses:

SGQWActionHistory
```
public class SGNaiveQLAgent
extends SGAgentBase
implements QProvider
```
A tabular Q-learning [1] algorithm for stochastic game formalisms. This algorithm ignores the actions of other agents and treats the outcomes of their decisions as if they were part of the environment transition dynamics, hence the "naive" qualifier.
1. Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992): 279-292.
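
For orientation, the tabular update from [1], written from the point of view of a single agent i, can be sketched as below. The notation is an assumption for illustration and is not taken from the BURLAP source: a_i is the agent's own component of the joint action, r_i is its own entry of the joint reward, α is the learning rate, and γ is the discount factor.

```latex
Q(s, a_i) \leftarrow Q(s, a_i) + \alpha \left[ r_i + \gamma \max_{a'} Q(s', a') - Q(s, a_i) \right]
```

The other agents' actions never index the table, which is what makes the learner "naive". When s' is terminal, the max term is conventionally taken to be 0.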

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QProvider
  QProvider.Helper

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`agentNum`
`protected double`	`discount` The discount factor
`protected HashableStateFactory`	`hashFactory` The state hashing factory to use.
`protected LearningRate`	`learningRate` The learning rate
`protected Policy`	`policy` The policy this agent follows
`protected QFunction`	`qInit` Defines how q-values are initialized
`protected java.util.Map<HashableState,java.util.List<QValue>>`	`qMap` The tabular map from (hashed) states to the list of Q-values for each action in those states
`protected java.util.Map<HashableState,State>`	`stateRepresentations` A map from hashed states to the internal state representation for the states stored in the q-table.
`protected StateMapping`	`storedMapAbstraction` A state abstraction to use.
`protected int`	`totalNumberOfSteps` The total number of learning steps performed by this agent.

Fields inherited from class burlap.mdp.stochasticgames.agent.SGAgentBase
agentType, domain, internalRewardFunction, world, worldAgentName

Constructor Summary

Constructors
Constructor and Description
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, HashableStateFactory hashFactory)` Initializes with a default 0.1 epsilon greedy policy/strategy
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory)` Initializes with a default Q-value of 0 and a 0.1 epsilon greedy policy/strategy
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, QFunction qInitizalizer, HashableStateFactory hashFactory)` Initializes with a default 0.1 epsilon greedy policy/strategy

Method Summary

Modifier and Type	Method and Description
`Action`	`action(State s)` This method is called by the world when it needs the agent to choose an action
`void`	`gameStarting(World w, int agentNum)` This method is called by the world when a new game is starting.
`void`	`gameTerminated()` This method is called by the world when a game has ended.
`protected double`	`getMaxQValue(State s)` Returns maximum numeric Q-value for a given state
`void`	`observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)` This method is called by the world when every agent in the world has taken their action.
`double`	`qValue(State s, Action a)` Returns the `QValue` for the given state-action pair.
`java.util.List<QValue>`	`qValues(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`SGNaiveQLAgent`	`setAgentDetails(java.lang.String agentName, SGAgentType type)`
`void`	`setLearningRate(LearningRate lr)`
`void`	`setQValueInitializer(QFunction qInit)`
`void`	`setStoredMapAbstraction(StateMapping abstraction)` Sets the state abstraction that this agent will use
`void`	`setStrategy(Policy policy)` Sets the Q-learning policy that this agent will use (e.g., epsilon greedy)
`protected HashableState`	`stateHash(State s)` First abstracts state s, and then returns the `HashableState` object for the abstracted state.
`protected QValue`	`storedQ(State s, Action a)`
`double`	`value(State s)` Returns the value function evaluation of the given state.

Methods inherited from class burlap.mdp.stochasticgames.agent.SGAgentBase
agentName, agentType, getInternalRewardFunction, init, init, setInternalRewardFunction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - qMap
```
protected java.util.Map<HashableState,java.util.List<QValue>> qMap
```
    The tabular map from (hashed) states to the list of Q-values for each action in those states
  - stateRepresentations
```
protected java.util.Map<HashableState,State> stateRepresentations
```
    A map from hashed states to the internal state representation for the states stored in the q-table. This is useful since two identical states may have different object instance name identifiers that can affect the parameters in GroundedActions.
  - storedMapAbstraction
```
protected StateMapping storedMapAbstraction
```
    A state abstraction to use.
  - discount
```
protected double discount
```
    The discount factor
  - learningRate
```
protected LearningRate learningRate
```
    The learning rate
  - qInit
```
protected QFunction qInit
```
    Defines how q-values are initialized
  - policy
```
protected Policy policy
```
    The policy this agent follows
  - hashFactory
```
protected HashableStateFactory hashFactory
```
    The state hashing factory to use.
  - agentNum
```
protected int agentNum
```
  - totalNumberOfSteps
```
protected int totalNumberOfSteps
```
    The total number of learning steps performed by this agent.
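
To illustrate the kind of table that `qMap` and `qInit` describe, here is a minimal, self-contained sketch of a lazily initialized Q-table. It does not use BURLAP types: `LazyQTable`, the `String` state key (standing in for a `HashableState`), and the integer action indices are all hypothetical, and creating rows on first access is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tabular Q-store; not a BURLAP class.
class LazyQTable {
    // Maps a hashed-state key to one Q-value per action index.
    private final Map<String, double[]> table = new HashMap<>();
    private final int numActions;
    private final double defaultQ;

    LazyQTable(int numActions, double defaultQ) {
        this.numActions = numActions;
        this.defaultQ = defaultQ;
    }

    // Returns the row for a state key, creating it with defaultQ on first access.
    double[] row(String stateKey) {
        return table.computeIfAbsent(stateKey, k -> {
            double[] q = new double[numActions];
            Arrays.fill(q, defaultQ);
            return q;
        });
    }

    // Largest Q-value stored for the given state key.
    double maxQ(String stateKey) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : row(stateKey)) {
            best = Math.max(best, q);
        }
        return best;
    }

    // Number of distinct states seen so far.
    int size() {
        return table.size();
    }
}
```

Keying the map on a hashed state rather than the raw state object lets two equal states share one row even when they are distinct object instances.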
- Constructor Detail
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
                      double discount,
                      double learningRate,
                      HashableStateFactory hashFactory)
```
    Initializes with a default Q-value of 0 and a 0.1 epsilon greedy policy/strategy
    
    Parameters:
    
    d - the domain in which the agent will act
    
    discount - the discount factor
    
    learningRate - the learning rate
    
    hashFactory - the state hashing factory
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
                      double discount,
                      double learningRate,
                      double defaultQ,
                      HashableStateFactory hashFactory)
```
    Initializes with a default 0.1 epsilon greedy policy/strategy
    
    Parameters:
    
    d - the domain in which the agent will act
    
    discount - the discount factor
    
    learningRate - the learning rate
    
    defaultQ - the default to which all Q-values will be initialized
    
    hashFactory - the state hashing factory
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
                      double discount,
                      double learningRate,
                      QFunction qInitizalizer,
                      HashableStateFactory hashFactory)
```
    Initializes with a default 0.1 epsilon greedy policy/strategy
    
    Parameters:
    
    d - the domain in which the agent will act
    
    discount - the discount factor
    
    learningRate - the learning rate
    
    qInitizalizer - the Q-value initialization method
    
    hashFactory - the state hashing factory
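
All three constructors default to a 0.1 epsilon greedy strategy. As a rough illustration of what that means, the sketch below selects an action index from an array of Q-values. `EpsilonGreedySketch` and its `select` method are hypothetical names, not BURLAP's policy class, and breaking ties by lowest index is a simplification made here.

```java
import java.util.Random;

// Hypothetical illustration of epsilon-greedy selection; not BURLAP's policy class.
class EpsilonGreedySketch {
    // With probability epsilon returns a uniformly random action index,
    // otherwise the first index holding the maximal Q-value.
    static int select(double[] qValues, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);
        }
        int best = 0;
        for (int i = 1; i < qValues.length; i++) {
            if (qValues[i] > qValues[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With epsilon set to 0.1, roughly one decision in ten is exploratory; a different strategy can be supplied through `setStrategy(Policy policy)`.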
- Method Detail
  - setAgentDetails
```
public SGNaiveQLAgent setAgentDetails(java.lang.String agentName,
                                      SGAgentType type)
```
    Overrides:
    
    setAgentDetails in class SGAgentBase
  - setStoredMapAbstraction
```
public void setStoredMapAbstraction(StateMapping abstraction)
```
    Sets the state abstraction that this agent will use
    
    Parameters:
    
    abstraction - the state abstraction that this agent will use
  - setStrategy
```
public void setStrategy(Policy policy)
```
    Sets the Q-learning policy that this agent will use (e.g., epsilon greedy)
    
    Parameters:
    
    policy - the Q-learning policy that this agent will use
  - setQValueInitializer
```
public void setQValueInitializer(QFunction qInit)
```
  - setLearningRate
```
public void setLearningRate(LearningRate lr)
```
  - gameStarting
```
public void gameStarting(World w,
                         int agentNum)
```
    Description copied from interface: SGAgent
    
    This method is called by the world when a new game is starting.
    
    Specified by:
    
    gameStarting in interface SGAgent
    
    Parameters:
    
    w - the world in which the game is starting
    
    agentNum - the agent number of the agent in the world
  - action
```
public Action action(State s)
```
    Description copied from interface: SGAgent
    
    This method is called by the world when it needs the agent to choose an action
    
    Specified by:
    
    action in interface SGAgent
    
    Parameters:
    
    s - the current state of the world
    
    Returns:
    
    the action this agent wishes to take
  - observeOutcome
```
public void observeOutcome(State s,
                           JointAction jointAction,
                           double[] jointReward,
                           State sprime,
                           boolean isTerminal)
```
    Description copied from interface: SGAgent
    
    This method is called by the world when every agent in the world has taken their action. It conveys the result of the joint action.
    
    Specified by:
    
    observeOutcome in interface SGAgent
    
    Parameters:
    
    s - the state in which the last action of each agent was taken
    
    jointAction - the joint action of all agents in the world
    
    jointReward - the joint reward of all agents in the world
    
    sprime - the next state to which the agent transitioned
    
    isTerminal - whether the new state is a terminal state
  - gameTerminated
```
public void gameTerminated()
```
    Description copied from interface: SGAgent
    
    This method is called by the world when a game has ended.
    
    Specified by:
    
    gameTerminated in interface SGAgent
  - getMaxQValue
```
protected double getMaxQValue(State s)
```
    Returns maximum numeric Q-value for a given state
    
    Parameters:
    
    s - the state for which the max Q-value should be returned
    
    Returns:
    
    maximum numeric Q-value for a given state
  - stateHash
```
protected HashableState stateHash(State s)
```
    First abstracts state s, and then returns the HashableState object for the abstracted state.
    
    Parameters:
    
    s - the state for which the state hash should be returned.
    
    Returns:
    
    the hashed state.
  - qValues
```
public java.util.List<QValue> qValues(State s)
```
    Description copied from interface: QProvider
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    qValues in interface QProvider
    
    Parameters:
    
    s - the state for which Q-values are to be returned.
    
    Returns:
    
    a List of QValue objects for every permissible action for the given input state.
  - value
```
public double value(State s)
```
    Description copied from interface: ValueFunction
    
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by the ValueFunctionInitialization object of this class is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    
    s - the state to evaluate.
    
    Returns:
    
    the value function evaluation of the given state.
  - qValue
```
public double qValue(State s,
                     Action a)
```
    Description copied from interface: QFunction
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    qValue in interface QFunction
    
    Parameters:
    
    s - the input state
    
    a - the input action
    
    Returns:
    
    the QValue for the given state-action pair.
  - storedQ
```
protected QValue storedQ(State s,
                         Action a)
```
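
To show how the arguments of `observeOutcome` could feed a single learning step, here is a minimal sketch of one naive backup. It is an illustration under stated assumptions, not the BURLAP implementation: `NaiveQUpdateSketch` and `update` are hypothetical names, Q-values are passed as plain `double` values, the agent is assumed to read only its own entry of the joint reward, and `nextStateQ` is assumed to be non-empty when the next state is not terminal.

```java
// Hypothetical illustration of one naive Q-learning backup; not a BURLAP class.
class NaiveQUpdateSketch {
    // Returns the updated Q-value for the agent's own action in state s.
    // Only jointReward[agentNum] is used; the other agents' rewards and
    // actions are ignored, as if they were part of the environment.
    static double update(double oldQ, double[] jointReward, int agentNum,
                         double[] nextStateQ, boolean isTerminal,
                         double learningRate, double discount) {
        double r = jointReward[agentNum];
        double maxNext = 0.0; // no future value after a terminal state
        if (!isTerminal) {
            maxNext = Double.NEGATIVE_INFINITY;
            for (double q : nextStateQ) {
                maxNext = Math.max(maxNext, q);
            }
        }
        return oldQ + learningRate * (r + discount * maxNext - oldQ);
    }
}
```

For example, with an old value of 0, a joint reward of {1.0, -1.0}, agent number 0, next-state values {0.0, 2.0}, a learning rate of 0.5, and a discount of 0.9, the target is 1 + 0.9 × 2 = 2.8 and the updated value is 1.4.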
