SGNaiveQLAgent

java.lang.Object
- burlap.oomdp.stochasticgames.Agent
- - burlap.behavior.stochasticgame.agents.naiveq.SGNaiveQLAgent

All Implemented Interfaces:

QComputablePlanner

Direct Known Subclasses:

SGQWActionHistory
```
public class SGNaiveQLAgent
extends Agent
implements QComputablePlanner
```
A tabular Q-learning [1] algorithm for stochastic game formalisms. This algorithm ignores the actions of other agents and treats the outcomes of their decisions as part of the environment's transition dynamics, hence the "naive" qualifier.
[1] Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992): 279-292.
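
The naive update can be sketched independently of BURLAP. The following is a minimal, self-contained illustration, not the library's implementation: states are plain strings, actions are integer indices, and the names `NaiveQSketch`, `qs`, `maxQ`, and `update` are hypothetical. It assumes a constant learning rate, a single default Q-value, and that a terminal next state contributes no future value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a naive tabular Q-learner; not BURLAP code.
class NaiveQSketch {
    private final Map<String, double[]> qTable = new HashMap<>();
    private final int numActions;
    private final double discount;
    private final double learningRate;
    private final double defaultQ;

    NaiveQSketch(int numActions, double discount, double learningRate, double defaultQ) {
        this.numActions = numActions;
        this.discount = discount;
        this.learningRate = learningRate;
        this.defaultQ = defaultQ;
    }

    // Returns the Q-values of a state, creating them with the default value on first visit.
    double[] qs(String state) {
        return qTable.computeIfAbsent(state, k -> {
            double[] row = new double[numActions];
            Arrays.fill(row, defaultQ);
            return row;
        });
    }

    // Largest Q-value over this agent's own actions in the given state.
    double maxQ(String state) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : qs(state)) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Only this agent's own action and reward enter the update. Whatever the
    // other agents did is absorbed into the observed transition to sprime.
    void update(String s, int ownAction, double ownReward, String sprime, boolean isTerminal) {
        double future = isTerminal ? 0.0 : maxQ(sprime);
        double[] row = qs(s);
        row[ownAction] += learningRate * (ownReward + discount * future - row[ownAction]);
    }
}
```

Because the table is indexed only by state and the agent's own action, its size does not grow with the number of other agents, at the cost of learning against dynamics that shift as those agents change their behavior.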

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from interface burlap.behavior.singleagent.planning.QComputablePlanner
  QComputablePlanner.QComputablePlannerHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`discount` The discount factor
`protected StateHashFactory`	`hashFactory` The state hashing factory to use.
`protected LearningRate`	`learningRate` the learning rate
`protected Policy`	`policy` The policy this agent follows
`protected ValueFunctionInitialization`	`qInit` Defines how q-values are initialized
`protected java.util.Map<StateHashTuple,java.util.List<QValue>>`	`qMap` The tabular map from (hashed) states to the list of Q-values for each action in those states
`protected java.util.Map<StateHashTuple,State>`	`stateRepresentations` A map from hashed states to the internal state representation for the states stored in the q-table.
`protected StateAbstraction`	`storedMapAbstraction` A state abstraction to use.
`protected int`	`totalNumberOfSteps` The total number of learning steps performed by this agent.

Fields inherited from class burlap.oomdp.stochasticgames.Agent
agentType, domain, internalRewardFunction, world, worldAgentName

Constructor Summary

Constructors
Constructor and Description
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, StateHashFactory hashFactory)` Initializes with a default 0.1 epsilon greedy policy/strategy
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory)` Initializes with a default Q-value of 0 and a 0.1 epsilon greedy policy/strategy
`SGNaiveQLAgent(SGDomain d, double discount, double learningRate, ValueFunctionInitialization qInitizalizer, StateHashFactory hashFactory)` Initializes with a default 0.1 epsilon greedy policy/strategy

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`gameStarting()` This method is called by the world when a new game is starting.
`void`	`gameTerminated()` This method is called by the world when a game has ended.
`GroundedSingleAction`	`getAction(State s)` This method is called by the world when it needs the agent to choose an action
`protected double`	`getMaxQValue(State s)` Returns maximum numeric Q-value for a given state
`QValue`	`getQ(State s, AbstractGroundedAction a)` Returns the `QValue` for the given state-action pair.
`java.util.List<QValue>`	`getQs(State s)` Returns a `List` of `QValue` objects for every permissible action for the given input state.
`void`	`observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal)` This method is called by the world when every agent in the world has taken their action.
`void`	`setLearningRate(LearningRate lr)`
`void`	`setQValueInitializer(ValueFunctionInitialization qInit)`
`void`	`setStoredMapAbstraction(StateAbstraction abstraction)` Sets the state abstraction that this agent will use
`void`	`setStrategy(Policy policy)` Sets the Q-learning policy that this agent will use (e.g., epsilon greedy)
`protected StateHashTuple`	`stateHash(State s)` First abstracts state s, and then returns the `StateHashTuple` object for the abstracted state.
`protected GroundedSingleAction`	`translateAction(GroundedSingleAction a, java.util.Map<java.lang.String,java.lang.String> matching)` Takes an input action and a mapping from objects in the action's source state to objects in another state, and returns an action with its object parameters mapped to the matched objects.

Methods inherited from class burlap.oomdp.stochasticgames.Agent
getAgentName, getAgentType, getInternalRewardFunction, init, joinWorld, setInternalRewardFunction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - qMap
```
protected java.util.Map<StateHashTuple,java.util.List<QValue>> qMap
```
    The tabular map from (hashed) states to the list of Q-values for each action in those states
  - stateRepresentations
```
protected java.util.Map<StateHashTuple,State> stateRepresentations
```
    A map from hashed states to the internal state representation for the states stored in the q-table. This is useful since two identical states may have different object instance name identifiers that can affect the parameters in GroundedActions.
  - storedMapAbstraction
```
protected StateAbstraction storedMapAbstraction
```
    A state abstraction to use.
  - discount
```
protected double discount
```
    The discount factor
  - learningRate
```
protected LearningRate learningRate
```
    the learning rate
  - qInit
```
protected ValueFunctionInitialization qInit
```
    Defines how q-values are initialized
  - policy
```
protected Policy policy
```
    The policy this agent follows
  - hashFactory
```
protected StateHashFactory hashFactory
```
    The state hashing factory to use.
  - totalNumberOfSteps
```
protected int totalNumberOfSteps
```
    The total number of learning steps performed by this agent.
- Constructor Detail
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
              double discount,
              double learningRate,
              StateHashFactory hashFactory)
```
    Initializes with a default Q-value of 0 and a 0.1 epsilon greedy policy/strategy
    
    Parameters:
    d - the domain in which the agent will act
    discount - the discount factor
    learningRate - the learning rate
    hashFactory - the state hashing factory
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
              double discount,
              double learningRate,
              double defaultQ,
              StateHashFactory hashFactory)
```
    Initializes with a default 0.1 epsilon greedy policy/strategy
    
    Parameters:
    d - the domain in which the agent will act
    discount - the discount factor
    learningRate - the learning rate
    defaultQ - the default to which all Q-values will be initialized
    hashFactory - the state hashing factory
  - SGNaiveQLAgent
```
public SGNaiveQLAgent(SGDomain d,
              double discount,
              double learningRate,
              ValueFunctionInitialization qInitizalizer,
              StateHashFactory hashFactory)
```
    Initializes with a default 0.1 epsilon greedy policy/strategy
    
    Parameters:
    d - the domain in which the agent will act
    discount - the discount factor
    learningRate - the learning rate
    qInitizalizer - the Q-value initialization method
    hashFactory - the state hashing factory
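
    All three constructors default to a 0.1 epsilon greedy strategy. As a rough illustration of what that means, the sketch below picks a uniformly random action with probability epsilon and the highest-valued action otherwise. `EpsilonGreedySketch` and `choose` are hypothetical names, Q-values are a plain array, and ties go to the lowest index; BURLAP's own policy class may differ on all of these points.

```java
import java.util.Random;

// Hypothetical epsilon-greedy selector over an array of Q-values; not BURLAP code.
class EpsilonGreedySketch {
    static int choose(double[] qValues, double epsilon, Random rng) {
        // Explore: with probability epsilon, pick any action uniformly at random.
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);
        }
        // Exploit: otherwise pick the action with the largest Q-value
        // (ties resolved toward the lowest index in this sketch).
        int best = 0;
        for (int i = 1; i < qValues.length; i++) {
            if (qValues[i] > qValues[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

    With epsilon set to 0.1, roughly nine in ten choices are greedy, plus the fraction of random draws that happen to land on the greedy action.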
- Method Detail
  - setStoredMapAbstraction
```
public void setStoredMapAbstraction(StateAbstraction abstraction)
```
    Sets the state abstraction that this agent will use
    
    Parameters:
    abstraction - the state abstraction that this agent will use
  - setStrategy
```
public void setStrategy(Policy policy)
```
    Sets the Q-learning policy that this agent will use (e.g., epsilon greedy)
    
    Parameters:
    policy - the Q-learning policy that this agent will use
  - setQValueInitializer
```
public void setQValueInitializer(ValueFunctionInitialization qInit)
```
  - setLearningRate
```
public void setLearningRate(LearningRate lr)
```
  - gameStarting
```
public void gameStarting()
```
    Description copied from class: Agent
    
    This method is called by the world when a new game is starting.
    
    Specified by:
    
    gameStarting in class Agent
  - getAction
```
public GroundedSingleAction getAction(State s)
```
    Description copied from class: Agent
    
    This method is called by the world when it needs the agent to choose an action
    
    Specified by:
    
    getAction in class Agent
    
    Parameters:
    s - the current state of the world
    
    Returns:
    the action this agent wishes to take
  - observeOutcome
```
public void observeOutcome(State s,
                  JointAction jointAction,
                  java.util.Map<java.lang.String,java.lang.Double> jointReward,
                  State sprime,
                  boolean isTerminal)
```
    Description copied from class: Agent
    
    This method is called by the world when every agent in the world has taken their action. It conveys the result of the joint action.
    
    Specified by:
    
    observeOutcome in class Agent
    
    Parameters:
    s - the state in which the last action of each agent was taken
    jointAction - the joint action of all agents in the world
    jointReward - the joint reward of all agents in the world
    sprime - the next state to which the agent transitioned
    isTerminal - whether the new state is a terminal state
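
    Since this agent is naive, only its own entries in the joint structures matter to it. A minimal sketch of that projection follows; it assumes the joint reward is keyed by agent name, and it represents the joint action as a hypothetical name-to-action-name map rather than BURLAP's `JointAction`. `NaiveProjectionSketch` and its methods are illustrative names only.

```java
import java.util.Map;

// Hypothetical helpers that project joint outcomes onto one agent; not BURLAP code.
class NaiveProjectionSketch {
    // Returns this agent's reward and ignores every other agent's entry.
    // Assumes the map is keyed by agent name.
    static double ownReward(Map<String, Double> jointReward, String agentName) {
        Double r = jointReward.get(agentName);
        if (r == null) {
            throw new IllegalArgumentException("no reward recorded for agent " + agentName);
        }
        return r;
    }

    // Returns this agent's action from a simplified joint action
    // (agent name -> action name); the other agents' choices are discarded.
    static String ownAction(Map<String, String> jointAction, String agentName) {
        String a = jointAction.get(agentName);
        if (a == null) {
            throw new IllegalArgumentException("no action recorded for agent " + agentName);
        }
        return a;
    }
}
```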
  - gameTerminated
```
public void gameTerminated()
```
    Description copied from class: Agent
    
    This method is called by the world when a game has ended.
    
    Specified by:
    
    gameTerminated in class Agent
  - getMaxQValue
```
protected double getMaxQValue(State s)
```
    Returns maximum numeric Q-value for a given state
    
    Parameters:
    s - the state for which the max Q-value should be returned
    
    Returns:
    maximum numeric Q-value for a given state
  - stateHash
```
protected StateHashTuple stateHash(State s)
```
    First abstracts state s, and then returns the StateHashTuple object for the abstracted state.
    
    Parameters:
    s - the state for which the state hash should be returned.
    
    Returns:
    the hashed state.
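
    The abstract-then-hash order matters: states that differ only in details the abstraction discards end up sharing one table entry. The sketch below illustrates this with a deliberately simple model in which a state is a map of attribute names to integers, the abstraction keeps a chosen subset of attributes, and the hash key is a canonical string. `StateHashSketch`, `abstractState`, and this state model are assumptions for illustration and do not reflect BURLAP's `StateAbstraction` or `StateHashTuple` types.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical abstract-then-hash helper; not BURLAP code.
class StateHashSketch {
    // Abstraction step: keep only the attributes the learner should distinguish.
    static Map<String, Integer> abstractState(Map<String, Integer> state, Set<String> keptAttributes) {
        // A TreeMap gives a sorted, and therefore canonical, attribute order.
        Map<String, Integer> abstracted = new TreeMap<>();
        for (Map.Entry<String, Integer> e : state.entrySet()) {
            if (keptAttributes.contains(e.getKey())) {
                abstracted.put(e.getKey(), e.getValue());
            }
        }
        return abstracted;
    }

    // Hash step: derive the table key from the abstracted state only.
    static String stateHash(Map<String, Integer> state, Set<String> keptAttributes) {
        return abstractState(state, keptAttributes).toString();
    }
}
```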
  - translateAction
```
protected GroundedSingleAction translateAction(GroundedSingleAction a,
                                   java.util.Map<java.lang.String,java.lang.String> matching)
```
    Takes an input action and a mapping from objects in the action's source state to objects in another state, and returns an action with its object parameters mapped to the matched objects.
    
    Parameters:
    a - the input action
    matching - the matching between objects from the source state in which the action was generated to objects in another state.
    
    Returns:
    an action with its object parameters mapped according to the state object matching.
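
    The parameter remapping can be pictured as a lookup over object names. In this sketch an action's object parameters are a plain string array and `matching` maps names in the source state to names in the target state. `TranslateActionSketch` and `translateParams` are hypothetical names, and leaving unmatched parameters unchanged is an assumption of the sketch rather than documented behavior.

```java
import java.util.Map;

// Hypothetical parameter remapping for a grounded action; not BURLAP code.
class TranslateActionSketch {
    static String[] translateParams(String[] params, Map<String, String> matching) {
        String[] translated = new String[params.length];
        for (int i = 0; i < params.length; i++) {
            // Replace each object name with its match in the other state;
            // names without a match are kept as they are (sketch assumption).
            translated[i] = matching.getOrDefault(params[i], params[i]);
        }
        return translated;
    }
}
```

    This kind of translation is what lets Q-values stored under one state representation be reused for an identical state whose objects carry different instance names.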
  - getQs
```
public java.util.List<QValue> getQs(State s)
```
    Description copied from interface: QComputablePlanner
    
    Returns a List of QValue objects for every permissible action for the given input state.
    
    Specified by:
    
    getQs in interface QComputablePlanner
    
    Parameters:
    s - the state for which Q-values are to be returned.
    
    Returns:
    a List of QValue objects for every permissible action for the given input state.
  - getQ
```
public QValue getQ(State s,
          AbstractGroundedAction a)
```
    Description copied from interface: QComputablePlanner
    
    Returns the QValue for the given state-action pair.
    
    Specified by:
    
    getQ in interface QComputablePlanner
    
    Parameters:
    s - the input state
    a - the input action
    
    Returns:
    the QValue for the given state-action pair.
