public class SGQWActionHistory extends SGNaiveQLAgent
If no ActionIdMap is supplied to the constructor, then when the first game starts the action map is initialized to a ParameterNaiveActionIdMap and the number of players is set to the number of players in the world that this agent has joined. If the world contains parameterized actions, this may be a problem; in that case, use the SGQWActionHistory(SGDomain, double, double, StateHashFactory, int, int, ActionIdMap) constructor to resolve action parameterization instead.

1. Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992): 279-292.

Nested classes/interfaces inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper
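
The caveat about parameterized actions can be illustrated with a minimal, self-contained sketch. It does not use BURLAP's ActionIdMap or ParameterNaiveActionIdMap; the class NaiveActionIds and its methods are hypothetical stand-ins, and the sketch assumes that a parameter-naive map assigns ids by action name only, so two groundings of the same action with different parameters receive the same id.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in (not a BURLAP class): one int id per action name.
class NaiveActionIds {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    NaiveActionIds(List<String> actionNames) {
        for (String name : actionNames) {
            // The next free id equals the current number of registered names.
            ids.putIfAbsent(name, ids.size());
        }
    }

    // Parameters are accepted but deliberately ignored (the "naive" part).
    int idFor(String actionName, String... params) {
        Integer id = ids.get(actionName);
        if (id == null) {
            throw new IllegalArgumentException("Unknown action: " + actionName);
        }
        return id;
    }

    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        NaiveActionIds map = new NaiveActionIds(Arrays.asList("cooperate", "defect", "give"));
        // Both groundings of "give" collapse to the same id.
        System.out.println(map.idFor("give", "agent0"));
        System.out.println(map.idFor("give", "agent1"));
    }
}
```

If the distinction between such groundings matters for learning, a map that also accounts for parameters would be needed, which is the situation the seven-argument constructor is meant for.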
Modifier and Type | Field and Description |
---|---|
protected ActionIdMap | actionMap: A map from actions to int values, which can be used to fill in an action history attribute value. |
static java.lang.String | ATTHAID: A constant for the name of the attribute used to define which action an agent took. |
static java.lang.String | ATTHNUM: A constant for the name of the history time index attribute. |
static java.lang.String | ATTHPN: A constant for the name of the attribute used to define which agent in the world this history object represents. |
protected ObjectClass | classHistory: The object class that will be used to represent a history component. |
static java.lang.String | CLASSHISTORY: A constant for the name of the history object class. |
protected java.util.LinkedList<JointAction> | history: The joint action history. |
protected int | historySize: The size of action history to store. |
Fields inherited from class SGNaiveQLAgent: discount, hashFactory, learningRate, policy, qInit, qMap, stateRepresentations, storedMapAbstraction, totalNumberOfSteps
Fields inherited from class Agent: agentType, domain, internalRewardFunction, world, worldAgentName
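
To illustrate how the history and historySize fields might work together, here is a minimal sketch of a bounded joint-action history. The class BoundedHistory is hypothetical, joint actions are simplified to strings, and placing the most recent entry at the front of the list is an assumption rather than a documented detail of this class.

```java
import java.util.LinkedList;

// Hypothetical sketch: keeps only the most recent historySize joint actions.
class BoundedHistory {
    private final LinkedList<String> history = new LinkedList<>();
    private final int historySize;

    BoundedHistory(int historySize) {
        if (historySize < 0) {
            throw new IllegalArgumentException("historySize must be non-negative");
        }
        this.historySize = historySize;
    }

    // Adds the newest joint action at the front and drops the oldest overflow.
    void record(String jointAction) {
        history.addFirst(jointAction);
        while (history.size() > historySize) {
            history.removeLast();
        }
    }

    // Returns the joint action taken h steps back (0 = most recent),
    // or null if the episode has not lasted that many steps.
    String stepsBack(int h) {
        return (h >= 0 && h < history.size()) ? history.get(h) : null;
    }

    // Would be called when a new game starts, so old actions do not leak over.
    void clear() {
        history.clear();
    }

    int size() {
        return history.size();
    }

    public static void main(String[] args) {
        BoundedHistory h = new BoundedHistory(2);
        h.record("a");
        h.record("b");
        h.record("c");
        System.out.println(h.stepsBack(0) + " " + h.stepsBack(1) + " " + h.stepsBack(2));
    }
}
```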
Constructor and Description |
---|
SGQWActionHistory(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, int historySize): Initializes the learning algorithm using an epsilon-greedy learning strategy/policy with epsilon = 0.1. |
SGQWActionHistory(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, int historySize, int maxPlayers, ActionIdMap actionMap): Initializes the learning algorithm using an epsilon-greedy learning strategy/policy with epsilon = 0.1. |
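
Both constructors default to an epsilon-greedy policy with epsilon 0.1. As a rough illustration of what that means, the following sketch picks a uniformly random action with probability epsilon and otherwise the action with the highest Q-value. EpsilonGreedySketch and selectAction are hypothetical names, Q-values are a plain array, and breaking ties by lowest index is an assumption; BURLAP's own policy class is not used here.

```java
import java.util.Random;

// Hypothetical sketch of epsilon-greedy selection over an array of Q-values.
class EpsilonGreedySketch {

    static int selectAction(double[] qValues, double epsilon, Random rng) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        // Explore: with probability epsilon choose uniformly at random.
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);
        }
        // Exploit: choose the action with the largest Q-value
        // (ties go to the lowest index, an assumption of this sketch).
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) {
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] q = {0.2, 1.5, -0.3};
        Random rng = new Random(42);
        int greedyPicks = 0;
        for (int i = 0; i < 1000; i++) {
            if (selectAction(q, 0.1, rng) == 1) {
                greedyPicks++;
            }
        }
        System.out.println("greedy action chosen " + greedyPicks + " of 1000 times");
    }
}
```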
Modifier and Type | Method and Description |
---|---|
void | gameStarting(): This method is called by the world when a new game is starting. |
protected State | getHistoryAugmentedState(State s): Takes an input state and returns an augmented state with the history of actions each agent previously took. |
protected ObjectInstance | getHistoryLessObjectInstanceForAgent(java.lang.String aname, int h): Returns a history object instance for a given agent in which the action that was taken is unset because the episode has not lasted h steps. |
protected ObjectInstance | getHistoryObjectInstanceForAgent(GroundedSingleAction gsa, int h): Returns a history object instance for the corresponding action and how far back in history it occurred. |
protected void | initializeActionMapAndAugmentedDomain(): Initializes the action map to be an instance of ParameterNaiveActionIdMap and then initializes the history augmented domain using the max players as the number of players in the world which this agent has joined. |
protected void | initializeHistoryAugmentedDomain(int maxPlayers): Initializes the history augmented domain/state representation the agent will use. |
void | observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
protected StateHashTuple | stateHash(State s): First abstracts state s, and then returns the StateHashTuple object for the abstracted state. |
Methods inherited from class SGNaiveQLAgent: gameTerminated, getAction, getMaxQValue, getQ, getQs, setLearningRate, setQValueInitializer, setStoredMapAbstraction, setStrategy, translateAction
Methods inherited from class Agent: getAgentName, getAgentType, getInternalRewardFunction, init, joinWorld, setInternalRewardFunction
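
The history augmentation performed by getHistoryAugmentedState can be pictured with a simplified sketch. States are reduced to strings here, and each history slot is rendered as (player, steps back, action id); slots that reach back before the start of the episode get an unset action id, mirroring the role of getHistoryLessObjectInstanceForAgent. All names (HistoryAugmentSketch, augment, UNSET) and the string encoding are hypothetical and are not BURLAP's object-instance representation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: appends the recent actions of every player to a state key.
class HistoryAugmentSketch {
    // Hypothetical marker for "no action yet" (episode shorter than h steps).
    static final int UNSET = -1;

    // history.get(h)[p] is the action id of player p, h steps back (0 = most recent).
    static String augment(String state, List<int[]> history, int historySize, int numPlayers) {
        StringBuilder sb = new StringBuilder(state);
        for (int h = 0; h < historySize; h++) {
            for (int p = 0; p < numPlayers; p++) {
                int actionId = (h < history.size()) ? history.get(h)[p] : UNSET;
                sb.append("|p").append(p)
                  .append(",h").append(h)
                  .append(",a").append(actionId);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> history = new ArrayList<>();
        history.add(new int[]{1, 0}); // one step played: player 0 took action 1, player 1 took action 0
        System.out.println(augment("s0", history, 2, 2));
    }
}
```

Because the augmented key always contains historySize slots per player, states from the first few steps of an episode have the same shape as later ones.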
protected java.util.LinkedList<JointAction> history
protected int historySize
protected ActionIdMap actionMap
protected ObjectClass classHistory
public static final java.lang.String ATTHNUM
public static final java.lang.String ATTHPN
public static final java.lang.String ATTHAID
public static java.lang.String CLASSHISTORY
public SGQWActionHistory(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, int historySize, int maxPlayers, ActionIdMap actionMap)
Parameters:
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
hashFactory - the state hashing factory to use
historySize - the number of previous steps to remember and with which to augment the state space
maxPlayers - the maximum number of players that will be in the game
actionMap - a mapping from actions to integer identifiers for them

public SGQWActionHistory(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory, int historySize)
Parameters:
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
hashFactory - the state hashing factory to use
historySize - the number of previous steps to remember and with which to augment the state space

protected void initializeHistoryAugmentedDomain(int maxPlayers)
Parameters:
maxPlayers - the maximum number of players in the game

public void gameStarting()
Description copied from class: Agent
This method is called by the world when a new game is starting.
Overrides: gameStarting in class SGNaiveQLAgent

protected void initializeActionMapAndAugmentedDomain()
Initializes the action map to be an instance of ParameterNaiveActionIdMap and then initializes the history augmented domain using the max players as the number of players in the world which this agent has joined.

public void observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal)
Description copied from class: Agent
This method is called by the world when every agent in the world has taken their action.
Overrides: observeOutcome in class SGNaiveQLAgent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state

protected State getHistoryAugmentedState(State s)
Parameters:
s - the input state to augment

protected ObjectInstance getHistoryObjectInstanceForAgent(GroundedSingleAction gsa, int h)
Parameters:
gsa - the action that was taken (which includes which agent took it)
h - how far back in history the action was taken

protected ObjectInstance getHistoryLessObjectInstanceForAgent(java.lang.String aname, int h)
Parameters:
aname - the name of the agent for which the history object should be returned
h - how many steps back this object instance represents

protected StateHashTuple stateHash(State s)
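Description copied from class: SGNaiveQLAgent
First abstracts state s, and then returns the StateHashTuple object for the abstracted state.
Overrides: stateHash in class SGNaiveQLAgent
Parameters:
s - the state for which the state hash should be returned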
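
Since stateHash operates on the abstracted state, the Q-table is effectively keyed by the state together with the recent actions. The sketch below shows a plain tabular Q-learning update over such keys, using the standard rule Q(s,a) += learningRate * (r + discount * max Q(s',a') - Q(s,a)), with the bootstrap term dropped on terminal transitions. TabularQSketch and its methods are hypothetical, and string keys stand in for StateHashTuple; this is an illustration under those assumptions, not the class's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tabular Q-learning keyed by (history-augmented) state strings.
class TabularQSketch {
    private final Map<String, double[]> qMap = new HashMap<>();
    private final int numActions;
    private final double discount;
    private final double learningRate;
    private final double qInit;

    TabularQSketch(int numActions, double discount, double learningRate, double qInit) {
        if (numActions <= 0) {
            throw new IllegalArgumentException("numActions must be positive");
        }
        this.numActions = numActions;
        this.discount = discount;
        this.learningRate = learningRate;
        this.qInit = qInit;
    }

    // Returns the Q-values for a state key, creating them lazily with qInit.
    double[] qs(String stateKey) {
        double[] q = qMap.get(stateKey);
        if (q == null) {
            q = new double[numActions];
            Arrays.fill(q, qInit);
            qMap.put(stateKey, q);
        }
        return q;
    }

    double maxQ(String stateKey) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qs(stateKey)) {
            best = Math.max(best, q);
        }
        return best;
    }

    // One Q-learning step; terminal transitions do not bootstrap from sPrime.
    void update(String s, int action, double reward, String sPrime, boolean isTerminal) {
        double target = reward + (isTerminal ? 0.0 : discount * maxQ(sPrime));
        double[] q = qs(s);
        q[action] += learningRate * (target - q[action]);
    }

    public static void main(String[] args) {
        TabularQSketch learner = new TabularQSketch(2, 0.9, 0.5, 0.0);
        learner.update("s0|a-1", 1, 1.0, "s1|a1", false);
        System.out.println(learner.qs("s0|a-1")[1]);
    }
}
```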