public class SGNaiveQLAgent extends SGAgentBase implements QProvider

A tabular Q-learning [1] agent for stochastic games.

1. Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992): 279-292.
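For reference, the standard tabular Q-learning update from [1] is written below, where $\alpha$ is the learning rate (the `learningRate` field), $\gamma$ is the discount factor (the `discount` field), $r$ is the agent's reward, and $s'$ is the next state. This is the textbook form; how this class handles details such as terminal next states is not stated in this section.

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$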
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper

| Modifier and Type | Field and Description |
|---|---|
| `protected int` | `agentNum` |
| `protected double` | `discount`: The discount factor. |
| `protected HashableStateFactory` | `hashFactory`: The state hashing factory to use. |
| `protected LearningRate` | `learningRate`: The learning rate. |
| `protected Policy` | `policy`: The policy this agent follows. |
| `protected QFunction` | `qInit`: Defines how Q-values are initialized. |
| `protected java.util.Map<HashableState,java.util.List<QValue>>` | `qMap`: The tabular map from (hashed) states to the list of Q-values for each action in those states. |
| `protected java.util.Map<HashableState,State>` | `stateRepresentations`: A map from hashed states to the internal state representation for the states stored in the Q-table. |
| `protected StateMapping` | `storedMapAbstraction`: A state abstraction to use. |
| `protected int` | `totalNumberOfSteps`: The total number of learning steps performed by this agent. |
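The `qMap` and `qInit` fields describe a table whose Q-values are initialized by a separate rule. One common way to realize that is to fill a state's row lazily, the first time the state is looked up. The sketch below illustrates that idea in plain Java; it is not BURLAP code. `LazyQTable` and its methods are hypothetical names, a `String` key stands in for a `HashableState`, and a `ToDoubleBiFunction` stands in for the `QFunction` initializer. Whether `SGNaiveQLAgent` creates rows exactly this way is not stated in this section.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical, simplified stand-in for the qMap/qInit pair; not BURLAP's API.
// States are keyed by a String (standing in for a HashableState) and each row
// holds one Q-value per action name.
final class LazyQTable {
    private final Map<String, Map<String, Double>> qMap = new HashMap<>();
    private final List<String> actions;
    private final ToDoubleBiFunction<String, String> qInit; // (stateKey, action) -> initial Q

    LazyQTable(List<String> actions, ToDoubleBiFunction<String, String> qInit) {
        this.actions = new ArrayList<>(actions);
        this.qInit = qInit;
    }

    // Returns the row for a state, filling it from qInit the first time the state is seen.
    Map<String, Double> row(String stateKey) {
        return qMap.computeIfAbsent(stateKey, key -> {
            Map<String, Double> newRow = new HashMap<>();
            for (String a : actions) {
                newRow.put(a, qInit.applyAsDouble(key, a));
            }
            return newRow;
        });
    }

    double q(String stateKey, String action) {
        return row(stateKey).get(action);
    }

    void setQ(String stateKey, String action, double value) {
        row(stateKey).put(action, value);
    }

    // Largest stored Q-value for the state, similar in spirit to getMaxQValue.
    double maxQ(String stateKey) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : row(stateKey).values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    int numStoredStates() {
        return qMap.size();
    }

    public static void main(String[] args) {
        LazyQTable table = new LazyQTable(List.of("left", "right"), (s, a) -> 0.0);
        table.setQ("s0", "right", 1.5);
        System.out.println(table.maxQ("s0"));        // 1.5
        System.out.println(table.numStoredStates()); // 1
    }
}
```

Passing a per-state, per-action initializer rather than a single number covers both constructor styles (a `double defaultQ` or a `QFunction qInitizalizer`), since a constant default is just an initializer that ignores its arguments.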
Fields inherited from class SGAgentBase: agentType, domain, internalRewardFunction, world, worldAgentName

| Constructor and Description |
|---|
| `SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, HashableStateFactory hashFactory)`: Initializes with a default epsilon-greedy policy (strategy) with epsilon = 0.1. |
| `SGNaiveQLAgent(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory)`: Initializes with a default Q-value of 0 and a default epsilon-greedy policy (strategy) with epsilon = 0.1. |
| `SGNaiveQLAgent(SGDomain d, double discount, double learningRate, QFunction qInitizalizer, HashableStateFactory hashFactory)`: Initializes with a default epsilon-greedy policy (strategy) with epsilon = 0.1. |
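All three constructors default to an epsilon-greedy strategy with epsilon = 0.1. As a rough illustration of what that strategy does, here is a minimal sketch. `EpsilonGreedySketch` is a hypothetical class, not BURLAP's `Policy` API, and breaking ties among greedy actions uniformly at random is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical epsilon-greedy selector over one state's Q-values; not BURLAP's Policy API.
final class EpsilonGreedySketch {
    private final double epsilon;
    private final Random rng;

    EpsilonGreedySketch(double epsilon, Random rng) {
        if (epsilon < 0.0 || epsilon > 1.0) {
            throw new IllegalArgumentException("epsilon must be in [0, 1]");
        }
        this.epsilon = epsilon;
        this.rng = rng;
    }

    // With probability epsilon pick a uniformly random action; otherwise pick a greedy
    // action, breaking ties uniformly at random. Returns the chosen action's index.
    int select(double[] qValues) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qValues) {
            best = Math.max(best, q);
        }
        List<Integer> ties = new ArrayList<>();
        for (int i = 0; i < qValues.length; i++) {
            if (qValues[i] == best) {
                ties.add(i);
            }
        }
        return ties.get(rng.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        EpsilonGreedySketch policy = new EpsilonGreedySketch(0.1, new Random(0));
        double[] q = {0.2, 1.0, 0.5};
        int greedyPicks = 0;
        int trials = 10_000;
        for (int t = 0; t < trials; t++) {
            if (policy.select(q) == 1) {
                greedyPicks++;
            }
        }
        // Roughly 0.9 + 0.1 / 3 of the picks should be the greedy action.
        System.out.println((double) greedyPicks / trials);
    }
}
```

Because exploration also picks the greedy action some of the time, the greedy action is chosen with probability 1 - epsilon + epsilon / |A| (about 0.933 here), not exactly 0.9.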
| Modifier and Type | Method and Description |
|---|---|
| `Action` | `action(State s)`: This method is called by the world when it needs the agent to choose an action. |
| `void` | `gameStarting(World w, int agentNum)`: This method is called by the world when a new game is starting. |
| `void` | `gameTerminated()`: This method is called by the world when a game has ended. |
| `protected double` | `getMaxQValue(State s)`: Returns the maximum numeric Q-value for a given state. |
| `void` | `observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)`: This method is called by the world when every agent in the world has taken their action. |
| `double` | `qValue(State s, Action a)`: Returns the `QValue` for the given state-action pair. |
| `java.util.List<QValue>` | `qValues(State s)`: Returns a `List` of `QValue` objects for every permissible action for the given input state. |
| `SGNaiveQLAgent` | `setAgentDetails(java.lang.String agentName, SGAgentType type)` |
| `void` | `setLearningRate(LearningRate lr)` |
| `void` | `setQValueInitializer(QFunction qInit)` |
| `void` | `setStoredMapAbstraction(StateMapping abstraction)`: Sets the state abstraction that this agent will use. |
| `void` | `setStrategy(Policy policy)`: Sets the Q-learning policy that this agent will use (e.g., epsilon-greedy). |
| `protected HashableState` | `stateHash(State s)`: First abstracts state `s`, and then returns the `HashableState` object for the abstracted state. |
| `protected QValue` | `storedQ(State s, Action a)` |
| `double` | `value(State s)`: Returns the value function evaluation of the given state. |
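`observeOutcome` is where a Q-learning agent would apply its update after each joint step. The sketch below shows one such backup. It assumes, without confirmation from this section, that the agent reads its own reward as `jointReward[agentNum]` and that a terminal `sprime` contributes no future value. `QUpdateSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch of a single Q-learning backup, as an observeOutcome-style
// callback might perform it; not BURLAP's implementation.
final class QUpdateSketch {
    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * maxNextQ - Q(s,a)),
    // where a terminal next state is assumed to contribute no future value.
    static double backup(double oldQ, double reward, double maxNextQ, boolean isTerminal,
                         double alpha, double gamma) {
        double futureValue = isTerminal ? 0.0 : maxNextQ;
        double target = reward + gamma * futureValue;
        return oldQ + alpha * (target - oldQ);
    }

    // Assumption: each agent's own reward sits at index agentNum of the joint reward array.
    static double backupFromJoint(double oldQ, double[] jointReward, int agentNum,
                                  double maxNextQ, boolean isTerminal,
                                  double alpha, double gamma) {
        return backup(oldQ, jointReward[agentNum], maxNextQ, isTerminal, alpha, gamma);
    }

    public static void main(String[] args) {
        double[] jointReward = {1.0, -1.0};
        // Agent 0, non-terminal: 0 + 0.5 * ((1 + 0.75 * 2) - 0)
        System.out.println(backupFromJoint(0.0, jointReward, 0, 2.0, false, 0.5, 0.75)); // 1.25
    }
}
```

With alpha = 0.5, gamma = 0.75, reward 1, and a maximum next-state Q-value of 2, the target is 1 + 0.75 * 2 = 2.5, so a Q-value starting at 0 moves halfway to 1.25.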
Methods inherited from class SGAgentBase: agentName, agentType, getInternalRewardFunction, init, init, setInternalRewardFunction

- `protected java.util.Map<HashableState,java.util.List<QValue>> qMap`
- `protected java.util.Map<HashableState,State> stateRepresentations`
- `protected StateMapping storedMapAbstraction`
- `protected double discount`
- `protected LearningRate learningRate`
- `protected QFunction qInit`
- `protected Policy policy`
- `protected HashableStateFactory hashFactory`
- `protected int agentNum`
- `protected int totalNumberOfSteps`
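The `learningRate` field holds a `LearningRate` object, while the constructors accept a plain `double`; `setLearningRate` lets a caller replace it. The sketch below contrasts a constant step size with an exponentially decaying one indexed by a step count (a running counter such as `totalNumberOfSteps` could serve that role). The `StepSizeSchedule` interface and its implementations are hypothetical and do not reproduce BURLAP's `LearningRate` interface.

```java
// Hypothetical step-size schedules; BURLAP's LearningRate interface is not reproduced here.
final class StepSizeDemo {
    public static void main(String[] args) {
        StepSizeSchedule constant = new ConstantStepSize(0.1);
        StepSizeSchedule decaying = new DecayingStepSize(0.5, 0.5, 0.1);
        for (int step = 0; step < 4; step++) {
            System.out.println(step + ": " + constant.rate(step) + ", " + decaying.rate(step));
        }
    }
}

interface StepSizeSchedule {
    // Step size for the given 0-based learning step.
    double rate(int step);
}

final class ConstantStepSize implements StepSizeSchedule {
    private final double alpha;

    ConstantStepSize(double alpha) {
        this.alpha = alpha;
    }

    @Override
    public double rate(int step) {
        return alpha; // the same step size on every update
    }
}

final class DecayingStepSize implements StepSizeSchedule {
    private final double initial;
    private final double decay;
    private final double floor;

    DecayingStepSize(double initial, double decay, double floor) {
        this.initial = initial;
        this.decay = decay;
        this.floor = floor;
    }

    @Override
    public double rate(int step) {
        // alpha_t = max(floor, initial * decay^t)
        return Math.max(floor, initial * Math.pow(decay, step));
    }
}
```

The floor keeps the decaying schedule from shrinking to zero, so late updates still move the Q-values; the demo's decaying column goes 0.5, 0.25, 0.125, then stays at 0.1.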
`public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory)`

Parameters:
- `d` - the domain in which the agent will act
- `discount` - the discount factor
- `learningRate` - the learning rate
- `hashFactory` - the state hashing factory

`public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, HashableStateFactory hashFactory)`

Parameters:
- `d` - the domain in which the agent will act
- `discount` - the discount factor
- `learningRate` - the learning rate
- `defaultQ` - the default value to which all Q-values will be initialized
- `hashFactory` - the state hashing factory

`public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, QFunction qInitizalizer, HashableStateFactory hashFactory)`

Parameters:
- `d` - the domain in which the agent will act
- `discount` - the discount factor
- `learningRate` - the learning rate
- `qInitizalizer` - the Q-value initialization method
- `hashFactory` - the state hashing factory

`public SGNaiveQLAgent setAgentDetails(java.lang.String agentName, SGAgentType type)`
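Overrides: `setAgentDetails` in class `SGAgentBase`

`public void setStoredMapAbstraction(StateMapping abstraction)`

Parameters:
- `abstraction` - the state abstraction that this agent will use

`public void setStrategy(Policy policy)`

Parameters:
- `policy` - the Q-learning policy that this agent will use

`public void setQValueInitializer(QFunction qInit)`

`public void setLearningRate(LearningRate lr)`

`public void gameStarting(World w, int agentNum)`

Description copied from interface `SGAgent`: This method is called by the world when a new game is starting.

Specified by: `gameStarting` in interface `SGAgent`

Parameters:
- `w` - the world in which the game is starting
- `agentNum` - the agent number of the agent in the world

`public Action action(State s)`

Description copied from interface `SGAgent`: This method is called by the world when it needs the agent to choose an action.

`public void observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)`

Description copied from interface `SGAgent`: This method is called by the world when every agent in the world has taken their action.

Specified by: `observeOutcome` in interface `SGAgent`

Parameters:
- `s` - the state in which the last action of each agent was taken
- `jointAction` - the joint action of all agents in the world
- `jointReward` - the joint reward of all agents in the world
- `sprime` - the next state to which the agent transitioned
- `isTerminal` - whether the new state is a terminal state

`public void gameTerminated()`

Description copied from interface `SGAgent`: This method is called by the world when a game has ended.

Specified by: `gameTerminated` in interface `SGAgent`

`protected double getMaxQValue(State s)`

Parameters:
- `s` - the state for which the max Q-value should be returned

`protected HashableState stateHash(State s)`

First abstracts state `s`, and then returns the `HashableState` object for the abstracted state.

Parameters:
- `s` - the state for which the state hash should be returned

`public java.util.List<QValue> qValues(State s)`

Description copied from interface `QProvider`: Returns a `List` of `QValue` objects for every permissible action for the given input state.

`public double value(State s)`

Description copied from interface `ValueFunction`: Returns the value function evaluation of the given state.

Specified by: `value` in interface `ValueFunction`

Parameters:
- `s` - the state to evaluate

`public double qValue(State s, Action a)`

Description copied from interface `QFunction`: Returns the `QValue` for the given state-action pair.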
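`stateHash` applies the stored state abstraction first and hashes only the abstracted result, so raw states that differ only in features the abstraction discards share one Q-table entry. The sketch below illustrates that ordering with hypothetical types (`GridState`, `AbstractThenHash`). It uses a Java record's `equals`/`hashCode` in place of BURLAP's `HashableStateFactory` and a `UnaryOperator` in place of `StateMapping`, and it needs Java 16 or later for records.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical illustration of "abstract first, then hash"; not BURLAP's
// StateMapping or HashableStateFactory API.
final class AbstractThenHash {
    private final UnaryOperator<GridState> abstraction;
    private final Set<GridState> seenKeys = new HashSet<>();

    AbstractThenHash(UnaryOperator<GridState> abstraction) {
        this.abstraction = abstraction;
    }

    // Applies the abstraction, then uses the abstracted state (via record equals/hashCode) as the key.
    GridState key(GridState s) {
        GridState abstracted = abstraction.apply(s);
        seenKeys.add(abstracted);
        return abstracted;
    }

    int numDistinctKeys() {
        return seenKeys.size();
    }

    public static void main(String[] args) {
        // Abstraction that discards the time counter and keeps only the position.
        AbstractThenHash hasher = new AbstractThenHash(s -> new GridState(s.x(), s.y(), 0));
        GridState k1 = hasher.key(new GridState(1, 2, 5));
        GridState k2 = hasher.key(new GridState(1, 2, 9));
        System.out.println(k1.equals(k2));            // true
        System.out.println(hasher.numDistinctKeys()); // 1
    }
}

// Hypothetical raw state: a grid position plus a time counter the abstraction ignores.
record GridState(int x, int y, int tick) { }
```

Hashing before abstracting would instead give the two raw states different keys, splitting what the abstraction treats as one state across two Q-table rows.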