public class SGNaiveQLAgent extends SGAgentBase implements QProvider
1. Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected int | agentNum |
protected double | discount: The discount factor. |
protected HashableStateFactory | hashFactory: The state hashing factory to use. |
protected LearningRate | learningRate: The learning rate. |
protected Policy | policy: The policy this agent follows. |
protected QFunction | qInit: Defines how Q-values are initialized. |
protected java.util.Map<HashableState,java.util.List<QValue>> | qMap: The tabular map from (hashed) states to the list of Q-values for each action in those states. |
protected java.util.Map<HashableState,State> | stateRepresentations: A map from hashed states to the internal state representation for the states stored in the Q-table. |
protected StateMapping | storedMapAbstraction: A state abstraction to use. |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent. |
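The relationship between qMap and qInit can be illustrated with a minimal, self-contained sketch. It is not BURLAP's implementation: the class LazyQTableSketch, its Entry type, and the use of plain strings for states and actions are all assumptions made for illustration. It shows one plausible way a tabular store could create the Q-value list for a state on first access, using an initializer function for the starting values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical stand-in for a tabular Q-store; not BURLAP's implementation.
final class LazyQTableSketch {
    // One mutable Q-value entry per (state, action) pair.
    static final class Entry {
        final String action;
        double q;
        Entry(String action, double q) { this.action = action; this.q = q; }
    }

    private final Map<String, List<Entry>> qMap = new HashMap<>();
    private final List<String> actions;
    private final ToDoubleBiFunction<String, String> qInit;

    LazyQTableSketch(List<String> actions, ToDoubleBiFunction<String, String> qInit) {
        this.actions = actions;
        this.qInit = qInit;
    }

    // Returns the entries for a state, creating and initializing them on first access.
    List<Entry> qValues(String stateKey) {
        return qMap.computeIfAbsent(stateKey, k -> {
            List<Entry> entries = new ArrayList<>();
            for (String a : actions) {
                entries.add(new Entry(a, qInit.applyAsDouble(k, a)));
            }
            return entries;
        });
    }

    // Number of states that currently have stored Q-values.
    int storedStates() { return qMap.size(); }

    public static void main(String[] args) {
        LazyQTableSketch table =
                new LazyQTableSketch(Arrays.asList("left", "right"), (s, a) -> 0.0);
        table.qValues("s0").get(1).q = 0.5;               // the update persists in the table
        System.out.println(table.qValues("s0").get(1).q); // prints 0.5
        System.out.println(table.storedStates());         // prints 1
    }
}
```

Creating entries lazily keeps the table limited to states the agent has actually visited, which matters when the state space is large.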
Fields inherited from class SGAgentBase: agentType, domain, internalRewardFunction, world, worldAgentName
Constructor and Description |
---|
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, HashableStateFactory hashFactory): Initializes with a default 0.1 epsilon-greedy policy/strategy. |
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory): Initializes with a default Q-value of 0 and a 0.1 epsilon-greedy policy/strategy. |
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, QFunction qInitizalizer, HashableStateFactory hashFactory): Initializes with a default 0.1 epsilon-greedy policy/strategy. |
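All three constructors default to an epsilon-greedy strategy with epsilon set to 0.1. The sketch below illustrates that selection rule in isolation. It is an assumption-laden illustration rather than BURLAP's policy class: EpsilonGreedySketch and its select method are hypothetical names, and ties are broken by taking the first maximum, which may differ from how the library breaks ties.

```java
import java.util.Random;

// Hypothetical helper illustrating epsilon-greedy selection; not part of BURLAP.
final class EpsilonGreedySketch {
    private final double epsilon;
    private final Random rng;

    EpsilonGreedySketch(double epsilon, Random rng) {
        this.epsilon = epsilon;
        this.rng = rng;
    }

    // Picks an action index: uniformly random with probability epsilon,
    // otherwise the index of the largest Q-value (first one on ties).
    int select(double[] qValues) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValues.length);
        }
        int best = 0;
        for (int i = 1; i < qValues.length; i++) {
            if (qValues[i] > qValues[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        EpsilonGreedySketch policy = new EpsilonGreedySketch(0.1, new Random(42));
        double[] q = {0.0, 1.0, 0.5};
        int greedyPicks = 0;
        for (int i = 0; i < 10_000; i++) {
            if (policy.select(q) == 1) {
                greedyPicks++;
            }
        }
        // With epsilon = 0.1 and three actions, roughly 93% of picks should be greedy.
        System.out.println("fraction of greedy picks: " + greedyPicks / 10_000.0);
    }
}
```

With epsilon at 0.1 the agent exploits its current estimates most of the time while still sampling every action occasionally, which tabular Q-learning needs in order to keep refining all of its estimates.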
Modifier and Type | Method and Description |
---|---|
Action | action(State s): This method is called by the world when it needs the agent to choose an action. |
void | gameStarting(World w, int agentNum): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
protected double | getMaxQValue(State s): Returns the maximum numeric Q-value for a given state. |
void | observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
double | qValue(State s, Action a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state. |
SGNaiveQLAgent | setAgentDetails(java.lang.String agentName, SGAgentType type) |
void | setLearningRate(LearningRate lr) |
void | setQValueInitializer(QFunction qInit) |
void | setStoredMapAbstraction(StateMapping abstraction): Sets the state abstraction that this agent will use. |
void | setStrategy(Policy policy): Sets the Q-learning policy that this agent will use (e.g., epsilon greedy). |
protected HashableState | stateHash(State s): First abstracts state s, and then returns the HashableState object for the abstracted state. |
protected QValue | storedQ(State s, Action a) |
double | value(State s): Returns the value function evaluation of the given state. |
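The standard tabular Q-learning backup (Watkins and Dayan, 1992) combines the quantities that observeOutcome receives with the discount, the learning rate, and the maximum Q-value of the next state. The sketch below shows that arithmetic only. QUpdateSketch and its update method are hypothetical, and treating a terminal next state as contributing zero future value is an assumption about how an agent of this kind would typically handle isTerminal, not a statement about BURLAP's code.

```java
// Hypothetical illustration of the tabular Q-learning backup; not BURLAP's code.
final class QUpdateSketch {
    // Returns the new Q(s, a) after one backup:
    // Q(s, a) + learningRate * (reward + discount * max_a' Q(s', a') - Q(s, a)).
    static double update(double oldQ, double reward, double discount,
                         double learningRate, double[] nextQValues, boolean isTerminal) {
        double maxNext = 0.0; // assumed: terminal states contribute no future value
        if (!isTerminal && nextQValues.length > 0) {
            maxNext = nextQValues[0];
            for (double q : nextQValues) {
                maxNext = Math.max(maxNext, q);
            }
        }
        double target = reward + discount * maxNext;
        return oldQ + learningRate * (target - oldQ);
    }

    public static void main(String[] args) {
        double[] next = {0.0, 2.0};
        // target = 1 + 0.5 * 2 = 2, so Q moves halfway from 1.0 to 2.0
        System.out.println(update(1.0, 1.0, 0.5, 0.5, next, false)); // prints 1.5
        // terminal: target = 1, so Q stays at 1.0
        System.out.println(update(1.0, 1.0, 0.5, 0.5, next, true));  // prints 1.0
    }
}
```

In a stochastic game, the reward passed to such an update would presumably be this agent's own entry of jointReward, with the other agents' actions ignored; that reading of "naive" is an assumption here.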
Methods inherited from class SGAgentBase: agentName, agentType, getInternalRewardFunction, init, init, setInternalRewardFunction
protected java.util.Map<HashableState,java.util.List<QValue>> qMap
protected java.util.Map<HashableState,State> stateRepresentations
protected StateMapping storedMapAbstraction
protected double discount
protected LearningRate learningRate
protected QFunction qInit
protected Policy policy
protected HashableStateFactory hashFactory
protected int agentNum
protected int totalNumberOfSteps
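The storedMapAbstraction and hashFactory fields are used together: according to the stateHash description, a state is first abstracted and the abstracted state is then turned into a hashable key. The sketch below illustrates that two-step idea with plain Java types. GridState, Key, and ABSTRACTION are hypothetical names introduced for illustration; BURLAP's StateMapping and HashableStateFactory have their own interfaces, which are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical illustration of "abstract first, then hash"; not BURLAP's classes.
final class StateHashSketch {
    // Raw state: a grid position plus a step counter that should not affect learning.
    static final class GridState {
        final int x, y, stepCounter;
        GridState(int x, int y, int stepCounter) {
            this.x = x;
            this.y = y;
            this.stepCounter = stepCounter;
        }
    }

    // Map key with value-based equality over every field of the (abstracted) state.
    static final class Key {
        private final int x, y, stepCounter;
        Key(GridState s) {
            this.x = s.x;
            this.y = s.y;
            this.stepCounter = s.stepCounter;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return x == k.x && y == k.y && stepCounter == k.stepCounter;
        }
        @Override public int hashCode() {
            return Objects.hash(x, y, stepCounter);
        }
    }

    // Assumed abstraction: reset the step counter so states differing only there coincide.
    static final UnaryOperator<GridState> ABSTRACTION = s -> new GridState(s.x, s.y, 0);

    // Abstracts the state, then wraps the result in a hashable key.
    static Key stateHash(GridState s) {
        return new Key(ABSTRACTION.apply(s));
    }

    public static void main(String[] args) {
        Map<Key, Double> table = new HashMap<>();
        table.put(stateHash(new GridState(1, 2, 7)), 0.25);
        System.out.println(table.get(stateHash(new GridState(1, 2, 99))));        // prints 0.25
        System.out.println(table.containsKey(stateHash(new GridState(2, 2, 7)))); // prints false
    }
}
```

Abstracting before hashing means that two raw states the abstraction considers equivalent share a single row of the Q-table, so experience gathered in one carries over to the other.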
public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, HashableStateFactory hashFactory)
Parameters:
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
hashFactory - the state hashing factory

public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, HashableStateFactory hashFactory)
Parameters:
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
defaultQ - the default to which all Q-values will be initialized
hashFactory - the state hashing factory

public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, QFunction qInitizalizer, HashableStateFactory hashFactory)
Parameters:
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
qInitizalizer - the Q-value initialization method
hashFactory - the state hashing factory

public SGNaiveQLAgent setAgentDetails(java.lang.String agentName, SGAgentType type)
Overrides:
setAgentDetails in class SGAgentBase

public void setStoredMapAbstraction(StateMapping abstraction)
Parameters:
abstraction - the state abstraction that this agent will use

public void setStrategy(Policy policy)
Parameters:
policy - the Q-learning policy that this agent will use

public void setQValueInitializer(QFunction qInit)

public void setLearningRate(LearningRate lr)
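public void gameStarting(World w, int agentNum)
Description copied from interface: SGAgent
This method is called by the world when a new game is starting.
Specified by:
gameStarting in interface SGAgent
Parameters:
w - the world in which the game is starting
agentNum - the agent number of the agent in the world

public Action action(State s)
Description copied from interface: SGAgent
This method is called by the world when it needs the agent to choose an action.

public void observeOutcome(State s, JointAction jointAction, double[] jointReward, State sprime, boolean isTerminal)
Description copied from interface: SGAgent
This method is called by the world when every agent in the world has taken their action.
Specified by:
observeOutcome in interface SGAgent
Parameters:
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state

public void gameTerminated()
Description copied from interface: SGAgent
This method is called by the world when a game has ended.
Specified by:
gameTerminated in interface SGAgent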
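protected double getMaxQValue(State s)
Parameters:
s - the state for which the max Q-value should be returned

protected HashableState stateHash(State s)
First abstracts state s, and then returns the HashableState object for the abstracted state.
Parameters:
s - the state for which the state hash should be returned

public java.util.List<QValue> qValues(State s)
Description copied from interface: QProvider
Returns a List of QValue objects for every permissible action for the given input state.

public double value(State s)
Description copied from interface: ValueFunction
Returns the value function evaluation of the given state.
Specified by:
value in interface ValueFunction
Parameters:
s - the state to evaluate

public double qValue(State s, Action a)
Description copied from interface: QFunction
Returns the QValue for the given state-action pair.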