public class SGNaiveQLAgent extends Agent implements QComputablePlanner
Nested classes inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected double | discount: The discount factor. |
protected StateHashFactory | hashFactory: The state hashing factory to use. |
protected LearningRate | learningRate: The learning rate. |
protected Policy | policy: The policy this agent follows. |
protected ValueFunctionInitialization | qInit: Defines how Q-values are initialized. |
protected java.util.Map<StateHashTuple,java.util.List<QValue>> | qMap: The tabular map from (hashed) states to the list of Q-values for each action in those states. |
protected java.util.Map<StateHashTuple,State> | stateRepresentations: A map from hashed states to the internal state representation for the states stored in the Q-table. |
protected StateAbstraction | storedMapAbstraction: A state abstraction to use. |
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent. |
Fields inherited from class Agent: agentType, domain, internalRewardFunction, world, worldAgentName
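The qMap and qInit fields together describe a table that maps a hashed state to one Q-value per action, with new entries filled in by an initializer. A minimal sketch of that idea follows. It uses plain strings as state keys and action names, and the `QTableSketch` and `QEntry` names are hypothetical stand-ins, not BURLAP classes; the lazy creation on first lookup is an assumption about how such a table is typically populated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tabular Q-value store; not the BURLAP implementation.
class QTableSketch {
    // One Q-value per action, playing the role that QValue plays on this page.
    static final class QEntry {
        final String action;
        double q;
        QEntry(String action, double q) { this.action = action; this.q = q; }
    }

    // Analogue of qMap: state key -> list of Q-values, one per action.
    private final Map<String, List<QEntry>> qMap = new HashMap<>();
    private final List<String> actions;
    private final double defaultQ; // analogue of a constant Q-value initializer

    QTableSketch(List<String> actions, double defaultQ) {
        this.actions = actions;
        this.defaultQ = defaultQ;
    }

    // Returns the Q-values for a state key, creating default entries on first visit.
    List<QEntry> getQs(String stateKey) {
        return qMap.computeIfAbsent(stateKey, k -> {
            List<QEntry> entries = new ArrayList<>();
            for (String a : actions) {
                entries.add(new QEntry(a, defaultQ));
            }
            return entries;
        });
    }

    int numStoredStates() { return qMap.size(); }

    public static void main(String[] args) {
        QTableSketch table = new QTableSketch(Arrays.asList("north", "south"), 0.0);
        System.out.println(table.getQs("s0").size());
        System.out.println(table.numStoredStates());
    }
}
```

Because the entries are stored by reference, a learning update can modify `q` in place and later lookups for the same state key see the new value.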
Constructor and Description |
---|
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, StateHashFactory hashFactory): Initializes with a default 0.1 epsilon-greedy policy/strategy. |
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory): Initializes with a default Q-value of 0 and a 0.1 epsilon-greedy policy/strategy. |
SGNaiveQLAgent(SGDomain d, double discount, double learningRate, ValueFunctionInitialization qInitizalizer, StateHashFactory hashFactory): Initializes with a default 0.1 epsilon-greedy policy/strategy. |
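All three constructors default to a 0.1 epsilon-greedy strategy. As a rough illustration of what that means, the sketch below picks a uniformly random action with probability epsilon and the highest-valued action otherwise. `EpsilonGreedySketch` is a hypothetical class written for this example; in particular, breaking ties by taking the first maximum is an assumption and may differ from the library's policy.

```java
import java.util.Random;

// Hypothetical epsilon-greedy selector over an array of Q-values; not the BURLAP Policy class.
class EpsilonGreedySketch {
    private final double epsilon;
    private final Random rng;

    EpsilonGreedySketch(double epsilon, Random rng) {
        this.epsilon = epsilon;
        this.rng = rng;
    }

    // Returns the index of the chosen action.
    int selectAction(double[] qValues) {
        if (rng.nextDouble() < epsilon) {
            // Explore: pick any action uniformly at random.
            return rng.nextInt(qValues.length);
        }
        // Exploit: pick the first action with the largest Q-value (tie-breaking is an assumption).
        int best = 0;
        for (int i = 1; i < qValues.length; i++) {
            if (qValues[i] > qValues[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        EpsilonGreedySketch policy = new EpsilonGreedySketch(0.1, new Random(42));
        System.out.println(policy.selectAction(new double[] {0.1, 0.9, 0.3}));
    }
}
```

A different strategy can be supplied after construction through setStrategy(Policy policy).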
Modifier and Type | Method and Description |
---|---|
void | gameStarting(): This method is called by the world when a new game is starting. |
void | gameTerminated(): This method is called by the world when a game has ended. |
GroundedSingleAction | getAction(State s): This method is called by the world when it needs the agent to choose an action. |
protected double | getMaxQValue(State s): Returns the maximum numeric Q-value for a given state. |
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
void | observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal): This method is called by the world when every agent in the world has taken their action. |
void | setLearningRate(LearningRate lr) |
void | setQValueInitializer(ValueFunctionInitialization qInit) |
void | setStoredMapAbstraction(StateAbstraction abstraction): Sets the state abstraction that this agent will use. |
void | setStrategy(Policy policy): Sets the Q-learning policy that this agent will use (e.g., epsilon greedy). |
protected StateHashTuple | stateHash(State s): First abstracts state s, and then returns the StateHashTuple object for the abstracted state. |
protected GroundedSingleAction | translateAction(GroundedSingleAction a, java.util.Map<java.lang.String,java.lang.String> matching): Takes an input action and a matching from objects in the action's source state to objects in another state, and returns an action with its object parameters mapped to the matched objects. |
Methods inherited from class Agent: getAgentName, getAgentType, getInternalRewardFunction, init, joinWorld, setInternalRewardFunction
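The translateAction description amounts to renaming an action's object parameters through a string-to-string matching. The sketch below shows that renaming on a bare array of parameter names. `TranslateActionSketch` and `translateParams` are hypothetical names for this example, and leaving unmatched parameters unchanged is an assumption; the page does not say how the library treats a parameter with no match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parameter translation; operates on names only, not on BURLAP actions.
class TranslateActionSketch {
    // Returns a new parameter array with each name replaced by its match.
    // Assumption: a name with no entry in the matching is kept as-is.
    static String[] translateParams(String[] params, Map<String, String> matching) {
        String[] translated = new String[params.length];
        for (int i = 0; i < params.length; i++) {
            translated[i] = matching.getOrDefault(params[i], params[i]);
        }
        return translated;
    }

    public static void main(String[] args) {
        Map<String, String> matching = new HashMap<>();
        matching.put("agent0", "agentA");
        String[] result = translateParams(new String[] {"agent0", "block1"}, matching);
        System.out.println(Arrays.toString(result));
    }
}
```

The input array is not modified, which mirrors the description's wording that a new action is returned rather than the input being changed.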
protected java.util.Map<StateHashTuple,java.util.List<QValue>> qMap
protected java.util.Map<StateHashTuple,State> stateRepresentations
protected StateAbstraction storedMapAbstraction
protected double discount
protected LearningRate learningRate
protected ValueFunctionInitialization qInit
protected Policy policy
protected StateHashFactory hashFactory
protected int totalNumberOfSteps
public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, StateHashFactory hashFactory)
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
hashFactory - the state hashing factory
public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, double defaultQ, StateHashFactory hashFactory)
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
defaultQ - the default to which all Q-values will be initialized
hashFactory - the state hashing factory
public SGNaiveQLAgent(SGDomain d, double discount, double learningRate, ValueFunctionInitialization qInitizalizer, StateHashFactory hashFactory)
d - the domain in which the agent will act
discount - the discount factor
learningRate - the learning rate
qInitizalizer - the Q-value initialization method
hashFactory - the state hashing factory
public void setStoredMapAbstraction(StateAbstraction abstraction)
abstraction - the state abstraction that this agent will use
public void setStrategy(Policy policy)
policy - the Q-learning policy that this agent will use
public void setQValueInitializer(ValueFunctionInitialization qInit)
public void setLearningRate(LearningRate lr)
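public void gameStarting()
Description copied from class Agent: This method is called by the world when a new game is starting.
Specified by: gameStarting in class Agent
public GroundedSingleAction getAction(State s)
Description copied from class Agent: This method is called by the world when it needs the agent to choose an action.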
public void observeOutcome(State s, JointAction jointAction, java.util.Map<java.lang.String,java.lang.Double> jointReward, State sprime, boolean isTerminal)
Description copied from class Agent: This method is called by the world when every agent in the world has taken their action.
Specified by: observeOutcome in class Agent
s - the state in which the last action of each agent was taken
jointAction - the joint action of all agents in the world
jointReward - the joint reward of all agents in the world
sprime - the next state to which the agent transitioned
isTerminal - whether the new state is a terminal state
public void gameTerminated()
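Description copied from class Agent: This method is called by the world when a game has ended.
Specified by: gameTerminated in class Agent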
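The parameters of observeOutcome (state, reward, next state, terminal flag) line up with the inputs of a standard tabular Q-learning update. This page does not state the update rule, so the sketch below is an assumption based on the class name: it applies the textbook update to the agent's own reward and treats the future value of a terminal state as zero. `NaiveQUpdateSketch` and its parameter names are hypothetical.

```java
// Hypothetical sketch of a textbook Q-learning update; the rule itself is an assumption,
// since this page does not document what observeOutcome computes.
class NaiveQUpdateSketch {
    // Returns the updated Q-value for the state-action pair that was just executed.
    static double update(double oldQ, double reward, double maxNextQ,
                         double discount, double learningRate, boolean isTerminal) {
        // No future value is bootstrapped from a terminal next state.
        double futureValue = isTerminal ? 0.0 : discount * maxNextQ;
        // Move the old estimate toward the one-step target by the learning rate.
        return oldQ + learningRate * (reward + futureValue - oldQ);
    }

    public static void main(String[] args) {
        System.out.println(update(0.0, 1.0, 2.0, 0.9, 0.5, false));
        System.out.println(update(0.0, 1.0, 2.0, 0.9, 0.5, true));
    }
}
```

In an agent of this kind, `reward` would be this agent's entry in the jointReward map and `maxNextQ` would come from something like getMaxQValue(sprime), while the actions of other agents in jointAction would not enter the update.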
protected double getMaxQValue(State s)
s - the state for which the max Q-value should be returned
protected StateHashTuple stateHash(State s)
First abstracts state s, and then returns the StateHashTuple object for the abstracted state.
s - the state for which the state hash should be returned.
protected GroundedSingleAction translateAction(GroundedSingleAction a, java.util.Map<java.lang.String,java.lang.String> matching)
a - the input action
matching - the matching between objects from the source state in which the action was generated to objects in another state.
public java.util.List<QValue> getQs(State s)
Description copied from interface QComputablePlanner: Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QComputablePlanner
s - the state for which Q-values are to be returned.
Returns: a List of QValue objects for every permissible action for the given input state.
public QValue getQ(State s, AbstractGroundedAction a)
Description copied from interface QComputablePlanner: Returns the QValue for the given state-action pair.
Specified by: getQ in interface QComputablePlanner
s - the input state
a - the input action
Returns: the QValue for the given state-action pair.
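Given a per-state list like the one getQs returns, both getQ and getMaxQValue can be read as simple scans over that list. The sketch below illustrates this with a hypothetical `QEntry` type standing in for QValue and strings standing in for actions; returning null for an unknown action is an assumption made for this example only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Q-value lookups over a per-state list; QEntry stands in for QValue.
class QLookupSketch {
    static final class QEntry {
        final String action;
        final double q;
        QEntry(String action, double q) { this.action = action; this.q = q; }
    }

    // Analogue of getQ: the entry for one action, or null if the action is not in the list.
    static QEntry getQ(List<QEntry> qs, String action) {
        for (QEntry e : qs) {
            if (e.action.equals(action)) {
                return e;
            }
        }
        return null;
    }

    // Analogue of getMaxQValue: the largest Q-value in the list.
    static double maxQ(List<QEntry> qs) {
        double max = Double.NEGATIVE_INFINITY;
        for (QEntry e : qs) {
            max = Math.max(max, e.q);
        }
        return max;
    }

    public static void main(String[] args) {
        List<QEntry> qs = Arrays.asList(new QEntry("north", 0.2), new QEntry("south", 0.7));
        System.out.println(maxQ(qs));
        System.out.println(getQ(qs, "north").q);
    }
}
```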