public class ActorCritic extends OOMDPPlanner implements LearningAgent
An actor-critic learning algorithm. The specific actor and critic components used are defined by the provided Actor and Critic objects. The general structure of the learning algorithm is for the Actor class to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior, which it returns in a CritiqueResult object that is then passed along to the Actor so that the Actor may update its behavior accordingly.

Nested classes/interfaces inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping
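The loop described above can be pictured roughly as follows. This is an expository sketch only, not this class's actual implementation: the method names used on Actor and Critic, and the executeInWorld helper, are hypothetical placeholders introduced purely to illustrate the flow of control and data.

```java
// Expository sketch of the control flow described above (NOT this class's
// actual implementation). executeInWorld and the Actor/Critic method names
// are hypothetical placeholders used only to illustrate the loop.
void learningLoopSketch(State initialState, Actor actor, Critic critic,
                        RewardFunction rf, TerminalFunction tf) {
    State cur = initialState;
    while (!tf.isTerminal(cur)) {
        GroundedAction a = actor.getAction(cur);                      // hypothetical: query the actor for an action
        State next = executeInWorld(cur, a);                          // hypothetical helper: take the action, observe the result
        double r = rf.reward(cur, a, next);                           // reward for the observed transition
        CritiqueResult critique = critic.critique(cur, a, next, r);   // hypothetical: critic critiques the behavior
        actor.updateFromCritique(critique);                           // hypothetical: actor updates its behavior accordingly
        cur = next;
    }
}
```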
| Modifier and Type | Field and Description |
|---|---|
| protected Actor | actor: The actor component to use. |
| protected Critic | critic: The critic component to use. |
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
| protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This is defaulted to Integer.MAX_VALUE. |
| protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
| protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
| ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
| Modifier and Type | Method and Description |
|---|---|
| void | addNonDomainReferencedAction(Action a): Adds an additional action to the planner that is not included in the domain definition. |
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes(): Returns all saved EpisodeAnalysis objects of which the agent has kept track. |
| EpisodeAnalysis | getLastLearningEpisode(): Returns the last learning episode of the agent. |
| Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
| void | planFromState(State initialState): This method will cause the planner to begin planning from the specified initial state. |
| void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState): Causes the agent to perform a learning episode starting in the given initial state. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state. |
| void | setNumEpisodesToStore(int numEps): Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. |
Methods inherited from class OOMDPPlanner: getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This is defaulted to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic)
Initializes the learning algorithm.
Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique

public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize)
Initializes the learning algorithm.
Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up
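As a rough usage sketch of these constructors (not taken from the library's documentation): the domain, reward function, terminal function, and the concrete Actor and Critic instances below are assumed to be constructed elsewhere with whatever implementations your BURLAP setup provides.

```java
// Hedged sketch: `domain`, `rf`, `tf`, `myActor`, and `myCritic` are assumed
// to have been created elsewhere with concrete types of your choosing.
double gamma = 0.99;  // discount factor

// Basic form: each learning episode runs until a terminal state is reached
// (maxEpisodeSize defaults to Integer.MAX_VALUE).
ActorCritic agent = new ActorCritic(domain, rf, tf, gamma, myActor, myCritic);

// Alternative form: cap each learning episode at 1000 steps.
ActorCritic cappedAgent =
        new ActorCritic(domain, rf, tf, gamma, myActor, myCritic, 1000);
```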
public void addNonDomainReferencedAction(Action a)
Description copied from class: OOMDPPlanner
Adds an additional action to the planner that is not included in the domain definition. For instance, an Option should be added using this method.
Overrides: addNonDomainReferencedAction in class OOMDPPlanner
Parameters: a - the action to add to the planner

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters: initialState - the initial state in which the agent will start the episode
Returns: the completed learning episode as an EpisodeAnalysis object

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state, taking at most maxSteps steps.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters: initialState - the initial state in which the agent will start the episode; maxSteps - the maximum number of steps in the episode
Returns: the completed learning episode as an EpisodeAnalysis object
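A hedged sketch of driving learning with these methods; the episode budget, step cap, and the initial state are illustrative placeholders, and `agent` is the ActorCritic instance constructed in the earlier example.

```java
// Sketch: run repeated learning episodes from a given start state.
// `agent` and `initialState` are assumed to be set up elsewhere.
int numEpisodes = 100;          // illustrative episode budget
int maxStepsPerEpisode = 500;   // illustrative per-episode step cap

java.util.List<EpisodeAnalysis> episodes =
        new java.util.ArrayList<EpisodeAnalysis>();

for (int i = 0; i < numEpisodes; i++) {
    // The one-argument form runs until a terminal state is reached;
    // the two-argument form additionally caps the number of steps.
    episodes.add(agent.runLearningEpisodeFrom(initialState, maxStepsPerEpisode));
}
```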
public EpisodeAnalysis getLastLearningEpisode()
Description copied from interface: LearningAgent
Returns the last learning episode of the agent.
Specified by: getLastLearningEpisode in interface LearningAgent

public void setNumEpisodesToStore(int numEps)
Description copied from interface: LearningAgent
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should save the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by: setNumEpisodesToStore in interface LearningAgent
Parameters: numEps - the number of learning episodes to remember

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Description copied from interface: LearningAgent
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by: getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track
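A possible pattern for using this episode bookkeeping, again as a sketch that reuses the `agent` and `initialState` placeholders from the earlier examples:

```java
// Keep the 20 most recent learning episodes for later inspection.
agent.setNumEpisodesToStore(20);

for (int i = 0; i < 50; i++) {
    agent.runLearningEpisodeFrom(initialState, 500);
}

// Only the most recent episodes (up to the configured limit) are retained.
java.util.List<EpisodeAnalysis> stored = agent.getAllStoredLearningEpisodes();
EpisodeAnalysis mostRecent = agent.getLastLearningEpisode();
System.out.println("Stored episodes: " + stored.size());
```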
public void planFromState(State initialState)
Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters: initialState - the initial state of the planning problem

public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner
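Because ActorCritic is also an OOMDPPlanner, the same object can be used in a planning workflow: planFromState(State) runs the configured number of simulated learning episodes (numEpisodesForPlanning) and getPolicy() exposes the resulting actor as a Policy. A hedged sketch, again reusing the `agent` and `initialState` placeholders:

```java
// Plan by running the configured number of simulated learning episodes
// from the given initial state, then extract the learned actor as a policy.
agent.planFromState(initialState);
Policy learnedPolicy = agent.getPolicy();

// If the reward function or transition dynamics change later, clear the old
// results and plan again from scratch (other planner settings are preserved).
agent.resetPlannerResults();
agent.planFromState(initialState);
```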