public class ActorCritic extends OOMDPPlanner implements LearningAgent

This class provides a general structure for actor-critic learning. It relies on Actor and Critic objects. The general structure of the learning algorithm is for the Actor class to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior; the critique is returned in a CritiqueResult object and then passed along to the Actor so that the actor may update its behavior accordingly.

Nested classes inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping
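The query/critique/update loop described above can be sketched as follows. The Actor, Critic, and CritiqueResult types below are simplified stand-ins for illustration only, not BURLAP's actual interfaces, and the two-action toy problem is invented for the example.

```java
// Minimal, self-contained sketch of the actor-critic loop described above.
// Actor, Critic, and CritiqueResult here are simplified stand-ins, not the
// library's actual types.
public class ActorCriticLoopSketch {

    interface Actor {
        int selectAction(int state);          // queried for an action
        void update(CritiqueResult critique); // adjusts behavior from the critique
    }

    interface Critic {
        CritiqueResult critique(int s, int a, int sPrime, double r);
    }

    // Carries the critic's evaluation of a single transition.
    record CritiqueResult(int s, int a, int sPrime, double critique) {}

    // Toy actor over two actions: brief forced exploration, then greedy.
    static class SimpleActor implements Actor {
        double[] pref = {0.0, 0.0};
        int steps = 0;

        public int selectAction(int state) {
            steps++;
            if (steps <= 10) return steps % 2; // try both actions early
            return pref[1] > pref[0] ? 1 : 0;  // then act greedily
        }

        public void update(CritiqueResult c) {
            pref[c.a()] += 0.1 * c.critique(); // shift preference toward praised actions
        }
    }

    public static void main(String[] args) {
        Actor actor = new SimpleActor();
        // Trivial critic: uses the observed reward directly as the critique.
        Critic critic = (s, a, sPrime, r) -> new CritiqueResult(s, a, sPrime, r);

        // The loop from the class description: query the actor, take the action,
        // observe the result, have the critic critique it, pass the critique back.
        int state = 0;
        for (int t = 0; t < 50; t++) {
            int a = actor.selectAction(state);
            double r = (a == 1) ? 1.0 : 0.0;   // action 1 is always better
            int next = 0;                      // single-state toy problem
            CritiqueResult critique = critic.critique(state, a, next, r);
            actor.update(critique);
            state = next;
        }

        SimpleActor sa = (SimpleActor) actor;
        System.out.println(sa.pref[1] > sa.pref[0]); // true: learned to prefer action 1
    }
}
```

In BURLAP itself the analogous loop lives inside runLearningEpisodeFrom, with the concrete Actor and Critic implementations supplied through the constructor.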
Modifier and Type | Field and Description |
---|---|
protected Actor | actor: The actor component to use. |
protected Critic | critic: The critic component to use. |
protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE. |
protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
Modifier and Type | Method and Description |
---|---|
void | addNonDomainReferencedAction(Action a): Adds an additional action to the planner that is not included in the domain definition. |
java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes(): Returns all saved EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis | getLastLearningEpisode(): Returns the last learning episode of the agent. |
Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
void | planFromState(State initialState): Causes the planner to begin planning from the specified initial state. |
void | resetPlannerResults(): Resets all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. |
EpisodeAnalysis | runLearningEpisodeFrom(State initialState): Causes the agent to perform a learning episode starting in the given initial state. |
EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state, taking no more than maxSteps steps. |
void | setNumEpisodesToStore(int numEps): Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. |
Methods inherited from class OOMDPPlanner: getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
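As a rough illustration of how the episodeHistory field and setNumEpisodesToStore(int) interact, a bounded history can be kept in a LinkedList by evicting the oldest episode when the cap is reached. This is a hypothetical sketch (String stands in for EpisodeAnalysis), not BURLAP's actual implementation.

```java
import java.util.LinkedList;

public class EpisodeHistorySketch {
    // Mirrors the episodeHistory / numEpisodesToStore pair, with String
    // standing in for EpisodeAnalysis.
    private final LinkedList<String> episodeHistory = new LinkedList<>();
    private int numEpisodesToStore = 5;

    void recordEpisode(String episode) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.removeFirst(); // evict the oldest episode
        }
        episodeHistory.addLast(episode);  // keep the most recent
    }

    public static void main(String[] args) {
        EpisodeHistorySketch agent = new EpisodeHistorySketch();
        for (int i = 1; i <= 8; i++) {
            agent.recordEpisode("episode-" + i);
        }
        // Only the 5 most recent episodes survive.
        System.out.println(agent.episodeHistory);
        // prints [episode-4, episode-5, episode-6, episode-7, episode-8]
    }
}
```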
protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique

public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up

public void addNonDomainReferencedAction(Action a)
Description copied from class: OOMDPPlanner
Adds an additional action to the planner that is not included in the domain definition. For instance, an Option should be added using this method.
Overrides:
addNonDomainReferencedAction in class OOMDPPlanner
Parameters:
a - the action to add to the planner

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by:
runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - The initial state in which the agent will start the episode.
Returns:
the learning episode, stored in an EpisodeAnalysis object.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state, taking no more than the given maximum number of steps.
Specified by:
runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode
Returns:
the learning episode, stored in an EpisodeAnalysis object.

public EpisodeAnalysis getLastLearningEpisode()
Description copied from interface: LearningAgent
Returns the last learning episode of the agent.
Specified by:
getLastLearningEpisode in interface LearningAgent

public void setNumEpisodesToStore(int numEps)
Description copied from interface: LearningAgent
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by:
setNumEpisodesToStore in interface LearningAgent
Parameters:
numEps - the number of learning episodes to remember.

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Description copied from interface: LearningAgent
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by:
getAllStoredLearningEpisodes in interface LearningAgent
Returns:
all saved EpisodeAnalysis objects of which the agent has kept track.

public void planFromState(State initialState)
Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
Specified by:
planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem

public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by:
resetPlannerResults in class OOMDPPlanner
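The interplay between the terminal function and the maxSteps argument of runLearningEpisodeFrom(State, int) (and likewise the maxEpisodeSize field) can be sketched as an episode loop that stops at whichever bound is hit first. The state representation, transition dynamics, and terminal condition below are invented stand-ins, not BURLAP's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class EpisodeLoopSketch {
    // Stand-in terminal function: states at or beyond 100 are terminal.
    static boolean isTerminal(int s) { return s >= 100; }

    // The episode runs until a terminal state is reached or maxSteps is hit.
    static List<int[]> runLearningEpisodeFrom(int initialState, int maxSteps) {
        List<int[]> transitions = new ArrayList<>();
        int state = initialState;
        int steps = 0;
        while (!isTerminal(state) && steps < maxSteps) {
            int action = 0;       // stand-in for querying the actor
            int next = state + 1; // stand-in transition dynamics
            transitions.add(new int[]{state, action, next});
            state = next;
            steps++;
        }
        return transitions;
    }

    public static void main(String[] args) {
        // Far from the terminal state, maxSteps caps the episode length.
        System.out.println(runLearningEpisodeFrom(0, 10).size());  // 10
        // Near the terminal state, the episode ends before the cap.
        System.out.println(runLearningEpisodeFrom(98, 10).size()); // 2
    }
}
```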