public class ActorCritic extends MDPSolver implements LearningAgent
This class provides a general structure for actor-critic learning, with the specific behavior supplied by Actor and Critic objects. The general structure of the learning algorithm is for the Actor to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior, which it returns in a CritiqueResult object that is passed along to the Actor so that the actor may update its behavior accordingly.
In addition to learning, this algorithm can also be used for planning via the planFromState(State) method. If you plan to use it for planning, you should call the initializeForPlanning(int) method before calling planFromState(State).
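For orientation, here is a minimal learning sketch built only from the members documented on this page. The BURLAP 3 package paths are assumptions, as are the pre-built domain, environment, actor, and critic objects (their concrete classes are not specified here) and the Environment.resetEnvironment() and Episode.numTimeSteps() calls:

```java
import burlap.behavior.singleagent.Episode;
import burlap.behavior.singleagent.learning.actorcritic.Actor;
import burlap.behavior.singleagent.learning.actorcritic.ActorCritic;
import burlap.behavior.singleagent.learning.actorcritic.Critic;
import burlap.mdp.singleagent.SADomain;
import burlap.mdp.singleagent.environment.Environment;

public class ActorCriticExample {

    // Sketch only: domain, env, actor, and critic are assumed to be
    // constructed elsewhere; concrete Actor/Critic implementations
    // are not part of this page.
    public static void learn(SADomain domain, Environment env,
                             Actor actor, Critic critic, int episodes) {
        ActorCritic agent = new ActorCritic(domain, 0.99, actor, critic);
        for (int i = 0; i < episodes; i++) {
            // One episode: the actor picks actions, the critic critiques each
            // transition, and the actor updates from the critique.
            Episode e = agent.runLearningEpisode(env);
            System.out.println("episode " + i + ": " + e.numTimeSteps() + " steps");
            env.resetEnvironment(); // start the next episode from an initial state
        }
    }
}
```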
Modifier and Type | Field and Description |
---|---|
protected Actor | actor: The actor component to use. |
protected Critic | critic: The critic component to use. |
protected java.util.LinkedList<Episode> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This is defaulted to Integer.MAX_VALUE. |
protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
ActorCritic(SADomain domain, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
ActorCritic(SADomain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
Modifier and Type | Method and Description |
---|---|
void | addActionType(ActionType a): Adds an additional action type to the solver that is not included in the domain definition. |
java.util.List<Episode> | getAllStoredLearningEpisodes() |
Episode | getLastLearningEpisode() |
Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
void | initializeForPlanning(int numEpisodesForPlanning): Sets the number of simulated episodes to use for planning when the planFromState(State) method is called. |
void | planFromState(State initialState) |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
Episode | runLearningEpisode(Environment env) |
Episode | runLearningEpisode(Environment env, int maxSteps) |
void | setNumEpisodesToStore(int numEps) |
Methods inherited from class MDPSolver: applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This is defaulted to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<Episode> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(SADomain domain, double gamma, Actor actor, Critic critic)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
public ActorCritic(SADomain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up.
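For example, a hypothetical construction that caps episodes at 1000 steps, keeping learning episodes finite in domains without terminal states (domain, actor, and critic are assumed to exist, as in the sketch above):

```java
// Sketch only: episodes are cut off after 1000 steps, so learning
// always makes progress even if no terminal state is reached.
ActorCritic agent = new ActorCritic(domain, 0.95, actor, critic, 1000);
```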
public void initializeForPlanning(int numEpisodesForPlanning)

Sets the number of simulated episodes to use for planning when the planFromState(State) method is called. If the RewardFunction and TerminalFunction are not set, the planFromState(State) method will throw a runtime exception.

Parameters:
numEpisodesForPlanning - the number of simulated episodes to run for planning.
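Continuing the sketch above, a hypothetical planning call; initialState is assumed to exist, and Policy is assumed to be burlap.behavior.policy.Policy:

```java
// Sketch only: use the learner as a planner over simulated episodes.
agent.initializeForPlanning(200);  // simulate 200 learning episodes per plan call
agent.planFromState(initialState); // throws at runtime if reward/terminal functions are unset
Policy p = agent.getPolicy();      // the trained actor, exposed as a Policy
```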
public void addActionType(ActionType a)

Description copied from interface: MDPSolverInterface
Adds an additional action type to the solver that is not included in the domain definition. For instance, an Option should be added using this method.

Specified by:
addActionType in interface MDPSolverInterface
Overrides:
addActionType in class MDPSolver
Parameters:
a - the action to add to the solver
public Episode runLearningEpisode(Environment env)

Specified by:
runLearningEpisode in interface LearningAgent
public Episode runLearningEpisode(Environment env, int maxSteps)

Specified by:
runLearningEpisode in interface LearningAgent
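Continuing the sketch, a hypothetical loop that bounds each episode and retains recent history via setNumEpisodesToStore(int) and getAllStoredLearningEpisodes(), both documented below:

```java
// Sketch only: run 50 bounded episodes and keep the 10 most recent.
agent.setNumEpisodesToStore(10);
for (int i = 0; i < 50; i++) {
    agent.runLearningEpisode(env, 500); // manually terminate after 500 steps
    env.resetEnvironment();             // reset before the next episode
}
java.util.List<Episode> recent = agent.getAllStoredLearningEpisodes();
```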
public Episode getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<Episode> getAllStoredLearningEpisodes()
public void planFromState(State initialState)
public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.

Specified by:
resetSolver in interface MDPSolverInterface
Overrides:
resetSolver in class MDPSolver