public class ActorCritic extends MDPSolver implements LearningAgent

A general class structure for implementing actor-critic learning, composed of Actor and Critic objects. The general structure of the learning algorithm is for the Actor to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior, which is returned in a CritiqueResult object and then passed along to the Actor so that the actor may update its behavior accordingly.

This class can also be used for planning through the planFromState(burlap.oomdp.core.states.State) method. If you plan to use it for planning, you should call the initializeForPlanning(burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int) method before calling planFromState(burlap.oomdp.core.states.State).
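The query, critique, and update cycle described above can be sketched in a small, self-contained example. The chain world, the epsilon-greedy preference actor, and the TD(0) critic below are illustrative assumptions; the class, method, and field names are stand-ins, not the BURLAP API.

```java
// A runnable sketch of the actor-critic cycle on a 5-state chain world.
// Actor: epsilon-greedy over action preferences. Critic: TD(0) state values.
import java.util.Random;

public class ActorCriticSketch {
    static final int GOAL = 4;        // terminal state of the chain 0..4
    static final double GAMMA = 0.9;  // discount factor
    static final double ALPHA = 0.1;  // learning rate for both components

    /** Trains for the given number of episodes; returns the actor's
     *  preferences, indexed as prefs[state][0 = left, 1 = right]. */
    public static double[][] train(int episodes, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[GOAL + 1];          // critic: state values
        double[][] prefs = new double[GOAL + 1][2]; // actor: action preferences

        for (int ep = 0; ep < episodes; ep++) {
            int s = 0;
            for (int t = 0; t < 100 && s != GOAL; t++) {
                // The actor is queried for an action in the current state.
                int a = (rng.nextDouble() < 0.2)
                        ? rng.nextInt(2)
                        : (prefs[s][1] >= prefs[s][0] ? 1 : 0);
                // The action is taken and a resulting state is observed.
                int s2 = Math.max(0, Math.min(GOAL, s + (a == 1 ? 1 : -1)));
                double r = (s2 == GOAL) ? 1.0 : 0.0;
                // The critic critiques the behavior (here, a TD error)...
                double critique = r + GAMMA * (s2 == GOAL ? 0.0 : v[s2]) - v[s];
                v[s] += ALPHA * critique;
                // ...and the critique is passed back so the actor can update.
                prefs[s][a] += ALPHA * critique;
                s = s2;
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        double[][] prefs = train(500, 7L);
        for (int s = 0; s < GOAL; s++) {
            System.out.printf("state %d: left=%.3f right=%.3f%n",
                    s, prefs[s][0], prefs[s][1]);
        }
    }
}
```

After training, the actor's preference for moving right exceeds its preference for moving left in every non-terminal state, since only rightward behavior earns positive critiques.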
| Modifier and Type | Field and Description |
|---|---|
| protected Actor | actor: The actor component to use. |
| protected Critic | critic: The critic component to use. |
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
| protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE. |
| protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
| protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
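As a minimal sketch of how the episodeHistory and numEpisodesToStore fields could interact, here is a bounded buffer that keeps only the N most recent episodes. This is a guess at the bookkeeping implied by the field descriptions, not BURLAP's actual implementation; the EpisodeBuffer name and its methods are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical bounded storage for "the N most recent learning episodes".
public class EpisodeBuffer<E> {
    private final LinkedList<E> history = new LinkedList<>();
    private int capacity;

    public EpisodeBuffer(int capacity) { this.capacity = capacity; }

    // Analogous in spirit to setNumEpisodesToStore(int).
    public void setCapacity(int capacity) { this.capacity = capacity; }

    public void record(E episode) {
        history.addLast(episode);
        while (history.size() > capacity) {
            history.removeFirst(); // drop the oldest episode
        }
    }

    public E last() { return history.getLast(); }

    public List<E> all() { return history; }
}
```

With capacity 2, recording three episodes retains only the last two, matching the "most recent" wording above.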
Fields inherited from class MDPSolver:
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
| Constructor and Description |
|---|
| ActorCritic(Domain domain, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
| ActorCritic(Domain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
| Modifier and Type | Method and Description |
|---|---|
| void | addNonDomainReferencedAction(Action a): Adds an additional action to the solver that is not included in the domain definition. |
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes() |
| EpisodeAnalysis | getLastLearningEpisode() |
| Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
| void | initializeForPlanning(RewardFunction rf, TerminalFunction tf, int numEpisodesForPlanning): Sets the RewardFunction, TerminalFunction, and the number of simulated episodes to use for planning when the planFromState(burlap.oomdp.core.states.State) method is called. |
| void | planFromState(State initialState) |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| EpisodeAnalysis | runLearningEpisode(Environment env) |
| EpisodeAnalysis | runLearningEpisode(Environment env, int maxSteps) |
| void | setNumEpisodesToStore(int numEps) |
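The relationship between planFromState, numEpisodesForPlanning, and maxEpisodeSize suggested by the summaries above can be sketched as planning by repeated simulated learning episodes. The Environment interface, ChainEnv class, and method bodies below are simplified assumptions for illustration, not the BURLAP implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanningBySimulationSketch {
    /** Simplified stand-in for an environment the agent can act in. */
    interface Environment {
        void reset();
        boolean step(); // advance one step; true once a terminal state is reached
    }

    /** Runs a single episode, capped at maxSteps; returns the steps taken. */
    static int runLearningEpisode(Environment env, int maxSteps) {
        env.reset();
        int steps = 0;
        while (steps < maxSteps) {
            steps++;
            if (env.step()) break; // terminal state reached
        }
        return steps;
    }

    /** "Planning" realized as a fixed number of simulated learning episodes. */
    static List<Integer> planFromState(Environment sim,
                                       int numEpisodesForPlanning,
                                       int maxEpisodeSize) {
        List<Integer> lengths = new ArrayList<>();
        for (int i = 0; i < numEpisodesForPlanning; i++) {
            lengths.add(runLearningEpisode(sim, maxEpisodeSize));
        }
        return lengths;
    }

    /** Toy environment that terminates after a fixed number of steps. */
    static class ChainEnv implements Environment {
        private final int horizon;
        private int t;
        ChainEnv(int horizon) { this.horizon = horizon; }
        public void reset() { t = 0; }
        public boolean step() { return ++t >= horizon; }
    }
}
```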
Methods inherited from class MDPSolver:
getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(Domain domain, double gamma, Actor actor, Critic critic)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique

public ActorCritic(Domain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up

public void initializeForPlanning(RewardFunction rf, TerminalFunction tf, int numEpisodesForPlanning)
Sets the RewardFunction, TerminalFunction, and the number of simulated episodes to use for planning when the planFromState(burlap.oomdp.core.states.State) method is called. If the RewardFunction and TerminalFunction are not set, the planFromState(burlap.oomdp.core.states.State) method will throw a runtime exception.

Parameters:
rf - the reward function to use for planning
tf - the terminal function to use for planning
numEpisodesForPlanning - the number of simulated episodes to run for planning

public void addNonDomainReferencedAction(Action a)
Description copied from interface: MDPSolverInterface
Adds an additional action to the solver that is not included in the domain definition; for example, an Option should be added using this method.

Specified by: addNonDomainReferencedAction in interface MDPSolverInterface
Overrides: addNonDomainReferencedAction in class MDPSolver

Parameters:
a - the action to add to the solver

public EpisodeAnalysis runLearningEpisode(Environment env)
Specified by: runLearningEpisode in interface LearningAgent

public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)

Specified by: runLearningEpisode in interface LearningAgent
public EpisodeAnalysis getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
public void planFromState(State initialState)
public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.

Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver