public class ActorCritic extends MDPSolver implements LearningAgent

A general class structure for implementing actor-critic learning, composed of Actor and Critic objects. The general structure of the learning algorithm is for the Actor to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior, which is returned in a CritiqueResult object and then passed along to the Actor so that the actor may update its behavior accordingly.

This class can also be used for planning through the planFromState(burlap.oomdp.core.states.State) method. If you plan to use it for planning, you should call the initializeForPlanning(burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int) method before calling planFromState(burlap.oomdp.core.states.State).
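The query, critique, and update cycle described above can be sketched in a small, self-contained example. The chain world, the epsilon-greedy preference actor, and the TD(0) critic below are illustrative assumptions; the class, method, and field names are stand-ins, not the BURLAP API.

```java
// A runnable sketch of the actor-critic cycle on a 5-state chain world.
// Actor: epsilon-greedy over action preferences. Critic: TD(0) state values.
import java.util.Random;

public class ActorCriticSketch {
    static final int GOAL = 4;        // terminal state of the chain 0..4
    static final double GAMMA = 0.9;  // discount factor
    static final double ALPHA = 0.1;  // learning rate for both components

    /** Trains for the given number of episodes; returns the actor's
     *  preferences, indexed as prefs[state][0 = left, 1 = right]. */
    public static double[][] train(int episodes, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[GOAL + 1];          // critic: state values
        double[][] prefs = new double[GOAL + 1][2]; // actor: action preferences

        for (int ep = 0; ep < episodes; ep++) {
            int s = 0;
            for (int t = 0; t < 100 && s != GOAL; t++) {
                // The actor is queried for an action in the current state.
                int a = (rng.nextDouble() < 0.2)
                        ? rng.nextInt(2)
                        : (prefs[s][1] >= prefs[s][0] ? 1 : 0);
                // The action is taken and a resulting state is observed.
                int s2 = Math.max(0, Math.min(GOAL, s + (a == 1 ? 1 : -1)));
                double r = (s2 == GOAL) ? 1.0 : 0.0;
                // The critic critiques the behavior (here, a TD error)...
                double critique = r + GAMMA * (s2 == GOAL ? 0.0 : v[s2]) - v[s];
                v[s] += ALPHA * critique;
                // ...and the critique is passed back so the actor can update.
                prefs[s][a] += ALPHA * critique;
                s = s2;
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        double[][] prefs = train(500, 7L);
        for (int s = 0; s < GOAL; s++) {
            System.out.printf("state %d: left=%.3f right=%.3f%n",
                    s, prefs[s][0], prefs[s][1]);
        }
    }
}
```

After training, the actor's preference for moving right exceeds its preference for moving left in every non-terminal state, since only rightward behavior earns positive critiques.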
| Modifier and Type | Field and Description |
|---|---|
| protected Actor | actor: The actor component to use. |
| protected Critic | critic: The critic component to use. |
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
| protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE. |
| protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
| protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
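As a minimal sketch of how the episodeHistory and numEpisodesToStore fields could interact, here is a bounded buffer that keeps only the N most recent episodes. This is a guess at the bookkeeping implied by the field descriptions, not BURLAP's actual implementation; the EpisodeBuffer name and its methods are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical bounded storage for "the N most recent learning episodes".
public class EpisodeBuffer<E> {
    private final LinkedList<E> history = new LinkedList<>();
    private int capacity;

    public EpisodeBuffer(int capacity) { this.capacity = capacity; }

    // Analogous in spirit to setNumEpisodesToStore(int).
    public void setCapacity(int capacity) { this.capacity = capacity; }

    public void record(E episode) {
        history.addLast(episode);
        while (history.size() > capacity) {
            history.removeFirst(); // drop the oldest episode
        }
    }

    public E last() { return history.getLast(); }

    public List<E> all() { return history; }
}
```

With capacity 2, recording three episodes retains only the last two, matching the "most recent" wording above.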
Fields inherited from class MDPSolver:
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
| Constructor and Description |
|---|
| ActorCritic(Domain domain, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
| ActorCritic(Domain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
| Modifier and Type | Method and Description |
|---|---|
| void | addNonDomainReferencedAction(Action a): Adds an additional action to the solver that is not included in the domain definition. |
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes() |
| EpisodeAnalysis | getLastLearningEpisode() |
| Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
| void | initializeForPlanning(RewardFunction rf, TerminalFunction tf, int numEpisodesForPlanning): Sets the RewardFunction, TerminalFunction, and the number of simulated episodes to use for planning when the planFromState(burlap.oomdp.core.states.State) method is called. |
| void | planFromState(State initialState) |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| EpisodeAnalysis | runLearningEpisode(Environment env) |
| EpisodeAnalysis | runLearningEpisode(Environment env, int maxSteps) |
| void | setNumEpisodesToStore(int numEps) |
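The relationship between planFromState, numEpisodesForPlanning, and maxEpisodeSize suggested by the summaries above can be sketched as planning by repeated simulated learning episodes. The Environment interface, ChainEnv class, and method bodies below are simplified assumptions for illustration, not the BURLAP implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanningBySimulationSketch {
    /** Simplified stand-in for an environment the agent can act in. */
    interface Environment {
        void reset();
        boolean step(); // advance one step; true once a terminal state is reached
    }

    /** Runs a single episode, capped at maxSteps; returns the steps taken. */
    static int runLearningEpisode(Environment env, int maxSteps) {
        env.reset();
        int steps = 0;
        while (steps < maxSteps) {
            steps++;
            if (env.step()) break; // terminal state reached
        }
        return steps;
    }

    /** "Planning" realized as a fixed number of simulated learning episodes. */
    static List<Integer> planFromState(Environment sim,
                                       int numEpisodesForPlanning,
                                       int maxEpisodeSize) {
        List<Integer> lengths = new ArrayList<>();
        for (int i = 0; i < numEpisodesForPlanning; i++) {
            lengths.add(runLearningEpisode(sim, maxEpisodeSize));
        }
        return lengths;
    }

    /** Toy environment that terminates after a fixed number of steps. */
    static class ChainEnv implements Environment {
        private final int horizon;
        private int t;
        ChainEnv(int horizon) { this.horizon = horizon; }
        public void reset() { t = 0; }
        public boolean step() { return ++t >= horizon; }
    }
}
```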
Methods inherited from class MDPSolver:
getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(Domain domain, double gamma, Actor actor, Critic critic)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique

public ActorCritic(Domain domain, double gamma, Actor actor, Critic critic, int maxEpisodeSize)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up

public void initializeForPlanning(RewardFunction rf, TerminalFunction tf, int numEpisodesForPlanning)
Sets the RewardFunction, TerminalFunction, and the number of simulated episodes to use for planning when the planFromState(burlap.oomdp.core.states.State) method is called. If the RewardFunction and TerminalFunction are not set, the planFromState(burlap.oomdp.core.states.State) method will throw a runtime exception.

Parameters:
rf - the reward function to use for planning
tf - the terminal function to use for planning
numEpisodesForPlanning - the number of simulated episodes to run for planning

public void addNonDomainReferencedAction(Action a)
Description copied from interface: MDPSolverInterface
Adds an additional action to the solver that is not included in the domain definition; for example, an Option should be added using this method.

Specified by: addNonDomainReferencedAction in interface MDPSolverInterface
Overrides: addNonDomainReferencedAction in class MDPSolver

Parameters:
a - the action to add to the solver

public EpisodeAnalysis runLearningEpisode(Environment env)
Specified by: runLearningEpisode in interface LearningAgent

public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)

Specified by: runLearningEpisode in interface LearningAgent
public EpisodeAnalysis getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
public void planFromState(State initialState)
public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.

Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver