public class ActorCritic extends OOMDPPlanner implements LearningAgent

This class provides a general structure for actor-critic learning. It relies on Actor and Critic objects. The general structure of the learning algorithm is for the Actor class to be queried for an action given the current state of the world. That action is taken and a resulting state is observed. The Critic is then asked to critique this behavior; the critique is returned in a CritiqueResult object and then passed along to the Actor so that the actor may update its behavior accordingly.

Nested classes inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping
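The query/critique/update loop described above can be sketched as follows. The Actor, Critic, and CritiqueResult types below are simplified stand-ins for illustration only, not BURLAP's actual interfaces, and the two-action toy problem is invented for the example.

```java
// Minimal, self-contained sketch of the actor-critic loop described above.
// Actor, Critic, and CritiqueResult here are simplified stand-ins, not the
// library's actual types.
public class ActorCriticLoopSketch {

    interface Actor {
        int selectAction(int state);          // queried for an action
        void update(CritiqueResult critique); // adjusts behavior from the critique
    }

    interface Critic {
        CritiqueResult critique(int s, int a, int sPrime, double r);
    }

    // Carries the critic's evaluation of a single transition.
    record CritiqueResult(int s, int a, int sPrime, double critique) {}

    // Toy actor over two actions: brief forced exploration, then greedy.
    static class SimpleActor implements Actor {
        double[] pref = {0.0, 0.0};
        int steps = 0;

        public int selectAction(int state) {
            steps++;
            if (steps <= 10) return steps % 2; // try both actions early
            return pref[1] > pref[0] ? 1 : 0;  // then act greedily
        }

        public void update(CritiqueResult c) {
            pref[c.a()] += 0.1 * c.critique(); // shift preference toward praised actions
        }
    }

    public static void main(String[] args) {
        Actor actor = new SimpleActor();
        // Trivial critic: uses the observed reward directly as the critique.
        Critic critic = (s, a, sPrime, r) -> new CritiqueResult(s, a, sPrime, r);

        // The loop from the class description: query the actor, take the action,
        // observe the result, have the critic critique it, pass the critique back.
        int state = 0;
        for (int t = 0; t < 50; t++) {
            int a = actor.selectAction(state);
            double r = (a == 1) ? 1.0 : 0.0;   // action 1 is always better
            int next = 0;                      // single-state toy problem
            CritiqueResult critique = critic.critique(state, a, next, r);
            actor.update(critique);
            state = next;
        }

        SimpleActor sa = (SimpleActor) actor;
        System.out.println(sa.pref[1] > sa.pref[0]); // true: learned to prefer action 1
    }
}
```

In BURLAP itself the analogous loop lives inside runLearningEpisodeFrom, with the concrete Actor and Critic implementations supplied through the constructor.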
Modifier and Type | Field and Description |
---|---|
protected Actor | actor: The actor component to use. |
protected Critic | critic: The critic component to use. |
protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved and most recent learning episodes this agent has performed. |
protected int | maxEpisodeSize: The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE. |
protected int | numEpisodesForPlanning: The number of simulated learning episodes to use when the planFromState(State) method is called. |
protected int | numEpisodesToStore: The number of most recent learning episodes to store. |
Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic): Initializes the learning algorithm. |
ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize): Initializes the learning algorithm. |
Modifier and Type | Method and Description |
---|---|
void | addNonDomainReferencedAction(Action a): Adds an additional action to the planner that is not included in the domain definition. |
java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes(): Returns all saved EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis | getLastLearningEpisode(): Returns the last learning episode of the agent. |
Policy | getPolicy(): Returns the policy/actor of this learning algorithm. |
void | planFromState(State initialState): Causes the planner to begin planning from the specified initial state. |
void | resetPlannerResults(): Resets all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. |
EpisodeAnalysis | runLearningEpisodeFrom(State initialState): Causes the agent to perform a learning episode starting in the given initial state. |
EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state, taking no more than maxSteps steps. |
void | setNumEpisodesToStore(int numEps): Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. |
Methods inherited from class OOMDPPlanner: getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
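As a rough illustration of how the episodeHistory field and setNumEpisodesToStore(int) interact, a bounded history can be kept in a LinkedList by evicting the oldest episode when the cap is reached. This is a hypothetical sketch (String stands in for EpisodeAnalysis), not BURLAP's actual implementation.

```java
import java.util.LinkedList;

public class EpisodeHistorySketch {
    // Mirrors the episodeHistory / numEpisodesToStore pair, with String
    // standing in for EpisodeAnalysis.
    private final LinkedList<String> episodeHistory = new LinkedList<>();
    private int numEpisodesToStore = 5;

    void recordEpisode(String episode) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.removeFirst(); // evict the oldest episode
        }
        episodeHistory.addLast(episode);  // keep the most recent
    }

    public static void main(String[] args) {
        EpisodeHistorySketch agent = new EpisodeHistorySketch();
        for (int i = 1; i <= 8; i++) {
            agent.recordEpisode("episode-" + i);
        }
        // Only the 5 most recent episodes survive.
        System.out.println(agent.episodeHistory);
        // prints [episode-4, episode-5, episode-6, episode-7, episode-8]
    }
}
```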
protected Actor actor
The actor component to use.

protected Critic critic
The critic component to use.

protected int maxEpisodeSize
The maximum number of steps of an episode before the agent will manually terminate it. This defaults to Integer.MAX_VALUE.

protected int numEpisodesForPlanning
The number of simulated learning episodes to use when the planFromState(State) method is called.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved and most recent learning episodes this agent has performed.

protected int numEpisodesToStore
The number of most recent learning episodes to store.
public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique

public ActorCritic(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, Actor actor, Critic critic, int maxEpisodeSize)

Initializes the learning algorithm.

Parameters:
domain - the domain in which to learn
rf - the reward function to use
tf - the terminal state function to use
gamma - the discount factor
actor - the actor component to use to select actions
critic - the critic component to use to critique
maxEpisodeSize - the maximum number of steps the agent will take in a learning episode before the agent gives up

public void addNonDomainReferencedAction(Action a)
Description copied from class: OOMDPPlanner
Adds an additional action to the planner that is not included in the domain definition. For instance, an Option should be added using this method.
Overrides:
addNonDomainReferencedAction in class OOMDPPlanner
Parameters:
a - the action to add to the planner

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state.
Specified by:
runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - The initial state in which the agent will start the episode.
Returns:
the learning episode, stored in an EpisodeAnalysis object.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Description copied from interface: LearningAgent
Causes the agent to perform a learning episode starting in the given initial state, taking no more than the given maximum number of steps.
Specified by:
runLearningEpisodeFrom in interface LearningAgent
Parameters:
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode
Returns:
the learning episode, stored in an EpisodeAnalysis object.

public EpisodeAnalysis getLastLearningEpisode()
Description copied from interface: LearningAgent
Returns the last learning episode of the agent.
Specified by:
getLastLearningEpisode in interface LearningAgent

public void setNumEpisodesToStore(int numEps)
Description copied from interface: LearningAgent
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by:
setNumEpisodesToStore in interface LearningAgent
Parameters:
numEps - the number of learning episodes to remember.

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Description copied from interface: LearningAgent
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by:
getAllStoredLearningEpisodes in interface LearningAgent
Returns:
all saved EpisodeAnalysis objects of which the agent has kept track.

public void planFromState(State initialState)
Description copied from class: OOMDPPlanner
This method will cause the planner to begin planning from the specified initial state.
Specified by:
planFromState in class OOMDPPlanner
Parameters:
initialState - the initial state of the planning problem

public void resetPlannerResults()
Description copied from class: OOMDPPlanner
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State), as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by:
resetPlannerResults in class OOMDPPlanner
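The interplay between the terminal function and the maxSteps argument of runLearningEpisodeFrom(State, int) (and likewise the maxEpisodeSize field) can be sketched as an episode loop that stops at whichever bound is hit first. The state representation, transition dynamics, and terminal condition below are invented stand-ins, not BURLAP's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class EpisodeLoopSketch {
    // Stand-in terminal function: states at or beyond 100 are terminal.
    static boolean isTerminal(int s) { return s >= 100; }

    // The episode runs until a terminal state is reached or maxSteps is hit.
    static List<int[]> runLearningEpisodeFrom(int initialState, int maxSteps) {
        List<int[]> transitions = new ArrayList<>();
        int state = initialState;
        int steps = 0;
        while (!isTerminal(state) && steps < maxSteps) {
            int action = 0;       // stand-in for querying the actor
            int next = state + 1; // stand-in transition dynamics
            transitions.add(new int[]{state, action, next});
            state = next;
            steps++;
        }
        return transitions;
    }

    public static void main(String[] args) {
        // Far from the terminal state, maxSteps caps the episode length.
        System.out.println(runLearningEpisodeFrom(0, 10).size());  // 10
        // Near the terminal state, the episode ends before the cap.
        System.out.println(runLearningEpisodeFrom(98, 10).size()); // 2
    }
}
```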