public class ARTDP extends OOMDPPlanner implements QComputablePlanner, LearningAgent
setPolicy(PlannerDerivedPolicy)
). The Q-value assigned to state-action pairs for entirely untried
transitions is reported as that returned by the provided value function initializer. In general, value function initialization should always be optimistic.
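The constructors below default to a Boltzmann policy with a fixed temperature of 0.1. As a self-contained sketch of what Boltzmann (softmax) action selection does — independent of BURLAP's own Policy classes — Q-values are turned into action probabilities like this:

```java
import java.util.Arrays;

public class BoltzmannSketch {

    // Convert Q-values into Boltzmann (softmax) action probabilities.
    // Lower temperature -> closer to greedy; higher -> closer to uniform.
    static double[] boltzmann(double[] qValues, double temperature) {
        double max = Arrays.stream(qValues).max().orElse(0.0);
        double[] probs = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            // Subtract the max before exponentiating for numerical stability.
            probs[i] = Math.exp((qValues[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] q = {1.0, 0.9, 0.0};
        // Temperature 0.1, matching the default in the ARTDP constructors.
        System.out.println(Arrays.toString(boltzmann(q, 0.1)));
    }
}
```

At temperature 0.1 even a 0.1 gap in Q-values makes the better action strongly preferred, which is why the fixed default behaves nearly greedily while still allowing exploration.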
1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
Modifier and Type | Class and Description |
---|---|
protected class |
ARTDP.ARTDPPlanner
The value function planner that operates on the modeled world.
|
QComputablePlanner.QComputablePlannerHelper
LearningAgent.LearningAgentBookKeeping
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<EpisodeAnalysis> |
episodeHistory
the saved previous learning episodes
|
protected int |
maxNumSteps
The maximum number of learning steps per episode before the agent gives up
|
protected Model |
model
The model of the world that is being learned.
|
protected ValueFunctionPlanner |
modelPlanner
The planner used on the modeled world to update the value function
|
protected int |
numEpisodesToStore
The number of the most recent learning episodes to store.
|
protected Policy |
policy
the policy to follow
|
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
double vInit)
Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1.
|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
Model model,
ValueFunctionInitialization vInit)
Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1.
|
ARTDP(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
ValueFunctionInitialization vInit)
Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1.
|
Modifier and Type | Method and Description |
---|---|
java.util.List<EpisodeAnalysis> |
getAllStoredLearningEpisodes()
Returns all saved
EpisodeAnalysis objects of which the agent has kept track. |
EpisodeAnalysis |
getLastLearningEpisode()
Returns the last learning episode of the agent.
|
QValue |
getQ(State s,
AbstractGroundedAction a)
Returns the
QValue for the given state-action pair. |
java.util.List<QValue> |
getQs(State s)
Returns a
List of QValue objects for every permissible action for the given input state. |
void |
planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state
|
void |
resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. |
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState)
Causes the agent to perform a learning episode starting in the given initial state.
|
EpisodeAnalysis |
runLearningEpisodeFrom(State initialState,
int maxSteps)
Causes the agent to perform a learning episode starting in the given initial state.
|
void |
setNumEpisodesToStore(int numEps)
Tells the agent how many
EpisodeAnalysis objects representing learning episodes to internally store. |
void |
setPolicy(PlannerDerivedPolicy policy)
Sets the policy to the provided one.
|
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected Model model
protected ValueFunctionPlanner modelPlanner
protected Policy policy
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int maxNumSteps
protected int numEpisodesToStore
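The episodeHistory and numEpisodesToStore fields above implement simple bookkeeping: only the N most recent learning episodes are retained. A minimal sketch of that eviction policy, using a plain LinkedList and a hypothetical EpisodeLog stand-in rather than BURLAP's EpisodeAnalysis:

```java
import java.util.LinkedList;
import java.util.List;

public class EpisodeHistorySketch {

    // Stand-in for a recorded learning episode (hypothetical type,
    // playing the role of BURLAP's EpisodeAnalysis).
    record EpisodeLog(int index) {}

    private final LinkedList<EpisodeLog> episodeHistory = new LinkedList<>();
    private int numEpisodesToStore = 1;

    void setNumEpisodesToStore(int numEps) {
        this.numEpisodesToStore = numEps;
    }

    // Record an episode, evicting the oldest once the cap is reached.
    void recordEpisode(EpisodeLog ep) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.poll(); // drop the oldest episode
        }
        episodeHistory.offer(ep);
    }

    List<EpisodeLog> getAllStoredLearningEpisodes() {
        return episodeHistory;
    }

    EpisodeLog getLastLearningEpisode() {
        return episodeHistory.getLast();
    }

    public static void main(String[] args) {
        EpisodeHistorySketch agent = new EpisodeHistorySketch();
        agent.setNumEpisodesToStore(5);
        for (int i = 0; i < 8; i++) {
            agent.recordEpisode(new EpisodeLog(i));
        }
        // After 8 episodes with a cap of 5, only the last 5 remain.
        System.out.println(agent.getAllStoredLearningEpisodes().size());
    }
}
```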
public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, double vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the constant value function initialization to use; should be optimistic.

public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, ValueFunctionInitialization vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the value function initialization to use; should be optimistic.

public ARTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, Model model, ValueFunctionInitialization vInit)
domain - the domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
model - the model algorithm to use
vInit - the value function initialization to use; should be optimistic.

public void setPolicy(PlannerDerivedPolicy policy)
Sets the policy to the provided one. Should be a policy that operates on a QComputablePlanner. Will automatically set its Q-source to this object.
policy - the policy to use.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
runLearningEpisodeFrom in interface LearningAgent
initialState - The initial state in which the agent will start the episode.
Returns: an EpisodeAnalysis object.

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
runLearningEpisodeFrom in interface LearningAgent
initialState - The initial state in which the agent will start the episode.
maxSteps - the maximum number of steps in the episode
Returns: an EpisodeAnalysis object.

public EpisodeAnalysis getLastLearningEpisode()
getLastLearningEpisode in interface LearningAgent
public void setNumEpisodesToStore(int numEps)
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
setNumEpisodesToStore in interface LearningAgent
numEps - the number of learning episodes to remember.

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track.

public void planFromState(State initialState)
planFromState in class OOMDPPlanner
initialState - the initial state of the planning problem

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
getQs in interface QComputablePlanner
s - the state for which Q-values are to be returned.
Returns: a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
getQ in interface QComputablePlanner
s - the input state
a - the input action
Returns: the QValue for the given state-action pair.

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class OOMDPPlanner
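Pulling the pieces together: ARTDP interleaves acting, learning a tabular model from observed transitions, and Bellman backups of the value function under that learned model. The following is a minimal, self-contained sketch of that loop on a tiny deterministic chain MDP. It uses plain arrays instead of BURLAP's Domain/Model/ValueFunctionPlanner classes, and a greedy policy in place of the Boltzmann policy, so it illustrates the algorithm rather than this class's API:

```java
public class ArtdpLoopSketch {

    // Tiny deterministic chain: states 0..3, actions 0 (left) / 1 (right).
    // Entering state 3 (the terminal goal) yields reward 1; all else 0.
    static final int N_STATES = 4, N_ACTIONS = 2, GOAL = 3;

    static int step(int s, int a) {
        return a == 1 ? Math.min(s + 1, GOAL) : Math.max(s - 1, 0);
    }

    // Run ARTDP-style learning and return the learned Q(0, right).
    static double run() {
        double gamma = 0.95;
        double[][] q = new double[N_STATES][N_ACTIONS];
        // Learned tabular model: transition counts and summed rewards.
        int[][][] counts = new int[N_STATES][N_ACTIONS][N_STATES];
        double[][] rewardSum = new double[N_STATES][N_ACTIONS];
        // Optimistic value initialization, as the class description advises.
        for (double[] row : q) java.util.Arrays.fill(row, 1.0);

        for (int episode = 0; episode < 200; episode++) {
            int s = 0;
            while (s != GOAL) {
                // Greedy action selection (ARTDP proper uses Boltzmann).
                int a = q[s][1] >= q[s][0] ? 1 : 0;
                int s2 = step(s, a);
                double r = (s2 == GOAL) ? 1.0 : 0.0;

                // Update the learned model from the observed transition.
                counts[s][a][s2]++;
                rewardSum[s][a] += r;

                // Bellman backup of Q(s,a) under the learned model.
                int total = 0;
                for (int c : counts[s][a]) total += c;
                double backup = rewardSum[s][a] / total; // expected reward
                for (int sp = 0; sp < N_STATES; sp++) {
                    if (counts[s][a][sp] == 0) continue;
                    double p = (double) counts[s][a][sp] / total;
                    double vNext = (sp == GOAL) ? 0.0
                            : Math.max(q[sp][0], q[sp][1]);
                    backup += gamma * p * vNext;
                }
                q[s][a] = backup;
                s = s2;
            }
        }
        // Converges to gamma^2 = 0.9025 for this 3-step chain.
        return q[0][1];
    }

    public static void main(String[] args) {
        System.out.printf("Q(0,right)=%.4f%n", run());
    }
}
```

Note how optimistic initialization drives exploration here: untried actions keep their inflated initial value, so the greedy policy is pulled toward them until real experience lowers their estimates.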