public class ARTDP extends MDPSolver implements QProvider, LearningAgent
An implementation of Adaptive Real-Time Dynamic Programming [1]. A different policy can be set through the setPolicy(burlap.behavior.policy.SolverDerivedPolicy) method. The Q-value assigned to state-action pairs for entirely untried transitions is reported as that returned by the value function initializer provided. In general, value function initialization should always be optimistic.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.

QProvider.Helper
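The untried-transition rule above (report the initializer's value until a state-action pair is actually updated) can be illustrated with a self-contained toy Q-table. The class and method names below are hypothetical, for illustration only; this is not BURLAP code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy tabular Q-table illustrating optimistic initialization: any
// state-action pair that has never been updated reports the value
// returned by the initializer, here a constant.
public class OptimisticQTable {
    static final double V_INIT = 10.0; // optimistic constant initialization
    static final Map<String, Double> q = new HashMap<>();

    static String key(String s, String a) { return s + "|" + a; }

    // Untried pairs fall back to the optimistic initializer value.
    static double qValue(String s, String a) {
        return q.getOrDefault(key(s, a), V_INIT);
    }

    static void update(String s, String a, double target) {
        q.put(key(s, a), target);
    }

    public static void main(String[] args) {
        update("s0", "left", 2.5);                 // one tried transition
        System.out.println(qValue("s0", "left"));  // learned value: 2.5
        System.out.println(qValue("s0", "right")); // untried: 10.0
    }
}
```

Optimism matters here because an agent following a greedy-ish policy is drawn toward untried pairs, which keeps exploration alive until real estimates replace the initializer.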
Modifier and Type | Field and Description |
---|---|
protected java.util.LinkedList<Episode> | episodeHistory: the saved previous learning episodes |
protected int | maxNumSteps: the maximum number of learning steps per episode before the agent gives up |
protected LearnedModel | model: the model of the world that is being learned |
protected DynamicProgramming | modelPlanner: the planner used on the modeled world to update the value function |
protected int | numEpisodesToStore: the number of most recent learning episodes to store |
protected Policy | policy: the policy to follow |
actionTypes, debugCode, domain, gamma, hashingFactory, usingOptionModel
Constructor and Description |
---|
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit): Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, LearnedModel model, ValueFunction vInit): Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1. |
ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit): Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. |
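All three constructors default to a Boltzmann (softmax) policy over Q-values with a fixed temperature of 0.1. The sketch below shows that selection rule in isolation; it is a stand-alone illustrative computation with hypothetical names, not the BURLAP implementation:

```java
// Boltzmann (softmax) action selection: P(a) is proportional to
// exp(Q(a) / temperature). A low temperature such as 0.1 makes
// selection nearly greedy while keeping every action's probability
// non-zero, so all actions remain eligible for exploration.
public class BoltzmannSketch {
    static double[] probabilities(double[] qValues, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qValues) max = Math.max(max, q);
        double[] p = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            // subtract the max before exponentiating for numerical stability
            p[i] = Math.exp((qValues[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        // with temperature 0.1, even a 0.1 gap between the top two
        // Q-values already produces a strong preference for the best action
        double[] p = probabilities(new double[]{1.0, 0.9, 0.0}, 0.1);
        System.out.printf("%.3f %.3f %.3f%n", p[0], p[1], p[2]);
    }
}
```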
Modifier and Type | Method and Description |
---|---|
java.util.List<Episode> | getAllStoredLearningEpisodes() |
Episode | getLastLearningEpisode() |
double | qValue(State s, Action a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state. |
void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
Episode | runLearningEpisode(Environment env) |
Episode | runLearningEpisode(Environment env, int maxSteps) |
void | setNumEpisodesToStore(int numEps) |
void | setPolicy(SolverDerivedPolicy policy): Sets the policy to the provided one. |
double | value(State s): Returns the value function evaluation of the given state. |
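For a QProvider, value(State) is conventionally the maximum Q-value over the actions applicable in the state (cf. QProvider.Helper). A stand-alone illustration with hypothetical names, not BURLAP code:

```java
// Illustrates the conventional relationship between a state's value
// and its Q-values: value(s) = max over actions a of Q(s, a).
public class MaxQSketch {
    static double value(double[] qValuesForState) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qValuesForState) max = Math.max(max, q);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(value(new double[]{0.2, 1.5, -0.3})); // prints 1.5
    }
}
```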
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
protected LearnedModel model
protected DynamicProgramming modelPlanner
protected Policy policy
protected java.util.LinkedList<Episode> episodeHistory
protected int maxNumSteps
protected int numEpisodesToStore
public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, double vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the constant value function initialization to use; should be optimistic.

public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
vInit - the value function initialization to use; should be optimistic.

public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, LearnedModel model, ValueFunction vInit)
domain - the domain
gamma - the discount factor
hashingFactory - the state hashing factory to use for the tabular model and the planning
model - the model algorithm to use
vInit - the value function initialization to use; should be optimistic.

public void setPolicy(SolverDerivedPolicy policy)
Sets the policy to the provided one. Should be a policy that operates on a QProvider. Will automatically set its Q-source to this object.
policy - the policy to use.

public Episode runLearningEpisode(Environment env)
runLearningEpisode in interface LearningAgent
public Episode runLearningEpisode(Environment env, int maxSteps)
runLearningEpisode in interface LearningAgent
public Episode getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<Episode> getAllStoredLearningEpisodes()
public java.util.List<QValue> qValues(State s)
Returns a List of QValue objects for every permissible action for the given input state.
qValues in interface QProvider

public double qValue(State s, Action a)
Returns the QValue for the given state-action pair.
qValue in interface QFunction

public double value(State s)
Returns the value function evaluation of the given state.
value in interface ValueFunction
s - the state to evaluate.

public void resetSolver()
Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
resetSolver in interface MDPSolverInterface
resetSolver in class MDPSolver