public class TimeIndexedTDLambda extends TDLambda

An extension of TDLambda, a Critic for ActorCritic algorithms [1], except that this class treats states at different depths as unique states. In general, the typical TDLambda method is recommended unless a special Actor object that exploits the time information is to be used as well.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
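The core idea of the class is the time-indexed value function (the `vTIndex` field documented below): one value table per depth, so the same state reached at different depths gets independent value estimates. A minimal sketch of that data structure, using plain `String` state keys in place of BURLAP's `HashableState` (class and method names here are illustrative, not BURLAP API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TimeIndexedValues {
    // One value table per time step/depth; states at different depths are distinct.
    private final List<Map<String, Double>> vTIndex = new ArrayList<>();
    private final double vinit;

    public TimeIndexedValues(double vinit) { this.vinit = vinit; }

    // Grow the list lazily and return the stored value, initializing unseen entries.
    public double getV(String state, int t) {
        while (vTIndex.size() <= t) {
            vTIndex.add(new HashMap<>());
        }
        return vTIndex.get(t).computeIfAbsent(state, s -> vinit);
    }

    public void setV(String state, int t, double value) {
        getV(state, t); // ensure the table for depth t exists
        vTIndex.get(t).put(state, value);
    }

    public static void main(String[] args) {
        TimeIndexedValues v = new TimeIndexedValues(0.0);
        v.setV("s0", 0, 1.5);
        // The same state at a different depth is treated as a unique state.
        System.out.println(v.getV("s0", 0)); // 1.5
        System.out.println(v.getV("s0", 3)); // 0.0 (fresh entry at depth 3)
    }
}
```

This is why finite-horizon problems benefit from the class: the value of a state can legitimately differ depending on how many steps remain in the episode.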
Modifier and Type | Class and Description |
---|---|
static class | TimeIndexedTDLambda.StateTimeElibilityTrace: Extends the standard TDLambda.StateEligibilityTrace to include time/depth information. |

Nested classes inherited from class TDLambda: TDLambda.StateEligibilityTrace
Modifier and Type | Field and Description |
---|---|
protected int | curTime: The current time index/depth of the current episode. |
protected int | maxEpisodeSize: The maximum number of steps possible in an episode. |
protected java.util.List<java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue>> | vTIndex: The time/depth indexed value function. |

Fields inherited from class TDLambda: lambda, learningRate, totalNumberOfSteps, traces, vIndex, vInitFunction

Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
TimeIndexedTDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda): Initializes the algorithm. |
TimeIndexedTDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunction vinit, double lambda, int maxEpisodeSize): Initializes the algorithm. |
TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda, int maxEpisodeSize): Initializes the algorithm. |
Modifier and Type | Method and Description |
---|---|
CritiqueResult | critiqueAndUpdate(EnvironmentOutcome eo): This method's implementation provides the critique for some specific instance of the behavior. |
void | endEpisode(): This method is called whenever a learning episode terminates. |
int | getCurTime(): Returns the current time/depth of the current episode. |
protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue | getV(HashableState sh, int t): Returns the TDLambda.VValue object (storing the value) for a given hashed state at the specified time/depth. |
void | initializeEpisode(State s): This method is called whenever a new learning episode begins. |
void | resetData(): Used to reset any data that was created/modified during learning so that learning can begin anew. |
void | setCurTime(int t): Sets the time/depth of the current episode. |

Methods inherited from class TDLambda: getV, resetSolver, setLearningRate, value

Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType
protected java.util.List<java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue>> vTIndex
protected int curTime
protected int maxEpisodeSize
public TimeIndexedTDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda)

gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks.
learningRate - the learning rate that affects how quickly the estimated value function is adjusted.
vinit - a constant value function initialization value to use.
lambda - indicates the strength of eligibility traces. Use 1 for Monte-Carlo-like traces and 0 for single-step backups.

public TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda, int maxEpisodeSize)

rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks.
learningRate - the learning rate that affects how quickly the estimated value function is adjusted.
vinit - a constant value function initialization value to use.
lambda - indicates the strength of eligibility traces. Use 1 for Monte-Carlo-like traces and 0 for single-step backups.
maxEpisodeSize - the maximum number of steps possible in an episode

public TimeIndexedTDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunction vinit, double lambda, int maxEpisodeSize)

gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks.
learningRate - the learning rate that affects how quickly the estimated value function is adjusted.
vinit - a method of initializing the value function for previously unvisited states.
lambda - indicates the strength of eligibility traces. Use 1 for Monte-Carlo-like traces and 0 for single-step backups.
maxEpisodeSize - the maximum number of steps possible in an episode

public int getCurTime()

Returns the current time/depth of the current episode.

public void setCurTime(int t)

t - the time/depth of the current episode.

public void initializeEpisode(State s)

This method is called whenever a new learning episode begins.
Specified by: initializeEpisode in interface Critic
Overrides: initializeEpisode in class TDLambda
s - the initial state of the new learning episode

public void endEpisode()

This method is called whenever a learning episode terminates.
Specified by: endEpisode in interface Critic
Overrides: endEpisode in class TDLambda

public CritiqueResult critiqueAndUpdate(EnvironmentOutcome eo)

This method's implementation provides the critique for some specific instance of the behavior.
Specified by: critiqueAndUpdate in interface Critic
Overrides: critiqueAndUpdate in class TDLambda
eo - the EnvironmentOutcome specifying the event

protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue getV(HashableState sh, int t)

Returns the TDLambda.VValue object (storing the value) for a given hashed state at the specified time/depth.
sh - the hashed state for which the value should be returned.
t - the time/depth at which the state is visited
Returns: the TDLambda.VValue object (storing the value) for the given hashed state at the specified time/depth
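The critique step above follows the standard TD(lambda) pattern over time-indexed states: compute the TD error delta = r + gamma * V(s', t+1) - V(s, t), then move every traced state's value toward it in proportion to its eligibility, decaying the traces afterwards. A self-contained sketch of that update, with "state@depth" string keys and plain doubles standing in for BURLAP's types (all names here are illustrative, not BURLAP API):

```java
import java.util.HashMap;
import java.util.Map;

public class TdLambdaSketch {
    static final double GAMMA = 0.9, LAMBDA = 0.5, ALPHA = 0.1;

    // Values and eligibility traces keyed by "state@depth" so that the same
    // state at different depths is treated as unique.
    static Map<String, Double> v = new HashMap<>();
    static Map<String, Double> traces = new HashMap<>();

    static String key(String s, int t) { return s + "@" + t; }

    // One critique step: observe (s at depth t) -> (sPrime at depth t+1) with reward r.
    static double critiqueAndUpdate(String s, int t, String sPrime, double r) {
        double vS = v.getOrDefault(key(s, t), 0.0);
        double vSPrime = v.getOrDefault(key(sPrime, t + 1), 0.0);
        double delta = r + GAMMA * vSPrime - vS; // TD error

        // The current state becomes (more) eligible.
        traces.merge(key(s, t), 1.0, Double::sum);

        // Every traced state moves toward the TD error; traces then decay.
        for (Map.Entry<String, Double> e : traces.entrySet()) {
            v.merge(e.getKey(), ALPHA * delta * e.getValue(), Double::sum);
            e.setValue(e.getValue() * GAMMA * LAMBDA);
        }
        return delta;
    }

    public static void main(String[] args) {
        double d1 = critiqueAndUpdate("s0", 0, "s1", 1.0);
        System.out.println(d1);                  // first TD error: 1.0
        System.out.println(v.get(key("s0", 0))); // updated value: 0.1
    }
}
```

Note how `v.getOrDefault(key(sPrime, t + 1), ...)` mirrors the class's `getV(sh, t)`: the bootstrapped value is always looked up at the next depth, which is the only behavioral difference from depth-agnostic TDLambda.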