public class TimeIndexedTDLambda extends TDLambda
A time-indexed variant of TDLambda for use as a critic in ActorCritic algorithms [1]; unlike TDLambda, this class treats states at different depths as unique states. In general, the typical TDLambda method is recommended unless a special
Actor object that exploits the time information is also to be used.
1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
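The sketch below illustrates how an episode loop might drive this critic through the Critic lifecycle methods documented on this page (initializeEpisode, critiqueAndUpdate, endEpisode). It is a minimal sketch, not a prescribed usage pattern: the actor and env objects and their methods (selectAction, sampleTransition, updateFromCritique) are hypothetical placeholders, and the rf, tf, and critic references are assumed to be constructed elsewhere.

```java
// Minimal sketch of an episode loop around a TimeIndexedTDLambda critic.
// Only initializeEpisode, critiqueAndUpdate, and endEpisode are documented
// methods of this class; actor, env, and tf are assumed placeholders.
void runEpisode(TimeIndexedTDLambda critic, State initialState, int maxSteps) {
    State s = initialState;
    critic.initializeEpisode(s);            // called whenever a new learning episode begins

    for (int t = 0; t < maxSteps && !tf.isTerminal(s); t++) {
        GroundedAction ga = actor.selectAction(s);                   // hypothetical actor choice
        State sprime = env.sampleTransition(s, ga);                  // hypothetical environment step
        CritiqueResult c = critic.critiqueAndUpdate(s, ga, sprime);  // depth-indexed TD critique
        actor.updateFromCritique(c);                                 // hypothetical actor update
        s = sprime;
    }

    critic.endEpisode();                    // called whenever a learning episode terminates
}
```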
| Modifier and Type | Class and Description |
|---|---|
| static class | TimeIndexedTDLambda.StateTimeElibilityTrace: Extends the standard TDLambda.StateEligibilityTrace to include time/depth information. |
Nested classes/interfaces inherited from class TDLambda: TDLambda.StateEligibilityTrace

| Modifier and Type | Field and Description |
|---|---|
| protected int | curTime: The current time index / depth of the current episode. |
| protected int | maxEpisodeSize: The maximum number of steps possible in an episode. |
| protected java.util.List<java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue>> | vTIndex: The time/depth indexed value function. |
Fields inherited from class TDLambda: gamma, hashingFactory, lambda, learningRate, rf, tf, totalNumberOfSteps, traces, vIndex, vInitFunction

| Constructor and Description |
|---|
| TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda): Initializes the algorithm. |
| TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda, int maxEpisodeSize): Initializes the algorithm. |
| TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunctionInitialization vinit, double lambda, int maxEpisodeSize): Initializes the algorithm. |
| Modifier and Type | Method and Description |
|---|---|
| CritiqueResult | critiqueAndUpdate(State s, GroundedAction ga, State sprime): This method's implementation provides the critique for some specific instance of the behavior. |
| void | endEpisode(): This method is called whenever a learning episode terminates. |
| int | getCurTime(): Returns the current time/depth of the current episode. |
| protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue | getV(HashableState sh, int t): Returns the TDLambda.VValue object (storing the value) for a given hashed state at the specified time/depth. |
| void | initializeEpisode(State s): This method is called whenever a new learning episode begins. |
| void | resetData(): Used to reset any data that was created/modified during learning so that learning can begin anew. |
| void | setCurTime(int t): Sets the time/depth of the current episode. |
Methods inherited from class TDLambda: addNonDomainReferencedAction, getV, setLearningRate, setRewardFunction, value

protected java.util.List<java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue>> vTIndex
The time/depth indexed value function.

protected int curTime
The current time index / depth of the current episode.

protected int maxEpisodeSize
The maximum number of steps possible in an episode.
public TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda)
Parameters:
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks
learningRate - the learning rate that affects how quickly the estimated value function is adjusted
vinit - a constant value function initialization value to use
lambda - indicates the strength of eligibility traces; use 1 for Monte Carlo-like traces and 0 for single-step backups

public TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda, int maxEpisodeSize)
Parameters:
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks
learningRate - the learning rate that affects how quickly the estimated value function is adjusted
vinit - a constant value function initialization value to use
lambda - indicates the strength of eligibility traces; use 1 for Monte Carlo-like traces and 0 for single-step backups
maxEpisodeSize - the maximum number of steps possible in an episode

public TimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunctionInitialization vinit, double lambda, int maxEpisodeSize)
Parameters:
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks
learningRate - the learning rate that affects how quickly the estimated value function is adjusted
vinit - a method of initializing the value function for previously unvisited states
lambda - indicates the strength of eligibility traces; use 1 for Monte Carlo-like traces and 0 for single-step backups
maxEpisodeSize - the maximum number of steps possible in an episode
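For reference, the following hedged sketch shows one way the constant-vinit constructor could be invoked. The buildCritic helper and the hyperparameter values are illustrative assumptions, not defaults of the class; rf, tf, and hashingFactory are taken as given.

```java
// Hypothetical helper that builds a time-indexed critic with illustrative
// hyperparameters; only the documented constructor call is part of this API.
static TimeIndexedTDLambda buildCritic(RewardFunction rf,
                                       TerminalFunction tf,
                                       HashableStateFactory hashingFactory) {
    double gamma = 0.99;         // discount factor
    double learningRate = 0.1;   // how quickly the estimated value function is adjusted
    double vinit = 0.0;          // constant initialization value for the value function
    double lambda = 0.9;         // eligibility trace strength (1 = Monte Carlo-like, 0 = single step)
    int maxEpisodeSize = 200;    // maximum number of steps possible in an episode

    return new TimeIndexedTDLambda(rf, tf, gamma, hashingFactory,
            learningRate, vinit, lambda, maxEpisodeSize);
}
```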
public int getCurTime()
Returns the current time/depth of the current episode.

public void setCurTime(int t)
Sets the time/depth of the current episode.
Parameters:
t - the time/depth of the current episode

public void initializeEpisode(State s)
Description copied from interface: Critic
This method is called whenever a new learning episode begins.
Specified by: initializeEpisode in interface Critic
Overrides: initializeEpisode in class TDLambda
Parameters:
s - the initial state of the new learning episode

public void endEpisode()
Description copied from interface: Critic
This method is called whenever a learning episode terminates.
Specified by: endEpisode in interface Critic
Overrides: endEpisode in class TDLambda

public CritiqueResult critiqueAndUpdate(State s, GroundedAction ga, State sprime)
Description copied from interface: Critic
This method's implementation provides the critique for some specific instance of the behavior.
Specified by: critiqueAndUpdate in interface Critic
Overrides: critiqueAndUpdate in class TDLambda
Parameters:
s - an input state
ga - an action taken in s
sprime - the state the agent transitioned to for taking action ga in state s

protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue getV(HashableState sh, int t)
Returns the TDLambda.VValue object (storing the value) for a given hashed state at the specified time/depth.
Parameters:
sh - the hashed state for which the value should be returned
t - the time/depth at which the state is visited
Returns: the TDLambda.VValue object (storing the value) for the given hashed state at the specified time/depth
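Because getV is protected, reading time-indexed values from outside the class requires a subclass. The sketch below is one hedged way to do that: hashingFactory is one of the inherited TDLambda fields listed above, while the field name used to pull the scalar value out of the returned TDLambda.VValue (here v) is an assumption and may differ.

```java
// Hedged sketch of a subclass that exposes the time-indexed value estimates.
public class InspectableTimeIndexedTDLambda extends TimeIndexedTDLambda {

    public InspectableTimeIndexedTDLambda(RewardFunction rf, TerminalFunction tf,
            double gamma, HashableStateFactory hashingFactory, double learningRate,
            double vinit, double lambda, int maxEpisodeSize) {
        super(rf, tf, gamma, hashingFactory, learningRate, vinit, lambda, maxEpisodeSize);
    }

    /** Returns the learned value estimate for state s when it is visited at time/depth t. */
    public double valueAtDepth(State s, int t) {
        HashableState sh = this.hashingFactory.hashState(s); // hashingFactory is inherited from TDLambda
        return this.getV(sh, t).v;                           // "v" is an assumed field name on TDLambda.VValue
    }
}
```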