TDLambda

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
- - burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda

All Implemented Interfaces:

Critic, MDPSolverInterface, ValueFunction

Direct Known Subclasses:

TimeIndexedTDLambda
```
public class TDLambda
extends MDPSolver
implements Critic, ValueFunction
```
An implementation of TD(λ) that can be used as the critic in actor-critic algorithms [1].

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.

Author:

James MacGlashan

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`TDLambda.StateEligibilityTrace` A data structure for storing the elements of an eligibility trace.

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`lambda` Indicates the strength of eligibility traces.
`protected LearningRate`	`learningRate`
`protected int`	`totalNumberOfSteps` The total number of learning steps performed by this agent.
`protected java.util.LinkedList<TDLambda.StateEligibilityTrace>`	`traces` The eligibility traces for the current episode.
`protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue>`	`vIndex` The state value function.
`protected ValueFunction`	`vInitFunction` Defines how the value function is initialized for unvisited states.

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel

Constructor Summary

Constructors
Constructor and Description
`TDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda)` Initializes the algorithm.
`TDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunction vinit, double lambda)` Initializes the algorithm.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`CritiqueResult`	`critiqueAndUpdate(EnvironmentOutcome eo)` This method's implementation provides the critique for some specific instance of the behavior.
`void`	`endEpisode()` This method is called whenever a learning episode terminates.
`protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue`	`getV(HashableState sh)` Returns the `TDLambda.VValue` object (storing the value) for a given hashed state.
`void`	`initializeEpisode(State s)` This method is called whenever a new learning episode begins.
`void`	`resetData()` Used to reset any data that was created or modified during learning so that learning can begin anew.
`void`	`resetSolver()` This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP.
`void`	`setLearningRate(LearningRate lr)` Sets the learning rate function to use.
`double`	`value(State s)` Returns the value function evaluation of the given state.

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.learning.actorcritic.Critic
addActionType

- Field Detail
  - learningRate
```
protected LearningRate learningRate
```
  - vInitFunction
```
protected ValueFunction vInitFunction
```
    Defines how the value function is initialized for unvisited states
  - lambda
```
protected double lambda
```
    Indicates the strength of eligibility traces. Use 1 for Monte Carlo-like traces and 0 for single-step backups.
  - vIndex
```
protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue> vIndex
```
    The state value function.
  - traces
```
protected java.util.LinkedList<TDLambda.StateEligibilityTrace> traces
```
    The eligibility traces for the current episode.
  - totalNumberOfSteps
```
protected int totalNumberOfSteps
```
    The total number of learning steps performed by this agent.
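The `vIndex` and `vInitFunction` fields together suggest a value table whose entries are created on first access. The following is a minimal plain-Java sketch of that pattern, not the BURLAP source: `LazyValueTable`, `getOrCreate`, and `set` are hypothetical names, a plain `double` stands in for `TDLambda.VValue`, and a `ToDoubleFunction` stands in for the `ValueFunction` initializer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a lazily initialized state-value table (not BURLAP code).
final class LazyValueTable<S> {
    private final Map<S, Double> values = new HashMap<>(); // plays the role of vIndex
    private final ToDoubleFunction<S> init;                // plays the role of vInitFunction

    LazyValueTable(ToDoubleFunction<S> init) {
        this.init = init;
    }

    // Returns the stored value, creating it from the initializer on first access.
    double getOrCreate(S state) {
        return values.computeIfAbsent(state, s -> init.applyAsDouble(s));
    }

    // Overwrites the stored value for a state.
    void set(S state, double v) {
        values.put(state, v);
    }

    // Number of states that have been touched so far.
    int size() {
        return values.size();
    }
}
```

A constant initializer such as `s -> 0.0` corresponds to the constructor that takes a `double vinit`; a state-dependent function corresponds to the one that takes a `ValueFunction`.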
- Constructor Detail
  - TDLambda
```
public TDLambda(double gamma,
                HashableStateFactory hashingFactory,
                double learningRate,
                double vinit,
                double lambda)
```
    Initializes the algorithm.
    
    Parameters:
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory to use for hashing states and performing equality checks.
    
    learningRate - the learning rate that affects how quickly the estimated value function is adjusted.
    
    vinit - a constant value function initialization value to use.
    
    lambda - indicates the strength of eligibility traces. Use 1 for Monte Carlo-like traces and 0 for single-step backups.
  - TDLambda
```
public TDLambda(double gamma,
                HashableStateFactory hashingFactory,
                double learningRate,
                ValueFunction vinit,
                double lambda)
```
    Initializes the algorithm.
    
    Parameters:
    
    gamma - the discount factor
    
    hashingFactory - the state hashing factory to use for hashing states and performing equality checks.
    
    learningRate - the learning rate that affects how quickly the estimated value function is adjusted.
    
    vinit - a method of initializing the value function for previously unvisited states.
    
    lambda - indicates the strength of eligibility traces. Use 1 for Monte Carlo-like traces and 0 for single-step backups.
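For orientation, the roles of `gamma`, `learningRate`, and `lambda` can be read off the textbook tabular TD(λ) update. The form below assumes accumulating traces and a fixed step size α; the exact trace rule this class applies is not stated on this page, so treat the equations as a reference sketch rather than a description of the implementation.

```latex
\begin{aligned}
\delta_t &= r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \\
e_t(s)   &= \gamma \lambda \, e_{t-1}(s) + \mathbf{1}[s = s_t] \\
V(s)     &\leftarrow V(s) + \alpha \, \delta_t \, e_t(s) \quad \text{for all } s
\end{aligned}
```

With λ = 0 the trace reduces to the indicator of the current state, so only V(s_t) moves (a single-step backup). With λ = 1 the trace of an earlier state decays only by γ per step, so each TD error is propagated to every state visited so far in the episode, which is the Monte Carlo-like behavior.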
- Method Detail
  - initializeEpisode
```
public void initializeEpisode(State s)
```
    Description copied from interface: Critic
    
    This method is called whenever a new learning episode begins
    
    Specified by:
    
    initializeEpisode in interface Critic
    
    Parameters:
    
    s - the initial state of the new learning episode
  - endEpisode
```
public void endEpisode()
```
    Description copied from interface: Critic
    
    This method is called whenever a learning episode terminates
    
    Specified by:
    
    endEpisode in interface Critic
  - setLearningRate
```
public void setLearningRate(LearningRate lr)
```
    Sets the learning rate function to use.
    
    Parameters:
    
    lr - the learning rate function to use.
  - critiqueAndUpdate
```
public CritiqueResult critiqueAndUpdate(EnvironmentOutcome eo)
```
    Description copied from interface: Critic
    
    This method's implementation provides the critique for some specific instance of the behavior.
    
    Specified by:
    
    critiqueAndUpdate in interface Critic
    
    Parameters:
    
    eo - the EnvironmentOutcome specifying the event
    
    Returns:
    
    the critique of this behavior.
  - value
```
public double value(State s)
```
    Description copied from interface: ValueFunction
    
    Returns the value function evaluation of the given state. If the value is not stored, then the default value specified by this class's value function initialization (`vInitFunction`) is returned.
    
    Specified by:
    
    value in interface ValueFunction
    
    Parameters:
    
    s - the state to evaluate.
    
    Returns:
    
    the value function evaluation of the given state.
  - resetSolver
```
public void resetSolver()
```
    Description copied from interface: MDPSolverInterface
    
    This method resets all solver results so that a solver can be restarted fresh, as if it had never solved the MDP.
    
    Specified by:
    
    resetSolver in interface MDPSolverInterface
    
    Specified by:
    
    resetSolver in class MDPSolver
  - resetData
```
public void resetData()
```
    Description copied from interface: Critic
    
    Used to reset any data that was created or modified during learning so that learning can begin anew.
    
    Specified by:
    
    resetData in interface Critic
  - getV
```
protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue getV(HashableState sh)
```
    Returns the TDLambda.VValue object (storing the value) for a given hashed state.
    
    Parameters:
    
    sh - the hashed state for which the value should be returned.
    
    Returns:
    
    the TDLambda.VValue object (storing the value) for the given hashed state.
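The episode methods above (`initializeEpisode`, `critiqueAndUpdate`, `endEpisode`) can be pictured with a small stand-alone sketch of a tabular TD(λ) critic. This is an illustration under stated assumptions, not the BURLAP implementation: `TdLambdaSketch` is a hypothetical class, states are plain strings instead of `HashableState`, the transition is passed as separate arguments instead of an `EnvironmentOutcome`, the step size is constant, traces are accumulating, and the returned critique is simply the TD error.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative tabular TD(lambda) critic with accumulating traces (not the BURLAP source).
final class TdLambdaSketch {
    private final double gamma;   // discount factor
    private final double alpha;   // constant learning rate (assumption)
    private final double vInit;   // value assumed for unvisited states
    private final double lambda;  // trace decay strength
    private final Map<String, Double> v = new HashMap<>();      // state values
    private final Map<String, Double> traces = new HashMap<>(); // eligibility per state

    TdLambdaSketch(double gamma, double alpha, double vInit, double lambda) {
        this.gamma = gamma;
        this.alpha = alpha;
        this.vInit = vInit;
        this.lambda = lambda;
    }

    // Current estimate for a state, falling back to vInit if never updated.
    double value(String s) {
        return v.getOrDefault(s, vInit);
    }

    // Traces are per-episode, so they are cleared at both episode boundaries.
    void initializeEpisode() { traces.clear(); }
    void endEpisode() { traces.clear(); }

    // Processes one transition and returns the TD error as the critique.
    double critiqueAndUpdate(String s, double reward, String sPrime, boolean terminal) {
        double next = terminal ? 0.0 : value(sPrime);
        double delta = reward + gamma * next - value(s);
        traces.merge(s, 1.0, Double::sum); // bump the trace of the visited state
        for (Map.Entry<String, Double> e : traces.entrySet()) {
            String state = e.getKey();
            v.put(state, value(state) + alpha * delta * e.getValue());
            e.setValue(e.getValue() * gamma * lambda); // decay for the next step
        }
        return delta;
    }
}
```

On a two-step episode A → B → terminal with reward 1 on the last step, λ = 1 lets the final TD error reach A as well as B, while λ = 0 updates only B.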
