public class TDLambda extends java.lang.Object implements Critic, ValueFunction

A TD(λ) critic that learns a state value function from experience and provides critiques for ActorCritic algorithms [1].

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.

Modifier and Type | Class and Description
---|---
static class | TDLambda.StateEligibilityTrace: A data structure for storing the elements of an eligibility trace.
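To make the role of eligibility traces concrete, the following is a self-contained sketch of the TD(λ) update this critic performs. It is a simplified illustration (integer state IDs, plain maps), not BURLAP's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative TD(lambda) critic: on each transition (s, r, sPrime) it
// computes the TD error and distributes it over all recently visited
// states in proportion to their eligibility traces.
class TdLambdaSketch {
    final double gamma;   // discount factor
    final double lambda;  // trace decay strength
    final double alpha;   // learning rate
    final Map<Integer, Double> v = new HashMap<>();       // state values
    final Map<Integer, Double> traces = new HashMap<>();  // eligibility traces

    TdLambdaSketch(double gamma, double lambda, double alpha) {
        this.gamma = gamma; this.lambda = lambda; this.alpha = alpha;
    }

    double value(int s) { return v.getOrDefault(s, 0.0); }

    // One learning step for the transition s --(reward r)--> sPrime.
    void critiqueAndUpdate(int s, double r, int sPrime, boolean terminal) {
        double target = terminal ? r : r + gamma * value(sPrime);
        double delta = target - value(s);      // TD error
        traces.merge(s, 1.0, Double::sum);     // accumulate the trace for s
        for (Map.Entry<Integer, Double> e : traces.entrySet()) {
            int state = e.getKey();
            double trace = e.getValue();
            v.put(state, value(state) + alpha * delta * trace); // trace-weighted update
            e.setValue(gamma * lambda * trace);                 // decay the trace
        }
    }

    // Traces are cleared between episodes, mirroring endEpisode().
    void endEpisode() { traces.clear(); }
}
```

In the real class, the traces list corresponds to the `traces` field of `TDLambda.StateEligibilityTrace` objects, and the value map to `vIndex`.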
Modifier and Type | Field and Description
---|---
protected double | gamma: The discount factor.
protected HashableStateFactory | hashingFactory: The state hashing factory used for hashing states and performing state equality checks.
protected double | lambda: Indicates the strength of eligibility traces.
protected LearningRate | learningRate: The learning rate function that affects how quickly the estimated value function changes.
protected RewardFunction | rf: The reward function used for learning.
protected TerminalFunction | tf: The state termination function used to indicate end states.
protected int | totalNumberOfSteps: The total number of learning steps performed by this agent.
protected java.util.LinkedList<TDLambda.StateEligibilityTrace> | traces: The eligibility traces for the current episode.
protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue> | vIndex: The state value function.
protected ValueFunctionInitialization | vInitFunction: Defines how the value function is initialized for unvisited states.
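The vIndex and vInitFunction fields work together: when a state is looked up for the first time, its value entry is created from the initialization function and cached. A minimal sketch of that lazy-initialization pattern (String keys and a one-element double array are simplified stand-ins for HashableState and TDLambda.VValue):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Lazily initialized value table: unvisited states get their value from
// an initialization function the first time they are looked up.
class LazyValueTable {
    private final Map<String, double[]> vIndex = new HashMap<>(); // mutable value cell per state
    private final ToDoubleFunction<String> vInit;                 // stand-in for ValueFunctionInitialization

    LazyValueTable(ToDoubleFunction<String> vInit) { this.vInit = vInit; }

    // Mirrors getV(sh): create-and-cache the value cell on first access.
    double[] getV(String state) {
        return vIndex.computeIfAbsent(state, s -> new double[]{ vInit.applyAsDouble(s) });
    }
}
```

Because the same mutable cell is returned on every lookup, updates written through it persist, which is why `getV` can hand callers the value object directly.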
Constructor and Description
---
TDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda): Initializes the algorithm.
TDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunctionInitialization vinit, double lambda): Initializes the algorithm.
Modifier and Type | Method and Description
---|---
void | addNonDomainReferencedAction(Action a): Allows the critic to critique actions that are not a part of the domain definition.
CritiqueResult | critiqueAndUpdate(State s, GroundedAction ga, State sprime): Provides the critique for some specific instance of the behavior.
void | endEpisode(): Called whenever a learning episode terminates.
protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue | getV(HashableState sh): Returns the TDLambda.VValue object (storing the value) for a given hashed state.
void | initializeEpisode(State s): Called whenever a new learning episode begins.
void | resetData(): Resets any data that was created/modified during learning so that learning can begin anew.
void | setLearningRate(LearningRate lr): Sets the learning rate function to use.
void | setRewardFunction(RewardFunction rf): Sets the reward function to use.
double | value(State s): Returns the value function evaluation of the given state.
protected RewardFunction rf
protected TerminalFunction tf
protected double gamma
protected HashableStateFactory hashingFactory
protected LearningRate learningRate
protected ValueFunctionInitialization vInitFunction
protected double lambda
protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue> vIndex
protected java.util.LinkedList<TDLambda.StateEligibilityTrace> traces
protected int totalNumberOfSteps
public TDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, double vinit, double lambda)
Initializes the algorithm.
Parameters:
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks
learningRate - the learning rate that affects how quickly the estimated value function is adjusted
vinit - a constant value function initialization value to use
lambda - indicates the strength of eligibility traces; use 1 for Monte Carlo-like traces and 0 for single-step backups

public TDLambda(RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunctionInitialization vinit, double lambda)
Initializes the algorithm.
Parameters:
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use for hashing states and performing equality checks
learningRate - the learning rate that affects how quickly the estimated value function is adjusted
vinit - a method of initializing the value function for previously unvisited states
lambda - indicates the strength of eligibility traces; use 1 for Monte Carlo-like traces and 0 for single-step backups

public void addNonDomainReferencedAction(Action a)
Allows the critic to critique actions that are not a part of the domain definition.
Specified by: addNonDomainReferencedAction in interface Critic
Parameters:
a - an action not a part of the domain definition that this critic should be able to critique

public void setRewardFunction(RewardFunction rf)
Sets the reward function to use.
Parameters:
rf - the reward function

public void initializeEpisode(State s)
Called whenever a new learning episode begins.
Specified by: initializeEpisode in interface Critic
Parameters:
s - the initial state of the new learning episode

public void endEpisode()
Called whenever a learning episode terminates.
Specified by: endEpisode in interface Critic

public void setLearningRate(LearningRate lr)
Sets the learning rate function to use.
Parameters:
lr - the learning rate function to use

public CritiqueResult critiqueAndUpdate(State s, GroundedAction ga, State sprime)
Provides the critique for some specific instance of the behavior.
Specified by: critiqueAndUpdate in interface Critic
Parameters:
s - an input state
ga - an action taken in s
sprime - the state the agent transitioned to for taking action ga in state s

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate

public void resetData()
Resets any data that was created/modified during learning so that learning can begin anew.
Specified by: resetData in interface Critic

protected burlap.behavior.singleagent.learning.actorcritic.critics.TDLambda.VValue getV(HashableState sh)
Returns the TDLambda.VValue object (storing the value) for a given hashed state.
Parameters:
sh - the hashed state for which the value should be returned
Returns: the TDLambda.VValue object (storing the value) for the given hashed state
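The lambda parameter controls how far each TD error propagates back through an episode: with lambda = 0 only the most recently visited state is updated (single-step backups), while with lambda = 1 the error flows back to every state in the episode (Monte Carlo-like traces). The following self-contained sketch (simplified integer states, not BURLAP code) runs the same three-step episode under both settings:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Demonstrates how lambda controls credit assignment in TD(lambda).
// A three-step episode 0 -> 1 -> 2 -> terminal yields reward 1 only at the
// end; we run the standard trace-based update and inspect which states changed.
class LambdaEffect {
    static Map<Integer, Double> run(double lambda) {
        double gamma = 1.0, alpha = 0.5;
        Map<Integer, Double> v = new LinkedHashMap<>();
        Map<Integer, Double> traces = new LinkedHashMap<>();
        int[][] steps = { {0, 1}, {1, 2}, {2, -1} };  // (state, nextState); -1 = terminal
        for (int[] step : steps) {
            int s = step[0], sPrime = step[1];
            double r = (sPrime == -1) ? 1.0 : 0.0;    // reward only on the final step
            double target = (sPrime == -1) ? r : r + gamma * v.getOrDefault(sPrime, 0.0);
            double delta = target - v.getOrDefault(s, 0.0);  // TD error
            traces.merge(s, 1.0, Double::sum);               // accumulate the trace for s
            for (Map.Entry<Integer, Double> e : traces.entrySet()) {
                int st = e.getKey();
                v.put(st, v.getOrDefault(st, 0.0) + alpha * delta * e.getValue());
                e.setValue(gamma * lambda * e.getValue());   // traces decay by gamma * lambda
            }
        }
        return v;
    }
}
```

With lambda = 0 only state 2 (the state immediately before the reward) gains value; with lambda = 1 the final TD error is credited to states 0 and 1 as well.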