public class LSPI extends OOMDPPlanner implements QComputablePlanner, LearningAgent
Rather than calling the planFromState(State) or runLearningEpisodeFrom(State) methods, you should instead use a SARSCollector object to gather a set of example state-action-reward-state (SARS) tuples that are then used for policy iteration. You can set the dataset with the setDataset(SARSData) method and then run LSPI on it with the runPolicyIteration(int, double) method. LSPI requires initializing a matrix to an identity matrix multiplied by some large positive constant (see the reference for more information). By default this constant is 100, but you can change it with the setIdentityScalar(double) method.
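For example, a minimal sketch of this recommended workflow is shown below. The variable names, the discount factor, and the iteration and termination values are illustrative placeholders, BURLAP import statements are omitted, and only methods documented on this page are used.

```java
// Assumes a Domain (domain), RewardFunction (rf), TerminalFunction (tf),
// FeatureDatabase (fd), and an already collected SARSData instance
// (collectedData) exist; all names here are placeholders.
LSPI lspi = new LSPI(domain, rf, tf, 0.99, fd);

// Optionally change the identity matrix scalar from its default of 100.
lspi.setIdentityScalar(100.0);

// Provide the SARS dataset and run at most 30 policy iterations,
// terminating early once the weight change drops below 1e-6.
lspi.setDataset(collectedData);
lspi.runPolicyIteration(30, 1e-6);
```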
If you do use the planFromState(State) method, it works by creating a SARSCollector.UniformRandomSARSCollector, collecting SARS data from the input state, and then calling the runPolicyIteration(int, double) method. You can change the SARSCollector this method uses, the number of samples it acquires, the maximum weight change for policy-iteration termination, and the maximum number of policy iterations with the setPlanningCollector(SARSCollector), setNumSamplesForPlanning(int), setMaxChange(double), and setMaxNumPlanningIterations(int) methods, respectively.
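A sketch of configuring that planning pathway is shown below; the parameter values are illustrative, and the single-argument Domain constructor assumed for SARSCollector.UniformRandomSARSCollector should be checked against the SARSCollector documentation.

```java
// Assumes lspi was constructed as above and initialState is some State.
lspi.setNumSamplesForPlanning(5000);   // number of SARS tuples to collect
lspi.setMaxNumPlanningIterations(30);  // cap on policy iterations
lspi.setMaxChange(1e-6);               // weight-change threshold for termination
// Optionally swap in a different collector; the constructor below is an assumption.
lspi.setPlanningCollector(new SARSCollector.UniformRandomSARSCollector(domain));

// Collect data from initialState and run policy iteration on it.
lspi.planFromState(initialState);

// After planning, Q-values can be queried for any state.
java.util.List<QValue> qs = lspi.getQs(initialState);
```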
If you do use the runLearningEpisodeFrom(State) method (or the runLearningEpisodeFrom(State, int) method), it works by following a learning policy for the episode and adding the observations to its dataset for policy iteration. After enough new data has been acquired, policy iteration is rerun. You can adjust the learning policy, the maximum number of learning steps allowed in an episode, and the minimum number of new observations before LSPI is rerun with the setLearningPolicy(Policy), setMaxLearningSteps(int), and setMinNewStepsForLearningPI(int) methods, respectively. The LSPI termination parameters are set with the same methods used to adjust the results of the planFromState(State) method discussed above.
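A sketch of that learning pathway under explicit settings is shown below; the EpsilonGreedy policy class and its (planner, epsilon) constructor are assumptions, as are the numeric values.

```java
// Assumes lspi and an initial State (initialState) already exist.
// EpsilonGreedy and its (planner, epsilon) constructor are assumed here;
// by default LSPI already follows a 0.1 epsilon-greedy learning policy.
lspi.setLearningPolicy(new EpsilonGreedy(lspi, 0.1));
lspi.setMaxLearningSteps(1000);          // cap each learning episode at 1000 steps
lspi.setMinNewStepsForLearningPI(200);   // rerun LSPI after 200 new observations

for (int i = 0; i < 50; i++) {
    EpisodeAnalysis ea = lspi.runLearningEpisodeFrom(initialState);
    // ea holds the episode; recent episodes are also stored internally
    // (see setNumEpisodesToStore(int) and getAllStoredLearningEpisodes()).
}
```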
This data-gathering and replanning behavior from learning episodes is not expected to be an especially good approach. Therefore, if you want better online data acquisition, you should consider subclassing this class and overriding the updateDatasetWithLearningEpisode(EpisodeAnalysis) and shouldRereunPolicyIteration(EpisodeAnalysis) methods, or the runLearningEpisodeFrom(State, int) method itself.
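For instance, a skeleton of such a subclass might look like the following; the retention and rerun logic shown is purely illustrative.

```java
// Hypothetical subclass that customizes when and how the dataset is updated
// from learning episodes. Only methods documented on this page are overridden.
public class MyOnlineLSPI extends LSPI {

    public MyOnlineLSPI(Domain domain, RewardFunction rf, TerminalFunction tf,
                        double gamma, FeatureDatabase fd) {
        super(domain, rf, tf, gamma, fd);
    }

    @Override
    protected void updateDatasetWithLearningEpisode(EpisodeAnalysis ea) {
        // Custom data-retention logic could go here (e.g., a sliding window of
        // recent experience); the superclass adds the episode's observations
        // to the dataset.
        super.updateDatasetWithLearningEpisode(ea);
    }

    @Override
    protected boolean shouldRereunPolicyIteration(EpisodeAnalysis ea) {
        // Illustrative choice: rerun LSPI after every learning episode.
        return true;
    }
}
```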
1. Lagoudakis, Michail G., and Ronald Parr. "Least-squares policy iteration." The Journal of Machine Learning Research 4 (2003): 1107-1149.

| Modifier and Type | Class and Description |
|---|---|
| protected class | LSPI.SSFeatures: Pair of the state-action features and the next state-action features. |
Nested classes/interfaces inherited from interface QComputablePlanner: QComputablePlanner.QComputablePlannerHelper
Nested classes/interfaces inherited from interface LearningAgent: LearningAgent.LearningAgentBookKeeping

| Modifier and Type | Field and Description |
|---|---|
| protected SARSData | dataset: The SARS dataset on which LSPI is performed. |
| protected java.util.LinkedList<EpisodeAnalysis> | episodeHistory: The saved previous learning episodes. |
| protected FeatureDatabase | featureDatabase: The state feature database on which the linear VFA is performed. |
| protected double | identityScalar: The initial LSPI identity matrix scalar; default is 100. |
| protected org.ejml.simple.SimpleMatrix | lastWeights: The last weight values set from LSTDQ. |
| protected Policy | learningPolicy: The learning policy followed in runLearningEpisodeFrom(State) method calls. |
| protected double | maxChange: The maximum change in weights permitted to terminate LSPI. |
| protected int | maxLearningSteps: The maximum number of learning steps in an episode when the runLearningEpisodeFrom(State) method is called. |
| protected int | maxNumPlanningIterations: The maximum number of policy iterations permitted when LSPI is run from the planFromState(State) or runLearningEpisodeFrom(State) methods. |
| protected int | minNewStepsForLearningPI: The minimum number of new observations received from learning episodes before LSPI will be run again. |
| protected int | numEpisodesToStore: The number of the most recent learning episodes to store. |
| protected int | numSamplesForPlanning: The number of samples that are acquired for this object's dataset when the planFromState(State) method is called. |
| protected int | numStepsSinceLastLearningPI: Number of new observations received from learning episodes since LSPI was run. |
| protected SARSCollector | planningCollector: The data collector used by the planFromState(State) method. |
| protected ValueFunctionApproximation | vfa: The object that performs value function approximation given the weights that are estimated. |

Fields inherited from class OOMDPPlanner: actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| LSPI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, FeatureDatabase fd): Initializes for the given domain, reward function, terminal state function, discount factor, and the feature database that provides the state features used by LSPI. |
| Modifier and Type | Method and Description |
|---|---|
| protected java.util.List<GroundedAction> | gaListWrapper(AbstractGroundedAction ga): Wraps a GroundedAction in a list of size 1. |
| java.util.List<EpisodeAnalysis> | getAllStoredLearningEpisodes(): Returns all saved EpisodeAnalysis objects of which the agent has kept track. |
| SARSData | getDataset(): Returns the dataset this object uses for LSPI. |
| FeatureDatabase | getFeatureDatabase(): Returns the feature database defining state features. |
| double | getIdentityScalar(): Returns the initial LSPI identity matrix scalar used. |
| EpisodeAnalysis | getLastLearningEpisode(): Returns the last learning episode of the agent. |
| Policy | getLearningPolicy(): The learning policy followed by the runLearningEpisodeFrom(State) and runLearningEpisodeFrom(State, int) methods. |
| double | getMaxChange(): The maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisodeFrom(State), or runLearningEpisodeFrom(State, int) methods. |
| int | getMaxLearningSteps(): The maximum number of learning steps permitted by the runLearningEpisodeFrom(State) method. |
| int | getMaxNumPlanningIterations(): The maximum number of policy iterations that will be used by the planFromState(State) method. |
| int | getMinNewStepsForLearningPI(): The minimum number of new learning observations before policy iteration is run again. |
| int | getNumSamplesForPlanning(): Gets the number of SARS samples that will be gathered by the planFromState(State) method. |
| SARSCollector | getPlanningCollector(): Gets the SARSCollector used by the planFromState(State) method for collecting data. |
| QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
| protected QValue | getQFromFeaturesFor(java.util.List<ActionApproximationResult> results, State s, GroundedAction ga): Creates a Q-value object in which the Q-value is determined from VFA. |
| java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| org.ejml.simple.SimpleMatrix | LSTDQ(): Runs LSTDQ on this object's current SARSData dataset. |
| protected org.ejml.simple.SimpleMatrix | phiConstructor(java.util.List<ActionFeaturesQuery> features, int nf): Constructs the state-action feature vector as a SimpleMatrix. |
| void | planFromState(State initialState): This method will cause the planner to begin planning from the specified initial state. |
| void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState): Causes the agent to perform a learning episode starting in the given initial state. |
| EpisodeAnalysis | runLearningEpisodeFrom(State initialState, int maxSteps): Causes the agent to perform a learning episode starting in the given initial state. |
| void | runPolicyIteration(int numIterations, double maxChange): Runs LSPI for either numIterations or until the change in the weight matrix is no greater than maxChange. |
| void | setDataset(SARSData dataset): Sets the SARS dataset this object will use for LSPI. |
| void | setFeatureDatabase(FeatureDatabase featureDatabase): Sets the feature database defining state features. |
| void | setIdentityScalar(double identityScalar): Sets the initial LSPI identity matrix scalar used. |
| void | setLearningPolicy(Policy learningPolicy): Sets the learning policy followed by the runLearningEpisodeFrom(State) and runLearningEpisodeFrom(State, int) methods. |
| void | setMaxChange(double maxChange): Sets the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisodeFrom(State), or runLearningEpisodeFrom(State, int) methods. |
| void | setMaxLearningSteps(int maxLearningSteps): Sets the maximum number of learning steps permitted by the runLearningEpisodeFrom(State) method. |
| void | setMaxNumPlanningIterations(int maxNumPlanningIterations): Sets the maximum number of policy iterations that will be used by the planFromState(State) method. |
| void | setMinNewStepsForLearningPI(int minNewStepsForLearningPI): Sets the minimum number of new learning observations before policy iteration is run again. |
| void | setNumEpisodesToStore(int numEps): Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. |
| void | setNumSamplesForPlanning(int numSamplesForPlanning): Sets the number of SARS samples that will be gathered by the planFromState(State) method. |
| void | setPlanningCollector(SARSCollector planningCollector): Sets the SARSCollector used by the planFromState(State) method for collecting data. |
| protected boolean | shouldRereunPolicyIteration(EpisodeAnalysis ea): Returns whether LSPI should be rerun given the latest learning episode results. |
| protected void | updateDatasetWithLearningEpisode(EpisodeAnalysis ea): Updates this object's SARSData to include the results of a learning episode. |
Methods inherited from class OOMDPPlanner: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction

Field Detail

protected ValueFunctionApproximation vfa
The object that performs value function approximation given the weights that are estimated.

protected SARSData dataset
The SARS dataset on which LSPI is performed.

protected FeatureDatabase featureDatabase
The state feature database on which the linear VFA is performed.

protected double identityScalar
The initial LSPI identity matrix scalar; default is 100.

protected org.ejml.simple.SimpleMatrix lastWeights
The last weight values set from LSTDQ.

protected int numSamplesForPlanning
The number of samples that are acquired for this object's dataset when the planFromState(State) method is called.

protected double maxChange
The maximum change in weights permitted to terminate LSPI.

protected SARSCollector planningCollector
The data collector used by the planFromState(State) method.

protected int maxNumPlanningIterations
The maximum number of policy iterations permitted when LSPI is run from the planFromState(State) or runLearningEpisodeFrom(State) methods.

protected Policy learningPolicy
The learning policy followed in runLearningEpisodeFrom(State) method calls. Default is 0.1 epsilon greedy.

protected int maxLearningSteps
The maximum number of learning steps in an episode when the runLearningEpisodeFrom(State) method is called. Default is INT_MAX.

protected int numStepsSinceLastLearningPI
Number of new observations received from learning episodes since LSPI was run.

protected int minNewStepsForLearningPI
The minimum number of new observations received from learning episodes before LSPI will be run again.

protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
The saved previous learning episodes.

protected int numEpisodesToStore
The number of the most recent learning episodes to store.

Constructor Detail
public LSPI(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, FeatureDatabase fd)
Initializes for the given domain, reward function, terminal state function, discount factor, and the feature database that provides the state features used by LSPI.
Parameters: domain - the problem domain; rf - the reward function; tf - the terminal state function; gamma - the discount factor; fd - the feature database defining state features on which LSPI will run

Method Detail

public void setDataset(SARSData dataset)
Sets the SARS dataset this object will use for LSPI.
Parameters: dataset - the SARS dataset

public SARSData getDataset()
Returns the dataset this object uses for LSPI.

public FeatureDatabase getFeatureDatabase()
Returns the feature database defining state features.

public void setFeatureDatabase(FeatureDatabase featureDatabase)
Sets the feature database defining state features.
Parameters: featureDatabase - the feature database defining state features

public double getIdentityScalar()
Returns the initial LSPI identity matrix scalar used.

public void setIdentityScalar(double identityScalar)
Sets the initial LSPI identity matrix scalar used.
Parameters: identityScalar - the initial LSPI identity matrix scalar

public int getNumSamplesForPlanning()
Gets the number of SARS samples that will be gathered by the planFromState(State) method.

public void setNumSamplesForPlanning(int numSamplesForPlanning)
Sets the number of SARS samples that will be gathered by the planFromState(State) method.
Parameters: numSamplesForPlanning - the number of SARS samples that will be gathered by the planFromState(State) method

public SARSCollector getPlanningCollector()
Gets the SARSCollector used by the planFromState(State) method for collecting data.

public void setPlanningCollector(SARSCollector planningCollector)
Sets the SARSCollector used by the planFromState(State) method for collecting data.
Parameters: planningCollector - the SARSCollector used by the planFromState(State) method for collecting data

public int getMaxNumPlanningIterations()
Gets the maximum number of policy iterations that will be used by the planFromState(State) method.

public void setMaxNumPlanningIterations(int maxNumPlanningIterations)
Sets the maximum number of policy iterations that will be used by the planFromState(State) method.
Parameters: maxNumPlanningIterations - the maximum number of policy iterations that will be used by the planFromState(State) method

public Policy getLearningPolicy()
Gets the learning policy followed by the runLearningEpisodeFrom(State) and runLearningEpisodeFrom(State, int) methods.

public void setLearningPolicy(Policy learningPolicy)
Sets the learning policy followed by the runLearningEpisodeFrom(State) and runLearningEpisodeFrom(State, int) methods.
Parameters: learningPolicy - the learning policy followed by the runLearningEpisodeFrom(State) and runLearningEpisodeFrom(State, int) methods

public int getMaxLearningSteps()
Gets the maximum number of learning steps permitted by the runLearningEpisodeFrom(State) method.

public void setMaxLearningSteps(int maxLearningSteps)
Sets the maximum number of learning steps permitted by the runLearningEpisodeFrom(State) method.
Parameters: maxLearningSteps - the maximum number of learning steps permitted by the runLearningEpisodeFrom(State) method

public int getMinNewStepsForLearningPI()
Gets the minimum number of new learning observations before policy iteration is run again.

public void setMinNewStepsForLearningPI(int minNewStepsForLearningPI)
Sets the minimum number of new learning observations before policy iteration is run again.
Parameters: minNewStepsForLearningPI - the minimum number of new learning observations before policy iteration is run again

public double getMaxChange()
Gets the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisodeFrom(State), or runLearningEpisodeFrom(State, int) methods.

public void setMaxChange(double maxChange)
Sets the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisodeFrom(State), or runLearningEpisodeFrom(State, int) methods.
Parameters: maxChange - the maximum change in weights required to terminate policy iteration

public org.ejml.simple.SimpleMatrix LSTDQ()
Runs LSTDQ on this object's current SARSData dataset.
Returns: the computed weight values as a SimpleMatrix object

public void runPolicyIteration(int numIterations, double maxChange)
Runs LSPI for either numIterations or until the change in the weight matrix is no greater than maxChange.
Parameters: numIterations - the maximum number of policy iterations; maxChange - when the weight change is smaller than this value, LSPI terminates

protected org.ejml.simple.SimpleMatrix phiConstructor(java.util.List<ActionFeaturesQuery> features, int nf)
Constructs the state-action feature vector as a SimpleMatrix.
Parameters: features - the state-action features that have non-zero values; nf - the total number of state-action features
Returns: the state-action feature vector as a SimpleMatrix

protected java.util.List<GroundedAction> gaListWrapper(AbstractGroundedAction ga)
Wraps a GroundedAction in a list of size 1.
Parameters: ga - the GroundedAction to wrap
Returns: a List consisting of just the input GroundedAction object

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QComputablePlanner
Parameters: s - the state for which Q-values are to be returned
Returns: a List of QValue objects for every permissible action for the given input state

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
Specified by: getQ in interface QComputablePlanner
Parameters: s - the input state; a - the input action
Returns: the QValue for the given state-action pair

protected QValue getQFromFeaturesFor(java.util.List<ActionApproximationResult> results, State s, GroundedAction ga)
Creates a Q-value object in which the Q-value is determined from VFA.
Parameters: results - the VFA prediction results for each action; s - the state of the Q-value; ga - the action taken

public void planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
Specified by: planFromState in class OOMDPPlanner
Parameters: initialState - the initial state of the planning problem

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
Specified by: resetPlannerResults in class OOMDPPlanner

public EpisodeAnalysis runLearningEpisodeFrom(State initialState)
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters: initialState - the initial state in which the agent will start the episode
Returns: the learning episode as an EpisodeAnalysis object

public EpisodeAnalysis runLearningEpisodeFrom(State initialState, int maxSteps)
Causes the agent to perform a learning episode starting in the given initial state.
Specified by: runLearningEpisodeFrom in interface LearningAgent
Parameters: initialState - the initial state in which the agent will start the episode; maxSteps - the maximum number of steps in the episode
Returns: the learning episode as an EpisodeAnalysis object

protected void updateDatasetWithLearningEpisode(EpisodeAnalysis ea)
Updates this object's SARSData to include the results of a learning episode.
Parameters: ea - the learning episode as an EpisodeAnalysis object

protected boolean shouldRereunPolicyIteration(EpisodeAnalysis ea)
Returns whether LSPI should be rerun given the latest learning episode results, based on whether the number of new observations since the last learning policy iteration has reached the configured threshold.
Parameters: ea - the most recent learning episode

public EpisodeAnalysis getLastLearningEpisode()
Returns the last learning episode of the agent.
Specified by: getLastLearningEpisode in interface LearningAgent

public void setNumEpisodesToStore(int numEps)
Tells the agent how many EpisodeAnalysis objects representing learning episodes to internally store. For instance, if the number is set to 5, then the agent should remember the last 5 learning episodes. Note that this number has nothing to do with how learning is performed; it is purely for performance gathering.
Specified by: setNumEpisodesToStore in interface LearningAgent
Parameters: numEps - the number of learning episodes to remember

public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
Returns all saved EpisodeAnalysis objects of which the agent has kept track.
Specified by: getAllStoredLearningEpisodes in interface LearningAgent
Returns: all saved EpisodeAnalysis objects of which the agent has kept track