public class LSPI extends MDPSolver implements QProvider, LearningAgent, Planner
Rather than using the planFromState(State) or runLearningEpisode(burlap.mdp.singleagent.environment.Environment)
methods, you should instead use a SARSCollector
object to gather a set of example state-action-reward-state tuples that are then used for policy iteration. You can
set the dataset to use with the setDataset(SARSData)
method and then run LSPI on it with the runPolicyIteration(int, double)
method. LSPI requires
initializing a matrix to an identity matrix multiplied by some large positive constant (see the reference for more information).
By default this constant is 100, but you can change it with the setIdentityScalar(double)
method.
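For example, a minimal sketch of this recommended workflow is shown below. The variable names (domain, features, stateGenerator), the UniformRandomSARSCollector construction, and the collectNInstances call are illustrative assumptions about the surrounding BURLAP API; only the LSPI calls themselves are taken from this class.

```java
// Sketch: assumes `domain` (SADomain), `features` (DenseStateActionFeatures), and
// `stateGenerator` (a StateGenerator for sampling initial states) already exist.
SARSCollector collector = new SARSCollector.UniformRandomSARSCollector(domain);
SARSData dataset = collector.collectNInstances(stateGenerator, domain.getModel(),
        5000, 20, new SARSData()); // 5000 SARS tuples, episodes capped at 20 steps

LSPI lspi = new LSPI(domain, 0.99, features);
lspi.setDataset(dataset);

// At most 30 policy iterations, terminating early once the largest
// weight change falls below 1e-6.
GreedyQPolicy policy = lspi.runPolicyIteration(30, 1e-6);
```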
If you do use the planFromState(State)
method, you should first initialize the parameters for it using the
initializeForPlanning(int, SARSCollector)
or
initializeForPlanning(int)
method.
If you do not set a SARSCollector
to use for planning,
a SARSCollector.UniformRandomSARSCollector
will be created automatically. After collecting data, it will call
the runPolicyIteration(int, double)
method using a maximum of 30 policy iterations. You can change the SARSCollector
this method uses, the number of samples it acquires, the maximum weight change for PI termination,
and the maximum number of policy iterations by using the setPlanningCollector(SARSCollector)
, setNumSamplesForPlanning(int)
, setMaxChange(double)
, and
setMaxNumPlanningIterations(int)
methods, respectively.
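A hypothetical planning-style usage, assuming domain, features, and initialState already exist and that the domain's model supplies the reward and terminal functions:

```java
LSPI lspi = new LSPI(domain, 0.99, features);
lspi.initializeForPlanning(5000);       // gather 5000 SARS samples with an automatically created collector
lspi.setMaxNumPlanningIterations(30);   // optional: cap the number of policy iterations
lspi.setMaxChange(1e-6);                // optional: weight-change threshold for PI termination
GreedyQPolicy policy = lspi.planFromState(initialState);
```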
If you use the runLearningEpisode(burlap.mdp.singleagent.environment.Environment)
method (or the runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int)
method),
it will work by following a learning policy for the episode and adding its observations to its dataset for its
policy iteration. After enough new data has been acquired, policy iteration will be rerun.
You can adjust the learning policy, the maximum number of allowed learning steps in an
episode, and the minimum number of new observations before LSPI is rerun using the setLearningPolicy(Policy)
, setMaxLearningSteps(int)
, and setMinNewStepsForLearningPI(int)
methods, respectively. The LSPI termination parameters are set using the same methods used to configure the planFromState(State)
behavior discussed above.
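A sketch of the learning-agent style of use follows; env is assumed to be an Environment for the problem, and the EpsilonGreedy constructor shown is an assumption about that policy class's API.

```java
LSPI lspi = new LSPI(domain, 0.99, features);
lspi.setLearningPolicy(new EpsilonGreedy(lspi, 0.1)); // mirrors the 0.1 epsilon-greedy default
lspi.setMaxLearningSteps(500);          // cap each learning episode at 500 steps
lspi.setMinNewStepsForLearningPI(100);  // rerun LSPI once 100 new observations have been collected

for (int i = 0; i < 50; i++) {
    Episode e = lspi.runLearningEpisode(env);
}
```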
This data gathering and replanning behavior from learning episodes is not expected to be an especially good choice.
Therefore, if you want better online data acquisition, you should consider subclassing this class
and overriding the updateDatasetWithLearningEpisode(Episode)
and shouldRereunPolicyIteration(Episode)
methods, or
the runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int)
method
itself.
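For instance, a hypothetical subclass might look like the following; rerunning policy iteration after every episode is only an illustrative choice.

```java
public class EveryEpisodeLSPI extends LSPI {

    public EveryEpisodeLSPI(SADomain domain, double gamma, DenseStateActionFeatures saFeatures) {
        super(domain, gamma, saFeatures);
    }

    @Override
    protected void updateDatasetWithLearningEpisode(Episode ea) {
        // A custom data-management strategy would go here; the superclass
        // implementation adds the episode's transitions to this object's dataset.
        super.updateDatasetWithLearningEpisode(ea);
    }

    @Override
    protected boolean shouldRereunPolicyIteration(Episode ea) {
        // Illustrative choice: rerun policy iteration after every episode rather
        // than waiting for minNewStepsForLearningPI new observations.
        return true;
    }
}
```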
Note that LSPI is not well defined for domains with terminal states. Therefore, you need to make sure your reward function returns a value for terminal transitions that offsets the effect of the state not being treated as terminal. For example, for goal states it should return a value large enough to offset any costs incurred from continuing; for failure states, it should return a negative reward large enough in magnitude to offset any gains incurred from continuing.
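For example, a goal-directed task with a -1 step cost might use a reward function along these lines (a sketch; tf is an assumed TerminalFunction that marks goal states, and the bonus of 1000 is an arbitrary illustrative value):

```java
RewardFunction rf = (s, a, sprime) -> {
    if (tf.isTerminal(sprime)) {
        // Large positive value that more than offsets the step costs the agent
        // would otherwise keep accruing if this state were not terminal.
        return 1000.;
    }
    return -1.;
};
```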
1. Lagoudakis, Michail G., and Ronald Parr. "Least-squares policy iteration." The Journal of Machine Learning Research 4 (2003): 1107-1149.
Modifier and Type | Class and Description
---|---
protected class | LSPI.SSFeatures: Pair of the state-action features and the next state-action features.
QProvider.Helper
Modifier and Type | Field and Description
---|---
protected SARSData | dataset: The SARS dataset on which LSPI is performed.
protected java.util.LinkedList<Episode> | episodeHistory: The saved previous learning episodes.
protected double | identityScalar: The initial LSPI identity matrix scalar; default is 100.
protected org.ejml.simple.SimpleMatrix | lastWeights: The last weight values set from LSTDQ.
protected Policy | learningPolicy: The learning policy followed in runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method calls.
protected double | maxChange: The maximum change in weights permitted to terminate LSPI.
protected int | maxLearningSteps: The maximum number of learning steps in an episode when the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method is called.
protected int | maxNumPlanningIterations: The maximum number of policy iterations permitted when LSPI is run from the planFromState(State) or runLearningEpisode(burlap.mdp.singleagent.environment.Environment) methods.
protected int | minNewStepsForLearningPI: The minimum number of new observations received from learning episodes before LSPI will be run again.
protected int | numEpisodesToStore: The number of the most recent learning episodes to store.
protected int | numSamplesForPlanning: The number of samples that are acquired for this object's dataset when the planFromState(State) method is called.
protected int | numStepsSinceLastLearningPI: Number of new observations received from learning episodes since LSPI was run.
protected SARSCollector | planningCollector: The data collector used by the planFromState(State) method.
protected DenseStateActionFeatures | saFeatures: The state feature database on which the linear VFA is performed.
protected DenseStateActionLinearVFA | vfa: The object that performs value function approximation given the weights that are estimated.
actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description
---
LSPI(SADomain domain, double gamma, DenseStateActionFeatures saFeatures): Initializes.
LSPI(SADomain domain, double gamma, DenseStateActionFeatures saFeatures, SARSData dataset): Initializes.
Modifier and Type | Method and Description
---|---
java.util.List<Episode> | getAllStoredLearningEpisodes()
SARSData | getDataset(): Returns the dataset this object uses for LSPI.
double | getIdentityScalar(): Returns the initial LSPI identity matrix scalar used.
Episode | getLastLearningEpisode()
Policy | getLearningPolicy(): The learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
double | getMaxChange(): The maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisode(burlap.mdp.singleagent.environment.Environment), or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
int | getMaxLearningSteps(): The maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.
int | getMaxNumPlanningIterations(): The maximum number of policy iterations that will be used by the planFromState(State) method.
int | getMinNewStepsForLearningPI(): The minimum number of new learning observations before policy iteration is run again.
int | getNumSamplesForPlanning(): Gets the number of SARS samples that will be gathered by the planFromState(State) method.
SARSCollector | getPlanningCollector(): Gets the SARSCollector used by the planFromState(State) method for collecting data.
DenseStateActionFeatures | getSaFeatures(): Returns the state-action features used.
void | initializeForPlanning(int numSamplesForPlanning): Sets the number of SARSData.SARS samples to use for planning when the planFromState(State) method is called.
void | initializeForPlanning(int numSamplesForPlanning, SARSCollector planningCollector): Sets the number of SARSData.SARS samples, and the SARSCollector to use to collect samples, for planning when the planFromState(State) method is called.
org.ejml.simple.SimpleMatrix | LSTDQ(): Runs LSTDQ on this object's current SARSData dataset.
protected org.ejml.simple.SimpleMatrix | phiConstructor(double[] features, int nf): Constructs the state-action feature vector as a SimpleMatrix.
GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
double | qValue(State s, Action a): Returns the QValue for the given state-action pair.
java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state.
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Episode | runLearningEpisode(Environment env)
Episode | runLearningEpisode(Environment env, int maxSteps)
GreedyQPolicy | runPolicyIteration(int numIterations, double maxChange): Runs LSPI for either numIterations or until the change in the weight matrix is no greater than maxChange.
void | setDataset(SARSData dataset): Sets the SARS dataset this object will use for LSPI.
void | setIdentityScalar(double identityScalar): Sets the initial LSPI identity matrix scalar used.
void | setLearningPolicy(Policy learningPolicy): Sets the learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
void | setMaxChange(double maxChange): Sets the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisode(burlap.mdp.singleagent.environment.Environment), or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
void | setMaxLearningSteps(int maxLearningSteps): Sets the maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.
void | setMaxNumPlanningIterations(int maxNumPlanningIterations): Sets the maximum number of policy iterations that will be used by the planFromState(State) method.
void | setMinNewStepsForLearningPI(int minNewStepsForLearningPI): Sets the minimum number of new learning observations before policy iteration is run again.
void | setNumEpisodesToStore(int numEps)
void | setNumSamplesForPlanning(int numSamplesForPlanning): Sets the number of SARS samples that will be gathered by the planFromState(State) method.
void | setPlanningCollector(SARSCollector planningCollector): Sets the SARSCollector used by the planFromState(State) method for collecting data.
void | setSaFeatures(DenseStateActionFeatures saFeatures): Sets the state-action features to use.
protected boolean | shouldRereunPolicyIteration(Episode ea): Returns whether LSPI should be rerun given the latest learning episode results.
protected void | updateDatasetWithLearningEpisode(Episode ea): Updates this object's SARSData to include the results of a learning episode.
double | value(State s): Returns the value function evaluation of the given state.
addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
protected DenseStateActionLinearVFA vfa
protected SARSData dataset
protected DenseStateActionFeatures saFeatures
protected double identityScalar
protected org.ejml.simple.SimpleMatrix lastWeights
protected int numSamplesForPlanning
The number of samples that are acquired for this object's dataset when the planFromState(State) method is called.
protected double maxChange
protected SARSCollector planningCollector
The data collector used by the planFromState(State) method.
protected int maxNumPlanningIterations
The maximum number of policy iterations permitted when LSPI is run from the planFromState(State) or runLearningEpisode(burlap.mdp.singleagent.environment.Environment) methods.
protected Policy learningPolicy
The learning policy followed in runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method calls. Default is 0.1 epsilon greedy.
protected int maxLearningSteps
The maximum number of learning steps in an episode when the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method is called. Default is INT_MAX.
protected int numStepsSinceLastLearningPI
protected int minNewStepsForLearningPI
protected java.util.LinkedList<Episode> episodeHistory
protected int numEpisodesToStore
public LSPI(SADomain domain, double gamma, DenseStateActionFeatures saFeatures)
Initializes.
domain - the problem domain
gamma - the discount factor
saFeatures - the state-action features to use

public LSPI(SADomain domain, double gamma, DenseStateActionFeatures saFeatures, SARSData dataset)
Initializes.
domain - the problem domain
gamma - the discount factor
saFeatures - the state-action features
dataset - the dataset of transitions to use

public void initializeForPlanning(int numSamplesForPlanning)
Sets the number of SARSData.SARS samples to use for planning when the planFromState(State) method is called. If the RewardFunction and TerminalFunction are not set, the planFromState(State) method will throw a runtime exception.
numSamplesForPlanning - the number of SARS samples to collect for planning.

public void initializeForPlanning(int numSamplesForPlanning, SARSCollector planningCollector)
Sets the number of SARSData.SARS samples, and the SARSCollector to use to collect samples, for planning when the planFromState(State) method is called. If the RewardFunction and TerminalFunction are not set, the planFromState(State) method will throw a runtime exception.
numSamplesForPlanning - the number of SARS samples to collect for planning.
planningCollector - the dataset collector to use for planning

public void setDataset(SARSData dataset)
Sets the SARS dataset this object will use for LSPI.
dataset - the SARS dataset

public SARSData getDataset()
Returns the dataset this object uses for LSPI.

public DenseStateActionFeatures getSaFeatures()
Returns the state-action features used.

public void setSaFeatures(DenseStateActionFeatures saFeatures)
Sets the state-action features to use.
saFeatures - the state-action features to use

public double getIdentityScalar()
Returns the initial LSPI identity matrix scalar used.

public void setIdentityScalar(double identityScalar)
Sets the initial LSPI identity matrix scalar used.
identityScalar - the initial LSPI identity matrix scalar used.

public int getNumSamplesForPlanning()
Gets the number of SARS samples that will be gathered by the planFromState(State) method.
Returns: the number of SARS samples that will be gathered by the planFromState(State) method.

public void setNumSamplesForPlanning(int numSamplesForPlanning)
Sets the number of SARS samples that will be gathered by the planFromState(State) method.
numSamplesForPlanning - the number of SARS samples that will be gathered by the planFromState(State) method.

public SARSCollector getPlanningCollector()
Gets the SARSCollector used by the planFromState(State) method for collecting data.
Returns: the SARSCollector used by the planFromState(State) method for collecting data.

public void setPlanningCollector(SARSCollector planningCollector)
Sets the SARSCollector used by the planFromState(State) method for collecting data.
planningCollector - the SARSCollector used by the planFromState(State) method for collecting data.

public int getMaxNumPlanningIterations()
The maximum number of policy iterations that will be used by the planFromState(State) method.
Returns: the maximum number of policy iterations that will be used by the planFromState(State) method.

public void setMaxNumPlanningIterations(int maxNumPlanningIterations)
Sets the maximum number of policy iterations that will be used by the planFromState(State) method.
maxNumPlanningIterations - the maximum number of policy iterations that will be used by the planFromState(State) method.

public Policy getLearningPolicy()
The learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
Returns: the learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.

public void setLearningPolicy(Policy learningPolicy)
Sets the learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
learningPolicy - the learning policy followed by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) and runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.

public int getMaxLearningSteps()
The maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.
Returns: the maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.

public void setMaxLearningSteps(int maxLearningSteps)
Sets the maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.
maxLearningSteps - the maximum number of learning steps permitted by the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) method.

public int getMinNewStepsForLearningPI()
The minimum number of new learning observations before policy iteration is run again.

public void setMinNewStepsForLearningPI(int minNewStepsForLearningPI)
Sets the minimum number of new learning observations before policy iteration is run again.
minNewStepsForLearningPI - the minimum number of new learning observations before policy iteration is run again.

public double getMaxChange()
The maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisode(burlap.mdp.singleagent.environment.Environment), or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
Returns: the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisode(burlap.mdp.singleagent.environment.Environment), or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.

public void setMaxChange(double maxChange)
Sets the maximum change in weights required to terminate policy iteration when called from the planFromState(State), runLearningEpisode(burlap.mdp.singleagent.environment.Environment), or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.
maxChange - the maximum change in weights required to terminate policy iteration when called from the runLearningEpisode(burlap.mdp.singleagent.environment.Environment) or runLearningEpisode(burlap.mdp.singleagent.environment.Environment, int) methods.

public org.ejml.simple.SimpleMatrix LSTDQ()
Runs LSTDQ on this object's current SARSData dataset.
Returns: the estimated weights as a SimpleMatrix object.

public GreedyQPolicy runPolicyIteration(int numIterations, double maxChange)
Runs LSPI for either numIterations or until the change in the weight matrix is no greater than maxChange.
numIterations - the maximum number of policy iterations.
maxChange - when the weight change is smaller than this value, LSPI terminates.
Returns: a GreedyQPolicy using this object as the QProvider source.

protected org.ejml.simple.SimpleMatrix phiConstructor(double[] features, int nf)
Constructs the state-action feature vector as a SimpleMatrix.
features - the state-action features
nf - the total number of state-action features.
Returns: the state-action feature vector as a SimpleMatrix.

public java.util.List<QValue> qValues(State s)
Description copied from interface: QProvider
Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Description copied from interface: QFunction
Returns the QValue for the given state-action pair.

public double value(State s)
Description copied from interface: ValueFunction
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
s - the state to evaluate.

public GreedyQPolicy planFromState(State initialState)
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
Specified by: planFromState in interface Planner
initialState - the initial state of the planning problem
Returns: a GreedyQPolicy.

public void resetSolver()
Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Specified by: resetSolver in class MDPSolver

public Episode runLearningEpisode(Environment env)
Specified by: runLearningEpisode in interface LearningAgent

public Episode runLearningEpisode(Environment env, int maxSteps)
Specified by: runLearningEpisode in interface LearningAgent

protected void updateDatasetWithLearningEpisode(Episode ea)
Updates this object's SARSData to include the results of a learning episode.
ea - the learning episode as an Episode object.

protected boolean shouldRereunPolicyIteration(Episode ea)
Returns whether LSPI should be rerun given the latest learning episode results and the numStepsSinceLastLearningPI threshold.
ea - the most recent learning episode

public Episode getLastLearningEpisode()

public void setNumEpisodesToStore(int numEps)

public java.util.List<Episode> getAllStoredLearningEpisodes()