Class | Description |
---|---|
LSPI | This class implements the optimized version of least squares policy iteration [1] (runs in time quadratic in the number of state features). |
SARSCollector | This object is used to collect SARSData (state-action-reward-state tuples) that can then be used by algorithms like LSPI for learning. |
SARSCollector.UniformRandomSARSCollector | Collects SARS data from source states generated by a StateGenerator by choosing actions uniformly at random. |
SARSData | Wrapper around a List of state-action-reward-state (SARSData.SARS) tuples. |
SARSData.SARS | State-action-reward-state tuple. |
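
To make the data flow in the table concrete, the following is a minimal, self-contained sketch of collecting state-action-reward-state tuples by choosing actions uniformly at random. It does not use the library's classes: `SarsSketch`, `Sars`, `step`, `reward`, and `collect` are hypothetical names, and the five-state chain environment is an assumption made purely for illustration. Source states are also drawn uniformly here, standing in for whatever a StateGenerator would supply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-ins for illustration only; NOT the library's API.
public class SarsSketch {

    /** One state-action-reward-state tuple (s, a, r, s'). */
    public static final class Sars {
        public final int s;      // source state
        public final int a;      // action taken in s
        public final double r;   // reward received
        public final int sp;     // resulting state s'

        public Sars(int s, int a, double r, int sp) {
            this.s = s;
            this.a = a;
            this.r = r;
            this.sp = sp;
        }
    }

    /** Toy chain with states 0..4: action 0 moves left, action 1 moves right. */
    public static int step(int s, int a) {
        int next = (a == 1) ? s + 1 : s - 1;
        return Math.max(0, Math.min(4, next)); // clamp to the chain's ends
    }

    /** Assumed reward: 1.0 when the resulting state is the right end, else 0.0. */
    public static double reward(int sp) {
        return sp == 4 ? 1.0 : 0.0;
    }

    /** Collects n tuples, drawing source states and actions uniformly at random. */
    public static List<Sars> collect(int n, long seed) {
        Random rng = new Random(seed);
        List<Sars> data = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int s = rng.nextInt(5);  // uniform over states 0..4
            int a = rng.nextInt(2);  // uniform over the two actions
            int sp = step(s, a);
            data.add(new Sars(s, a, reward(sp), sp));
        }
        return data;
    }

    public static void main(String[] args) {
        List<Sars> data = collect(10, 42L);
        System.out.println(data.size() + " tuples collected");
    }
}
```

A batch learner such as LSPI would consume a list like this offline; the fixed seed is only there to make the sketch reproducible.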