public class SparseSampling extends MDPSolver implements QFunction, Planner
setHAndCByMDPError(double, double, int)
 method to ensure this, however, the required horizon and C will probably be intractably large). SS must replan for every new state it sees, so an agent following it in general must replan after every step it takes in the real world. Using a
 Q-based Policy object, will ensure this behavior because this algorithm will call the valueFunction whenever it's queried for the Q-value for a state it has not seen.
 
 The algorithm operates by building a tree from the source initial state. The tree is built by sampling C outcome states for each possible state-action pair, thereby generating new
 state nodes in the tree. The tree is built out to a fixed height H and then in a tail recursive way, the Q-value and state value is estimated using a Bellman update as if the C samples perfectly defined
 the transition dynamics. Because the values are are based on fixed horizon and computed in a recursive way, only one bellman update is required per node.
 
 Although the complexity of the algorithm is independent of the state space size, it is exponential in the height of the tree, so if a large tree height is required
 to make good value function estimates, this algorithm may not be appropriate. Therefore, when rewards are sparse or uniform except at a distant horizon, this may not be an appropriate
 algorithm choice.
 
 By default, this class will remember the estimated Q-value for every state from which the planFromState(State) method was called (which will be indirectly called
 by the Q-value query methods if it does not have the Q-value for it) and it will also remember the value of state tree nodes it computed so that they may be reused in
 subsequent tree creations, thereby limiting the amount of additional computation required. However, if memory is scarce, the class can be told to forget all prior planning
 results, except the Q-value estimate for the most recently planned for state, by using the forgetPreviousPlanResults method.
 
 By default, the C parameter (number of state transition samples) is fixed for all nodes; however, it may also be set to use a variable C that reduces the number
 of sampled states the further down in the tree it is according to C_i = C_0 * gamma^(2i), where i is the depth of the node from the root and gamma is the discount
 factor.
 
 By default, the state value of leafs will be set to 0, but this value can be changed by providing a ValueFunctionInitialization object via the
 setValueForLeafNodes(ValueFunctionInitialization) method. Using a non-zero heuristic value may reduce the need for a large tree height.
 
 This class will work with Options, but including options will necessarily *increase* the computational complexity, so they are not recommended.
 
 This class requires a HashableStateFactory.
 
 This class can optionally be set to not use sampling and instead use the full Bellman update, which results in the exact finite horizon Q-value being computed.
 However, this should only be done when the number of possible state transitions is small and when the full model for the domain is defined (that is,
 all Action instances implement the FullActionModel interface.). To set this class to compute the exact finite horizon value function, use the
 setComputeExactValueFunction(boolean) method. Note that you cannot use Options when using the fully Bellman update, because that would
 required factored access to the probability of each length of each transition, which is not available from Options (it's aggregated into the transition function
 itself). An exception will be thrown if Options are used with the full Bellman transitions.
 
 
 
 1. Kearns, Michael, Yishay Mansour, and Andrew Y. Ng. "A sparse sampling algorithm for near-optimal planning in large Markov decision processes." 
 Machine Learning 49.2-3 (2002): 193-208.| Modifier and Type | Class and Description | 
|---|---|
| static class  | SparseSampling.HashedHeightStateTuple for a state and its height in a tree that can be hashed for quick retrieval. | 
| class  | SparseSampling.StateNodeA class for state nodes. | 
QFunction.QFunctionHelper| Modifier and Type | Field and Description | 
|---|---|
| protected int | cThe number of transition dynamics samples (for the root if depth-variable C is used) | 
| protected boolean | computeExactValueFunctionThis parameter indicates whether the exact finite horizon value function is computed or whether sparse sampling
 to estimate should be used. | 
| protected boolean | forgetPreviousPlanResultsWhether previous planning results should be forgetten or reused; default is reused (false). | 
| protected int | hThe height of the tree | 
| protected java.util.Map<SparseSampling.HashedHeightState,SparseSampling.StateNode> | nodesByHeightThe tree nodes indexed by state and height. | 
| protected int | numUpdatesThe total number of pseudo-Bellman updates | 
| protected java.util.Map<HashableState,java.util.List<QValue>> | rootLevelQValuesThe root state node Q-values that have been estimated by previous planning calls. | 
| protected boolean | useVariableCWhether the number of transition dyanmic samples should scale with the depth of the node. | 
| protected ValueFunctionInitialization | vinitThe state value used for leaf nodes; default is zero. | 
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf| Constructor and Description | 
|---|
| SparseSampling(Domain domain,
              RewardFunction rf,
              TerminalFunction tf,
              double gamma,
              HashableStateFactory hashingFactory,
              int h,
              int c)Initializes. | 
| Modifier and Type | Method and Description | 
|---|---|
| boolean | computesExactValueFunction()Returns whether this valueFunction comptues the exact finite horizon value function (by using the full transition dynamics) or whether
 it estimates the value funciton with sampling. | 
| int | getC()Returns the number of state transition samples | 
| protected int | getCAtHeight(int height)Returns the value of C for a node at the given height (height from a leaf node). | 
| int | getDebugCode()Returns the debug code used for logging plan results with  DPrint. | 
| int | getH()Returns the height of the tree | 
| int | getNumberOfStateNodesCreated()Returns the total number of state nodes that have been created. | 
| int | getNumberOfValueEsitmates()Returns the total number of state value estimates performed since the  resetSolver()call. | 
| QValue | getQ(State s,
    AbstractGroundedAction a)Returns the  QValuefor the given state-action pair. | 
| java.util.List<QValue> | getQs(State s)Returns a  ListofQValueobjects for ever permissible action for the given input state. | 
| protected SparseSampling.StateNode | getStateNode(State s,
            int height)Either returns, or creates, indexes, and returns, the state node for the given state at the given height in the tree | 
| protected static double | logbase(double base,
       double x)Retuns the log value at the given bases. | 
| GreedyQPolicy | planFromState(State initialState)Plans from the input state and then returns a  GreedyQPolicythat greedily
 selects the action with the highest Q-value and breaks ties uniformly randomly. | 
| void | resetSolver()This method resets all solver results so that a solver can be restarted fresh
 as if had never solved the MDP. | 
| void | setC(int c)Sets the number of state transition samples used. | 
| void | setComputeExactValueFunction(boolean computeExactValueFunction)Sets whether this valueFunction will compute the exact finite horizon value funciton (using the full transition dynamics) or if sampling
 to estimate the value function will be used. | 
| void | setDebugCode(int debugCode)Sets the debug code used for logging plan results with  DPrint. | 
| void | setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)Sets whether previous planning results should be forgetten or resued in subsequent planning. | 
| void | setH(int h)Sets the height of the tree. | 
| void | setHAndCByMDPError(double rmax,
                  double epsilon,
                  int numActions)Sets the height and number of transition dynamics samples in a way that ensure epsilon optimality. | 
| void | setUseVariableCSize(boolean useVariableC)Sets whether the number of state transition samples (C) should be variable with respect to the depth of the node. | 
| void | setValueForLeafNodes(ValueFunctionInitialization vinit)Sets the  ValueFunctionInitializationobject to use for settting the value of leaf nodes. | 
| double | value(State s)Returns the value function evaluation of the given state. | 
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateActionclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitaddNonDomainReferencedAction, getActions, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrintingprotected int h
protected int c
protected boolean useVariableC
protected boolean forgetPreviousPlanResults
protected ValueFunctionInitialization vinit
protected boolean computeExactValueFunction
protected java.util.Map<SparseSampling.HashedHeightState,SparseSampling.StateNode> nodesByHeight
protected java.util.Map<HashableState,java.util.List<QValue>> rootLevelQValues
protected int numUpdates
public SparseSampling(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, int h, int c)
setHAndCByMDPError(double, double, int) method, but in
 general this will result in very large values that will be intractable. If you set c = -1, then the full transition dynamics will be used. You should
 only use the full transition dynamics if the number of possible transitions from each state is small and if the domain Action object's
 implements the FullActionModel interface.domain - the planning domainrf - the reward functiontf - the terminal functiongamma - the discount factorhashingFactory - the state hashing factory for matching generated states with their state nodes.h - the height of the treec - the number of transition dynamics samples used. If set to -1, then the full transition dynamics are used.public void setHAndCByMDPError(double rmax,
                      double epsilon,
                      int numActions)
rmax - the maximum reward value of the MDPepsilon - the epsilon optimality (amount that the estimated value funciton may diverge from the true optimal)numActions - the maximum number of actions that could be applied from a statepublic void setUseVariableCSize(boolean useVariableC)
useVariableC - if true, then depth-variable C will be used; if false, all state nodes use the same number of samples.public void setC(int c)
c - the number of state transition samples used. If -1, then the full transition dynamics are used.public void setH(int h)
h - the height of the tree.public int getC()
public int getH()
public void setComputeExactValueFunction(boolean computeExactValueFunction)
computeExactValueFunction - if true, the exact finite horizon value function is computed; if false, then sampling is used.public boolean computesExactValueFunction()
public void setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)
forgetPreviousPlanResults - if true, then previous planning results will be forgotten; if true, they will be remembered and reused in susbequent planning.public void setValueForLeafNodes(ValueFunctionInitialization vinit)
ValueFunctionInitialization object to use for settting the value of leaf nodes.vinit - the ValueFunctionInitialization object to use for settting the value of leaf nodes.public int getDebugCode()
DPrint.getDebugCode in interface MDPSolverInterfacegetDebugCode in class MDPSolverDPrint.public void setDebugCode(int debugCode)
DPrint.setDebugCode in interface MDPSolverInterfacesetDebugCode in class MDPSolverdebugCode - the debugCode to use.public int getNumberOfValueEsitmates()
resetSolver() call.resetSolver() call.public int getNumberOfStateNodesCreated()
public GreedyQPolicy planFromState(State initialState)
GreedyQPolicy that greedily
 selects the action with the highest Q-value and breaks ties uniformly randomly.planFromState in interface PlannerinitialState - the initial state of the planning problemGreedyQPolicy.public void resetSolver()
MDPSolverInterfaceresetSolver in interface MDPSolverInterfaceresetSolver in class MDPSolverpublic java.util.List<QValue> getQs(State s)
QFunctionList of QValue objects for ever permissible action for the given input state.public QValue getQ(State s, AbstractGroundedAction a)
QFunctionQValue for the given state-action pair.public double value(State s)
ValueFunctionvalue in interface ValueFunctions - the state to evaluate.protected int getCAtHeight(int height)
height - the height from a leaf node.protected SparseSampling.StateNode getStateNode(State s, int height)
s - the stateheight - the height (distance from leaf node) of the node.protected static double logbase(double base,
             double x)
base - the log basex - the input of the log