public class SparseSampling extends OOMDPPlanner implements QComputablePlanner
setHAndCByMDPError(double, double, int)
method to ensure this; however, the required horizon and C will probably be intractably large. SS must replan for every new state it sees, so an agent following it must in general replan after every step it takes in the real world. Using a
Q-based Policy
object will ensure this behavior, because this algorithm will call the planner whenever it is queried for the Q-value of a state it has not seen.
The algorithm operates by building a tree from the source initial state. The tree is built by sampling C outcome states for each possible state-action pair, thereby generating new state nodes in the tree. The tree is built out to a fixed height H, and then, in a tail-recursive way, the Q-values and state values are estimated using a Bellman update as if the C samples perfectly defined the transition dynamics. Because the values are based on a fixed horizon and computed in a tail-recursive way, only one Bellman update is required per node.
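The tree-building-and-backup procedure described above can be sketched as follows. This is an illustrative, self-contained toy only: the chain MDP, its reward function, and all names are hypothetical, and unlike the real class it does not cache state nodes by height, use a StateHashFactory, or support Options.

```java
import java.util.Random;

// Naive sketch of the sparse sampling recursion on a hypothetical chain MDP.
public class SparseSamplingSketch {
    static final Random rng = new Random(0);
    static final double gamma = 0.9;

    // Generative model: action 1 moves right with probability 0.8; action 0 stays put.
    static int sample(int s, int a) {
        if (a == 1 && rng.nextDouble() < 0.8) return s + 1;
        return s;
    }

    // Reward 1.0 whenever the sampled next state is at least 3.
    static double reward(int s, int a, int sPrime) {
        return sPrime >= 3 ? 1.0 : 0.0;
    }

    // Estimate V_h(s): for each action, average the Bellman backup over C sampled
    // outcomes as if those C samples perfectly defined the transition dynamics,
    // then take the max over actions. Leaf nodes (h == 0) default to value 0.
    static double value(int s, int h, int c) {
        if (h == 0) return 0.0;
        double best = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < 2; a++) {
            double q = 0.0;
            for (int i = 0; i < c; i++) {
                int sPrime = sample(s, a);
                q += (reward(s, a, sPrime) + gamma * value(sPrime, h - 1, c)) / c;
            }
            best = Math.max(best, q);
        }
        return best;
    }

    public static void main(String[] args) {
        // Estimated value of the initial state with tree height H = 4 and C = 5.
        System.out.println(value(0, 4, 5));
    }
}
```

The exponential cost in H is visible here: each call at height h spawns 2 * C recursive calls at height h - 1, which is why the real class caches computed nodes in nodesByHeight.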
Although the complexity of the algorithm is independent of the state space size, it is exponential in the height of the tree, so if a large tree height is required to make good value function estimates, this algorithm may not be appropriate. Therefore, when rewards are sparse or uniform except at a distant horizon, this may not be an appropriate algorithm choice.
By default, this class will remember the estimated Q-values for every state from which the planFromState(State)
method was called (which will be indirectly called
by the Q-value query methods if the Q-value for a state has not yet been computed), and it will also remember the values of the state tree nodes it computed so that they may be reused in
subsequent tree creations, thereby limiting the amount of additional computation required. However, if memory is scarce, the class can be told to forget all prior planning
results, except the Q-value estimates for the most recently planned-for state, by using the setForgetPreviousPlanResults
method.
By default, the C parameter (number of state transition samples) is fixed for all nodes; however, it may also be set to use a variable C that reduces the number of sampled states the further down in the tree it is according to C_i = C_0 * gamma^(2i), where i is the depth of the node from the root and gamma is the discount factor.
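Under that schedule, the per-node sample count shrinks geometrically with depth. A minimal sketch of the computation (the method name is hypothetical; note that the real class's getCAtHeight(int) indexes by height from a leaf rather than depth from the root):

```java
public class VariableC {
    // Depth-variable sample count: C_i = C_0 * gamma^(2i),
    // where i is the node's depth below the root.
    static int cAtDepth(int c0, double gamma, int depth) {
        return (int) (c0 * Math.pow(gamma, 2 * depth));
    }

    public static void main(String[] args) {
        // With C_0 = 20 and gamma = 0.9, the counts at depths 0..3 are 20, 16, 13, 10.
        for (int i = 0; i <= 3; i++) {
            System.out.println(cAtDepth(20, 0.9, i));
        }
    }
}
```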
By default, the state value of leaf nodes will be set to 0, but this value can be changed by providing a ValueFunctionInitialization
object via the
setValueForLeafNodes(ValueFunctionInitialization)
method. Using a non-zero heuristic value may reduce the need for a large tree height.
This class will work with Options, but including options will necessarily *increase* the computational complexity, so they are not recommended.
This class requires a StateHashFactory
; if the domain is continuous, just use a NameDependentStateHashFactory
instance.
This class can optionally be set to not use sampling and instead use the full Bellman update, which results in the exact finite horizon Q-values being computed.
However, this should only be done when the number of possible state transitions is small and when the full model for the domain is defined (that is, the
Action.getTransitions(State, String[])
method is defined). To set this class to compute the exact finite horizon value function, use the
setComputeExactValueFunction(boolean)
method. Note that you cannot use Options when using the full Bellman update, because that would
require factored access to the transition probabilities for each possible execution length of the option, which is not available from Options (it is aggregated into the option's transition function
itself). An exception will be thrown if Options are used with the full Bellman transitions.
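For intuition, when the full transition distribution is available, the sampled average is replaced by an exact expectation over every possible outcome. A toy sketch of that exact backup (all numbers and names are hypothetical, not from the library):

```java
public class ExactBellmanBackup {
    public static void main(String[] args) {
        double gamma = 0.9;
        // Hypothetical action with two possible outcomes and known probabilities.
        double[] probs      = {0.8, 0.2}; // full transition distribution
        double[] rewards    = {1.0, 0.0}; // reward for each outcome
        double[] nextValues = {1.0, 0.0}; // value of each successor state

        // Exact finite horizon backup: an expectation over all outcomes,
        // rather than an average over C samples.
        double q = 0.0;
        for (int i = 0; i < probs.length; i++) {
            q += probs[i] * (rewards[i] + gamma * nextValues[i]);
        }
        System.out.println(q); // 0.8 * (1.0 + 0.9 * 1.0) = 1.52
    }
}
```

This is only tractable when every outcome can be enumerated, which is why the full Bellman mode requires the number of possible transitions to be small and the transition model to be fully defined.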
1. Kearns, Michael, Yishay Mansour, and Andrew Y. Ng. "A sparse sampling algorithm for near-optimal planning in large Markov decision processes." Machine Learning 49.2-3 (2002): 193-208.
Modifier and Type  Class and Description 

static class 
SparseSampling.HashedHeightState
Tuple for a state and its height in a tree that can be hashed for quick retrieval.

class 
SparseSampling.StateNode
A class for state nodes.

QComputablePlanner.QComputablePlannerHelper
Modifier and Type  Field and Description 

protected int 
c
The number of transition dynamics samples (for the root node if depth-variable C is used).

protected boolean 
computeExactValueFunction
This parameter indicates whether the exact finite horizon value function is computed or whether sparse sampling to estimate it should be used.

protected boolean 
forgetPreviousPlanResults
Whether previous planning results should be forgotten or reused; the default is to reuse them (false).

protected int 
h
The height of the tree

protected java.util.Map<SparseSampling.HashedHeightState,SparseSampling.StateNode> 
nodesByHeight
The tree nodes indexed by state and height.

protected int 
numUpdates
The total number of pseudo-Bellman updates performed.

protected java.util.Map<StateHashTuple,java.util.List<QValue>> 
rootLevelQValues
The root state node Q-values that have been estimated by previous planning calls.

protected boolean 
useVariableC
Whether the number of transition dynamics samples should scale with the depth of the node.

protected ValueFunctionInitialization 
vinit
The state value used for leaf nodes; default is zero.

actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description 

SparseSampling(Domain domain,
RewardFunction rf,
TerminalFunction tf,
double gamma,
StateHashFactory hashingFactory,
int h,
int c)
Initializes.

Modifier and Type  Method and Description 

boolean 
computesExactValueFunction()
Returns whether this planner computes the exact finite horizon value function (by using the full transition dynamics) or whether
it estimates the value function with sampling.

int 
getC()
Returns the number of state transition samples

protected int 
getCAtHeight(int height)
Returns the value of C for a node at the given height (height from a leaf node).

int 
getDebugCode()
Returns the debug code used for logging plan results with
DPrint . 
int 
getH()
Returns the height of the tree

int 
getNumberOfStateNodesCreated()
Returns the total number of state nodes that have been created.

int 
getNumberOfValueEsitmates()
Returns the total number of state value estimates performed since the
resetPlannerResults() call. 
QValue 
getQ(State s,
AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
java.util.List<QValue> 
getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
protected SparseSampling.StateNode 
getStateNode(State s,
int height)
Either returns, or creates, indexes, and returns, the state node for the given state at the given height in the tree

protected static double 
logbase(double base,
double x)
Returns the log value of x in the given base.

void 
planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state

void 
resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. 
void 
setC(int c)
Sets the number of state transition samples used.

void 
setComputeExactValueFunction(boolean computeExactValueFunction)
Sets whether this planner will compute the exact finite horizon value function (using the full transition dynamics) or if sampling
to estimate the value function will be used.

void 
setDebugCode(int debugCode)
Sets the debug code used for logging plan results with
DPrint . 
void 
setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)
Sets whether previous planning results should be forgotten or reused in subsequent planning.

void 
setH(int h)
Sets the height of the tree.

void 
setHAndCByMDPError(double rmax,
double epsilon,
int numActions)
Sets the height and number of transition dynamics samples in a way that ensures epsilon optimality.

void 
setUseVariableCSize(boolean useVariableC)
Sets whether the number of state transition samples (C) should be variable with respect to the depth of the node.

void 
setValueForLeafNodes(ValueFunctionInitialization vinit)
Sets the ValueFunctionInitialization object to use for setting the value of leaf nodes.
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
protected int h
protected int c
protected boolean useVariableC
protected boolean forgetPreviousPlanResults
protected ValueFunctionInitialization vinit
protected boolean computeExactValueFunction
protected java.util.Map<SparseSampling.HashedHeightState,SparseSampling.StateNode> nodesByHeight
protected java.util.Map<StateHashTuple,java.util.List<QValue>> rootLevelQValues
protected int numUpdates
public SparseSampling(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory, int h, int c)
setHAndCByMDPError(double, double, int)
method, but in general this will result in very large values that will be intractable. If you set c = -1, then the full transition dynamics will be used. You should
only use the full transition dynamics if the number of possible transitions from each state is small and if the domain Action object's Action.getTransitions(State, String[])
method is defined.
domain - the planning domain
rf - the reward function
tf - the terminal function
gamma - the discount factor
hashingFactory - the state hashing factory for matching generated states with their state nodes. If the domain is continuous, use a NameDependentStateHashFactory
h - the height of the tree
c - the number of transition dynamics samples used. If set to -1, then the full transition dynamics are used.

public void setHAndCByMDPError(double rmax, double epsilon, int numActions)
rmax - the maximum reward value of the MDP
epsilon - the epsilon optimality (amount that the estimated value function may diverge from the true optimal)
numActions - the maximum number of actions that could be applied from a state

public void setUseVariableCSize(boolean useVariableC)
useVariableC - if true, then depth-variable C will be used; if false, all state nodes use the same number of samples.

public void setC(int c)
c - the number of state transition samples used. If -1, then the full transition dynamics are used.

public void setH(int h)
h - the height of the tree.

public int getC()
public int getH()
public void setComputeExactValueFunction(boolean computeExactValueFunction)
computeExactValueFunction - if true, the exact finite horizon value function is computed; if false, then sampling is used.

public boolean computesExactValueFunction()
public void setForgetPreviousPlanResults(boolean forgetPreviousPlanResults)
forgetPreviousPlanResults - if true, then previous planning results will be forgotten; if false, they will be remembered and reused in subsequent planning.

public void setValueForLeafNodes(ValueFunctionInitialization vinit)
Sets the ValueFunctionInitialization object to use for setting the value of leaf nodes.
vinit - the ValueFunctionInitialization object to use for setting the value of leaf nodes.

public int getDebugCode()
Returns the debug code used for logging plan results with DPrint.
getDebugCode in class OOMDPPlanner

public void setDebugCode(int debugCode)
Sets the debug code used for logging plan results with DPrint.
setDebugCode in class OOMDPPlanner
debugCode - the debugCode to use.

public int getNumberOfValueEsitmates()
Returns the total number of state value estimates performed since the resetPlannerResults() call.

public int getNumberOfStateNodesCreated()
public void planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
planFromState in class OOMDPPlanner
initialState - the initial state of the planning problem

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to
OOMDPPlanner.planFromState(State)
as if no planning had ever been performed before. Specifically, data produced from calls to
OOMDPPlanner.planFromState(State)
will be cleared, but all other planner settings should remain the same.
This is useful if the reward function or transition dynamics have changed, thereby
requiring new results to be computed. If there were other objects this planner was provided that may have changed
and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy
that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class OOMDPPlanner
public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
getQs in interface QComputablePlanner
s - the state for which Q-values are to be returned.
Returns: a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
getQ in interface QComputablePlanner
s - the input state
a - the input action
Returns: the QValue for the given state-action pair.

protected int getCAtHeight(int height)
height - the height from a leaf node.

protected SparseSampling.StateNode getStateNode(State s, int height)
s - the state
height - the height (distance from leaf node) of the node.

protected static double logbase(double base, double x)
base - the log base
x - the input of the log