public class BFSMarkovOptionModel extends java.lang.Object implements FullModel
SampleModel. A FullModel is required for the transitions(State, Action) method. Note that the transition model for an option is a multi-time model, which means the state transition probabilities factor in the discount factor. That is,
P(s' | s, o) = \sum_{k=1}^{\infty} p(s', k | s, o) \gamma^k, where p(s', k | s, o) is the probability that the
agent will terminate in state s' after k steps, given that option o was initiated in state s.
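To make the multi-time sum concrete, here is a minimal, library-independent sketch (it is not BURLAP code). It assumes the option always runs for at least one step, so the sum starts at k = 1. The class name MultiTimeModelExample, the method discountedProbability, and the step distribution are hypothetical and exist only for illustration.

```java
class MultiTimeModelExample {

    /**
     * Computes sum_k p(s', k | s, o) * gamma^k for a single termination state s'.
     * stepProbs[i] is the probability of terminating in s' after exactly i + 1 steps
     * (assumption: an option takes at least one step).
     */
    static double discountedProbability(double[] stepProbs, double gamma) {
        double total = 0.0;
        double discount = gamma; // gamma^1 for the first step
        for (double p : stepProbs) {
            total += p * discount;
            discount *= gamma;   // advance to gamma^(k+1)
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical option: terminates in s' after 1 step with probability 0.5,
        // and after 2 steps with probability 0.5.
        double[] stepProbs = {0.5, 0.5};
        System.out.println(discountedProbability(stepProbs, 0.9));
    }
}
```

With gamma = 0.9 the sum is 0.5 * 0.9 + 0.5 * 0.81 = 0.855 (up to floating-point rounding), which illustrates why the entries of a multi-time model generally sum to less than 1 even when the option is certain to terminate.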
Computing the transition model can be quite expensive (particularly for stochastic domains), so ideally you should consider a custom implementation of your option model. The computation
proceeds by running a BFS-like algorithm from the input state, following the option policy
to the possible option (or environment) termination states. The BFS expansion stops once a minimum threshold
of the probability mass of all possible trajectories following the policy has been computed (0.999 by default). However,
you can lower the probability threshold with the method setMinProb(double)
to decrease computation time.
When you decrease the probability threshold,
the computed probabilities are normalized by the amount of trajectory probability mass computed, giving
an estimated option transition model.
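The expansion, threshold, and normalization described above can be sketched on a toy chain as follows. This is an illustrative approximation under stated assumptions, not the class's actual implementation: states are plain integers, the option terminates deterministically in a fixed set of states, and the names BfsOptionModelSketch, estimate, toyStep, and the maxDepth guard are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.IntFunction;

class BfsOptionModelSketch {

    /** Returns normalized, discounted termination probabilities keyed by termination state. */
    static Map<Integer, Double> estimate(int start,
                                         IntFunction<Map<Integer, Double>> step,
                                         Set<Integer> terminals,
                                         double gamma, double minProb, int maxDepth) {
        Map<Integer, Double> discounted = new HashMap<>();
        Map<Integer, Double> frontier = new HashMap<>();
        frontier.put(start, 1.0);
        double massFound = 0.0;  // undiscounted trajectory mass that has terminated so far
        double discount = gamma; // gamma^depth

        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty() && massFound < minProb; depth++) {
            Map<Integer, Double> next = new HashMap<>();
            for (Map.Entry<Integer, Double> node : frontier.entrySet()) {
                for (Map.Entry<Integer, Double> t : step.apply(node.getKey()).entrySet()) {
                    double mass = node.getValue() * t.getValue();
                    if (terminals.contains(t.getKey())) {
                        discounted.merge(t.getKey(), mass * discount, Double::sum);
                        massFound += mass;
                    } else {
                        // Markov option: nodes that share a state can be merged.
                        next.merge(t.getKey(), mass, Double::sum);
                    }
                }
            }
            frontier = next;
            discount *= gamma;
        }

        // Normalize by the trajectory mass that was actually explored.
        if (massFound > 0.0) {
            final double norm = massFound;
            discounted.replaceAll((state, p) -> p / norm);
        }
        return discounted;
    }

    /** Toy policy dynamics: from any state, move to state 1 w.p. 0.5, else to state 0. */
    static Map<Integer, Double> toyStep(int s) {
        Map<Integer, Double> out = new HashMap<>();
        out.put(0, 0.5);
        out.put(1, 0.5);
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> terminals = java.util.Collections.singleton(1);
        System.out.println(estimate(0, BfsOptionModelSketch::toyStep, terminals, 0.9, 0.999, 1000));
        System.out.println(estimate(0, BfsOptionModelSketch::toyStep, terminals, 0.9, 0.5, 1000));
    }
}
```

On this toy chain, a threshold of 0.999 stops the expansion after ten steps and yields about 0.819 for state 1, close to the exact value 0.45 / 0.55. A threshold of 0.5 stops after a single step and, after normalization, yields 0.9. The lower threshold is cheaper but gives a coarser estimate.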
If you need a model for non-Markov options (e.g., a MacroAction), use
the BFSNonMarkovOptionModel, which uses slightly more memory in the computation to maintain
the full trajectory history.
Modifier and Type | Class and Description |
---|---|
static class | BFSMarkovOptionModel.CachedModel |
static class | BFSMarkovOptionModel.OptionScanNode |
Nested classes/interfaces inherited from interface FullModel: FullModel.Helper
Modifier and Type | Field and Description |
---|---|
protected java.util.Map<Option,BFSMarkovOptionModel.CachedModel> | cachedModels |
protected double | discount |
protected HashableStateFactory | hashingFactory |
protected double | minProb |
protected SampleModel | model |
protected boolean | requireMarkov |
protected java.util.Set<HashableState> | srcTerminateStates |
Constructor and Description |
---|
BFSMarkovOptionModel(SampleModel model, double discount, HashableStateFactory hashingFactory) |
Modifier and Type | Method and Description |
---|---|
protected double | computeTransitions(State s, Option o, HashedAggregator<HashableState> possibleTerminations, double[] expectedReturn) |
protected BFSMarkovOptionModel.CachedModel | getOrCreateModel(Option o) |
EnvironmentOutcome | sample(State s, Action a): Samples a transition from the transition distribution and returns it. |
void | setMinProb(double minProb) |
boolean | terminal(State s): Indicates whether a state is a terminal state (i.e., no more action occurs and zero reward received from there on out) |
java.util.List<TransitionProb> | transitions(State s, Action a) |
protected SampleModel model
protected double discount
protected HashableStateFactory hashingFactory
protected java.util.Map<Option,BFSMarkovOptionModel.CachedModel> cachedModels
protected java.util.Set<HashableState> srcTerminateStates
protected double minProb
protected boolean requireMarkov
public BFSMarkovOptionModel(SampleModel model, double discount, HashableStateFactory hashingFactory)
public void setMinProb(double minProb)
public java.util.List<TransitionProb> transitions(State s, Action a)
Description copied from interface: FullModel
Returns the possible transitions when the Action is applied in State s. The returned list only needs to include transitions that have non-zero probability of occurring.
Specified by: transitions in interface FullModel
Parameters:
s - the source State
a - the Action applied in the source state
public EnvironmentOutcome sample(State s, Action a)
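Description copied from interface: SampleModel
Samples a transition from the transition distribution and returns it.
Specified by: sample in interface SampleModel
Parameters:
s - the source state
a - the action taken in the source state
Returns: EnvironmentOutcome describing the sampled transition
public boolean terminal(State s)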
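Description copied from interface: SampleModel
Indicates whether a state is a terminal state (i.e., no more action occurs and zero reward received from there on out)
Specified by: terminal in interface SampleModel
Parameters:
s - the input state to test
protected BFSMarkovOptionModel.CachedModel getOrCreateModel(Option o)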
protected double computeTransitions(State s, Option o, HashedAggregator<HashableState> possibleTerminations, double[] expectedReturn)