public abstract class Option extends Action implements FullActionModel
Action
class, they may be trivially added to any planning or learning algorithm. Some planning and learning
algorithms must handle options specially; for instance, Q-learning needs to treat the return from
an option's execution specially. However, the current planning and learning algorithms all handle options in the
appropriately special ways, so Options may be used confidently with existing algorithms.
To determine correct value function returns from option executions,
options need to keep track of the cumulative reward and the number of steps they've taken
since they began execution. This abstract class has data structures and code in place to automatically
handle that information, so any subclass of this Option class should "just work." When
an option is added to an MDPSolver
object
through the MDPSolver.addNonDomainReferencedAction(Action)
method, the solver will automatically tell the Option which reward function and discount factor it should use
to keep track of the cumulative reward.
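As a concrete illustration of this bookkeeping, the sketch below (hypothetical class and method names, not part of the BURLAP API) accumulates the discounted cumulative reward r_0 + \gamma r_1 + \gamma^2 r_2 + ... and the step count across an option's execution:

```java
// Minimal sketch (hypothetical, not the BURLAP API) of how an option can
// track its discounted cumulative reward and step count during execution.
class RewardTracker {
    private final double discountFactor;
    private double cumulativeDiscount = 1.0;   // gamma^k applied to the next step's reward
    private double lastCumulativeReward = 0.0;
    private int lastNumSteps = 0;

    RewardTracker(double discountFactor) {
        this.discountFactor = discountFactor;
    }

    // Called once per primitive step the option executes.
    void recordStep(double reward) {
        lastCumulativeReward += cumulativeDiscount * reward; // r_0 + gamma*r_1 + ...
        cumulativeDiscount *= discountFactor;                // advance gamma^k
        lastNumSteps++;
    }

    double getLastCumulativeReward() {
        return lastCumulativeReward;
    }

    int getLastNumSteps() {
        return lastNumSteps;
    }
}
```

The running `cumulativeDiscount` is the analogue of this class's `cumulativeDiscount` field: it avoids recomputing \gamma^k from scratch at every step.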
Note that value function planning algorithms that use the Bellman update (such as value iteration)
require the option to return not only the possible terminal states, but also the expected number of
steps to those terminal states and the expected cumulative reward. By default, this
abstract Option class computes those transition dynamics through a branching
exploration of the possible outcomes at each step of execution and caches the results
so that they do not need to be computed again. If the option is stochastic or
the underlying domain is stochastic, there may be an infinite number of possible outcomes.
As a result, the transition dynamics computation stops searching for states at
horizons reached with less than some small probability (by default set to
0.001). This threshold may be modified. However, if the transition dynamics can be specified
a priori, it is recommended that the getTransitions(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)
method be overridden
and specified by hand rather than requiring this class to enumerate the results. Finally,
note that the getTransitions(State, burlap.oomdp.singleagent.GroundedAction)
method returns TransitionProbability
elements, where each TransitionProbability
holds the probability of transitioning to a state discounted
by the expected length of time taken to reach it. That is, the probability value in each TransitionProbability
is the sum over all possible numbers of steps k of \gamma^k p(s, s', k), where s' is the state held by the
TransitionProbability
object.
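The discounted probability \sum_k \gamma^k p(s, s', k) can be accumulated one step count at a time. Below is a minimal sketch of that bookkeeping (a hypothetical class, not the BURLAP implementation; states are keyed by String for simplicity, where BURLAP would use HashableState):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (hypothetical, not BURLAP code) of accumulating the discounted
// termination probability sum_k gamma^k * p(s, s', k) for each terminal state.
class DiscountedTerminationProbs {
    private final Map<String, Double> possibleTerminations = new HashMap<>();

    // p is assumed to already be gamma^k * p(s, s', k) for one specific step
    // count k; calling this for every k accumulates the full sum for sPrime.
    void accumulateDiscountedProb(String sPrime, double p) {
        possibleTerminations.merge(sPrime, p, Double::sum);
    }

    double discountedProb(String sPrime) {
        return possibleTerminations.getOrDefault(sPrime, 0.0);
    }
}
```

For example, with \gamma = 0.9, reaching a goal state in 1 step with probability 0.5 and in 2 steps with probability 0.25 yields a discounted probability of 0.9 * 0.5 + 0.81 * 0.25 = 0.6525.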
1. Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112.1 (1999): 181-211.
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.Map<HashableState,java.util.List<TransitionProbability>> | cachedExpectations: the cached transition probabilities from each initiation state |
| protected java.util.Map<HashableState,java.lang.Double> | cachedExpectedRewards: the cached expected reward from each initiation state |
| protected double | cumulativeDiscount: how much to discount the reward in the next option step |
| protected double | discountFactor: the discount factor of the MDP in which this option will be applied |
| protected double | expectationSearchCutoffProb: the minimum probability of reaching a possible terminal state for it to be included in the computed transition dynamics |
| protected HashableStateFactory | expectationStateHashingFactory: the state hashing factory used to cache the transition probabilities so that they only need to be computed once for each state |
| protected TerminalFunction | externalTerminalFunction: the terminal function of the MDP in which this option is to be executed |
| protected boolean | keepTrackOfReward: whether the cumulative reward during execution should be recorded |
| protected double | lastCumulativeReward: the cumulative reward received during the last execution of this option |
| protected int | lastNumSteps: how many steps were taken in the option's last execution |
| protected EpisodeAnalysis | lastOptionExecutionResults: stores the last execution results of the option, from the initiation state to the state in which it terminated |
| protected java.util.Random | rand: random object for following stochastic option policies |
| protected RewardFunction | rf: the reward function used to keep track of the cumulative reward during an execution |
| protected boolean | shouldAnnotateExecution: whether the last option execution recording annotates the selected actions with this option's name |
| protected boolean | shouldRecordResults: whether the last option execution result should be saved |
| protected StateMapping | stateMapping: an optional state mapping from the source MDP state representation to the representation this option uses for action selection |
| protected DirectOptionTerminateMapper | terminateMapper: an optional mapping from initiation states to terminal states so that the execution of an option does not need to be simulated |
Fields inherited from class Action: actionObservers, domain, name
| Constructor and Description |
|---|
| Option(): initializes an option without a name or parameters |
| Option(java.lang.String name, Domain domain): initializes an option with the given name for the given domain |
| Modifier and Type | Method and Description |
|---|---|
| protected void | accumulateDiscountedProb(java.util.Map<HashableState,java.lang.Double> possibleTerminations, State s, double p): adds to the expected discounted probability of reaching a state given a value p = \gamma^k * p(s, s', k), where s' is a possible terminal state and k is a number of steps not yet added to the sum over all possible step counts to s' |
| boolean | continueFromState(State s, GroundedAction groundedAction): uses this option's termination probability, rolls the dice, and returns whether the option should continue or terminate |
| abstract java.util.List<Policy.ActionProb> | getActionDistributionForState(State s, GroundedAction groundedAction): returns the option's policy distribution for a given state |
| protected java.util.List<Policy.ActionProb> | getDeterministicPolicy(State s, GroundedAction groundedAction): creates a deterministic action selection probability distribution in which the action selected with probability 1 is the one returned by oneStepActionSelection(State, burlap.oomdp.singleagent.GroundedAction) |
| double | getExpectedRewards(State s, GroundedAction groundedAction): returns the expected reward to be received from initiating this option from state s |
| double | getLastCumulativeReward(): returns the cumulative discounted reward received in the last execution of this option |
| EpisodeAnalysis | getLastExecutionResults(): returns the events from this option's last execution |
| int | getLastNumSteps(): returns the number of steps taken in the last execution of this option |
| java.util.List<TransitionProbability> | getTransitions(State st, GroundedAction groundedAction): returns the transition probabilities for applying this action in the given state with the given set of parameters |
| void | initiateInState(State s, GroundedAction groundedAction): tells the option that it is being initiated in the given state with the given parameters |
| abstract void | initiateInStateHelper(State s, GroundedAction groundedAction): always called when an option is initiated and begins execution |
| boolean | isAnnotatingExecutionResults(): returns whether this option annotates recorded action executions with this option's name |
| abstract boolean | isMarkov(): returns whether this option is Markov; that is, whether action selection and termination depend only on the current state |
| boolean | isPrimitive(): returns whether this action is a primitive action of the domain or not |
| boolean | isRecordingExecutionResults(): returns whether this option is recording its executions |
| protected void | iterateExpectationScan(burlap.behavior.singleagent.options.Option.ExpectationSearchNode src, double stackedDiscount, java.util.Map<HashableState,java.lang.Double> possibleTerminations, double[] expectedReturn): recursively determines all possible paths that could occur from execution of the option, as well as the expected return |
| void | keepTrackOfRewardWith(RewardFunction rf, double discount): tells this option to keep track of the cumulative reward from its execution using the given reward function and discount factor |
| protected State | map(State s): returns the state that is mapped from the input state |
| EnvironmentOutcome | oneStep(Environment env, GroundedAction groundedAction): performs one step of execution of the option in the provided Environment |
| State | oneStep(State s, GroundedAction groundedAction): performs one step of execution of the option |
| abstract GroundedAction | oneStepActionSelection(State s, GroundedAction groundedAction): causes the option to select a single step in the given state, when the option was initiated with the provided parameters |
| protected State | performActionHelper(State st, GroundedAction groundedAction): determines what happens when an action is applied in the given state with the given parameters |
| EnvironmentOutcome | performInEnvironment(Environment env, GroundedAction groundedActions): executes this action with the specified parameters in the provided environment and returns the EnvironmentOutcome result |
| abstract double | probabilityOfTermination(State s, GroundedAction groundedAction): returns the probability that this option (executed with the given parameters) will terminate in the given state |
| void | setExernalTermination(TerminalFunction tf): sets the external MDP's terminal function that will cause this option to terminate if it enters one of those terminal states |
| void | setExpectationCalculationProbabilityCutoff(double cutoff): sets the minimum probability of reaching a terminal state for it to be included in the option's computed transition dynamics distribution |
| void | setExpectationHashingFactory(HashableStateFactory hashingFactory): sets the option to use the provided hashing factory for caching transition probability results |
| void | setStateMapping(StateMapping m): sets this option to use a state mapping from the source MDP states to another state representation that this option will use for making action selections |
| void | setTerminateMapper(DirectOptionTerminateMapper tm): sets this option to determine its execution results using a direct terminal state mapping rather than actually executing each action selected by the option step by step |
| void | toggleShouldAnnotateResults(boolean toggle): toggles whether the last recorded option execution will annotate the actions taken with this option's name |
| void | toggleShouldRecordResults(boolean toggle): changes whether the option's last execution will be recorded |
| abstract boolean | usesDeterministicPolicy(): returns whether this option's policy is deterministic or stochastic |
| abstract boolean | usesDeterministicTermination(): returns whether this option's termination conditions are deterministic or stochastic |
Methods inherited from class Action: addActionObserver, applicableInState, clearAllActionsObservers, deterministicTransition, equals, getAllApplicableGroundedActions, getAllApplicableGroundedActionsFromActionList, getAssociatedGroundedAction, getDomain, getGroundedAction, getName, hashCode, isParameterized, performAction
protected java.util.Random rand
protected EpisodeAnalysis lastOptionExecutionResults
protected boolean shouldRecordResults
protected boolean shouldAnnotateExecution
protected RewardFunction rf
protected boolean keepTrackOfReward
protected double discountFactor
protected double lastCumulativeReward
protected double cumulativeDiscount
protected int lastNumSteps
protected TerminalFunction externalTerminalFunction
protected HashableStateFactory expectationStateHashingFactory
protected java.util.Map<HashableState,java.util.List<TransitionProbability>> cachedExpectations
protected java.util.Map<HashableState,java.lang.Double> cachedExpectedRewards
protected double expectationSearchCutoffProb
protected StateMapping stateMapping
protected DirectOptionTerminateMapper terminateMapper
See the DirectOptionTerminateMapper class documentation for more information.

public Option()
public Option(java.lang.String name, Domain domain)
name - the name of the option (should be unique from other options and actions a planning/learning algorithm can use)
domain - a domain with which this option is associated; note that this option will *not* be added to the domain's list of actions like a normal action

public abstract boolean isMarkov()
public abstract boolean usesDeterministicTermination()
public abstract boolean usesDeterministicPolicy()
public abstract double probabilityOfTermination(State s, GroundedAction groundedAction)
s - the state to test for termination
groundedAction - the parameters in which this option was initiated

public abstract void initiateInStateHelper(State s, GroundedAction groundedAction)
performActionHelper(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)
For Markov options, this method probably does not need to do anything, but for non-Markov options, like macro actions, it may need
to initialize some structures for determining termination and action selection.
s - the state in which the option was initiated
groundedAction - the parameters in which this option will be initiated

public abstract GroundedAction oneStepActionSelection(State s, GroundedAction groundedAction)
performActionHelper(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)
method until it is determined that the option terminates.
s - the state in which an action should be selected
groundedAction - the parameters in which this option was initiated

public abstract java.util.List<Policy.ActionProb> getActionDistributionForState(State s, GroundedAction groundedAction)
s - the state for which this option's policy distribution should be returned
groundedAction - the parameters in which this option was initiated

public void setExpectationHashingFactory(HashableStateFactory hashingFactory)
hashingFactory - the state hashing factory to use

public void setExpectationCalculationProbabilityCutoff(double cutoff)
cutoff - the minimum probability of reaching a terminal state for it to be included in the option's computed transition dynamics distribution

public void toggleShouldRecordResults(boolean toggle)
toggle - true if the last option execution should be saved; false otherwise

public void toggleShouldAnnotateResults(boolean toggle)
toggle - true if the last recorded option execution will annotate the actions taken with this option's name; false otherwise

public boolean isRecordingExecutionResults()
public boolean isAnnotatingExecutionResults()
public EpisodeAnalysis getLastExecutionResults()
public void setStateMapping(StateMapping m)
m - the state mapping to use

public void setTerminateMapper(DirectOptionTerminateMapper tm)
See the DirectOptionTerminateMapper class documentation for more information.
tm - the direct state to terminal state mapping to use

public void setExernalTermination(TerminalFunction tf)
tf - the external MDP's terminal function

protected State map(State s)
s - the input state from which a mapped state is to be returned

public void keepTrackOfRewardWith(RewardFunction rf, double discount)
rf - the reward function to use
discount - the discount factor to use

public double getLastCumulativeReward()
public int getLastNumSteps()
public boolean isPrimitive()
Description copied from class: Action
Overrides: isPrimitive in class Action
public void initiateInState(State s, GroundedAction groundedAction)
The initiateInStateHelper(State, burlap.oomdp.singleagent.GroundedAction) method will be called before exiting.
s - the state in which the option is being initiated
groundedAction - the parameters in which this option was initiated

protected State performActionHelper(State st, GroundedAction groundedAction)
Description copied from class: Action
Action.performAction(burlap.oomdp.core.states.State, GroundedAction) first copies the input state to pass
to this helper method. The resulting state (which may be s) should then be returned.
Overrides: performActionHelper in class Action
st - the state to perform the action on
groundedAction - the GroundedAction specifying the parameters to use

public EnvironmentOutcome performInEnvironment(Environment env, GroundedAction groundedActions)
Description copied from class: Action
Executes this action with the specified parameters in the provided environment and returns the EnvironmentOutcome result.
Overrides: performInEnvironment in class Action
env - the environment in which the action should be performed
groundedActions - the GroundedAction specifying the parameters to use
Returns: the EnvironmentOutcome specifying the result of the action execution in the environment

public State oneStep(State s, GroundedAction groundedAction)
This method assumes that the initiateInState(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)
method was called previously for the state in which this option was initiated.
s - the state in which a single step of the option is to be taken
groundedAction - the parameters in which this option was initiated

public EnvironmentOutcome oneStep(Environment env, GroundedAction groundedAction)
Performs one step of execution of the option in the provided Environment.
This method assumes that the initiateInState(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)
method was called previously for the state in which this option was initiated.
env - the Environment in which this option is to be applied
groundedAction - the parameters in which this option was initiated
Returns: the EnvironmentOutcome of the one step of interaction

public boolean continueFromState(State s, GroundedAction groundedAction)
s - the state to check against
groundedAction - the parameters in which this option was initiated

public double getExpectedRewards(State s, GroundedAction groundedAction)
s - the state in which the option is initiated
groundedAction - the parameters in which this option was initiated

public java.util.List<TransitionProbability> getTransitions(State st, GroundedAction groundedAction)
Description copied from interface: FullActionModel
Returns the transition probabilities for applying this action in the given state as a list of TransitionProbability objects. The list is only required to contain transitions with non-zero probability.
Specified by: getTransitions in interface FullActionModel
st - the state from which the transition probabilities when applying this action will be returned
groundedAction - the GroundedAction specifying the parameters to use

protected void iterateExpectationScan(burlap.behavior.singleagent.options.Option.ExpectationSearchNode src, double stackedDiscount, java.util.Map<HashableState,java.lang.Double> possibleTerminations, double[] expectedReturn)
Path expansion stops when the probability of a path falls below expectationSearchCutoffProb.
src - the source node from which to expand possible paths
stackedDiscount - the discount amount accumulated up to this point
possibleTerminations - a map from possible termination states to their probability
expectedReturn - the expected discounted cumulative reward up to node src (an array of length 1 used as a mutable double)

protected void accumulateDiscountedProb(java.util.Map<HashableState,java.lang.Double> possibleTerminations, State s, double p)
possibleTerminations - the map from all possible termination states to the expected discounted probability of reaching them
s - a possible termination state
p - the discounted probability of reaching s for some specific number of steps not already summed into the respective possibleTerminations map

protected java.util.List<Policy.ActionProb> getDeterministicPolicy(State s, GroundedAction groundedAction)
This method creates a deterministic action selection probability distribution where the action to be selected
with probability 1 is the one returned by the method oneStepActionSelection(State, burlap.oomdp.singleagent.GroundedAction).
This method is helpful for quickly defining the action selection distribution for deterministic option policies.
s - the state for which the action selection distribution should be returned
groundedAction - the parameters in which this option was initiated
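To illustrate how a branching outcome expansion with a probability cutoff (the role played by iterateExpectationScan together with expectationSearchCutoffProb) can compute an option's expected discounted return, here is a self-contained sketch. It uses a toy option that yields reward 1 each step and terminates with a fixed probability after the step; all names are illustrative, not BURLAP code:

```java
// Illustrative sketch (not the BURLAP implementation) of a branching
// expectation scan with a probability cutoff. A toy option emits reward 1
// each step and terminates with probability pTerm after the step; branches
// whose path probability falls below the cutoff are pruned, mirroring the
// role of expectationSearchCutoffProb.
class ExpectationScanSketch {
    static double expectedReturn(double pTerm, double gamma, double cutoff) {
        return expand(1.0, 1.0, pTerm, gamma, cutoff);
    }

    private static double expand(double pathProb, double discount,
                                 double pTerm, double gamma, double cutoff) {
        if (pathProb < cutoff) {
            return 0.0; // prune branches whose probability is below the cutoff
        }
        // reward of 1 received this step, weighted by path probability and discount
        double contribution = pathProb * discount;
        // with probability (1 - pTerm) the option continues for another step
        return contribution + expand(pathProb * (1.0 - pTerm), discount * gamma,
                                     pTerm, gamma, cutoff);
    }
}
```

With pTerm = 0.5 and gamma = 0.5, the exact expected return is 1 / (1 - gamma * (1 - pTerm)) = 4/3, and the pruned expansion converges to that value as the cutoff shrinks.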