public abstract class Policy
extends java.lang.Object

Subclasses must implement the abstract methods getAction(burlap.oomdp.core.states.State), getActionDistributionForState(burlap.oomdp.core.states.State), isStochastic(), and isDefinedFor(burlap.oomdp.core.states.State).
getAction(burlap.oomdp.core.states.State) should return the action (specified by an AbstractGroundedAction; e.g., a GroundedAction for single-agent domains) this policy defines for the input State. If this Policy is a stochastic policy, then the getAction(burlap.oomdp.core.states.State) method should sample an action from its probability distribution and return it.
getActionDistributionForState(burlap.oomdp.core.states.State) should return this Policy's action selection probability distribution for the input State. The probability distribution is specified by returning a List of Policy.ActionProb instances. A Policy.ActionProb is a pair consisting of an AbstractGroundedAction specifying the action and a double specifying the probability that this Policy would select that action.
The isStochastic() method should return true if this Policy is stochastic and false if it is deterministic. The isDefinedFor(burlap.oomdp.core.states.State) method should return true if this Policy is defined for the input State and false if it is not.
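For illustration, a minimal sketch (not a BURLAP class) of a deterministic subclass that always returns one fixed action could look like the following. It relies on the getDeterministicPolicy helper described below; the import for Policy itself is omitted because its package varies by BURLAP version.

```java
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

// Hypothetical example class; FixedActionPolicy is not part of BURLAP.
public class FixedActionPolicy extends Policy {

	protected AbstractGroundedAction action; // the single action this policy always selects

	public FixedActionPolicy(AbstractGroundedAction action) {
		this.action = action;
	}

	@Override
	public AbstractGroundedAction getAction(State s) {
		return this.action; // deterministic, so no sampling is needed
	}

	@Override
	public List<Policy.ActionProb> getActionDistributionForState(State s) {
		// superclass helper: wraps getAction(s) in an ActionProb with probability 1.0
		return this.getDeterministicPolicy(s);
	}

	@Override
	public boolean isStochastic() {
		return false;
	}

	@Override
	public boolean isDefinedFor(State s) {
		return true; // this trivial policy is defined for every state
	}
}
```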
Note that if getActionDistributionForState(burlap.oomdp.core.states.State) is implemented and stochastic, then the getAction(burlap.oomdp.core.states.State) method can be trivially implemented by having it return the result of the superclass method sampleFromActionDistribution(burlap.oomdp.core.states.State), which will get the probability distribution from getActionDistributionForState(burlap.oomdp.core.states.State), roll a random number, and return an action based on the fully defined action distribution. Conversely, if the policy is deterministic and getAction(burlap.oomdp.core.states.State) is implemented, then the getActionDistributionForState(burlap.oomdp.core.states.State) method can be trivially implemented by having it return the result of getDeterministicPolicy(burlap.oomdp.core.states.State), which will call getAction(burlap.oomdp.core.states.State) and wrap the result in a Policy.ActionProb object with an assigned probability of 1.0.
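As a sketch of the stochastic case (again not a BURLAP class, and assuming Policy.ActionProb exposes a constructor taking the action and its probability, which is an assumption here), a policy that selects uniformly at random among a fixed set of candidate actions only needs to define the distribution and can delegate getAction to sampleFromActionDistribution:

```java
import java.util.ArrayList;
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

// Hypothetical example class; UniformRandomPolicy is not part of BURLAP.
public class UniformRandomPolicy extends Policy {

	protected List<AbstractGroundedAction> candidates;

	public UniformRandomPolicy(List<AbstractGroundedAction> candidates) {
		this.candidates = candidates;
	}

	@Override
	public AbstractGroundedAction getAction(State s) {
		// superclass helper: rolls a random number against the distribution returned below
		return this.sampleFromActionDistribution(s);
	}

	@Override
	public List<Policy.ActionProb> getActionDistributionForState(State s) {
		List<Policy.ActionProb> dist = new ArrayList<Policy.ActionProb>();
		double p = 1.0 / this.candidates.size();
		for (AbstractGroundedAction a : this.candidates) {
			dist.add(new Policy.ActionProb(a, p)); // assumed (action, probability) constructor
		}
		return dist;
	}

	@Override
	public boolean isStochastic() {
		return true;
	}

	@Override
	public boolean isDefinedFor(State s) {
		return true;
	}
}
```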
This class also provides several implemented utility methods: getProbOfAction(burlap.oomdp.core.states.State, burlap.oomdp.core.AbstractGroundedAction), evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction) (and other variants of the method signature), and evaluateBehavior(burlap.oomdp.singleagent.environment.Environment) (and other variants of the method signature).
The getProbOfAction(burlap.oomdp.core.states.State, burlap.oomdp.core.AbstractGroundedAction) method takes as input a State and AbstractGroundedAction and returns the probability of this Policy selecting that action. It uses the result of the getActionDistributionForState(burlap.oomdp.core.states.State) method to determine the full distribution, finds the matching AbstractGroundedAction in the returned list, and then returns its assigned probability. It may be possible to return this value in a more efficient way than enumerating the full probability distribution, in which case you may want to consider overriding the method.
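A small, hypothetical usage sketch of these query methods; the Policy, State, and AbstractGroundedAction arguments stand in for objects from your own domain, and imports for Policy and Policy.ActionProb are omitted since their package varies by BURLAP version:

```java
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

public class PolicyQueryExample {

	public static double probOf(Policy policy, State s, AbstractGroundedAction a) {
		// probability this policy assigns to selecting a in s
		double p = policy.getProbOfAction(s, a);
		// the same value, looked up manually in the full distribution
		List<Policy.ActionProb> dist = policy.getActionDistributionForState(s);
		double p2 = Policy.getProbOfActionGivenDistribution(a, dist); // p2 should equal p
		return p;
	}
}
```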
The evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction), evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, int), and evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int) methods will all evaluate this policy by rolling it out from the input State until it reaches a terminal state or executes for the maximum number of steps (depending on which version of the method you use). The resulting behavior will be saved in an EpisodeAnalysis object that is returned. Note that these methods require that the returned AbstractGroundedAction instances are able to be executed using the action's defined transition dynamics. For single-agent domains in which the actions are GroundedAction instances, this will work as long as the corresponding Action.performAction(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction) method is implemented. If this policy defines the policy for an agent in a stochastic game, returning GroundedSGAgentAction instances for the action, then the policy cannot be rolled out, since the outcome state would depend on the action selection of other agents.
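For instance, a hedged rollout sketch; initialState, rf, and tf stand in for an existing State, RewardFunction, and TerminalFunction, and imports for Policy and EpisodeAnalysis are omitted since their package varies by BURLAP version:

```java
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;

public class RolloutExample {

	public static EpisodeAnalysis rollOut(Policy policy, State initialState,
			RewardFunction rf, TerminalFunction tf) {
		// follow the policy from initialState until tf reports a terminal state,
		// but never for more than 200 steps
		return policy.evaluateBehavior(initialState, rf, tf, 200);
	}
}
```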
The evaluateBehavior(burlap.oomdp.singleagent.environment.Environment) and evaluateBehavior(burlap.oomdp.singleagent.environment.Environment, int) methods will execute this policy in some input Environment until either the Environment reaches a terminal state or the maximum number of steps is taken (depending on which method signature is used). These methods are useful if a policy was computed with a planning algorithm using some model of the world and then needs to be executed in an environment which may have slightly different transitions; for example, planning a policy for a robot using a model of the world and then executing it on the actual robot by following the policy in an Environment.
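A corresponding sketch for executing the policy against a live Environment; env stands in for an existing, already-constructed Environment, and imports for Policy and EpisodeAnalysis are again omitted:

```java
import burlap.oomdp.singleagent.environment.Environment;

public class EnvironmentExecutionExample {

	public static EpisodeAnalysis execute(Policy policy, Environment env) {
		// step the policy in env until env reports a terminal state,
		// or until 200 steps have been taken, whichever comes first
		return policy.evaluateBehavior(env, 200);
	}
}
```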
The evaluateBehavior methods also support policies that select Options. In particular, they are also able to record the option execution in the returned EpisodeAnalysis object in verbose ways for better debugging. By default, when an option is selected in an evaluateBehavior method, each primitive step will be recorded in the EpisodeAnalysis object, rather than only recording that the option was taken. Additionally, in the returned EpisodeAnalysis, each primitive step by default will be annotated with the option that executed it and which step in the option's execution it was.
If you would like to disable option decomposition and/or the option annotation, you can do so with the evaluateMethodsShouldDecomposeOption(boolean) and evaluateMethodsShouldAnnotateOptionDecomposition(boolean) methods.
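For example, a hypothetical configuration sketch (the Policy, State, RewardFunction, and TerminalFunction arguments stand in for existing objects; imports for Policy and EpisodeAnalysis are omitted):

```java
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;

public class OptionRecordingExample {

	public static EpisodeAnalysis rollOutWithoutDecomposition(Policy policy, State s,
			RewardFunction rf, TerminalFunction tf) {
		// record a selected option as a single step in the resulting episode
		policy.evaluateMethodsShouldDecomposeOption(false);
		return policy.evaluateBehavior(s, rf, tf);
	}

	public static EpisodeAnalysis rollOutWithoutAnnotation(Policy policy, State s,
			RewardFunction rf, TerminalFunction tf) {
		// keep the primitive-step decomposition but drop the option annotations
		policy.evaluateMethodsShouldDecomposeOption(true);
		policy.evaluateMethodsShouldAnnotateOptionDecomposition(false);
		return policy.evaluateBehavior(s, rf, tf);
	}
}
```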
Modifier and Type | Class and Description |
---|---|
static class | Policy.ActionProb: Class for storing an action and probability tuple. |
static class | Policy.GroundedAnnotatedAction: A class for annotating an action selection, specified with a GroundedAction, with a string. |
static class | Policy.PolicyUndefinedException: RuntimeException to be thrown when a Policy is queried for a state in which the policy is undefined. |
Modifier and Type | Field and Description |
---|---|
protected boolean | annotateOptionDecomposition |
protected boolean | evaluateDecomposesOptions |

Constructor and Description |
---|
Policy() |
Modifier and Type | Method and Description |
---|---|
EpisodeAnalysis | evaluateBehavior(Environment env): Evaluates this policy in the provided Environment. |
EpisodeAnalysis | evaluateBehavior(Environment env, int numSteps): Evaluates this policy in the provided Environment. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, int numSteps): This method will return the episode that results from following this policy from state s. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf): This method will return the episode that results from following this policy from state s. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf, int maxSteps): This method will return the episode that results from following this policy from state s. |
void | evaluateMethodsShouldAnnotateOptionDecomposition(boolean toggle): Sets whether options that are decomposed into primitives will be annotated with the option that produced them. |
void | evaluateMethodsShouldDecomposeOption(boolean toggle): Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects. |
protected void | followAndRecordPolicy(Environment env, EpisodeAnalysis ea): Follows this policy for one time step in the provided Environment and records the interaction in the provided EpisodeAnalysis object. |
protected State | followAndRecordPolicy(EpisodeAnalysis ea, State cur, RewardFunction rf): Follows this policy for one time step from the provided State and records the interaction in the provided EpisodeAnalysis object. |
abstract AbstractGroundedAction | getAction(State s): This method will return an action sampled by the policy for the given state. |
abstract java.util.List<Policy.ActionProb> | getActionDistributionForState(State s): This method will return the action probability distribution defined by the policy. |
protected java.util.List<Policy.ActionProb> | getDeterministicPolicy(State s): A helper method for defining deterministic policies. |
double | getProbOfAction(State s, AbstractGroundedAction ga): Will return the probability of this policy taking action ga in state s. |
static double | getProbOfActionGivenDistribution(AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution): Searches the input distribution for the occurrence of the input action and returns its probability. |
static double | getProbOfActionGivenDistribution(State s, AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution): Deprecated. |
abstract boolean | isDefinedFor(State s): Specifies whether this policy is defined for the input state. |
abstract boolean | isStochastic(): Indicates whether the policy is stochastic or deterministic. |
protected AbstractGroundedAction | sampleFromActionDistribution(State s): This is a helper method for stochastic policies. |
protected boolean evaluateDecomposesOptions
protected boolean annotateOptionDecomposition
public abstract AbstractGroundedAction getAction(State s)
Parameters: s - the state for which an action should be returned

public abstract java.util.List<Policy.ActionProb> getActionDistributionForState(State s)
Parameters: s - the state for which an action distribution should be returned

public abstract boolean isStochastic()

public abstract boolean isDefinedFor(State s)
Parameters: s - the input state to test for whether this policy is defined
Returns: true if this policy is defined for State s, false otherwise.

public double getProbOfAction(State s, AbstractGroundedAction ga)
Parameters: s - the state in which the action would be taken; ga - the action being queried

@Deprecated public static double getProbOfActionGivenDistribution(State s, AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution)
Deprecated; use getProbOfActionGivenDistribution(burlap.oomdp.core.AbstractGroundedAction, java.util.List) instead.

public static double getProbOfActionGivenDistribution(AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution)
Parameters: ga - the AbstractGroundedAction for which its probability in the specified distribution should be returned; distribution - the probability distribution over actions

protected java.util.List<Policy.ActionProb> getDeterministicPolicy(State s)
Parameters: s - the state for which the action distribution should be returned

protected AbstractGroundedAction sampleFromActionDistribution(State s)
This is a helper method for stochastic policies. Rather than having the subclass implement both the getAction(burlap.oomdp.core.states.State) method and the getActionDistributionForState(burlap.oomdp.core.states.State) method, the subclass needs to only define the getActionDistributionForState(burlap.oomdp.core.states.State) method, and the getAction(burlap.oomdp.core.states.State) method can simply call this method to return an action.
Parameters: s - the input state from which an action should be selected
Returns: the AbstractGroundedAction to take

public void evaluateMethodsShouldDecomposeOption(boolean toggle)
Parameters: toggle - whether to decompose options into the primitive actions taken by them or not

public void evaluateMethodsShouldAnnotateOptionDecomposition(boolean toggle)
Parameters: toggle - whether to annotate the primitive actions of options with the calling option's name

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; tf - the terminal function defining when the policy should stop being followed

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf, int maxSteps)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; tf - the terminal function defining when the policy should stop being followed; maxSteps - the maximum number of steps to take before terminating the policy rollout

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, int numSteps)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; numSteps - the number of steps to take before terminating the policy rollout

public EpisodeAnalysis evaluateBehavior(Environment env)
Evaluates this policy in the provided Environment. The policy will stop being evaluated once a terminal state in the environment is reached.
Parameters: env - The Environment in which this policy is to be evaluated
Returns: an EpisodeAnalysis object specifying the interaction with the environment

public EpisodeAnalysis evaluateBehavior(Environment env, int numSteps)
Evaluates this policy in the provided Environment. The policy will stop being evaluated once a terminal state in the environment is reached or when the provided number of steps has been taken.
Parameters: env - The Environment in which this policy is to be evaluated; numSteps - the maximum number of steps to take in the environment
Returns: an EpisodeAnalysis object specifying the interaction with the environment

protected void followAndRecordPolicy(Environment env, EpisodeAnalysis ea)
Follows this policy for one time step in the provided Environment and records the interaction in the provided EpisodeAnalysis object. If the policy selects an Option, then how the option's interaction in the environment is recorded depends on this object's evaluateDecomposesOptions and annotateOptionDecomposition flags. If evaluateDecomposesOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions selected by the option are recorded. If annotateOptionDecomposition is also true, then each primitive action selected by the option is also given a unique name specifying the option which controlled it and its step in the option's execution.
Parameters: env - The Environment in which this policy should be followed; ea - The EpisodeAnalysis object to which the action selection will be recorded

protected State followAndRecordPolicy(EpisodeAnalysis ea, State cur, RewardFunction rf)
Follows this policy for one time step from the provided State and records the interaction in the provided EpisodeAnalysis object. If the policy selects an Option, then how the option's interaction is recorded depends on this object's evaluateDecomposesOptions and annotateOptionDecomposition flags. If evaluateDecomposesOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions selected by the option are recorded. If annotateOptionDecomposition is also true, then each primitive action selected by the option is also given a unique name specifying the option which controlled it and its step in the option's execution.
Parameters: ea - The EpisodeAnalysis object to which the action selection will be recorded; cur - The State from which the policy will be followed; rf - The RewardFunction to keep track of reward
Returns: the State that is a consequence of following this policy for one action selection.