PolicyUtils

java.lang.Object
- burlap.behavior.policy.PolicyUtils

public class PolicyUtils
extends java.lang.Object

Author: James MacGlashan

Field Summary

Fields
Modifier and Type	Field and Description
`static boolean`	`rolloutsDecomposeOptions` Indicates whether rollout methods will decompose `Option` selections into the primitive `Action` objects they execute and annotate them with the name of the calling `Option` in the returned `Episode`.

Method Summary

Modifier and Type	Method and Description
`static double`	`actionProbFromEnum(EnumerablePolicy p, State s, Action a)` Returns the probability of the policy taking action a in state s by searching for the action in the returned policy distribution from the provided `EnumerablePolicy`.
`static double`	`actionProbGivenDistribution(Action a, java.util.List<ActionProb> distribution)` Searches the input distribution for the occurrence of the input action and returns its probability.
`static java.util.List<ActionProb>`	`deterministicPolicyDistribution(Policy p, State s)` A helper method for defining deterministic policies.
`protected static void`	`followAndRecordPolicy(Policy p, Environment env, Episode ea)` Follows this policy for one time step in the provided `Environment` and records the interaction in the provided `Episode` object.
`static Episode`	`rollout(Policy p, Environment env)` Follows the policy in the given `Environment`.
`static Episode`	`rollout(Policy p, Environment env, int numSteps)` Follows the policy in the given `Environment`.
`static Episode`	`rollout(Policy p, State s, SampleModel model)` Returns the episode that results from following the given policy from state s.
`static Episode`	`rollout(Policy p, State s, SampleModel model, int maxSteps)` Returns the episode that results from following the given policy from state s.
`static Action`	`sampleFromActionDistribution(EnumerablePolicy p, State s)` This is a helper method for stochastic policies.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - rolloutsDecomposeOptions
```
public static boolean rolloutsDecomposeOptions
```
    Indicates whether rollout methods will decompose Option selections into the primitive Action objects they execute and annotate them with the name of the calling Option in the returned Episode. Default value is true.
- Method Detail
  - actionProbFromEnum
```
public static double actionProbFromEnum(EnumerablePolicy p,
                                        State s,
                                        Action a)
```
    Returns the probability of the policy taking action a in state s by searching for the action in the returned policy distribution from the provided EnumerablePolicy.
    
    Parameters:
    
    p - the EnumerablePolicy
    
    s - the state in which the action would be taken
    
    a - the action being queried
    
    Returns:
    
    the probability of this policy taking action a in state s
  - actionProbGivenDistribution
```
public static double actionProbGivenDistribution(Action a,
                                                 java.util.List<ActionProb> distribution)
```
    Searches the input distribution for the occurrence of the input action and returns its probability.
    
    Parameters:
    
    a - the Action whose probability in the specified distribution should be returned.
    
    distribution - the probability distribution over actions.
    
    Returns:
    
    the probability of selecting action a according to the specified distribution.
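
    The lookup described above can be sketched without the BURLAP classes. This is a minimal illustration only: `DistributionLookupSketch`, `ActionWeight`, and `probOf` are hypothetical stand-ins rather than library types, actions are plain strings, and returning 0 for an action that is missing from the list is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical stand-in types; none of these are BURLAP classes.
final class DistributionLookupSketch {

    // Stand-in for an (action, probability) pair.
    static final class ActionWeight {
        final String action;
        final double prob;

        ActionWeight(String action, double prob) {
            this.action = action;
            this.prob = prob;
        }
    }

    // Linear scan for the queried action.
    // Assumption: an action that does not appear in the list has probability 0.
    static double probOf(String action, List<ActionWeight> distribution) {
        for (ActionWeight aw : distribution) {
            if (aw.action.equals(action)) {
                return aw.prob;
            }
        }
        return 0.0;
    }
}
```

    Under the same assumptions, the policy-based variant would first ask the policy for its distribution in state s and then run this scan over the result.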
  - deterministicPolicyDistribution
```
public static java.util.List<ActionProb> deterministicPolicyDistribution(Policy p,
                                                                         State s)
```
    A helper method for defining deterministic policies. This method relies on the Policy.action(State) method being implemented and will return a list of ActionProb objects with a single instance: the result of the Policy.action(State) method with assigned probability 1.
    
    Parameters:
    
    p - the Policy
    
    s - the state for which the action distribution should be returned.
    
    Returns:
    
    a deterministic action distribution for the action returned by the Policy.action(State) method.
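
    The idea of wrapping a single chosen action in a one-element distribution can be sketched as follows. The names `DeterministicDistributionSketch`, `ActionWeight`, and `deterministicDistribution` are hypothetical, and a `java.util.function.Function` stands in for the policy's action selection; this is not the library's implementation.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in types; none of these are BURLAP classes.
final class DeterministicDistributionSketch {

    // Stand-in for an (action, probability) pair.
    static final class ActionWeight {
        final String action;
        final double prob;

        ActionWeight(String action, double prob) {
            this.action = action;
            this.prob = prob;
        }
    }

    // Ask the (stand-in) policy for its action in the state and
    // return a one-element distribution that gives it probability 1.
    static <S> List<ActionWeight> deterministicDistribution(Function<S, String> actionFor, S state) {
        String chosen = actionFor.apply(state);
        return Collections.singletonList(new ActionWeight(chosen, 1.0));
    }
}
```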
  - sampleFromActionDistribution
```
public static Action sampleFromActionDistribution(EnumerablePolicy p,
                                                  State s)
```
    This is a helper method for stochastic policies. If the policy is stochastic, then rather than defining both the Policy.action(State) method and the EnumerablePolicy.policyDistribution(State) method, the policy only needs to define EnumerablePolicy.policyDistribution(State); its Policy.action(State) method can simply return the result of this method to sample an action.
    
    Parameters:
    
    p - the EnumerablePolicy
    
    s - the input state from which an action should be selected.
    
    Returns:
    
    an Action to take
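
    One common way to draw an action from an enumerated distribution is a cumulative-sum scan against a uniform random number. The sketch below shows that technique with hypothetical stand-in names (`DistributionSamplingSketch`, `ActionWeight`, `sample`); whether the library samples exactly this way is not stated here, so treat it as an illustration under that assumption.

```java
import java.util.List;
import java.util.Random;

// Hypothetical stand-in types; none of these are BURLAP classes.
final class DistributionSamplingSketch {

    // Stand-in for an (action, probability) pair.
    static final class ActionWeight {
        final String action;
        final double prob;

        ActionWeight(String action, double prob) {
            this.action = action;
            this.prob = prob;
        }
    }

    // Returns the first action whose cumulative probability exceeds roll.
    // roll is expected to lie in [0, 1).
    static String sample(List<ActionWeight> distribution, double roll) {
        double cumulative = 0.0;
        for (ActionWeight aw : distribution) {
            cumulative += aw.prob;
            if (roll < cumulative) {
                return aw.action;
            }
        }
        // Reached only if the probabilities sum to less than roll.
        throw new IllegalStateException("Probabilities sum to " + cumulative + ", below roll " + roll);
    }

    // Convenience overload that draws the roll from a random number generator.
    static String sample(List<ActionWeight> distribution, Random rng) {
        return sample(distribution, rng.nextDouble());
    }
}
```

    Separating the roll from the random number generator keeps the selection logic deterministic and easy to check by hand.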
  - rollout
```
public static Episode rollout(Policy p,
                              State s,
                              SampleModel model)
```
    Returns the episode that results from following the given policy from state s. The episode terminates when a terminal state is reached.
    
    Parameters:
    
    p - the Policy to roll out
    
    s - the state from which to roll out the policy
    
    model - the model from which to sample
    
    Returns:
    
    an Episode object that records the events from following the policy.
  - rollout
```
public static Episode rollout(Policy p,
                              State s,
                              SampleModel model,
                              int maxSteps)
```
    Returns the episode that results from following the given policy from state s. The episode terminates when a terminal state is reached or when the number of steps reaches maxSteps.
    
    Parameters:
    
    p - the Policy to roll out
    
    s - the state from which to roll out the policy
    
    model - the model from which to sample state transitions
    
    maxSteps - the maximum number of steps to take before terminating the policy rollout.
    
    Returns:
    
    an Episode object that records the events from following the policy.
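
    The rollout loop described for the two model-based variants can be sketched generically. Everything named here (`ModelRolloutSketch`, `Model`, `Trace`, `chain`) is a hypothetical stand-in, not a BURLAP type, and the exact boundary behavior at the step limit is an assumption of the sketch: it stops once maxSteps actions have been taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in types; none of these are BURLAP classes.
final class ModelRolloutSketch {

    // Stand-in for a generative model of the domain.
    interface Model<S, A> {
        S next(S state, A action);
        double reward(S state, A action, S next);
        boolean terminal(S state);
    }

    // Stand-in for the recorded episode: states, actions, and rewards in order.
    static final class Trace<S, A> {
        final List<S> states = new ArrayList<>();
        final List<A> actions = new ArrayList<>();
        final List<Double> rewards = new ArrayList<>();
    }

    // Follow the policy from start until a terminal state or the step limit.
    static <S, A> Trace<S, A> rollout(Function<S, A> policy, S start, Model<S, A> model, int maxSteps) {
        Trace<S, A> trace = new Trace<>();
        trace.states.add(start);
        S current = start;
        int steps = 0;
        while (!model.terminal(current) && steps < maxSteps) {
            A action = policy.apply(current);
            S next = model.next(current, action);
            trace.actions.add(action);
            trace.rewards.add(model.reward(current, action, next));
            trace.states.add(next);
            current = next;
            steps++;
        }
        return trace;
    }

    // Toy model: every action moves one cell right; cells at or past goal are terminal.
    static Model<Integer, String> chain(final int goal) {
        return new Model<Integer, String>() {
            public Integer next(Integer state, String action) { return state + 1; }
            public double reward(Integer state, String action, Integer next) { return -1.0; }
            public boolean terminal(Integer state) { return state >= goal; }
        };
    }
}
```

    In this sketch the variant without a step limit corresponds to passing `Integer.MAX_VALUE` as maxSteps.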
  - rollout
```
public static Episode rollout(Policy p,
                              Environment env)
```
    Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached.
    
    Parameters:
    
    p - the Policy
    
    env - The Environment in which this policy is to be evaluated.
    
    Returns:
    
    An Episode object specifying the interaction with the environment.
  - rollout
```
public static Episode rollout(Policy p,
                              Environment env,
                              int numSteps)
```
    Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached or when the provided number of steps has been taken.
    
    Parameters:
    
    p - the Policy
    
    env - The Environment in which this policy is to be evaluated.
    
    numSteps - the maximum number of steps to take in the environment.
    
    Returns:
    
    An Episode object specifying the interaction with the environment.
  - followAndRecordPolicy
```
protected static void followAndRecordPolicy(Policy p,
                                            Environment env,
                                            Episode ea)
```
    Follows this policy for one time step in the provided Environment and records the interaction in the provided Episode object. If the policy selects an Option, then how the option's interaction in the environment is recorded depends on the rolloutsDecomposeOptions flag. If rolloutsDecomposeOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions that the option executes in the environment are recorded, annotated with the name of the calling Option.
    
    Parameters:
    
    p - the Policy
    
    env - The Environment in which this policy should be followed.
    
    ea - The Episode object to which the action selection will be recorded.
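
    The effect of the rolloutsDecomposeOptions flag on what gets recorded can be illustrated with a small sketch. `OptionRecordingSketch`, `MacroStep`, and `record` are hypothetical stand-ins, and the `primitive [optionName]` annotation format is invented for the example; the library's actual annotation format is not specified here.

```java
import java.util.List;

// Hypothetical stand-in types; none of these are BURLAP classes.
final class OptionRecordingSketch {

    // Stand-in for one option selection: its name plus the primitive actions it ran.
    static final class MacroStep {
        final String name;
        final List<String> primitives;

        MacroStep(String name, List<String> primitives) {
            this.name = name;
            this.primitives = primitives;
        }
    }

    // Append the step to the recorded action sequence.
    // decompose == false: the option appears as a single action.
    // decompose == true: each primitive appears, tagged with the option's name
    // (the tag format is an assumption of this sketch).
    static void record(MacroStep step, boolean decompose, List<String> recorded) {
        if (!decompose) {
            recorded.add(step.name);
            return;
        }
        for (String primitive : step.primitives) {
            recorded.add(primitive + " [" + step.name + "]");
        }
    }
}
```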
