public class PolicyUtils
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| static boolean | rolloutsDecomposeOptions |
| Modifier and Type | Method and Description |
|---|---|
| static double | actionProbFromEnum(EnumerablePolicy p, State s, Action a): Returns the probability of the policy taking action a in state s by searching for the action in the policy distribution returned by the provided EnumerablePolicy. |
| static double | actionProbGivenDistribution(Action a, java.util.List<ActionProb> distribution): Searches the input distribution for the occurrence of the input action and returns its probability. |
| static java.util.List<ActionProb> | deterministicPolicyDistribution(Policy p, State s): A helper method for defining deterministic policies. |
| protected static void | followAndRecordPolicy(Policy p, Environment env, Episode ea): Follows this policy for one time step in the provided Environment and records the interaction in the provided Episode object. |
| static Episode | rollout(Policy p, Environment env): Follows the policy in the given Environment. |
| static Episode | rollout(Policy p, Environment env, int numSteps): Follows the policy in the given Environment. |
| static Episode | rollout(Policy p, State s, SampleModel model): Returns the episode that results from following the given policy from state s. |
| static Episode | rollout(Policy p, State s, SampleModel model, int maxSteps): Returns the episode that results from following the given policy from state s. |
| static Action | sampleFromActionDistribution(EnumerablePolicy p, State s): This is a helper method for stochastic policies. |
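
The three distribution helpers summarized above (probability lookup, deterministic distribution, and sampling) can be illustrated with a minimal, self-contained sketch. It does not use the library's own classes: `ActionProb` here is a simplified stand-in holding a string action, and the method names `probOf`, `deterministic`, and `sample` are hypothetical. Treating an absent action as probability 0 and passing the random roll in explicitly are assumptions made so the behavior is easy to check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DistributionSketch {

    /** Simplified stand-in for ActionProb (not the library class). */
    static final class ActionProb {
        final String action;
        final double prob;

        ActionProb(String action, double prob) {
            this.action = action;
            this.prob = prob;
        }
    }

    /** Linear search for the action; an absent action yields 0 (an assumption). */
    static double probOf(String action, List<ActionProb> distribution) {
        for (ActionProb ap : distribution) {
            if (ap.action.equals(action)) {
                return ap.prob;
            }
        }
        return 0.0;
    }

    /** Deterministic case: a single entry carrying probability 1. */
    static List<ActionProb> deterministic(String chosenAction) {
        return Collections.singletonList(new ActionProb(chosenAction, 1.0));
    }

    /** Samples by accumulating probabilities until the sum exceeds roll, where roll is in [0, 1). */
    static String sample(List<ActionProb> distribution, double roll) {
        double cumulative = 0.0;
        for (ActionProb ap : distribution) {
            cumulative += ap.prob;
            if (roll < cumulative) {
                return ap.action;
            }
        }
        throw new IllegalArgumentException("Probabilities sum to less than the roll");
    }

    public static void main(String[] args) {
        List<ActionProb> dist = Arrays.asList(
                new ActionProb("left", 0.25),
                new ActionProb("right", 0.75));
        System.out.println("P(left) = " + probOf("left", dist));
        System.out.println("sampled: " + sample(dist, new Random().nextDouble()));
    }
}
```

With this arrangement a stochastic policy only has to supply its distribution; action selection reduces to a call to the sampling helper.
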
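public static double actionProbFromEnum(EnumerablePolicy p, State s, Action a)
Returns the probability of the policy taking action a in state s by searching for the action in the policy distribution returned by the provided EnumerablePolicy.
Parameters:
p - the EnumerablePolicy
s - the state in which the action would be taken
a - the action being queried

public static double actionProbGivenDistribution(Action a, java.util.List<ActionProb> distribution)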
Searches the input distribution for the occurrence of the input action and returns its probability.
Parameters:
a - the Action whose probability in the specified distribution should be returned
distribution - the probability distribution over actions

public static java.util.List<ActionProb> deterministicPolicyDistribution(Policy p, State s)
A helper method for defining deterministic policies. It relies on the Policy.action(State) method being implemented and returns a list of ActionProb objects with a single instance: the result of the Policy.action(State) method, assigned probability 1.
Parameters:
p - the Policy
s - the state for which the action distribution should be returned

public static Action sampleFromActionDistribution(EnumerablePolicy p, State s)
This is a helper method for stochastic policies. Rather than defining both the Policy.action(State) method and the EnumerablePolicy.policyDistribution(State) method, the object needs to define only the EnumerablePolicy.policyDistribution(State) method; the Policy.action(State) method can then simply return the result of this method to sample an action.
Parameters:
p - the EnumerablePolicy
s - the input state from which an action should be selected
Returns:
the Action to take

public static Episode rollout(Policy p, State s, SampleModel model)
Returns the episode that results from following the given policy from state s.
Parameters:
p - the Policy to roll out
s - the state from which to roll out the policy
model - the model from which to sample

public static Episode rollout(Policy p, State s, SampleModel model, int maxSteps)
Returns the episode that results from following the given policy from state s.
Parameters:
p - the Policy to roll out
s - the state from which to roll out the policy
model - the model from which to sample state transitions
maxSteps - the maximum number of steps to take before terminating the policy rollout

public static Episode rollout(Policy p, Environment env)
Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached.
Parameters:
p - the Policy
env - the Environment in which this policy is to be evaluated
Returns:
an Episode object specifying the interaction with the environment

public static Episode rollout(Policy p, Environment env, int numSteps)
Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached or when the provided number of steps has been taken.
Parameters:
p - the Policy
env - the Environment in which this policy is to be evaluated
numSteps - the maximum number of steps to take in the environment
Returns:
an Episode object specifying the interaction with the environment

protected static void followAndRecordPolicy(Policy p, Environment env, Episode ea)
Follows this policy for one time step in the provided Environment and records the interaction in the provided Episode object. If the policy selects an Option, then how the option's interaction in the environment is recorded depends on the rolloutsDecomposeOptions flag. If rolloutsDecomposeOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions selected by the environment are recorded.
Parameters:
p - the Policy
env - the Environment in which this policy should be followed
ea - the Episode object to which the action selection will be recorded
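
The relationship between the one-step follow-and-record helper and the environment rollouts described above can be sketched as a loop that stops at a terminal state or after a step cap. This is a self-contained illustration under stated assumptions, not the library's implementation: `Env`, `Episode`, `followAndRecord`, and `corridor` are hypothetical stand-ins with integer states and string actions, and option handling (the rolloutsDecomposeOptions flag) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class RolloutSketch {

    /** Hypothetical stand-in for an environment: integer states, string actions. */
    interface Env {
        int currentState();
        void execute(String action);
        boolean isTerminal();
    }

    /** Hypothetical stand-in for an episode record. */
    static final class Episode {
        final List<Integer> states = new ArrayList<>();
        final List<String> actions = new ArrayList<>();
    }

    /** One time step: query the policy, act in the environment, record the outcome. */
    static void followAndRecord(Function<Integer, String> policy, Env env, Episode ep) {
        String action = policy.apply(env.currentState());
        env.execute(action);
        ep.actions.add(action);
        ep.states.add(env.currentState());
    }

    /** Repeats the one-step helper until a terminal state or numSteps steps, whichever comes first. */
    static Episode rollout(Function<Integer, String> policy, Env env, int numSteps) {
        Episode ep = new Episode();
        ep.states.add(env.currentState()); // record the initial state
        int steps = 0;
        while (!env.isTerminal() && steps < numSteps) {
            followAndRecord(policy, env, ep);
            steps++;
        }
        return ep;
    }

    /** Toy corridor starting at position 0 with a terminal state at goal. */
    static Env corridor(final int goal) {
        return new Env() {
            private int position = 0;

            public int currentState() {
                return position;
            }

            public void execute(String action) {
                position = "right".equals(action) ? position + 1 : Math.max(0, position - 1);
            }

            public boolean isTerminal() {
                return position == goal;
            }
        };
    }

    public static void main(String[] args) {
        Function<Integer, String> alwaysRight = s -> "right";
        Episode full = rollout(alwaysRight, corridor(3), Integer.MAX_VALUE);
        Episode capped = rollout(alwaysRight, corridor(3), 2);
        System.out.println("uncapped: " + full.states + " " + full.actions);
        System.out.println("capped:   " + capped.states + " " + capped.actions);
    }
}
```

In this sketch the episode always holds one more state than actions, because the initial state is recorded before any action is taken.
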