public class PolicyUtils
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static boolean | rolloutsDecomposeOptions |
Modifier and Type | Method and Description |
---|---|
static double | actionProbFromEnum(EnumerablePolicy p, State s, Action a): Returns the probability of the policy taking action a in state s by searching for the action in the returned policy distribution from the provided EnumerablePolicy. |
static double | actionProbGivenDistribution(Action a, java.util.List<ActionProb> distribution): Searches the input distribution for the occurrence of the input action and returns its probability. |
static java.util.List<ActionProb> | deterministicPolicyDistribution(Policy p, State s): A helper method for defining deterministic policies. |
protected static void | followAndRecordPolicy(Policy p, Environment env, Episode ea): Follows this policy for one time step in the provided Environment and records the interaction in the provided Episode object. |
static Episode | rollout(Policy p, Environment env): Follows the policy in the given Environment. |
static Episode | rollout(Policy p, Environment env, int numSteps): Follows the policy in the given Environment. |
static Episode | rollout(Policy p, State s, SampleModel model): Returns an episode that results from following the given policy from state s. |
static Episode | rollout(Policy p, State s, SampleModel model, int maxSteps): Returns an episode that results from following the given policy from state s. |
static Action | sampleFromActionDistribution(EnumerablePolicy p, State s): This is a helper method for stochastic policies. |
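
The distribution helpers summarized above can be illustrated without BURLAP on the classpath. The following is a minimal sketch under stated assumptions, not BURLAP's implementation: ActionProbSketch and DistributionSketch are hypothetical stand-ins, actions are plain strings, and returning 0 for an action missing from the distribution is an assumption of the sketch. It shows looking up an action's probability, building a single-entry deterministic distribution, and sampling an action by walking cumulative probabilities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for an (action, probability) pair. BURLAP's ActionProb
// holds an Action object; a String is used here to keep the sketch self-contained.
final class ActionProbSketch {
    final String action;
    final double prob;

    ActionProbSketch(String action, double prob) {
        this.action = action;
        this.prob = prob;
    }
}

final class DistributionSketch {

    // Searches the distribution for the action and returns its probability.
    // Returning 0 for an absent action is an assumption of this sketch.
    static double probOf(String action, List<ActionProbSketch> distribution) {
        for (ActionProbSketch ap : distribution) {
            if (ap.action.equals(action)) {
                return ap.prob;
            }
        }
        return 0.0;
    }

    // Deterministic case: a single entry carrying probability 1.
    static List<ActionProbSketch> deterministic(String chosenAction) {
        List<ActionProbSketch> distribution = new ArrayList<>();
        distribution.add(new ActionProbSketch(chosenAction, 1.0));
        return distribution;
    }

    // Stochastic case: draw u in [0, 1) and return the first action whose
    // cumulative probability exceeds u.
    static String sample(List<ActionProbSketch> distribution, Random rng) {
        if (distribution.isEmpty()) {
            throw new IllegalArgumentException("empty distribution");
        }
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (ActionProbSketch ap : distribution) {
            cumulative += ap.prob;
            if (u < cumulative) {
                return ap.action;
            }
        }
        // Guards against round-off when the probabilities sum to slightly under 1.
        return distribution.get(distribution.size() - 1).action;
    }

    public static void main(String[] args) {
        List<ActionProbSketch> dist = new ArrayList<>();
        dist.add(new ActionProbSketch("north", 0.25));
        dist.add(new ActionProbSketch("south", 0.75));

        System.out.println(probOf("south", dist));                   // 0.75
        System.out.println(probOf("north", deterministic("north"))); // 1.0
        System.out.println(sample(dist, new Random(42)));            // "north" or "south"
    }
}
```

In this sketch a policy that only knows its distribution can delegate action selection to sample, which is the division of labor that sampleFromActionDistribution is described as providing.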
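public static double actionProbFromEnum(EnumerablePolicy p, State s, Action a)
Returns the probability of the policy taking action a in state s by searching for the action in the returned policy distribution from the provided EnumerablePolicy.
Parameters:
p - the EnumerablePolicy
s - the state in which the action would be taken
a - the action being queried

public static double actionProbGivenDistribution(Action a, java.util.List<ActionProb> distribution)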
Searches the input distribution for the occurrence of the input action and returns its probability.
Parameters:
a - the Action whose probability in the specified distribution should be returned.
distribution - the probability distribution over actions.

public static java.util.List<ActionProb> deterministicPolicyDistribution(Policy p, State s)
A helper method for defining deterministic policies. It relies on the Policy.action(State) method being implemented and returns a list of ActionProb objects with a single instance: the result of the Policy.action(State) method with assigned probability 1.
Parameters:
p - the Policy
s - the state for which the action distribution should be returned.

public static Action sampleFromActionDistribution(EnumerablePolicy p, State s)
This is a helper method for stochastic policies. Rather than defining both the Policy.action(State) method and the EnumerablePolicy.policyDistribution(State) method, the object needs only to define the EnumerablePolicy.policyDistribution(State) method; the Policy.action(State) method can then simply return the result of this method to sample an action.
Parameters:
p - the EnumerablePolicy
s - the input state from which an action should be selected.
Returns:
an Action to take

public static Episode rollout(Policy p, State s, SampleModel model)
Returns an episode that results from following the given policy from state s.
Parameters:
p - the Policy to roll out
s - the state from which to roll out the policy
model - the model from which to sample

public static Episode rollout(Policy p, State s, SampleModel model, int maxSteps)
Returns an episode that results from following the given policy from state s.
Parameters:
p - the Policy to roll out
s - the state from which to roll out the policy
model - the model from which to sample state transitions
maxSteps - the maximum number of steps to take before terminating the policy rollout.

public static Episode rollout(Policy p, Environment env)
Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached.
Parameters:
p - the Policy
env - the Environment in which this policy is to be evaluated.
Returns:
an Episode object specifying the interaction with the environment.

public static Episode rollout(Policy p, Environment env, int numSteps)
Follows the policy in the given Environment. The policy will stop being followed once a terminal state in the environment is reached or when the provided number of steps has been taken.
Parameters:
p - the Policy
env - the Environment in which this policy is to be evaluated.
numSteps - the maximum number of steps to take in the environment.
Returns:
an Episode object specifying the interaction with the environment.

protected static void followAndRecordPolicy(Policy p, Environment env, Episode ea)
Follows this policy for one time step in the provided Environment and records the interaction in the provided Episode object. If the policy selects an Option, then how the option's interaction in the environment is recorded depends on the rolloutsDecomposeOptions flag. If rolloutsDecomposeOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions taken in the environment are recorded.
Parameters:
p - the Policy
env - the Environment in which this policy should be followed.
ea - the Episode object to which the action selection will be recorded.
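
The effect of the rolloutsDecomposeOptions flag on a recorded rollout can be sketched in isolation. The code below is an illustration under stated assumptions, not BURLAP's implementation: RolloutSketch, Act, PolicySketch, and LineEnv are hypothetical stand-ins, the episode is reduced to a list of action names, and the step cap counts policy decisions, which may differ from how BURLAP counts steps taken inside an option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RolloutSketch {

    // Plays the role of the rolloutsDecomposeOptions flag, for this sketch only.
    static boolean decomposeOptions = true;

    // Hypothetical action type: a primitive, or an option that runs several primitives.
    static final class Act {
        final String name;
        final boolean isOption;
        final List<String> primitives;

        private Act(String name, boolean isOption, List<String> primitives) {
            this.name = name;
            this.isOption = isOption;
            this.primitives = primitives;
        }

        static Act primitive(String name) {
            return new Act(name, false, Arrays.asList(name));
        }

        static Act option(String name, String... primitives) {
            return new Act(name, true, Arrays.asList(primitives));
        }
    }

    interface PolicySketch {
        Act action(int state);
    }

    // Toy environment: a position on a line that is terminal once it reaches the goal.
    static final class LineEnv {
        int state = 0;
        final int goal;

        LineEnv(int goal) {
            this.goal = goal;
        }

        boolean isTerminal() {
            return state >= goal;
        }

        void execute(String primitive) {
            if (primitive.equals("right")) {
                state++;
            }
        }
    }

    // Follows the policy for one decision and records it in the episode.
    static void followAndRecord(PolicySketch p, LineEnv env, List<String> episode) {
        Act a = p.action(env.state);
        for (String primitive : a.primitives) {
            if (env.isTerminal()) {
                break;
            }
            env.execute(primitive);
            // Primitive actions are always recorded; an option's steps only when decomposing.
            if (!a.isOption || decomposeOptions) {
                episode.add(primitive);
            }
        }
        // Without decomposition, the option appears as a single recorded action.
        if (a.isOption && !decomposeOptions) {
            episode.add(a.name);
        }
    }

    // Stops at a terminal state or after maxSteps policy decisions, whichever comes first.
    static List<String> rollout(PolicySketch p, LineEnv env, int maxSteps) {
        List<String> episode = new ArrayList<>();
        int decisions = 0;
        while (!env.isTerminal() && decisions < maxSteps) {
            followAndRecord(p, env, episode);
            decisions++;
        }
        return episode;
    }

    public static void main(String[] args) {
        PolicySketch alwaysRight2 = s -> Act.option("right2", "right", "right");

        decomposeOptions = true;
        System.out.println(rollout(alwaysRight2, new LineEnv(4), 10)); // [right, right, right, right]

        decomposeOptions = false;
        System.out.println(rollout(alwaysRight2, new LineEnv(4), 10)); // [right2, right2]
    }
}
```

The same trajectory through the toy environment yields four recorded actions when options are decomposed and two when they are not, which is the distinction the flag is described as controlling.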