public abstract class Policy
extends java.lang.Object

Subclasses must implement the abstract methods getAction(burlap.oomdp.core.states.State), getActionDistributionForState(burlap.oomdp.core.states.State), isStochastic(), and isDefinedFor(burlap.oomdp.core.states.State).
getAction(burlap.oomdp.core.states.State) should return the action (specified by an AbstractGroundedAction; e.g., a GroundedAction for single-agent domains) this policy defines for the input State. If this Policy is a stochastic policy, then the getAction(burlap.oomdp.core.states.State) method should sample an action from its probability distribution and return it.
getActionDistributionForState(burlap.oomdp.core.states.State) should return this Policy's action selection probability distribution for the input State. The probability distribution is specified by returning a List of Policy.ActionProb instances. A Policy.ActionProb is a pair consisting of an AbstractGroundedAction specifying the action and a double specifying the probability that this Policy would select that action.
The isStochastic() method should return true if this Policy is stochastic and false if it is deterministic. The isDefinedFor(burlap.oomdp.core.states.State) method should return true if this Policy is defined for the input State and false if it is not.
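For illustration, a minimal sketch (not a BURLAP class) of a deterministic subclass that always returns one fixed action could look like the following. It relies on the getDeterministicPolicy helper described below; the import for Policy itself is omitted because its package varies by BURLAP version.

```java
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

// Hypothetical example class; FixedActionPolicy is not part of BURLAP.
public class FixedActionPolicy extends Policy {

	protected AbstractGroundedAction action; // the single action this policy always selects

	public FixedActionPolicy(AbstractGroundedAction action) {
		this.action = action;
	}

	@Override
	public AbstractGroundedAction getAction(State s) {
		return this.action; // deterministic, so no sampling is needed
	}

	@Override
	public List<Policy.ActionProb> getActionDistributionForState(State s) {
		// superclass helper: wraps getAction(s) in an ActionProb with probability 1.0
		return this.getDeterministicPolicy(s);
	}

	@Override
	public boolean isStochastic() {
		return false;
	}

	@Override
	public boolean isDefinedFor(State s) {
		return true; // this trivial policy is defined for every state
	}
}
```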
Note that if getActionDistributionForState(burlap.oomdp.core.states.State) is implemented and stochastic, then the getAction(burlap.oomdp.core.states.State) method can be trivially implemented by having it return the result of the superclass method sampleFromActionDistribution(burlap.oomdp.core.states.State), which will get the probability distribution from getActionDistributionForState(burlap.oomdp.core.states.State), roll a random number, and return an action based on the fully defined action distribution. Conversely, if the policy is deterministic and getAction(burlap.oomdp.core.states.State) is implemented, then the getActionDistributionForState(burlap.oomdp.core.states.State) method can be trivially implemented by having it return the result of getDeterministicPolicy(burlap.oomdp.core.states.State), which will call getAction(burlap.oomdp.core.states.State) and wrap the result in a Policy.ActionProb object with an assigned probability of 1.0.
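As a sketch of the stochastic case (again not a BURLAP class, and assuming Policy.ActionProb exposes a constructor taking the action and its probability, which is an assumption here), a policy that selects uniformly at random among a fixed set of candidate actions only needs to define the distribution and can delegate getAction to sampleFromActionDistribution:

```java
import java.util.ArrayList;
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

// Hypothetical example class; UniformRandomPolicy is not part of BURLAP.
public class UniformRandomPolicy extends Policy {

	protected List<AbstractGroundedAction> candidates;

	public UniformRandomPolicy(List<AbstractGroundedAction> candidates) {
		this.candidates = candidates;
	}

	@Override
	public AbstractGroundedAction getAction(State s) {
		// superclass helper: rolls a random number against the distribution returned below
		return this.sampleFromActionDistribution(s);
	}

	@Override
	public List<Policy.ActionProb> getActionDistributionForState(State s) {
		List<Policy.ActionProb> dist = new ArrayList<Policy.ActionProb>();
		double p = 1.0 / this.candidates.size();
		for (AbstractGroundedAction a : this.candidates) {
			dist.add(new Policy.ActionProb(a, p)); // assumed (action, probability) constructor
		}
		return dist;
	}

	@Override
	public boolean isStochastic() {
		return true;
	}

	@Override
	public boolean isDefinedFor(State s) {
		return true;
	}
}
```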
This class also provides several implemented utility methods: getProbOfAction(burlap.oomdp.core.states.State, burlap.oomdp.core.AbstractGroundedAction), evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction) (and other variants of the method signature), and evaluateBehavior(burlap.oomdp.singleagent.environment.Environment) (and other variants of the method signature).
The getProbOfAction(burlap.oomdp.core.states.State, burlap.oomdp.core.AbstractGroundedAction) method takes as input a State and AbstractGroundedAction and returns the probability of this Policy selecting that action. It uses the result of the getActionDistributionForState(burlap.oomdp.core.states.State) method to determine the full distribution, finds the matching AbstractGroundedAction in the returned list, and then returns its assigned probability. It may be possible to return this value in a more efficient way than enumerating the full probability distribution, in which case you may want to consider overriding the method.
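A small, hypothetical usage sketch of these query methods; the Policy, State, and AbstractGroundedAction arguments stand in for objects from your own domain, and imports for Policy and Policy.ActionProb are omitted since their package varies by BURLAP version:

```java
import java.util.List;

import burlap.oomdp.core.AbstractGroundedAction;
import burlap.oomdp.core.states.State;

public class PolicyQueryExample {

	public static double probOf(Policy policy, State s, AbstractGroundedAction a) {
		// probability this policy assigns to selecting a in s
		double p = policy.getProbOfAction(s, a);
		// the same value, looked up manually in the full distribution
		List<Policy.ActionProb> dist = policy.getActionDistributionForState(s);
		double p2 = Policy.getProbOfActionGivenDistribution(a, dist); // p2 should equal p
		return p;
	}
}
```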
The evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction), evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, int), and evaluateBehavior(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, int) methods will all evaluate this policy by rolling it out from the input State until it reaches a terminal state or executes for the maximum number of steps (depending on which version of the method you use). The resulting behavior will be saved in an EpisodeAnalysis object that is returned. Note that these methods require that the returned AbstractGroundedAction instances are able to be executed using the action's defined transition dynamics. For single-agent domains in which the actions are GroundedAction instances, this will work as long as the corresponding Action.performAction(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction) method is implemented. If this policy defines the policy for an agent in a stochastic game, returning GroundedSGAgentAction instances for the action, then the policy cannot be rolled out, since the outcome state would depend on the action selection of other agents.
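For instance, a hedged rollout sketch; initialState, rf, and tf stand in for an existing State, RewardFunction, and TerminalFunction, and imports for Policy and EpisodeAnalysis are omitted since their package varies by BURLAP version:

```java
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;

public class RolloutExample {

	public static EpisodeAnalysis rollOut(Policy policy, State initialState,
			RewardFunction rf, TerminalFunction tf) {
		// follow the policy from initialState until tf reports a terminal state,
		// but never for more than 200 steps
		return policy.evaluateBehavior(initialState, rf, tf, 200);
	}
}
```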
The evaluateBehavior(burlap.oomdp.singleagent.environment.Environment) and evaluateBehavior(burlap.oomdp.singleagent.environment.Environment, int) methods will execute this policy in some input Environment until either the Environment reaches a terminal state or the maximum number of steps is taken (depending on which method signature is used). These methods are useful if a policy was computed with a planning algorithm using some model of the world and then needs to be executed in an environment which may have slightly different transitions; for example, planning a policy for a robot using a model of the world and then executing it on the actual robot by following the policy in an Environment.
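A corresponding sketch for executing the policy against a live Environment; env stands in for an existing, already-constructed Environment, and imports for Policy and EpisodeAnalysis are again omitted:

```java
import burlap.oomdp.singleagent.environment.Environment;

public class EnvironmentExecutionExample {

	public static EpisodeAnalysis execute(Policy policy, Environment env) {
		// step the policy in env until env reports a terminal state,
		// or until 200 steps have been taken, whichever comes first
		return policy.evaluateBehavior(env, 200);
	}
}
```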
The evaluateBehavior methods also support policies that select Options. In particular, they are also able to record the option execution in the returned EpisodeAnalysis object in verbose ways for better debugging. By default, when an option is selected in an evaluateBehavior method, each primitive step will be recorded in the EpisodeAnalysis object, rather than only recording that the option was taken. Additionally, in the returned EpisodeAnalysis, each primitive step by default will be annotated with the option that executed it and which step in the option's execution it was.
If you would like to disable option decomposition and/or the option annotation, you can do so with the evaluateMethodsShouldDecomposeOption(boolean) and evaluateMethodsShouldAnnotateOptionDecomposition(boolean) methods.
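For example, a hypothetical configuration sketch (the Policy, State, RewardFunction, and TerminalFunction arguments stand in for existing objects; imports for Policy and EpisodeAnalysis are omitted):

```java
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.states.State;
import burlap.oomdp.singleagent.RewardFunction;

public class OptionRecordingExample {

	public static EpisodeAnalysis rollOutWithoutDecomposition(Policy policy, State s,
			RewardFunction rf, TerminalFunction tf) {
		// record a selected option as a single step in the resulting episode
		policy.evaluateMethodsShouldDecomposeOption(false);
		return policy.evaluateBehavior(s, rf, tf);
	}

	public static EpisodeAnalysis rollOutWithoutAnnotation(Policy policy, State s,
			RewardFunction rf, TerminalFunction tf) {
		// keep the primitive-step decomposition but drop the option annotations
		policy.evaluateMethodsShouldDecomposeOption(true);
		policy.evaluateMethodsShouldAnnotateOptionDecomposition(false);
		return policy.evaluateBehavior(s, rf, tf);
	}
}
```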
Modifier and Type | Class and Description |
---|---|
static class | Policy.ActionProb: Class for storing an action and probability tuple. |
static class | Policy.GroundedAnnotatedAction: A class for annotating an action selection, specified with a GroundedAction, with a string. |
static class | Policy.PolicyUndefinedException: RuntimeException to be thrown when a Policy is queried for a state in which the policy is undefined. |
Modifier and Type | Field and Description |
---|---|
protected boolean | annotateOptionDecomposition |
protected boolean | evaluateDecomposesOptions |

Constructor and Description |
---|
Policy() |
Modifier and Type | Method and Description |
---|---|
EpisodeAnalysis | evaluateBehavior(Environment env): Evaluates this policy in the provided Environment. |
EpisodeAnalysis | evaluateBehavior(Environment env, int numSteps): Evaluates this policy in the provided Environment. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, int numSteps): This method will return the episode that results from following this policy from state s. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf): This method will return the episode that results from following this policy from state s. |
EpisodeAnalysis | evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf, int maxSteps): This method will return the episode that results from following this policy from state s. |
void | evaluateMethodsShouldAnnotateOptionDecomposition(boolean toggle): Sets whether options that are decomposed into primitives will be annotated with the option that produced them. |
void | evaluateMethodsShouldDecomposeOption(boolean toggle): Sets whether the primitive actions taken during an option will be included as steps in produced EpisodeAnalysis objects. |
protected void | followAndRecordPolicy(Environment env, EpisodeAnalysis ea): Follows this policy for one time step in the provided Environment and records the interaction in the provided EpisodeAnalysis object. |
protected State | followAndRecordPolicy(EpisodeAnalysis ea, State cur, RewardFunction rf): Follows this policy for one time step from the provided State and records the interaction in the provided EpisodeAnalysis object. |
abstract AbstractGroundedAction | getAction(State s): This method will return an action sampled by the policy for the given state. |
abstract java.util.List<Policy.ActionProb> | getActionDistributionForState(State s): This method will return the action probability distribution defined by the policy. |
protected java.util.List<Policy.ActionProb> | getDeterministicPolicy(State s): A helper method for defining deterministic policies. |
double | getProbOfAction(State s, AbstractGroundedAction ga): Will return the probability of this policy taking action ga in state s. |
static double | getProbOfActionGivenDistribution(AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution): Searches the input distribution for the occurrence of the input action and returns its probability. |
static double | getProbOfActionGivenDistribution(State s, AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution): Deprecated. |
abstract boolean | isDefinedFor(State s): Specifies whether this policy is defined for the input state. |
abstract boolean | isStochastic(): Indicates whether the policy is stochastic or deterministic. |
protected AbstractGroundedAction | sampleFromActionDistribution(State s): This is a helper method for stochastic policies. |
protected boolean evaluateDecomposesOptions
protected boolean annotateOptionDecomposition
public abstract AbstractGroundedAction getAction(State s)
Parameters: s - the state for which an action should be returned

public abstract java.util.List<Policy.ActionProb> getActionDistributionForState(State s)
Parameters: s - the state for which an action distribution should be returned

public abstract boolean isStochastic()

public abstract boolean isDefinedFor(State s)
Parameters: s - the input state to test for whether this policy is defined
Returns: true if this policy is defined for State s, false otherwise.

public double getProbOfAction(State s, AbstractGroundedAction ga)
Parameters: s - the state in which the action would be taken; ga - the action being queried

@Deprecated public static double getProbOfActionGivenDistribution(State s, AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution)
Deprecated; use getProbOfActionGivenDistribution(burlap.oomdp.core.AbstractGroundedAction, java.util.List) instead.

public static double getProbOfActionGivenDistribution(AbstractGroundedAction ga, java.util.List<Policy.ActionProb> distribution)
Parameters: ga - the AbstractGroundedAction for which its probability in the specified distribution should be returned; distribution - the probability distribution over actions

protected java.util.List<Policy.ActionProb> getDeterministicPolicy(State s)
Parameters: s - the state for which the action distribution should be returned

protected AbstractGroundedAction sampleFromActionDistribution(State s)
This is a helper method for stochastic policies. Rather than having the subclass implement both the getAction(burlap.oomdp.core.states.State) method and the getActionDistributionForState(burlap.oomdp.core.states.State) method, the subclass needs to only define the getActionDistributionForState(burlap.oomdp.core.states.State) method, and the getAction(burlap.oomdp.core.states.State) method can simply call this method to return an action.
Parameters: s - the input state from which an action should be selected
Returns: the AbstractGroundedAction to take

public void evaluateMethodsShouldDecomposeOption(boolean toggle)
Parameters: toggle - whether to decompose options into the primitive actions taken by them or not

public void evaluateMethodsShouldAnnotateOptionDecomposition(boolean toggle)
Parameters: toggle - whether to annotate the primitive actions of options with the calling option's name

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; tf - the terminal function defining when the policy should stop being followed

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, TerminalFunction tf, int maxSteps)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; tf - the terminal function defining when the policy should stop being followed; maxSteps - the maximum number of steps to take before terminating the policy rollout

public EpisodeAnalysis evaluateBehavior(State s, RewardFunction rf, int numSteps)
Parameters: s - the state from which to roll out the policy; rf - the reward function used to track rewards accumulated during the episode; numSteps - the number of steps to take before terminating the policy rollout

public EpisodeAnalysis evaluateBehavior(Environment env)
Evaluates this policy in the provided Environment. The policy will stop being evaluated once a terminal state in the environment is reached.
Parameters: env - The Environment in which this policy is to be evaluated
Returns: an EpisodeAnalysis object specifying the interaction with the environment

public EpisodeAnalysis evaluateBehavior(Environment env, int numSteps)
Evaluates this policy in the provided Environment. The policy will stop being evaluated once a terminal state in the environment is reached or when the provided number of steps has been taken.
Parameters: env - The Environment in which this policy is to be evaluated; numSteps - the maximum number of steps to take in the environment
Returns: an EpisodeAnalysis object specifying the interaction with the environment

protected void followAndRecordPolicy(Environment env, EpisodeAnalysis ea)
Follows this policy for one time step in the provided Environment and records the interaction in the provided EpisodeAnalysis object. If the policy selects an Option, then how the option's interaction in the environment is recorded depends on this object's evaluateDecomposesOptions and annotateOptionDecomposition flags. If evaluateDecomposesOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions selected by the option are recorded. If annotateOptionDecomposition is also true, then each primitive action selected by the option is also given a unique name specifying the option which controlled it and its step in the option's execution.
Parameters: env - The Environment in which this policy should be followed; ea - The EpisodeAnalysis object to which the action selection will be recorded

protected State followAndRecordPolicy(EpisodeAnalysis ea, State cur, RewardFunction rf)
Follows this policy for one time step from the provided State and records the interaction in the provided EpisodeAnalysis object. If the policy selects an Option, then how the option's interaction is recorded depends on this object's evaluateDecomposesOptions and annotateOptionDecomposition flags. If evaluateDecomposesOptions is false, then the option is recorded as a single action. If it is true, then the individual primitive actions selected by the option are recorded. If annotateOptionDecomposition is also true, then each primitive action selected by the option is also given a unique name specifying the option which controlled it and its step in the option's execution.
Parameters: ea - The EpisodeAnalysis object to which the action selection will be recorded; cur - The State from which the policy will be followed; rf - The RewardFunction to keep track of reward
Returns: the State that is a consequence of following this policy for one action selection.