This class defines an epsilon-greedy policy over Q-values and requires a QComputable planner to be specified.
With probability epsilon the policy will return a random action (drawn uniformly over all possible actions).
With probability 1 - epsilon the policy will return the greedy action. If multiple actions tie for the highest Q-value,
then one of the tied actions is selected uniformly at random.
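The selection rule above can be sketched as follows. This is an illustrative standalone example, not BURLAP's actual implementation; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of epsilon-greedy action selection over Q-values,
// including uniform tie-breaking among maximizing actions.
public class EpsilonGreedySketch {

    private final double epsilon;
    private final Random rand;

    public EpsilonGreedySketch(double epsilon, long seed) {
        this.epsilon = epsilon;
        this.rand = new Random(seed);
    }

    /** Selects an action index given the Q-value of each available action. */
    public int sampleAction(double[] qValues) {
        // With probability epsilon, pick an action uniformly at random.
        if (rand.nextDouble() < epsilon) {
            return rand.nextInt(qValues.length);
        }
        // Otherwise act greedily, collecting every action tied for the max Q.
        double max = Double.NEGATIVE_INFINITY;
        List<Integer> best = new ArrayList<>();
        for (int i = 0; i < qValues.length; i++) {
            if (qValues[i] > max) {
                max = qValues[i];
                best.clear();
                best.add(i);
            } else if (qValues[i] == max) {
                best.add(i);
            }
        }
        // Ties are broken uniformly at random among the maximizers.
        return best.get(rand.nextInt(best.size()));
    }
}
```

With epsilon set to 0 the policy is purely greedy, so repeated calls on the same Q-values always return an action with the maximal value.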
Nested Class Summary
Nested classes/interfaces inherited from class burlap.behavior.singleagent.Policy
This method will return an action sampled by the policy for the given state. If the defined policy is
stochastic, then multiple calls to this method for the same state may return different actions. The sampling
should be with respect to the action distribution that is returned by getActionDistributionForState.
This method will return the action probability distribution defined by the policy. The action distribution is represented
by a list of ActionProb objects, each of which specifies a grounded action and the probability of that grounded action being
taken. The returned list does not have to include actions with probability 0.
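For an epsilon-greedy policy, the probability assigned to each action follows directly from the selection rule: every action receives epsilon / n of the mass, and the tied greedy actions split the remaining 1 - epsilon equally. The sketch below illustrates this computation; it uses a plain probability array rather than BURLAP's actual ActionProb objects, and the class and method names are hypothetical.

```java
// Hypothetical sketch (not BURLAP's ActionProb API) of the probability
// each action receives under the epsilon-greedy rule.
public class EpsilonGreedyDistribution {

    /** Returns p(a) for each action index, given each action's Q-value. */
    public static double[] actionDistribution(double[] qValues, double epsilon) {
        int n = qValues.length;
        // Find the maximal Q-value and count how many actions tie for it.
        double max = Double.NEGATIVE_INFINITY;
        int ties = 0;
        for (double q : qValues) {
            if (q > max) { max = q; ties = 1; }
            else if (q == max) { ties++; }
        }
        double[] probs = new double[n];
        for (int i = 0; i < n; i++) {
            // Every action gets the uniform epsilon mass; tied greedy
            // actions split the remaining 1 - epsilon mass equally.
            probs[i] = epsilon / n
                    + (qValues[i] == max ? (1.0 - epsilon) / ties : 0.0);
        }
        return probs;
    }
}
```

For example, with epsilon = 0.2 and Q-values {1.0, 2.0, 2.0}, each action receives 0.2 / 3 of the mass and the two tied greedy actions each receive an additional 0.8 / 2, so the distribution sums to 1.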