public class UCTTreeWalkPolicy extends Policy implements PlannerDerivedPolicy
Nested classes inherited from class Policy: Policy.ActionProb, Policy.PolicyUndefinedException, Policy.RandomPolicy
Fields inherited from class Policy: annotateOptionDecomposition, evaluateDecomposesOptions
Constructor and Description |
---|
UCTTreeWalkPolicy(UCT planner): Initializes the policy with the UCT planner. |
Modifier and Type | Method and Description |
---|---|
void | computePolicyFromTree(): Computes a hash-backed policy for every state visited along the greedy path of the UCT tree. |
AbstractGroundedAction | getAction(State s): Returns an action sampled by the policy for the given state. |
java.util.List<Policy.ActionProb> | getActionDistributionForState(State s): Returns the action probability distribution defined by the policy. |
protected UCTActionNode | getQGreedyNode(UCTStateNode snode): Returns the UCTActionNode with the highest average sample return. |
boolean | isDefinedFor(State s): Specifies whether this policy is defined for the input state. |
boolean | isStochastic(): Indicates whether the policy is stochastic or deterministic. |
void | setPlanner(OOMDPPlanner planner): Sets the planner whose results affect this policy. |
Methods inherited from class Policy: evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateMethodsShouldAnnotateOptionDecomposition, evaluateMethodsShouldDecomposeOption, getDeterministicPolicy, getProbOfAction, getProbOfActionGivenDistribution, getProbOfActionGivenDistribution, sampleFromActionDistribution
public UCTTreeWalkPolicy(UCT planner)
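Parameters: planner - the UCT planner whose tree should be walked.
public void setPlanner(OOMDPPlanner planner)
Description copied from interface PlannerDerivedPolicy: Sets the planner whose results affect this policy.
Specified by: setPlanner in interface PlannerDerivedPolicy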
public void computePolicyFromTree()
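The summary describes this method as building a hash-backed policy along the greedy path of the UCT tree. The following is a minimal, self-contained sketch of that idea, not the library's implementation. `StateNode`, `ActionNode`, the string state keys, and the choice to follow every successor of the greedy action are all hypothetical stand-ins for illustration; the real `UCTStateNode` and `UCTActionNode` classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GreedyTreeWalkSketch {

    /** Hypothetical stand-in for a UCT state node: a state key plus its action children. */
    static class StateNode {
        final String stateKey;
        final List<ActionNode> actions = new ArrayList<>();

        StateNode(String stateKey) {
            this.stateKey = stateKey;
        }
    }

    /** Hypothetical stand-in for a UCT action node holding return statistics. */
    static class ActionNode {
        final String actionName;
        final double sumReturn; // sum of the returns sampled through this node
        final int visits;       // number of samples taken through this node
        final List<StateNode> successors = new ArrayList<>();

        ActionNode(String actionName, double sumReturn, int visits) {
            this.actionName = actionName;
            this.sumReturn = sumReturn;
            this.visits = visits;
        }

        double averageReturn() {
            // An unvisited node is never preferred over a visited one.
            return visits == 0 ? Double.NEGATIVE_INFINITY : sumReturn / visits;
        }
    }

    /** Picks the child with the highest average return; no exploration bonus is added. */
    static ActionNode qGreedy(StateNode snode) {
        ActionNode best = null;
        for (ActionNode a : snode.actions) {
            if (best == null || a.averageReturn() > best.averageReturn()) {
                best = a;
            }
        }
        return best; // null when the node has no expanded actions
    }

    /** Walks the greedy path from the root and records one action per visited state. */
    static Map<String, String> computePolicyFromTree(StateNode root) {
        Map<String, String> policy = new HashMap<>();
        walk(root, policy);
        return policy;
    }

    private static void walk(StateNode snode, Map<String, String> policy) {
        if (policy.containsKey(snode.stateKey)) {
            return; // state already handled
        }
        ActionNode best = qGreedy(snode);
        if (best == null) {
            return; // leaf of the tree: nothing to record
        }
        policy.put(snode.stateKey, best.actionName);
        for (StateNode next : best.successors) {
            walk(next, policy); // only successors of the greedy action are followed
        }
    }

    public static void main(String[] args) {
        StateNode s0 = new StateNode("s0");
        StateNode s1 = new StateNode("s1");
        StateNode s2 = new StateNode("s2");
        ActionNode left = new ActionNode("left", 4.0, 4);   // average 1.0
        ActionNode right = new ActionNode("right", 9.0, 3); // average 3.0
        s0.actions.add(left);
        s0.actions.add(right);
        left.successors.add(s2);
        right.successors.add(s1);
        s1.actions.add(new ActionNode("up", 2.0, 1));
        s2.actions.add(new ActionNode("down", 5.0, 1));
        System.out.println(computePolicyFromTree(s0));
    }
}
```

In this toy tree the walk records actions for `s0` and `s1` only; `s2` sits under the non-greedy action and is never visited, so a policy built this way would be undefined there.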
protected UCTActionNode getQGreedyNode(UCTStateNode snode)
Returns the UCTActionNode with the highest average sample return. Note that this does not use the upper confidence bound, since planning is completed.
Parameters: snode - the UCTStateNode for which to get the best UCTActionNode.
Returns: the UCTActionNode with the highest average sample return.
public AbstractGroundedAction getAction(State s)
Description copied from class Policy: Returns an action sampled by the policy for the given state.
public java.util.List<Policy.ActionProb> getActionDistributionForState(State s)
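Description copied from class Policy: Returns the action probability distribution defined by the policy.
Specified by: getActionDistributionForState in class Policy
Parameters: s - the state for which an action distribution should be returned
public boolean isStochastic()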
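Description copied from class Policy: Indicates whether the policy is stochastic or deterministic.
Specified by: isStochastic in class Policy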
public boolean isDefinedFor(State s)
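Description copied from class Policy: Specifies whether this policy is defined for the input state.
Specified by: isDefinedFor in class Policy
Parameters: s - the input state to test for whether this policy is defined
Returns: true if this policy is defined for State s, false otherwise.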
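To illustrate how getAction, getActionDistributionForState, isStochastic, and isDefinedFor could fit together for a hash-backed policy, here is a hedged sketch. It assumes the policy is deterministic and stores exactly one action per state, which this page does not confirm for UCTTreeWalkPolicy. The `ActionProb` class, the string state keys, the `put` helper, and the use of `IllegalStateException` in place of `Policy.PolicyUndefinedException` are all hypothetical stand-ins.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashBackedPolicySketch {

    /** Hypothetical stand-in for an (action, probability) pair. */
    static class ActionProb {
        final String action;
        final double prob;

        ActionProb(String action, double prob) {
            this.action = action;
            this.prob = prob;
        }
    }

    // Maps a state key to the single action chosen for that state.
    private final Map<String, String> policy = new HashMap<>();

    /** Hypothetical helper that records the action for a state. */
    void put(String stateKey, String action) {
        policy.put(stateKey, action);
    }

    /** The policy is defined only for states that were recorded. */
    boolean isDefinedFor(String stateKey) {
        return policy.containsKey(stateKey);
    }

    /** Assumption: one fixed action per state, so the policy is deterministic. */
    boolean isStochastic() {
        return false;
    }

    /** Returns the stored action, or fails loudly for states without an entry. */
    String getAction(String stateKey) {
        String action = policy.get(stateKey);
        if (action == null) {
            throw new IllegalStateException("Policy undefined for state: " + stateKey);
        }
        return action;
    }

    /** A deterministic policy yields a one-element distribution with probability 1. */
    List<ActionProb> getActionDistributionForState(String stateKey) {
        return Collections.singletonList(new ActionProb(getAction(stateKey), 1.0));
    }

    public static void main(String[] args) {
        HashBackedPolicySketch p = new HashBackedPolicySketch();
        p.put("s0", "right");
        System.out.println(p.isDefinedFor("s0"));
        System.out.println(p.isDefinedFor("s9"));
        System.out.println(p.getAction("s0"));
    }
}
```

Callers of a policy like this would typically check isDefinedFor before requesting an action, because states off the walked path have no entry.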