public class UCTTreeWalkPolicy extends java.lang.Object implements SolverDerivedPolicy, EnumerablePolicy
| Constructor and Description |
|---|
| `UCTTreeWalkPolicy(UCT planner)`: Initializes the policy with the UCT valueFunction. |
| Modifier and Type | Method and Description |
|---|---|
| `Action` | `action(State s)`: Returns an action sampled by the policy for the given state. |
| `double` | `actionProb(State s, Action a)`: Returns the probability/probability density that the given action will be taken in the given state. |
| `void` | `computePolicyFromTree()`: Computes a hash-backed policy for every state visited along the greedy path of the UCT tree. |
| `boolean` | `definedFor(State s)`: Specifies whether this policy is defined for the input state. |
| `protected UCTActionNode` | `getQGreedyNode(UCTStateNode snode)`: Returns the UCTActionNode with the highest average sample return. |
| `java.util.List<ActionProb>` | `policyDistribution(State s)`: Returns the action probability distribution defined by the policy. |
| `void` | `setSolver(MDPSolverInterface solver)`: Sets the valueFunction whose results affect this policy. |
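The summary of `getQGreedyNode` says it picks the action node with the highest average sample return. The following is a minimal, self-contained sketch of that selection rule, not the library's implementation. `SketchActionNode`, `QGreedySketch`, and the fields `sumReturn` and `n` are hypothetical stand-ins for illustration. The handling of unvisited nodes and ties is also an assumption, since the documentation does not specify either.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an action node in a search tree (not the BURLAP class). */
class SketchActionNode {
    final String actionName;
    final double sumReturn; // sum of all sampled returns through this node
    final int n;            // number of samples taken through this node

    SketchActionNode(String actionName, double sumReturn, int n) {
        this.actionName = actionName;
        this.sumReturn = sumReturn;
        this.n = n;
    }

    /** Average sample return; unvisited nodes rank last (an assumption of this sketch). */
    double averageReturn() {
        return n == 0 ? Double.NEGATIVE_INFINITY : sumReturn / n;
    }
}

class QGreedySketch {
    /**
     * Returns the node with the highest average sample return, or null for an
     * empty list. No exploration bonus is added, and ties keep the earliest
     * node (tie-breaking is an assumption of this sketch).
     */
    static SketchActionNode qGreedy(List<SketchActionNode> nodes) {
        SketchActionNode best = null;
        for (SketchActionNode node : nodes) {
            if (best == null || node.averageReturn() > best.averageReturn()) {
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<SketchActionNode> nodes = new ArrayList<>();
        nodes.add(new SketchActionNode("north", 12.0, 10)); // average 1.2
        nodes.add(new SketchActionNode("south", 9.0, 3));   // average 3.0
        nodes.add(new SketchActionNode("east", 20.0, 40));  // average 0.5
        System.out.println(qGreedy(nodes).actionName);      // prints "south"
    }
}
```

Here "east" has the most samples and the largest total return, yet "south" is chosen because only the average matters once planning is finished.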
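public UCTTreeWalkPolicy(UCT planner)
Initializes the policy with the UCT valueFunction.
Parameters:
planner - the UCT valueFunction whose tree should be walked.

public void setSolver(MDPSolverInterface solver)
Description copied from interface: SolverDerivedPolicy
Sets the valueFunction whose results affect this policy.
Specified by:
setSolver in interface SolverDerivedPolicy
Parameters:
solver - the solver from which this policy is derived

public void computePolicyFromTree()
Computes a hash-backed policy for every state visited along the greedy path of the UCT tree.

protected UCTActionNode getQGreedyNode(UCTStateNode snode)
Returns the UCTActionNode with the highest average sample return. Note that this does not use the upper confidence since planning is completed.
Parameters:
snode - the UCTStateNode for which to get the best UCTActionNode.
Returns:
the UCTActionNode with the highest average sample return.

public Action action(State s)
Description copied from interface: Policy
This method will return an action sampled by the policy for the given state.

public double actionProb(State s, Action a)
Description copied from interface: Policy
Returns the probability/probability density that the given action will be taken in the given state.
Specified by:
actionProb in interface Policy
Parameters:
s - the state of interest
a - the action that may be taken in the state

public java.util.List<ActionProb> policyDistribution(State s)
Description copied from interface: EnumerablePolicy
This method will return action probability distribution defined by the policy.
Specified by:
policyDistribution in interface EnumerablePolicy
Parameters:
s - the state for which an action distribution should be returned

public boolean definedFor(State s)
Description copied from interface: Policy
Specifies whether this policy is defined for the input state.
Specified by:
definedFor in interface Policy
Parameters:
s - the input state to test for whether this policy is defined
Returns:
true if this policy is defined for State s, false otherwise.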
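
To illustrate how `computePolicyFromTree()` and `definedFor(State s)` relate, the sketch below walks a small tree along its greedy path and stores one action per visited state in a hash map. It is a self-contained approximation under stated assumptions, not the library's code. `WalkStateNode`, `WalkActionNode`, `TreeWalkSketch`, and the string state keys are hypothetical. The breadth-first traversal, the skipping of unsampled actions, and the deterministic `actionProb` (1.0 for the stored action, 0.0 otherwise) are assumptions as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical state node; the String key stands in for a hashed state. */
class WalkStateNode {
    final String stateKey;
    final List<WalkActionNode> actions = new ArrayList<>();
    WalkStateNode(String stateKey) { this.stateKey = stateKey; }
}

/** Hypothetical action node with return statistics and sampled successor states. */
class WalkActionNode {
    final String actionName;
    final double sumReturn;
    final int n;
    final List<WalkStateNode> successors = new ArrayList<>();
    WalkActionNode(String actionName, double sumReturn, int n) {
        this.actionName = actionName;
        this.sumReturn = sumReturn;
        this.n = n;
    }
}

class TreeWalkSketch {
    private final Map<String, String> policy = new HashMap<>();

    /** Greedy choice by average sample return; null when no action was sampled. */
    static WalkActionNode qGreedy(WalkStateNode snode) {
        WalkActionNode best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (WalkActionNode a : snode.actions) {
            if (a.n == 0) continue; // assumption: unsampled actions are ignored
            double avg = a.sumReturn / a.n;
            if (best == null || avg > bestAvg) { best = a; bestAvg = avg; }
        }
        return best;
    }

    /** Records the greedy action for every state reachable via greedy choices. */
    void computePolicyFromTree(WalkStateNode root) {
        policy.clear();
        Deque<WalkStateNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            WalkStateNode snode = queue.poll();
            if (policy.containsKey(snode.stateKey)) continue; // already handled
            WalkActionNode best = qGreedy(snode);
            if (best == null) continue; // nothing sampled: leave state undefined
            policy.put(snode.stateKey, best.actionName);
            queue.addAll(best.successors); // follow only the greedy branch
        }
    }

    boolean definedFor(String stateKey) { return policy.containsKey(stateKey); }

    /** Assumption: the stored policy is deterministic. */
    double actionProb(String stateKey, String actionName) {
        return actionName.equals(policy.get(stateKey)) ? 1.0 : 0.0;
    }

    /** Builds a three-state demo tree: s0 -(right)-> s1 and s0 -(left)-> s2. */
    static WalkStateNode demoTree() {
        WalkStateNode s0 = new WalkStateNode("s0");
        WalkStateNode s1 = new WalkStateNode("s1");
        WalkStateNode s2 = new WalkStateNode("s2");
        WalkActionNode left = new WalkActionNode("left", 2.0, 4);   // average 0.5
        WalkActionNode right = new WalkActionNode("right", 9.0, 3); // average 3.0
        left.successors.add(s2);
        right.successors.add(s1);
        s0.actions.add(left);
        s0.actions.add(right);
        s1.actions.add(new WalkActionNode("up", 4.0, 2));   // average 2.0
        s1.actions.add(new WalkActionNode("down", 1.0, 2)); // average 0.5
        s2.actions.add(new WalkActionNode("up", 1.0, 1));
        return s0;
    }

    public static void main(String[] args) {
        TreeWalkSketch sketch = new TreeWalkSketch();
        sketch.computePolicyFromTree(demoTree());
        System.out.println(sketch.definedFor("s0")); // prints "true"
        System.out.println(sketch.definedFor("s2")); // prints "false"
    }
}
```

In this sketch, state `s2` sits below the non-greedy action "left", so it never enters the map and `definedFor("s2")` is false even though the tree contains statistics for it.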