UCTTreeWalkPolicy

java.lang.Object
- burlap.behavior.singleagent.planning.stochastic.montecarlo.uct.UCTTreeWalkPolicy

All Implemented Interfaces:

EnumerablePolicy, Policy, SolverDerivedPolicy
```
public class UCTTreeWalkPolicy
extends java.lang.Object
implements SolverDerivedPolicy, EnumerablePolicy
```
This policy is for use with UCT. Note that UCT can only guarantee the policy for the initial state of planning. However, the policies for states that lie on the greedy path from the initial state are likely "okay" as well, since they were used to determine which action to take in the initial state. This class defines the policy for states that lie along the greedy path of the UCT tree. Any state not visited by the greedy path in the UCT tree is excluded from the policy, and querying this policy for such a state results in an error. A more robust policy would call the planner at each state to build a new tree.
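
The greedy-path restriction can be illustrated with a small, self-contained sketch. It does not use BURLAP: `StateNode`, `ActionNode`, `policyFromGreedyPath`, and the string-keyed map are hypothetical stand-ins for the UCT tree and the hash-backed policy, and the real class may differ in its details. The walk starts at the root, records the action with the highest average return, and descends only into the successors of that action, so states under other actions never enter the policy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a search tree; none of these types are BURLAP classes.
class GreedyPathSketch {

    // A state node lists the actions that were tried from that state.
    static class StateNode {
        final String state;
        final List<ActionNode> actions = new ArrayList<>();

        StateNode(String state) {
            this.state = state;
        }
    }

    // An action node stores its average sampled return and the states it led to.
    static class ActionNode {
        final String action;
        final double averageReturn;
        final List<StateNode> successors = new ArrayList<>();

        ActionNode(String action, double averageReturn) {
            this.action = action;
            this.averageReturn = averageReturn;
        }
    }

    // Walks the tree from the root, following only the best action at each state.
    static Map<String, String> policyFromGreedyPath(StateNode root) {
        Map<String, String> policy = new HashMap<>();
        Deque<StateNode> frontier = new ArrayDeque<>();
        frontier.add(root);
        while (!frontier.isEmpty()) {
            StateNode node = frontier.poll();
            ActionNode best = null;
            for (ActionNode candidate : node.actions) {
                if (best == null || candidate.averageReturn > best.averageReturn) {
                    best = candidate;
                }
            }
            if (best == null) {
                continue; // leaf: no action was ever tried here
            }
            policy.put(node.state, best.action);
            frontier.addAll(best.successors); // subtrees of other actions are skipped
        }
        return policy;
    }

    static boolean definedFor(Map<String, String> policy, String state) {
        return policy.containsKey(state);
    }

    // Mirrors the documented behaviour: querying an off-path state is an error.
    static String action(Map<String, String> policy, String state) {
        String chosen = policy.get(state);
        if (chosen == null) {
            throw new IllegalStateException("Policy undefined for state " + state);
        }
        return chosen;
    }

    // Builds s0 -> {left -> s1, right -> s2}; "right" has the higher average return.
    static StateNode exampleTree() {
        StateNode s0 = new StateNode("s0");
        ActionNode left = new ActionNode("left", 1.0);
        ActionNode right = new ActionNode("right", 2.5);
        s0.actions.add(left);
        s0.actions.add(right);

        StateNode s1 = new StateNode("s1"); // only reachable via the non-greedy action
        s1.actions.add(new ActionNode("left", 0.5));
        left.successors.add(s1);

        StateNode s2 = new StateNode("s2"); // reachable via the greedy action
        s2.actions.add(new ActionNode("left", 3.0));
        s2.actions.add(new ActionNode("right", 0.1));
        right.successors.add(s2);
        return s0;
    }

    public static void main(String[] args) {
        Map<String, String> policy = policyFromGreedyPath(exampleTree());
        System.out.println("s0 -> " + action(policy, "s0"));
        System.out.println("s1 defined? " + definedFor(policy, "s1"));
    }
}
```

In this toy tree, `s1` is reachable only through the non-greedy action at the root, so it receives no entry and querying it throws. Checking `definedFor` before calling `action` avoids the exception.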

Author:

James MacGlashan

Constructor Summary

Constructors
Constructor and Description
`UCTTreeWalkPolicy(UCT planner)` Initializes the policy with the UCT planner.

Method Summary

Modifier and Type	Method and Description
`Action`	`action(State s)` This method will return an action sampled by the policy for the given state.
`double`	`actionProb(State s, Action a)` Returns the probability/probability density that the given action will be taken in the given state.
`void`	`computePolicyFromTree()` Computes a hash-backed policy for every state visited along the greedy path of the UCT tree.
`boolean`	`definedFor(State s)` Specifies whether this policy is defined for the input state.
`protected UCTActionNode`	`getQGreedyNode(UCTStateNode snode)` Returns the `UCTActionNode` with the highest average sample return.
`java.util.List<ActionProb>`	`policyDistribution(State s)` This method will return the action probability distribution defined by the policy.
`void`	`setSolver(MDPSolverInterface solver)` Sets the solver whose results affect this policy.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - UCTTreeWalkPolicy
```
public UCTTreeWalkPolicy(UCT planner)
```
    Initializes the policy with the UCT planner.
    
    Parameters:
    
    planner - the UCT planner whose tree should be walked.
- Method Detail
  - setSolver
```
public void setSolver(MDPSolverInterface solver)
```
    Description copied from interface: SolverDerivedPolicy
    
    Sets the solver whose results affect this policy.
    
    Specified by:
    
    setSolver in interface SolverDerivedPolicy
    
    Parameters:
    
    solver - the solver from which this policy is derived
  - computePolicyFromTree
```
public void computePolicyFromTree()
```
    Computes a hash-backed policy for every state visited along the greedy path of the UCT tree.
  - getQGreedyNode
```
protected UCTActionNode getQGreedyNode(UCTStateNode snode)
```
    Returns the UCTActionNode with the highest average sample return. Note that this does not use the upper confidence bound, since planning is complete.
    
    Parameters:
    
    snode - the UCTStateNode for which to get the best UCTActionNode.
    
    Returns:
    
    the UCTActionNode with the highest average sample return.
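    
    The difference between greedy selection and selection with an upper confidence term can be sketched outside BURLAP as follows. `ActionStats`, `greedy`, and `upperConfidence` are hypothetical names, and the bonus uses the textbook UCB1 form, which is an assumption made for contrast and not necessarily the exact expression used by the UCT planner.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for action-node statistics; not BURLAP's UCTActionNode.
class GreedySelectionSketch {

    static class ActionStats {
        final String name;
        final double sumReturn; // total return over all samples of this action
        final int n;            // number of times the action was sampled

        ActionStats(String name, double sumReturn, int n) {
            this.name = name;
            this.sumReturn = sumReturn;
            this.n = n;
        }

        double averageReturn() {
            return sumReturn / n;
        }
    }

    // After planning: pick the highest average return, with no exploration bonus.
    static ActionStats greedy(List<ActionStats> actions) {
        ActionStats best = null;
        for (ActionStats a : actions) {
            if (a.n == 0) {
                continue; // an untried action has no average to compare
            }
            if (best == null || a.averageReturn() > best.averageReturn()) {
                best = a;
            }
        }
        return best;
    }

    // During planning (assumed UCB1 form): average plus c * sqrt(ln(total) / n).
    static ActionStats upperConfidence(List<ActionStats> actions, double c) {
        int total = 0;
        for (ActionStats a : actions) {
            total += a.n;
        }
        ActionStats best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (ActionStats a : actions) {
            double score = a.n == 0
                    ? Double.POSITIVE_INFINITY // untried actions are explored first
                    : a.averageReturn() + c * Math.sqrt(Math.log(total) / a.n);
            if (score > bestScore) {
                bestScore = score;
                best = a;
            }
        }
        return best;
    }

    // "a" is well sampled with average 1.0; "b" is barely sampled with average 0.9.
    static List<ActionStats> example() {
        return Arrays.asList(new ActionStats("a", 50.0, 50), new ActionStats("b", 1.8, 2));
    }

    public static void main(String[] args) {
        System.out.println("greedy: " + greedy(example()).name);
        System.out.println("upper confidence: " + upperConfidence(example(), 1.0).name);
    }
}
```

    With these numbers the bonus favours the rarely sampled action `b`, while the greedy rule picks `a`, whose average return is higher. Dropping the bonus once planning is finished avoids choosing an action only because it is under-explored.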
  - action
```
public Action action(State s)
```
    Description copied from interface: Policy
    
    This method will return an action sampled by the policy for the given state. If the defined policy is stochastic, then multiple calls to this method for the same state may return different actions. The sampling should be with respect to the action distribution returned by policyDistribution.
    
    Specified by:
    
    action in interface Policy
    
    Parameters:
    
    s - the state for which an action should be returned
    
    Returns:
    
    a sample action from the action distribution; null if the policy is undefined for s
  - actionProb
```
public double actionProb(State s,
                         Action a)
```
    Description copied from interface: Policy
    
    Returns the probability/probability density that the given action will be taken in the given state.
    
    Specified by:
    
    actionProb in interface Policy
    
    Parameters:
    
    s - the state of interest
    
    a - the action that may be taken in the state
    
    Returns:
    
    the probability/probability density
  - policyDistribution
```
public java.util.List<ActionProb> policyDistribution(State s)
```
    Description copied from interface: EnumerablePolicy
    
    This method will return the action probability distribution defined by the policy. The action distribution is represented by a list of ActionProb objects, each of which specifies a grounded action and the probability of that grounded action being taken. The returned list does not have to include actions with probability 0.
    
    Specified by:
    
    policyDistribution in interface EnumerablePolicy
    
    Parameters:
    
    s - the state for which an action distribution should be returned
    
    Returns:
    
    a list of possible actions taken by the policy and their probability.
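    
    As a rough illustration, assume the policy is deterministic along the greedy path, which the class description suggests but this page does not state outright. Under that assumption the distribution for a covered state holds a single entry with probability 1, and zero-probability actions are omitted. `ActionProbability` and the map-backed policy below are hypothetical stand-ins, not BURLAP types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a deterministic policy's distribution; not BURLAP code.
class DeterministicDistributionSketch {

    // Hypothetical analogue of an action/probability pair.
    static class ActionProbability {
        final String action;
        final double probability;

        ActionProbability(String action, double probability) {
            this.action = action;
            this.probability = probability;
        }
    }

    // One entry with probability 1; actions with probability 0 are left out.
    static List<ActionProbability> policyDistribution(Map<String, String> policy, String state) {
        String chosen = policy.get(state);
        if (chosen == null) {
            throw new IllegalStateException("Policy undefined for state " + state);
        }
        return Collections.singletonList(new ActionProbability(chosen, 1.0));
    }

    // Probability of a given action: 1 for the selected action, 0 for any other.
    static double actionProb(Map<String, String> policy, String state, String action) {
        for (ActionProbability ap : policyDistribution(policy, state)) {
            if (ap.action.equals(action)) {
                return ap.probability;
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        Map<String, String> policy = new HashMap<>();
        policy.put("s0", "right");
        System.out.println(actionProb(policy, "s0", "right"));
        System.out.println(actionProb(policy, "s0", "left"));
    }
}
```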
  - definedFor
```
public boolean definedFor(State s)
```
    Description copied from interface: Policy
    
    Specifies whether this policy is defined for the input state.
    
    Specified by:
    
    definedFor in interface Policy
    
    Parameters:
    
    s - the input state to test for whether this policy is defined
    
    Returns:
    
    true if this policy is defined for State s, false otherwise.
