BFSRTDP

java.lang.Object
- burlap.behavior.singleagent.MDPSolver
  - burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
    - burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP
      - burlap.behavior.singleagent.planning.stochastic.rtdp.BFSRTDP

All Implemented Interfaces:

MDPSolverInterface, Planner, QFunction, ValueFunction
```
public class BFSRTDP
extends RTDP
```
A modified version of Real-time Dynamic Programming (RTDP) [1] that first makes a breadth-first-search-like pass to seed the value function and then continues planning with the usual RTDP rollouts. The BFS-like pass extends either to all states reachable from the source state or, optionally, only to the depth required to reach a goal state, and it expands every possible stochastic transition of each action. This approach may be useful when the optimal policy is expected to be much shallower than the full state space and no good initial value function can be provided.

1. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
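
The two phases described above can be illustrated with a small self-contained sketch. It is not BURLAP code: the class `BfsSeededRtdpSketch`, the five-state chain MDP, the 0.8/0.2 transition probabilities, the reward of -1 per step, and the reverse-BFS update order are all assumptions chosen for illustration. The BFS pass here expands every reachable state (no goal-depth cut-off), and rollouts follow a greedy policy that breaks ties toward the "right" action rather than randomly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Random;

/** Toy chain MDP (states 0..n-1, goal at n-1, reward -1 per step). Hypothetical, not BURLAP code. */
final class BfsSeededRtdpSketch {
    static final double[] PROBS = {0.8, 0.2}; // assumed: intended move succeeds, or the agent stays
    final int n;
    final double gamma;
    final double[] v;
    final Random rng = new Random(7);

    BfsSeededRtdpSketch(int n, double gamma, double vInit) {
        this.n = n;
        this.gamma = gamma;
        this.v = new double[n];
        Arrays.fill(v, vInit);
        v[n - 1] = 0.0; // the goal is terminal, so its value is fixed at 0
    }

    boolean isGoal(int s) { return s == n - 1; }

    /** Possible next states for action a (0 = left, 1 = right), aligned with PROBS. */
    int[] outcomes(int s, int a) {
        int target = Math.max(0, Math.min(n - 1, s + (a == 0 ? -1 : 1)));
        return new int[] {target, s};
    }

    double q(int s, int a) {
        int[] next = outcomes(s, a);
        double sum = 0.0;
        for (int i = 0; i < next.length; i++) {
            sum += PROBS[i] * (-1.0 + gamma * v[next[i]]);
        }
        return sum;
    }

    /** Bellman update on s; returns the absolute change in its value. */
    double bellmanUpdate(int s) {
        if (isGoal(s)) return 0.0;
        double old = v[s];
        v[s] = Math.max(q(s, 0), q(s, 1));
        return Math.abs(v[s] - old);
    }

    /** Phase 1: BFS over all stochastic outcomes, then one update per state in reverse BFS order. */
    List<Integer> bfsSeed(int start) {
        boolean[] seen = new boolean[n];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>();
        open.add(start);
        seen[start] = true;
        while (!open.isEmpty()) {
            int s = open.poll();
            order.add(s);
            if (isGoal(s)) continue; // terminal states are not expanded
            for (int a = 0; a < 2; a++) {
                for (int sp : outcomes(s, a)) {
                    if (!seen[sp]) { seen[sp] = true; open.add(sp); }
                }
            }
        }
        for (int i = order.size() - 1; i >= 0; i--) bellmanUpdate(order.get(i));
        return order;
    }

    /** Phase 2: greedy rollouts with a Bellman update at each visited state. */
    int rollouts(int start, int numRollouts, double maxDelta, int maxDepth) {
        int done = 0;
        while (done < numRollouts) {
            double delta = 0.0;
            int s = start;
            for (int d = 0; d < maxDepth && !isGoal(s); d++) {
                delta = Math.max(delta, bellmanUpdate(s));
                int a = q(s, 1) >= q(s, 0) ? 1 : 0;
                int[] next = outcomes(s, a);
                s = rng.nextDouble() < PROBS[0] ? next[0] : next[1];
            }
            done++;
            if (delta < maxDelta) break; // value function changed too little: stop planning
        }
        return done;
    }

    public static void main(String[] args) {
        BfsSeededRtdpSketch p = new BfsSeededRtdpSketch(5, 0.9, 0.0);
        System.out.println("BFS order: " + p.bfsSeed(0));
        System.out.println("rollouts used: " + p.rollouts(0, 1000, 1e-6, 100));
        System.out.println("values: " + Arrays.toString(p.v));
    }
}
```

In this toy setting the seeding pass touches every state once before any rollout runs, so the first rollout starts from values that already reflect one step of reward everywhere.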

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
  DynamicProgramming.StaticVFPlanner
- Nested classes/interfaces inherited from interface burlap.behavior.valuefunction.QFunction
  QFunction.QFunctionHelper

Field Summary

Fields
Modifier and Type	Field and Description
`protected StateConditionTest`	`goalCondition` The goal condition that stops the BFS-like pass.
`protected boolean`	`performedInitialPlan` Indicates whether the BFS-like pass has already been performed.

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP
maxDelta, maxDepth, minNumRolloutsWithSmallValueChange, numberOfBellmanUpdates, numRollouts, rollOutPolicy, useBatch

Fields inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
transitionDynamics, useCachedTransitions, valueFunction, valueInitializer

Fields inherited from class burlap.behavior.singleagent.MDPSolver
actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor Summary

Constructors
Constructor and Description
`BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)` Initializes the valueFunction.
`BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth, StateConditionTest goalCondition)` Initializes the valueFunction.

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`performInitialPassFromState(State initialState)` Performs a BFS-like pass either to all reachable states or to the depth at which a goal state is found, and then performs a Bellman update on all of those states.
`protected java.util.List<HashableState>`	`performRecahabilityAnalysisFrom(State si)` Finds either all reachable states from si or all states up to the depth that the first goal state is found from si.
`GreedyQPolicy`	`planFromState(State initialState)` Plans from the input state and then returns a `GreedyQPolicy` that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
`protected boolean`	`satisfiesGoal(State s)` Returns whether a state is a goal state.
`void`	`setGoalCondition(StateConditionTest gc)` Sets the goal condition that causes the BFS-like pass to stop expanding once a goal state is found.

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.rtdp.RTDP
batchRTDP, getNumberOfBellmanUpdates, normalRTDP, performOrderedBellmanUpdates, setMaxDelta, setMaxDynamicDepth, setMinNumRolloutsWithSmallValueChange, setNumPasses, setRollOutPolicy, toggleBatchMode

Methods inherited from class burlap.behavior.singleagent.planning.stochastic.DynamicProgramming
computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetSolver, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value

Methods inherited from class burlap.behavior.singleagent.MDPSolver
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface burlap.behavior.singleagent.MDPSolverInterface
addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, resetSolver, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

- Field Detail
  - performedInitialPlan
```
protected boolean performedInitialPlan
```
    Indicates whether the BFS-like pass has already been performed.
  - goalCondition
```
protected StateConditionTest goalCondition
```
    The goal condition that stops the BFS-like pass
- Constructor Detail
  - BFSRTDP
```
public BFSRTDP(Domain domain,
       RewardFunction rf,
       TerminalFunction tf,
       double gamma,
       HashableStateFactory hashingFactory,
       double vInit,
       int numRollouts,
       double maxDelta,
       int maxDepth)
```
    Initializes the valueFunction. The value function will be initialized to vInit by default everywhere and will use a greedy policy with random tie breaks for performing rollouts. Use the DynamicProgramming.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the RTDP.setRollOutPolicy(Policy) method to change the rollout policy to something else.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    vInit - the value to which the value function will be initialized for all states
    numRollouts - the number of rollouts to perform when planning is started.
    maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
    maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
  - BFSRTDP
```
public BFSRTDP(Domain domain,
       RewardFunction rf,
       TerminalFunction tf,
       double gamma,
       HashableStateFactory hashingFactory,
       double vInit,
       int numRollouts,
       double maxDelta,
       int maxDepth,
       StateConditionTest goalCondition)
```
    Initializes the valueFunction. The value function will be initialized to vInit by default everywhere and will use a greedy policy with random tie breaks for performing rollouts. Use the DynamicProgramming.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the RTDP.setRollOutPolicy(Policy) method to change the rollout policy to something else.
    
    Parameters:
    domain - the domain in which to plan
    rf - the reward function
    tf - the terminal state function
    gamma - the discount factor
    hashingFactory - the state hashing factory to use
    vInit - the value to which the value function will be initialized for all states
    numRollouts - the number of rollouts to perform when planning is started.
    maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
    maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
    goalCondition - a state condition test that returns true for goal states. Causes the BFS-like pass to stop expanding when found.
- Method Detail
  - setGoalCondition
```
public void setGoalCondition(StateConditionTest gc)
```
    Sets the goal condition that causes the BFS-like pass to stop expanding once a goal state is found.
    
    Parameters:
    gc - the state condition test that returns true for goal states
  - planFromState
```
public GreedyQPolicy planFromState(State initialState)
```
    Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly randomly.
    
    Specified by:
    
    planFromState in interface Planner
    
    Overrides:
    
    planFromState in class RTDP
    
    Parameters:
    initialState - the initial state of the planning problem
    
    Returns:
    a GreedyQPolicy.
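
    The selection rule described above (take the highest Q-value, break ties uniformly at random) can be sketched in plain Java. This is an illustration only and does not use BURLAP's `GreedyQPolicy`; the class `GreedyTieBreakSketch` and its methods are hypothetical, and Q-values are assumed to be given as a plain array indexed by action.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper illustrating greedy selection with random tie-breaking; not BURLAP code. */
final class GreedyTieBreakSketch {

    /** Returns the indices of all actions whose Q-value equals the maximum. */
    static List<Integer> maxActions(double[] qValues) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qValues) {
            best = Math.max(best, q);
        }
        List<Integer> ties = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] == best) {
                ties.add(a);
            }
        }
        return ties;
    }

    /** Picks one of the maximizing actions uniformly at random. */
    static int selectAction(double[] qValues, Random rng) {
        List<Integer> ties = maxActions(qValues);
        return ties.get(rng.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        double[] q = {1.0, 3.0, 3.0, 0.5};
        System.out.println("maximizing actions: " + maxActions(q));
        System.out.println("selected action: " + selectAction(q, new Random()));
    }
}
```

    Collecting all maximizing actions first and sampling among them keeps the choice uniform over ties, which a simple "first maximum wins" loop would not.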
  - performInitialPassFromState
```
protected void performInitialPassFromState(State initialState)
```
    Performs a BFS-like pass either to all reachable states or to the depth at which a goal state is found, and then performs a Bellman update on all of those states.
    
    Parameters:
    initialState - the initial state from which to perform the BFS-like pass.
  - performRecahabilityAnalysisFrom
```
protected java.util.List<HashableState> performRecahabilityAnalysisFrom(State si)
```
    Finds either all reachable states from si or all states up to the depth that the first goal state is found from si.
    
    Parameters:
    si - the initial state from which to search for states
    
    Returns:
    the list of all states found
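
    A minimal sketch of this kind of reachability analysis is given below. It is not the BURLAP implementation: the class `ReachabilitySketch`, integer state ids, and the `successors` map (every state reachable by any outcome of any action) are assumptions for illustration. The sketch also assumes that the depth layer containing the first goal state is completed before the search stops, which the description above does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a goal-bounded reachability pass; not BURLAP code. */
final class ReachabilitySketch {

    /**
     * Returns the states found from start in BFS order. If goal is null, every
     * reachable state is returned; otherwise expansion stops after the first
     * depth layer that contains a goal state.
     */
    static List<Integer> reachableFrom(int start,
                                       Map<Integer, List<Integer>> successors,
                                       IntPredicate goal) {
        Set<Integer> seen = new LinkedHashSet<>(); // preserves discovery order
        seen.add(start);
        List<Integer> frontier = List.of(start);
        while (!frontier.isEmpty()) {
            if (goal != null) {
                boolean goalInLayer = false;
                for (int s : frontier) {
                    if (goal.test(s)) {
                        goalInLayer = true;
                    }
                }
                if (goalInLayer) {
                    break; // assumed behavior: do not expand beyond the goal's depth
                }
            }
            List<Integer> next = new ArrayList<>();
            for (int s : frontier) {
                for (int sp : successors.getOrDefault(s, List.of())) {
                    if (seen.add(sp)) { // true only for states not seen before
                        next.add(sp);
                    }
                }
            }
            frontier = next;
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> succ = Map.of(
                0, List.of(1, 2), 1, List.of(3), 2, List.of(4), 3, List.of(5));
        System.out.println(reachableFrom(0, succ, s -> s == 3)); // bounded by the goal's depth
        System.out.println(reachableFrom(0, succ, null));        // all reachable states
    }
}
```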
  - satisfiesGoal
```
protected boolean satisfiesGoal(State s)
```
    Returns whether a state is a goal state.
    
    Parameters:
    s - the state to test.
    
    Returns:
    true if s is a goal state; false otherwise.
