public class BFSRTDP extends RTDP
DynamicProgramming.StaticVFPlanner
QFunction.QFunctionHelper
Modifier and Type | Field and Description |
---|---|
protected StateConditionTest | goalCondition: The goal condition that stops the BFS-like pass. |
protected boolean | performedInitialPlan: Indicates whether the BFS-like pass has already been performed. |
Fields inherited from class RTDP: maxDelta, maxDepth, minNumRolloutsWithSmallValueChange, numberOfBellmanUpdates, numRollouts, rollOutPolicy, useBatch
Fields inherited from class DynamicProgramming: transitionDynamics, useCachedTransitions, valueFunction, valueInitializer
Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth): Initializes the valueFunction. |
BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth, StateConditionTest goalCondition): Initializes the valueFunction. |
Modifier and Type | Method and Description |
---|---|
protected void | performInitialPassFromState(State initialState): Performs a BFS-like pass over either all reachable states or all states up to the depth at which a goal state is first found, and then performs a Bellman update on each of those states. |
protected java.util.List<HashableState> | performRecahabilityAnalysisFrom(State si): Finds either all states reachable from si or all states up to the depth at which the first goal state is found from si. |
GreedyQPolicy | planFromState(State initialState): Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random. |
protected boolean | satisfiesGoal(State s): Returns whether a state is a goal state. |
void | setGoalCondition(StateConditionTest gc): Sets the goal condition that causes the BFS-like pass to stop expanding when a satisfying state is found. |
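The BFS-like pass described for performInitialPassFromState and performRecahabilityAnalysisFrom can be illustrated with a minimal, library-free sketch. It is not BURLAP's implementation: the class name BfsReachabilitySketch, the method reachableStates, and the successor-function and goal-test parameters are all hypothetical stand-ins, and states are plain Java objects rather than HashableState instances. The sketch assumes that the layer containing the first goal state is collected in full and that nothing deeper is expanded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from BURLAP.
public class BfsReachabilitySketch {

    /**
     * Collects states layer by layer, starting from start. With a null goal test, every
     * reachable state is returned. Otherwise expansion stops after the first layer that
     * contains a goal state; that layer is still collected in full.
     */
    public static <S> List<S> reachableStates(S start,
                                              Function<S, List<S>> successors,
                                              Predicate<S> isGoal) {
        List<S> visited = new ArrayList<>();
        Set<S> seen = new HashSet<>();
        List<S> frontier = new ArrayList<>();
        frontier.add(start);
        seen.add(start);
        boolean goalFound = false;

        while (!frontier.isEmpty() && !goalFound) {
            List<S> next = new ArrayList<>();
            for (S s : frontier) {
                visited.add(s);
                if (isGoal != null && isGoal.test(s)) {
                    goalFound = true; // finish this layer, then stop
                    continue;         // goal states are not expanded
                }
                for (S sp : successors.apply(s)) {
                    if (seen.add(sp)) { // true only for states not seen before
                        next.add(sp);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy chain of states 0..9 with moves -1 and +1, clamped at both ends.
        Function<Integer, List<Integer>> succ =
                s -> Arrays.asList(Math.max(0, s - 1), Math.min(9, s + 1));

        System.out.println(reachableStates(0, succ, s -> s == 3)); // prints [0, 1, 2, 3]
        System.out.println(reachableStates(0, succ, null).size()); // prints 10
    }
}
```

In the planner itself, the returned states would then each receive a Bellman update; the sketch stops at collecting them.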
Methods inherited from class RTDP: batchRTDP, getNumberOfBellmanUpdates, normalRTDP, performOrderedBellmanUpdates, setMaxDelta, setMaxDynamicDepth, setMinNumRolloutsWithSmallValueChange, setNumPasses, setRollOutPolicy, toggleBatchMode
Methods inherited from class DynamicProgramming: computeQ, computeQ, DPPInit, getActionsTransitions, getAllStates, getCopyOfValueFunction, getDefaultValue, getQ, getQ, getQs, getValueFunctionInitialization, hasComputedValueFor, initializeOptionsForExpectationComputations, performBellmanUpdateOn, performBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, performFixedPolicyBellmanUpdateOn, resetSolver, setValueFunctionInitialization, toggleUseCachedTransitionDynamics, value, value
Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, resetSolver, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting
protected boolean performedInitialPlan
protected StateConditionTest goalCondition
public BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth)
Initializes the valueFunction. Use the DynamicProgramming.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the RTDP.setRollOutPolicy(Policy) method to change the rollout policy to something else.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the value to which the value function for all states will be initialized
numRollouts - the number of rollouts to perform when planning is started.
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
public BFSRTDP(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory, double vInit, int numRollouts, double maxDelta, int maxDepth, StateConditionTest goalCondition)
Initializes the valueFunction. Use the DynamicProgramming.setValueFunctionInitialization(ValueFunctionInitialization) method to change the value function initialization and the RTDP.setRollOutPolicy(Policy) method to change the rollout policy to something else.
Parameters:
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory to use
vInit - the value to which the value function for all states will be initialized
numRollouts - the number of rollouts to perform when planning is started.
maxDelta - when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
maxDepth - the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
goalCondition - a state condition test that returns true for goal states. Causes the BFS-like pass to stop expanding when a goal state is found.
public void setGoalCondition(StateConditionTest gc)
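Sets the goal condition that causes the BFS-like pass to stop expanding when a satisfying state is found.
Parameters:
gc -
public GreedyQPolicy planFromState(State initialState)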
Plans from the input state and then returns a GreedyQPolicy that greedily selects the action with the highest Q-value and breaks ties uniformly at random.
Specified by: planFromState in interface Planner
Overrides: planFromState in class RTDP
Parameters:
initialState - the initial state of the planning problem
Returns: a GreedyQPolicy.
protected void performInitialPassFromState(State initialState)
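Performs a BFS-like pass over either all reachable states or all states up to the depth at which a goal state is first found, and then performs a Bellman update on each of those states.
Parameters:
initialState - the initial state from which to perform the BFS-like pass.
protected java.util.List<HashableState> performRecahabilityAnalysisFrom(State si)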
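Finds either all states reachable from si or all states up to the depth at which the first goal state is found from si.
Parameters:
si - the initial state from which to search for states
protected boolean satisfiesGoal(State s)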
Returns whether a state is a goal state.
Parameters:
s - the state to test.
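
The policy returned by planFromState is described as selecting the action with the highest Q-value and breaking ties uniformly at random. A minimal sketch of that selection rule, assuming Q-values are supplied as a plain array indexed by action, might look as follows. The class GreedyTieBreakSketch and the method greedyAction are hypothetical names for illustration and are not part of GreedyQPolicy or any other BURLAP class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from BURLAP.
public class GreedyTieBreakSketch {

    /** Returns the index of a maximal Q-value, chosen uniformly at random among ties. */
    public static int greedyAction(double[] qValues, Random rng) {
        if (qValues.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> ties = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (qValues[a] > best) {
                // Strictly better value: discard earlier candidates.
                best = qValues[a];
                ties.clear();
                ties.add(a);
            } else if (qValues[a] == best) {
                // Equal to the current best: keep as a tie candidate.
                ties.add(a);
            }
        }
        return ties.get(rng.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        Random rng = new Random(0);
        // Unique maximum at index 1.
        System.out.println(greedyAction(new double[] {0.1, 0.7, 0.3}, rng)); // prints 1
        // Indices 0 and 2 tie, so either may be returned.
        int pick = greedyAction(new double[] {0.5, 0.2, 0.5}, rng);
        System.out.println(pick == 0 || pick == 2); // prints true
    }
}
```

Exact floating-point equality is used for the tie test to keep the sketch short; a tolerance-based comparison is a reasonable alternative when Q-values come from iterative updates.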