public class QMDP extends MDPSolver implements Planner, QFunction
Nested classes/interfaces inherited from interface QFunction: QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected QFunction | mdpQSource: the fully observable MDP QFunction source. |

Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| QMDP(PODomain domain, QFunction mdpQSource): Initializes. |
| QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes and creates a ValueIteration planner to solve the underlying MDP. |
| Modifier and Type | Method and Description |
|---|---|
| void | forceMDPPlanningFromAllStates(): Calls the Planner.planFromState(burlap.oomdp.core.states.State) method on all states defined in the POMDP. |
| QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
| java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| Policy | planFromState(State initialState): Plans from the input state and returns a Policy that captures the planning results. |
| double | qForBelief(EnumerableBeliefState bs, GroundedAction ga): Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
| protected double | qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, GroundedAction ga): Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| double | value(State s): Returns the value function evaluation of the given state. |
Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

public QMDP(PODomain domain, QFunction mdpQSource)
Initializes.
Parameters:
domain - the POMDP domain
mdpQSource - the underlying fully observable MDP QFunction source.

public QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
Initializes and creates a ValueIteration planner to solve the underlying MDP. You should call the forceMDPPlanningFromAllStates() method after construction to have the constructed ValueIteration instance perform planning.
Parameters:
domain - the POMDP domain
rf - the POMDP hidden state reward function
tf - the POMDP hidden state terminal function
discount - the discount factor
hashingFactory - the HashableStateFactory for the ValueIteration instance to use
maxDelta - the maximum value function change threshold that will cause planning to terminate
maxIterations - the maximum number of value iteration iterations
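A minimal construction sketch, assuming an existing PODomain with its hidden-state RewardFunction and TerminalFunction (imports omitted; the variable names, parameter values, and the choice of SimpleHashableStateFactory are illustrative, not prescribed by the API):

```java
QMDP qmdp = new QMDP(pomdpDomain, rf, tf,
        0.99,                             // discount factor
        new SimpleHashableStateFactory(), // hashingFactory used by ValueIteration
        0.01,                             // maxDelta: value change threshold for VI
        200);                             // maxIterations: cap on VI iterations

// The constructor only creates the ValueIteration instance; trigger planning
// over the enumerable hidden states explicitly, as noted above.
qmdp.forceMDPPlanningFromAllStates();
```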
public void forceMDPPlanningFromAllStates()
Calls the Planner.planFromState(burlap.oomdp.core.states.State) method on all states defined in the POMDP. Calling this method requires that the PODomain provide a StateEnumerator; otherwise an exception will be thrown.
public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QFunction

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
Specified by: getQ in interface QFunction

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate.

public double qForBelief(EnumerableBeliefState bs, GroundedAction ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.
Parameters:
bs - the belief state
ga - the action whose Q-value is to be computed
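This marginalization is the core QMDP approximation: Q(b, a) = sum over states s of b(s) * Q_MDP(s, a), which scores each action as if the hidden state became fully observable after one step. A hypothetical re-implementation sketch (not the library source), assuming EnumerableBeliefState.StateBelief exposes its state and probability as the public fields `s` and `belief`, and QValue exposes its value as the field `q`:

```java
// Sketch: Q(b, a) = sum over s of b(s) * Q_MDP(s, a)
double qForBeliefSketch(java.util.List<EnumerableBeliefState.StateBelief> beliefs,
                        GroundedAction ga) {
    double sum = 0.;
    for (EnumerableBeliefState.StateBelief sb : beliefs) {
        QValue q = this.mdpQSource.getQ(sb.s, ga); // fully observable MDP Q-value
        sum += sb.belief * q.q;                    // weight by belief probability
    }
    return sum;
}
```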
protected double qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, GroundedAction ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.
Parameters:
beliefs - the belief state distribution
ga - the action whose Q-value is to be computed

public Policy planFromState(State initialState)
This method will cause the Planner to begin planning from the specified initial State. It will then return an appropriate Policy object that captures the planning results. Note that you can typically use a variety of different Policy objects in conjunction with this Planner to get varying behavior, and the returned Policy is not required to be used.
Specified by: planFromState in interface Planner
Parameters:
initialState - the initial state of the planning problem
Returns: a Policy that captures the planning results from the input State.
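A short usage sketch, assuming `qmdp` has already planned (see the constructor example above) and that `initialBelief` is a belief State for the POMDP; the variable names are illustrative:

```java
Policy policy = qmdp.planFromState(initialBelief);

// Act greedily on the current belief with the returned policy...
AbstractGroundedAction a = policy.getAction(initialBelief);

// ...or inspect the belief-state Q-values directly via the QFunction methods.
for (QValue q : qmdp.getQs(initialBelief)) {
    System.out.println(q.a.toString() + ": " + q.q);
}
```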
public void resetSolver()
This method resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver