public class QMDP extends MDPSolver implements Planner, QProvider
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected QProvider | mdpQSource The fully observable MDP QProvider source. |
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
QMDP(PODomain domain, QProvider mdpQSource) Initializes. |
QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations) Initializes and creates a ValueIteration planner to solve the underlying MDP. |
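The role of the second constructor's ValueIteration planner can be illustrated with a self-contained sketch. This is not BURLAP code: the toy two-state MDP (`NEXT`, `REWARD`) and the `ToyValueIteration` class are hypothetical, but the loop mirrors the roles of the `discount`, `maxDelta`, and `maxIterations` parameters in producing the Q-values that QMDP later marginalizes over belief states.

```java
import java.util.Arrays;

public class ToyValueIteration {

    // Hypothetical 2-state, 2-action deterministic MDP.
    // NEXT[s][a] = successor state, REWARD[s][a] = immediate reward.
    static final int NUM_STATES = 2, NUM_ACTIONS = 2;
    static final int[][] NEXT = {{0, 1}, {1, 0}};
    static final double[][] REWARD = {{0.0, 1.0}, {0.0, 5.0}};

    // Runs value iteration until the value function changes by less than
    // maxDelta (or maxIterations is reached) and returns the Q-table.
    static double[][] solve(double gamma, double maxDelta, int maxIterations) {
        double[] v = new double[NUM_STATES];
        double[][] q = new double[NUM_STATES][NUM_ACTIONS];
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < NUM_STATES; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < NUM_ACTIONS; a++) {
                    // Bellman backup for the fully observable MDP.
                    q[s][a] = REWARD[s][a] + gamma * v[NEXT[s][a]];
                    best = Math.max(best, q[s][a]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
            if (delta < maxDelta) break; // analogous to the maxDelta parameter
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(solve(0.9, 1e-8, 1000)));
    }
}
```

With discount 0.9 the optimal behavior cycles between the two states via action 1, so the converged Q-values for action 1 dominate those for action 0 in both states.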
Modifier and Type | Method and Description |
---|---|
void | forceMDPPlanningFromAllStates() Calls the Planner.planFromState(State) method on all states defined in the POMDP. |
Policy | planFromState(State initialState) |
double | qForBelief(EnumerableBeliefState bs, Action ga) Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
protected double | qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, Action ga) Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
double | qValue(State s, Action a) Returns the QValue for the given state-action pair. |
java.util.List<QValue> | qValues(State s) Returns a List of QValue objects for every permissible action for the given input state. |
void | resetSolver() This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
double | value(State s) Returns the value function evaluation of the given state. |
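The computation behind qForBelief and qForBeliefList reduces to a weighted sum: the QMDP value of an action at a belief state is the expectation of the underlying MDP's Q-values under the belief distribution, Q(b, a) = Σ_s b(s) · Q_MDP(s, a). A minimal self-contained sketch follows; this is not BURLAP code, and the `BeliefQ` class and its Q-table are hypothetical.

```java
public class BeliefQ {

    // Hypothetical Q-table for a solved 2-state, 2-action underlying MDP.
    static final double[][] Q_MDP = {{1.0, 3.0}, {4.0, 2.0}};

    // Expected Q-value of action a under belief b, where b[s] is the
    // probability assigned to hidden state s: sum_s b[s] * Q_MDP[s][a].
    static double qForBelief(double[] b, int a) {
        double q = 0.0;
        for (int s = 0; s < b.length; s++) {
            q += b[s] * Q_MDP[s][a];
        }
        return q;
    }

    // Greedy action selection over belief Q-values, as a policy built on
    // the belief-state Q-values would do.
    static int greedyAction(double[] b) {
        int best = 0;
        for (int a = 1; a < Q_MDP[0].length; a++) {
            if (qForBelief(b, a) > qForBelief(b, best)) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] belief = {0.8, 0.2};
        System.out.println(qForBelief(belief, 0)); // 0.8*1.0 + 0.2*4.0 = 1.6
        System.out.println(qForBelief(belief, 1)); // 0.8*3.0 + 0.2*2.0 = 2.8
        System.out.println(greedyAction(belief));  // 1
    }
}
```

Note the QMDP approximation ignores the value of information: it assumes the state becomes fully observable after one step, which is why it only needs the underlying MDP's Q-values.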
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
public QMDP(PODomain domain, QProvider mdpQSource)

Initializes.

Parameters:
domain - the POMDP domain
mdpQSource - the underlying fully observable MDP QProvider source.

public QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)

Initializes and creates a ValueIteration planner to solve the underlying MDP. You should call the forceMDPPlanningFromAllStates() method after construction to have the constructed ValueIteration instance perform planning.

Parameters:
domain - the POMDP domain
rf - the POMDP hidden state reward function
tf - the POMDP hidden state terminal function
discount - the discount factor
hashingFactory - the HashableStateFactory for the ValueIteration instance to use
maxDelta - the maximum value function change threshold that will cause planning to terminate
maxIterations - the maximum number of value iteration iterations

public void forceMDPPlanningFromAllStates()
Calls the Planner.planFromState(State) method on all states defined in the POMDP. Calling this method requires that the PODomain provides a StateEnumerator; otherwise an exception will be thrown.

public java.util.List<QValue> qValues(State s)
Specified by:
qValues in interface QProvider

Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Specified by:
qValue in interface QFunction

Returns the QValue for the given state-action pair.

public double value(State s)
Returns the value function evaluation of the given state.

Specified by:
value in interface ValueFunction

Parameters:
s - the state to evaluate

public double qForBelief(EnumerableBeliefState bs, Action ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.

Parameters:
bs - the belief state
ga - the action whose Q-value is to be computed

protected double qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, Action ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.

Parameters:
beliefs - the belief state distribution
ga - the action whose Q-value is to be computed

public Policy planFromState(State initialState)
Description copied from interface: Planner
This method will cause the Planner to begin planning from the specified initial State. It will then return an appropriate Policy object that captures the planning results. Note that typically you can use a variety of different Policy objects in conjunction with this Planner to get varying behavior, and the returned Policy is not required to be used.

Specified by:
planFromState in interface Planner

Parameters:
initialState - the initial state of the planning problem

Returns:
a Policy that captures the planning results from the input State.

public void resetSolver()
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.

Specified by:
resetSolver in interface MDPSolverInterface

Overrides:
resetSolver in class MDPSolver