public class QMDP extends MDPSolver implements Planner, QFunction
Nested classes/interfaces inherited from interface QFunction: QFunction.QFunctionHelper

| Modifier and Type | Field and Description |
|---|---|
| protected QFunction | mdpQSource: the fully observable MDP QFunction source. |

Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

| Constructor and Description |
|---|
| QMDP(PODomain domain, QFunction mdpQSource): Initializes. |
| QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations): Initializes and creates a ValueIteration planner to solve the underlying MDP. |
| Modifier and Type | Method and Description |
|---|---|
| void | forceMDPPlanningFromAllStates(): Calls the Planner.planFromState(burlap.oomdp.core.states.State) method on all states defined in the POMDP. |
| QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
| java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
| Policy | planFromState(State initialState): Plans from the input state and returns a Policy that captures the planning results. |
| double | qForBelief(EnumerableBeliefState bs, GroundedAction ga): Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
| protected double | qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, GroundedAction ga): Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
| void | resetSolver(): Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| double | value(State s): Returns the value function evaluation of the given state. |
Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addNonDomainReferencedAction, getActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, toggleDebugPrinting

public QMDP(PODomain domain, QFunction mdpQSource)
Initializes.
Parameters:
domain - the POMDP domain
mdpQSource - the underlying fully observable MDP QFunction source.

public QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)
Initializes and creates a ValueIteration planner to solve the underlying MDP. You should call the forceMDPPlanningFromAllStates() method after construction to have the constructed ValueIteration instance perform planning.
Parameters:
domain - the POMDP domain
rf - the POMDP hidden state reward function
tf - the POMDP hidden state terminal function
discount - the discount factor
hashingFactory - the HashableStateFactory for the ValueIteration instance to use
maxDelta - the maximum value function change threshold that will cause planning to terminate
maxIterations - the maximum number of value iteration iterations
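A minimal construction sketch, assuming an existing PODomain with its hidden-state RewardFunction and TerminalFunction (imports omitted; the variable names, parameter values, and the choice of SimpleHashableStateFactory are illustrative, not prescribed by the API):

```java
QMDP qmdp = new QMDP(pomdpDomain, rf, tf,
        0.99,                             // discount factor
        new SimpleHashableStateFactory(), // hashingFactory used by ValueIteration
        0.01,                             // maxDelta: value change threshold for VI
        200);                             // maxIterations: cap on VI iterations

// The constructor only creates the ValueIteration instance; trigger planning
// over the enumerable hidden states explicitly, as noted above.
qmdp.forceMDPPlanningFromAllStates();
```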
public void forceMDPPlanningFromAllStates()
Calls the Planner.planFromState(burlap.oomdp.core.states.State) method on all states defined in the POMDP. Calling this method requires that the PODomain provide a StateEnumerator; otherwise an exception will be thrown.
public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
Specified by: getQs in interface QFunction

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
Specified by: getQ in interface QFunction

public double value(State s)
Returns the value function evaluation of the given state.
Specified by: value in interface ValueFunction
Parameters:
s - the state to evaluate.

public double qForBelief(EnumerableBeliefState bs, GroundedAction ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.
Parameters:
bs - the belief state
ga - the action whose Q-value is to be computed
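This marginalization is the core QMDP approximation: Q(b, a) = sum over states s of b(s) * Q_MDP(s, a), which scores each action as if the hidden state became fully observable after one step. A hypothetical re-implementation sketch (not the library source), assuming EnumerableBeliefState.StateBelief exposes its state and probability as the public fields `s` and `belief`, and QValue exposes its value as the field `q`:

```java
// Sketch: Q(b, a) = sum over s of b(s) * Q_MDP(s, a)
double qForBeliefSketch(java.util.List<EnumerableBeliefState.StateBelief> beliefs,
                        GroundedAction ga) {
    double sum = 0.;
    for (EnumerableBeliefState.StateBelief sb : beliefs) {
        QValue q = this.mdpQSource.getQ(sb.s, ga); // fully observable MDP Q-value
        sum += sb.belief * q.q;                    // weight by belief probability
    }
    return sum;
}
```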
protected double qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, GroundedAction ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.
Parameters:
beliefs - the belief state distribution
ga - the action whose Q-value is to be computed

public Policy planFromState(State initialState)
This method will cause the Planner to begin planning from the specified initial State. It will then return an appropriate Policy object that captures the planning results. Note that you can typically use a variety of different Policy objects in conjunction with this Planner to get varying behavior, and the returned Policy is not required to be used.
Specified by: planFromState in interface Planner
Parameters:
initialState - the initial state of the planning problem
Returns: a Policy that captures the planning results from the input State.
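A short usage sketch, assuming `qmdp` has already planned (see the constructor example above) and that `initialBelief` is a belief State for the POMDP; the variable names are illustrative:

```java
Policy policy = qmdp.planFromState(initialBelief);

// Act greedily on the current belief with the returned policy...
AbstractGroundedAction a = policy.getAction(initialBelief);

// ...or inspect the belief-state Q-values directly via the QFunction methods.
for (QValue q : qmdp.getQs(initialBelief)) {
    System.out.println(q.a.toString() + ": " + q.q);
}
```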
public void resetSolver()
This method resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver