public class QMDP extends MDPSolver implements Planner, QProvider
Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
Modifier and Type | Field and Description |
---|---|
protected QProvider | mdpQSource The fully observable MDP QProvider source. |
Fields inherited from class MDPSolver: actionTypes, debugCode, domain, gamma, hashingFactory, model, usingOptionModel
Constructor and Description |
---|
QMDP(PODomain domain, QProvider mdpQSource) Initializes. |
QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations) Initializes and creates a ValueIteration planner to solve the underlying MDP. |
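The role of the second constructor's ValueIteration planner can be illustrated with a self-contained sketch. This is not BURLAP code: the toy two-state MDP (`NEXT`, `REWARD`) and the `ToyValueIteration` class are hypothetical, but the loop mirrors the roles of the `discount`, `maxDelta`, and `maxIterations` parameters in producing the Q-values that QMDP later marginalizes over belief states.

```java
import java.util.Arrays;

public class ToyValueIteration {

    // Hypothetical 2-state, 2-action deterministic MDP.
    // NEXT[s][a] = successor state, REWARD[s][a] = immediate reward.
    static final int NUM_STATES = 2, NUM_ACTIONS = 2;
    static final int[][] NEXT = {{0, 1}, {1, 0}};
    static final double[][] REWARD = {{0.0, 1.0}, {0.0, 5.0}};

    // Runs value iteration until the value function changes by less than
    // maxDelta (or maxIterations is reached) and returns the Q-table.
    static double[][] solve(double gamma, double maxDelta, int maxIterations) {
        double[] v = new double[NUM_STATES];
        double[][] q = new double[NUM_STATES][NUM_ACTIONS];
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < NUM_STATES; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < NUM_ACTIONS; a++) {
                    // Bellman backup for the fully observable MDP.
                    q[s][a] = REWARD[s][a] + gamma * v[NEXT[s][a]];
                    best = Math.max(best, q[s][a]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
            if (delta < maxDelta) break; // analogous to the maxDelta parameter
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(solve(0.9, 1e-8, 1000)));
    }
}
```

With discount 0.9 the optimal behavior cycles between the two states via action 1, so the converged Q-values for action 1 dominate those for action 0 in both states.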
Modifier and Type | Method and Description |
---|---|
void | forceMDPPlanningFromAllStates() Calls the Planner.planFromState(State) method on all states defined in the POMDP. |
Policy | planFromState(State initialState) |
double | qForBelief(EnumerableBeliefState bs, Action ga) Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
protected double | qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, Action ga) Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state. |
double | qValue(State s, Action a) Returns the QValue for the given state-action pair. |
java.util.List<QValue> | qValues(State s) Returns a List of QValue objects for every permissible action for the given input state. |
void | resetSolver() This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
double | value(State s) Returns the value function evaluation of the given state. |
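The computation behind qForBelief and qForBeliefList reduces to a weighted sum: the QMDP value of an action at a belief state is the expectation of the underlying MDP's Q-values under the belief distribution, Q(b, a) = Σ_s b(s) · Q_MDP(s, a). A minimal self-contained sketch follows; this is not BURLAP code, and the `BeliefQ` class and its Q-table are hypothetical.

```java
public class BeliefQ {

    // Hypothetical Q-table for a solved 2-state, 2-action underlying MDP.
    static final double[][] Q_MDP = {{1.0, 3.0}, {4.0, 2.0}};

    // Expected Q-value of action a under belief b, where b[s] is the
    // probability assigned to hidden state s: sum_s b[s] * Q_MDP[s][a].
    static double qForBelief(double[] b, int a) {
        double q = 0.0;
        for (int s = 0; s < b.length; s++) {
            q += b[s] * Q_MDP[s][a];
        }
        return q;
    }

    // Greedy action selection over belief Q-values, as a policy built on
    // the belief-state Q-values would do.
    static int greedyAction(double[] b) {
        int best = 0;
        for (int a = 1; a < Q_MDP[0].length; a++) {
            if (qForBelief(b, a) > qForBelief(b, best)) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] belief = {0.8, 0.2};
        System.out.println(qForBelief(belief, 0)); // 0.8*1.0 + 0.2*4.0 = 1.6
        System.out.println(qForBelief(belief, 1)); // 0.8*3.0 + 0.2*2.0 = 2.8
        System.out.println(greedyAction(belief));  // 1
    }
}
```

Note the QMDP approximation ignores the value of information: it assumes the state becomes fully observable after one step, which is why it only needs the underlying MDP's Q-values.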
Methods inherited from class MDPSolver: addActionType, applicableActions, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, stateHash, toggleDebugPrinting

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MDPSolverInterface: addActionType, getActionTypes, getDebugCode, getDomain, getGamma, getHashingFactory, getModel, setActionTypes, setDebugCode, setDomain, setGamma, setHashingFactory, setModel, solverInit, toggleDebugPrinting
public QMDP(PODomain domain, QProvider mdpQSource)

Initializes.

Parameters:
domain - the POMDP domain
mdpQSource - the underlying fully observable MDP QProvider source.

public QMDP(PODomain domain, RewardFunction rf, TerminalFunction tf, double discount, HashableStateFactory hashingFactory, double maxDelta, int maxIterations)

Initializes and creates a ValueIteration planner to solve the underlying MDP. You should call the forceMDPPlanningFromAllStates() method after construction to have the constructed ValueIteration instance perform planning.

Parameters:
domain - the POMDP domain
rf - the POMDP hidden state reward function
tf - the POMDP hidden state terminal function
discount - the discount factor
hashingFactory - the HashableStateFactory for the ValueIteration instance to use
maxDelta - the maximum value function change threshold that will cause planning to terminate
maxIterations - the maximum number of value iteration iterations

public void forceMDPPlanningFromAllStates()
Calls the Planner.planFromState(State) method on all states defined in the POMDP. Calling this method requires that the PODomain provides a StateEnumerator; otherwise an exception will be thrown.

public java.util.List<QValue> qValues(State s)
Specified by:
qValues in interface QProvider

Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Specified by:
qValue in interface QFunction

Returns the QValue for the given state-action pair.

public double value(State s)
Returns the value function evaluation of the given state.

Specified by:
value in interface ValueFunction

Parameters:
s - the state to evaluate

public double qForBelief(EnumerableBeliefState bs, Action ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.

Parameters:
bs - the belief state
ga - the action whose Q-value is to be computed

protected double qForBeliefList(java.util.List<EnumerableBeliefState.StateBelief> beliefs, Action ga)
Computes the expected Q-value of the underlying hidden MDP by marginalizing over the states in the belief state.

Parameters:
beliefs - the belief state distribution
ga - the action whose Q-value is to be computed

public Policy planFromState(State initialState)
Description copied from interface: Planner
This method will cause the Planner to begin planning from the specified initial State. It will then return an appropriate Policy object that captures the planning results. Note that typically you can use a variety of different Policy objects in conjunction with this Planner to get varying behavior, and the returned Policy is not required to be used.

Specified by:
planFromState in interface Planner

Parameters:
initialState - the initial state of the planning problem

Returns:
a Policy that captures the planning results from the input State.

public void resetSolver()
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.

Specified by:
resetSolver in interface MDPSolverInterface

Overrides:
resetSolver in class MDPSolver