public class DynamicProgramming extends MDPSolver implements ValueFunction, QFunction
This class implements QFunction, which can return Q-values by using the transition dynamics and the stored value function.
Note that by default, DynamicProgramming instances cache the transition dynamics so that they do not have to be procedurally generated by the Action objects. Transition dynamics caching can be disabled by calling the toggleUseCachedTransitionDynamics(boolean) method. This may be desirable if the transition dynamics are expected to change over time, such as when the model is being learned in model-based RL.

Modifier and Type | Class and Description |
---|---|
static class | DynamicProgramming.StaticVFPlanner: This class is used to store tabular value function values that can be manipulated with the DynamicProgramming methods. |

Nested classes/interfaces inherited from interface QFunction: QFunction.QFunctionHelper

Modifier and Type | Field and Description |
---|---|
protected java.util.Map<HashableState,java.util.List<ActionTransitions>> | transitionDynamics: A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them. |
protected boolean | useCachedTransitions: A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods. |
protected java.util.Map<HashableState,java.lang.Double> | valueFunction: A map for storing the current value function estimate for each state. |
protected ValueFunctionInitialization | valueInitializer: The value function initialization to use; defaults to an initialization of 0 everywhere. |

Fields inherited from class MDPSolver: actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf

Constructor and Description |
---|
DynamicProgramming() |

Modifier and Type | Method and Description |
---|---|
protected double | computeQ(HashableState sh, GroundedAction ga): Computes the Q-value using the uncached transition dynamics produced by the Action object methods. |
protected double | computeQ(State s, ActionTransitions trans): Returns the Q-value for a given state and the possible transitions from it for a given action. |
void | DPPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory): Common init method for DynamicProgramming instances. |
protected java.util.List<ActionTransitions> | getActionsTransitions(HashableState sh): Returns the stored action transitions for the given state. |
java.util.List<State> | getAllStates(): Returns all states that are stored in this planner's value function. |
DynamicProgramming | getCopyOfValueFunction() |
protected double | getDefaultValue(State s): Returns the default V-value to use for the state. |
protected QValue | getQ(HashableState sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching): Gets a Q-value for a hashed state, grounded action, and object instance matching from the hashed states and internally stored hashed transition dynamics. |
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
ValueFunctionInitialization | getValueFunctionInitialization(): Returns the value initialization function used. |
boolean | hasComputedValueFor(State s): Returns whether a value for the given state has been computed previously. |
protected void | initializeOptionsForExpectationComputations(): Options need to have transition probabilities computed and keep track of the possible termination states using a hashed data structure. |
protected double | performBellmanUpdateOn(HashableState sh): Performs a Bellman value function update on the provided (hashed) state. |
double | performBellmanUpdateOn(State s): Performs a Bellman value function update on the provided state. |
protected double | performFixedPolicyBellmanUpdateOn(HashableState sh, Policy p): Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state. |
double | performFixedPolicyBellmanUpdateOn(State s, Policy p): Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. |
void | resetSolver(): This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP. |
void | setValueFunctionInitialization(ValueFunctionInitialization vfInit): Sets the value function initialization to use. |
void | toggleUseCachedTransitionDynamics(boolean useCachedTransitions): Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the Action objects. |
double | value(HashableState sh): Returns the value function evaluation of the given hashed state. |
double | value(State s): Returns the value function evaluation of the given state. |

Methods inherited from class MDPSolver: addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction
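The Bellman update methods summarized above combine the transition dynamics with the stored value function. The following is a minimal, self-contained sketch of that idea in plain Java; it is not BURLAP's implementation. The class BellmanBackupSketch, the Outcome type, the use of strings for states and actions, and the choice to treat states missing from the table as having value 0 are all assumptions made for illustration. A Q-value is the expected reward plus the discounted value of the next state; the Bellman update takes the maximum Q-value over actions, while the fixed-policy update weights each Q-value by the policy's probability of choosing that action.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these names are hypothetical and are not BURLAP classes.
class BellmanBackupSketch {

    // One possible outcome of an action: probability, reward, and next state.
    static final class Outcome {
        final double probability;
        final double reward;
        final String nextState;

        Outcome(double probability, double reward, String nextState) {
            this.probability = probability;
            this.reward = reward;
            this.nextState = nextState;
        }
    }

    // Q(s,a) = sum over outcomes of p * (r + gamma * V(s')).
    // Assumption: states missing from the value table have value 0.
    static double computeQ(List<Outcome> outcomes, Map<String, Double> values, double gamma) {
        double q = 0.0;
        for (Outcome o : outcomes) {
            q += o.probability * (o.reward + gamma * values.getOrDefault(o.nextState, 0.0));
        }
        return q;
    }

    // Bellman update: V(s) <- max over actions of Q(s,a). Stores and returns the new value.
    // Assumption: a state with no available actions is given value 0.
    static double bellmanUpdate(String state, Map<String, List<Outcome>> transitionsByAction,
                                Map<String, Double> values, double gamma) {
        double best = transitionsByAction.isEmpty() ? 0.0 : Double.NEGATIVE_INFINITY;
        for (List<Outcome> outcomes : transitionsByAction.values()) {
            best = Math.max(best, computeQ(outcomes, values, gamma));
        }
        values.put(state, best);
        return best;
    }

    // Fixed-policy update (policy evaluation): V(s) <- sum over actions of pi(a|s) * Q(s,a).
    // Actions missing from the policy map are treated as having probability 0.
    static double fixedPolicyUpdate(String state, Map<String, List<Outcome>> transitionsByAction,
                                    Map<String, Double> actionProbabilities,
                                    Map<String, Double> values, double gamma) {
        double expected = 0.0;
        for (Map.Entry<String, List<Outcome>> e : transitionsByAction.entrySet()) {
            double pi = actionProbabilities.getOrDefault(e.getKey(), 0.0);
            expected += pi * computeQ(e.getValue(), values, gamma);
        }
        values.put(state, expected);
        return expected;
    }
}
```

In this sketch both update methods write the new value back into the table and return it, mirroring the double return type of the update methods listed in the summary.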
protected boolean useCachedTransitions
protected java.util.Map<HashableState,java.util.List<ActionTransitions>> transitionDynamics
protected java.util.Map<HashableState,java.lang.Double> valueFunction
protected ValueFunctionInitialization valueInitializer
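The useCachedTransitions and transitionDynamics fields support the toggleable caching described in the class overview. The sketch below shows one way such a toggle could work in plain Java. It is an assumption-laden illustration, not BURLAP code: TransitionCacheSketch, its generator function (standing in for the procedural generation done by Action objects), and the method names are hypothetical, and the decision to leave existing cache entries in place when caching is switched off is a choice of this sketch only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: a hypothetical cache wrapper, not a BURLAP class.
class TransitionCacheSketch<S, T> {

    // Stands in for procedurally generating transitions from the model.
    private final Function<S, List<T>> generator;

    // Stored transitions, keyed by (hashable) state.
    private final Map<S, List<T>> cache = new HashMap<>();

    // Caching is on by default, matching the default stated in the class description.
    private boolean useCache = true;

    TransitionCacheSketch(Function<S, List<T>> generator) {
        this.generator = generator;
    }

    void toggleUseCache(boolean useCache) {
        this.useCache = useCache;
    }

    List<T> transitionsFrom(S state) {
        if (!useCache) {
            // Always regenerate, e.g. when the model may have changed since the last call.
            return generator.apply(state);
        }
        // Generate once per state, then reuse the stored list.
        return cache.computeIfAbsent(state, generator);
    }
}
```

With caching on, repeated queries for the same state call the generator once; with caching off, every query calls it again, which is the trade-off to accept when the model is still being learned.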
public void DPPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, HashableStateFactory hashingFactory)
Common init method for DynamicProgramming instances. This will automatically call the MDPSolver.solverInit(burlap.oomdp.core.Domain, burlap.oomdp.singleagent.RewardFunction, burlap.oomdp.core.TerminalFunction, double, burlap.oomdp.statehashing.HashableStateFactory) method.
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory

public void resetSolver()
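Description copied from interface: MDPSolverInterface
This method resets all solver results so that a solver can be restarted fresh as if it had never solved the MDP.
Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver

public void setValueFunctionInitialization(ValueFunctionInitialization vfInit)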
vfInit - the object that defines how to initialize the value function.

public ValueFunctionInitialization getValueFunctionInitialization()
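To illustrate how a value function initialization can interact with a stored value table, here is a small plain-Java sketch. It rests on assumptions that this page does not confirm: ValueTableSketch and its methods are hypothetical, and the fallback behavior (returning the initializer's value for a state with no stored value) is a guess at a reasonable design rather than a statement of what DynamicProgramming does. Only the zero-everywhere default comes from the field description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only: a hypothetical value table, not a BURLAP class.
class ValueTableSketch<S> {

    // Stored value estimates, keyed by state.
    private final Map<S, Double> values = new HashMap<>();

    // Default initialization: 0 everywhere.
    private ToDoubleFunction<S> initializer = s -> 0.0;

    void setInitializer(ToDoubleFunction<S> initializer) {
        this.initializer = initializer;
    }

    // True only if a value has been explicitly stored for this state.
    boolean hasComputedValueFor(S state) {
        return values.containsKey(state);
    }

    void store(S state, double value) {
        values.put(state, value);
    }

    // Assumption of this sketch: fall back to the initializer when nothing is stored.
    double value(S state) {
        Double stored = values.get(state);
        return stored != null ? stored : initializer.applyAsDouble(state);
    }
}
```

Keeping the initializer separate from the table means a different starting estimate can be swapped in without touching values that have already been computed.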
public boolean hasComputedValueFor(State s)
s - the state to check

public double value(State s)
Specified by: value in interface ValueFunction
s - the state to evaluate.

public double value(HashableState sh)
sh - the hashed state to evaluate.

public void toggleUseCachedTransitionDynamics(boolean useCachedTransitions)
Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the Action objects. Letting the transition dynamics be procedurally generated may be useful if the transition dynamics can change over time, such as when using a learned model.
useCachedTransitions - true if the transition dynamics should be cached and stored; false if they should always be procedurally generated from the Action objects.

public java.util.List<QValue> getQs(State s)
Description copied from interface: QFunction
Returns a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Description copied from interface: QFunction
Returns the QValue for the given state-action pair.

public java.util.List<State> getAllStates()

public DynamicProgramming getCopyOfValueFunction()

protected QValue getQ(HashableState sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching)
sh - the input state
a - the action to get the Q-value for
matching - the object instance matching from sh to the corresponding state stored in the value function

protected java.util.List<ActionTransitions> getActionsTransitions(HashableState sh)
sh - the input state from which to get the transitions

public double performBellmanUpdateOn(State s)
s - the state on which to perform the Bellman update.

public double performFixedPolicyBellmanUpdateOn(State s, Policy p)
s - the state on which to perform the Bellman update.
p - the policy that is being evaluated

protected double performBellmanUpdateOn(HashableState sh)
sh - the hashed state on which to perform the Bellman update.

protected double performFixedPolicyBellmanUpdateOn(HashableState sh, Policy p)
sh - the hashed state on which to perform the Bellman update.
p - the policy that is being evaluated

protected double computeQ(State s, ActionTransitions trans)
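Returns the Q-value for a given state and the possible transitions from it for a given action. [...] Option objects.
s - the given state
trans - the given action transitions

protected double computeQ(HashableState sh, GroundedAction ga)
Computes the Q-value using the uncached transition dynamics produced by the Action object methods. [...] Option objects.
sh - the given state
ga - the given action

protected double getDefaultValue(State s)
s - the input state to get the default V-value for

protected void initializeOptionsForExpectationComputations()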