public abstract class ValueFunctionPlanner extends OOMDPPlanner implements ValueFunction, QComputablePlanner
Action
. Transition dynamics caching can be disabled by calling the toggleUseCachedTransitionDynamics(boolean)
method. This may be desirable if the transition dynamics are expected to change over time, such as when the model is being learned in model-based RL.

Modifier and Type | Class and Description |
---|---|
static class | ValueFunctionPlanner.StaticVFPlanner: This class is used to store tabular value function values that can be manipulated with the ValueFunctionPlanner methods. |
QComputablePlanner.QComputablePlannerHelper
Modifier and Type | Field and Description |
---|---|
protected java.util.Map<StateHashTuple,java.util.List<ActionTransitions>> | transitionDynamics: A data structure for storing the hashed transition dynamics from each state, if this algorithm is set to use them. |
protected boolean | useCachedTransitions: A boolean toggle to indicate whether the transition dynamics should be cached in a hashed data structure for quicker access, or computed as needed by the Action methods. |
protected java.util.Map<StateHashTuple,java.lang.Double> | valueFunction: A map for storing the current value function estimate for each state. |
protected ValueFunctionInitialization | valueInitializer: The value function initialization to use; defaults to an initialization of 0 everywhere. |
actions, containsParameterizedActions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf
Constructor and Description |
---|
ValueFunctionPlanner() |
Modifier and Type | Method and Description |
---|---|
protected double | computeQ(State s, ActionTransitions trans): Returns the Q-value for a given state and the possible transitions from it for a given action. |
protected double | computeQ(StateHashTuple sh, GroundedAction ga): Computes the Q-value using the uncached transition dynamics produced by the Action object methods. |
protected java.util.List<ActionTransitions> | getActionsTransitions(StateHashTuple sh): Returns the stored action transitions for the given state. |
java.util.List<State> | getAllStates(): This method will return all states that are stored in this planner's value function. |
ValueFunctionPlanner.StaticVFPlanner | getCopyOfValueFunction() |
protected double | getDefaultValue(State s): Returns the default V-value to use for the state. |
QValue | getQ(State s, AbstractGroundedAction a): Returns the QValue for the given state-action pair. |
protected QValue | getQ(StateHashTuple sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching): Gets a Q-value for a hashed state, grounded action, and object instance matching from the hashed states and internally stored hashed transition dynamics. |
java.util.List<QValue> | getQs(State s): Returns a List of QValue objects for every permissible action for the given input state. |
ValueFunctionInitialization | getValueFunctionInitialization(): Returns the value initialization function used. |
boolean | hasComputedValueFor(State s): Returns whether a value for the given state has been computed previously. |
protected void | initializeOptionsForExpectationComputations(): Options need to have transition probabilities computed and keep track of the possible termination states using a hashed data structure. |
double | performBellmanUpdateOn(State s): Performs a Bellman value function update on the provided state. |
protected double | performBellmanUpdateOn(StateHashTuple sh): Performs a Bellman value function update on the provided (hashed) state. |
protected double | performFixedPolicyBellmanUpdateOn(StateHashTuple sh, Policy p): Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state. |
double | performFixedPolicyBellmanUpdateOn(State s, Policy p): Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state. |
abstract void | planFromState(State initialState): This method will cause the planner to begin planning from the specified initial state. |
void | resetPlannerResults(): Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. |
void | setValueFunctionInitialization(ValueFunctionInitialization vfInit): Sets the value function initialization to use. |
void | toggleUseCachedTransitionDynamics(boolean useCachedTransitions): Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the Action objects. |
double | value(State s): Returns the value function evaluation of the given state. |
double | value(StateHashTuple sh): Returns the value function evaluation of the given hashed state. |
void | VFPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory): Common init method for ValueFunction Planners. |
addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, plannerInit, setActions, setDebugCode, setDomain, setGamma, setRf, setTf, stateHash, toggleDebugPrinting, translateAction
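
To make the computeQ and performBellmanUpdateOn summaries concrete, here is a minimal stand-alone sketch of a tabular Bellman update. It is not this class's implementation: BellmanUpdateSketch, Transition, and the use of plain String keys in place of StateHashTuple and GroundedAction are all assumptions made for illustration, and unseen states are assumed to default to a value of 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these types belong to the documented API. */
final class BellmanUpdateSketch {

    /** One possible outcome of an action: next state, its probability, and the reward. */
    static final class Transition {
        final String nextState;
        final double probability;
        final double reward;

        Transition(String nextState, double probability, double reward) {
            this.nextState = nextState;
            this.probability = probability;
            this.reward = reward;
        }
    }

    /** Q(s,a) = sum over outcomes of p * (r + gamma * V(s')); unseen states count as 0. */
    static double computeQ(List<Transition> outcomes, Map<String, Double> values, double gamma) {
        double q = 0.0;
        for (Transition t : outcomes) {
            q += t.probability * (t.reward + gamma * values.getOrDefault(t.nextState, 0.0));
        }
        return q;
    }

    /**
     * Sets V(state) to the largest Q-value over the given actions and returns it.
     * Assumes the action map is non-empty.
     */
    static double bellmanUpdate(String state, Map<String, List<Transition>> actions,
                                Map<String, Double> values, double gamma) {
        double best = Double.NEGATIVE_INFINITY;
        for (List<Transition> outcomes : actions.values()) {
            best = Math.max(best, computeQ(outcomes, values, gamma));
        }
        values.put(state, best);
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> values = new HashMap<>();
        values.put("goal", 10.0);

        Map<String, List<Transition>> actions = new HashMap<>();
        actions.put("safe", Arrays.asList(new Transition("goal", 1.0, -1.0)));
        actions.put("risky", Arrays.asList(
                new Transition("goal", 0.5, 0.0),
                new Transition("pit", 0.5, -10.0)));

        System.out.println("V(start) = " + bellmanUpdate("start", actions, values, 0.9));
    }
}
```

In this toy problem the "safe" action has the higher expected return, so the update stores its Q-value as the new estimate for "start".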
protected boolean useCachedTransitions
protected java.util.Map<StateHashTuple,java.util.List<ActionTransitions>> transitionDynamics
protected java.util.Map<StateHashTuple,java.lang.Double> valueFunction
protected ValueFunctionInitialization valueInitializer
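
The relationship between useCachedTransitions and transitionDynamics can be pictured as a cache-or-compute switch. The following is a minimal stand-alone sketch under stated assumptions, not this class's implementation: TransitionCacheSketch, toggleUseCache, transitionsFor, and the generatorCalls counter are hypothetical names, and a generic Function stands in for the Action methods that generate transitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of a cache-or-compute toggle; not part of the documented API. */
final class TransitionCacheSketch<S, T> {

    private final Function<S, T> generator;          // stands in for generating transitions on demand
    private final Map<S, T> cache = new HashMap<>(); // stands in for the hashed transition store
    private boolean useCache = true;

    /** Counts generator invocations so the effect of the toggle is observable. */
    int generatorCalls = 0;

    TransitionCacheSketch(Function<S, T> generator) {
        this.generator = generator;
    }

    /** true: reuse stored transitions; false: regenerate them on every request. */
    void toggleUseCache(boolean useCache) {
        this.useCache = useCache;
    }

    T transitionsFor(S state) {
        if (!useCache) {
            return generate(state);
        }
        T cached = cache.get(state);
        if (cached == null) {
            cached = generate(state);
            cache.put(state, cached);
        }
        return cached;
    }

    private T generate(S state) {
        generatorCalls++;
        return generator.apply(state);
    }

    public static void main(String[] args) {
        TransitionCacheSketch<String, String> sketch =
                new TransitionCacheSketch<>(s -> "transitions from " + s);
        sketch.transitionsFor("s0");
        sketch.transitionsFor("s0");
        System.out.println("calls with caching: " + sketch.generatorCalls);
        sketch.toggleUseCache(false);
        sketch.transitionsFor("s0");
        System.out.println("calls after disabling caching: " + sketch.generatorCalls);
    }
}
```

One design choice this sketch leaves open is what to do with entries already stored when caching is switched off and later back on; if the underlying model has changed in the meantime, those entries may be stale.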
public abstract void planFromState(State initialState)
This method will cause the planner to begin planning from the specified initial state.
planFromState in class OOMDPPlanner
initialState - the initial state of the planning problem

public void VFPInit(Domain domain, RewardFunction rf, TerminalFunction tf, double gamma, StateHashFactory hashingFactory)
Common init method for ValueFunction Planners.
domain - the domain in which to plan
rf - the reward function
tf - the terminal state function
gamma - the discount factor
hashingFactory - the state hashing factory

public void resetPlannerResults()
Use this method to reset all planner results so that planning can be started fresh with a call to OOMDPPlanner.planFromState(State) as if no planning had ever been performed before. Specifically, data produced from calls to OOMDPPlanner.planFromState(State) will be cleared, but all other planner settings should remain the same. This is useful if the reward function or transition dynamics have changed, thereby requiring new results to be computed. If there were other objects this planner was provided that may have changed and need to be reset, you will need to reset them yourself. For instance, if you told a planner to follow a policy that had a temperature parameter decrease with time, you will need to reset the policy's temperature yourself.
resetPlannerResults in class OOMDPPlanner

public void setValueFunctionInitialization(ValueFunctionInitialization vfInit)
Sets the value function initialization to use.
vfInit - the object that defines how to initialize the value function.

public ValueFunctionInitialization getValueFunctionInitialization()
Returns the value initialization function used.
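
As a rough picture of how a value function initialization could interact with stored estimates, the sketch below returns the stored value when one exists and otherwise falls back to an initializer that defaults to 0 everywhere. This is an assumption-laden illustration rather than the documented behavior: ValueLookupSketch, setInitializer, and store are hypothetical names, and a ToDoubleFunction over String states stands in for ValueFunctionInitialization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical illustration of value lookup with an initializer fallback. */
final class ValueLookupSketch {

    private final Map<String, Double> valueFunction = new HashMap<>();

    // Assumed default: every state starts at 0 until a value is stored for it.
    private ToDoubleFunction<String> initializer = s -> 0.0;

    void setInitializer(ToDoubleFunction<String> initializer) {
        this.initializer = initializer;
    }

    /** Records a computed value for the state, e.g. after an update. */
    void store(String state, double value) {
        valueFunction.put(state, value);
    }

    boolean hasComputedValueFor(String state) {
        return valueFunction.containsKey(state);
    }

    /** Stored estimate if present, otherwise whatever the initializer supplies. */
    double value(String state) {
        Double stored = valueFunction.get(state);
        return stored != null ? stored : initializer.applyAsDouble(state);
    }

    public static void main(String[] args) {
        ValueLookupSketch vf = new ValueLookupSketch();
        System.out.println("default: " + vf.value("s0"));
        vf.setInitializer(s -> -1.0);
        System.out.println("pessimistic initializer: " + vf.value("s0"));
        vf.store("s0", 4.5);
        System.out.println("after storing: " + vf.value("s0"));
    }
}
```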
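
public boolean hasComputedValueFor(State s)
Returns whether a value for the given state has been computed previously.
s - the state to check

public double value(State s)
Returns the value function evaluation of the given state.
value in interface ValueFunction
s - the state to evaluate.

public double value(StateHashTuple sh)
Returns the value function evaluation of the given hashed state.
sh - the hashed state to evaluate.

public void toggleUseCachedTransitionDynamics(boolean useCachedTransitions)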
Sets whether this object should cache hashed transition dynamics for each state for faster lookup, or whether to procedurally generate the transition dynamics as needed from the Action objects. Letting the transition dynamics be procedurally generated may be useful if the transition dynamics can change over time, such as when using a learned model.
useCachedTransitions - true if the transition dynamics should be cached and stored; false if they should always be procedurally generated from the Action objects.

public java.util.List<QValue> getQs(State s)
Returns a List of QValue objects for every permissible action for the given input state.
getQs in interface QComputablePlanner
s - the state for which Q-values are to be returned.
Returns: a List of QValue objects for every permissible action for the given input state.

public QValue getQ(State s, AbstractGroundedAction a)
Returns the QValue for the given state-action pair.
getQ in interface QComputablePlanner
s - the input state
a - the input action
Returns: the QValue for the given state-action pair.

public java.util.List<State> getAllStates()
This method will return all states that are stored in this planner's value function.

public ValueFunctionPlanner.StaticVFPlanner getCopyOfValueFunction()

protected QValue getQ(StateHashTuple sh, GroundedAction a, java.util.Map<java.lang.String,java.lang.String> matching)
Gets a Q-value for a hashed state, grounded action, and object instance matching from the hashed states and internally stored hashed transition dynamics.
sh - the input state
a - the action to get the Q-value for
matching - the object instance matching from sh to the corresponding state stored in the value function

protected java.util.List<ActionTransitions> getActionsTransitions(StateHashTuple sh)
Returns the stored action transitions for the given state.
sh - the input state from which to get the transitions

public double performBellmanUpdateOn(State s)
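Performs a Bellman value function update on the provided state.
s - the state on which to perform the Bellman update.

public double performFixedPolicyBellmanUpdateOn(State s, Policy p)
Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided state.
s - the state on which to perform the Bellman update.
p - the policy that is being evaluated

protected double performBellmanUpdateOn(StateHashTuple sh)
Performs a Bellman value function update on the provided (hashed) state.
sh - the hashed state on which to perform the Bellman update.

protected double performFixedPolicyBellmanUpdateOn(StateHashTuple sh, Policy p)
Performs a fixed-policy Bellman value function update (i.e., policy evaluation) on the provided (hashed) state.
sh - the hashed state on which to perform the Bellman update.
p - the policy that is being evaluated

protected double computeQ(State s, ActionTransitions trans)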
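Returns the Q-value for a given state and the possible transitions from it for a given action. ... Option objects.
s - the given state
trans - the given action transitions

protected double computeQ(StateHashTuple sh, GroundedAction ga)
Computes the Q-value using the uncached transition dynamics produced by the Action object methods. ... Option objects.
sh - the given state
ga - the given action

protected double getDefaultValue(State s)
Returns the default V-value to use for the state.
s - the input state to get the default V-value for

protected void initializeOptionsForExpectationComputations()
Options need to have transition probabilities computed and keep track of the possible termination states using a hashed data structure.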
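
The fixed-policy update documented for performFixedPolicyBellmanUpdateOn (policy evaluation) differs from the standard Bellman update in that it averages Q-values under the policy's action probabilities instead of maximizing over actions. A minimal stand-alone sketch, assuming a stochastic policy given as a map from action name to probability; PolicyEvaluationSketch, Outcome, and fixedPolicyUpdate are hypothetical names, not part of this API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a single fixed-policy (policy evaluation) update. */
final class PolicyEvaluationSketch {

    /** One possible outcome of an action: next state, its probability, and the reward. */
    static final class Outcome {
        final String nextState;
        final double probability;
        final double reward;

        Outcome(String nextState, double probability, double reward) {
            this.nextState = nextState;
            this.probability = probability;
            this.reward = reward;
        }
    }

    /**
     * V(s) = sum over actions of pi(a|s) * sum over outcomes of p * (r + gamma * V(s')).
     * Assumes every action named in the policy has an entry in dynamics;
     * states without a stored value count as 0.
     */
    static double fixedPolicyUpdate(String state, Map<String, Double> policy,
                                    Map<String, List<Outcome>> dynamics,
                                    Map<String, Double> values, double gamma) {
        double v = 0.0;
        for (Map.Entry<String, Double> choice : policy.entrySet()) {
            double q = 0.0;
            for (Outcome o : dynamics.get(choice.getKey())) {
                q += o.probability * (o.reward + gamma * values.getOrDefault(o.nextState, 0.0));
            }
            v += choice.getValue() * q;
        }
        values.put(state, v);
        return v;
    }

    public static void main(String[] args) {
        Map<String, Double> values = new HashMap<>();
        values.put("goal", 10.0);

        Map<String, List<Outcome>> dynamics = new HashMap<>();
        dynamics.put("safe", Arrays.asList(new Outcome("goal", 1.0, -1.0)));
        dynamics.put("risky", Arrays.asList(
                new Outcome("goal", 0.5, 0.0),
                new Outcome("pit", 0.5, -10.0)));

        Map<String, Double> policy = new HashMap<>();
        policy.put("safe", 0.5);
        policy.put("risky", 0.5);

        System.out.println("V(start) under policy = "
                + fixedPolicyUpdate("start", policy, dynamics, values, 0.9));
    }
}
```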