public class LinearStateActionDifferentiableRF extends DifferentiableRF
The class takes as input a StateToFeatureVectorGenerator and the set of possible grounded actions that can be applied in the world. The dimensionality of this reward function is |A|*|f|, where A is the set of possible grounded actions and |f| is the state feature vector dimensionality.

The reward function is defined as R(s, a, s') = w(a) * f(s), where w(a) is the set of weights (the parameters) of this reward function associated with action a, * is the dot product operator, and f(s) is the feature vector for state s.

Note that the gradient is a vector of size |A||f|; since the feature vector is replicated for each action, the gradient entries associated with any action other than the one taken in the (s, a, s') query are zero.
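To make the |A|*|f| parameter layout concrete, here is a minimal standalone sketch (hypothetical names, not BURLAP source) of how a reward and its sparse gradient can be computed from a flat parameter vector in which the weights w(a) for action a occupy one contiguous block:

```java
// Standalone sketch of the |A|*|f| parameter layout; not actual BURLAP code.
public class LinearStateActionRFSketch {

    // Hypothetical layout assumption: w(a) occupies indices
    // [actionIndex*|f|, (actionIndex+1)*|f|) of the flat parameter vector.
    static double reward(double[] parameters, int actionIndex, double[] stateFeatures) {
        int offset = actionIndex * stateFeatures.length;
        double sum = 0.0;
        for (int i = 0; i < stateFeatures.length; i++) {
            sum += parameters[offset + i] * stateFeatures[i]; // w(a) . f(s)
        }
        return sum;
    }

    // Gradient w.r.t. all parameters: f(s) in the taken action's block, zero elsewhere.
    static double[] gradient(int numActions, int actionIndex, double[] stateFeatures) {
        double[] grad = new double[numActions * stateFeatures.length];
        int offset = actionIndex * stateFeatures.length;
        for (int i = 0; i < stateFeatures.length; i++) {
            grad[offset + i] = stateFeatures[i];
        }
        return grad;
    }
}
```

The gradient's sparsity follows directly from the layout: only the block belonging to the executed action depends on the parameters being differentiated.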
The set of possible grounded actions must be defined either in the LinearStateActionDifferentiableRF(burlap.behavior.singleagent.vfa.StateToFeatureVectorGenerator, int, burlap.oomdp.singleagent.GroundedAction...) constructor, or added iteratively with the addAction(burlap.oomdp.singleagent.GroundedAction) method.

| Modifier and Type | Field and Description |
|---|---|
| protected java.util.Map<GroundedAction,java.lang.Integer> | actionMap: an ordering of grounded actions |
| protected StateToFeatureVectorGenerator | fvGen: the state feature vector generator to use |
| protected int | numStateFeatures: the number of state features |

Fields inherited from class DifferentiableRF: dim, parameters
| Constructor and Description |
|---|
| LinearStateActionDifferentiableRF(StateToFeatureVectorGenerator stateFeatures, int numStateFeatures, GroundedAction... allPossibleActions): Initializes. |
| Modifier and Type | Method and Description |
|---|---|
| void | addAction(GroundedAction ga): Adds a possible grounded action. |
| protected DifferentiableRF | copyHelper(): A helper method for making a copy of this reward function. |
| protected void | copyInto(double[] source, double[] target, int index): Copies the values of source into target, starting at target index position index. |
| double[] | getGradient(State s, GroundedAction ga, State sp): Returns the gradient of the reward function for the given state transition. |
| double | reward(State s, GroundedAction a, State sprime): Returns the reward received when action a is executed in state s and the agent transitions to state sprime. |

Methods inherited from class DifferentiableRF: copy, getParameterDimension, getParameters, randomizeParameters, setParameter, setParameters, toString
protected java.util.Map<GroundedAction,java.lang.Integer> actionMap
protected StateToFeatureVectorGenerator fvGen
protected int numStateFeatures
public LinearStateActionDifferentiableRF(StateToFeatureVectorGenerator stateFeatures, int numStateFeatures, GroundedAction... allPossibleActions)

Initializes. Possible grounded actions may alternatively be added with the addAction(burlap.oomdp.singleagent.GroundedAction) method.

Parameters:
stateFeatures - the state feature vector generator
numStateFeatures - the dimensionality of the state feature vector
allPossibleActions - the set of possible grounded actions

public void addAction(GroundedAction ga)

Adds a possible grounded action.

Parameters:
ga - the possible grounded action to add to this reward function's definition

protected DifferentiableRF copyHelper()

A helper method for making a copy of this DifferentiableRF, used by the DifferentiableRF.copy() method.

Specified by: copyHelper in class DifferentiableRF
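The copyHelper/copy split is a template-method pattern: the base class's copy() obtains a fresh instance from the subclass's copyHelper() and then duplicates the parameter values itself. A rough standalone sketch of that contract (hypothetical names and fields; not BURLAP source):

```java
// Standalone sketch of the copyHelper/copy contract; hypothetical, not BURLAP source.
abstract class DifferentiableRFSketch {
    double[] parameters;

    // Subclasses return a new instance with everything but the parameters set up.
    protected abstract DifferentiableRFSketch copyHelper();

    // The base class finishes the copy by duplicating the parameter values.
    public DifferentiableRFSketch copy() {
        DifferentiableRFSketch c = this.copyHelper();
        c.parameters = this.parameters.clone();
        return c;
    }
}

class LinearRFSketch extends DifferentiableRFSketch {
    final int numParams;

    LinearRFSketch(int numParams) {
        this.numParams = numParams;
        this.parameters = new double[numParams];
    }

    @Override
    protected DifferentiableRFSketch copyHelper() {
        // Only structural setup here; copy() handles the parameter values.
        return new LinearRFSketch(this.numParams);
    }
}
```

This design lets each subclass worry only about reproducing its structure (feature generator, action ordering) while parameter duplication lives in one place.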
public double reward(State s, GroundedAction a, State sprime)

Returns the reward received when action a is executed in state s and the agent transitions to state sprime.

Specified by: reward in interface RewardFunction

Parameters:
s - the state in which the action was executed
a - the action executed
sprime - the state to which the agent transitioned

public double[] getGradient(State s, GroundedAction ga, State sp)

Returns the gradient of the reward function for the given state transition.

Specified by: getGradient in class DifferentiableRF

Parameters:
s - the source state
ga - the action taken in the source state
sp - the resulting state from the action

protected void copyInto(double[] source, double[] target, int index)

Copies the values of source into target, starting at target index position index.

Parameters:
source - the source values
target - the target array to receive the source values
index - the starting index in the target array into which the source values will be copied
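copyInto behaves like a bounded array copy into an offset, i.e. the same effect as System.arraycopy(source, 0, target, index, source.length). A sketch of the equivalent logic (hypothetical standalone class, not BURLAP source):

```java
// Sketch of copyInto's semantics: write all of source into target starting at index.
public class CopyIntoSketch {
    static void copyInto(double[] source, double[] target, int index) {
        for (int i = 0; i < source.length; i++) {
            target[index + i] = source[i];
        }
    }
}
```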