public class RewardValueProjection extends java.lang.Object implements QProvider

A QProvider/ValueFunction wrapper that provides the immediate reward signals for a source RewardFunction.
It is useful for analyzing a reward function learned through IRL, for example, by passing the learned reward function to a ValueFunctionVisualizerGUI to visualize what was learned. This class returns values based on one of four possible reward projection types (RewardValueProjection.RewardProjectionType):

SOURCESTATE: when the reward function depends only on the source state
DESTINATIONSTATE: when the reward function depends only on the destination state (the state to which the agent transitions)
STATEACTION: when the reward function depends only on the state-action pair
ONESTEP: when the reward function depends on a transition of some sort (e.g., from a source state to a target state)

The default assumption is DESTINATIONSTATE.
When the value(State) of a state is queried, it returns the value of the RewardFunction using the most minimal information. For example, if the projection type is DESTINATIONSTATE, then the value returned is rf.reward(null, null, s), where rf is the input RewardFunction and s is the input State to the value(State) method. If it is SOURCESTATE, then it returns rf.reward(s, null, null). If it is STATEACTION or ONESTEP, then the Domain must have been provided via the RewardValueProjection(RewardFunction, RewardProjectionType, SADomain) constructor so that the actions can be enumerated (and, in the case of ONESTEP, the transitions enumerated) and the maximum reward taken. Similarly, the qValue(State, Action) and qValues(State) methods may need the Domain provided to properly answer the query.
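To make the projection dispatch concrete, here is a minimal, self-contained Java sketch of the SOURCESTATE and DESTINATIONSTATE cases. It is not the BURLAP implementation: the class name, its String-based states, and the inlined RewardFunction interface are illustrative stand-ins, and the domain-dependent STATEACTION/ONESTEP cases are stubbed out.

```java
import java.util.Objects;

public class RewardProjectionSketch {

    /** Mirrors RewardValueProjection.RewardProjectionType. */
    enum ProjectionType { SOURCESTATE, DESTINATIONSTATE, STATEACTION, ONESTEP }

    /** Stand-in for BURLAP's RewardFunction; states and actions are plain strings here. */
    interface RewardFunction {
        double reward(String s, String a, String sPrime);
    }

    private final RewardFunction rf;
    private final ProjectionType type;

    RewardProjectionSketch(RewardFunction rf, ProjectionType type) {
        this.rf = Objects.requireNonNull(rf);
        this.type = type;
    }

    /** Evaluates the reward function with the most minimal information available. */
    double value(String s) {
        switch (this.type) {
            case SOURCESTATE:
                // reward depends only on the source state
                return this.rf.reward(s, null, null);
            case DESTINATIONSTATE:
                // reward depends only on the destination state
                return this.rf.reward(null, null, s);
            default:
                // STATEACTION and ONESTEP need a domain to enumerate
                // actions (and transitions) and take the max reward.
                throw new UnsupportedOperationException("requires a domain");
        }
    }

    public static void main(String[] args) {
        // Reward of 1 for arriving in the "goal" state, 0 otherwise.
        RewardFunction rf = (s, a, sPrime) -> "goal".equals(sPrime) ? 1.0 : 0.0;
        RewardProjectionSketch proj =
                new RewardProjectionSketch(rf, ProjectionType.DESTINATIONSTATE);
        System.out.println(proj.value("goal")); // prints 1.0
        System.out.println(proj.value("pit"));  // prints 0.0
    }
}
```

Under a DESTINATIONSTATE projection, querying a state's value amounts to asking "what reward would I receive for arriving here?", which is exactly what makes the wrapper useful for visualizing an IRL-learned reward function as if it were a value function.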
Modifier and Type | Class and Description
---|---
static class | RewardValueProjection.CustomRewardNoTermModel
static class | RewardValueProjection.RewardProjectionType

Nested classes/interfaces inherited from interface QProvider: QProvider.Helper
Modifier and Type | Field and Description
---|---
protected SADomain | domain
protected SparseSampling | oneStepBellmanPlanner
protected RewardValueProjection.RewardProjectionType | projectionType
protected RewardFunction | rf
Constructor and Description
---
RewardValueProjection(RewardFunction rf): Initializes for the given RewardFunction assuming that it depends only on the destination state.
RewardValueProjection(RewardFunction rf, RewardValueProjection.RewardProjectionType projectionType): Initializes.
RewardValueProjection(RewardFunction rf, RewardValueProjection.RewardProjectionType projectionType, SADomain domain): Initializes.
Modifier and Type | Method and Description
---|---
double | qValue(State s, Action a): Returns the QValue for the given state-action pair.
java.util.List<QValue> | qValues(State s): Returns a List of QValue objects for every permissible action for the given input state.
double | value(State s): Returns the value function evaluation of the given state.
protected RewardFunction rf
protected RewardValueProjection.RewardProjectionType projectionType
protected SparseSampling oneStepBellmanPlanner
protected SADomain domain
public RewardValueProjection(RewardFunction rf)
Initializes for the given RewardFunction assuming that it depends only on the destination state.
Parameters:
rf - the input RewardFunction to project for one step.

public RewardValueProjection(RewardFunction rf, RewardValueProjection.RewardProjectionType projectionType)
Initializes. If the projection type requires a Domain to enumerate the actions and transition dynamics, use the RewardValueProjection(RewardFunction, RewardProjectionType, SADomain) constructor instead.
Parameters:
rf - the input RewardFunction to project for one step.
projectionType - the type of reward projection to use.

public RewardValueProjection(RewardFunction rf, RewardValueProjection.RewardProjectionType projectionType, SADomain domain)
Initializes.
Parameters:
rf - the input RewardFunction to project for one step.
projectionType - the type of reward projection to use.
domain - the Domain in which the RewardFunction is evaluated.

public java.util.List<QValue> qValues(State s)
Specified by: qValues in interface QProvider
Returns a List of QValue objects for every permissible action for the given input state.

public double qValue(State s, Action a)
Specified by: qValue in interface QFunction
Returns the QValue for the given state-action pair.

public double value(State s)
Specified by: value in interface ValueFunction
Returns the value function evaluation of the given state.
Parameters:
s - the state to evaluate.
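The STATEACTION case described above (enumerate the actions and take the maximum reward) can be sketched as follows. Again, this is an illustrative stand-in rather than BURLAP code: actions are plain strings supplied as a list in place of a real SADomain, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class StateActionProjectionSketch {

    /** Stand-in for BURLAP's RewardFunction; states and actions are strings here. */
    interface RewardFunction {
        double reward(String s, String a, String sPrime);
    }

    private final RewardFunction rf;
    private final List<String> actions; // stand-in for the domain's enumerated actions

    StateActionProjectionSketch(RewardFunction rf, List<String> actions) {
        this.rf = rf;
        this.actions = actions;
    }

    /** STATEACTION projection: the Q-value is just the immediate reward r(s, a). */
    double qValue(String s, String a) {
        return this.rf.reward(s, a, null);
    }

    /** value(s) takes the maximum immediate reward over the enumerated actions. */
    double value(String s) {
        return this.actions.stream()
                .mapToDouble(a -> qValue(s, a))
                .max()
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        // Reward 2 for "jump", 1 for any other action, regardless of state.
        RewardFunction rf = (s, a, sPrime) -> "jump".equals(a) ? 2.0 : 1.0;
        StateActionProjectionSketch proj =
                new StateActionProjectionSketch(rf, Arrays.asList("walk", "jump"));
        System.out.println(proj.qValue("s0", "walk")); // prints 1.0
        System.out.println(proj.value("s0"));          // prints 2.0
    }
}
```

This is why the two- and three-argument constructors differ: without a domain there is nothing to iterate over in `value`, which is the situation the documentation warns about for STATEACTION and ONESTEP projections.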