public class PotentialShapedRF extends ShapedRewardFunction
This reward function implements potential-based reward shaping [1], which requires specifying a potential function, PotentialFunction, and the discount being used by the MDP. The additive reward is defined as:
d * p(s') - p(s)
where d is the discount factor, s' is the most recent state, s is the previous state, and p(s) is the potential of state s.
1. Ng, Andrew Y., Daishi Harada, and Stuart Russell. "Policy invariance under reward transformations: Theory and application to reward shaping." ICML. 1999.
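For concreteness, here is a minimal standalone sketch of the additive term with made-up numbers; the discount and the two potential values are assumptions chosen purely for illustration, not values taken from the library:

```java
public class ShapingTermExample {
    public static void main(String[] args) {
        double d = 0.99;      // assumed MDP discount factor
        double pS = 2.0;      // assumed potential p(s) of the previous state
        double pSprime = 5.0; // assumed potential p(s') of the most recent state

        // Additive shaping reward: d * p(s') - p(s) = 0.99 * 5.0 - 2.0, approximately 2.95
        double additive = d * pSprime - pS;
        System.out.println(additive);
    }
}
```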
| Modifier and Type | Field and Description |
|---|---|
| protected double | discount - The discount factor of the MDP (required for this shaping to preserve policy optimality). |
| protected PotentialFunction | potentialFunction - The potential function that can be used to return the potential of input states. |
Fields inherited from class ShapedRewardFunction: baseRF

| Constructor and Description |
|---|
| PotentialShapedRF(RewardFunction baseRF, PotentialFunction potentialFunction, double discount) - Initializes the shaping with the objective reward function, the potential function, and the discount of the MDP. |
| Modifier and Type | Method and Description |
|---|---|
| double | additiveReward(State s, GroundedAction a, State sprime) - Returns the reward value to add to the base objective reward function. |

Methods inherited from class ShapedRewardFunction: reward
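Putting the constructor and additiveReward summaries together, the sketch below wraps a task's objective reward function with potential-based shaping. The commented import paths, the distanceToGoal helper, the potentialValue accessor name, and the 0.99 discount are assumptions made for illustration rather than details taken from this page:

```java
// Import paths are assumptions and vary across BURLAP releases, e.g.:
// import burlap.behavior.singleagent.shaping.potential.PotentialFunction;
// import burlap.behavior.singleagent.shaping.potential.PotentialShapedRF;

public class ShapingSetupSketch {

    // Hypothetical domain-specific helper: distance from a state to the goal.
    static double distanceToGoal(State s) {
        return 0.0; // replace with a real measure for your domain
    }

    public static PotentialShapedRF buildShapedRF(RewardFunction baseRF) {
        // Potential function: states closer to the goal get higher potential.
        PotentialFunction goalPotential = new PotentialFunction() {
            @Override
            public double potentialValue(State s) { // accessor name is assumed
                return -distanceToGoal(s);
            }
        };
        // Wrap the objective reward function with the potential-based shaping term.
        return new PotentialShapedRF(baseRF, goalPotential, 0.99);
    }
}
```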
protected PotentialFunction potentialFunction
protected double discount
public PotentialShapedRF(RewardFunction baseRF, PotentialFunction potentialFunction, double discount)
Parameters:
baseRF - the objective task reward function.
potentialFunction - the potential function to use.
discount - the discount factor of the MDP.

public double additiveReward(State s, GroundedAction a, State sprime)
Specified by:
additiveReward in class ShapedRewardFunction
Parameters:
s - the previous state
a - the action taken in the previous state
sprime - the successor state
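Given the formula in the class description and the parameter documentation above, a plausible reading of additiveReward in code is shown below. The interfaces are simplified stand-ins for the documented State, GroundedAction, and PotentialFunction types (the real BURLAP definitions differ), and the potentialValue accessor name is an assumption:

```java
// Simplified stand-ins for the documented types; not the real BURLAP interfaces.
interface State { }
interface GroundedAction { }
interface PotentialFunction {
    double potentialValue(State s); // accessor name assumed for illustration
}

class PotentialShapingTerm {
    private final PotentialFunction potentialFunction;
    private final double discount;

    PotentialShapingTerm(PotentialFunction potentialFunction, double discount) {
        this.potentialFunction = potentialFunction;
        this.discount = discount;
    }

    // The reward value added to the base objective reward: d * p(s') - p(s).
    double additiveReward(State s, GroundedAction a, State sprime) {
        return discount * potentialFunction.potentialValue(sprime)
                - potentialFunction.potentialValue(s);
    }
}
```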