public class PotentialShapedRF extends ShapedRewardFunction
This class implements potential-based reward shaping [1], which preserves the optimal policy of the task. The shaping is defined with respect to a PotentialFunction and the discount being used by the MDP. The additive reward is defined as:

d * p(s') - p(s)

where d is the discount factor, s' is the most recent state, s is the previous state, and p(s) is the potential of state s.
1. Ng, Andrew Y., Daishi Harada, and Stuart Russell. "Policy invariance under reward transformations: Theory and application to reward shaping." ICML. 1999.
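To make the shaping term concrete, the following self-contained sketch computes d * p(s') - p(s) and combines it with an objective reward. The Potential and Reward interfaces below are simplified stand-ins for the PotentialFunction and RewardFunction types documented on this page, and the goal-at-state-10 example is purely illustrative.

```java
// Self-contained sketch of the shaping term d * p(s') - p(s) described above.
// Potential and Reward are simplified stand-ins, not the library's own interfaces.
public class PotentialShapingSketch {

    /** Stand-in for a potential function p(s), here over integer state ids. */
    interface Potential { double value(int state); }

    /** Stand-in for the objective (base) reward function R(s, a, s'). */
    interface Reward { double reward(int s, int a, int sprime); }

    /** The additive shaping term: d * p(s') - p(s). */
    static double additiveReward(Potential p, double discount, int s, int sprime) {
        return discount * p.value(sprime) - p.value(s);
    }

    /** The full shaped reward: objective reward plus the shaping term. */
    static double shapedReward(Reward base, Potential p, double discount,
                               int s, int a, int sprime) {
        return base.reward(s, a, sprime) + additiveReward(p, discount, s, sprime);
    }

    public static void main(String[] args) {
        // Illustrative potential: closer to a goal at state 10 means higher potential.
        Potential p = state -> -Math.abs(10 - state);
        // Sparse objective reward: +1 only when the goal state is reached.
        Reward base = (s, a, sprime) -> sprime == 10 ? 1.0 : 0.0;
        double discount = 0.99;

        // Moving from state 7 to state 8 earns a positive shaping bonus
        // (0.99 * -2 - (-3) = 1.02) even though the objective reward is still 0.
        System.out.println(shapedReward(base, p, discount, 7, 0, 8));
    }
}
```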
Field Summary

Modifier and Type | Field and Description
---|---
protected double | discount: The discount factor of the MDP (required for shaping to preserve policy optimality).
protected PotentialFunction | potentialFunction: The potential function that can be used to return the potential reward from input states.
Fields inherited from class ShapedRewardFunction: baseRF
Constructor Summary

Constructor and Description
---
PotentialShapedRF(RewardFunction baseRF, PotentialFunction potentialFunction, double discount): Initializes the shaping with the objective reward function, the potential function, and the discount of the MDP.
Method Summary

Modifier and Type | Method and Description
---|---
double | additiveReward(State s, Action a, State sprime): Returns the reward value to add to the base objective reward function.
Methods inherited from class ShapedRewardFunction: reward
Field Detail

protected PotentialFunction potentialFunction

protected double discount
Constructor Detail

public PotentialShapedRF(RewardFunction baseRF, PotentialFunction potentialFunction, double discount)
Parameters:
baseRF - the objective task reward function.
potentialFunction - the potential function to use.
discount - the discount factor of the MDP.
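As a usage illustration of this constructor, the hedged snippet below wires an objective reward function and a potential function into a PotentialShapedRF. The myObjectiveRF and myPotential variables, as well as s, a, and sprime, are hypothetical placeholders for user-supplied objects, and the call to the inherited reward(...) method assumes the (State, Action, State) signature shown elsewhere on this page.

```java
// Hypothetical construction sketch; myObjectiveRF and myPotential are placeholders
// for a user-supplied objective RewardFunction and PotentialFunction.
RewardFunction objectiveRF = myObjectiveRF;  // the sparse task reward
PotentialFunction potential = myPotential;   // e.g. a heuristic estimate of state value
double discount = 0.99;                      // must match the MDP's discount factor

PotentialShapedRF shapedRF = new PotentialShapedRF(objectiveRF, potential, discount);

// The inherited reward(s, a, sprime) then returns the objective reward plus
// the shaping term d * p(sprime) - p(s) computed by additiveReward.
double r = shapedRF.reward(s, a, sprime);
```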
Method Detail

public double additiveReward(State s, Action a, State sprime)

Description copied from class: ShapedRewardFunction
Returns the reward value to add to the base objective reward function.

Specified by:
additiveReward in class ShapedRewardFunction
Parameters:
s - the previous state
a - the action taken in the previous state
sprime - the successor state