public class EnvironmentOptionOutcome extends EnvironmentOutcome
An EnvironmentOutcome class for reporting the effects of applying an Option in a given Environment. This class extends the standard EnvironmentOutcome to include the discount to apply to the value of time steps following the application of an Option, and the number of steps taken in the Environment. That discount is gamma^t, where gamma is the MDP discount factor and t is the number of time steps taken by the option. Likewise, the saved reward value (EnvironmentOutcome.r) for this object is the cumulative discounted reward accumulated over the option's execution, not a single-step reward.
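To make the two quantities concrete, here is a minimal Java sketch that assumes the option's per-step rewards are available as a list. The class OptionOutcomeSketch and its methods are hypothetical illustrations, not part of this API. They compute the cumulative discounted reward (what r holds for this object) and the discount gamma^t.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of the documented API) that summarizes an
 * option's execution the way this class describes it: the cumulative
 * discounted reward r and the discount gamma^t, where t is the number of
 * primitive steps the option took.
 */
public final class OptionOutcomeSketch {

    /** Cumulative discounted reward: sum over i of gamma^i * rewards[i]. */
    public static double cumulativeDiscountedReward(List<Double> rewards, double gamma) {
        double total = 0.0;
        double weight = 1.0; // gamma^i for the current step i
        for (double reward : rewards) {
            total += weight * reward;
            weight *= gamma;
        }
        return total;
    }

    /** Discount to apply after the option: gamma^t with t = number of steps. */
    public static double discountAfterOption(int numSteps, double gamma) {
        return Math.pow(gamma, numSteps);
    }

    public static void main(String[] args) {
        List<Double> rewards = List.of(1.0, 0.0, 2.0); // a three-step option
        double gamma = 0.9;
        // approx. 1 + 0.9 * 0 + 0.81 * 2 = 2.62
        System.out.println(cumulativeDiscountedReward(rewards, gamma));
        // approx. 0.9^3 = 0.729
        System.out.println(discountAfterOption(rewards.size(), gamma));
    }
}
```

For a three-step option with rewards 1, 0, 2 and gamma = 0.9, r is approximately 2.62 and the discount is approximately 0.729.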
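|Modifier and Type|Field and Description|
|double|discount: The discount factor to apply to the value of time steps immediately following the application of an Option.|
|Episode|episode: The episode produced by this execution of the option.|

|Constructor and Description|
|EnvironmentOptionOutcome(State s, Action a, State sp, double r, boolean terminated, double discountFactor, Episode episode)|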
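public double discount
The discount factor to apply to the value of time steps immediately following the application of an Option. Specifically, this value is gamma^t, where gamma is the discount factor of the MDP and t is the number of time steps taken by the option.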
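The discount field matters when an option outcome is used in a value backup. The following is a hedged sketch assuming an SMDP-style Q-learning update, which this page does not itself specify; OptionBackupSketch and its methods are hypothetical. Because r already holds the cumulative discounted reward, the future value at sp is scaled by discount (gamma^t) rather than by gamma.

```java
/**
 * Hypothetical sketch (not the library's learning code) of how an option
 * outcome's fields could enter a one-step backup in SMDP-style Q-learning:
 * target = r + discount * maxNextQ, where r is already the cumulative
 * discounted reward and discount = gamma^t.
 */
public final class OptionBackupSketch {

    /** Backup target for an option that ran t steps and ended in sp. */
    public static double backupTarget(double r, double discount, boolean terminated, double maxNextQ) {
        // A terminal next state has no future value to discount.
        if (terminated) {
            return r;
        }
        return r + discount * maxNextQ;
    }

    /** Incremental update of q toward the target with learning rate alpha. */
    public static double update(double q, double target, double alpha) {
        return q + alpha * (target - q);
    }

    public static void main(String[] args) {
        double gamma = 0.9;
        double discount = Math.pow(gamma, 3); // the option took t = 3 steps
        double target = backupTarget(2.62, discount, false, 10.0);
        System.out.println(target);                   // approx. 2.62 + 0.729 * 10 = 9.91
        System.out.println(update(5.0, target, 0.1)); // moves q 10% of the way toward the target
    }
}
```

Using gamma^t instead of gamma keeps multi-step options and single-step actions on the same discounting scale; for a primitive action, t = 1 and the target reduces to the ordinary one-step form.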
public Episode episode
The episode produced by this execution of the option.
public EnvironmentOptionOutcome(State s, Action a, State sp, double r, boolean terminated, double discountFactor, Episode episode)
The discount of this object will be set to discountFactor^numSteps, where numSteps is the number of time steps taken by the option, since discountFactor is the discount factor of the MDP and discount represents the amount by which values in the time step following the option application should be discounted.
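One way to see why discountFactor^numSteps is the right multiplier is to split the discounted return from s at the moment the option terminates. In this sketch, r_k denotes the reward on the k-th primitive step (notation introduced here, not by the API), gamma = discountFactor, t = numSteps, r is the stored cumulative discounted reward, s' = sp, and G(\cdot) is the discounted return from a state.

```latex
\begin{aligned}
G(s) &= \sum_{k=0}^{t-1} \gamma^{k} r_{k+1} \;+\; \sum_{k=t}^{\infty} \gamma^{k} r_{k+1} \\
     &= r \;+\; \gamma^{t} \sum_{j=0}^{\infty} \gamma^{j} r_{t+j+1} \qquad (k = t + j) \\
     &= r \;+\; \gamma^{t}\, G(s')
\end{aligned}
```

Everything after the option is therefore weighted by gamma^t, which is exactly the value stored in discount.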
Parameters:
s - The previous state of the environment when the action was taken.
a - The action taken in the environment.
sp - The next state to which the environment transitioned.
r - The reward received.
terminated - Whether the next state to which the environment transitioned is a terminal state (true if so, false otherwise).
discountFactor - The discount factor of the MDP.
episode - The episode of execution.