If an option terminates deterministically after a fixed number of steps, it can be useful for the option to transition directly from the state in which it was initiated to its terminal state, rather than simulating each step of execution.
An option with a defined policy, initiation state set and termination state set.
Options are typically defined to follow a policy to a subgoal, and it is often useful to use a planning or learning algorithm to define that policy, in which case a subgoal reward function for the option must be specified.
Options are typically defined to follow a policy to a subgoal, and it is often useful to use a planning or learning algorithm to define that policy, in which case a terminal function for the option must be specified in order to learn or plan for its policy.
A macro action is an action that always executes a sequence of actions.
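The idea of a macro action can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the library: states are plain `int` values, a primitive action is a deterministic `IntUnaryOperator`, and the class name `MacroActionSketch` is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch, not the library's API: states are ints and each
// primitive action is a deterministic int -> int function.
final class MacroActionSketch {
    private final List<IntUnaryOperator> steps;

    MacroActionSketch(List<IntUnaryOperator> steps) {
        // Defensive copy so the sequence cannot change after construction.
        this.steps = new ArrayList<>(steps);
    }

    // Applies every primitive action in order and returns the final state.
    int execute(int state) {
        for (IntUnaryOperator step : steps) {
            state = step.applyAsInt(state);
        }
        return state;
    }

    // The duration is always the length of the stored sequence.
    int numSteps() {
        return steps.size();
    }
}
```

Because the sequence and its length are fixed, such an action terminates after a known number of steps regardless of the state it starts in.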
This abstract class provides support for learning and planning with options, which are temporally extended actions.
This class is a reward function that wraps a reward function for primitive actions and returns that function's reward when the queried action is a primitive.
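A minimal sketch of such a wrapping reward function follows. All type names (`ActionSketch`, `RewardSketch`, `OptionSketch`, `OptionAwareReward`) are hypothetical. The primitive branch follows the description above; the option branch, which returns a cumulative reward recorded by the option, is an assumption made only to complete the example.

```java
// Hypothetical types for illustration; none of these names come from the library.
interface ActionSketch {
    boolean isPrimitive();
}

interface RewardSketch {
    double reward(int state, ActionSketch action, int nextState);
}

final class OptionSketch implements ActionSketch {
    // Assumption: the option records the reward it accumulated while executing.
    double lastCumulativeReward;

    @Override
    public boolean isPrimitive() {
        return false;
    }
}

final class OptionAwareReward implements RewardSketch {
    private final RewardSketch primitiveReward;

    OptionAwareReward(RewardSketch primitiveReward) {
        this.primitiveReward = primitiveReward;
    }

    @Override
    public double reward(int state, ActionSketch action, int nextState) {
        if (action.isPrimitive()) {
            // Primitive actions are delegated to the wrapped reward function.
            return primitiveReward.reward(state, action, nextState);
        }
        if (action instanceof OptionSketch) {
            // Assumed behavior for options; the description above does not cover it.
            return ((OptionSketch) action).lastCumulativeReward;
        }
        throw new IllegalArgumentException("Unsupported action type");
    }
}
```

Delegating the primitive case keeps the original reward definition in one place, so the wrapper only has to decide how options are scored.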
This is a subgoal option whose initiation states are the states in which its policy is defined.
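One way to picture this is a sketch in which the policy is a partial map from states to actions. The class `PolicyDefinedOptionSketch`, the `int` state encoding, and the string action names are all hypothetical. The termination rule shown (a subgoal state, or any state where the policy is undefined) is an assumption chosen for the example, not something stated in the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: states are ints, actions are names, and the policy
// is a partial map from state to action.
final class PolicyDefinedOptionSketch {
    private final Map<Integer, String> policy;
    private final Set<Integer> subgoals;

    PolicyDefinedOptionSketch(Map<Integer, String> policy, Set<Integer> subgoals) {
        this.policy = new HashMap<>(policy);
        this.subgoals = new HashSet<>(subgoals);
    }

    // The option can start in exactly the states its policy covers.
    boolean canInitiateIn(int state) {
        return policy.containsKey(state);
    }

    // Assumed termination rule: stop at a subgoal or where the policy is undefined.
    boolean terminatesIn(int state) {
        return subgoals.contains(state) || !policy.containsKey(state);
    }

    // Returns the policy's action for the state, or null if none is defined.
    String actionFor(int state) {
        return policy.get(state);
    }
}
```

Deriving the initiation set from the policy's domain means no separate initiation set has to be maintained alongside the policy.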
This class is an option wrapper around a standard primitive action.