Interface for defining planning algorithms that operate on iteratively learned models. Such algorithms
must support replanning when the model changes and returning the policy of the plan under the current model.
This method is expected to be called before each new learning episode begins. This may be useful for planning algorithms
that do not solve the policy for every state, since new episodes may start in states the planning algorithm had not previously considered.
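
The contract described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the original interface: the class names LearnedModelPlanner and ReachableStatePlanner, the method names initialize_planner_in, model_changed, and model_policy, the dictionary-based deterministic model, and the choice of value iteration over reachable states. The sketch shows a planner that only solves for states reachable from the states it has been initialized in, which is why initializing it at the start of each episode matters.

```python
from abc import ABC, abstractmethod


class LearnedModelPlanner(ABC):
    """Hypothetical interface for a planner over an iteratively learned model."""

    @abstractmethod
    def initialize_planner_in(self, state):
        """Called before a learning episode begins in `state`."""

    @abstractmethod
    def model_changed(self, state):
        """Called when the learned model changed for `state`; triggers replanning."""

    @abstractmethod
    def model_policy(self):
        """Return the policy of the plan under the current model."""


class ReachableStatePlanner(LearnedModelPlanner):
    """Illustrative planner: value iteration over states reachable from seed states.

    Assumes `model` is a dict {(state, action): (next_state, reward)} that the
    learner mutates in place. Unmodeled transitions are valued at 0.0.
    """

    def __init__(self, model, actions, gamma=0.9, sweeps=50):
        self.model = model
        self.actions = list(actions)
        self.gamma = gamma
        self.sweeps = sweeps
        self.values = {}    # state -> estimated value, only for planned states
        self.seeds = set()  # states in which episodes have started

    def initialize_planner_in(self, state):
        # Plan only if this start state has not been covered by an earlier plan.
        if state not in self.values:
            self.seeds.add(state)
            self._plan_from(state)

    def model_changed(self, state):
        # Discard the old plan and replan from every known start state.
        self.values.clear()
        for seed in self.seeds:
            self._plan_from(seed)

    def model_policy(self):
        return self._greedy_action

    def _q(self, state, action):
        if (state, action) not in self.model:
            return 0.0
        next_state, reward = self.model[(state, action)]
        return reward + self.gamma * self.values.get(next_state, 0.0)

    def _reachable(self, start):
        seen, frontier = {start}, [start]
        while frontier:
            state = frontier.pop()
            for action in self.actions:
                if (state, action) in self.model:
                    next_state = self.model[(state, action)][0]
                    if next_state not in seen:
                        seen.add(next_state)
                        frontier.append(next_state)
        return seen

    def _plan_from(self, start):
        states = self._reachable(start)
        for state in states:
            self.values.setdefault(state, 0.0)
        for _ in range(self.sweeps):
            for state in states:
                self.values[state] = max(self._q(state, a) for a in self.actions)

    def _greedy_action(self, state):
        return max(self.actions, key=lambda a: self._q(state, a))


# First episode starts in "s0"; the planner covers only what is reachable from it.
model = {("s0", "right"): ("s1", 0.0), ("s1", "right"): ("goal", 1.0)}
planner = ReachableStatePlanner(model, ["left", "right"])
planner.initialize_planner_in("s0")
policy = planner.model_policy()
first_action = policy("s0")

# The learner discovers a new transition, then a new episode starts in "t0".
model[("t0", "left")] = ("s1", 0.0)
planner.model_changed("t0")
covered_before = "t0" in planner.values
planner.initialize_planner_in("t0")
covered_after = "t0" in planner.values
```

In this sketch, replanning after the model change still leaves "t0" unplanned because it is not reachable from any earlier start state. Only the initialization call at the start of the new episode extends the plan to cover it.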