This class implements a Boltzmann policy where the Q-values represent the components of the Boltzmann distribution.
This class can be used to lazily cache the output of a source policy, storing the result for a state only when that state is first queried.
This class defines an epsilon-greedy policy over Q-values and requires a QComputable planner to be specified.
A greedy policy that breaks ties by choosing the first action with the maximum value.
A greedy policy that breaks ties by randomly choosing an action amongst the tied actions.
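
The selection rules named in these descriptions can be sketched over a plain array of Q-values. This is a minimal illustration, not the library's API: the class `PolicySketch`, its method names, and the `temperature` parameter are hypothetical, and the epsilon-greedy variant assumes that the greedy branch breaks ties at random.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of the documented classes.
// All methods assume a non-empty array with one Q-value per action.
final class PolicySketch {

    // Boltzmann (softmax) probabilities: p(a) is proportional to exp(Q(a) / temperature).
    // The temperature parameter is an assumption of this sketch.
    static double[] boltzmann(double[] q, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) max = Math.max(max, v);
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            // Subtracting the maximum avoids overflow without changing the result.
            p[i] = Math.exp((q[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) p[i] /= sum;
        return p;
    }

    // Greedy choice; ties are broken by taking the first maximal index.
    static int greedyFirst(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) best = i;
        }
        return best;
    }

    // Greedy choice; ties are broken uniformly at random among the maximal indices.
    static int greedyRandomTie(double[] q, Random rng) {
        double max = q[greedyFirst(q)];
        List<Integer> tied = new ArrayList<>();
        for (int i = 0; i < q.length; i++) {
            if (q[i] == max) tied.add(i);
        }
        return tied.get(rng.nextInt(tied.size()));
    }

    // Epsilon-greedy: with probability epsilon pick any action uniformly at random,
    // otherwise act greedily (random tie-breaking is assumed here).
    static int epsilonGreedy(double[] q, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q.length);
        return greedyRandomTie(q, rng);
    }
}
```

For Q-values `{0.0, 2.0, 2.0, 1.0}`, `greedyFirst` always returns index 1, whereas `greedyRandomTie` returns index 1 or 2 depending on the random draw.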