Computes the exact Q-value using a full Bellman update with the actual transition dynamics. Using this procedure causes Sparse Sampling
to compute the exact Q-values and optimal policy for a finite-horizon problem. It is recommended only when the number of transitions from
any given state is small enough for the full update to be tractable to compute.
ga - the action for which the Q-value estimate is to be returned
the exact finite horizon Q-value
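
The description above can be illustrated with a minimal sketch of a full Bellman backup over a finite horizon. It is not this class's implementation: the names `ExactQSketch`, `Model`, `Outcome`, `exactQ`, `exactV`, and `twoStateChain` are hypothetical, states and actions are plain integers, and the model is assumed to enumerate every possible transition with its probability.

```java
import java.util.Arrays;
import java.util.List;

class ExactQSketch {

    /** Hypothetical container for one possible outcome of an action. */
    static final class Outcome {
        final double prob;
        final int nextState;
        final double reward;

        Outcome(double prob, int nextState, double reward) {
            this.prob = prob;
            this.nextState = nextState;
            this.reward = reward;
        }
    }

    /** Hypothetical model that exposes the full transition distribution. */
    interface Model {
        int numActions();
        boolean isTerminal(int state);
        List<Outcome> outcomes(int state, int action);
    }

    /** Q_h(s, a) = sum over outcomes of p * (r + gamma * V_{h-1}(s')). */
    static double exactQ(Model m, int s, int a, int h, double gamma) {
        double q = 0.0;
        for (Outcome o : m.outcomes(s, a)) {
            q += o.prob * (o.reward + gamma * exactV(m, o.nextState, h - 1, gamma));
        }
        return q;
    }

    /** V_h(s) = max over actions of Q_h(s, a); zero at horizon 0 or in a terminal state. */
    static double exactV(Model m, int s, int h, double gamma) {
        if (h <= 0 || m.isTerminal(s)) {
            return 0.0;
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < m.numActions(); a++) {
            best = Math.max(best, exactQ(m, s, a, h, gamma));
        }
        return best;
    }

    /**
     * Toy problem (an assumption for illustration): state 1 is terminal.
     * In state 0, action 0 reaches state 1 with reward 1 half the time and
     * otherwise stays in state 0 with reward 0; action 1 always stays in
     * state 0 with reward 0.2.
     */
    static Model twoStateChain() {
        return new Model() {
            public int numActions() { return 2; }
            public boolean isTerminal(int state) { return state == 1; }
            public List<Outcome> outcomes(int state, int action) {
                if (action == 0) {
                    return Arrays.asList(new Outcome(0.5, 1, 1.0), new Outcome(0.5, 0, 0.0));
                }
                return Arrays.asList(new Outcome(1.0, 0, 0.2));
            }
        };
    }
}
```

With a discount of 0.5 and a horizon of 2, `ExactQSketch.exactQ(ExactQSketch.twoStateChain(), 0, 0, 2, 0.5)` evaluates to 0.625. The recursion visits every outcome of every action at every depth, so its cost grows exponentially with the horizon, which is why the exact update is only practical when each state has few transitions.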
public double estimateV()
Returns the stored state value estimate if this node is closed; otherwise estimates the value, closes the node, and returns the estimate.
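
The closed-node behaviour can be sketched as a compute-once cache. This is an illustration under assumptions, not the actual node class: `ClosedNodeSketch`, its `estimator` field, and `isClosed()` are hypothetical names, and the real estimation step is replaced by an arbitrary supplier.

```java
import java.util.function.DoubleSupplier;

class ClosedNodeSketch {

    // Hypothetical stand-in for whatever procedure estimates the node's value.
    private final DoubleSupplier estimator;
    private boolean closed = false;
    private double v;

    ClosedNodeSketch(DoubleSupplier estimator) {
        this.estimator = estimator;
    }

    /** Returns the cached value if closed; otherwise estimates it once and closes the node. */
    double estimateV() {
        if (!closed) {
            v = estimator.getAsDouble();
            closed = true;
        }
        return v;
    }

    boolean isClosed() {
        return closed;
    }
}
```

Calling `estimateV()` repeatedly on the same node runs the estimator only on the first call, so nodes shared between branches of the search tree are not re-estimated.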