Returns the estimated reward that will be received by following the optimal policy from the given state.
Since deterministic forward search planning algorithms typically expect costs, this estimate is expressed as a negative reward, so
values closer to zero are better. For instance, if state s were known to be 3 steps from the goal and each step yielded a reward of -1, an
optimal heuristic (the true reward from the state) would return -3.
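As a minimal illustrative sketch (not the library's actual interface), such a heuristic could be written as a function that returns the negative Manhattan distance to the goal in a grid world, assuming a reward of -1 per step; the function name and the `(x, y)` tuple representation are hypothetical:

```python
def manhattan_heuristic(state, goal):
    """Estimate the reward of following the optimal policy from `state`.

    Assumes a grid world with a reward of -1 per step, so the estimate
    is the negative Manhattan distance to `goal`; values closer to zero
    indicate states nearer the goal. `state` and `goal` are (x, y) tuples.
    """
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

# A state 3 steps from the goal receives the value -3, as in the example above.
print(manhattan_heuristic((0, 0), (1, 2)))  # -3
```

Because the estimate never overstates the reward remaining (it assumes a direct path), a cost-based forward search that negates this value obtains an admissible cost heuristic.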