BoltzmannPolicyGradient

java.lang.Object
- burlap.behavior.singleagent.learnbydemo.mlirl.support.BoltzmannPolicyGradient

```
public class BoltzmannPolicyGradient
extends java.lang.Object
```
This class provides methods to compute the gradient of a Boltzmann policy. Numerous logarithmic tricks are performed to avoid the overflow issues that a straight computation of the exponentials might induce. The methods require that the input come from a differentiable planner and reward function, which means that the planner should implement a Boltzmann value backup instead of a Bellman value backup.
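
The overflow-avoiding trick mentioned above is usually the max-shifted log-sum-exp identity, log Σ exp(βQ_i) = m + log Σ exp(βQ_i − m) with m = max_i βQ_i. The following is a minimal sketch of that idea, not the BURLAP source: the class `LogSumExpSketch` and its method names are hypothetical, and it assumes a non-empty Q-value array.

```java
// Hypothetical illustration; these names are NOT part of BURLAP.
final class LogSumExpSketch {

    /** Returns the largest beta-scaled Q-value, max_i(beta * qs[i]). Assumes qs is non-empty. */
    static double maxScaled(double[] qs, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, beta * q);
        }
        return max;
    }

    /**
     * Returns log(sum_i exp(beta * qs[i])). Every exponent is shifted by
     * maxScaled first, so each one is <= 0 and Math.exp cannot overflow.
     */
    static double logSumExp(double[] qs, double maxScaled, double beta) {
        double sum = 0.0;
        for (double q : qs) {
            sum += Math.exp(beta * q - maxScaled);
        }
        return maxScaled + Math.log(sum);
    }

    /** Returns the log-probability of action aInd under the Boltzmann policy. */
    static double logProbability(double[] qs, double beta, int aInd) {
        double m = maxScaled(qs, beta);
        return beta * qs[aInd] - logSumExp(qs, m, beta);
    }
}
```

With Q-values near 1000 and beta = 1, a direct `Math.exp(1000.0)` evaluates to infinity in double precision, while the shifted form stays finite.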

Author:

James MacGlashan.

Constructor Summary

Constructors
Constructor and Description
`BoltzmannPolicyGradient()`

Method Summary

Methods
Modifier and Type	Method and Description
`static double[]`	`computeBoltzmannPolicyGradient(State s, GroundedAction a, QGradientPlanner planner, double beta)` Computes the gradient of a Boltzmann policy using the given differentiable planner.
`static double[]`	`computePolicyGradient(DifferentiableRF rf, double beta, double[] qs, double maxBetaScaled, double logSum, double[][] gqs, int aInd)` Computes the gradient of a Boltzmann policy using values derived from a differentiable Boltzmann backup planner.
`static double`	`logSum(double[] qs, double maxBetaScaled, double beta)` Computes the log sum of exponentiated Q-values (scaled by beta).
`static double`	`maxBetaScaled(double[] qs, double beta)` Given an array of Q-values, returns the maximum Q-value multiplied by the parameter beta.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - BoltzmannPolicyGradient
```
public BoltzmannPolicyGradient()
```
- Method Detail
  - computeBoltzmannPolicyGradient
```
public static double[] computeBoltzmannPolicyGradient(State s,
                                      GroundedAction a,
                                      QGradientPlanner planner,
                                      double beta)
```
    Computes the gradient of a Boltzmann policy using the given differentiable planner.
    
    Parameters:
    s - the input state of the policy gradient
    a - the action whose policy probability gradient is being queried
    planner - the differentiable QGradientPlanner
    beta - the Boltzmann beta parameter. This parameter is the inverse of the Boltzmann temperature. As beta becomes larger, the policy becomes more deterministic. Should lie in [0, +infinity).
    
    Returns:
    the gradient of the policy.
  - computePolicyGradient
```
public static double[] computePolicyGradient(DifferentiableRF rf,
                             double beta,
                             double[] qs,
                             double maxBetaScaled,
                             double logSum,
                             double[][] gqs,
                             int aInd)
```
    Computes the gradient of a Boltzmann policy using values derived from a differentiable Boltzmann backup planner.
    
    Parameters:
    rf - the planner's DifferentiableRF
    beta - the Boltzmann beta parameter. This parameter is the inverse of the Boltzmann temperature. As beta becomes larger, the policy becomes more deterministic. Should lie in [0, +infinity).
    qs - an array holding the Q-value for each action.
    maxBetaScaled - the maximum Q-value after being scaled by the parameter beta
    logSum - the log sum of the exponentiated Q-values (scaled by beta)
    gqs - a matrix holding the Q-value gradient for each action. The first index is the action; the second index is the component of the parameter gradient
    aInd - the index of the query action for which the policy's gradient is being computed
    
    Returns:
    the gradient of the policy.
  - maxBetaScaled
```
public static double maxBetaScaled(double[] qs,
                   double beta)
```
    Given an array of Q-values, returns the maximum Q-value multiplied by the parameter beta.
    
    Parameters:
    qs - an array of Q-values
    beta - the scaling beta parameter.
    
    Returns:
    the maximum Q-value multiplied by the parameter beta
  - logSum
```
public static double logSum(double[] qs,
            double maxBetaScaled,
            double beta)
```
    Computes the log sum of exponentiated Q-values (scaled by beta).
    
    Parameters:
    qs - the Q-values
    maxBetaScaled - the maximum Q-value scaled by the parameter beta
    beta - the scaling value.
    
    Returns:
    the log sum of exponentiated Q-values (scaled by beta)
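
To illustrate how the inputs of `computePolicyGradient` relate to its result, the sketch below applies the standard softmax derivative, ∂π(a)/∂θ_j = β π(a) (∂Q(a)/∂θ_j − Σ_b π(b) ∂Q(b)/∂θ_j), to an array of Q-values `qs` and a matrix of Q-value gradients `gqs` indexed by action and then by parameter. It is an assumption-laden illustration rather than the BURLAP implementation: the class `BoltzmannGradientSketch` and its methods are hypothetical, it takes no `DifferentiableRF`, and it recomputes the normalization internally instead of accepting `maxBetaScaled` and `logSum`.

```java
// Hypothetical illustration; these names are NOT part of BURLAP.
final class BoltzmannGradientSketch {

    /** Returns the Boltzmann (softmax) action probabilities for beta-scaled Q-values. */
    static double[] policy(double[] qs, double beta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qs) {
            max = Math.max(max, beta * q);
        }
        double[] p = new double[qs.length];
        double sum = 0.0;
        for (int i = 0; i < qs.length; i++) {
            p[i] = Math.exp(beta * qs[i] - max); // shifted exponent, never overflows
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /**
     * Returns the gradient of pi(aInd) with respect to each parameter j,
     * where gqs[b][j] is the derivative of Q(b) with respect to parameter j.
     */
    static double[] policyGradient(double[] qs, double[][] gqs, double beta, int aInd) {
        double[] pi = policy(qs, beta);
        int d = gqs[aInd].length;
        double[] grad = new double[d];
        for (int j = 0; j < d; j++) {
            // Expected Q-gradient under the current policy.
            double expectedGq = 0.0;
            for (int b = 0; b < qs.length; b++) {
                expectedGq += pi[b] * gqs[b][j];
            }
            grad[j] = beta * pi[aInd] * (gqs[aInd][j] - expectedGq);
        }
        return grad;
    }
}
```

Because the action probabilities always sum to one, the gradients returned for all actions at a fixed parameter index sum to zero, which makes a convenient sanity check.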
