public class MLIRL
extends java.lang.Object
An implementation of Maximum-Likelihood Inverse Reinforcement Learning [1]. This class takes as input (from a MLIRLRequest object) a set of expert trajectories through a domain and a DifferentiableRF model, and learns the parameters of the reward function model that maximize the likelihood of the trajectories. The reward function parameter space is searched using gradient ascent. Since the likelihood is a non-linear function of the parameters, gradient ascent may get stuck in local optima. The policy gradient is computed by iteratively replanning after each gradient ascent step with the DifferentiableQFunction instance provided in the MLIRLRequest object.

Gradient ascent stops either after a fixed number of steps or once the change in likelihood falls below some threshold. If the maximum number of steps is set to -1, it continues until the change in likelihood falls below the threshold.
1. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
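For orientation, a minimal usage sketch follows. It assumes a fully configured MLIRLRequest (domain, expert Episodes, DifferentiableRF, and a differentiable planner) is produced by a hypothetical `buildRequest()` helper; the hyperparameter values are illustrative, not defaults of this API.

```java
// buildRequest() is a hypothetical helper returning a configured MLIRLRequest
// (domain, expert episodes, DifferentiableRF, differentiable planner).
MLIRLRequest request = buildRequest();

// Learning rate 0.1; stop when the log-likelihood changes by less than 0.01,
// or after at most 10 gradient ascent steps (-1 would rely on the threshold alone).
MLIRL irl = new MLIRL(request, 0.1, 0.01, 10);
irl.performIRL();

// The request's DifferentiableRF parameters now hold the maximum-likelihood
// estimate; the final log-likelihood can be inspected:
double finalLogLikelihood = irl.logLikelihood();
```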
Modifier and Type | Field and Description |
---|---|
`protected int` | `debugCode`: The debug code used for printing information to the terminal. |
`protected double` | `learningRate`: The gradient ascent learning rate. |
`protected double` | `maxLikelihoodChange`: The likelihood change threshold to stop gradient ascent. |
`protected int` | `maxSteps`: The maximum number of steps of gradient ascent. |
`protected MLIRLRequest` | `request`: The MLIRL request defining the IRL problem. |
Constructor and Description |
---|
MLIRL(MLIRLRequest request,
double learningRate,
double maxLikelihoodChange,
int maxSteps)
Initializes.
|
Modifier and Type | Method and Description |
---|---|
`protected static void` | `addToVector(double[] sumVector, double[] deltaVector)`: Performs a vector addition and stores the result in `sumVector`. |
`int` | `getDebugCode()`: Returns the debug code used for printing to the terminal. |
`double` | `logLikelihood()`: Computes and returns the log-likelihood of all expert trajectories under the current reward function parameters. |
`FunctionGradient` | `logLikelihoodGradient()`: Computes and returns the gradient of the log-likelihood of all trajectories. |
`double` | `logLikelihoodOfTrajectory(Episode ea, double weight)`: Computes and returns the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight. |
`FunctionGradient` | `logPolicyGrad(State s, Action ga)`: Computes and returns the gradient of the Boltzmann policy for the given state and action. |
`void` | `performIRL()`: Runs gradient ascent. |
`void` | `setDebugCode(int debugCode)`: Sets the debug code used for printing to the terminal. |
`void` | `setRequest(MLIRLRequest request)`: Sets the `MLIRLRequest` object defining the IRL problem. |
`void` | `toggleDebugPrinting(boolean printDebug)`: Sets whether information during learning is printed to the terminal. |
Field Detail

`protected MLIRLRequest request`
The MLIRL request defining the IRL problem.

`protected double learningRate`
The gradient ascent learning rate.

`protected double maxLikelihoodChange`
The likelihood change threshold to stop gradient ascent.

`protected int maxSteps`
The maximum number of steps of gradient ascent. When set to -1, termination is governed by `maxLikelihoodChange` alone.

`protected int debugCode`
The debug code used for printing information to the terminal.
Constructor Detail

`public MLIRL(MLIRLRequest request, double learningRate, double maxLikelihoodChange, int maxSteps)`
Initializes.
Parameters:
- `request` - the problem request definition
- `learningRate` - the gradient ascent learning rate
- `maxLikelihoodChange` - the likelihood change threshold that must be reached to terminate gradient ascent
- `maxSteps` - the maximum number of gradient ascent steps allowed before termination is forced; set to -1 to rely only on the likelihood threshold
Method Detail

`public void setRequest(MLIRLRequest request)`
Sets the MLIRLRequest object defining the IRL problem.
Parameters:
- `request` - the MLIRLRequest object defining the IRL problem

`public void toggleDebugPrinting(boolean printDebug)`
Sets whether information during learning is printed to the terminal.
Parameters:
- `printDebug` - if true, information is printed to the terminal; if false, it is silent

`public int getDebugCode()`
Returns the debug code used for printing to the terminal.

`public void setDebugCode(int debugCode)`
Sets the debug code used for printing to the terminal.
Parameters:
- `debugCode` - the debug code used for printing to the terminal

`public void performIRL()`
Runs gradient ascent.
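The stopping rule described in the class overview corresponds roughly to the following control flow (a sketch, not the library's actual source; `ascendAndReplan()` is a hypothetical stand-in for one gradient ascent step followed by replanning):

```java
double lastLikelihood = logLikelihood();
int steps = 0;
while (maxSteps == -1 || steps < maxSteps) {
    ascendAndReplan(); // hypothetical: one gradient step on the reward parameters, then replan
    double newLikelihood = logLikelihood();
    if (Math.abs(newLikelihood - lastLikelihood) < maxLikelihoodChange) {
        break; // change in likelihood fell below the threshold
    }
    lastLikelihood = newLikelihood;
    steps++;
}
```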
`public double logLikelihood()`
Computes and returns the log-likelihood of all expert trajectories under the current reward function parameters.
`public double logLikelihoodOfTrajectory(Episode ea, double weight)`
Computes and returns the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight.
Parameters:
- `ea` - the trajectory
- `weight` - the weight to assign the trajectory
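In symbols (our notation, not the library's; it assumes the Boltzmann policy $\pi_\theta$ induced by the differentiable Q-function with reward parameters $\theta$ and a temperature parameter $\beta$): for a trajectory $\tau = (s_0, a_0, \ldots, s_{T-1}, a_{T-1}, s_T)$ with weight $w$,

$$\log L(\tau;\theta) = w \sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t), \qquad \pi_\theta(a \mid s) = \frac{e^{\beta Q_\theta(s,a)}}{\sum_{a'} e^{\beta Q_\theta(s,a')}}.$$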
`public FunctionGradient logLikelihoodGradient()`
Computes and returns the gradient of the log-likelihood of all trajectories.

`public FunctionGradient logPolicyGrad(State s, Action ga)`
Computes and returns the gradient of the Boltzmann policy for the given state and action.
Parameters:
- `s` - the state in which the policy is queried
- `ga` - the action for which the policy is queried
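For the Boltzmann policy written above, the gradient of the log-policy takes the standard form (again a sketch in our notation, assuming the same $\beta$ and $Q_\theta$):

$$\nabla_\theta \log \pi_\theta(a \mid s) = \beta \left( \nabla_\theta Q_\theta(s,a) - \sum_{a'} \pi_\theta(a' \mid s)\, \nabla_\theta Q_\theta(s,a') \right).$$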
`protected static void addToVector(double[] sumVector, double[] deltaVector)`
Performs a vector addition and stores the result in `sumVector`.
Parameters:
- `sumVector` - the input vector to which the values in `deltaVector` will be added
- `deltaVector` - the vector values to add to `sumVector`
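Since the body of `addToVector` is not shown here, a minimal sketch consistent with the description (not necessarily the library's exact source):

```java
// Element-wise, in-place addition of deltaVector into sumVector; assumes the
// two arrays have the same length.
protected static void addToVector(double[] sumVector, double[] deltaVector) {
    for (int i = 0; i < sumVector.length; i++) {
        sumVector[i] += deltaVector[i];
    }
}
```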