MultipleIntentionsMLIRL

java.lang.Object
- burlap.behavior.singleagent.learnfromdemo.mlirl.MultipleIntentionsMLIRL

```
public class MultipleIntentionsMLIRL
extends java.lang.Object
```
An implementation of Multiple Intentions Maximum-likelihood Inverse Reinforcement Learning [1]. This algorithm takes as input a set of expert trajectories, a number of clusters, and a differentiable reward function model, and clusters the trajectories, assigning each cluster its own reward function parameter values. The algorithm uses expectation-maximization (EM) to find the reward function parameter values for each cluster, with MLIRL performing the maximization step for those parameter values. EM is run for a specified number of iterations.
At initialization, the reward function parameters for each behavior cluster are assigned random values between -1 and 1. To change this behavior, subclass this class and override the initializeClusterRFParameters(java.util.List) method.
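The overall EM structure can be illustrated with a toy stand-in. The sketch below is not the MLIRL implementation: it swaps in a hypothetical model in which each "trajectory" is a sequence of binary outcomes and each cluster's single "reward parameter" is the logit of a Bernoulli bias, so the maximization step is a closed-form weighted maximum-likelihood estimate instead of MLIRL gradient ascent. The class `ToyMultipleIntentionsEM` and its methods are hypothetical. It keeps the same shape as the description above: random initialization in [-1, 1), uniform priors, an expectation step that weights each trajectory by each cluster, a prior update, and a per-cluster maximization step, repeated for a fixed number of iterations.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Toy EM that clusters binary sequences. Hypothetical stand-in: each cluster's "reward
 * parameter" is the logit of a Bernoulli bias, and the M-step is a closed-form weighted MLE
 * instead of MLIRL gradient ascent.
 */
final class ToyMultipleIntentionsEM {

    /** log Pr(traj | cluster) for a Bernoulli model with logit parameter theta. */
    static double logLikelihood(int[] traj, double theta) {
        double p = 1.0 / (1.0 + Math.exp(-theta));
        double ll = 0.0;
        for (int x : traj) {
            ll += (x == 1) ? Math.log(p) : Math.log(1.0 - p);
        }
        return ll;
    }

    /** Runs a fixed number of EM iterations and returns each cluster's learned parameter. */
    static double[] run(int[][] trajs, int k, int emIterations, long seed) {
        Random rand = new Random(seed);
        double[] theta = new double[k];
        for (int c = 0; c < k; c++) {
            theta[c] = rand.nextDouble() * 2.0 - 1.0; // random initialization in [-1, 1)
        }
        double[] priors = new double[k];
        Arrays.fill(priors, 1.0 / k); // uniform cluster priors

        for (int iter = 0; iter < emIterations; iter++) {
            // E-step: w[c][t] = Pr(c | t), normalized per trajectory in log space.
            double[][] w = new double[k][trajs.length];
            for (int t = 0; t < trajs.length; t++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    w[c][t] = Math.log(priors[c]) + logLikelihood(trajs[t], theta[c]);
                    max = Math.max(max, w[c][t]);
                }
                double sum = 0.0;
                for (int c = 0; c < k; c++) {
                    sum += Math.exp(w[c][t] - max);
                }
                for (int c = 0; c < k; c++) {
                    w[c][t] = Math.exp(w[c][t] - max) / sum;
                }
            }
            // Prior update, then a weighted M-step per cluster.
            for (int c = 0; c < k; c++) {
                double weightSum = 0.0, ones = 0.0, total = 0.0;
                for (int t = 0; t < trajs.length; t++) {
                    weightSum += w[c][t];
                    for (int x : trajs[t]) {
                        ones += w[c][t] * x;
                        total += w[c][t];
                    }
                }
                priors[c] = weightSum / trajs.length;
                if (total > 0.0) {
                    // Clamp away from 0 and 1 so the logit stays finite.
                    double p = Math.min(Math.max(ones / total, 1e-6), 1.0 - 1e-6);
                    theta[c] = Math.log(p / (1.0 - p));
                }
            }
        }
        return theta;
    }
}
```

For instance, running it on two all-ones and two all-zeros sequences with `k = 2` should end with one strongly positive and one strongly negative parameter, one per group. In the real algorithm, the per-cluster log-likelihoods come from each cluster's reward function and planner rather than from a closed-form model.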
1. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
Acknowledgements: Lei Yang for code on which this was based.

Author:

James MacGlashan

Field Summary

Fields
| Modifier and Type | Field and Description |
|---|---|
| `protected double[]` | `clusterPriors` The prior probabilities on each cluster. |
| `protected java.util.List<MLIRLRequest>` | `clusterRequests` The individual `MLIRLRequest` objects for each behavior cluster. |
| `protected int` | `debugCode` The debug code used for printing information to the terminal. |
| `protected MLIRL` | `mlirlInstance` The `MLIRL` instance used to perform the maximization step for each cluster's reward function parameter values. |
| `protected int` | `numEMIterations` The number of EM iterations to run. |
| `protected java.util.Random` | `rand` A random object used for initializing each cluster's RF parameters randomly. |
| `protected MultipleIntentionsMLIRLRequest` | `request` The source problem request defining the problem to be solved. |

Constructor Summary

Constructors
| Constructor and Description |
|---|
| `MultipleIntentionsMLIRL(MultipleIntentionsMLIRLRequest request, int emIterations, double mlIRLLearningRate, double maxMLIRLLikelihoodChange, int maxMLIRLSteps)` Initializes. |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `protected double` | `computeClusterTrajectoryLoggedNormalization(int t, double[][] logWeightedLikelihoods)` Given a matrix whose entries hold log[Pr(c)] + log[Pr(t \| c)], where Pr(c) is the prior probability of cluster c and Pr(t \| c) is the probability of trajectory t given cluster c, this method returns the log of the normalization factor for trajectory t in the matrix. |
| `protected double[][]` | `computePerClusterMLIRLWeights()` Computes the probability that each trajectory was generated by each cluster and returns these values in a matrix. |
| `double[]` | `computeProbabilityOfClustersGivenTrajectory(Episode t)` Returns the probability of each behavior cluster given the trajectory. |
| `double[]` | `getClusterPriors()` Returns the behavior cluster prior probabilities. |
| `java.util.List<DifferentiableRF>` | `getClusterRFs()` Returns the `DifferentiableRF` objects defining each behavior cluster. |
| `int` | `getDebugCode()` Returns the debug code used for printing to the terminal. |
| `protected void` | `initializeClusterRFParameters(java.util.List<DifferentiableRF> rfs)` Initializes the `DifferentiableRF` parameters for each cluster. |
| `protected void` | `initializeClusters(int k, QGradientPlannerFactory plannerFactory)` Initializes cluster data; i.e., it initializes RF parameters, cluster prior parameters (to uniform), and creates `MLIRLRequest` objects for each cluster. |
| `void` | `performIRL()` Performs multiple intention inverse reinforcement learning. |
| `protected void` | `randomizeParameters(DifferentiableRF rf)` Randomizes the parameters for a given `DifferentiableRF`. |
| `protected void` | `randomizeParameters(double[] paramVec)` Randomizes parameters in the given vector between -1 and 1. |
| `void` | `setDebugCode(int debugCode)` Sets the debug code used for printing to the terminal. |
| `void` | `toggleDebugPrinting(boolean printDebug)` Sets whether information during learning is printed to the terminal. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - request
```
protected MultipleIntentionsMLIRLRequest request
```
    The source problem request defining the problem to be solved.
  - clusterRequests
```
protected java.util.List<MLIRLRequest> clusterRequests
```
    The individual MLIRLRequest objects for each behavior cluster.
  - clusterPriors
```
protected double[] clusterPriors
```
    The prior probabilities on each cluster.
  - mlirlInstance
```
protected MLIRL mlirlInstance
```
    The MLIRL instance used to perform the maximization step for each cluster's reward function parameter values.
  - numEMIterations
```
protected int numEMIterations
```
    The number of EM iterations to run.
  - debugCode
```
protected int debugCode
```
    The debug code used for printing information to the terminal.
  - rand
```
protected java.util.Random rand
```
    A random object used for initializing each cluster's RF parameters randomly.
- Constructor Detail
  - MultipleIntentionsMLIRL
```
public MultipleIntentionsMLIRL(MultipleIntentionsMLIRLRequest request,
                               int emIterations,
                               double mlIRLLearningRate,
                               double maxMLIRLLikelihoodChange,
                               int maxMLIRLSteps)
```
    Initializes. Reward function parameters for each cluster will be initialized randomly between -1 and 1.
    
    Parameters:
    
    request - the request that defines the problem.
    
    emIterations - the number of EM iterations to perform.
    
    mlIRLLearningRate - the learning rate of the underlying MLIRL instance.
    
    maxMLIRLLikelihoodChange - the likelihood change threshold that causes MLIRL gradient ascent to stop.
    
    maxMLIRLSteps - the maximum number of gradient ascent steps allowed by the underlying MLIRLRequest gradient ascent.
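    The last three constructor arguments govern the inner gradient-ascent loop of MLIRL. As a hedged illustration of how a learning rate, a likelihood-change threshold, and a step cap typically interact, the sketch below runs gradient ascent on a hypothetical one-dimensional concave objective instead of the demonstrations' log-likelihood. The class `GradientAscentSketch` is hypothetical, and the stopping rule used here (stop once the absolute change in the objective drops below the threshold, or once the step budget runs out) is an assumption about how maxMLIRLLikelihoodChange is applied.

```java
import java.util.function.DoubleUnaryOperator;

/** Gradient ascent with a change threshold and a step cap (toy illustration, not MLIRL). */
final class GradientAscentSketch {

    /** Ascends objective f (with gradient g) from x0 and returns the final parameter. */
    static double ascend(DoubleUnaryOperator f, DoubleUnaryOperator g, double x0,
                         double learningRate, double maxLikelihoodChange, int maxSteps) {
        double x = x0;
        double prev = f.applyAsDouble(x);
        for (int step = 0; step < maxSteps; step++) {
            x += learningRate * g.applyAsDouble(x); // gradient ascent update
            double cur = f.applyAsDouble(x);
            if (Math.abs(cur - prev) < maxLikelihoodChange) {
                break; // improvement fell below the threshold
            }
            prev = cur;
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical concave objective f(x) = -(x - 3)^2, maximized at x = 3.
        DoubleUnaryOperator f = x -> -(x - 3) * (x - 3);
        DoubleUnaryOperator g = x -> -2 * (x - 3);
        System.out.println(ascend(f, g, 0.0, 0.1, 1e-10, 1000));
    }
}
```

    Running main prints a value close to 3.0. A larger learning rate takes bigger steps, a looser threshold stops earlier, and maxSteps bounds the work regardless of convergence.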
- Method Detail
  - performIRL
```
public void performIRL()
```
    Performs multiple intention inverse reinforcement learning.
  - computeProbabilityOfClustersGivenTrajectory
```
public double[] computeProbabilityOfClustersGivenTrajectory(Episode t)
```
    Returns the probability of each behavior cluster given the trajectory.
    
    Parameters:
    
    t - the trajectory (stored as an Episode object) to evaluate.
    
    Returns:
    
    the probability of each behavior cluster given the trajectory.
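    The posterior over clusters follows Bayes' rule: Pr(c | t) = Pr(c) * Pr(t | c) / \sum_i Pr(c_i) * Pr(t | c_i). Trajectory likelihoods are products over many steps and underflow easily, so the sketch below works in log space and subtracts the largest log term before exponentiating. It assumes the per-cluster log-likelihoods log Pr(t | c) are already available (computing them requires each cluster's planner and is not shown); `ClusterPosteriorSketch` is a hypothetical name.

```java
/** Bayes' rule over clusters, computed in log space (illustrative sketch). */
final class ClusterPosteriorSketch {

    /** Returns Pr(c | t) for each cluster c, given priors Pr(c) and log Pr(t | c). */
    static double[] posterior(double[] priors, double[] logLikelihoods) {
        int k = priors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            logJoint[c] = Math.log(priors[c]) + logLikelihoods[c];
            max = Math.max(max, logJoint[c]);
        }
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            sum += Math.exp(logJoint[c] - max); // shift by max to avoid underflow
        }
        double[] post = new double[k];
        for (int c = 0; c < k; c++) {
            post[c] = Math.exp(logJoint[c] - max) / sum;
        }
        return post;
    }
}
```

    With equal priors and log-likelihoods of -1000 and -1001, a direct computation would underflow to 0/0, whereas this version returns approximately 0.731 and 0.269.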
  - getClusterRFs
```
public java.util.List<DifferentiableRF> getClusterRFs()
```
    Returns the DifferentiableRF objects defining each behavior cluster.
    
    Returns:
    
    the DifferentiableRF objects defining each behavior cluster.
  - getClusterPriors
```
public double[] getClusterPriors()
```
    Returns the behavior cluster prior probabilities.
    
    Returns:
    
    the behavior cluster prior probabilities.
  - toggleDebugPrinting
```
public void toggleDebugPrinting(boolean printDebug)
```
    Sets whether information during learning is printed to the terminal. This also toggles debug printing for the underlying MLIRL instance.
    
    Parameters:
    
    printDebug - if true, information is printed to the terminal; if false then it is silent.
  - getDebugCode
```
public int getDebugCode()
```
    Returns the debug code used for printing to the terminal.
    
    Returns:
    
    the debug code used for printing to the terminal.
  - setDebugCode
```
public void setDebugCode(int debugCode)
```
    Sets the debug code used for printing to the terminal.
    
    Parameters:
    
    debugCode - the debug code used for printing to the terminal.
  - computePerClusterMLIRLWeights
```
protected double[][] computePerClusterMLIRLWeights()
```
    Computes, for each trajectory, the probability that it was generated by each cluster and returns these values in a matrix. The cluster prior probabilities are also updated to their maximum-likelihood values given these weights. The returned matrix has clusters along the rows and trajectories along the columns. These values weight the contribution of each trajectory to the MLIRL run that maximizes each cluster's RF parameters.
    
    Returns:
    
    the probability that each trajectory was generated by each cluster.
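    The expectation step described above can be sketched over the whole matrix at once. The sketch assumes the input already holds log[Pr(c)] + log[Pr(t | c)] with clusters along rows and trajectories along columns, and it assumes the standard EM mixture-weight update, in which each cluster's new prior is its average weight across trajectories. `EStepSketch` and `weightsAndPriors` are hypothetical names.

```java
/** E-step weight matrix and prior update (illustrative sketch). */
final class EStepSketch {

    /**
     * Given logJoint[c][t] = log[Pr(c)] + log[Pr(t | c)], returns Pr(c | t) in the same layout
     * and overwrites priors with each cluster's average weight across trajectories.
     */
    static double[][] weightsAndPriors(double[][] logJoint, double[] priors) {
        int k = logJoint.length;
        int n = logJoint[0].length;
        double[][] w = new double[k][n];
        for (int t = 0; t < n; t++) {
            // Log of the per-trajectory normalization factor, via log-sum-exp.
            double max = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                max = Math.max(max, logJoint[c][t]);
            }
            double sum = 0.0;
            for (int c = 0; c < k; c++) {
                sum += Math.exp(logJoint[c][t] - max);
            }
            double logNorm = max + Math.log(sum);
            for (int c = 0; c < k; c++) {
                w[c][t] = Math.exp(logJoint[c][t] - logNorm);
            }
        }
        for (int c = 0; c < k; c++) {
            double rowSum = 0.0;
            for (int t = 0; t < n; t++) {
                rowSum += w[c][t];
            }
            priors[c] = rowSum / n; // average responsibility of cluster c
        }
        return w;
    }
}
```

    Each column of the result sums to 1, and because every column does, the updated priors also sum to 1.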
  - computeClusterTrajectoryLoggedNormalization
```
protected double computeClusterTrajectoryLoggedNormalization(int t,
                                                             double[][] logWeightedLikelihoods)
```
    Given a matrix whose entries hold log[Pr(c)] + log[Pr(t | c)], where Pr(c) is the prior probability of cluster c and Pr(t | c) is the probability of trajectory t given cluster c, this method returns the log of the normalization factor for trajectory t in the matrix. That is, it returns log [ \sum_i Pr(c_i) * Pr(t | c_i) ], the log marginal probability of trajectory t. The matrix is ordered such that the rows are cluster indices and the columns are trajectories.
    
    Parameters:
    
    t - the index of the trajectory (matrix column) in question.
    
    logWeightedLikelihoods - the matrix of log[Pr(c)] + log[Pr(t | c)] values.
    
    Returns:
    
    log [ \sum_i Pr(c_i) * Pr(t | c_i) ]
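    Exponentiating the matrix entries directly can underflow to zero, because the log-likelihood of a whole trajectory is often very negative. A standard stable way to compute this quantity is the log-sum-exp identity below; whether this method uses exactly this form is an assumption. Here a_i denotes the entry for cluster i in column t, and m is the largest entry in that column.

```latex
\log \sum_i \Pr(c_i)\,\Pr(t \mid c_i)
  = \log \sum_i e^{a_i}
  = m + \log \sum_i e^{a_i - m},
\qquad a_i = \log \Pr(c_i) + \log \Pr(t \mid c_i),
\quad m = \max_i a_i .
```

    Every exponent a_i - m is at most 0 and the largest is exactly 0, so the inner sum is at least 1 and its logarithm is finite. For example, with a = (-1000, -1001), the direct sum e^{-1000} + e^{-1001} underflows to 0 in double precision, while the right-hand side gives -1000 + log(1 + e^{-1}), approximately -999.687.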
  - initializeClusters
```
protected void initializeClusters(int k,
                                  QGradientPlannerFactory plannerFactory)
```
    Initializes cluster data; i.e., it initializes RF parameters, cluster prior parameters (to uniform), and creates MLIRLRequest objects for each cluster.
    
    Parameters:
    
    k - the number of clusters.
    
    plannerFactory - the QGradientPlannerFactory to use to generate a value function for each cluster.
  - initializeClusterRFParameters
```
protected void initializeClusterRFParameters(java.util.List<DifferentiableRF> rfs)
```
    Initializes the DifferentiableRF parameters for each cluster. The parameters are set randomly between -1 and 1.
    
    Parameters:
    
    rfs - the DifferentiableRF objects whose parameters are to be initialized.
  - randomizeParameters
```
protected void randomizeParameters(DifferentiableRF rf)
```
    Randomizes the parameters for a given DifferentiableRF.
    
    Parameters:
    
    rf - the DifferentiableRF whose parameters are to be randomized.
  - randomizeParameters
```
protected void randomizeParameters(double[] paramVec)
```
    Randomizes parameters in the given vector between -1 and 1.
    
    Parameters:
    
    paramVec - the parameter vector to randomize.
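    A minimal sketch of uniform randomization between -1 and 1, assuming a uniform draw is what is intended: java.util.Random.nextDouble() is uniform on [0, 1), so scaling by 2 and shifting by -1 maps it to [-1, 1). Whether the actual implementation includes either endpoint is not stated above. The sketch passes the Random explicitly (the class holds one in its rand field) so a fixed seed gives reproducible initializations; `RandomizeSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

/** Uniform random parameter initialization in [-1, 1) (illustrative sketch). */
final class RandomizeSketch {

    static void randomizeParameters(double[] paramVec, Random rand) {
        for (int i = 0; i < paramVec.length; i++) {
            // nextDouble() is uniform in [0, 1); scale and shift to [-1, 1).
            paramVec[i] = rand.nextDouble() * 2.0 - 1.0;
        }
    }

    public static void main(String[] args) {
        double[] params = new double[5];
        randomizeParameters(params, new Random(42)); // fixed seed for reproducibility
        System.out.println(Arrays.toString(params));
    }
}
```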
