public class MLIRL
extends java.lang.Object
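An implementation of maximum-likelihood inverse reinforcement learning [1]. It takes as input (from an MLIRLRequest object) a set of expert trajectories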
 through a domain and a DifferentiableRF model,
 and learns the parameters of the reward function model that maximizes the likelihood of the trajectories.
 The reward function parameter space is searched using gradient ascent. Since the policy gradient it uses
 is non-linear, the search may get stuck in local optima. Computing the policy gradient is done
 by iteratively replanning after each gradient ascent step with a QGradientPlanner
 instance provided in the MLIRLRequest object.
 
 The gradient ascent will stop either after a fixed number of steps or when the change in likelihood is smaller
 than some threshold. If the max number of steps is set to -1, then it will continue until the change in likelihood
 is smaller than the threshold.
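
 The stopping rule described above can be sketched as a plain gradient ascent loop. This is a minimal illustration on a toy objective, not the BURLAP implementation: the `Objective` interface, the `ascend` method, and `toyObjective` are hypothetical names introduced here, and the sketch assumes the change is measured as the absolute difference between consecutive objective values.

```java
/** Illustrative sketch only; the names here are hypothetical, not part of BURLAP. */
class GradientAscentSketch {

    /** Hypothetical objective: a value and a gradient at a parameter vector. */
    interface Objective {
        double value(double[] theta);
        double[] gradient(double[] theta);
    }

    /**
     * Runs gradient ascent in place on theta and returns the number of steps taken.
     * Stops when the change in the objective is smaller than maxChange, or after
     * maxSteps steps. A maxSteps of -1 disables the step limit.
     */
    static int ascend(Objective f, double[] theta, double learningRate,
                      double maxChange, int maxSteps) {
        double last = f.value(theta);
        int steps = 0;
        while (maxSteps == -1 || steps < maxSteps) {
            double[] g = f.gradient(theta);
            for (int i = 0; i < theta.length; i++) {
                theta[i] += learningRate * g[i]; // step uphill
            }
            steps++;
            double current = f.value(theta);
            double change = Math.abs(current - last);
            last = current;
            if (change < maxChange) {
                break; // threshold reached
            }
        }
        return steps;
    }

    /** Toy concave objective -(theta - 3)^2 with its maximum at theta = 3. */
    static Objective toyObjective() {
        return new Objective() {
            public double value(double[] t) {
                return -(t[0] - 3.0) * (t[0] - 3.0);
            }
            public double[] gradient(double[] t) {
                return new double[] { -2.0 * (t[0] - 3.0) };
            }
        };
    }

    public static void main(String[] args) {
        double[] theta = { 0.0 };
        int steps = ascend(toyObjective(), theta, 0.1, 1e-9, -1);
        System.out.println(steps + " steps, theta = " + theta[0]);
    }
}
```

 With `maxSteps` set to -1 the loop relies on the threshold alone, so a threshold that is never reached would keep it running; a positive `maxSteps` bounds the work regardless.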
 
 1. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.

| Modifier and Type | Field and Description |
|---|---|
| protected int | debugCode - The debug code used for printing information to the terminal. |
| protected double | learningRate - The gradient ascent learning rate. |
| protected double | maxLikelihoodChange - The likelihood change threshold to stop gradient ascent. |
| protected int | maxSteps - The maximum number of steps of gradient ascent. |
| protected MLIRLRequest | request - The MLIRL request defining the IRL problem. |

| Constructor and Description |
|---|
| MLIRL(MLIRLRequest request, double learningRate, double maxLikelihoodChange, int maxSteps) - Initializes. |

| Modifier and Type | Method and Description |
|---|---|
| protected static void | addToVector(double[] sumVector, double[] deltaVector) - Performs a vector addition and stores the results in sumVector. |
| int | getDebugCode() - Returns the debug code used for printing to the terminal. |
| double | logLikelihood() - Computes and returns the log-likelihood of all expert trajectories under the current reward function parameters. |
| FunctionGradient | logLikelihoodGradient() - Computes and returns the gradient of the log-likelihood of all trajectories. |
| double | logLikelihoodOfTrajectory(EpisodeAnalysis ea, double weight) - Computes and returns the log-likelihood of the given trajectory under the current reward function parameters and weights it by the given weight. |
| FunctionGradient | logPolicyGrad(State s, GroundedAction ga) - Computes and returns the gradient of the Boltzmann policy for the given state and action. |
| void | performIRL() - Runs gradient ascent. |
| void | setDebugCode(int debugCode) - Sets the debug code used for printing to the terminal. |
| void | setRequest(MLIRLRequest request) - Sets the MLIRLRequest object defining the IRL problem. |
| void | toggleDebugPrinting(boolean printDebug) - Sets whether information during learning is printed to the terminal. |
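
The likelihood methods listed above score expert actions under a Boltzmann policy. The following is a rough, self-contained sketch of that idea and not BURLAP's code: it works on plain arrays of Q-values rather than `State`, `GroundedAction`, or `EpisodeAnalysis` objects, all class and method names are hypothetical, and it assumes that weighting a trajectory means multiplying its summed log-probabilities by the weight. The gradient shown is taken with respect to the Q-values only; turning it into a gradient over reward function parameters would additionally need the Q-value gradients from the planner.

```java
/** Illustrative sketch only; array-based stand-ins for the BURLAP types. */
class BoltzmannLikelihoodSketch {

    /** log pi(action | s) for a Boltzmann (softmax) policy over one state's Q-values. */
    static double logBoltzmannProb(double[] qValues, int action, double beta) {
        // Log-sum-exp with the maximum subtracted for numerical stability.
        double max = Double.NEGATIVE_INFINITY;
        for (double q : qValues) {
            max = Math.max(max, beta * q);
        }
        double sum = 0.0;
        for (double q : qValues) {
            sum += Math.exp(beta * q - max);
        }
        return beta * qValues[action] - (max + Math.log(sum));
    }

    /**
     * Weighted log-likelihood of one trajectory. qPerStep[t] holds the Q-values of
     * the state visited at step t and actions[t] is the expert's action index there.
     */
    static double logLikelihoodOfTrajectory(double[][] qPerStep, int[] actions,
                                            double beta, double weight) {
        double sum = 0.0;
        for (int t = 0; t < actions.length; t++) {
            sum += logBoltzmannProb(qPerStep[t], actions[t], beta);
        }
        return weight * sum; // assumption: the weight scales the summed log-probabilities
    }

    /** Gradient of log pi(action | s) with respect to each Q-value of the state. */
    static double[] gradLogProbWrtQ(double[] qValues, int action, double beta) {
        double[] grad = new double[qValues.length];
        for (int a = 0; a < qValues.length; a++) {
            double pi = Math.exp(logBoltzmannProb(qValues, a, beta));
            grad[a] = beta * ((a == action ? 1.0 : 0.0) - pi);
        }
        return grad;
    }

    public static void main(String[] args) {
        double[][] q = { { 1.0, 0.0 }, { 0.5, 2.0 } };
        int[] expertActions = { 0, 1 };
        System.out.println(logLikelihoodOfTrajectory(q, expertActions, 1.0, 1.0));
    }
}
```

Working in log space keeps long trajectories from underflowing, which is one reason to sum log-probabilities instead of multiplying probabilities.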
protected MLIRLRequest request
protected double learningRate
protected double maxLikelihoodChange
protected int maxSteps
If maxSteps is set to -1, termination depends on maxLikelihoodChange alone.
protected int debugCode
public MLIRL(MLIRLRequest request, double learningRate, double maxLikelihoodChange, int maxSteps)
request - the problem request definition
learningRate - the gradient ascent learning rate
maxLikelihoodChange - the likelihood change threshold that must be reached to terminate gradient ascent
maxSteps - the maximum number of gradient ascent steps allowed before termination is forced. Set to -1 to rely only on the likelihood threshold.

public void setRequest(MLIRLRequest request)
Sets the MLIRLRequest object defining the IRL problem.
request - the MLIRLRequest object defining the IRL problem.

public void toggleDebugPrinting(boolean printDebug)
printDebug - if true, information is printed to the terminal; if false, it is silent.

public int getDebugCode()

public void setDebugCode(int debugCode)
debugCode - the debug code used for printing to the terminal

public void performIRL()
public double logLikelihood()

public double logLikelihoodOfTrajectory(EpisodeAnalysis ea, double weight)
ea - the trajectory
weight - the weight to assign the trajectory

public FunctionGradient logLikelihoodGradient()

public FunctionGradient logPolicyGrad(State s, GroundedAction ga)
s - the state in which the policy is queried
ga - the action for which the policy is queried.

protected static void addToVector(double[] sumVector, double[] deltaVector)
sumVector - the input vector to which the values in deltaVector will be added.
deltaVector - the vector values to add to sumVector.
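
The in-place addition described for addToVector can be sketched in a few lines. This is an illustration of the documented contract only: the class name is hypothetical, and the length check is an assumption added here, since the documentation does not say how mismatched lengths are handled.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the BURLAP source. */
class VectorAddSketch {

    /** Adds deltaVector to sumVector element-wise and stores the result in sumVector. */
    static void addToVector(double[] sumVector, double[] deltaVector) {
        // Assumption: mismatched lengths are rejected rather than silently truncated.
        if (sumVector.length != deltaVector.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        for (int i = 0; i < sumVector.length; i++) {
            sumVector[i] += deltaVector[i];
        }
    }

    public static void main(String[] args) {
        double[] sum = { 1.0, 2.0 };
        addToVector(sum, new double[] { 0.5, -2.0 });
        System.out.println(Arrays.toString(sum)); // prints [1.5, 0.0]
    }
}
```

Accumulating into an existing array avoids allocating a new vector for every trajectory when per-trajectory gradients are summed.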