public class MLIRL
extends java.lang.Object
An implementation of Maximum-Likelihood Inverse Reinforcement Learning [1]. This class takes as input (from a MLIRLRequest object) a set of expert trajectories through a domain and a DifferentiableRF model, and learns the parameters of the reward function model that maximize the likelihood of the trajectories. The reward function parameter space is searched using gradient ascent. Since the likelihood is a non-linear function of the parameters, gradient ascent may get stuck in local optima. The policy gradient is computed by iteratively replanning after each gradient ascent step with the DifferentiableQFunction instance provided in the MLIRLRequest object.

Gradient ascent stops either after a fixed number of steps or once the change in likelihood falls below some threshold. If the maximum number of steps is set to -1, it continues until the change in likelihood falls below the threshold.
1. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
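For orientation, a minimal usage sketch follows. It assumes a fully configured MLIRLRequest (domain, expert Episodes, DifferentiableRF, and a differentiable planner) is produced by a hypothetical `buildRequest()` helper; the hyperparameter values are illustrative, not defaults of this API.

```java
// buildRequest() is a hypothetical helper returning a configured MLIRLRequest
// (domain, expert episodes, DifferentiableRF, differentiable planner).
MLIRLRequest request = buildRequest();

// Learning rate 0.1; stop when the log-likelihood changes by less than 0.01,
// or after at most 10 gradient ascent steps (-1 would rely on the threshold alone).
MLIRL irl = new MLIRL(request, 0.1, 0.01, 10);
irl.performIRL();

// The request's DifferentiableRF parameters now hold the maximum-likelihood
// estimate; the final log-likelihood can be inspected:
double finalLogLikelihood = irl.logLikelihood();
```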
Modifier and Type | Field and Description |
---|---|
`protected int` | `debugCode`: The debug code used for printing information to the terminal. |
`protected double` | `learningRate`: The gradient ascent learning rate. |
`protected double` | `maxLikelihoodChange`: The likelihood change threshold to stop gradient ascent. |
`protected int` | `maxSteps`: The maximum number of steps of gradient ascent. |
`protected MLIRLRequest` | `request`: The MLIRL request defining the IRL problem. |
Constructor and Description |
---|
MLIRL(MLIRLRequest request,
double learningRate,
double maxLikelihoodChange,
int maxSteps)
Initializes.
|
Modifier and Type | Method and Description |
---|---|
`protected static void` | `addToVector(double[] sumVector, double[] deltaVector)`: Performs a vector addition and stores the result in `sumVector`. |
`int` | `getDebugCode()`: Returns the debug code used for printing to the terminal. |
`double` | `logLikelihood()`: Computes and returns the log-likelihood of all expert trajectories under the current reward function parameters. |
`FunctionGradient` | `logLikelihoodGradient()`: Computes and returns the gradient of the log-likelihood of all trajectories. |
`double` | `logLikelihoodOfTrajectory(Episode ea, double weight)`: Computes and returns the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight. |
`FunctionGradient` | `logPolicyGrad(State s, Action ga)`: Computes and returns the gradient of the Boltzmann policy for the given state and action. |
`void` | `performIRL()`: Runs gradient ascent. |
`void` | `setDebugCode(int debugCode)`: Sets the debug code used for printing to the terminal. |
`void` | `setRequest(MLIRLRequest request)`: Sets the `MLIRLRequest` object defining the IRL problem. |
`void` | `toggleDebugPrinting(boolean printDebug)`: Sets whether information during learning is printed to the terminal. |
Field Detail

`protected MLIRLRequest request`
The MLIRL request defining the IRL problem.

`protected double learningRate`
The gradient ascent learning rate.

`protected double maxLikelihoodChange`
The likelihood change threshold to stop gradient ascent.

`protected int maxSteps`
The maximum number of steps of gradient ascent. When set to -1, termination is governed by `maxLikelihoodChange` alone.

`protected int debugCode`
The debug code used for printing information to the terminal.
Constructor Detail

`public MLIRL(MLIRLRequest request, double learningRate, double maxLikelihoodChange, int maxSteps)`
Initializes.
Parameters:
- `request` - the problem request definition
- `learningRate` - the gradient ascent learning rate
- `maxLikelihoodChange` - the likelihood change threshold that must be reached to terminate gradient ascent
- `maxSteps` - the maximum number of gradient ascent steps allowed before termination is forced; set to -1 to rely only on the likelihood threshold
Method Detail

`public void setRequest(MLIRLRequest request)`
Sets the MLIRLRequest object defining the IRL problem.
Parameters:
- `request` - the MLIRLRequest object defining the IRL problem

`public void toggleDebugPrinting(boolean printDebug)`
Sets whether information during learning is printed to the terminal.
Parameters:
- `printDebug` - if true, information is printed to the terminal; if false, it is silent

`public int getDebugCode()`
Returns the debug code used for printing to the terminal.

`public void setDebugCode(int debugCode)`
Sets the debug code used for printing to the terminal.
Parameters:
- `debugCode` - the debug code used for printing to the terminal

`public void performIRL()`
Runs gradient ascent.
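The stopping rule described in the class overview corresponds roughly to the following control flow (a sketch, not the library's actual source; `ascendAndReplan()` is a hypothetical stand-in for one gradient ascent step followed by replanning):

```java
double lastLikelihood = logLikelihood();
int steps = 0;
while (maxSteps == -1 || steps < maxSteps) {
    ascendAndReplan(); // hypothetical: one gradient step on the reward parameters, then replan
    double newLikelihood = logLikelihood();
    if (Math.abs(newLikelihood - lastLikelihood) < maxLikelihoodChange) {
        break; // change in likelihood fell below the threshold
    }
    lastLikelihood = newLikelihood;
    steps++;
}
```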
`public double logLikelihood()`
Computes and returns the log-likelihood of all expert trajectories under the current reward function parameters.
`public double logLikelihoodOfTrajectory(Episode ea, double weight)`
Computes and returns the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight.
Parameters:
- `ea` - the trajectory
- `weight` - the weight to assign the trajectory
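In symbols (our notation, not the library's; it assumes the Boltzmann policy $\pi_\theta$ induced by the differentiable Q-function with reward parameters $\theta$ and a temperature parameter $\beta$): for a trajectory $\tau = (s_0, a_0, \ldots, s_{T-1}, a_{T-1}, s_T)$ with weight $w$,

$$\log L(\tau;\theta) = w \sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t), \qquad \pi_\theta(a \mid s) = \frac{e^{\beta Q_\theta(s,a)}}{\sum_{a'} e^{\beta Q_\theta(s,a')}}.$$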
`public FunctionGradient logLikelihoodGradient()`
Computes and returns the gradient of the log-likelihood of all trajectories.

`public FunctionGradient logPolicyGrad(State s, Action ga)`
Computes and returns the gradient of the Boltzmann policy for the given state and action.
Parameters:
- `s` - the state in which the policy is queried
- `ga` - the action for which the policy is queried
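For the Boltzmann policy written above, the gradient of the log-policy takes the standard form (again a sketch in our notation, assuming the same $\beta$ and $Q_\theta$):

$$\nabla_\theta \log \pi_\theta(a \mid s) = \beta \left( \nabla_\theta Q_\theta(s,a) - \sum_{a'} \pi_\theta(a' \mid s)\, \nabla_\theta Q_\theta(s,a') \right).$$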
`protected static void addToVector(double[] sumVector, double[] deltaVector)`
Performs a vector addition and stores the result in `sumVector`.
Parameters:
- `sumVector` - the input vector to which the values in `deltaVector` will be added
- `deltaVector` - the vector values to add to `sumVector`
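Since the body of `addToVector` is not shown here, a minimal sketch consistent with the description (not necessarily the library's exact source):

```java
// Element-wise, in-place addition of deltaVector into sumVector; assumes the
// two arrays have the same length.
protected static void addToVector(double[] sumVector, double[] deltaVector) {
    for (int i = 0; i < sumVector.length; i++) {
        sumVector[i] += deltaVector[i];
    }
}
```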