public class MultipleIntentionsMLIRL
extends java.lang.Object
MLIRL
to perform the maximization step of the parameter values. EM is run for a specified number of iterations.
At initialization, the reward function parameters for each behavior cluster will be randomly assigned values between
-1 and 1. If you want to change this behavior, subclass this object and override the
initializeClusterRFParameters(java.util.List)
method.
1. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
Acknowledgements: Lei Yang for code on which this was based.
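The expectation side of EM, which pairs with the MLIRL maximization step described above, can be illustrated with a minimal sketch. It is not the library's code: the class name `ClusterPosteriorSketch`, the method `posterior`, and the plain-array inputs are assumptions made for illustration. It computes Pr(c | t) for one trajectory from the cluster priors Pr(c) and the log-likelihoods log Pr(t | c).

```java
import java.util.Arrays;

public class ClusterPosteriorSketch {

    /**
     * Returns Pr(c | t) for each cluster c, given the cluster priors Pr(c)
     * and the log-likelihoods log Pr(t | c) of a single trajectory t.
     * Hypothetical helper; not part of the documented class.
     */
    public static double[] posterior(double[] priors, double[] logLikelihoods) {
        int k = priors.length;
        double[] logWeighted = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            // log[Pr(c)] + log[Pr(t | c)]
            logWeighted[c] = Math.log(priors[c]) + logLikelihoods[c];
            max = Math.max(max, logWeighted[c]);
        }
        // Shift by the maximum before exponentiating so tiny likelihoods do not underflow.
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            sum += Math.exp(logWeighted[c] - max);
        }
        double[] post = new double[k];
        for (int c = 0; c < k; c++) {
            post[c] = Math.exp(logWeighted[c] - max) / sum;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        double[] logLikelihoods = {Math.log(0.03), Math.log(0.01)};
        System.out.println(Arrays.toString(posterior(priors, logLikelihoods)));
    }
}
```

With uniform priors the posterior depends only on the likelihood ratio, so the first cluster in this example receives about three times the weight of the second.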
Modifier and Type | Field and Description |
---|---|
protected double[] |
clusterPriors
The prior probabilities on each cluster.
|
protected java.util.List<MLIRLRequest> |
clusterRequests
The individual
MLIRLRequest objects for each behavior cluster. |
protected int |
debugCode
The debug code used for printing information to the terminal.
|
protected MLIRL |
mlirlInstance
The
MLIRL instance used to perform the maximization step
for each cluster's reward function parameter values. |
protected int |
numEMIterations
The number of EM iterations to run.
|
protected java.util.Random |
rand
A random object used for initializing each cluster's RF parameters randomly.
|
protected MultipleIntentionsMLIRLRequest |
request
The source problem request defining the problem to be solved.
|
Constructor and Description |
---|
MultipleIntentionsMLIRL(MultipleIntentionsMLIRLRequest request,
int emIterations,
double mlIRLLearningRate,
double maxMLIRLLikelihoodChange,
int maxMLIRLSteps)
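Initializes the algorithm with the problem request, the number of EM iterations, and the settings of the underlying MLIRL instance.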
|
Modifier and Type | Method and Description |
---|---|
protected double |
computeClusterTrajectoryLoggedNormalization(int t,
double[][] logWeightedLikelihoods)
Given a matrix holding the log[Pr(c)] + log[Pr(t | c)] values in its entries, where
Pr(c) is the probability of the cluster and Pr(t | c) is the probability of the trajectory given the cluster,
this method returns the log of the standard probability normalization factor for trajectory t in
the matrix.
|
protected double[][] |
computePerClusterMLIRLWeights()
Computes the probability of each trajectory being generated by each cluster and returns it in a matrix.
|
double[] |
computeProbabilityOfClustersGivenTrajectory(Episode t)
Returns the probability of each behavior cluster given the trajectory.
|
double[] |
getClusterPriors()
Returns the behavior cluster prior probabilities.
|
java.util.List<DifferentiableRF> |
getClusterRFs()
Returns the
DifferentiableRF objects defining each behavior cluster. |
int |
getDebugCode()
Returns the debug code used for printing to the terminal.
|
protected void |
initializeClusterRFParameters(java.util.List<DifferentiableRF> rfs)
Initializes the
DifferentiableRF parameters
for each cluster. |
protected void |
initializeClusters(int k,
QGradientPlannerFactory plannerFactory)
Initializes cluster data; i.e., it initializes RF parameters, cluster prior parameters (to uniform), and creates
MLIRLRequest
objects for each cluster. |
void |
performIRL()
Performs multiple intention inverse reinforcement learning.
|
protected void |
randomizeParameters(DifferentiableRF rf)
Randomizes the parameters for a given
DifferentiableRF . |
protected void |
randomizeParameters(double[] paramVec)
Randomizes parameters in the given vector between -1 and 1.
|
void |
setDebugCode(int debugCode)
Sets the debug code used for printing to the terminal.
|
void |
toggleDebugPrinting(boolean printDebug)
Sets whether information during learning is printed to the terminal.
|
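The normalization factor described for computeClusterTrajectoryLoggedNormalization can be sketched as a log-sum-exp over the clusters. This is an illustrative sketch, not the library's implementation: the class name `LogNormalizationSketch` is hypothetical, and the matrix layout (rows index clusters, columns index trajectories) is an assumption.

```java
public class LogNormalizationSketch {

    /**
     * Returns log( sum over clusters c of exp(logWeightedLikelihoods[c][t]) ).
     * Assumes rows index clusters and columns index trajectories (an assumption
     * made for this sketch).
     */
    public static double logNormalization(int t, double[][] logWeightedLikelihoods) {
        // Find the largest entry in column t.
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : logWeightedLikelihoods) {
            max = Math.max(max, row[t]);
        }
        // Sum the exponentials relative to the maximum to avoid underflow.
        double sum = 0.0;
        for (double[] row : logWeightedLikelihoods) {
            sum += Math.exp(row[t] - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[][] m = {
            {Math.log(0.2), -1000.0},
            {Math.log(0.3), -1001.0}
        };
        System.out.println(logNormalization(0, m));
        System.out.println(logNormalization(1, m));
    }
}
```

Subtracting the returned value from an entry log[Pr(c)] + log[Pr(t | c)] and exponentiating gives the normalized probability of cluster c for trajectory t. Shifting by the maximum keeps the result finite even when every entry is far below the range where `Math.exp` underflows.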
protected MultipleIntentionsMLIRLRequest request
protected java.util.List<MLIRLRequest> clusterRequests
The individual MLIRLRequest objects for each behavior cluster.
protected double[] clusterPriors
protected MLIRL mlirlInstance
The MLIRL instance used to perform the maximization step for each cluster's reward function parameter values.
protected int numEMIterations
protected int debugCode
protected java.util.Random rand
public MultipleIntentionsMLIRL(MultipleIntentionsMLIRLRequest request, int emIterations, double mlIRLLearningRate, double maxMLIRLLikelihoodChange, int maxMLIRLSteps)
request - the request that defines the problem.
emIterations - the number of EM iterations to perform.
mlIRLLearningRate - the learning rate of the underlying MLIRL instance.
maxMLIRLLikelihoodChange - the likelihood change threshold that causes MLIRL gradient ascent to stop.
maxMLIRLSteps - the maximum number of gradient ascent steps allowed by the underlying MLIRLRequest gradient ascent.
public void performIRL()
public double[] computeProbabilityOfClustersGivenTrajectory(Episode t)
t - the trajectory (stored as an Episode object) to evaluate.
public java.util.List<DifferentiableRF> getClusterRFs()
Returns the DifferentiableRF objects defining each behavior cluster.
public double[] getClusterPriors()
public void toggleDebugPrinting(boolean printDebug)
printDebug - if true, information is printed to the terminal; if false, it is silent.
public int getDebugCode()
public void setDebugCode(int debugCode)
debugCode - the debug code used for printing to the terminal.
protected double[][] computePerClusterMLIRLWeights()
protected double computeClusterTrajectoryLoggedNormalization(int t, double[][] logWeightedLikelihoods)
t - the trajectory in question.
logWeightedLikelihoods - the matrix of log[Pr(c)] + log[Pr(t | c)] values.
protected void initializeClusters(int k, QGradientPlannerFactory plannerFactory)
Initializes cluster data; i.e., it initializes RF parameters, cluster prior parameters (to uniform), and creates MLIRLRequest objects for each cluster.
k - the number of clusters.
plannerFactory - the QGradientPlannerFactory to use to generate a valueFunction for each cluster.
protected void initializeClusterRFParameters(java.util.List<DifferentiableRF> rfs)
Initializes the DifferentiableRF parameters for each cluster. Will set the parameters randomly between -1 and 1.
rfs - the DifferentiableRF objects whose parameters are to be initialized.
protected void randomizeParameters(DifferentiableRF rf)
Randomizes the parameters for a given DifferentiableRF.
rf - the DifferentiableRF whose parameters are to be randomized.
protected void randomizeParameters(double[] paramVec)
paramVec - the parameter vector to randomize.
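A minimal sketch of randomizing a parameter vector between -1 and 1 follows. The class name `RandomizeParametersSketch` is hypothetical, and passing the `java.util.Random` as an argument is an assumption made to keep the example self-contained; the documented method takes only `paramVec` and presumably draws from the protected `rand` field.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomizeParametersSketch {

    /**
     * Overwrites every entry of paramVec with a uniform random value in [-1, 1).
     * The Random argument is an assumption of this sketch.
     */
    public static void randomizeParameters(double[] paramVec, Random rand) {
        for (int i = 0; i < paramVec.length; i++) {
            // nextDouble() is uniform in [0, 1); rescale it to [-1, 1).
            paramVec[i] = 2.0 * rand.nextDouble() - 1.0;
        }
    }

    public static void main(String[] args) {
        double[] params = new double[4];
        randomizeParameters(params, new Random(42L));
        System.out.println(Arrays.toString(params));
    }
}
```

Seeding the `Random` makes the initial cluster parameters reproducible across runs, which can help when comparing EM results.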