Multiple kernel learning

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of those kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select an optimal kernel and parameters from a larger set of kernels, which reduces bias due to kernel selection and allows for more automated machine learning methods, and b) the ability to combine data from different sources that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.
Multiple kernel learning approaches have been used in many applications, such as event recognition in video, object recognition in images, and biomedical data fusion.

Algorithms

Multiple kernel learning algorithms have been developed for supervised, semi-supervised, and unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels; however, many other algorithms have been developed. The basic idea behind multiple kernel learning algorithms is to add an extra parameter to the minimization problem of the learning algorithm. As an example, consider the case of supervised learning of a linear combination of a set of $n$ kernels $K_1, \ldots, K_n$. We introduce a new kernel $K' = \sum_{i=1}^{n} \beta_i K_i$, where $\beta$ is a vector of coefficients for each kernel. Because the kernels are additive, this new function is still a kernel. For a set of data $X$ with labels $Y$, the minimization problem can then be written as
$$\min_{\beta, c} \mathrm{E}(Y, K'c) + R(K, c)$$
where $\mathrm{E}$ is an error function and $R$ is a regularization term. $\mathrm{E}$ is typically the square loss function or the hinge loss function, and $R$ is usually an $\ell_p$ norm or some combination of the norms. This optimization problem can then be solved by standard optimization methods. Adaptations of existing techniques such as Sequential Minimal Optimization have also been developed for multiple kernel SVM-based methods.
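The linear combination described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the two base kernels (linear and Gaussian), the fixed weights `beta`, the regularization strength `lam`, and all function names are hypothetical choices, and the weights are held fixed rather than learned. With non-negative weights the combined Gram matrix stays positive semidefinite, and with the square loss and fixed weights the inner problem reduces to kernel ridge regression.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) Gram matrix between the rows of X and the rows of Z."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def linear_kernel(X, Z):
    """Plain dot-product Gram matrix."""
    return X @ Z.T

def combine_kernels(kernels, beta):
    """Return K' = sum_i beta_i * K_i for non-negative weights beta."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("weights must be non-negative to keep K' a valid kernel")
    return sum(b * K for b, K in zip(beta, kernels))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # toy data, not from any real source
y = np.sign(X[:, 0])                  # toy labels in {-1, +1}

kernels = [linear_kernel(X, X), rbf_kernel(X, X, gamma=0.5)]
beta = np.array([0.3, 0.7])           # fixed here; an MKL method would learn these
K_combined = combine_kernels(kernels, beta)

# With beta fixed and the square loss, solving for c is kernel ridge regression.
lam = 1e-2                            # hypothetical regularization strength
c = np.linalg.solve(K_combined + lam * np.eye(len(y)), y)
train_predictions = K_combined @ c
```

A full multiple kernel learning method would alternate between (or jointly solve for) `c` and `beta` instead of fixing `beta` by hand.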

Supervised learning

For supervised learning, there are many other algorithms that use different methods to learn the form of the kernel. The following categorization has been proposed by Gönen and Alpaydın:

Fixed rules approaches

Fixed rules approaches such as the linear combination algorithm described above use rules to set the combination of the kernels. These do not require parameterization and use rules like summation and multiplication to combine the kernels. The weighting is learned in the algorithm. Other examples of fixed rules include pairwise kernels, which are of the form
$$k((x_{1i}, x_{1j}), (x_{2i}, x_{2j})) = k(x_{1i}, x_{2i})\,k(x_{1j}, x_{2j}) + k(x_{1i}, x_{2j})\,k(x_{1j}, x_{2i}).$$
These pairwise approaches have been used in predicting protein-protein interactions.
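A pairwise kernel of this kind can be sketched as follows, assuming the common symmetric product form in which the kernel between two pairs is the sum of the two ways of matching their members. The dot-product base kernel and the function names are hypothetical choices for illustration.

```python
import numpy as np

def base_kernel(a, b):
    """Hypothetical base kernel: a plain dot product between feature vectors."""
    return float(np.dot(a, b))

def pairwise_kernel(pair1, pair2, k=base_kernel):
    """Kernel between two pairs (x_1i, x_1j) and (x_2i, x_2j).

    Sums both ways of matching the members, so the value does not depend
    on the order of the members inside either pair.
    """
    x1i, x1j = pair1
    x2i, x2j = pair2
    return k(x1i, x2i) * k(x1j, x2j) + k(x1i, x2j) * k(x1j, x2i)

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])
c = np.array([1.0, 1.0])
d = np.array([2.0, 1.0])

value = pairwise_kernel((a, b), (c, d))
swapped = pairwise_kernel((b, a), (c, d))   # same pair, members reordered
```

The order invariance is what makes this form a natural fit for unordered pairs such as interacting proteins.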

Heuristic approaches

These algorithms use a combination function that is parameterized. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix. Examples of these include the kernel from Tanabe et al. Letting $\pi_m$ be the accuracy obtained using only $K_m$, and letting $\delta$ be a threshold less than the minimum of the single-kernel accuracies, we can define
$$\beta_m = \frac{\pi_m - \delta}{\sum_{h=1}^{n} (\pi_h - \delta)}.$$
Other approaches use a definition of kernel similarity, such as
$$A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle}{\sqrt{\langle K_1, K_1 \rangle \langle K_2, K_2 \rangle}}.$$
Using this measure, Qiu and Lane used the following heuristic to define
$$\beta_m = \frac{A(K_m, YY^T)}{\sum_{h=1}^{n} A(K_h, YY^T)}.$$
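Both heuristics can be sketched briefly. The code assumes that the accuracy-based weights are the single-kernel accuracies shifted by the threshold and normalized to sum to one, and that kernel similarity is the Frobenius inner product of two Gram matrices divided by the product of their Frobenius norms, with the label outer product as the target kernel. Function names and the toy inputs are hypothetical.

```python
import numpy as np

def accuracy_weights(accuracies, delta):
    """beta_m = (pi_m - delta) / sum_h (pi_h - delta)."""
    pi = np.asarray(accuracies, dtype=float)
    if delta >= pi.min():
        raise ValueError("delta must be below the smallest single-kernel accuracy")
    shifted = pi - delta
    return shifted / shifted.sum()

def alignment(K1, K2):
    """A(K1, K2) = <K1, K2> / sqrt(<K1, K1> <K2, K2>), Frobenius inner product."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def alignment_weights(kernels, y):
    """Weight each kernel by its alignment with the label kernel y y^T."""
    target = np.outer(y, y)
    scores = np.array([alignment(K, target) for K in kernels])
    return scores / scores.sum()

# Toy single-kernel accuracies and threshold (made-up numbers).
beta_acc = accuracy_weights([0.80, 0.70, 0.60], delta=0.50)

# Toy kernels: one matches the labels exactly, one ignores them.
y = np.array([1.0, 1.0, -1.0, -1.0])
beta_align = alignment_weights([np.outer(y, y), np.eye(4)], y)
```

In both cases the weights sum to one and larger weights go to kernels that look more useful on their own.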

Optimization approaches

These approaches solve an optimization problem to determine parameters for the kernel combination function. This has been done with similarity measures and structural risk minimization approaches. For similarity measures such as the one defined above, the problem can be formulated as follows:
$$\max_{\beta,\ \operatorname{tr}(K'_{tra}) = 1,\ K' \geq 0} A(K'_{tra}, YY^T)$$
where $K'_{tra}$ is the kernel of the training set.
Structural risk minimization approaches that have been used include linear approaches, such as that used by Lanckriet et al. We can define the implausibility of a kernel $\omega(K)$ to be the value of the objective function after solving a canonical SVM problem. We can then solve the following minimization problem:
$$\min_{\operatorname{tr}(K'_{tra}) = c} \omega(K'_{tra})$$
where $c$ is a positive constant.
Many other variations exist on the same idea, with different methods of refining and solving the problem, e.g. with nonnegative weights for individual kernels and using non-linear combinations of kernels.
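As a toy illustration of the similarity-based formulation, the sketch below maximizes the alignment between a combined training kernel and the label kernel for just two base kernels. It is not the method of any cited paper: it assumes unit-trace base kernels combined with convex weights (so the combination keeps trace 1 and stays positive semidefinite) and replaces a proper solver with a grid search. All names are hypothetical.

```python
import numpy as np

def alignment(K1, K2):
    """Frobenius inner product of K1 and K2, normalized by their norms."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def best_alignment_weight(K_a, K_b, y, steps=101):
    """Grid search over beta in [0, 1] for K' = beta * K_a + (1 - beta) * K_b."""
    target = np.outer(y, y)
    # Normalize each kernel to unit trace so every convex combination has trace 1.
    K_a = K_a / np.trace(K_a)
    K_b = K_b / np.trace(K_b)
    grid = np.linspace(0.0, 1.0, steps)
    scores = [alignment(b * K_a + (1.0 - b) * K_b, target) for b in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

y = np.array([1.0, 1.0, -1.0, -1.0])
K_informative = np.outer(y, y)        # perfectly aligned with the labels
K_uninformative = np.eye(4)           # carries no label information
beta, score = best_alignment_weight(K_informative, K_uninformative, y)
```

On this toy input the search puts all weight on the informative kernel. With more than two kernels a grid search scales poorly, and a constrained optimizer would be the more realistic choice.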

Bayesian approaches

Bayesian approaches put priors on the kernel parameters and learn the parameter values from the priors and the base algorithm. For example, the decision function can be written as
$$f(x) = \sum_{i=0}^{n} \alpha_i \sum_{m=1}^{p} \eta_m K_m(x_i^m, x^m).$$
$\eta$ can be modeled with a Dirichlet prior and $\alpha$ can be modeled with a zero-mean Gaussian and an inverse gamma variance prior. This model is then optimized using a customized multinomial probit approach with a Gibbs sampler.
These methods have been used successfully in applications such as protein fold recognition and protein homology problems.

Boosting approaches

Boosting approaches add new kernels iteratively until a stopping criterion that is a function of performance is reached. An example of this is the MARK model developed by Bennett et al.
$$f(x) = \sum_{i=1}^{N} \sum_{m=1}^{P} \alpha_i^m K_m(x_i^m, x^m) + b$$
The parameters $\alpha_i^m$ and $b$ are learned by gradient descent on a coordinate basis. In this way, each iteration of the descent algorithm identifies the best kernel column to choose at that iteration and adds it to the combined kernel. The model is then rerun to generate the optimal weights $\alpha_i^m$ and $b$.

Semisupervised learning

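Semisupervised learning approaches to multiple kernel learning are similar to other extensions of supervised learning approaches. An inductive procedure has been developed that uses a log-likelihood empirical loss and group LASSO regularization with conditional expectation consensus on unlabeled data for image categorization. We can define the problem as follows. Let $L = \{(x_i, y_i)\}$ be the labeled data, and let $U = \{x_i\}$ be the set of unlabeled data. Then, we can write the decision function as follows.
$$f(x) = \alpha_0 + \sum_{i=1}^{|L|} \alpha_i K_i(x)$$
The problem can be written as
$$\min_f \mathcal{L}(f) + \lambda R(f) + \gamma \Theta(f)$$
where $\mathcal{L}$ is the loss function, $R$ is the regularization term, and $\Theta$ is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density (MKD) for all the data be
$$g_m^\pi(x) = \langle \phi_m^\pi, \psi_m(x) \rangle$$
where $\psi_m(x) = [K_m(x_1, x), \ldots, K_m(x_{|L|}, x)]^T$ and $\phi_m^\pi$ is a non-negative random vector with a 2-norm of 1. The value of $\Pi$ is the number of times each kernel is projected. Expectation regularization is then performed on the MKD, resulting in a reference expectation $q_m^\pi(y \mid g_m^\pi(x))$ and model expectation $p_m^\pi(f(x) \mid g_m^\pi(x))$. Then, we define
$$\Theta = \frac{1}{\Pi} \sum_{\pi=1}^{\Pi} \sum_{m=1}^{M} D\left(q_m^\pi(y \mid g_m^\pi(x)) \,\|\, p_m^\pi(f(x) \mid g_m^\pi(x))\right)$$
where $D(Q \| P) = \sum_i Q(i) \ln \frac{Q(i)}{P(i)}$ is the Kullback-Leibler divergence.
The combined minimization problem is optimized using a modified block gradient descent algorithm. For more information, see Wang et al.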
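The Kullback-Leibler divergence that the consensus penalty is built from can be computed for discrete distributions as in the sketch below. Only the divergence itself is shown, not the full penalty or the block gradient descent. The function name and the two toy distributions are hypothetical.

```python
import numpy as np

def kl_divergence(q, p):
    """D(Q || P) = sum_i Q(i) * ln(Q(i) / P(i)) for discrete distributions.

    Assumes P(i) > 0 wherever Q(i) > 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                      # terms with Q(i) = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

reference = [0.5, 0.5]                # toy reference expectation
model = [0.9, 0.1]                    # toy model expectation
penalty = kl_divergence(reference, model)
no_penalty = kl_divergence(reference, reference)
```

The divergence is zero when the model expectation matches the reference expectation and grows as they disagree, which is what makes it usable as a consensus penalty.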

Unsupervised learning

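Unsupervised multiple kernel learning algorithms have also been proposed by Zhuang et al. The problem is defined as follows. Let $U = \{x_i\}$ be a set of unlabeled data. The kernel definition is the linear combined kernel $K' = \sum_{m=1}^{M} \beta_m K_m$. In this problem, the data needs to be "clustered" into groups based on the kernel distances. Let $B_i$ be a group or cluster of which $x_i$ is a member. We define the loss function as $\sum_{i=1}^{n} \left\| x_i - \sum_{x_j \in B_i} K'(x_i, x_j) x_j \right\|^2$. Furthermore, we minimize the distortion by minimizing $\sum_{i=1}^{n} \sum_{x_j \in B_i} K'(x_i, x_j) \left\| x_i - x_j \right\|^2$. Finally, we add a regularization term to avoid overfitting. Combining these terms, we can write the minimization problem as follows.
$$\min_{\beta, B} \sum_{i=1}^{n} \left\| x_i - \sum_{x_j \in B_i} K'(x_i, x_j) x_j \right\|^2 + \gamma_1 \sum_{i=1}^{n} \sum_{x_j \in B_i} K'(x_i, x_j) \left\| x_i - x_j \right\|^2 + \gamma_2 \sum_{i} |B_i|$$
where $\gamma_1$ and $\gamma_2$ weight the distortion and regularization terms. One formulation of the groups is defined as follows. Let $D \in \{0, 1\}^{n \times n}$ be a matrix such that $D_{ij} = 1$ means that $x_i$ and $x_j$ are neighbors. Then, $B_i = \{x_j : D_{ij} = 1\}$. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for $K'$ and the groups $B_i$. For more information, see Zhuang et al.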
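To make the three terms concrete, the sketch below evaluates such an objective for fixed kernel weights and a fixed binary neighbor matrix. It assumes the objective is a reconstruction loss, plus a kernel-weighted distortion term, plus a penalty on the total group size. It performs no optimization, and the names `umkl_objective`, `gamma1`, and `gamma2` are hypothetical.

```python
import numpy as np

def umkl_objective(X, kernels, beta, D, gamma1, gamma2):
    """Evaluate an unsupervised MKL objective for fixed weights and groups.

    X       : (n, d) data matrix
    kernels : list of (n, n) Gram matrices
    beta    : kernel weights
    D       : (n, n) binary matrix, D[i, j] = 1 if x_j is in the group of x_i
    """
    K = sum(b * Km for b, Km in zip(beta, kernels))
    W = K * D                                  # kernel values restricted to neighbors
    # Reconstruction loss: || x_i - sum_{j in B_i} K(x_i, x_j) x_j ||^2 summed over i.
    reconstruction = np.sum((X - W @ X) ** 2)
    # Distortion: K(x_i, x_j) * || x_i - x_j ||^2 summed over neighbor pairs.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    distortion = np.sum(W * sq_dists)
    # Regularization: total size of all groups.
    group_size = np.sum(D)
    return reconstruction + gamma1 * distortion + gamma2 * group_size

X = np.array([[1.0, 0.0], [0.0, 1.0]])
K1 = np.array([[1.0, 0.5], [0.5, 1.0]])        # toy Gram matrix
D_neighbors = np.array([[0, 1], [1, 0]])       # each point is the other's neighbor
D_empty = np.zeros((2, 2), dtype=int)          # no neighbors at all

value = umkl_objective(X, [K1], [1.0], D_neighbors, gamma1=0.1, gamma2=0.01)
baseline = umkl_objective(X, [K1], [1.0], D_empty, gamma1=0.1, gamma2=0.01)
```

An alternating scheme would repeatedly hold the neighbor matrix fixed while updating the kernel weights, then hold the weights fixed while updating the neighbor matrix, evaluating an objective like this one at each step.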

Libraries

Available MKL libraries include

: A scalable C++ MKL SVM library that can handle a million kernels.
: Generalized Multiple Kernel Learning code in MATLAB, does and regularization for supervised learning.
: A different MATLAB MKL code that can also perform elastic net regularization
: C++ source code for a Sequential Minimal Optimization MKL algorithm. Does $p$-norm regularization.
: A MATLAB code based on the SimpleMKL algorithm for MKL SVM.
: A scikit-compliant Python framework for MKL and kernel machines, with different algorithms such as EasyMKL.
