Title: Locally Non-linear Embeddings for Extreme Multi-label Learning
Speaker: Prateek Jain, MSR Bangalore
Venue: SERC 102
Date and Time: Mar 9 (Mon), 1-2pm
Abstract:
The objective in extreme multi-label classification is to learn a
classifier that can automatically tag a data point with the most relevant
subset of labels from an extremely large label set. State-of-the-art
embedding based approaches make training and prediction tractable by
assuming that the training label matrix is nearly low-rank and hence the
effective number of labels can be reduced by linearly projecting the high
dimensional label vectors onto a low dimensional subspace. However, most
real-world problems violate the low-rank assumption as they have a long
tail of labels, i.e., several labels appear in only a small number of data
points.
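The low-rank assumption described above can be sketched in a few lines: if the n x L training label matrix Y is nearly low-rank, the label vectors can be compressed by projecting onto the top-d singular subspace. This is an illustrative toy (function and variable names are mine, not from any specific embedding method):

```python
import numpy as np

def low_rank_label_projection(Y, d):
    """Project n x L label vectors onto the top-d singular subspace
    and return the d-dimensional codes plus the rank-d reconstruction."""
    # Thin SVD of the label matrix: Y = U S Vt
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Vt[:d]        # d x L projection onto top-d right singular vectors
    Z = Y @ P.T       # n x d compressed label representations
    Y_hat = Z @ P     # rank-d reconstruction of Y
    return Z, Y_hat
```

When Y truly has rank at most d, the reconstruction is exact; on real long-tailed label matrices the residual stays large, which is the failure mode the talk highlights.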
In this talk, I will discuss our proposed X-One algorithm, which is based on
the observation that the label vectors lie on a low-dimensional non-linear
manifold rather than in a linear subspace. This implies that the distances to
only a few nearest neighbors can be preserved by a low-dimensional
embedding. X-One therefore learns non-linear embeddings of the label
vectors such that only the k-nearest neighbor distances are preserved in
the training set. In addition, X-One also learns separate non-linear
embeddings in each local region of space so as to reduce training and
prediction time as well as further boost prediction accuracy.
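The core idea of preserving only k-nearest-neighbor distances can be sketched as a toy stress minimization: embed the label vectors in a low dimension so that, for each point, distances to its k nearest neighbors in label space are matched. This is a hypothetical simplification of mine for illustration, not the actual X-One training procedure:

```python
import numpy as np

def knn_preserving_embedding(Y, dim=2, k=3, lr=0.01, iters=1000, seed=0):
    """Toy gradient descent that embeds label vectors Y (n x L) into `dim`
    dimensions, preserving only each point's k-nearest-neighbor distances."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # Pairwise distances between label vectors in the original space
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Indices of each point's k nearest neighbors (column 0 is the point itself)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    Z = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(iters):
        grad = np.zeros_like(Z)
        for i in range(n):
            for j in nn[i]:
                diff = Z[i] - Z[j]
                dz = np.linalg.norm(diff) + 1e-9
                # Gradient of (dz - D[i, j])**2 w.r.t. Z[i] and Z[j]
                g = 2.0 * (dz - D[i, j]) * diff / dz
                grad[i] += g
                grad[j] -= g
        Z -= lr * grad
    return Z, nn
```

Distances to non-neighbors are left unconstrained, which is what lets the embedding dimension stay small even when the label matrix itself is far from low-rank. Learning separate such embeddings per local region, as the talk describes, would amount to first clustering the points and running this per cluster.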
Experiments reveal that X-One's prediction accuracy can be up to 30% higher
than that of leading embedding methods. Furthermore, we can train X-One on a
data set with more than a million labels, which is beyond the reach of
state-of-the-art methods.
Joint work with Kush Bhatia, Himanshu Jain, Purushottam Kar, and Manik Varma