Title: Efficient Algorithms for Structured Output Learning
Speaker: Mr. Balamurugan P
Date and Time: Wednesday, November 26, 2014, 12:00 PM
Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract
The task of classifying an input into one of two known output classes,
called binary classification (or two-class classification), is a
well-studied problem in machine learning. A natural extension of binary
classification is the multi-class classification problem, where the
output space contains more than two distinct classes. However, in many problems,
the output might be composed of multiple components which interact with
each other in a complex and structured manner, like a tree, graph,
sequence or an image. We refer to such outputs as structured outputs.
Applications which involve structured outputs are abundant in computer
vision, computational biology, natural language processing (NLP) and other
diverse fields. For example, in a simple NLP application like
part-of-speech tagging, the output is a tag sequence for a sentence. In an
image segmentation task, the output is a structured output consisting of
image segments. The machine learning task associated with learning a
classifier to predict structured outputs is referred to as structured
output learning or structured prediction. In this work, we consider
different problems associated with structured output learning.
We propose an efficient sequential dual optimization method for structural
Support Vector Machine (Structural SVM), an important tool for structured
output learning. The sequential optimization algorithm visits the training
examples one at a time and solves a simple optimization problem related to
the particular example being visited. The idea of processing one example
at a time and building the model incrementally makes the sequential method
advantageous for large-scale datasets. Experimental evidence shows that
the sequential dual method for structural SVM is an order of magnitude
faster than existing approaches such as the cutting-plane method and
stochastic gradient descent. Motivated by its success on structural SVMs, the
sequential dual method is extended to another popular model for structured
output learning, namely conditional random fields (CRFs). An extensive
empirical evaluation of the sequential method against various existing
approaches for structured output learning is presented and recommendations
useful for a practitioner are given regarding the choice of a particular
learning method over another. The sequential dual method is also adapted
using a coordinate descent procedure for a structural SVM with linear
cumulative loss.
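The one-example-at-a-time idea above can be illustrated on the simpler case of a plain binary linear SVM, where the per-example dual subproblem has a closed-form solution. The sketch below is an illustrative simplification, not the structural SVM algorithm itself (structural SVMs additionally require inference over the structured output space for each visited example):

```python
def dual_coordinate_descent_svm(X, y, C=1.0, epochs=20):
    """Sequential dual coordinate ascent for a linear SVM (hinge loss).

    Visits one training example at a time, solves the one-variable
    dual subproblem in closed form, and updates the primal weights w
    incrementally -- the "process one example, build the model
    incrementally" idea described in the abstract.
    """
    n, d = len(X), len(X[0])
    alpha = [0.0] * n                             # dual variable per example
    w = [0.0] * d                                 # primal weights, kept in sync
    Q = [sum(v * v for v in x) for x in X]        # Q_ii = ||x_i||^2
    for _ in range(epochs):
        for i in range(n):
            if Q[i] == 0.0:
                continue
            # gradient of the dual objective w.r.t. alpha_i
            g = y[i] * sum(w[j] * X[i][j] for j in range(d)) - 1.0
            # closed-form minimizer, projected onto the box [0, C]
            new_alpha = min(max(alpha[i] - g / Q[i], 0.0), C)
            delta = (new_alpha - alpha[i]) * y[i]
            for j in range(d):
                w[j] += delta * X[i][j]           # incremental primal update
            alpha[i] = new_alpha
    return w
```

In the structural SVM setting, the same per-example update is performed over dual variables indexed by candidate structured outputs, with inference used to find the most violating output.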
Obtaining labeled training examples is relatively costly for structured
output learning, as the labels take the form of trees, sequences or image
segments. Labeling sentences with their corresponding part-of-speech tags
or parse trees requires years of manual effort for large datasets. In such
scenarios, semi-supervised approaches for structured output learning are
useful. The thesis proposes a simple and efficient label-switching
algorithm for semi-supervised structured output learning with domain
constraints. The proposed algorithm works on the components of the output
label, in contrast to most of the existing approaches which consider the
structured output label as a whole entity and use it during training.
Empirical evidence on sequence labeling and natural language parsing shows
that the proposed label-switching procedure, though simple, achieves
generalization performance comparable to that of existing approaches.
Large datasets for structured output learning contain thousands of
examples and millions of features. Designing sparse classifiers for
structured output learning becomes useful in identifying the most relevant
features used for inference and in improving the inference speed. The
thesis explores the use of elastic net, a sparsity promoting regularizer
for structural SVM. An efficient sequential alternating proximal method
for solving the dual optimization problem of elastic net regularized
structural SVM is proposed. This algorithm achieves near state-of-the-art
generalization performance after a few passes over the data with very
sparse models. The sequential alternating proximal method for elastic net
regularized structural SVM is also naturally extensible to CRFs.
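For context, the elastic net penalty combines an L1 term (which promotes sparsity) with an L2 term, and its proximal operator, the basic building block of any proximal method for this regularizer, has a simple closed form. A minimal coordinate-wise sketch, assuming the penalty lam1*||w||_1 + (lam2/2)*||w||_2^2:

```python
def prox_elastic_net(v, step, lam1, lam2):
    """Closed-form proximal operator of lam1*||w||_1 + (lam2/2)*||w||_2^2.

    Coordinate-wise: soft-thresholding (from the L1 term) followed by a
    multiplicative shrinkage (from the L2 term). Coordinates whose
    magnitude falls below step*lam1 are set exactly to zero, which is
    what produces sparse models.
    """
    out = []
    for vj in v:
        mag = max(abs(vj) - step * lam1, 0.0)     # soft-threshold by step*lam1
        sign = 1.0 if vj >= 0 else -1.0
        out.append(sign * mag / (1.0 + step * lam2))  # shrink by the L2 term
    return out
```

A proximal method alternates a gradient (or dual) step on the loss with this operator, so sparsity appears directly in the iterates rather than only at convergence.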
The algorithm is further refined to a simple alternating optimization
method and is applied to elastic net regularized SVMs and logistic
regression for two-class classification. Experiments on large-scale
classification datasets show that the alternating optimization for elastic
net regularized SVMs and logistic regression achieves state-of-the-art
performance after a single pass through the data with extremely sparse
models. Finally, a fast algorithm using alternating direction method of
multipliers (ADMM) is proposed which improves the training speed of both
L1 and elastic net regularized structural SVM.
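The ADMM pattern referred to above alternates a step on the smooth loss, a step on the regularizer (soft-thresholding, for the L1 part), and a dual-variable update. A toy scalar sketch for minimizing 0.5*(x - a)^2 + lam*|x|, which isolates that three-step structure (the structural SVM version replaces the quadratic term with the structured hinge loss and works over weight vectors):

```python
def admm_scalar_lasso(a, lam, rho=1.0, iters=60):
    """ADMM for min_x 0.5*(x - a)**2 + lam*|x|, split as f(x) + g(z), x = z.

    x-update: minimize the quadratic plus the augmented penalty (closed form).
    z-update: proximal step of lam*|z|, i.e. soft-thresholding.
    u-update: running sum of the constraint residual x - z (scaled dual).
    """
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)                 # smooth-loss step
        v = x + u
        z = max(abs(v) - lam / rho, 0.0) * (1.0 if v >= 0 else -1.0)  # soft-threshold
        u += x - z                                            # dual update
    return x
```

The minimizer here is the soft-thresholded value of a (e.g. a = 3, lam = 1 gives x = 2), which the iterates approach geometrically.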
Host Faculty: Prof. Shirish K. Shevade
Speaker Bio:
Balamurugan P is a graduate student at the Department of Computer Science and
Automation, Indian Institute of Science, Bangalore.