Support Vector Machines: A geometric point of view
Summary
Support Vector Machines have been established as one of the major classification
and regression tools for Pattern Recognition and Signal Analysis. Over the last
decade a number of theoretical arguments have been developed in order to justify
their enhanced performance. The most widely known scenario is to look at them as
maximum margin classifiers. Another approach is via learning theory arguments and
the structural risk minimization principle, which leads to an optimal trade off
between performance and complexity. An alternative path is to look at the cost
function, associated with the SVMs, as a regularized minimizer that asymptotically
tends to the Bayesian classifier. A less known viewpoint is the geometric one that
leads to the notion of reduced convex hulls. For the non-separable class case, the
SVM solution is shown to be equivalent with computing the minimum distance between
two reduced versions of the original convex hulls that "encircle" the two classes
(for the two class case).
In this talk I will focus on the geometric approach and new results will be discussed
concerning a) novel, necessary for our case, theorems concerning the structure and
properties of the reduced convex hulls (RCH) and b) novel algorithms for computing
the minimum distance between the resulting RCH/s.
This problem is far from being trivial, since existing algorithms, which compute the
minimum distance between convex hulls, rely on their respective extreme points. However,
computing the extreme points of a reduced convex hull, as we have shown, is a computationally
hard task of a combinatorial nature. A basic projection theorem, that we have shown, will be
discussed that bypasses the combinatorial burden of the task and opens the way to employ
geometric minimum distance algorithms to the SVM task. Most important, this theorem "respects"
inner products, thus allowing to the well known kernel trick to be easily incorporated into the
algorithmic schemes, making them appropriate for the general nonlinear non-separable problem.
The derived geometric algorithms are much more efficient compared to the classical and widely
used SMO algorithm and its versions. A number of tests with well known test beds have shown that,
sometimes, a gain of an order of magnitude in the number of kernel computations, for similar error
rates, can be achieved. Furthermore, the new schemes are closer to our intuitive understanding of
an iterative algorithm in simple geometric arguments.