Sunday 10 July 2022

SIAM AN 22 - Indefinite Large Scale Kernel Approximation: To Lose or to Preserve Information?


Abstract

Matrix approximations are a key element in large-scale algebraic machine learning approaches.
Focusing on similarities, a common assumption is that the underlying source is a positive
semi-definite (psd) kernel function.
This is often too strong a constraint and limits practical applications.
The approximation either cannot be validly applied or it changes the underlying
similarity function. Approximations can in fact also accidentally lead to non-psd representations.
In any case, modifications of the original or approximated similarities can have a severe
impact on the encoded information.

One may lose information or introduce disturbances.
This is particularly important if branch-and-bound approaches or mathematically
well-principled error minimizers are used to obtain predictive models.
Strategies to correct similarities before or after the approximation are
often based on spectral corrections, embeddings, or proxy approaches.
These corrections often do not scale to large data and counteract the approximation.
We explain the problem setting and detail traditional and recent developments
in the domain of indefinite learning to correct symmetric similarities at large scale.
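
For illustration, here is a minimal Python/NumPy sketch of the two classical spectral corrections (eigenvalue clipping and flipping) for a symmetric indefinite similarity matrix. The toy matrix and all names are illustrative, not material from the talk; note that the full eigendecomposition used here costs O(n^3) and is precisely the step that does not scale to large data.

    import numpy as np

    def spectral_correction(S, mode="clip"):
        """Correct a symmetric indefinite similarity matrix via its spectrum.

        mode="clip": set negative eigenvalues to zero (nearest psd matrix
                     in the Frobenius norm).
        mode="flip": replace each eigenvalue by its absolute value.
        """
        w, V = np.linalg.eigh(S)        # S = V diag(w) V^T, w real for symmetric S
        if mode == "clip":
            w = np.maximum(w, 0.0)
        elif mode == "flip":
            w = np.abs(w)
        return (V * w) @ V.T            # reassemble the corrected matrix

    # Toy indefinite similarity matrix (illustrative only)
    S = np.array([[1.0, 0.9, 0.2],
                  [0.9, 1.0, 0.9],
                  [0.2, 0.9, 1.0]])
    print(np.linalg.eigvalsh(S))        # one negative eigenvalue: S is not psd
    print(np.linalg.eigvalsh(spectral_correction(S, "clip")))  # >= 0 up to rounding

Both variants alter the encoded similarities: clipping discards the negative part of the spectrum, flipping keeps its magnitude but changes its sign. That is exactly the trade-off in the title: correcting towards psd-ness may lose information or introduce disturbances.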


Authors
  • Maximilian Muench, University of Groningen, The Netherlands, m.a.munch@rug.nl
  • Simon Heilig, University of Applied Sciences Würzburg-Schweinfurt, Germany
  • Frank-Michael Schleif, University of Applied Sciences Würzburg-Schweinfurt, Germany, frank-michael.schleif@fhws.de
 SIAM AN 22 - Slides

Friday 5 January 2018

Code for indefinite Core Vector Machine (iCVM) published

A simplified MATLAB code and an Armadillo/C++ implementation of the indefinite
core vector machine (iCVM) are published at iCVM - Indefinite Core Vector Machine
and on MLOSS; they implement the ideas published in the paper.
The C++ code additionally provides a new approach to (re-)sparsify the indefinite model
such that the final indefinite decision function remains sparse and permits an easy out-of-sample extension.
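
The (re-)sparsification approach itself is described in the paper and the C++ code; the sketch below only illustrates, with assumed names and an assumed tanh kernel, why a sparse kernel expansion permits an easy out-of-sample extension.

    import numpy as np

    def decide(x, X_ret, alpha, b, kernel):
        """Sparse kernel expansion f(x) = sum_i alpha_i k(x, x_i) + b.

        Only the few retained expansion points X_ret must be stored and
        evaluated against a new sample x, so out-of-sample prediction is
        cheap even though the kernel may be indefinite.
        """
        k = np.array([kernel(x, xi) for xi in X_ret])
        return float(alpha @ k + b)

    # tanh (sigmoid) kernels are in general non-psd, hence indefinite
    kernel = lambda x, y: np.tanh(0.5 * float(x @ y) - 1.0)

    X_ret = np.array([[0.0, 1.0], [1.0, 0.0]])  # retained expansion points
    alpha = np.array([0.7, -0.3])               # their coefficients
    print(np.sign(decide(np.array([0.2, 0.8]), X_ret, alpha, 0.1, kernel)))

Prediction time and memory scale with the number of retained expansion points, not with the training set size, which is what the re-sparsification preserves.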

Tuesday 6 June 2017

New Pattern Recognition paper on Indefinite Core Vector Machine


Our new article, Indefinite Core Vector Machine, is now online at Pattern Recognition (Elsevier):
http://www.sciencedirect.com/science/article/pii/S0031320317302261

Highlights

  • The Indefinite Core Vector Machine (iCVM) is proposed
  • Approximation concepts are provided that lead to linear runtime complexity under moderate assumptions (a sketch follows below)
  • A sparsification of the iCVM is proposed, showing that in many cases a low memory complexity can also be obtained with an acceptable loss in accuracy
  • The algorithm is compared to a number of related methods on multiple datasets, showing competitive performance at much lower computational and memory complexity
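
As an illustration of the second highlight above, a Nyström-type low-rank factorization is one standard way to obtain runtime linear in the number of samples n; whether it matches the paper's exact approximation concept is not claimed here, and the landmark selection and tanh kernel are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 10_000, 100                      # n samples, m << n landmarks
    X = rng.normal(size=(n, 2))
    L = X[rng.choice(n, size=m, replace=False)]

    def kernel(A, B):
        # illustrative indefinite (tanh) kernel between two sets of rows
        return np.tanh(A @ B.T - 0.5)

    K_nm = kernel(X, L)                     # n x m block, linear in n
    C = np.linalg.pinv(kernel(L, L))        # m x m pseudo-inverse, cheap for small m

    def K_matvec(v):
        """Product with the Nystroem approximation K ~ K_nm C K_nm^T in O(n m)."""
        return K_nm @ (C @ (K_nm.T @ v))

    v = rng.normal(size=n)
    print(K_matvec(v)[:3])                  # the full n x n matrix is never formed

Algorithms that only need such kernel-vector products can then run with cost linear in n per iteration, since the full n x n matrix is never materialized.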


Sunday 28 May 2017

Accepted paper @ ICANN 2017

A new paper proposing an indefinite Support Vector Regression will be presented at ICANN 2017.

Saturday 31 October 2015

Accepted paper @ SIMBAD 2015

Got a paper accepted on a large-scale Indefinite Kernel Fisher Discriminant,
to be presented at the next SIMBAD workshop. We present a way to obtain linear
runtime complexity for the iKFD algorithm, given that the input matrix is (approximately)
low rank; the original iKFD has cubic runtime complexity. iKFD is a very good
classification algorithm for indefinite / non-metric / non-positive input kernels, and
our proposal makes iKFD ready for large-scale problems.
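
To make the "(approximately) low rank" assumption concrete, the following sketch builds a synthetic indefinite similarity matrix of known low rank and inspects its eigenspectrum; the construction and the tolerance are assumptions for the example, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    D = np.diag([1.0, 1.0, 1.0, -1.0, -1.0])  # mixed signature -> indefinite
    K = X @ D @ X.T                            # symmetric, indefinite, rank 5

    w = np.linalg.eigvalsh(K)
    tol = 1e-8 * np.max(np.abs(w))
    print("negative eigenvalues:", int(np.sum(w < -tol)))   # 2
    print("effective rank:", int(np.sum(np.abs(w) > tol)))  # 5

If the effective rank m is small, a truncated factorization with only m eigenpairs can stand in for the full matrix, which is what brings the cubic cost of iKFD down towards linear runtime in the number of samples.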