Date: Fri, 25 Jan 2019
Time: 10:00 - 11:00
Location: L5
Speaker: Stephane Chretien
Organisation: NPL

Clustering is a central task in data analytics and is usually addressed using (i) statistical tools based on maximum likelihood estimators for mixture models, (ii) techniques based on network models such as the stochastic block model, or (iii) relaxations of the K-means approach based on semi-definite programming (or even simpler spectral approaches). Statistical approaches of type (i) often cannot be solved with sufficient guarantees, because the underlying cost function to be optimised is non-convex. Approaches (ii) and (iii) are amenable to convex programming but do not usually scale to large datasets. In the big data setting, one usually needs to resort to data subsampling, a preprocessing stage also known as "coreset selection". We will present this last approach and the problem of selecting a coreset for the special cases of K-means and spectral-type relaxations.
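
To make the idea of coreset selection concrete, here is a minimal sketch of one simple importance-sampling scheme for K-means. It is an illustration under stated assumptions, not necessarily the construction presented in the talk. Each point is drawn with a probability that mixes uniform sampling with sampling proportional to its squared distance from the data mean, and is then reweighted so that the weighted K-means cost on the sample is an unbiased estimate of the cost on the full dataset. The function names `sample_coreset` and `kmeans_cost` are hypothetical and chosen only for this example.

```python
import random


def sample_coreset(points, m, seed=0):
    """Draw m points (with replacement) and return (coreset, weights).

    Assumed scheme, for illustration only: sampling probability is
    half uniform, half proportional to squared distance from the mean.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Mean of the full dataset.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared distance of every point to the mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    if total > 0:
        probs = [0.5 / n + 0.5 * d / total for d in sq_dist]
    else:
        probs = [1.0 / n] * n  # all points coincide: fall back to uniform

    idx = rng.choices(range(n), weights=probs, k=m)
    coreset = [points[i] for i in idx]
    # Inverse-probability weights keep the weighted cost unbiased.
    weights = [1.0 / (m * probs[i]) for i in idx]
    return coreset, weights


def kmeans_cost(points, centers, weights=None):
    """Weighted K-means cost: sum of w_i * squared distance to nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p, w in zip(points, weights)
    )


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): two Gaussian blobs in 2D.
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(1000)]

    coreset, weights = sample_coreset(data, m=200)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    print("full cost   :", kmeans_cost(data, centers))
    print("coreset cost:", kmeans_cost(coreset, centers, weights))
```

Any weighted K-means solver can then be run on the small weighted sample instead of the full dataset. The uniform component of the sampling distribution caps every weight at 2n/m, which keeps the variance of the cost estimate under control.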

 

Last updated on 03 Apr 2022 01:32.