Scalable Data Analytics
Abstract
Very-large scale data analytics are an alleged golden goose for efforts in parallel and distributed computing, and yet contemporary statistics remain somewhat of a dark art for the uninitiated. In this presentation, we are going to take a mathematical and algorithmic look beyond the veil of Big Data by studying the structure of the algorithms and data, and by analyzing the fit to existing and proposed computer systems and programming models. Towards highly scalable kernels, we will also discuss some of the promises and challenges of approximation algorithms using randomization, sampling, and decoupled processing, touching some contemporary topics in parallel numerics.