We consider the problem of large-scale regression where both the number of predictors, p, and the number of observations, n, may be in the order of millions or more. Computing a simple OLS or ridge regression estimator for such data, though potentially sensible from a purely statistical perspective (if n is large enough), can be a real computational challenge. One recent approach to tackling this problem in the common situation where the matrix of predictors is sparse, is to first compress the data by mapping it to an n by L matrix with L << p, using a scheme called b-bit min-wise hashing (Li and König, 2011). We study this technique from a theoretical perspective and obtain finite-sample bounds on the prediction error of regression following such data compression, showing how it exploits the sparsity of the data matrix to achieve good statistical performance. Surprisingly, we also find that a main effects model in the compressed data is able to approximate an interaction model in the original data. Fitting interactions requires no modification of the compression scheme, but only a higher-dimensional mapping with a larger L.
This is joint work with Nicolai Meinshausen (ETH Zürich).
- Stochastic Analysis Seminar