Kernel tests of homogeneity, independence, and multi-variable interaction

19 May 2014
We consider three nonparametric hypothesis testing problems: (1) given samples from distributions p and q, a homogeneity test determines whether to accept or reject p = q; (2) given a joint distribution p_xy over random variables x and y, an independence test investigates whether p_xy = p_x p_y; (3) given a joint distribution over several variables, we may test whether there exists a factorization (e.g., p_xyz = p_xy p_z, or, for the case of total independence, p_xyz = p_x p_y p_z). We present nonparametric tests for the three cases above, based on distances between embeddings of probability measures into reproducing kernel Hilbert spaces (RKHS), which constitute the test statistics (e.g., for independence, the distance is between the embedding of the joint distribution and that of the product of the marginals). The tests benefit from years of machine learning research on kernels for various domains, and thus apply to distributions on high-dimensional vectors, images, strings, graphs, groups, and semigroups, among others. The energy distance and distance covariance statistics are also shown to fall within the RKHS family when semimetrics of negative type are used. The final test (3) is of particular interest, as it may be used to detect cases where two independent causes individually have a weak influence on a third dependent variable, but their combined effect has a strong influence, even when these variables have high dimension.
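
As an illustration of the homogeneity test in case (1), the following is a minimal sketch of a statistic built from the distance between RKHS embeddings of two samples (the biased estimate of the squared maximum mean discrepancy). Several choices here are assumptions made for the example and are not taken from the abstract: the Gaussian kernel, the `bandwidth` parameter, the use of a permutation scheme to calibrate the null distribution, and all function names.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two points given as tuples of floats (assumed kernel)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_from_gram(gram, idx_x, idx_y):
    """Biased squared-MMD estimate from a Gram matrix on the pooled sample.

    Equals the squared RKHS distance between the two empirical mean embeddings.
    """
    def mean_block(rows, cols):
        return sum(gram[i][j] for i in rows for j in cols) / (len(rows) * len(cols))

    return (mean_block(idx_x, idx_x)
            + mean_block(idx_y, idx_y)
            - 2.0 * mean_block(idx_x, idx_y))


def mmd_permutation_test(x, y, n_perm=200, bandwidth=1.0, seed=0):
    """Return (statistic, p-value) for the null hypothesis p = q.

    The null distribution is approximated by randomly reassigning the pooled
    points to the two samples (an assumed calibration, chosen for simplicity).
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n, total = len(x), len(pooled)

    # Compute the kernel matrix once; each permutation only re-indexes it.
    gram = [[gaussian_kernel(p, q, bandwidth) for q in pooled] for p in pooled]

    idx = list(range(total))
    observed = mmd2_from_gram(gram, idx[:n], idx[n:])

    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if mmd2_from_gram(gram, idx[:n], idx[n:]) >= observed:
            exceed += 1

    # The +1 terms count the observed statistic itself, so the p-value is never 0.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `mmd_permutation_test(x, y)` with `x` and `y` given as lists of equal-length tuples returns the statistic and a permutation p-value; a small p-value is evidence against p = q. An independence test in the spirit of case (2) could plausibly reuse the same structure by comparing the embedding of the joint sample with that of the product of the marginals, but that variant is not shown here.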
  • Stochastic Analysis Seminar