Optimization, Speed-up, and Out-of-Distribution Prediction in Deep Learning
Abstract
In this talk, I will introduce our investigations on making deep learning easier to optimize, faster to train, and more robust in out-of-distribution prediction. Specifically, we design a group-invariant optimization framework for ReLU neural networks; we compensate for the gradient delay in asynchronous distributed training; and we improve out-of-distribution prediction by incorporating "causal" invariance.
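The abstract does not spell out which group invariance is meant; for ReLU networks the standard one is positive scaling invariance: multiplying a hidden unit's incoming weights by any c > 0 and dividing its outgoing weights by c leaves the network function unchanged. The following is a minimal NumPy sketch of that invariance, not the talk's actual optimization framework; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights
W2 = rng.normal(size=(2, 4))   # hidden -> output weights
x = rng.normal(size=3)

def net(W1, W2, x):
    # One hidden ReLU layer; ReLU(c*z) = c*ReLU(z) for c > 0 is what
    # makes the rescaling below function-preserving.
    return W2 @ np.maximum(W1 @ x, 0.0)

c = 5.0                         # any positive scalar, applied to hidden unit 0
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c                  # scale incoming weights of hidden unit 0
W2s[:, 0] /= c                  # inversely scale its outgoing weights

print(np.allclose(net(W1, W2, x), net(W1s, W2s, x)))  # True
```

A group-invariant framework in this sense would optimize over quantities unchanged by such rescalings, rather than over the raw (redundant) weights.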
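For the gradient delay, the abstract gives no formula; one natural compensation, sketched below under the assumption of a first-order Taylor correction with a diagonal outer-product approximation of the Hessian (as in delay-compensated ASGD), corrects a stale gradient toward the current parameters. The names `grad_stale`, `w_current`, `w_stale`, and `lam` are hypothetical.

```python
import numpy as np

def compensated_gradient(grad_stale, w_current, w_stale, lam=0.04):
    """Sketch of first-order delay compensation for a stale gradient.

    In asynchronous training a worker computes grad_stale at the old
    parameters w_stale, but the server has since moved to w_current.
    A Taylor expansion g(w_current) ~ g(w_stale) + H (w_current - w_stale),
    with H approximated by the diagonal of the gradient outer product,
    gives the correction below. lam is an illustrative strength.
    """
    return grad_stale + lam * grad_stale * grad_stale * (w_current - w_stale)

# Toy usage: the server applies an update with the compensated gradient.
w_stale = np.array([1.0, -2.0])     # parameters the worker read
w_current = np.array([0.8, -1.7])   # parameters after other workers' updates
grad_stale = np.array([0.5, 0.3])   # gradient computed at w_stale
lr = 0.1
w_next = w_current - lr * compensated_gradient(grad_stale, w_current, w_stale)
```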
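As for "causal" invariance, the abstract again leaves the mechanism open; one well-known instantiation (an assumption here, not necessarily the talk's method) is an invariant-risk-minimization-style penalty that favors predictors whose optimal scaling is the same in every training environment. A minimal PyTorch sketch:

```python
import torch

def irm_penalty(logits, y):
    """Illustrative IRM-style invariance penalty (not the talk's method).

    Measures how far a fixed dummy classifier scale w = 1.0 is from being
    optimal on one environment: the squared gradient of the risk w.r.t. w.
    Summed over environments, it pushes toward environment-invariant
    predictors.
    """
    w = torch.tensor(1.0, requires_grad=True)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return grad ** 2

# Toy usage on one environment's batch (random data, for shape only).
logits = torch.randn(8)
y = (torch.randn(8) > 0).float()
penalty = irm_penalty(logits, y)
```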