Mon, 07 Nov 2022

14:00 - 15:00
L4

Solving Continuous Control via Q-Learning

Markus Wulfmeier
(DeepMind)
Abstract

While there have been substantial successes of actor-critic methods in continuous control, simpler critic-only methods such as Q-learning often remain intractable in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements, and wider hyperparameter search spaces. To address this limitation, we demonstrate in two stages how a simple variant of Deep Q-Learning matches state-of-the-art continuous actor-critic methods when learning from simpler features or even directly from raw pixels. First, we take inspiration from control theory and shift from continuous control with policy distributions whose support covers the entire action space to pure bang-bang control via Bernoulli distributions. Second, we combine this approach with naive value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL). Finally, we add illustrative examples from control theory as well as classical bandit examples from cooperative MARL to provide intuition for 1) when action extrema are sufficient and 2) how decoupled value functions leverage state information to coordinate joint optimisation.
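The two ideas in the abstract can be combined in a few lines. The sketch below is purely illustrative, not the talk's implementation: each continuous action dimension is restricted to its two extremes (bang-bang control), and the joint Q-value is taken as the mean of per-dimension utilities (naive value decomposition). The linear utility heads and all sizes are hypothetical; the point is that the joint argmax over 2^N bang-bang actions decomposes into N independent binary argmaxes.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DIMS = 3      # number of continuous action dimensions
N_BINS = 2      # bang-bang: only the two action extrema {-1, +1}
STATE_DIM = 4

# Hypothetical linear utility heads: one (STATE_DIM x N_BINS) matrix per
# action dimension, standing in for a learned network.
W = rng.normal(size=(N_DIMS, STATE_DIM, N_BINS))

def utilities(state):
    """Per-dimension utilities u_i(s, a_i), shape (N_DIMS, N_BINS)."""
    return np.einsum("s,dsb->db", state, W)

def greedy_action(state):
    """The joint argmax decomposes: each dimension picks its best extreme."""
    bins = utilities(state).argmax(axis=1)   # 0 -> -1, 1 -> +1
    return np.where(bins == 0, -1.0, 1.0)

def q_value(state, action_bins):
    """Joint Q as the mean of the chosen per-dimension utilities."""
    u = utilities(state)
    return u[np.arange(N_DIMS), action_bins].mean()

s = rng.normal(size=STATE_DIM)
a = greedy_action(s)   # an element of {-1, +1}^N_DIMS
```

Because the decomposition is additive, maximising each dimension's utility independently maximises the joint Q-value, so the exponential joint action space never has to be enumerated.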

Fri, 12 Mar 2021

12:00 - 13:00

The Metric is All You Need (for Disentangling)

David Pfau
(DeepMind)
Abstract

Learning a representation from data that disentangles different factors of variation is hypothesized to be a critical ingredient for unsupervised learning. Defining disentangling is challenging: a "symmetry-based" definition was provided by Higgins et al. (2018), but no prescription was given for how to learn such a representation. We present a novel nonparametric algorithm, the Geometric Manifold Component Estimator (GEOMANCER), which partially answers the question of how to implement symmetry-based disentangling. We show that fully unsupervised factorization of a data manifold is possible if the true metric of the manifold is known and each factor manifold has nontrivial holonomy (for example, rotation in 3D). Our algorithm works by estimating the subspaces that are invariant under random walk diffusion, giving an approximation to the de Rham decomposition from differential geometry. We demonstrate the efficacy of GEOMANCER on several complex synthetic manifolds. Our work reduces the question of whether unsupervised disentangling is possible to the question of whether unsupervised metric learning is possible, providing a unifying insight into the geometric nature of representation learning.
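The "nontrivial holonomy" condition has a classical concrete instance that the following sketch computes (it is an illustration of the concept, not part of GEOMANCER): parallel-transporting a tangent vector around a geodesic triangle on the 2-sphere rotates it by the enclosed solid angle. For the octant triangle with vertices e1, e2, e3, transport along each 90-degree great-circle arc is a rotation about the cross product of its endpoints, and the loop rotates tangent vectors by pi/2.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    x, y, z = np.asarray(axis, float)
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

e1, e2, e3 = np.eye(3)

# Transport e1 -> e2 (axis e1 x e2 = e3), then e2 -> e3 (axis e1),
# then e3 -> e1 (axis e2); composition applies right-to-left.
loop = rot(e2, np.pi / 2) @ rot(e1, np.pi / 2) @ rot(e3, np.pi / 2)

assert np.allclose(loop @ e1, e1)   # the base point e1 is fixed
v = e2                              # a tangent vector at e1
angle = np.arccos(np.clip(v @ (loop @ v), -1.0, 1.0))
print(angle)  # ~ 1.5708, i.e. pi/2: the octant's solid angle
```

A flat factor such as a circle or a line has trivial holonomy (loops return tangent vectors unchanged), which is why the method needs curvature, witnessed by holonomy, in each factor to identify it.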

 
