Date: Mon, 07 Nov 2022
Time: 14:00 - 15:00
Location: L4
Speaker: Markus Wulfmeier
Organisation: DeepMind

While actor-critic methods have achieved substantial success in continuous control, simpler critic-only methods such as Q-learning often remain intractable in the associated high-dimensional action spaces. Most actor-critic methods, however, come at the cost of added complexity: heuristics for stabilisation, higher compute requirements, and wider hyperparameter search spaces. To address this limitation, we demonstrate in two stages how a simple variant of deep Q-learning matches state-of-the-art continuous actor-critic methods when learning from simpler features or even directly from raw pixels. First, we take inspiration from control theory and shift from continuous control with policy distributions whose support covers the entire action space to pure bang-bang control via Bernoulli distributions. Second, we combine this approach with naive value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL). Finally, we add illustrative examples from control theory as well as classical bandit examples from cooperative MARL to provide intuition for 1) when action extrema are sufficient and 2) how decoupled value functions leverage state information to coordinate joint optimisation.
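To make the two ingredients in the abstract concrete, here is a minimal sketch (not the speaker's code) of a decoupled bang-bang Q-function: each action dimension gets its own small Q-head over the two extreme actions, and the joint value is the mean of the selected per-dimension values, as in naive value decomposition from cooperative MARL. The class name, the linear featurisation, and all parameters are hypothetical and chosen only to illustrate the structure.

```python
import numpy as np


class DecoupledBangBangQ:
    """Hypothetical sketch: per-dimension bang-bang Q-heads with naive value decomposition."""

    def __init__(self, obs_dim: int, act_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One (obs_dim -> 2) linear head per action dimension:
        # column 0 scores the low extreme (-1), column 1 the high extreme (+1).
        self.heads = [rng.normal(scale=0.01, size=(obs_dim, 2)) for _ in range(act_dim)]

    def q_values(self, obs: np.ndarray) -> np.ndarray:
        # Returns an (act_dim, 2) array of per-dimension Q-values.
        return np.stack([obs @ w for w in self.heads])

    def greedy_action(self, obs: np.ndarray) -> np.ndarray:
        # Each dimension is maximised independently, so action selection scales
        # linearly (not exponentially) with the number of action dimensions.
        idx = self.q_values(obs).argmax(axis=1)   # 0 or 1 per dimension
        return np.where(idx == 1, 1.0, -1.0)      # map to the action extremes

    def joint_value(self, obs: np.ndarray, action: np.ndarray) -> float:
        # Naive value decomposition: the joint Q is the mean of the
        # per-dimension Q-values of the chosen extremes.
        idx = (action > 0).astype(int)
        per_dim = self.q_values(obs)[np.arange(len(idx)), idx]
        return float(per_dim.mean())


# Usage: a 3-dimensional continuous action space reduced to 2^3 bang-bang
# actions, evaluated without ever enumerating the joint action space.
q = DecoupledBangBangQ(obs_dim=8, act_dim=3)
obs = np.zeros(8)
a = q.greedy_action(obs)
print(a, q.joint_value(obs, a))
```

The design point this sketch tries to capture is that restricting each dimension to its extremes and decomposing the value function sidesteps the exponential blow-up that otherwise makes Q-learning intractable in high-dimensional continuous action spaces.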
