Multi-agent reinforcement learning: a mean-field perspective

Seminar series

Mathematical and Computational Finance Internal Seminar

Date

Thu, 04 Jun 2020

Time

16:00 - 17:00

Speaker

Renyuan Xu

Organisation

University of Oxford

Multi-agent reinforcement learning (MARL) has enjoyed substantial success in many applications, including the game of Go, online ad bidding systems, real-time resource allocation, and autonomous driving. Despite this empirical success, general theory for MARL algorithms remains less developed, owing to the intractability of interactions, complex information structures, and the curse of dimensionality. Instead of analyzing multi-agent games directly, mean-field theory provides a powerful way to approximate them under various notions of equilibria. Moreover, the analytically tractable framework of mean-field theory leads to learning algorithms with theoretical guarantees. In this talk, we will demonstrate how mean-field theory can contribute to simultaneous learning and decision-making problems with unknown rewards and dynamics.

To approximate the Nash equilibrium, we first formulate a generalized mean-field game (MFG) and establish the existence and uniqueness of its solution. Next, we show that a naive combination of the Q-learning algorithm with the three-step fixed-point approach of classical MFGs is unstable. We then propose both value-based and policy-based algorithms with smoothing and stabilizing techniques, and establish their convergence and complexity results. Numerical experiments show superior computational efficiency. This is based on joint work with Xin Guo (UC Berkeley), Anran Hu (UC Berkeley), and Junzi Zhang (Stanford).
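
As a rough illustration of the fixed-point idea with smoothing, the sketch below alternates between solving for Q-values with the mean field frozen, forming a softmax (rather than argmax) policy, and pushing the population distribution forward under that policy. It is a toy sketch under stated assumptions, not the speakers' algorithm: the three-state ring, the crowd-aversion reward, the `action_cost` vector, the temperature, and all function names are hypothetical, and exact Q-value iteration with a known model stands in for sample-based Q-learning to keep the example short and deterministic.

```python
import numpy as np

def softmax_policy(q, temperature):
    """Smoothed policy: softmax over actions instead of a hard argmax."""
    z = (q - q.max(axis=1, keepdims=True)) / temperature
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

def q_for_fixed_mean_field(mu, P, action_cost, gamma=0.9, sweeps=200):
    """Q-value iteration for the single-agent problem with mu held fixed.

    Assumed toy reward: r(s, a, mu) = -mu[s] - action_cost[a],
    i.e. agents dislike crowded states and pay a cost to move.
    """
    reward = -mu[:, None] - action_cost[None, :]      # shape (S, A)
    q = np.zeros_like(reward)
    for _ in range(sweeps):
        # P has shape (S, A, S); P @ v gives expected next-state value, shape (S, A)
        q = reward + gamma * (P @ q.max(axis=1))
    return q

def mean_field_iteration(P, action_cost, mu0, temperature=0.5, n_iters=100):
    """Alternate: freeze mu -> solve Q -> softmax policy -> update mu."""
    mu = mu0.copy()
    for _ in range(n_iters):
        q = q_for_fixed_mean_field(mu, P, action_cost)
        pi = softmax_policy(q, temperature)
        # Population flow: mu'[t] = sum_{s,a} mu[s] * pi[s,a] * P[s,a,t]
        mu = np.einsum("s,sa,sat->t", mu, pi, P)
    return mu, pi

# Hypothetical toy model: 3 states on a ring, action 0 = stay, action 1 = step right.
n_states, n_actions = 3, 2
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, s] = 1.0
    P[s, 1, (s + 1) % n_states] = 1.0

mu, pi = mean_field_iteration(
    P,
    action_cost=np.array([0.0, 0.1]),
    mu0=np.array([0.8, 0.1, 0.1]),
)
print("mean field:", mu)
print("policy:\n", pi)
```

The softmax temperature is the smoothing knob in this sketch: a hard argmax policy can jump between actions when the mean field changes slightly, whereas a softmax policy varies continuously with the Q-values.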

If time allows, we will also discuss learning algorithms for multi-agent collaborative games using mean-field control. The key idea is to establish the time-consistency property, i.e., the dynamic programming principle (DPP), on the lifted probability measure space. We then propose a kernel-based Q-learning algorithm and establish the corresponding convergence and complexity results. This is based on joint work with Haotian Gu, Xin Guo, and Xiaoli Wei (UC Berkeley).