Author
Gu, H
Guo, X
Wei, X
Xu, R
Last updated
2020-05-22T14:38:38.93+01:00
Abstract
This paper establishes the time-consistent property, i.e., the dynamic programming principle (DPP), for learning mean-field controls (MFCs). The key idea is to define the correct form of the Q function for learning MFCs, called the IQ function. This particular form of the IQ function reflects the essence of MFCs and is an “integration” of the classical Q function over the state and action distributions. The DPP, in the form of a Bellman equation for this IQ function, generalizes the classical DPP of Q-learning to the McKean-Vlasov system. It also generalizes the DPP for MFCs to the learning framework. In addition, to accommodate model-based learning for MFCs, the DPP for the associated value function is derived. Finally, numerical experiments are presented to illustrate the time consistency of this IQ function.
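As a rough illustration of the abstract's description (the notation below is assumed for exposition and is not taken verbatim from the paper): writing \mu for the state distribution, h for a randomized policy, \nu(\mu, h) for the action distribution induced by h under \mu, \Phi(\mu, h) for the one-step distribution flow of the McKean-Vlasov dynamics, and \gamma \in (0,1) for the discount factor, a Bellman equation of the kind described above takes the schematic form

    Q(\mu, h) \;=\; \int_{\mathcal{X}} \int_{\mathcal{A}} r\bigl(x, \mu, a, \nu(\mu, h)\bigr)\, h(da \mid x)\, \mu(dx) \;+\; \gamma \sup_{h'} Q\bigl(\Phi(\mu, h), h'\bigr),

so the IQ function aggregates (“integrates”) the classical state-action Q function over both the state distribution \mu and the action distribution induced by h, which is the sense in which it generalizes the classical DPP of Q-learning to the McKean-Vlasov system.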
Symplectic ID
1106094
Download URL
https://renyuanxu.github.io/