Author
Gu, H
Guo, X
Wei, X
Xu, R
Last updated
2020-05-22T14:38:38.93+01:00
Abstract
This paper establishes the time-consistent property, i.e., the dynamic programming principle (DPP), for learning mean-field controls (MFCs). The key idea is to define the correct form of the Q function for learning MFCs, called the IQ function. This particular form of the IQ function reflects the essence of MFCs and is an “integration” of the classical Q function over the state and action distributions. The DPP, in the form of a Bellman equation for this IQ function, generalizes the classical DPP of Q-learning to the McKean-Vlasov system. It also generalizes the DPP for MFCs to the learning framework. In addition, to accommodate model-based learning for MFCs, the DPP for the associated value function is derived. Finally, numerical experiments are presented to illustrate the time consistency of this IQ function.
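As a rough illustration of the abstract's description (the notation below is assumed for exposition and is not taken verbatim from the paper): writing \mu for the state distribution, h for a randomized policy, \nu(\mu, h) for the action distribution induced by h under \mu, \Phi(\mu, h) for the one-step distribution flow of the McKean-Vlasov dynamics, and \gamma \in (0,1) for the discount factor, a Bellman equation of the kind described above takes the schematic form

    Q(\mu, h) \;=\; \int_{\mathcal{X}} \int_{\mathcal{A}} r\bigl(x, \mu, a, \nu(\mu, h)\bigr)\, h(da \mid x)\, \mu(dx) \;+\; \gamma \sup_{h'} Q\bigl(\Phi(\mu, h), h'\bigr),

so the IQ function aggregates (“integrates”) the classical state-action Q function over both the state distribution \mu and the action distribution induced by h, which is the sense in which it generalizes the classical DPP of Q-learning to the McKean-Vlasov system.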
Symplectic ID
1106094
Download URL
https://renyuanxu.github.io/