A New Framework for Reinforcement Learning in the Physical World
Abstract
We study reinforcement learning (RL) in the physical world, where the underlying dynamics evolve according to an unknown stochastic differential equation (SDE) but only discrete-time data are available. Existing RL algorithms typically ignore this SDE structure, which can limit their effectiveness in physical-world settings. We develop a systematic approach for adapting existing RL algorithms to this setting with minimal modifications, by leveraging the smoothness of the underlying continuous-time dynamics. In particular, for the linear-quadratic regulator (LQR) setting, we show that our framework can recover the exact continuous-time optimal control using only discrete-time information. We further identify a fundamental trade-off between discretization error and statistical error that is intrinsic to RL in the physical world. Finally, we extend the framework to mean-field optimal control.
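As a rough illustration of the trade-off between discretization error and statistical error, the following sketch uses a scalar linear SDE, dX = a X dt + sigma dW, observed only on a grid with step h. It is not the framework proposed in this paper: the model, the function names (`simulate_ou`, `naive_drift_estimate`, `log_corrected_estimate`) and all parameter values are assumptions chosen for this example. A finite-difference least-squares estimate of the drift targets (e^{ah} - 1)/h rather than a, a bias of order h, while its standard error scales roughly like 1/sqrt(nh), so for a fixed number of samples n a smaller step reduces the bias but increases the noise. In this toy model, using the exact form of the discrete-time transition removes the bias.

```python
import numpy as np


def simulate_ou(a, sigma, h, n_steps, x0=1.0, seed=0):
    """Sample dX = a X dt + sigma dW exactly on the grid t_k = k * h.

    Assumes a != 0. Only the grid values are returned, mimicking
    discrete-time observations of a continuous-time system.
    """
    rng = np.random.default_rng(seed)
    phi = np.exp(a * h)  # exact one-step transition coefficient
    # Exact standard deviation of the one-step transition noise.
    noise_std = sigma * np.sqrt((np.exp(2.0 * a * h) - 1.0) / (2.0 * a))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = phi * x[k] + noise_std * rng.standard_normal()
    return x


def naive_drift_estimate(x, h):
    """Least-squares fit of (X_{k+1} - X_k) / h against X_k.

    Treats the data as an Euler discretization, so it converges to
    (exp(a * h) - 1) / h instead of a.
    """
    x_prev = x[:-1]
    return float(np.dot(x_prev, np.diff(x)) / (h * np.dot(x_prev, x_prev)))


def log_corrected_estimate(x, h):
    """Fit the discrete transition coefficient, then map it back to a.

    Assumes the fitted coefficient is positive so the logarithm exists.
    """
    x_prev = x[:-1]
    phi_hat = np.dot(x_prev, x[1:]) / np.dot(x_prev, x_prev)
    return float(np.log(phi_hat) / h)


if __name__ == "__main__":
    a_true, sigma, n = -1.0, 0.5, 100_000
    for h in (0.5, 0.1, 0.02):
        x = simulate_ou(a_true, sigma, h, n)
        print(h, naive_drift_estimate(x, h), log_corrected_estimate(x, h))
```

Running the script prints both estimates for several step sizes, which makes it possible to compare how the bias of the finite-difference estimate and the spread of both estimates change with h.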