Reinforcement Learning in near-continuous time for continuous state-action spaces

Seminar series

Mathematical and Computational Finance Seminar

Date

Thu, 25 Apr 2024
16:00

Location

Speaker

Dr Lorenzo Croissant

Organisation

CEREMADE, Université Paris-Dauphine

We consider the reinforcement learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for systems in which interactions occur at a high frequency, if not in continuous time, or those whose state spaces are large if not inherently continuous. Perhaps the only exception is the linear quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure.

This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$ which captures arbitrary time scales from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning by extending the eluder dimension framework and propose an approximate planning method based on a diffusive limit ($\varepsilon\downarrow0$) approximation of the jump process.

Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\sqrt{T})$ or $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$ with the approximate planning. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.

Further Information

Please join us for reshments outside the lecture room from 1530.