Reinforcement Learning for Optimal Execution
Abstract
Optimal execution of large positions over a given trading period is a fundamental decision-making problem in financial services. In this talk we explore reinforcement learning methods, in particular policy gradient methods, for finding the optimal policy in the optimal liquidation problem. We show results in two settings: one where we assume a linear quadratic regulator (LQR) model for the underlying dynamics, and one where we apply the method directly to the data. The empirical evidence suggests that the policy gradient method can learn the globally optimal solution for a larger class of stochastic systems containing the LQR framework, and that it is more robust to model misspecification than a model-based approach.
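
To make the comparison between a policy gradient method and a model-based approach concrete, the following is a minimal sketch on a toy scalar LQR problem. It is not the liquidation model or the estimator used in the talk: the parameters A, B, Q, R, the horizon, the initial states, and the function names are all hypothetical, and a central finite-difference (zeroth-order) gradient estimate stands in for whichever policy gradient estimator is actually used. The learner only observes rollout costs, while the model-based benchmark solves the Riccati equation using the known dynamics.

```python
# Hypothetical scalar LQR: x_{t+1} = A*x_t + B*u_t, stage cost Q*x^2 + R*u^2.
# All values below are illustrative assumptions, not taken from the talk.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
HORIZON = 40
INITIAL_STATES = [-1.0, 0.5, 2.0]  # fixed so every rollout is reproducible


def rollout_cost(k):
    """Average cost of the linear policy u = -k*x over simulated rollouts."""
    total = 0.0
    for x in INITIAL_STATES:
        for _ in range(HORIZON):
            u = -k * x
            total += Q * x * x + R * u * u
            x = A * x + B * u
    return total / len(INITIAL_STATES)


def policy_gradient(k0=0.3, step=0.02, delta=1e-4, iterations=300):
    """Gradient descent on the gain k using only observed rollout costs."""
    k = k0
    for _ in range(iterations):
        # Central finite-difference estimate of dJ/dk (no model knowledge used).
        grad = (rollout_cost(k + delta) - rollout_cost(k - delta)) / (2 * delta)
        k -= step * grad
    return k


def riccati_gain(iterations=500):
    """Model-based benchmark: iterate the discrete-time Riccati equation."""
    p = Q
    for _ in range(iterations):
        p = Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)
    return A * B * p / (R + B * B * p)


if __name__ == "__main__":
    print("policy gradient gain:", policy_gradient())
    print("Riccati gain:        ", riccati_gain())
```

With these illustrative parameters the Riccati gain is (sqrt(5) - 1) / 2, roughly 0.618, and the learned gain should settle close to it. The sketch says nothing about robustness to misspecification; that would require feeding the model-based solver a wrong A or B and comparing the resulting costs.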