Asymptotic Randomised Control with an application to bandit and dynamic pricing

Seminar series

Mathematical and Computational Finance Internal Seminar

Date

Thu, 03 Dec 2020

Time

16:00 - 17:00

Speaker

Tanut Treetanthiploet

Organisation

University of Oxford

Abstract: In many situations, one needs to decide between acting to reveal data about a system and acting to generate profit; this is the trade-off between exploration and exploitation. A simple situation where we face this trade-off is a multiarmed bandit problem, where one has M ‘bandits’ which generate reward from an unknown distribution, and one must choose which bandit to play at each time. The key difficulty in the multi-armed bandit problem is that the action often affects the information obtained. Due to the curse of dimensionality, solving the bandit problem directly is often computationally intractable.

In this talk, we will formulate a general class of the multi-armed bandit problem as a relaxed stochastic control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed point problem, which can be interpreted as explicitly balancing an exploration–exploitation trade-off. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.

As an application of the multi-armed bandit, we also consider a multi-armed bandit problem where the observation from each bandit arrive from a Generalised Linear Model. We then use such model to consider a dynamic online pricing problem. The numerical simulation shows that the ARC algorithm also performs well compared to others.
============