Seminar series
Date
Thu, 14 May 2015
Time
16:00 - 17:00
Location
L2
Speaker
Professor Warren Powell
Organisation
Princeton University

Stochastic optimization for sequential decision problems under uncertainty arises in many settings, and as a result it has evolved under several canonical frameworks with names such as dynamic programming, stochastic programming, optimal control, robust optimization, and simulation optimization (to name a few).  This is in sharp contrast with the universally accepted canonical frameworks for deterministic math programming (or deterministic optimal control).  We have found that these competing frameworks are actually hiding different classes of policies for solving a single problem that encompasses all of these fields.  In this talk, I present a canonical framework which, while familiar to some, is not universally used, but should be.  The framework involves optimizing an objective function that requires searching over a class of policies, a step that can seem like mathematical hand-waving.  We then identify four fundamental classes of policies: policy function approximations (PFAs), cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies (which themselves come in different flavors).  With the exception of CFAs, these policies have been widely studied under names that make them seem like fundamentally different approaches (policy search, approximate dynamic programming or reinforcement learning, model predictive control, stochastic programming, and robust optimization).  We use a simple energy storage problem to demonstrate that minor changes in the nature of the data can produce problems where each of the four classes, or a hybrid, might work best.  This exercise supports our claim that any formulation of a sequential decision problem should start with the recognition that we need to search over a space of policies.
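To make the idea of "searching over a space of policies" concrete, the sketch below sets up a toy energy storage problem and tunes a simple policy function approximation (PFA) by direct policy search. All model details (price model, storage capacity, threshold rule, parameter grid) are illustrative assumptions, not taken from the talk:

```python
import random

def simulate(policy, theta, T=50, seed=0, n_runs=100):
    """Estimate the expected profit of a policy by Monte Carlo simulation
    of a toy energy storage problem (all model details are hypothetical)."""
    rng = random.Random(seed)  # common random numbers across candidate thetas
    total = 0.0
    for _ in range(n_runs):
        storage = 0.0  # units of energy in storage (capacity: 1 unit)
        profit = 0.0
        for t in range(T):
            price = 30 + 10 * rng.gauss(0, 1)     # noisy electricity price
            action = policy(storage, price, theta)  # +1 buy, -1 sell, 0 hold
            if action > 0 and storage < 1.0:
                storage += 1.0
                profit -= price
            elif action < 0 and storage >= 1.0:
                storage -= 1.0
                profit += price
        total += profit
    return total / n_runs

def buy_low_sell_high(storage, price, theta):
    """A PFA: buy when the price drops below theta_lo, sell above theta_hi."""
    lo, hi = theta
    if price < lo:
        return +1
    if price > hi:
        return -1
    return 0

# Policy search: maximize simulated profit over the PFA's parameters
# theta = (lo, hi) on a coarse grid.
candidates = [(lo, hi) for lo in range(20, 31, 2)
                       for hi in range(30, 41, 2) if lo < hi]
best = max(candidates, key=lambda th: simulate(buy_low_sell_high, th, seed=1))
```

The same objective (expected cumulative reward, estimated by simulation) could instead be attacked with a VFA-based policy or a lookahead policy; only the class of policies being searched changes, which is the point of the unified framework.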

Last updated on 04 Apr 2022 14:57.