Reinforcement learning for optimal stopping problems with exploration: A singular control formulation
We address optimal stopping problems in continuous time with a continuous state space from the reinforcement learning point of view. We first introduce a formulation of the stopping problem via singular controls, which allows the agent to randomize stopping strategies. We then consider a regularized version of the problem in which the cumulative residual entropy of the chosen strategy is penalized to incentivize exploration. This regularized problem is studied via the dynamic programming principle, which allows us to characterize the unique optimal exploratory strategy. In a benchmark example, we solve the regularized problem explicitly, which lets us study the effect of the entropy regularization and the vanishing entropy limit.
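The vanishing entropy limit mentioned above can be illustrated in a deliberately simplified setting that is not the paper's singular-control formulation: a one-step stopping decision where stopping pays g, continuing has value v, and the entropy penalty (with a hypothetical temperature parameter lam) turns the hard stop/continue indicator into a smooth Gibbs-type randomized rule. As lam shrinks, the randomized rule recovers the unregularized indicator. A minimal sketch under these assumptions:

```python
import math

def stop_prob(g: float, v: float, lam: float) -> float:
    """Gibbs (softmax) probability of stopping in a toy two-action
    relaxation: stopping pays g, continuing has value v, and lam > 0
    is an assumed entropy temperature (not the paper's notation)."""
    return 1.0 / (1.0 + math.exp(-(g - v) / lam))

# Vanishing-entropy limit: as lam -> 0 the randomized rule
# concentrates on the action with the higher value.
for lam in (1.0, 0.1, 0.01):
    print(f"lam={lam}: stop probability {stop_prob(1.2, 1.0, lam):.4f}")
```

With g = 1.2 > v = 1.0, the stopping probability rises toward 1 as lam decreases, while for g < v it falls toward 0, recovering the hard stopping rule in the limit.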
Area: IS14 - Stochastic Control and Game-theoretic Models in Economics and Finance (Giorgio Ferrari)
Keywords: Singular control, forward-backward stochastic differential equations, Nash equilibrium