Authors: Jin Li, Ye Luo, Zigan Wang, and Xiaowei Zhang
We identify a new type of bias in data analysis, termed reinforcement bias, and develop instrumental variable (IV) based reinforcement learning (RL) algorithms to correct it. We establish the theoretical properties of these algorithms by casting them in a stochastic approximation framework. Our analysis accommodates iterate-dependent Markovian structures and can therefore be used to study RL algorithms with policy improvement.
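As a rough illustration of the two ingredients named above, an IV correction and a stochastic approximation recursion, the following minimal sketch runs a Robbins-Monro update on simulated data with an endogenous regressor. It is not the authors' algorithm and it shows generic endogeneity bias in a one-parameter linear model, not reinforcement bias itself. The function names (`simulate`, `sa_estimate`), the data-generating process, and the step-size schedule are all assumptions made for this example.

```python
import random


def simulate(n, beta=2.0, seed=0):
    """Hypothetical data: x is correlated with the unobserved error u,
    while the instrument z affects y only through x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)          # instrument, independent of u
        u = rng.gauss(0, 1)          # unobserved confounder
        x = z + u                    # endogenous regressor
        y = beta * x + u + 0.1 * rng.gauss(0, 1)
        data.append((z, x, y))
    return data


def sa_estimate(data, use_instrument):
    """Robbins-Monro recursion theta <- theta + a_t * w_t * (y_t - x_t * theta).

    With w_t = x_t the fixed point is the least-squares coefficient;
    with w_t = z_t it is the IV coefficient E[z y] / E[z x].
    """
    theta = 0.0
    for t, (z, x, y) in enumerate(data, start=1):
        w = z if use_instrument else x
        step = 1.0 / (t + 10)        # assumed decaying step size
        theta += step * w * (y - x * theta)
    return theta


data = simulate(50_000)
theta_naive = sa_estimate(data, use_instrument=False)
theta_iv = sa_estimate(data, use_instrument=True)
print(f"true beta = 2.0, naive = {theta_naive:.3f}, IV = {theta_iv:.3f}")
```

Under this assumed design the naive recursion settles near beta + Cov(x, u) / Var(x) = 2.5, while the instrumented recursion settles near the true value 2.0. The setting in the abstract is harder because the data-generating Markov chain depends on the current iterate, which this i.i.d. sketch does not capture.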
