Working Paper

Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity

This version: December 2024
JEL: C10

Keywords: actor-critic, endogeneity, instrumental variable, Markov decision process, measurement error, Q-learning, reinforcement bias, reinforcement learning, stochastic approximation

Abstract

In the standard data analysis framework, data is collected once and for all, and then analyzed. However, with the advancement of digital technology, decision-makers constantly analyze past data and generate new data through their decisions. We model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their theoretical properties by incorporating them into a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and can therefore be used to study RL algorithms with policy improvement. We also provide formulas for inference on optimal policies of the IV-RL algorithms. These formulas highlight how intertemporal dependencies of the Markovian environment affect the inference.

Acknowledgements

We thank Chunrong Ai, Jose Blanchet, Nan Chen, Xinyun Chen, Victor Chernozhukov, Jim Dai, Yifan Feng, Ivan Fernandez-Val, Jean-Jacques Forneron, Kay Giesecke, Peter W. Glynn, Wei Jiang, Hiroaki Kaido, Tom Luo, as well as seminar participants at Boston University, Chinese University of Hong Kong, Peking University, Shenzhen Research Institute of Big Data, Stanford University, and Mostly OM 2024 Workshop for helpful discussions. All remaining errors are ours.

Suggested Citation

Li, J., Luo, Y., Wang, Z., & Zhang, X. (2024). Asymptotic theory for IV-based reinforcement learning with potential endogeneity (CAMO Working Paper No. 2024-02). HKU Centre for AI, Management and Organization. https://camo.hku.hk/asymptotic-theory-for-iv-based-reinforcement-learning-with-potential-endogeneity/

BibTeX

@techreport{li_luo_wang_zhang_2024,
  author      = {Li, Jin and Luo, Ye and Wang, Zigan and Zhang, Xiaowei},
  title       = {Asymptotic Theory for {IV}-Based Reinforcement Learning with Potential Endogeneity},
  institution = {HKU Centre for AI, Management and Organization},
  type        = {{CAMO} Working Paper},
  number      = {2024-02},
  year        = {2024},
  url         = {https://camo.hku.hk/asymptotic-theory-for-iv-based-reinforcement-learning-with-potential-endogeneity/}
}
