Author: Jin Li, Ye Luo and Xiaowei Zhang

| In a contextual multi-armed bandit model, a novel bias, which we call self-fulfilling bias, arises because endogeneity in the data influences the decisions taken, and those decisions in turn shape the distribution of the data that will be collected and analyzed in the future. Our proposed instrumental-variable-based (IV-based) algorithms correct this bias, recovering the true parameter values while achieving low regret. We also establish a central limit theorem to support statistical inference, developing a technique that disentangles the interdependence between data and actions.
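As a rough intuition for why an instrument helps, the sketch below shows the generic static case: a covariate that shares an unobserved confounder with the outcome noise biases ordinary least squares, while a just-identified IV estimator recovers the true slope. This is not the authors' bandit algorithm and does not model the adaptive, self-fulfilling feedback between actions and data. The data-generating process, the helper names (`simulate`, `ols`, `iv`) and the true slope of 2.0 are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n=20000, beta=2.0, seed=0):
    # Hypothetical linear model with an endogenous covariate (illustration only).
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # instrument: moves x, independent of the noise
    u = rng.normal(size=n)                   # unobserved confounder
    x = z + u + rng.normal(size=n)           # covariate correlated with u, hence endogenous
    y = beta * x + u + rng.normal(size=n)    # outcome noise also contains u
    return z, x, y

def ols(x, y):
    # Least-squares slope without intercept: (x'x)^{-1} x'y; biased when cov(x, noise) != 0.
    return float(x @ y / (x @ x))

def iv(z, x, y):
    # Just-identified IV slope: (z'x)^{-1} z'y; consistent when z is uncorrelated with the noise.
    return float(z @ y / (z @ x))

z, x, y = simulate()
ols_hat, iv_hat = ols(x, y), iv(z, x, y)
print(f"OLS: {ols_hat:.3f}  IV: {iv_hat:.3f}  true: 2.000")
```

Under this assumed model, the OLS slope drifts upward by roughly cov(x, u)/var(x) = 1/3, while the IV slope stays close to 2.0.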