1. <span id="ref1">T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," *Advances in Applied Mathematics*, 1985.</span>
2. <span id="ref2">P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," *Machine Learning*, 2002.</span>
3. <span id="ref3">R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," *Journal of Machine Learning Research*, 2002.</span>