From 15ae07986cf1720361481c20c125c5104cebbb40 Mon Sep 17 00:00:00 2001
From: xiaowei_xing <997427575@qq.com>
Date: Thu, 16 Jan 2020 21:40:43 +0900
Subject: [PATCH] test

---
 docs/11&12.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/11&12.md b/docs/11&12.md
index 96b2906..8cf2f4e 100644
--- a/docs/11&12.md
+++ b/docs/11&12.md
@@ -245,4 +245,10 @@ $\bullet$ Probability matching: select the action with the highest probability of being optimal, e.g. Thompson
 
 $\bullet$ Information state space: construct and solve an augmented MDP, which therefore directly incorporates the value of information
 
-## References
\ No newline at end of file
+## References
+
+1. T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," *Advances in Applied Mathematics*, 1985.
+
+2. P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," *Machine Learning*, 2002.
+
+3. R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," *Journal of Machine Learning Research*, 2002.
\ No newline at end of file
-- 
GitLab