test

15ae0798 · xiaowei_xing · 25c94b7d · 15ae0798
Showing 1 changed file with 7 additions and 1 deletion

docs/11&12.md +7 -1

--- a/docs/11&12.md
+++ b/docs/11&12.md
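@@ -245,4 +245,10 @@ $\bullet$ Probability matching: select the action with the highest probability of being optimal, e.g. Thompson

 $\bullet$ Information state space: construct and solve an augmented MDP, which directly incorporates the value of information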

-## References
\ No newline at end of file
+## References
+
+1. <span id="ref1">T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," *Advances in Applied Mathematics*, 1985.</span>
+
+2. <span id="ref2">P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," *Machine Learning*, 2002.</span>
+
+3. <span id="ref3">R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," *Journal of Machine Learning Research*, 2002.</span>
\ No newline at end of file