diff --git a/docs/11&12.md b/docs/11&12.md
index 96b2906b7edbd6dfbc3d10695ea9ab8277862337..8cf2f4e36fd6843f6c6117a647530a4b33181076 100644
--- a/docs/11&12.md
+++ b/docs/11&12.md
@@ -245,4 +245,10 @@ $\bullet$ 概率匹配:选择有最大概率是最优的动作,如 汤普森
$\bullet$ 信息状态空间:建立并解决扩展 MDP,因此直接包含了信息的价值
-## 参考文献
\ No newline at end of file
+## References
+
+1. T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," *Advances in Applied Mathematics*, 1985.
+
+2. P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," *Machine Learning*, 2002.
+
+3. R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," *Journal of Machine Learning Research*, 2002.
\ No newline at end of file