From 767648b38b39f1922c7a4c36371a9c79a39a8c73 Mon Sep 17 00:00:00 2001
From: xiaowei_xing <997427575@qq.com>
Date: Fri, 20 Dec 2019 16:29:21 +0900
Subject: [PATCH] test

---
 docs/11&12.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/11&12.md b/docs/11&12.md
index b8c32af..e8cbe6c 100644
--- a/docs/11&12.md
+++ b/docs/11&12.md
@@ -149,4 +149,11 @@ $$
 $$
 a_{t}=\mathop{\arg\max}_ {a\in A}(\mu_a + c\frac{\sigma_{a}}{\sqrt{N(a)}})。
 \tag{11}
+$$
+
+Another approach is probability matching, which selects each action according to the probability that it is the optimal action:
+
+$$
+\pi(a|h_t) = P[Q(a)>Q(a'),\forall a'\neq a|h_t].
+\tag{12}
 $$
\ No newline at end of file
-- 
GitLab