From 89cfc97a955fca5d43efd4375fc919e65d5311a5 Mon Sep 17 00:00:00 2001
From: daming-lu
Date: Fri, 23 Mar 2018 09:02:33 -0700
Subject: [PATCH] rewrite formula in MathJax so that it can be rendered
 correctly.

---
 08.machine_translation/README.cn.md | 6 ++----
 08.machine_translation/README.md    | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/08.machine_translation/README.cn.md b/08.machine_translation/README.cn.md
index d91d019..d3bf363 100644
--- a/08.machine_translation/README.cn.md
+++ b/08.machine_translation/README.cn.md
@@ -122,10 +122,8 @@ $$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$
 
 从公式中可以看出，注意力机制是通过对编码器中各时刻的RNN状态$h_j$进行加权平均实现的。权重$a_{ij}$表示目标语言中第$i$个词对源语言中第$j$个词的注意力大小，$a_{ij}$的计算公式如下：
 
-\begin{align}
-a_{ij}&=\frac{exp(e_{ij})}{\sum_{k=1}^{T}exp(e_{ik})}\\\\
-e_{ij}&=align(z_i,h_j)\\\\
-\end{align}
+$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$
+$$e_{ij} = {align(z_i, h_j)}$$
 
 其中，$align$可以看作是一个对齐模型，用来衡量目标语言中第$i$个词和源语言中第$j$个词的匹配程度。具体而言，这个程度是通过解码RNN的第$i$个隐层状态$z_i$和源语言句子的第$j$个上下文片段$h_j$计算得到的。传统的对齐模型中，目标语言的每个词明确对应源语言的一个或多个词(hard alignment)；而在注意力模型中采用的是soft alignment，即任何两个目标语言和源语言词间均存在一定的关联，且这个关联强度是由模型计算得到的实数，因此可以融入整个NMT框架，并通过反向传播算法进行训练。
 
diff --git a/08.machine_translation/README.md b/08.machine_translation/README.md
index e11297a..637ddd2 100644
--- a/08.machine_translation/README.md
+++ b/08.machine_translation/README.md
@@ -157,10 +157,8 @@ $$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$
 
 It is noted that the attention mechanism is achieved by a weighted average over the RNN hidden states $h_j$. The weight $a_{ij}$ denotes the strength of attention of the $i$-th word in the target language sentence to the $j$-th word in the source sentence, and is calculated as
 
-\begin{align}
-a_{ij}&=\frac{exp(e_{ij})}{\sum_{k=1}^{T}exp(e_{ik})}\\\\
-e_{ij}&=align(z_i,h_j)\\\\
-\end{align}
+$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$
+$$e_{ij} = {align(z_i, h_j)}$$
 
 where $align$ is an alignment model that measures the fitness between the $i$-th word in the target language sentence and the $j$-th word in the source sentence. More concretely, the fitness is computed with the $i$-th hidden state $z_i$ of the decoder RNN and the $j$-th context vector $h_j$ of the source sentence. In the conventional alignment model, hard alignment is used, meaning each word in the target language explicitly corresponds to one or more words in the source sentence. An attention model instead uses soft alignment: any word in the source sentence may be related to any word in the target language sentence, and the strength of the relation is a real number computed by the model, so the alignment can be incorporated into the NMT framework and trained via back-propagation.
 
--
GitLab
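
For readers checking the reformatted formulas, here is a minimal NumPy sketch of the computation they describe. It is not part of this patch or the book's code: the abstract alignment model $align$ is stood in for by a simple dot product between the decoder state $z_i$ and each encoder state $h_j$ (the READMEs leave $align$ unspecified; Bahdanau-style attention uses a small feed-forward network instead), and the helper names `attention_weights` and `context_vector` are illustrative.

```python
import numpy as np

def attention_weights(z_i, H):
    """a_ij = exp(e_ij) / sum_k exp(e_ik), with e_ij = align(z_i, h_j).
    z_i: decoder state, shape (d,); H: encoder states h_1..h_T, shape (T, d)."""
    e_i = H @ z_i              # dot product as a stand-in for the align model
    e_i = e_i - e_i.max()      # shift for numerical stability (softmax is shift-invariant)
    a_i = np.exp(e_i)
    return a_i / a_i.sum()

def context_vector(z_i, H):
    """c_i = sum_j a_ij * h_j, the attention-weighted average of encoder states."""
    return attention_weights(z_i, H) @ H

# Example: T = 4 source positions, hidden size d = 3.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))    # encoder states
z = rng.normal(size=3)         # decoder state z_i
a = attention_weights(z, H)
print(a, a.sum())              # weights are non-negative and sum to 1
print(context_vector(z, H))    # c_i has the same shape as each h_j
```

The softmax in `attention_weights` is exactly what the first reformatted formula expresses: it guarantees the $a_{ij}$ are non-negative and sum to 1 over $j$, which is what makes $c_i$ a weighted average of the encoder states rather than an arbitrary combination.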