From f607c3f1013922e67728833fb4de4e32192e881f Mon Sep 17 00:00:00 2001
From: Aston Zhang
Date: Tue, 1 Jan 2019 09:28:29 +0000
Subject: [PATCH] mention transformer and bert

---
 chapter_natural-language-processing/attention.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/chapter_natural-language-processing/attention.md b/chapter_natural-language-processing/attention.md
index 7feeb07..749a45e 100644
--- a/chapter_natural-language-processing/attention.md
+++ b/chapter_natural-language-processing/attention.md
@@ -73,6 +73,14 @@ $$
 
 Here, the subscripted $\boldsymbol{W}$ and $\boldsymbol{b}$ are the weight and bias parameters of the gated recurrent unit, respectively.
 
+
+
+## Development
+
+In essence, the attention mechanism allocates more computational resources to the more valuable parts of the features. Relying on attention alone, the Transformer can effectively encode an input sequence and decode it into an output sequence without using convolutional or recurrent neural networks [2]. The later BERT pre-trained model applies the Transformer's attention-based encoder: after fine-tuning, it achieved state-of-the-art results on as many as 11 natural language processing tasks [3]. Beyond natural language processing, attention mechanisms are also widely used in image classification, image captioning, lip reading, and speech recognition.
+
+
+
 ## Summary
 
 * We can use a different context variable at each decoder time step, assigning different amounts of attention to the information encoded at different time steps of the input sequence.
@@ -96,3 +104,7 @@ $$
 ## References
 
 [1] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
+
+[2] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
+
+[3] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
--
GitLab
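
The "Development" paragraph added by this patch states that the Transformer encodes and decodes sequences using attention alone. As a rough illustration of that idea only, the sketch below computes scaled dot-product attention with NumPy; it is not the book's MXNet/Gluon implementation, and the function and variable names are hypothetical, chosen just for this example.

```python
# Minimal sketch of scaled dot-product attention (illustrative assumption,
# not code from the book or the patch).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(queries, keys, values):
    """queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # similarity of each query to each key
    weights = softmax(scores, axis=-1)      # attention weights; each row sums to 1
    return weights @ values, weights       # weighted sum of values (context), plus weights

# Example: one decoder query attending over three encoder time steps.
rng = np.random.default_rng(0)
enc_states = rng.normal(size=(3, 4))  # hypothetical encoder outputs (keys and values)
query = rng.normal(size=(1, 4))       # hypothetical decoder state (query)
context, weights = dot_product_attention(query, enc_states, enc_states)
print(weights)   # larger weights mark the time steps receiving more attention
print(context)   # the context variable fed to the decoder at this step
```

In the Transformer, this same operation is applied to learned projections of queries, keys, and values and repeated across multiple heads; the sketch only shows the core weighting step that the paragraph refers to.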