Merge pull request #529 from zanshuxun/master

Update 22.md

adc58301 · 片刻小哥哥 · GitHub · f034f881 · d2a0cf1c · adc58301

Showing 1 changed file with 29 additions and 36 deletions

docs/1.4/22.md docs/1.4/22.md +29 -36

--- a/docs/1.4/22.md
+++ b/docs/1.4/22.md
@@ -2,21 +2,17 @@

 > Original: [https://pytorch.org/tutorials/beginner/transformer_tutorial.html](https://pytorch.org/tutorials/beginner/transformer_tutorial.html)

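-Note
+This tutorial shows how to train a seq2seq model with the [nn.Transformer](https://pytorch.org/docs/master/nn.html?highlight=nn%20transformer#torch.nn.Transformer) module. [Click here to download the full example code](https://pytorch.org/tutorials/_downloads/dca13261bbb4e9809d1a3aa521d22dd7/transformer_tutorial.ipynb)

-Click here to [download the full example code](#sphx-glr-download-beginner-transformer-tutorial-py)
-
-This is a tutorial on how to train a sequence-to-sequence model that uses the [nn.Transformer](https://pytorch.org/docs/master/nn.html?highlight=nn%20transformer#torch.nn.Transformer) module.
-
-The PyTorch 1.2 release includes a standard transformer module based on the paper [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf). The transformer model has proven to deliver higher quality on many sequence-to-sequence problems while being more parallelizable. The `nn.Transformer` module relies entirely on an attention mechanism (another module recently implemented as [nn.MultiheadAttention](https://pytorch.org/docs/master/nn.html?highlight=multiheadattention#torch.nn.MultiheadAttention)) to draw global dependencies between input and output. The `nn.Transformer` module is now highly modularized, so a single component (such as [nn.TransformerEncoder](https://pytorch.org/docs/master/nn.html?highlight=nn%20transformerencoder#torch.nn.TransformerEncoder) in this tutorial) can easily be adapted or composed.
+PyTorch 1.2 ships a standard transformer module based on the paper [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf). The transformer model performs better on many seq2seq problems and is easier to train in parallel. The `nn.Transformer` module uses an attention mechanism (also recently implemented as the separate [nn.MultiheadAttention](https://pytorch.org/docs/master/nn.html?highlight=multiheadattention#torch.nn.MultiheadAttention) module) to capture the global dependencies between output and input. `nn.Transformer` is highly modular, and its individual components (such as [nn.TransformerEncoder](https://pytorch.org/docs/master/nn.html?highlight=nn%20transformerencoder#torch.nn.TransformerEncoder) in this tutorial) are easy to modify and use on their own.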

 ![../_images/transformer_architecture.jpg](img/4b79dddf1ff54b9384754144d8246d9b.jpg)

 ## Define the model

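-In this tutorial, we train an `nn.TransformerEncoder` model on a language modeling task. The language modeling task is to assign a probability to the likelihood of a given word (or sequence of words) following a sequence of words. A sequence of tokens is first passed to an embedding layer, then to a positional encoding layer to account for the order of the words (see the next paragraph for more details). `nn.TransformerEncoder` consists of multiple layers of [nn.TransformerEncoderLayer](https://pytorch.org/docs/master/nn.html?highlight=transformerencoderlayer#torch.nn.TransformerEncoderLayer). Along with the input sequence, a square attention mask is required because the self-attention layers in `nn.TransformerEncoder` are only allowed to attend to earlier positions in the sequence. For the language modeling task, all tokens at future positions should be masked. To obtain the actual words, the output of the `nn.TransformerEncoder` model is sent to a final Linear layer, followed by a log-Softmax function.
+In this tutorial, we train an `nn.TransformerEncoder` model on a language modeling task. Language modeling means: given a sequence of words, predict the probability that a particular word (or sequence of words) comes next. The sequence of tokens first passes through an embedding layer and then a positional encoding layer, which accounts for word order (see the next paragraph for details). `nn.TransformerEncoder` consists of multiple [nn.TransformerEncoderLayer](https://pytorch.org/docs/master/nn.html?highlight=transformerencoderlayer#torch.nn.TransformerEncoderLayer) layers. Besides the input sequence, a square attention mask is also required: because each word is predicted only from the words before it, the model must not see the words at later positions during training, so the mask hides them. To obtain a predicted probability for each word, `nn.TransformerEncoder` is followed by a Linear layer and a log-softmax function.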
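
As a rough illustration of the square attention mask described above, the sketch below builds one with plain Python lists instead of tensors. The helper name `square_subsequent_mask` is hypothetical, and the convention of 0.0 for allowed positions and negative infinity for blocked ones is an assumption about how an additive mask is usually encoded; the tutorial's own mask code is not visible in this diff.

```python
def square_subsequent_mask(size):
    """Return a size x size additive mask: 0.0 where attention is allowed,
    -inf where position j lies in the future of position i (j > i)."""
    neg_inf = float("-inf")
    return [[0.0 if j <= i else neg_inf for j in range(size)]
            for i in range(size)]


mask = square_subsequent_mask(3)
for row in mask:
    print(row)
```

Row `i` of the mask leaves positions `0..i` visible and blocks everything after them, so the first token sees only itself and the last token sees the whole sequence.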

-```
+```python
 import math
 import torch
 import torch.nn as nn
@@ -63,9 +59,9 @@ class TransformerModel(nn.Module):

 ```

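-The `PositionalEncoding` module injects some information about the relative or absolute position of the tokens in the sequence. The positional encodings have the same dimension as the embeddings, so the two can be summed. Here, we use `sine` and `cosine` functions of different frequencies.
+The `PositionalEncoding` module injects information about the relative or absolute position of each token in the sequence. Its output has the same dimension as the embedding layer's, so the two can be added together. Here we use `sine` and `cosine` functions of different frequencies to encode position.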

-```
+```python
 class PositionalEncoding(nn.Module):

    def __init__(self, d_model, dropout=0.1, max_len=5000):
@@ -86,15 +82,15 @@ class PositionalEncoding(nn.Module):

 ```
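
The sinusoidal scheme behind `PositionalEncoding` can be sketched without tensors. The version below is a minimal stand-in, assuming the formulation from the Attention is All You Need paper, where even indices use `sin(pos / 10000^(k / d_model))` and odd indices use the matching cosine. The function name `positional_encoding` is hypothetical, and the sketch assumes an even `d_model`.

```python
import math


def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings.
    Assumes d_model is even."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for k in range(0, d_model, 2):
            angle = pos / (10000.0 ** (k / d_model))
            table[pos][k] = math.sin(angle)       # even index: sine
            table[pos][k + 1] = math.cos(angle)   # odd index: cosine
    return table


pe = positional_encoding(max_len=4, d_model=6)
print(pe[0])  # position 0: every sine is 0.0 and every cosine is 1.0
```

Because the table depends only on `pos` and `k`, it is fixed rather than learned, and each row can be added element-wise to the embedding of the token at that position.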

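-## Load and batch data
+## Load data

-The training process uses the Wikitext-2 dataset from `torchtext`. The vocab object is built from the training dataset and is used to numericalize tokens into tensors. Starting from sequential data, the `batchify()` function arranges the dataset into columns, trimming off any tokens left over after the data has been divided into batches of size `batch_size`. For example, with the alphabet as the sequence (total length 26) and a batch size of 4, we would divide the alphabet into 4 sequences of length 6:
+Training uses the Wikitext-2 dataset from `torchtext`. The vocab object in the code below converts the tokens in the dataset into tensors. The `batchify()` function produces batched data: it splits the training set into `batch_size` sequences and trims off the leftover tokens. For example, when the input sequence is the alphabet (total length 26) and `batch_size` is 4, `batchify()` divides it into 4 sequences of length 6: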

-![](img/e04a27246098e9a6b2338d4a6ed99680.jpg)
+$$\begin{split}\begin{bmatrix} \text{A} & \text{B} & \text{C} & \ldots & \text{X} & \text{Y} & \text{Z} \end{bmatrix} \Rightarrow \begin{bmatrix} \begin{bmatrix}\text{A} \\ \text{B} \\ \text{C} \\ \text{D} \\ \text{E} \\ \text{F}\end{bmatrix} & \begin{bmatrix}\text{G} \\ \text{H} \\ \text{I} \\ \text{J} \\ \text{K} \\ \text{L}\end{bmatrix} & \begin{bmatrix}\text{M} \\ \text{N} \\ \text{O} \\ \text{P} \\ \text{Q} \\ \text{R}\end{bmatrix} & \begin{bmatrix}\text{S} \\ \text{T} \\ \text{U} \\ \text{V} \\ \text{W} \\ \text{X}\end{bmatrix} \end{bmatrix}\end{split}$$

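-These columns are treated as independent by the model, which means the dependence of `G` and `F` cannot be learned, but this allows more efficient batch processing.
+The model treats the columns as independent of one another, which means it cannot learn the dependence of `G` on `F`, but batched training becomes more efficient.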
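
The alphabet example can be reproduced with a small list-based sketch. `batchify_sketch` is a hypothetical stand-in for the tutorial's `batchify()`, which works on tensors and is only partly visible in this diff; the trimming and column layout follow the description above.

```python
import string


def batchify_sketch(tokens, batch_size):
    """Split a flat token list into batch_size columns, dropping the remainder.
    Returns a list of rows, i.e. shape (column_length, batch_size)."""
    column_length = len(tokens) // batch_size
    trimmed = tokens[:column_length * batch_size]
    columns = [trimmed[c * column_length:(c + 1) * column_length]
               for c in range(batch_size)]
    # Transpose so that each row holds one time step across all columns.
    return [list(row) for row in zip(*columns)]


batches = batchify_sketch(list(string.ascii_uppercase), 4)
print(batches[0])    # ['A', 'G', 'M', 'S']
print(len(batches))  # 6
```

`Y` and `Z` are dropped because 26 is not a multiple of 4, and `F` and `G` end up in different columns, which is why the model never sees them as neighbours.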

-```
+```python
 import torchtext
 from torchtext.data.utils import get_tokenizer
 TEXT = torchtext.data.Field(tokenize=get_tokenizer("basic_english"),
@@ -123,7 +119,7 @@ test_data = batchify(test_txt, eval_batch_size)

 ```

-Out:
+Output:

 ```
 downloading wikitext-2-v1.zip
@@ -131,15 +127,15 @@ extracting

 ```

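-### Functions to generate input and target sequence
+### Generate input and target sequences

-The `get_batch()` function generates the input and target sequence for the transformer model. It subdivides the source data into chunks of length `bptt`. For the language modeling task, the model needs the following words as `Target`. For example, with a `bptt` value of 2, we would get the following two variables for `i` = 0:
+The `get_batch()` function generates the input and target sequences for the transformer model by splitting the source data into chunks of length `bptt`. For the language modeling task, the model needs the words that come next as `Target`. For example, with a `bptt` value of 2 and `i` = 0, `get_batch()` returns the following two variables: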

 ![../_images/transformer_input_target.png](img/20ef8681366b44461cf49d1ab98ab8f2.jpg)

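-It should be noted that the chunks are along dimension 0, consistent with the `S` dimension in the Transformer model. The batch dimension `N` is along dimension 1.
+Note that dimension 0 of each chunk corresponds to the `S` dimension in the Transformer model, and dimension 1 is the batch dimension `N`.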

-```
+```python
 bptt = 35
 def get_batch(source, i):
    seq_len = min(bptt, len(source) - 1 - i)
@@ -149,11 +145,11 @@ def get_batch(source, i):

 ```
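
The `bptt` = 2, `i` = 0 case described for `get_batch()` can be traced by hand with nested lists. `get_batch_sketch` is a hypothetical list-based stand-in; the `seq_len` line mirrors the one visible in the diff, while flattening the target into a single list is an assumption, since the rest of the function is cut off here.

```python
def get_batch_sketch(source, i, bptt):
    """source is a list of rows (time steps); return (data, target).
    target is the same window shifted one step ahead and flattened."""
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = [tok for row in source[i + 1:i + 1 + seq_len] for tok in row]
    return data, target


# First four time steps of the batched alphabet (one row per time step).
rows = [list("AGMS"), list("BHNT"), list("CIOU"), list("DJPV")]
data, target = get_batch_sketch(rows, 0, bptt=2)
print(data)    # [['A', 'G', 'M', 'S'], ['B', 'H', 'N', 'T']]
print(target)  # ['B', 'H', 'N', 'T', 'C', 'I', 'O', 'U']
```

Each target token is the token that follows the corresponding input token within the same column, which is exactly the next-word objective.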

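-## Initiate an instance
+## Initialize a model instance

-The model is set up with the hyperparameters below. The vocab size is equal to the length of the vocab object.
+Create the model with the hyperparameters below. The vocabulary size equals the length of the vocab object.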

-```
+```python
 ntokens = len(TEXT.vocab.stoi) # the size of vocabulary
 emsize = 200 # embedding dimension
 nhid = 200 # the dimension of the feedforward network model in nn.TransformerEncoder
@@ -166,9 +162,9 @@ model = TransformerModel(ntokens, emsize, nhead, nhid, nlayers, dropout).to(devi

 ## Run the model

-[CrossEntropyLoss](https://pytorch.org/docs/master/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss) is applied to track the loss, and [SGD](https://pytorch.org/docs/master/optim.html?highlight=sgd#torch.optim.SGD) implements stochastic gradient descent as the optimizer. The initial learning rate is set to 5.0. [StepLR](https://pytorch.org/docs/master/optim.html?highlight=steplr#torch.optim.lr_scheduler.StepLR) is applied to adjust the learning rate through epochs. During training, we use the [nn.utils.clip_grad_norm_](https://pytorch.org/docs/master/nn.html?highlight=nn%20utils%20clip_grad_norm#torch.nn.utils.clip_grad_norm_) function to scale all the gradients together to prevent exploding.
+The loss function is [CrossEntropyLoss](https://pytorch.org/docs/master/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss), and the optimizer is stochastic gradient descent as implemented by [SGD](https://pytorch.org/docs/master/optim.html?highlight=sgd#torch.optim.SGD). The initial learning rate is set to 5.0. [StepLR](https://pytorch.org/docs/master/optim.html?highlight=steplr#torch.optim.lr_scheduler.StepLR) adjusts the learning rate across epochs. During training, the [nn.utils.clip_grad_norm_](https://pytorch.org/docs/master/nn.html?highlight=nn%20utils%20clip_grad_norm#torch.nn.utils.clip_grad_norm_) function rescales all gradients to prevent exploding gradients.
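
The arithmetic behind a step learning-rate schedule and norm-based gradient clipping can be sketched in plain Python. Both helpers below are hypothetical illustrations rather than the PyTorch implementations, and `step_size=1`, `gamma=0.95`, and `max_norm=0.5` are assumed values that do not appear in this diff; only the initial learning rate of 5.0 comes from the text above.

```python
import math


def step_lr(initial_lr, epoch, step_size=1, gamma=0.95):
    """Learning rate after `epoch` completed epochs under a step decay."""
    return initial_lr * gamma ** (epoch // step_size)


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one common factor so their joint L2 norm
    does not exceed max_norm; leave them unchanged if already within it."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


print([round(step_lr(5.0, e), 4) for e in range(3)])
print(clip_by_global_norm([3.0, 4.0], max_norm=0.5))
```

Scaling every gradient by the same factor keeps the update direction unchanged and only shortens the step, which is what distinguishes norm clipping from clipping each value independently.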

-```
+```python
 criterion = nn.CrossEntropyLoss()
 lr = 5.0 # learning rate
 optimizer = torch.optim.SGD(model.parameters(), lr=lr)
@@ -217,9 +213,9 @@ def evaluate(eval_model, data_source):

 ```

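-Loop over epochs. Save the model if the validation loss is the best we've seen so far. Adjust the learning rate after each epoch.
+Iterate over the training set at most `epochs` times. If the model's loss on the validation set is the best so far, save the model. The learning rate is adjusted after every epoch.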

-```
+```python
 best_val_loss = float("inf")
 epochs = 3 # The number of epochs
 best_model = None
@@ -242,7 +238,7 @@ for epoch in range(1, epochs + 1):

 ```

-Out:
+Output:

 ```
 | epoch   1 |   200/ 2981 batches | lr 5.00 | ms/batch 29.47 | loss  8.04 | ppl  3112.50
@@ -299,11 +295,11 @@ Out:

 ```

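-## Evaluate the model with the test dataset
+## Evaluate the model on the test set

-Apply the best model to check the result with the test dataset.
+Evaluate the saved best model on the test set.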
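
The training and test logs report both a loss and a `ppl` value. Assuming `ppl` is perplexity computed as the exponential of the mean cross-entropy loss (the line that computes it is cut off in this diff), the relationship can be sketched as follows; the helper name `perplexity` is hypothetical.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


print("{:8.2f}".format(perplexity(8.04)))
```

Because the logged loss is rounded to two decimals, exponentiating it reproduces the logged `ppl` only approximately.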

-```
+```python
 test_loss = evaluate(best_model, test_data)
 print('=' * 89)
 print('| End of training | test loss {:5.2f} | test ppl {:8.2f}'.format(
@@ -312,7 +308,7 @@ print('=' * 89)

 ```

-Out:
+Output:

 ```
 =========================================================================================
@@ -321,8 +317,5 @@ Out:

 ```

-**Total running time of the script:** (4 minutes 39.556 seconds)
-
-[`Download Python source code: transformer_tutorial.py`](../_downloads/f53285338820248a7c04a947c5110f7b/transformer_tutorial.py) [`Download Jupyter notebook: transformer_tutorial.ipynb`](../_downloads/dca13261bbb4e9809d1a3aa521d22dd7/transformer_tutorial.ipynb)
+**Total running time of the script:** (4 minutes 42.167 seconds)

-[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.readthedocs.io)
\ No newline at end of file