2021-01-15 22:00:35

0a791e63 · wizardforcel · b1322cb3 · 0a791e63 · 0a791e63 · 0a791e63
7 changed files
--- a/new/dl-pt-workshop/1.md
+++ b/new/dl-pt-workshop/1.md
@@ -343,7 +343,7 @@ loss_funct = nn.MSELoss()

    To learn more about seeds in PyTorch, visit [this page](https://pytorch.org/docs/stable/notes/randomness.html).

-2.  Define the number of features of the input data as **10** (**input_units**) and the number of nodes of the output layer as`1`(**output_units**):
+2.  Define the number of features of the input data as `10` (**input_units**) and the number of nodes of the output layer as `1` (**output_units**):

    input_units = 10

@@ -469,7 +469,7 @@ optimizer.step()

    y = torch.randint(0, 2, (20, 1)).type(torch.FloatTensor)

-3.  Define the optimization algorithm as the Adam optimizer. Set the learning rate equal to **0.01**:
+3.  Define the optimization algorithm as the Adam optimizer. Set the learning rate equal to `0.01`:

    optimizer = optim.Adam(model.parameters(), lr=0.01)
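The pieces in this hunk (the seed, `input_units`/`output_units`, the random target tensor `y`, `MSELoss`, and Adam with a learning rate of `0.01`) can be combined into the minimal training loop below. This is a sketch under assumptions: the single `nn.Linear` layer followed by `nn.Sigmoid`, the random inputs `x`, and the 100 iterations are our own choices, not necessarily the exercise's exact code.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fix the seed so runs are reproducible

input_units = 10   # number of input features
output_units = 1   # number of output nodes

# Assumed random data: 20 samples with 10 features, binary float targets
x = torch.randn(20, input_units)
y = torch.randint(0, 2, (20, 1)).type(torch.FloatTensor)

# Assumed architecture: a single linear layer followed by a sigmoid
model = nn.Sequential(nn.Linear(input_units, output_units), nn.Sigmoid())
loss_funct = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = []
for i in range(100):
    y_pred = model(x)               # forward pass
    loss = loss_funct(y_pred, y)    # compare predictions with targets
    losses.append(loss.item())
    optimizer.zero_grad()           # clear gradients from the previous step
    loss.backward()                 # backpropagate
    optimizer.step()                # update the parameters

print(losses[0], losses[-1])
```

Each iteration follows the usual PyTorch order: forward pass, loss, `zero_grad()`, `backward()`, then `optimizer.step()`.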


--- a/new/dl-pt-workshop/3.md
+++ b/new/dl-pt-workshop/3.md
@@ -416,7 +416,7 @@ o = F.softmax(self.linear2(z))

    import torch.nn.functional as F

-2.  Define the necessary variables for the input, hidden, and output dimensions. Set them to **10**,`5`, and`2`, respectively:
+2.  Define the necessary variables for the input, hidden, and output dimensions. Set them to `10`, `5`, and `2`, respectively:

    D_i = 10
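A minimal sketch of a network with the dimensions above (`D_i = 10`, `D_h = 5`, `D_o = 2`), ending in the softmax shown in the hunk header. The class name `Classifier` and the ReLU on the hidden layer are assumptions. `dim=1` is passed to `F.softmax` explicitly because relying on the implicit dimension is deprecated in PyTorch and triggers a warning.

```python
import torch
import torch.nn.functional as F

D_i = 10  # input dimensions
D_h = 5   # hidden dimensions
D_o = 2   # output dimensions

class Classifier(torch.nn.Module):
    def __init__(self, D_i, D_h, D_o):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_i, D_h)
        self.linear2 = torch.nn.Linear(D_h, D_o)

    def forward(self, x):
        z = F.relu(self.linear1(x))            # hidden layer with ReLU (assumed)
        o = F.softmax(self.linear2(z), dim=1)  # probabilities over the D_o classes
        return o

model = Classifier(D_i, D_h, D_o)
probs = model(torch.randn(4, D_i))  # four random samples
print(probs.shape)  # torch.Size([4, 2])
```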

@@ -639,7 +639,7 @@ batch_size = 100
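 4.  Using scikit-learn's `train_test_split` function, split the dataset into training, validation, and testing sets. Use a 60:20:20 split ratio. Set `random_state` to `0`.
 5.  Convert the validation and testing sets into tensors, bearing in mind that the features matrix should be of the float type, while the target matrix should not. Leave the training sets unconverted for the moment, as they will undergo further transformations.
 6.  Build a custom module class for defining the layers of the network. Include a forward function that specifies the activation functions that will be applied to the output of each layer. Use **ReLU** for all the layers except for the output, where you should use `log_softmax`.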
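-7.  Instantiate the model and define all the variables required to train the model. Set the number of epochs to`50`, and the batch size to **128**. Use a learning rate of`0.001`.
+7.  Instantiate the model and define all the variables required to train the model. Set the number of epochs to `50` and the batch size to `128`. Use a learning rate of `0.001`.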
 8.  Train the network using the training set's data. Use the validation sets to measure performance. To do this, save the loss and the accuracy for both the training and validation sets in each epoch.

    Note
@@ -722,7 +722,7 @@ batch_size = 100

 This exercise does not require any coding; instead, it requires an analysis of the results of the previous activity.

-1.  Assuming a Bayes error of **0.15**, perform error analysis and diagnose the model:
+1.  Assuming a Bayes error of `0.15`, perform error analysis and diagnose the model:

    Bayes error (BE) = 0.15

@@ -730,7 +730,7 @@ batch_size = 100

    Validation set error (VSE) = 1 - 0.71 = 0.29

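-    The values used as the accuracy of both sets (**0.716** and **0.71**) are those obtained during *Activity 3.01*, *Building an ANN*:
+    The values used as the accuracy of both sets (`0.716` and `0.71`) are those obtained during *Activity 3.01*, *Building an ANN*: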

    High bias = TSE - BE = 0.134
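The error-analysis arithmetic above can be reproduced in a few lines of plain Python. The training and validation accuracies (`0.716` and `0.71`) and the Bayes error (`0.15`) come from the text. Defining variance as the gap between the validation and training set errors is the usual convention and is assumed here, since this hunk only shows the bias calculation.

```python
# Accuracies from Activity 3.01, as quoted in the text
train_accuracy = 0.716
validation_accuracy = 0.71
bayes_error = 0.15  # BE, given by the exercise

train_error = 1 - train_accuracy            # TSE
validation_error = 1 - validation_accuracy  # VSE

high_bias = train_error - bayes_error           # TSE - BE
high_variance = validation_error - train_error  # VSE - TSE (assumed definition)

print(f"TSE = {train_error:.3f}")             # TSE = 0.284
print(f"VSE = {validation_error:.3f}")        # VSE = 0.290
print(f"High bias = {high_bias:.3f}")         # High bias = 0.134
print(f"High variance = {high_variance:.3f}") # High variance = 0.006
```

With these numbers, the bias (about 0.134) is far larger than the variance (about 0.006), which points to a model that mainly suffers from high bias.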


--- a/new/dl-pt-workshop/7.md
+++ b/new/dl-pt-workshop/7.md
@@ -36,7 +36,7 @@

    loss_function = torch.nn.MSELoss()

-6.  Define the optimizer of your model. Use the Adam optimizer and a learning rate of **0.01**:
+6.  Define the optimizer of your model. Use the Adam optimizer and a learning rate of `0.01`:

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

@@ -62,7 +62,7 @@

    print(loss.item())

-    The final loss should be approximately **0.24**.
+    The final loss should be approximately `0.24`.

 8.  Make a line plot to display the loss value for each iteration step:

@@ -428,7 +428,7 @@

    return

-7.  Instantiate the model and define all the variables required to train the model. Set the number of epochs to **50** and the batch size to **128**. Use a learning rate of **0.001**:
+7.  Instantiate the model and define all the variables required to train the model. Set the number of epochs to `50` and the batch size to `128`. Use a learning rate of `0.001`:

    model = Classifier(X_train.shape[1])
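A sketch of the variables from step 7 together with a mini-batch training loop. The two hidden layers of 10 units, the synthetic `X_train`/`y_train` tensors, and the use of `nn.NLLLoss` (the usual pairing for `log_softmax` outputs) are assumptions for illustration. The epochs (`50`), batch size (`128`), and learning rate (`0.001`) come from the step above; the validation bookkeeping is omitted to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim

torch.manual_seed(0)

class Classifier(nn.Module):
    """Hypothetical layout: two hidden layers of 10 units each."""
    def __init__(self, input_size, n_classes=2):
        super().__init__()
        self.hidden_1 = nn.Linear(input_size, 10)
        self.hidden_2 = nn.Linear(10, 10)
        self.output = nn.Linear(10, n_classes)

    def forward(self, x):
        z = F.relu(self.hidden_1(x))                 # ReLU on every hidden layer
        z = F.relu(self.hidden_2(z))
        return F.log_softmax(self.output(z), dim=1)  # log-probabilities per class

# Synthetic stand-ins for X_train / y_train (assumption)
X_train = torch.randn(500, 8)
y_train = (X_train[:, 0] > 0).long()

model = Classifier(X_train.shape[1])
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
epochs = 50
batch_size = 128

history = []
for e in range(epochs):
    permutation = torch.randperm(X_train.shape[0])  # reshuffle every epoch
    epoch_loss = 0.0
    for i in range(0, X_train.shape[0], batch_size):
        idx = permutation[i:i + batch_size]
        pred = model(X_train[idx])
        loss = criterion(pred, y_train[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * len(idx)
    history.append(epoch_loss / X_train.shape[0])    # mean loss over the epoch
```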

@@ -1132,7 +1132,7 @@

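    The preceding code snippet contains a class in which the network architecture is defined (the **__init__** method), as well as the steps followed during the forward pass of the information (the **forward** method).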

-7.  Define all of the parameters that are required to train your model. Set the number of epochs to **50**:
+7.  Define all of the parameters that are required to train your model. Set the number of epochs to `50`:

    model = CNN()

@@ -1496,7 +1496,7 @@

 2.  Change the definition of the **transform** variable so that it includes, in addition to normalizing and converting the data into tensors, the following transformations:

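-    For the training/validation sets, use the **RandomHorizontalFlip** function with a probability of 50% (**0.5**) and the **RandomGrayscale** function with a probability of 10% (**0.1**).
+    For the training/validation sets, use the **RandomHorizontalFlip** function with a probability of 50% (`0.5`) and the **RandomGrayscale** function with a probability of 10% (`0.1`).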

    For the testing set, do not add any other transformations:

@@ -1864,7 +1864,7 @@

    beta = 1e5

-10.  Run the model for 500 iterations. Define the Adam optimization algorithm before starting to train the model, using **0.001** as the learning rate:
+10.  Run the model for 500 iterations. Define the Adam optimization algorithm before starting to train the model, using `0.001` as the learning rate:

    Note

@@ -2074,7 +2074,7 @@

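    As in the previous activities, the class contains an **__init__** method, along with the network architecture, and a **forward** method that determines the flow of the information through the layers.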

-6.  Instantiate the **class** function containing the model. Feed the input size, the number of neurons in each recurrent layer (**10**), and the number of recurrent layers (`1`):
+6.  Instantiate the **class** containing the model. Pass it the input size, the number of neurons in each recurrent layer (`10`), and the number of recurrent layers (`1`):

    model = RNN(data_train.shape[1], 10, 1)
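A minimal sketch of an `RNN` class that accepts the three constructor arguments used above (input size, neurons per recurrent layer, number of recurrent layers). The single-output `nn.Linear` head, `batch_first=True`, and the hypothetical 2-D `data_train` tensor (time steps by features, given a batch dimension with `unsqueeze(0)`) are assumptions, not the activity's code.

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.output = nn.Linear(hidden_size, 1)  # one prediction per time step (assumed)

    def forward(self, x, hidden=None):
        out, hidden = self.rnn(x, hidden)        # out: (batch, seq_len, hidden_size)
        out = out.reshape(-1, self.hidden_size)  # flatten the time steps
        return self.output(out), hidden

data_train = torch.randn(20, 3)  # hypothetical: 20 time steps, 3 features
model = RNN(data_train.shape[1], 10, 1)
x = data_train.unsqueeze(0)      # add a batch dimension -> (1, 20, 3)
pred, hidden = model(x)
print(pred.shape, hidden.shape)  # torch.Size([20, 1]) torch.Size([1, 1, 10])
```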

@@ -2346,7 +2346,7 @@

    x = np.array(x).reshape((n_seq, -1))

-8.  Instantiate your model by using **256** as the number of hidden units for a total of two recurrent layers:
+8.  Instantiate your model by using `256` as the number of hidden units for a total of two recurrent layers:

    model = LSTM(len(chars), 256, 2)

@@ -2366,7 +2366,7 @@

    model = LSTM(len(chars), 256, 2).to("cuda")

-9.  Define the loss function and the optimization algorithms. Use the Adam optimizer and the cross-entropy loss to do this. Train the network for **20** epochs:
+9.  Define the loss function and the optimization algorithms. Use the Adam optimizer and the cross-entropy loss to do this. Train the network for `20` epochs:

    loss_function = nn.CrossEntropyLoss()

@@ -2374,7 +2374,7 @@

    epochs = 20

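-    If your machine has a GPU available, try running the training process for **500** epochs:
+    If your machine has a GPU available, try running the training process for `500` epochs: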

    epochs = 500
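A compact, self-contained sketch of the character-level setup from steps 8 and 9: an `LSTM` class taking `len(chars)`, `256`, and `2`, cross-entropy loss, the Adam optimizer, and a device switch that mirrors the `.to("cuda")` line above. The toy text, the one-hot encoding via `F.one_hot`, the `init_states` helper, and the learning rate of `0.001` are hypothetical choices made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTM(nn.Module):
    """Character-level LSTM: one-hot characters in, next-character scores out."""
    def __init__(self, char_length, hidden_size, n_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.lstm = nn.LSTM(char_length, hidden_size, n_layers, batch_first=True)
        self.output = nn.Linear(hidden_size, char_length)

    def forward(self, x, states):
        out, states = self.lstm(x, states)
        out = out.reshape(-1, self.hidden_size)
        return self.output(out), states

    def init_states(self, batch_size):
        # Hypothetical helper: zero hidden and cell states, one per layer
        h = torch.zeros(self.n_layers, batch_size, self.hidden_size)
        c = torch.zeros(self.n_layers, batch_size, self.hidden_size)
        return h, c

text = "hello world "  # hypothetical toy corpus
chars = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(chars)}
encoded = torch.tensor([char_to_int[c] for c in text])

# Inputs are all characters but the last; targets are the following characters
inputs = F.one_hot(encoded[:-1], num_classes=len(chars)).float().unsqueeze(0)
targets = encoded[1:]

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LSTM(len(chars), 256, 2).to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
epochs = 20

inputs, targets = inputs.to(device), targets.to(device)
losses = []
for e in range(epochs):
    states = tuple(s.to(device) for s in model.init_states(1))
    pred, states = model(inputs, states)
    loss = loss_function(pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```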


--- a/new/handson-nlp-pt-1x/2.md
+++ b/new/handson-nlp-pt-1x/2.md
@@ -186,7 +186,7 @@ Another major difference between PyTorch and other deep learning frameworks is the syntax. Py

    train = train.drop("label", axis=1).values.reshape(len(train), 1, 28, 28)

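-    Note that we reshape our input to [ **1,** **1,** **28,** **28**), which is a tensor of 1,000 images, each consisting of 28x28 pixels.
+    Note that we reshape our input to (`len(train)`, `1`, `28`, `28`), that is, a tensor of 1,000 images, each consisting of 28x28 pixels.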

 2.  Next, we convert our training data and training labels into PyTorch tensors so they can be fed into the neural network:

@@ -216,7 +216,7 @@ self.fc4 = nn.Linear(98, 10)

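 We build our classifier as if we were building a normal class in Python, inheriting from **nn.Module** in PyTorch. Within our **__init__** method, we define each of the layers of our neural network. Here, we define fully connected linear layers of varying sizes.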

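-Our first layer takes **784** inputs, as this is the size of each of the images we wish to classify (28x28). The output size of one layer must match the input size of the next, which means our first fully connected layer outputs **392** units and our second layer takes **392** units as input. This is repeated for each layer, with the number of units halving each time, until we reach our final fully connected layer, which outputs **10** units. This is the length of our classification layer.
+Our first layer takes `784` inputs, as this is the size of each of the images we wish to classify (28x28). The output size of one layer must match the input size of the next, which means our first fully connected layer outputs `392` units and our second layer takes `392` units as input. This is repeated for each layer, with the number of units halving each time, until we reach our final fully connected layer, which outputs `10` units. This is the length of our classification layer.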

 Our network now looks like this:

@@ -224,7 +224,7 @@ self.fc4 = nn.Linear（98，10）

 Figure 2.12 – Our neural network

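-Here, we can see that our final layer outputs **10** units. This is because we wish to predict which digit from 0 to 9 each image shows, which gives 10 possible classes in total. Our output is a vector of length **10** containing a prediction for each of the 10 possible values of the image. When making a final classification, we take the digit with the highest value as the model's final prediction. For example, for a given image, our model might predict type 1 with a probability of 10%, type 2 with a probability of 10%, and type 3 with a probability of 80%. We would therefore take type 3 as the prediction, as it was predicted with the highest probability.
+Here, we can see that our final layer outputs `10` units. This is because we wish to predict which digit from 0 to 9 each image shows, which gives 10 possible classes in total. Our output is a vector of length `10` containing a prediction for each of the 10 possible values of the image. When making a final classification, we take the digit with the highest value as the model's final prediction. For example, for a given image, our model might predict type 1 with a probability of 10%, type 2 with a probability of 10%, and type 3 with a probability of 80%. We would therefore take type 3 as the prediction, as it was predicted with the highest probability.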
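A sketch of the classifier described above, with fully connected layers that halve in width from `784` inputs down to the `10`-unit output (`fc4 = nn.Linear(98, 10)` matches the hunk header). The ReLU activations, the `log_softmax` output, the class name `MNISTClassifier`, and the random placeholder batch are assumptions. The final `argmax` picks the highest-scoring class, which is how the prediction is chosen in the paragraph above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTClassifier(nn.Module):
    """Fully connected layers that halve in width: 784 -> 392 -> 196 -> 98 -> 10."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 392)
        self.fc2 = nn.Linear(392, 196)
        self.fc3 = nn.Linear(196, 98)
        self.fc4 = nn.Linear(98, 10)

    def forward(self, x):
        x = x.view(x.shape[0], -1)  # flatten (N, 1, 28, 28) into (N, 784)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return F.log_softmax(self.fc4(x), dim=1)  # one score per digit 0-9

model = MNISTClassifier()
images = torch.randn(4, 1, 28, 28)     # placeholder batch, not real MNIST data
log_probs = model(images)
predictions = log_probs.argmax(dim=1)  # highest-scoring class per image
print(log_probs.shape, predictions)
```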

 ## Implementing dropout


--- a/new/handson-nlp-pt-1x/3.md
+++ b/new/handson-nlp-pt-1x/3.md
@@ -212,11 +212,11 @@ projection_king_embedding = glove['queen'] - glove['woman'] + glove['man']

    Figure 3.10 – Encoded data

-5.  We then define the length of our embeddings. While this can technically be any number you wish, there are some tradeoffs to consider. While higher-dimensional embeddings can lead to a more detailed representation of the words, the feature space also becomes sparser, which means high-dimensional embeddings are only appropriate for large corpuses. Furthermore, larger embeddings mean more parameters to learn, so increasing the embedding size can increase training time significantly. We are only training on a very small dataset, so we have opted to use embeddings of size **20**:
+5.  We then define the length of our embeddings. While this can technically be any number you wish, there are some tradeoffs to consider. While higher-dimensional embeddings can lead to a more detailed representation of the words, the feature space also becomes sparser, which means high-dimensional embeddings are only appropriate for large corpora. Furthermore, larger embeddings mean more parameters to learn, so increasing the embedding size can increase training time significantly. We are only training on a very small dataset, so we have opted to use embeddings of size `20`:

    embedding_length = 20

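-    Next, we define our **CBOW** model in PyTorch. We define our embedding layer so that it takes a vector of corpus length as input and outputs a single embedding. We define our linear layer as a fully connected layer that takes an embedding as input and outputs a vector of **64**. We define our final layer as a classification layer that is the same length as our text corpus.
+    Next, we define our **CBOW** model in PyTorch. We define our embedding layer so that it takes a vector of corpus length as input and outputs a single embedding. We define our linear layer as a fully connected layer that takes an embedding as input and outputs a vector of `64`. We define our final layer as a classification layer that is the same length as our text corpus.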

 6.  We define our forward pass by obtaining and summing the embeddings for all input context words. This then passes through the fully connected layer with ReLU activation functions and finally into the classification layer, which predicts which word in the corpus corresponds to the summed embeddings of the context words the most:
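A minimal sketch of the CBOW model from steps 5 and 6: an embedding layer of size `20`, a fully connected layer producing `64` units with ReLU, and a classification layer as long as the corpus. The class name, the `log_softmax` output, the vocabulary size of 50, and the context indices are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBOW(nn.Module):
    def __init__(self, corpus_length, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(corpus_length, embedding_dim)
        self.linear1 = nn.Linear(embedding_dim, 64)
        self.linear2 = nn.Linear(64, corpus_length)

    def forward(self, context_indices):
        # Look up and sum the embeddings of all context words -> (1, embedding_dim)
        embeds = self.embeddings(context_indices).sum(dim=0).view(1, -1)
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)  # one score per word in the corpus
        return F.log_softmax(out, dim=1)

corpus_length = 50  # hypothetical vocabulary size
embedding_length = 20
model = CBOW(corpus_length, embedding_length)
context = torch.tensor([3, 7, 12, 25])  # hypothetical indices of four context words
log_probs = model(context)
predicted_word = log_probs.argmax(dim=1).item()
```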


--- a/new/handson-nlp-pt-1x/5.md
+++ b/new/handson-nlp-pt-1x/5.md
@@ -239,7 +239,7 @@ int_to_word_dict

 Figure 5.14 – Length values

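-We can see that the longest sentence is **70** words long and the average sentence length is **11.78**. To capture all the information from all our sentences, we would want to pad all of them so that they have a length of 70. However, longer sentences mean longer sequences, which make our unrolled LSTM deeper. Model training therefore takes longer, as we have to backpropagate our gradients through more steps, and a large percentage of our inputs would also be sparse and full of empty tokens, which makes learning from our data much less efficient. This is illustrated by the fact that our maximum sentence length is much larger than our average sentence length. To capture the majority of our sentence information without unnecessarily padding our inputs and making them too sparse, we opt to use an input size of **50**. You may wish to experiment with different input sizes between **20** and **70** to see how this affects model performance.
+We can see that the longest sentence is `70` words long and the average sentence length is `11.78`. To capture all the information from all our sentences, we would want to pad all of them so that they have a length of 70. However, longer sentences mean longer sequences, which make our unrolled LSTM deeper. Model training therefore takes longer, as we have to backpropagate our gradients through more steps, and a large percentage of our inputs would also be sparse and full of empty tokens, which makes learning from our data much less efficient. This is illustrated by the fact that our maximum sentence length is much larger than our average sentence length. To capture the majority of our sentence information without unnecessarily padding our inputs and making them too sparse, we opt to use an input size of `50`. You may wish to experiment with different input sizes between `20` and `70` to see how this affects model performance.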

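 We will create a function that allows us to pad our sentences so that they are all the same size. Reviews shorter than the sequence length are padded with empty tokens. For reviews longer than the sequence length, we simply drop any tokens beyond the maximum sequence length: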
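A pure-Python sketch of such a padding function. The name `pad_text` is hypothetical, and placing the empty tokens before the words (left padding) is an assumption; padding on the right would also satisfy the description.

```python
def pad_text(tokenized_reviews, seq_length, pad_token=""):
    """Left-pad short reviews with empty tokens and truncate long ones to seq_length."""
    padded = []
    for review in tokenized_reviews:
        if len(review) >= seq_length:
            padded.append(review[:seq_length])  # drop tokens beyond the limit
        else:
            padding = [pad_token] * (seq_length - len(review))
            padded.append(padding + review)     # empty tokens go before the words
    return padded

reviews = [["great", "film"], ["a", "b", "c", "d", "e", "f"]]
print(pad_text(reviews, 4))
# [['', '', 'great', 'film'], ['a', 'b', 'c', 'd']]
```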

@@ -425,7 +425,7 @@ valid_loader = DataLoader(valid_data, batch_size=batch_size, shuffle=

 test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

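-Now that we have defined our **DataLoader** objects for each of our three datasets, we define our training loop. We first define a number of hyperparameters that will be used within our training loop. Most importantly, we define our loss function as binary cross-entropy (as we are predicting a single binary class), and we define our optimizer as **Adam** with a learning rate of **0.001**. We also set our model to run for a small number of epochs (to save time) and set **clip = 5** to define our gradient clipping:
+Now that we have defined our **DataLoader** objects for each of our three datasets, we define our training loop. We first define a number of hyperparameters that will be used within our training loop. Most importantly, we define our loss function as binary cross-entropy (as we are predicting a single binary class), and we define our optimizer as **Adam** with a learning rate of `0.001`. We also set our model to run for a small number of epochs (to save time) and set **clip = 5** to define our gradient clipping: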

 print_every = 2400
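The loss, optimizer, and gradient-clipping settings described above can be sketched as a single training step. The small feed-forward stand-in model and the random batch are hypothetical (the chapter's model is an LSTM); `nn.BCELoss`, Adam with `lr=0.001`, and `clip = 5` passed to `torch.nn.utils.clip_grad_norm_` follow the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model: any network ending in a sigmoid works with BCELoss
model = nn.Sequential(nn.Linear(50, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
clip = 5

inputs = torch.randn(32, 50)                   # placeholder batch
labels = torch.randint(0, 2, (32, 1)).float()  # binary sentiment labels

output = model(inputs)
loss = criterion(output, labels)
optimizer.zero_grad()
loss.backward()
# Rescale all gradients together so that their combined L2 norm is at most `clip`
total_norm = nn.utils.clip_grad_norm_(model.parameters(), clip)
optimizer.step()
print(loss.item(), float(total_norm))
```

`clip_grad_norm_` returns the total gradient norm measured before clipping, which can be logged to see how often clipping actually kicks in.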

@@ -569,7 +569,7 @@ final.append(word_to_int_dict[''])

 return final

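-We remove punctuation and trailing whitespace, convert letters into lowercase, and tokenize our input sentence as before. We pad our sentence to a sequence of length **50** and then convert our tokens into numeric values using our precomputed dictionary. Note that our input may contain new words that our network has never seen before. In this case, our function treats them as empty tokens.
+We remove punctuation and trailing whitespace, convert letters into lowercase, and tokenize our input sentence as before. We pad our sentence to a sequence of length `50` and then convert our tokens into numeric values using our precomputed dictionary. Note that our input may contain new words that our network has never seen before. In this case, our function treats them as empty tokens.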

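 Next, we create our actual **predict()** function. We preprocess our input review, convert it into a tensor, and pass it to a data loader. We then loop through this data loader (even though it only contains one sentence) and pass our review through our network to obtain a prediction. Finally, we evaluate our prediction and print whether it is a positive or a negative review: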

@@ -727,7 +727,7 @@ mkdir models

    words = np.array([preprocess_review(review=i)])

-5.  To define our output, we return a JSON response consisting of the output from our model and a response code, **200**, which is what is returned by our predict function:
+5.  To define our output, we return a JSON response consisting of the output from our model and a response code, `200`, which is what is returned by our predict function:

    output = model(x)[0].item()


--- a/new/handson-nlp-pt-1x/6.md
+++ b/new/handson-nlp-pt-1x/6.md
@@ -229,7 +229,7 @@ questions.vocab.vectors

 Figure 6.14 – Tensor contents

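-Next, we create our data iterators. We create separate iterators for our training and validation data. We first specify a device so that we can train our model faster on a CUDA-enabled GPU. Within our iterators, we also specify the size of the batches they return, which in this case is **64**. You may wish to experiment with different batch sizes for your model, as this may affect both training speed and how quickly the model converges:
+Next, we create our data iterators. We create separate iterators for our training and validation data. We first specify a device so that we can train our model faster on a CUDA-enabled GPU. Within our iterators, we also specify the size of the batches they return, which in this case is `64`. You may wish to experiment with different batch sizes for your model, as this may affect both training speed and how quickly the model converges: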

 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

@@ -257,7 +257,7 @@ batch_size = 64,

    self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)

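-    The embedding layer consists of an embedding for each possible word in our vocabulary, so the size of the layer is the length of our vocabulary by the length of our embedding vectors. We are using the 200-dimensional GloVe vectors, so the length in this case is **200**. We must also pass the padding index, which is the index in our embedding layer of the embedding used to pad our sentences so that they are all the same length. We will define this embedding manually later on, when we initialize our model.
+    The embedding layer consists of an embedding for each possible word in our vocabulary, so the size of the layer is the length of our vocabulary by the length of our embedding vectors. We are using the 200-dimensional GloVe vectors, so the length in this case is `200`. We must also pass the padding index, which is the index in our embedding layer of the embedding used to pad our sentences so that they are all the same length. We will define this embedding manually later on, when we initialize our model.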

 3.  Next, we define the actual convolutional layers within our network:

@@ -339,7 +339,7 @@ dropout_pc = 0.5

 model = CNN(input_dimensions, embedding_dimensions, number_of_filters, filter_sizes, output_dimensions, dropout_pc, pad_index)

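-The input dimensions will always be the length of our vocabulary, while our output dimensions will be the number of classes we wish to predict. Here, we are predicting from six different classes, so our output vector will have a length of`6`. Our embedding dimensions are the length of our GloVe vectors (in this case, **200**). The padding index can be obtained manually from our vocabulary.
+The input dimensions will always be the length of our vocabulary, while our output dimensions will be the number of classes we wish to predict. Here, we are predicting from six different classes, so our output vector will have a length of `6`. Our embedding dimensions are the length of our GloVe vectors (in this case, `200`). The padding index can be obtained manually from our vocabulary.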

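 The next three hyperparameters can be adjusted manually, so you may wish to experiment with different values to see how they affect your network's final output. We pass a list of filter sizes so that our model trains convolutional layers using convolutions of size `2`, `3`, and `4`. We train 100 filters for each of these filter sizes, so there will be 300 filters in total. We also define a dropout rate of 50% for our network to ensure it is sufficiently regularized. This can be raised or lowered if the model seems prone to overfitting or underfitting. A general rule of thumb is to try lowering the dropout rate if the model is underfitting, and raising it if the model is overfitting.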
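A sketch of a CNN for text classification that matches the constructor call above: one convolution per filter size (`2`, `3`, and `4`), `100` filters each, max pooling over time, 50% dropout, and `6` outputs. The layer layout (a `Conv2d` whose kernel spans the full 200-dimensional embedding) and the vocabulary size, padding index, and batch used in the usage lines are assumptions rather than the chapter's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes,
                 output_dim, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        # One Conv2d per filter size; each kernel spans fs words and the whole embedding
        self.convs = nn.ModuleList([
            nn.Conv2d(1, n_filters, kernel_size=(fs, embedding_dim))
            for fs in filter_sizes
        ])
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: (batch, sentence_length) of token indices
        embedded = self.embedding(text).unsqueeze(1)  # (batch, 1, length, emb_dim)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        # Max-pool each feature map over time -> (batch, n_filters) per filter size
        pooled = [F.max_pool1d(c, c.shape[2]).squeeze(2) for c in conved]
        cat = self.dropout(torch.cat(pooled, dim=1))  # (batch, n_filters * 3)
        return self.fc(cat)

model = CNN(vocab_size=1000, embedding_dim=200, n_filters=100, filter_sizes=[2, 3, 4],
            output_dim=6, dropout=0.5, pad_idx=1)
batch = torch.randint(0, 1000, (8, 12))  # hypothetical batch: 8 sentences of 12 tokens
out = model(batch)
print(out.shape)  # torch.Size([8, 6])
```

Sentences must be at least as long as the largest filter size (`4` here), otherwise that convolution has no valid positions.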

@@ -355,7 +355,7 @@ model.embedding.weight.data.copy_(glove_embeddings)

 Figure 6.16 – Tensor output after dropout

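-Next, we need to define how our model handles unknown tokens that are not contained in the embedding layer, and how our model applies padding to our input sentences. Fortunately, the easiest way to account for both of these scenarios is to use a vector consisting of all zeros. We make sure that these zero-valued tensors are the same length as our embedding vectors (in this instance, **200**):
+Next, we need to define how our model handles unknown tokens that are not contained in the embedding layer, and how our model applies padding to our input sentences. Fortunately, the easiest way to account for both of these scenarios is to use a vector consisting of all zeros. We make sure that these zero-valued tensors are the same length as our embedding vectors (in this instance, `200`):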

 unknown_index = questions.vocab.stoi[questions.unk_token]
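A sketch of the zero-vector step using a standalone `nn.Embedding`. The vocabulary size, the random stand-in for `glove_embeddings`, and the indices `0` and `1` for the unknown and padding tokens are hypothetical; in the text, these indices come from `questions.vocab.stoi`.

```python
import torch
import torch.nn as nn

embedding_dim = 200
vocab_size = 1000                # hypothetical vocabulary size
unknown_index, pad_index = 0, 1  # hypothetical; the text reads them from questions.vocab.stoi

embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_index)
glove_embeddings = torch.randn(vocab_size, embedding_dim)  # stand-in for the real GloVe matrix
embedding.weight.data.copy_(glove_embeddings)

# Replace the vectors for the unknown and padding tokens with all-zero vectors
embedding.weight.data[unknown_index] = torch.zeros(embedding_dim)
embedding.weight.data[pad_index] = torch.zeros(embedding_dim)
```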