epoch_loss = 0
4.We then loop through each batch within our training iterator and extract the sentence to be translated (**src**) and the correct translation of this sentence (**trg**). We then zero our gradients (to prevent gradient accumulation) and calculate the output of our model by passing our model function our inputs and outputs:
output = model(src, trg)
5.Next, we need to calculate the loss of our model’s prediction by comparing our predicted output to the true, correct translated sentence. We reshape our output data and our target data using the shape and view functions in order to create two tensors that can be compared to calculate the loss. We calculate the **loss** criterion between our output and **trg** tensors and then backpropagate this loss through the network:
loss.backward()
6.We then implement gradient clipping to prevent exploding gradients within our model, step our optimizer in order to perform the necessary parameter updates via gradient descent, and finally add the loss of the batch to the epoch loss. This whole process is repeated for all the batches within a single training epoch, whereby the final averaged loss per batch is returned:
return epoch_loss / len(iterator)
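Putting steps 4 to 6 together, the following is a minimal sketch of the complete **train()** function. It assumes a torchtext-style iterator whose batches expose **src** and **trg**, along with the **model**, **optimizer**, **criterion**, and **clip** values defined earlier in the chapter; it is a reconstruction for illustration rather than the exact listing:

```python
import torch

def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0
    for i, batch in enumerate(iterator):
        src = batch.src                      # sentence to translate
        trg = batch.trg                      # correct translation
        optimizer.zero_grad()                # step 4: prevent gradient accumulation
        output = model(src, trg)             # forward pass
        output_dims = output.shape[-1]
        output = output[1:].view(-1, output_dims)   # step 5: reshape for the loss
        trg = trg[1:].view(-1)
        loss = criterion(output, trg)
        loss.backward()                      # backpropagate the loss
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)  # step 6: clip gradients
        optimizer.step()                     # update the parameters
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)        # average loss per batch
```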
7.After, we create a similar function called **evaluate()**. This function will calculate the loss of our validation data across the network in order to evaluate how our model performs when translating data it hasn’t seen before. This function is almost identical to our **train()** function, with the exception of the fact that we switch to evaluation mode:
8.Since we don’t perform any updates for our weights, we need to make sure to implement **no_grad** mode:
with torch.no_grad():
9.The only other difference is that we need to make sure we turn off teacher forcing when in evaluation mode. We wish to assess our model’s performance on unseen data, and enabling teacher forcing would use our correct (target) data to help our model make better predictions. We want our model to be able to make perfect, unaided predictions:
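A sketch of the matching **evaluate()** function from steps 7 to 9 is shown below. It mirrors **train()**, but switches the model to evaluation mode, disables gradient tracking with **no_grad**, and passes a teacher-forcing ratio of 0 (the third argument is assumed to be how the model's forward method exposes teacher forcing):

```python
def evaluate(model, iterator, criterion):
    model.eval()                             # step 7: switch to evaluation mode
    epoch_loss = 0
    with torch.no_grad():                    # step 8: no weight updates, so no gradients
        for i, batch in enumerate(iterator):
            src = batch.src
            trg = batch.trg
            output = model(src, trg, 0)      # step 9: teacher forcing turned off
            output_dims = output.shape[-1]
            output = output[1:].view(-1, output_dims)
            trg = trg[1:].view(-1)
            loss = criterion(output, trg)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)
```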
10.Finally, we need to create a training loop, within which our **train()** and **evaluate()** functions are called. We begin by defining how many epochs we wish to train for and our maximum gradient (for use with gradient clipping). We also set our lowest validation loss to infinity. This will be used later to select our best-performing model:
minimum_validation_loss = float('inf')
11.We then loop through each of our epochs and within each epoch, calculate our training and validation loss using our **train()** and **evaluate()** functions. We also time how long this takes by calling **time.time()** before and after the training process:
end_time = time.time()
12.Next, for each epoch, we determine whether the model we just trained is the best-performing model we have seen thus far. If our model performs the best on our validation data (if the validation loss is the lowest we have seen so far), we save our model:
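The outer loop described in steps 10 to 12 could then look like the following sketch, which assumes the **train()** and **evaluate()** functions above together with illustrative values for the number of epochs, the gradient clip, and the checkpoint filename:

```python
import time

epochs = 10                                   # illustrative values
grad_clip = 1
minimum_validation_loss = float('inf')

for epoch in range(epochs):
    start_time = time.time()                  # step 11: time each epoch
    train_loss = train(model, train_iterator, optimizer, criterion, grad_clip)
    valid_loss = evaluate(model, valid_iterator, criterion)
    end_time = time.time()
    if valid_loss < minimum_validation_loss:  # step 12: keep the best-performing model
        minimum_validation_loss = valid_loss
        torch.save(model.state_dict(), 'best_model.pt')  # filename is illustrative
    print(f'Epoch {epoch+1:02} | time: {end_time - start_time:.0f}s | '
          f'train loss: {train_loss:.3f} | valid loss: {valid_loss:.3f}')
```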
1.We start by creating a **translate()** function. This is functionally identical to the **evaluate()** function we created to calculate the loss over our validation set. However, this time, we are not concerned with the loss of our model, but rather the predicted output. We pass the model our source and target sentences and also make sure we turn teacher forcing off so that our model does not use these to make predictions. We then take our model’s predictions and use an **argmax** function to determine the index of the word that our model predicted for each word in our predicted output sentence:
2.Then, we can use this index to obtain the actual predicted word from our German vocabulary. Finally, we compare the English input to our model against both the correct German sentence and the predicted German sentence. Note that here, we use **[1:-1]** to drop the start and end tokens from our predictions and we reverse the order of our English input (since the input sentences were reversed before they were fed into the model):
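A hedged sketch of this **translate()** helper is shown below. It assumes the torchtext-style **SRC** and **TRG** fields built earlier in the chapter, whose vocabularies expose an **itos** (index-to-string) list, and a single-sentence batch:

```python
def translate(model, src, trg):
    model.eval()
    with torch.no_grad():
        output = model(src, trg, 0)           # teacher forcing off
    preds = output.argmax(dim=-1)             # index of the most likely word at each step
    pred_words = [TRG.vocab.itos[idx] for idx in preds.squeeze(1).tolist()]
    src_words = [SRC.vocab.itos[idx] for idx in src.squeeze(1).tolist()]
    # [1:-1] drops the start and end tokens; the source is reversed back for display
    print('English input:    ', ' '.join(reversed(src_words[1:-1])))
    print('Predicted German: ', ' '.join(pred_words[1:-1]))
    return pred_words
```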
1.We start by creating our **Vocabulary** class. We initialize this class with empty dictionaries—**word2index** and **word2count**. We also initialize the **index2word** dictionary with placeholders for our padding tokens, as well as our **Start-of-Sentence** (**SOS**) and **End-of-Sentence** (**EOS**) tokens. We keep a running count of the number of words in our vocabulary, too (which is 3 to start with as our vocabulary already contains the three tokens mentioned). These are the default values for an empty vocabulary; however, they will be populated as we read our data in:
2.Next, we create the functions that we will use to populate our vocabulary. **addWord** takes a word as input. If this is a new word that is not already in our vocabulary, we add this word to our indices, set the count of this word to 1, and increment the total number of words in our vocabulary by 1. If the word in question is already in our vocabulary, we simply increment the count of this word by 1:
4.To remove low-frequency words from our vocabulary, we can implement a **trim** function. The function first loops through the word count dictionary and if the occurrence of the word is greater than the minimum required count, it is appended to a new list:
5.Finally, our indices are rebuilt from the new **words_to_keep** list. We set all the indices to their initial empty values and then repopulate them by looping through our kept words with the **addWord** function:
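A condensed sketch of the **Vocabulary** class covering steps 1 to 5 follows. The token indices (PAD = 0, SOS = 1, EOS = 2) come from the description above; the rest is an illustrative reconstruction:

```python
PAD_token, SOS_token, EOS_token = 0, 1, 2

class Vocabulary:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3                     # PAD, SOS and EOS are already present

    def addWord(self, word):
        if word not in self.word2index:        # new word: index it and start its count
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:                                  # known word: just increment its count
            self.word2count[word] += 1

    def addSentence(self, sentence):           # convenience wrapper assumed here
        for word in sentence.split(' '):
            self.addWord(word)

    def trim(self, min_count):
        # step 4: collect the words that occur at least min_count times
        words_to_keep = [w for w, c in self.word2count.items() if c >= min_count]
        # step 5: reset the indices and rebuild them from the kept words
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3
        for word in words_to_keep:
            self.addWord(word)
```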
1.The first step for reading in our data is to perform any necessary steps to clean the data and make it more human-readable. We start by converting it from Unicode into ASCII format. We can easily use a function to do this:
2.Next, we want to process our input strings so that they are all in lowercase and do not contain any trailing whitespace or punctuation, except the most basic characters. We can do this by using a series of regular expressions:
3.Finally, we apply this function within a wider function—**readVocs**. This function reads our data file into lines and then applies the **cleanString** function to every line. It also creates an instance of the **Vocabulary** class that we created earlier, meaning this function outputs both our data and vocabulary:
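The helpers from steps 1 to 3 could be sketched as follows. The exact regular expressions, the tab delimiter, and the one-pair-per-line file layout are assumptions made for illustration:

```python
import re
import unicodedata

def unicodeToAscii(s):
    # decompose accented characters and drop the combining marks
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def cleanString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)        # space out basic punctuation
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)    # drop everything else
    s = re.sub(r"\s+", r" ", s).strip()      # collapse repeated whitespace
    return s

def readVocs(datafile, corpus_name):
    lines = open(datafile, encoding='utf-8').read().strip().split('\n')
    pairs = [[cleanString(s) for s in line.split('\t')] for line in lines]
    voc = Vocabulary(corpus_name)
    return voc, pairs
```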
4.To do this, we create a couple of filter functions. The first one, **filterPair**, returns a Boolean value based on whether the current line has an input and output length that is less than the maximum length. Our second function, **filterPairs**, simply applies this condition to all the pairs within our dataset, only keeping the ones that meet this condition:
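A sketch of these two filters, assuming a **max_length** cutoff defined earlier and pairs stored as **[input, output]** lists:

```python
def filterPair(pair, max_length):
    # keep the pair only if both sentences are shorter than the cutoff
    return len(pair[0].split(' ')) < max_length and \
           len(pair[1].split(' ')) < max_length

def filterPairs(pairs, max_length):
    return [pair for pair in pairs if filterPair(pair, max_length)]
```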
5.Now, we just need to create one final function that applies all the previous functions we have put together and run it to create our vocabulary and data pairs:
1.We first calculate the percentage of words that we will keep within our model:
def removeRareWords(voc, all_pairs, minimum):
...
...
Figure 8.9 – Percentage of words to keep
2.Within this same function, we loop through all the words in the input and output sentences. If for a given pair either the input or output sentence has a word that isn't in our new trimmed corpus, we drop this pair from our dataset. We print the output and see that even though we have dropped over half of our vocabulary, we only drop around 17% of our training pairs. This again reflects how our corpus of words is distributed over our individual training pairs:
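Steps 1 and 2 together could be sketched like this, matching the **removeRareWords(voc, all_pairs, minimum)** signature shown above; the exact print-out is illustrative:

```python
def removeRareWords(voc, all_pairs, minimum):
    voc.trim(minimum)                          # step 1: drop low-frequency words
    pairs_to_keep = []
    for pair in all_pairs:
        input_sentence, output_sentence = pair[0], pair[1]
        # step 2: keep the pair only if every word survives the trimmed vocabulary
        keep = all(word in voc.word2index for word in input_sentence.split(' ')) and \
               all(word in voc.word2index for word in output_sentence.split(' '))
        if keep:
            pairs_to_keep.append(pair)
    print('Kept {} pairs out of {} ({:.1%})'.format(
        len(pairs_to_keep), len(all_pairs), len(pairs_to_keep) / len(all_pairs)))
    return pairs_to_keep
```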
1.We start by creating several helper functions, which we can use to transform our pairs into tensors. We first create an **indexFromSentence** function, which grabs the index of each word in the sentence from the vocabulary and appends an EOS token to the end:
2.Secondly, we create a **zeroPad** function, which pads any tensors with zeroes so that all of the sentences within the tensor are effectively the same length:
3.Then, to generate our input tensor, we apply both of these functions. First, we get the indices of our input sentence, then apply padding, and then transform the output into **LongTensor**. We will also obtain the lengths of each of our input sentences and output this as a tensor:
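A sketch of these input-side helpers, assuming the **PAD_token** and **EOS_token** indices defined with the vocabulary; **itertools.zip_longest** performs the zero-padding by transposing the batch and filling the shorter sentences:

```python
import itertools
import torch

def indexFromSentence(voc, sentence):
    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]

def zeroPad(batch, fill_value=PAD_token):
    return list(itertools.zip_longest(*batch, fillvalue=fill_value))

def inputVar(sentences, voc):
    indexed = [indexFromSentence(voc, s) for s in sentences]
    lengths = torch.tensor([len(indexes) for indexes in indexed])
    padded = torch.LongTensor(zeroPad(indexed))   # shape: (max_length, batch_size)
    return padded, lengths
```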
4.Within our network, our padded tokens should generally be ignored. We don't want to train our model on these padded tokens, so we create a Boolean mask to ignore these tokens. To do so, we use a **getMask** function, which we apply to our output tensor. This simply returns `1` if the output consists of a word and `0` if it consists of a padding token:
5.We then apply this to our **outputVar** function. This is identical to the **inputVar** function, except that along with the indexed output tensor and the tensor of lengths, we also return the Boolean mask of our output tensor. This Boolean mask just returns **True** when there is a word within the output tensor and **False** when there is a padding token. We also return the maximum length of sentences within our output tensor:
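The output-side helpers from steps 4 and 5 could then look like this sketch, built on the same **indexFromSentence** and **zeroPad** helpers:

```python
def getMask(padded_batch, value=PAD_token):
    # 1 marks a real word, 0 marks a padding token
    return [[0 if token == value else 1 for token in seq] for seq in padded_batch]

def outputVar(sentences, voc):
    indexed = [indexFromSentence(voc, s) for s in sentences]
    max_target_len = max(len(indexes) for indexes in indexed)
    padded = zeroPad(indexed)
    mask = torch.BoolTensor(getMask(padded))      # True for words, False for padding
    padded = torch.LongTensor(padded)
    return padded, mask, max_target_len
```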
6.Finally, in order to create our input and output batches concurrently, we loop through the pairs in our batch and create input and output tensors for both using the functions we created previously. We then return all the necessary variables:
7.This function should be all we need to transform our training pairs into tensors for training our model. We can validate that this is working correctly by performing a single iteration of our **batch2Train** function on a random selection of our data. We set our batch size to `5` and run this once:
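A sketch of **batch2Train** and the single sanity-check iteration described in step 7; sorting the batch by input length is assumed here so that the packed sequences in the encoder can be built later:

```python
import random

def batch2Train(voc, batch_of_pairs):
    batch_of_pairs.sort(key=lambda pair: len(pair[0].split(' ')), reverse=True)
    input_batch = [pair[0] for pair in batch_of_pairs]
    output_batch = [pair[1] for pair in batch_of_pairs]
    inp, lengths = inputVar(input_batch, voc)
    output, mask, max_target_len = outputVar(output_batch, voc)
    return inp, lengths, output, mask, max_target_len

# single test iteration on five random pairs
example_batch = batch2Train(voc, [random.choice(pairs) for _ in range(5)])
```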
1.As with all of our PyTorch models, we start by creating an **Encoder** class that inherits from **nn.Module**. All the elements here should look familiar to the ones used in previous chapters:
3.Next, we need to create a forward pass for our encoder. We do this by first embedding our input sentences and then using the **pack_padded_sequence** function on our embeddings. This function "packs" our padded sequences so that the GRU only processes the real tokens and skips over the padding. We then pass the packed sequences through our GRU to perform a forward pass:
4.After this, we unpack our padding and sum the GRU outputs. We can then return this summed output, along with our final hidden state, to complete our forward pass:
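A sketch of the encoder described in steps 1 to 4 follows. The embedding layer is assumed to be created outside the class and passed in, and the GRU is assumed to be bidirectional, which is why the two directions are summed at the end; hyperparameter names are illustrative:

```python
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = embedding
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
                          dropout=(0 if n_layers == 1 else dropout),
                          bidirectional=True)

    def forward(self, input_seq, input_lengths, hidden=None):
        embedded = self.embedding(input_seq)                    # step 3: embed the inputs
        packed = pack_padded_sequence(embedded, input_lengths)  # skip padding in the GRU
        outputs, hidden = self.gru(packed, hidden)              # forward pass through the GRU
        outputs, _ = pad_packed_sequence(outputs)               # step 4: unpack the padding
        outputs = (outputs[:, :, :self.hidden_size] +
                   outputs[:, :, self.hidden_size:])            # sum the two GRU directions
        return outputs, hidden
```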
2.Then, create the **dot_score** function within this class. This function simply calculates the dot product of our encoder output with our current hidden state. While there are other ways of transforming these two tensors into a single representation, using a dot product is one of the simplest:
3.We then use this function within our forward pass. First, calculate the attention weights/energies based on the **dot_score** method, then transpose the results, and return the softmax transformed probability scores:
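A sketch of this dot-product attention module covering steps 2 and 3; the tensor shapes in the comments are assumptions based on the encoder above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attn(nn.Module):
    def dot_score(self, hidden, encoder_outputs):
        # hidden: (1, batch, hidden_size); encoder_outputs: (max_length, batch, hidden_size)
        return torch.sum(hidden * encoder_outputs, dim=2)

    def forward(self, hidden, encoder_outputs):
        attn_energies = self.dot_score(hidden, encoder_outputs)  # (max_length, batch)
        attn_energies = attn_energies.t()                        # transpose to (batch, max_length)
        # softmax turns the energies into probability weights over the source positions
        return F.softmax(attn_energies, dim=1).unsqueeze(1)
```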