diff --git a/new/handson-nlp-pt-1x/8.md b/new/handson-nlp-pt-1x/8.md
index 25ff203ba960bbad386bff2b1ed33506b6763ac3..bbfff5262289a8c46dc5a6d5e0d851fb49f157bd 100644
--- a/new/handson-nlp-pt-1x/8.md
+++ b/new/handson-nlp-pt-1x/8.md
@@ -657,7 +657,7 @@ corpus_name = "movie_corpus"
self.dropout = dropout
-2. We then create our layers within this module. We will create an embedding layer and a corresponding dropout layer. We use GRUs again for our decoder; however, this time, we do not need to make our GRU layer bidirectional as we will be decoding the output from our encoder sequentially. We will also create two linear layers—one regular layer for calculating our output and one layer that can be used for concatenation. This layer is twice the width of the regular hidden layer as it will be used on two concatenated vectors, each with a length of **hidden_size**. We also initialize an instance of our attention module from the last section in order to be able to use it within our **Decoder** class:
+2. We then create our layers within this module. We will create an embedding layer and a corresponding dropout layer. We use GRUs again for our decoder; however, this time we do not need to make our GRU layer bidirectional, as we will be decoding the output from our encoder sequentially. We will also create two linear layers: one regular layer for calculating our output and one layer that can be used for concatenation. This layer is twice the width of the regular hidden layer, as it will be used on two concatenated vectors, each with a length of `hidden_size`. We also initialize an instance of the attention module from the last section so that we can use it within our `Decoder` class:
self.embedding = embedding
@@ -671,7 +671,7 @@ corpus_name = "movie_corpus"
self.attn = Attn(hidden_size)
-3. After defining all of our layers, we need to create a forward pass for the decoder. Notice how the forward pass will be used one step (word) at a time. We start by getting the embedding of the current input word and making a forward pass through the GRU layer to get our output and hidden states:
+3. After defining all of our layers, we need to create a forward pass for the decoder. Note that the forward pass will be used one step (word) at a time. We start by getting the embedding of the current input word and making a forward pass through the GRU layer to get our output and hidden states:
def forward(self, input_step, last_hidden, encoder_outputs):
@@ -681,7 +681,7 @@ corpus_name = "movie_corpus"
rnn_output, hidden = self.gru(embedded, last_hidden)
-4. Next, we use the attention module to get the attention weights from the GRU output. These weights are then multiplied by the encoder outputs to effectively give us a weighted sum of our attention weights and our encoder output:
+4. Next, we use the attention module to obtain the attention weights from the GRU output. These weights are then multiplied by the encoder outputs, effectively giving us a weighted sum of our attention weights and our encoder output:
attn_weights = self.attn(rnn_output, encoder_outputs)
@@ -689,7 +689,7 @@ corpus_name = "movie_corpus"
1))
-5. We then concatenate our weighted context vector with the output of our GRU and apply a **tanh** function to get out final concatenated output:
+5. We then concatenate our weighted context vector with the output of our GRU and apply a `tanh` function to get our final concatenated output:
rnn_output = rnn_output.squeeze(0)
@@ -699,7 +699,7 @@ corpus_name = "movie_corpus"
concat_output = torch.tanh(self.concat(concat_input))
-6. For the final step within our decoder, we simply use this final concatenated output to predict the next word and apply a **softmax** function. The forward pass finally returns this output, along with the final hidden state. This forward pass will be iterated upon, with the next forward pass using the next word in the sentence and this new hidden state:
+6. For the final step within our decoder, we simply use this final concatenated output to predict the next word and apply a **softmax** function. The forward pass finally returns this output, along with the final hidden state. This forward pass will be iterated on, with the next forward pass using the next word in the sentence and this new hidden state:
output = self.out(concat_output)
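Since the diff fragments above show the decoder only a few lines at a time, the following sketch pulls steps 2 to 6 together into one module. The class name `DecoderRNN`, the constructor arguments, and the minimal dot-product `Attn` stand-in are assumptions made so that the sketch runs on its own; in practice, the book's `Attn` module from the previous section would be used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attn(nn.Module):
    """Minimal dot-product stand-in for the attention module from the last section."""
    def __init__(self, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size

    def forward(self, hidden, encoder_outputs):
        # hidden: (1, batch, hidden), encoder_outputs: (max_len, batch, hidden)
        attn_energies = torch.sum(hidden * encoder_outputs, dim=2)   # (max_len, batch)
        return F.softmax(attn_energies.t(), dim=1).unsqueeze(1)      # (batch, 1, max_len)

class DecoderRNN(nn.Module):
    def __init__(self, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):
        super().__init__()
        self.embedding = embedding                      # shared embedding layer
        self.embedding_dropout = nn.Dropout(dropout)
        # A unidirectional GRU: we decode the encoder's output one step at a time
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
                          dropout=(0 if n_layers == 1 else dropout))
        self.concat = nn.Linear(hidden_size * 2, hidden_size)  # acts on two concatenated vectors
        self.out = nn.Linear(hidden_size, output_size)
        self.attn = Attn(hidden_size)

    def forward(self, input_step, last_hidden, encoder_outputs):
        # Embed the current input word (a single time step) and apply dropout
        embedded = self.embedding_dropout(self.embedding(input_step))
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Attention weights over the encoder outputs, then the weighted context vector
        attn_weights = self.attn(rnn_output, encoder_outputs)
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # Concatenate the GRU output with the context vector and squash with tanh
        rnn_output = rnn_output.squeeze(0)
        context = context.squeeze(1)
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = torch.tanh(self.concat(concat_input))
        # Predict a probability distribution over the next word
        output = F.softmax(self.out(concat_output), dim=1)
        return output, hidden
```

Note that the decoder returns a probability distribution (softmax already applied), so any loss applied to it should take negative log probabilities rather than treat the output as raw logits.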
@@ -713,7 +713,7 @@ corpus_name = "movie_corpus"
The first step of the training process is to define the measure of loss for our model. As our input tensors may consist of padded sequences, owing to our input sentences all being of different lengths, we cannot simply calculate the difference between the true output and the predicted output tensors. To account for this, we will define a loss function that applies a Boolean mask over the outputs and only calculates the loss of the non-padded tokens:
-1. In the following function, we can see that we calculate cross-entropy loss across the whole output tensors. However, to get the total loss, we only average over the elements of the tensor that are selected by the Boolean mask:
+1. In the following function, we can see that we calculate the cross-entropy loss across the whole output tensor. However, to get the total loss, we only average over the elements of the tensor that are selected by the Boolean mask:
def NLLMaskLoss(inp, target, mask):
@@ -727,7 +727,7 @@ corpus_name = "movie_corpus"
return loss, TotalN.item()
-2. For the majority of our training, we need two main functions—one function, **train()**, which performs training on a single batch of our training data and another function, **trainIters()**, which iterates through our whole dataset and calls **train()** on each of the individual batches. We start by defining **train()** in order to train on a single batch of data. Create the **train()** function, then get the gradients to 0, define the device options, and initialize the variables:
+2. For the majority of our training, we need two main functions: one function, `train()`, which performs training on a single batch of our training data, and another function, `trainIters()`, which iterates through our whole dataset and calls `train()` on each of the individual batches. We start by defining `train()` in order to train on a single batch of data. Create the `train()` function, then set the gradients to 0, define the device options, and initialize the variables:
def train(input_variable, lengths, target_variable,\
@@ -757,11 +757,11 @@ corpus_name = "movie_corpus"
n_totals = 0
-3. Then, perform a forward pass of the inputs and sequence lengths though the encoder to get the output and hidden states:
+3. Then, we perform a forward pass of the inputs and sequence lengths through the encoder to get the output and hidden states:
encoder_outputs, encoder_hidden = encoder(input_variable, lengths)
-4. Next, we create our initial decoder input, starting with SOS tokens for each sentence. We then set the initial hidden state of our decoder to be equal to that of the encoder:
+4. Next, we create our initial decoder input, starting with SOS tokens for each sentence. We then set the initial hidden state of our decoder to be equal to that of the encoder:
decoder_input = torch.LongTensor([[SOS_token for _ in \
@@ -773,11 +773,11 @@ corpus_name = "movie_corpus"
Next, we implement teacher forcing. If you recall teacher forcing from the previous chapter, when generating an output sequence with some given probability, we use the true previous output token, rather than the predicted previous output token, to generate the next word in the output sequence. Using teacher forcing helps our model converge much more quickly; however, we must be careful not to make the teacher forcing ratio too high, or our model will become too reliant on teacher forcing and will not learn to generate the correct output independently.
-5. Determine whether we should use teacher forcing for the current step:
+5. Determine whether we should use teacher forcing for the current step:
use_TF = True if random.random()
-6. Then, if we do need to implement teacher forcing, run the following code. We pass each of our sequence batches through the decoder to obtain our output. We then set the next input as the true output (**target**). Finally, we calculate and accumulate the loss using our loss function and print this to the console:
+6. Then, if we do need to implement teacher forcing, run the following code. We pass each of our sequence batches through the decoder to obtain our output. We then set the next input as the true output (**target**). Finally, we calculate and accumulate the loss using our loss function and print this to the console:
for t in range(max_target_len):
@@ -797,7 +797,7 @@ corpus_name = "movie_corpus"
n_totals += nTotal
-7. If we do not implement teacher forcing on a given batch, the procedure is almost identical. However, instead of using the true output as the next input into the sequence, we use the one generated by the model:
+7. If we do not implement teacher forcing on a given batch, the procedure is almost identical. However, instead of using the true output as the next input into the sequence, we use the one generated by the model:
_, topi = decoder_output.topk(1)
@@ -807,7 +807,7 @@ corpus_name = "movie_corpus"
decoder_input = decoder_input.to(device)
-8. Finally, as with all of our models, the final steps are to perform backpropagation, implement gradient clipping, and step through both of our encoder and decoder optimizers to update the weights using gradient descent. Remember that we clip out gradients in order to prevent the vanishing/exploding gradient problem, which was discussed in earlier chapters. Finally, our training step returns our average loss:
+8. Finally, as with all of our models, the final steps are to perform backpropagation, implement gradient clipping, and step through both of our encoder and decoder optimizers to update the weights using gradient descent. Remember that we clip our gradients in order to prevent the vanishing/exploding gradient problem, which was discussed in earlier chapters. Finally, our training step returns our average loss:
loss.backward()
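For reference, here is one way steps 2 to 8 could be assembled into a complete `train()` function. The parameter list after `target_variable` is an assumption (the fragments only show the first line of the signature), and `NLLMaskLoss`, `SOS_token`, and `device` are taken to be defined as in the surrounding steps.

```python
import random
import torch
import torch.nn as nn

def train(input_variable, lengths, target_variable, mask, max_target_len,
          encoder, decoder, encoder_optimizer, decoder_optimizer,
          batch_size, clip, teacher_forcing_ratio=1.0):
    # Step 2: zero the gradients and move the batch to the training device
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()
    input_variable = input_variable.to(device)
    target_variable = target_variable.to(device)
    mask = mask.to(device)
    lengths = lengths.to(device)
    loss = 0
    print_losses = []
    n_totals = 0

    # Step 3: forward pass through the encoder
    encoder_outputs, encoder_hidden = encoder(input_variable, lengths)

    # Step 4: initial decoder input (SOS for every sentence) and hidden state
    decoder_input = torch.LongTensor([[SOS_token for _ in range(batch_size)]]).to(device)
    decoder_hidden = encoder_hidden[:decoder.n_layers]

    # Step 5: decide whether to use teacher forcing for this batch
    use_TF = True if random.random() < teacher_forcing_ratio else False

    # Steps 6-7: decode one step at a time, feeding back either the true token
    # (teacher forcing) or the model's own highest-probability prediction
    for t in range(max_target_len):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden,
                                                 encoder_outputs)
        if use_TF:
            decoder_input = target_variable[t].view(1, -1)
        else:
            _, topi = decoder_output.topk(1)
            decoder_input = torch.LongTensor([[topi[i][0] for i in range(batch_size)]])
            decoder_input = decoder_input.to(device)
        mask_loss, nTotal = NLLMaskLoss(decoder_output, target_variable[t], mask[t])
        loss += mask_loss
        print_losses.append(mask_loss.item() * nTotal)
        n_totals += nTotal

    # Step 8: backpropagate, clip the gradients, and update both sets of weights
    loss.backward()
    nn.utils.clip_grad_norm_(encoder.parameters(), clip)
    nn.utils.clip_grad_norm_(decoder.parameters(), clip)
    encoder_optimizer.step()
    decoder_optimizer.step()

    return sum(print_losses) / n_totals
```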
@@ -821,7 +821,7 @@ corpus_name = "movie_corpus"
return sum(print_losses) / n_totals
-9. Next, as previously stated, we need to create the **trainIters()** function, which repeatedly calls our training function on different batches of input data. We start by splitting our data into batches using the **batch2Train** function we created earlier:
+9. Next, as previously stated, we need to create the `trainIters()` function, which repeatedly calls our training function on different batches of input data. We start by splitting our data into batches using the `batch2Train` function we created earlier:
def trainIters(model_name, voc, pairs, encoder, decoder,\
@@ -843,7 +843,7 @@ corpus_name = "movie_corpus"
range(n_iteration)]
-10. We then create a few variables that will allow us to count iterations and keep track of the total loss over each epoch:
+10. We then create a few variables that will allow us to count iterations and keep track of the total loss over each epoch:
print('Starting ...')
@@ -855,7 +855,7 @@ corpus_name = "movie_corpus"
start_iteration = checkpoint['iteration'] + 1
-11. Next, we define our training loop. For each iteration, we get a training batch from our list of batches. We then extract the relevant fields from our batch and run a single training iteration using these parameters. Finally, we add the loss from this batch to our overall loss:
+11. Next, we define our training loop. For each iteration, we get a training batch from our list of batches. We then extract the relevant fields from our batch and run a single training iteration using these parameters. Finally, we add the loss from this batch to our overall loss:
print("Starting Training...")
@@ -879,7 +879,7 @@ corpus_name = "movie_corpus"
print_loss += loss
-12. On every iteration, we also make sure we print our progress so far, keeping track of how many iterations we have completed and what our loss was for each epoch:
+12. On every iteration, we also make sure we print our progress so far, keeping track of how many iterations we have completed and what our loss was for each epoch:
if iteration % print_every == 0:
@@ -895,7 +895,7 @@ corpus_name = "movie_corpus"
print_loss = 0
-13. For the sake of completion, we also need to save our model state after every few epochs. This allows us to revisit any historical models we have trained; for example, if our model were to begin overfitting, we could revert back to an earlier iteration:
+13. For the sake of completeness, we also need to save our model state every few epochs. This allows us to revisit any historical models we have trained; for example, if our model were to begin overfitting, we could revert to an earlier iteration:
if (iteration % save_every == 0):
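As a rough illustration of what the save block guarded by `if (iteration % save_every == 0):` might contain, the sketch below stores everything needed to resume training later. Apart from the `iteration` and `voc_dict` keys, which are read back elsewhere in this chapter, the directory layout and dictionary keys are illustrative assumptions.

```python
import os
import torch

directory = os.path.join(save_dir, model_name, corpus_name)
if not os.path.exists(directory):
    os.makedirs(directory)

torch.save({
    'iteration': iteration,                    # lets us resume from start_iteration
    'en': encoder.state_dict(),                # encoder weights
    'de': decoder.state_dict(),                # decoder weights
    'en_opt': encoder_optimizer.state_dict(),  # optimizer states, so training can
    'de_opt': decoder_optimizer.state_dict(),  # be resumed exactly where it stopped
    'loss': loss,
    'voc_dict': voc.__dict__,                  # vocabulary mappings
    'embedding': embedding.state_dict()        # shared embedding layer
}, os.path.join(directory, '{}_checkpoint.tar'.format(iteration)))
```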
@@ -955,7 +955,7 @@ corpus_name = "movie_corpus"
Here, we have three different responses, each of which is equally valid. Therefore, at each stage of a conversation with our chatbot, there is no single "correct" response. This makes evaluation much more difficult. The most intuitive way of testing whether a chatbot produces valid output is to have a conversation with it! This means we need to set up our chatbot in a way that enables us to have a conversation with it in order to determine whether it is working well:
-1. We will start by defining a class that will allow us to decode the encoded input and produce text. We do this by using what is known as a **greedy encoder**. This simply means that at each step of the decoder, our model takes the word with the highest predicted probability as the output. We start by initializing the **GreedyEncoder()** class with our pretrained encoder and decoder:
+1. We will start by defining a class that will allow us to decode the encoded input and produce text. We do this by using what is known as a greedy decoder. This simply means that, at each step of the decoder, our model takes the word with the highest predicted probability as the output. We start by initializing the `GreedySearchDecoder()` class with our pretrained encoder and decoder:
class GreedySearchDecoder(nn.Module):
@@ -967,7 +967,7 @@ corpus_name = "movie_corpus"
self.decoder = decoder
-2. Next, define a forward pass for our decoder. We pass the input through our encoder to get our encoder's output and hidden state. We take the encoder's final hidden layer to be the first hidden input to the decoder:
+2. Next, define a forward pass for our decoder. We pass the input through our encoder to get our encoder's output and hidden state. We take the encoder's final hidden layer to be the first hidden input to the decoder:
def forward(self, input_seq, input_length, max_length):
@@ -977,7 +977,7 @@ corpus_name = "movie_corpus"
decoder_hidden = encoder_hidden[:decoder.n_layers]
-3. Then, create the decoder input with SOS tokens and initialize the tensors to append decoded words to (initialized as a single zero value):
+3. Then, create the decoder input with SOS tokens and initialize the tensors that we will append the decoded words to (initialized as a single zero value):
decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
@@ -985,7 +985,7 @@ corpus_name = "movie_corpus"
all_scores = torch.zeros([0], device=device)
-4. After that, iterate through the sequence, decoding one word at a time. We perform a forward pass through the encoder and add a **max** function to obtain the highest-scoring predicted word and its score, which we then append to the **all_tokens** and **all_scores** variables. Finally, we take this predicted token and use it as the next input to our decoder. After the whole sequence has been iterated over, we return the complete predicted sentence:
+4. After that, iterate through the sequence, decoding one word at a time. We perform a forward pass through the decoder and add a `max` function to obtain the highest-scoring predicted word and its score, which we then append to the `all_tokens` and `all_scores` variables. Finally, we take this predicted token and use it as the next input to our decoder. After the whole sequence has been iterated over, we return the complete predicted sentence (a full sketch of this loop appears at the end of this section):
for _ in range(max_length):
@@ -1011,7 +1011,7 @@ corpus_name = "movie_corpus"
All of the pieces are starting to come together. We have our defined training and evaluation functions, so the final step is to write a function that will actually take our input as text, pass it to our model, and obtain a response from the model. This will be the "interface" of our chatbot, where we actually hold our conversations:
-5. We first define an **evaluate()** function, which takes our input function and returns the predicted output words. We start by transforming our input sentence into indices using our vocabulary. We then obtain a tensor of the lengths of each of these sentences and transpose it:
+5. We first define an `evaluate()` function, which takes our input sentence and returns the predicted output words. We start by transforming our input sentence into indices using our vocabulary. We then obtain a tensor of the lengths of each of these sentences and transpose it:
def evaluate(encoder, decoder, searcher, voc, sentence,\
@@ -1025,7 +1025,7 @@ corpus_name = "movie_corpus"
input_batch = torch.LongTensor(indices).transpose(0, 1)
-6. Then, we assign our lengths and input tensors to the relevant devices. Next, run the inputs through the searcher (**GreedySearchDecoder**) to obtain the word indices of the predicted output. Finally, we transform these word indices back into word tokens before returning them as the function output:
+6. Then, we assign our lengths and input tensors to the relevant devices. Next, we run the inputs through the searcher (`GreedySearchDecoder`) to obtain the word indices of the predicted output. Finally, we transform these word indices back into word tokens before returning them as the function output:
input_batch = input_batch.to(device)
@@ -1041,7 +1041,7 @@ corpus_name = "movie_corpus"
return decoded_words
-7. Finally, we create a **runchatbot** function, which acts as the interface with our chatbot. This function takes human-typed input and prints the chatbot's response. We create this function as a **while** loop that continues until we terminate the function or type **quit** as our input:
+7. Finally, we create a `runchatbot` function, which acts as the interface with our chatbot. This function takes human-typed input and prints the chatbot's response. We create this function as a `while` loop that continues until we terminate the function or type `quit` as our input:
def runchatbot(encoder, decoder, searcher, voc):
@@ -1055,7 +1055,7 @@ corpus_name = "movie_corpus"
if input_sentence == 'quit': break
-8. We then take the typed input and normalize it, before passing the normalized input to our **evaluate()** function, which returns the predicted words from the chatbot:
+8. We then take the typed input and normalize it, before passing the normalized input to our `evaluate()` function, which returns the predicted words from the chatbot:
input_sentence = cleanString(input_sentence)
@@ -1063,7 +1063,7 @@ corpus_name = "movie_corpus"
voc, input_sentence)
-9. Finally, we take these output words and format them, ignoring the EOS and padding tokens, before printing the chatbot's response. Because this is a **while** loop, this allows us to continue the conversation with the chatbot indefinitely:
+9. Finally, we take these output words and format them, ignoring the EOS and padding tokens, before printing the chatbot's response. Because this is a `while` loop, we can continue the conversation with the chatbot indefinitely:
output_words[:] = [x for x in output_words if \
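Putting steps 7 to 9 together, the chatbot interface might look like the following sketch. It assumes that `cleanString()` and `evaluate()` are defined as above and that the vocabulary maps the end-of-sentence and padding indices to the strings `'EOS'` and `'PAD'`.

```python
def runchatbot(encoder, decoder, searcher, voc):
    input_sentence = ''
    while True:
        try:
            input_sentence = input('> ')
            if input_sentence == 'quit':
                break
            # Normalize the typed input, then ask the model for a reply
            input_sentence = cleanString(input_sentence)
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
            # Drop the EOS and padding tokens before printing the response
            output_words[:] = [x for x in output_words if x not in ('EOS', 'PAD')]
            print('Response:', ' '.join(output_words))
        except KeyError:
            # Raised if the input contains a word that is not in the vocabulary
            print('Error: one of those words is not in the vocabulary.')
```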
@@ -1077,7 +1077,7 @@ corpus_name = "movie_corpus"
Now that we have defined all of the required functions, training the model becomes a case of initializing our hyperparameters and calling our training functions:
-1. We first initialize our hyperparameters. While these are only suggested hyperparameters, our models have been set up in a way that will allow them to adapt to whatever hyperparameters they are passed. It is good practice to experiment with different hyperparameters to see which ones result in an optimal model configuration. Here, you could experiment with increasing the number of layers in your encoder and decoder, increasing or decreasing the size of the hidden layers, or increasing the batch size. All of these hyperparameters will have an effect on how well your model learns, as well as a number of other factors, such as the time it takes to train the model:
+1. We first initialize our hyperparameters. While these are only suggested hyperparameters, our models have been set up in a way that will allow them to adapt to whatever hyperparameters they are passed. It is good practice to experiment with different hyperparameters to see which ones result in an optimal model configuration. Here, you could experiment with increasing the number of layers in your encoder and decoder, increasing or decreasing the size of the hidden layers, or increasing the batch size. All of these hyperparameters will have an effect on how well your model learns, as well as on a number of other factors, such as the time it takes to train the model:
model_name = 'chatbot_model'
@@ -1091,7 +1091,7 @@ corpus_name = "movie_corpus"
batch_size = 64
-2. After that, we can load our checkpoints. If we have previously trained a model, we can load the checkpoints and model states from previous iterations. This saves us from having to retrain our model each time:
+2. After that, we can load our checkpoints. If we have previously trained a model, we can load the checkpoints and model states from previous iterations. This saves us from having to retrain our model each time:
loadFilename = None
@@ -1113,7 +1113,7 @@ corpus_name = "movie_corpus"
voc.__dict__ = checkpoint['voc_dict']
-3. After that, we can begin to build our models. We first load our embeddings from the vocabulary. If we have already trained a model, we can load the trained embeddings layer:
+3. After that, we can begin to build our models. We first load our embeddings from the vocabulary. If we have already trained a model, we can load the trained embedding layer:
embedding = nn.Embedding(voc.num_words, hidden_size)
@@ -1121,7 +1121,7 @@ corpus_name = "movie_corpus"
embedding.load_state_dict(embedding_sd)
-4. We then do the same for our encoder and decoder, creating model instances using the defined hyperparameters. Again, if we have already trained a model, we simply load the trained model states into our models:
+4. We then do the same for our encoder and decoder, creating model instances using the defined hyperparameters. Again, if we have already trained a model, we simply load the trained model states into our models:
encoder = EncoderRNN(hidden_size, embedding,\
@@ -1139,7 +1139,7 @@ corpus_name = "movie_corpus"
decoder.load_state_dict(decoder_sd)
-5. Last but not least, we specify a device for each of our models to be trained on. Remember, this is a crucial step if you wish to use GPU training:
+5. Last but not least, we specify a device for each of our models to be trained on. Remember, this is a crucial step if you wish to use GPU training:
encoder = encoder.to(device)
@@ -1157,7 +1157,7 @@ corpus_name = "movie_corpus"
We first initialize some of our training hyperparameters. In the same way as our model hyperparameters, these can be adjusted to affect training time and how our model learns. Clip controls gradient clipping and teacher forcing controls how often we use teacher forcing within our model. Notice how we use a teacher forcing ratio of 1 so that we always use teacher forcing. Lowering the teacher forcing rate would mean our model takes much longer to converge; however, it may help our model learn to generate correct sentences by itself better in the long run.
-6. We also need to define the learning rates of our models and our decoder learning ratio. You will find that your model performs better when the decoder carries out larger parameter updates during gradient descent. Therefore, we introduce a decoder learning ratio to apply a multiplier to the learning rate so that the learning rate is greater for the decoder than it is for the encoder. We also define how often our model prints and saves the results, as well as how many epochs we want our model to run for:
+6. We also need to define the learning rates of our models and our decoder learning ratio. You will find that your model performs better when the decoder carries out larger parameter updates during gradient descent. Therefore, we introduce a decoder learning ratio that applies a multiplier to the learning rate so that the learning rate is greater for the decoder than it is for the encoder. We also define how often our model prints and saves its results, as well as how many epochs we want our model to run for:
save_dir = './'
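As a rough guide, the training hyperparameters described here might be set as follows. The values are illustrative only, and the final two lines anticipate how the decoder learning ratio is applied when the optimizers are built in step 8 below.

```python
from torch import optim

clip = 50.0                    # gradient clipping threshold
teacher_forcing_ratio = 1.0    # always use teacher forcing
learning_rate = 0.0001
decoder_learning_ratio = 5.0   # decoder updates are 5x larger than the encoder's
n_iteration = 4000
print_every = 1
save_every = 500

encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(),
                               lr=learning_rate * decoder_learning_ratio)
```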
@@ -1175,13 +1175,13 @@ corpus_name = "movie_corpus"
save_every = 500
-7. Next, as always when training models in PyTorch, we switch our models to training mode to allow the parameters to be updated:
+7. Next, as always when training models in PyTorch, we switch our models to training mode to allow the parameters to be updated:
encoder.train()
decoder.train()
-8. Next, we create optimizers for both the encoder and decoder. We initialize these as Adam optimizers, but other optimizers will work equally well. Experimenting with different optimizers may yield different levels of model performance. If you have trained a model previously, you can also load the optimizer states if required:
+8. Next, we create optimizers for both the encoder and decoder. We initialize these as Adam optimizers, but other optimizers will work equally well. Experimenting with different optimizers may yield different levels of model performance. If you have previously trained a model, you can also load the optimizer states if required:
print('Building optimizers ...')
@@ -1203,7 +1203,7 @@ corpus_name = "movie_corpus"
decoder_optimizer_sd)
-9. The final step before running the training is to make sure CUDA is configured to be called if you wish to use GPU training. To do this, we simply loop through the optimizer states for both the encoder and decoder and enable CUDA across all of the states:
+9. The final step before running the training is to make sure CUDA is configured to be called if you wish to use GPU training. To do this, we simply loop through the optimizer states for both the encoder and decoder and enable CUDA across all of the states:
for state in encoder_optimizer.state.values():
@@ -1221,7 +1221,7 @@ corpus_name = "movie_corpus"
state[k] = v.cuda()
-10. Finally, we are ready to train our model. This can be done by simply calling the **trainIters** function with all the required parameters:
+10. Finally, we are ready to train our model. This can be done by simply calling the `trainIters` function with all of the required parameters:
print("Starting Training!")
@@ -1255,17 +1255,17 @@ corpus_name = "movie_corpus"
### Now that we have successfully created and trained our model, it is time to evaluate its performance. We will do so by going through the following steps:
-1. To begin the evaluation, we first switch our model into evaluation mode. As with all other PyTorch models, this is done to prevent any further parameter updates occurring within the evaluation process:
+1. To begin the evaluation, we first switch our model into evaluation mode. As with all other PyTorch models, this is done to prevent any further parameter updates from occurring during the evaluation process:
encoder.eval()
decoder.eval()
-2. We also initialize an instance of **GreedySearchDecoder** in order to be able to perform the evaluation and return the predicted output as text:
+2. We also initialize an instance of `GreedySearchDecoder` in order to be able to perform the evaluation and return the predicted output as text:
searcher = GreedySearchDecoder(encoder, decoder)
-3. Finally, to run the chatbot, we simply call the **runchatbot** function, passing it **encoder**, **decoder**, **searcher**, and **voc**:
+3. Finally, to run the chatbot, we simply call the `runchatbot` function, passing it `encoder`, `decoder`, `searcher`, and `voc`:
runchatbot(encoder, decoder, searcher, voc)
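Finally, since the fragments earlier only show the loop header, here is a standalone sketch of the greedy decoding loop described in step 4 of the evaluation preparation (the body of `GreedySearchDecoder.forward`). The free-function form and its argument names are assumptions made so that the sketch is self-contained.

```python
import torch

def greedy_decode(decoder, decoder_input, decoder_hidden, encoder_outputs,
                  max_length, device):
    all_tokens = torch.zeros([0], device=device, dtype=torch.long)
    all_scores = torch.zeros([0], device=device)
    for _ in range(max_length):
        # One decoder step at a time
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden,
                                                 encoder_outputs)
        # Greedily keep the single highest-probability word and its score
        decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
        all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
        all_scores = torch.cat((all_scores, decoder_scores), dim=0)
        # The predicted token becomes the next input to the decoder
        decoder_input = torch.unsqueeze(decoder_input, 0)
    return all_tokens, all_scores
```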