fix typo in chapter_recurrent-neural-networks/ (#1013)

* fix typo in chapter_recurrent-neural-networks/sequence.md
* fix typo in chapter_recurrent-neural-networks/text-preprocessing.md
* change the Chinese translation of 'surprisal' in chapter_recurrent-neural-networks/rnn.md to match the one used in chapter_linear-networks/softmax-regression.md
* fix typo in chapter_recurrent-neural-networks/rnn.md
* fix typo in chapter_recurrent-neural-networks/sequence.md
* change the Chinese name of the book "The War of the Worlds" in chapter_recurrent-neural-networks/rnn-scratch.md
* fix typo in chapter_recurrent-neural-networks/rnn-scratch.md
* fix typo in chapter_recurrent-neural-networks/rnn-concise.md
* fix typo in chapter_recurrent-neural-networks/bptt.md

f00b36b1 · P. Yao · GitHub · 86afa3d0
6 changed files
--- a/chapter_recurrent-neural-networks/bptt.md
+++ b/chapter_recurrent-neural-networks/bptt.md
@@ -44,7 +44,7 @@
 在这个简化模型中，我们将时间步$t$的隐状态表示为$h_t$，
 输入表示为$x_t$，输出表示为$o_t$。
 回想一下我们在 :numref:`subsec_rnn_w_hidden_states`中的讨论，
-输入和隐状态可以拼接为隐藏层中的一个权重变量。
+输入和隐状态可以拼接后与隐藏层中的一个权重变量相乘。
 因此，我们分别使用$w_h$和$w_o$来表示隐藏层和输出层的权重。
 每个时间步的隐状态和输出可以写为：

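Note on the `bptt.md` change: the corrected sentence says that the input and the hidden state are concatenated and then multiplied by a single weight variable in the hidden layer. The sketch below illustrates, under stated assumptions, why one product over the concatenation gives the same result as two separate products. The names `W_xh` and `W_hh`, the helper `matmul`, and all numeric values are illustrative and are not part of this commit.

```python
# Illustrative sketch only: tiny pure-Python matrices with hypothetical names.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# One example (batch size 1) with 2 input features and 3 hidden units.
X = [[1.0, 2.0]]                 # input at time step t
H = [[0.5, -1.0, 0.25]]          # hidden state from time step t - 1
W_xh = [[1.0, 0.0, 2.0],
        [0.5, 1.0, -1.0]]        # assumed input-to-hidden weights (2 x 3)
W_hh = [[1.0, 2.0, 0.0],
        [0.0, 1.0, 1.0],
        [2.0, 0.0, 1.0]]         # assumed hidden-to-hidden weights (3 x 3)

# Two separate products, summed elementwise.
separate = [[p + q for p, q in zip(r1, r2)]
            for r1, r2 in zip(matmul(X, W_xh), matmul(H, W_hh))]

# Concatenate [X, H] along the feature axis and stack [W_xh; W_hh] by rows,
# then perform a single product with the combined weight matrix.
XH = [x_row + h_row for x_row, h_row in zip(X, H)]
W = W_xh + W_hh
combined = matmul(XH, W)

# Both routes agree up to floating-point tolerance.
assert all(abs(c - s) < 1e-12
           for c_row, s_row in zip(combined, separate)
           for c, s in zip(c_row, s_row))
```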

--- a/chapter_recurrent-neural-networks/rnn-concise.md
+++ b/chapter_recurrent-neural-networks/rnn-concise.md
@@ -183,7 +183,7 @@ class RNNModel(nn.Module):
                                 batch_size, self.num_hiddens), 
                                device=device)
        else:
-            # `nn.LSTM` 以张量作为隐状态
+            # `nn.LSTM` 以元组作为隐状态
            return (torch.zeros((
                self.num_directions * self.rnn.num_layers,
                batch_size, self.num_hiddens), device=device),

--- a/chapter_recurrent-neural-networks/rnn-scratch.md
+++ b/chapter_recurrent-neural-networks/rnn-scratch.md
@@ -439,7 +439,7 @@ predict_ch8('time traveller ', 10, net, vocab)
 例如，使用$\eta > 0$作为学习率时，在一次迭代中，
 我们将$\mathbf{x}$更新为$\mathbf{x} - \eta \mathbf{g}$。
 如果我们进一步假设目标函数$f$表现良好，
-表示伴随常数$L$的*利普希茨连续*（Lipschitz continuous）。
+即函数$f$在常数$L$下是*利普希茨连续的*（Lipschitz continuous）。
 也就是说，对于任意$\mathbf{x}$和$\mathbf{y}$我们有：

 $$|f(\mathbf{x}) - f(\mathbf{y})| \leq L \|\mathbf{x} - \mathbf{y}\|.$$
@@ -784,7 +784,7 @@ train_ch8(net, train_iter, vocab_random_iter, lr, num_epochs, strategy,
    * 你能将困惑度降到多少？
    * 用可学习的嵌入表示替换独热编码，是否会带来更好的表现？
    * 如果用H.G.Wells的其他书作为数据集时效果如何，
-      例如[*星球大战*](http://www.gutenberg.org/ebooks/36)？
+      例如[*世界大战*](http://www.gutenberg.org/ebooks/36)？
 1. 修改预测函数，例如使用采样，而不是选择最有可能的下一个字符。
    * 会发生什么？
    * 调整模型使之偏向更可能的输出，例如，当$\alpha > 1$，从$q(x_t \mid x_{t-1}, \ldots, x_1) \propto P(x_t \mid x_{t-1}, \ldots, x_1)^\alpha$中采样。

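Note on the last exercise item in the `rnn-scratch.md` hunk: it asks for sampling from $q \propto P^\alpha$ with $\alpha > 1$. A minimal sketch of that reweighting follows, assuming the model's next-character probabilities are already available as a plain list. `sharpen` and `sample_index` are hypothetical helper names, not functions from the book's code, and the probabilities are made up.

```python
import random

def sharpen(probs, alpha):
    """Return q with q_i proportional to probs_i ** alpha."""
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up next-character distribution over a 4-symbol vocabulary.
p = [0.5, 0.3, 0.15, 0.05]
q = sharpen(p, alpha=2.0)

rng = random.Random(0)  # fixed seed so repeated runs draw the same index
idx = sample_index(q, rng)
```

With `alpha = 1` the distribution is unchanged. Larger values concentrate probability mass on the most likely characters, so sampling approaches greedy selection.
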
--- a/chapter_recurrent-neural-networks/rnn.md
+++ b/chapter_recurrent-neural-networks/rnn.md
@@ -244,7 +244,7 @@ Bengio等人首先提出使用神经网络进行语言建模
 例$2$则要糟糕得多，因为其产生了一个无意义的续写。
 尽管如此，至少该模型已经学会了如何拼写单词，
 以及单词之间的某种程度的相关性。
-最后，例$3$表明了训练不足的模型是无法正确地拟合数据。
+最后，例$3$表明了训练不足的模型是无法正确地拟合数据的。

 我们可以通过计算序列的似然概率来度量模型的质量。
 然而这是一个难以理解、难以比较的数字。
@@ -255,7 +255,7 @@ Bengio等人首先提出使用神经网络进行语言建模

 在这里，信息论可以派上用场了。
 我们在引入softmax回归
-（ :numref:`subsec_info_theory_basics`）时定义了熵、惊奇和交叉熵，
+（ :numref:`subsec_info_theory_basics`）时定义了熵、惊异和交叉熵，
 并在[信息论的在线附录](https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html)
 中讨论了更多的信息论知识。
 如果想要压缩文本，我们可以根据当前词元集预测的下一个词元。

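Note on the `rnn.md` terminology change: the affected sentence refers to entropy, surprisal, and cross-entropy. As a rough illustration of how these relate when scoring a language model, the sketch below computes the surprisal of each observed token, averages it into a cross-entropy loss, and exponentiates the result. The helper names and the probabilities are invented for the example, and taking perplexity as the exponential of the average loss is the standard definition assumed here.

```python
import math

def surprisal(prob):
    """Surprisal (in nats) of an event with probability prob."""
    return -math.log(prob)

def average_cross_entropy(token_probs):
    """Mean surprisal of the probabilities a model gave the actual tokens."""
    return sum(surprisal(p) for p in token_probs) / len(token_probs)

# Made-up probabilities that a model assigned to each true next token.
token_probs = [0.5, 0.25, 0.125, 0.125]

loss = average_cross_entropy(token_probs)
perplexity = math.exp(loss)  # standard definition, assumed here
```

A model that always assigns probability 1 to the correct token has zero loss and perplexity 1. Lower probabilities on the true tokens raise both quantities.
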
--- a/chapter_recurrent-neural-networks/sequence.md
+++ b/chapter_recurrent-neural-networks/sequence.md
@@ -32,10 +32,10 @@
  在统计学中，前者（对超出已知观测范围进行预测）称为*外推法*（extrapolation），
  而后者（在现有观测值之间进行估计）称为*内插法*（interpolation）。
 * 在本质上，音乐、语音、文本和视频都是连续的。
-  如果它们的序列被我们重排，那么原有的意义就会失去。
+  如果它们的序列被我们重排，那么就会失去原有的意义。
  比如，一个文本标题“狗咬人”远没有“人咬狗”那么令人惊讶，尽管组成两句话的字完全相同。
 * 地震具有很强的相关性，即大地震发生后，很可能会有几次小余震，
-  这些余震的强度比非大地震的余震要大得多。
+  这些余震的强度比非大地震后的余震要大得多。
  事实上，地震是时空相关的，即余震通常发生在很短的时间跨度和很近的距离内。
 * 人类之间的互动也是连续的，这可以从微博上的争吵和辩论中看出。

@@ -50,7 +50,7 @@

 其中，用$x_t$表示价格，即在*时间步*（time step）
 $t \in \mathbb{Z}^+$时，观察到的价格$x_t$。
-请注意，$t$对于本文中的序列通常是离散的，并随整数或其子集而变化。
+请注意，$t$对于本文中的序列通常是离散的，并在整数或其子集上变化。
 假设一个交易员想在$t$日的股市中表现良好，于是通过以下途径预测$x_t$：

 $$x_t \sim P(x_t \mid x_{t-1}, \ldots, x_1).$$

--- a/chapter_recurrent-neural-networks/text-preprocessing.md
+++ b/chapter_recurrent-neural-networks/text-preprocessing.md
@@ -4,7 +4,7 @@
 对于序列数据处理问题，我们在 :numref:`sec_sequence`中
 评估了所需的统计工具和预测时面临的挑战。
 这样的数据存在许多种形式，文本是最常见例子之一。
-例如，一篇文章可以简单地看作是一串单词序列，甚至是一串字符序列。
+例如，一篇文章可以被简单地看作是一串单词序列，甚至是一串字符序列。
 本节中，我们将解析文本的常见预处理步骤。
 这些步骤通常包括：