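diff --git a/chapter_recurrent-neural-networks/bi-deep-rnn.md b/chapter_recurrent-neural-networks/bi-deep-rnn.md
index 11cadf98428924b170193d14f6b2a85a0f11d8d8..fb1dfd4600498c3c617c229089b4c9f37809ca04 100644
--- a/chapter_recurrent-neural-networks/bi-deep-rnn.md
+++ b/chapter_recurrent-neural-networks/bi-deep-rnn.md
@@ -1,6 +1,46 @@
-# Bidirectional and Multilayer Recurrent Neural Networks
+# Multiple Hidden Layers and Bidirectional Structure

+The recurrent neural networks introduced so far in this chapter have a single unidirectional hidden layer: the information in the hidden state is passed along the time steps from earliest to latest. In practice, we sometimes use recurrent neural networks with other structures. This section introduces the multi-hidden-layer structure and the bidirectional structure in turn.
+
+
+## Multi-Hidden-Layer Structure
+
+Suppose we are given the minibatch input $\boldsymbol{X}_t \in \mathbb{R}^{n \times x}$ at time step $t$ (number of examples $n$, number of inputs $x$). In the multi-hidden-layer structure,
+let the hidden state of the $l$-th hidden layer at this time step be $\boldsymbol{H}_t^{(l)} \in \mathbb{R}^{n \times h}$ (number of hidden units $h$), the output layer variable be $\boldsymbol{O}_t \in \mathbb{R}^{n \times y}$ (number of outputs $y$), and the activation function of the hidden layers be $\phi$. The hidden state of the first hidden layer is computed as before:
+
+$$\boldsymbol{H}_t^{(1)} = \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)}),$$
+
+
+where the weights $\boldsymbol{W}_{xh}^{(1)} \in \mathbb{R}^{x \times h}, \boldsymbol{W}_{hh}^{(1)} \in \mathbb{R}^{h \times h}$ and the bias $\boldsymbol{b}_h^{(1)} \in \mathbb{R}^{1 \times h}$ are the model parameters of the first hidden layer.
+
+Suppose the number of hidden layers is $L$. For $1 < l \leq L$, the hidden state of the $l$-th hidden layer is
+
+$$\boldsymbol{H}_t^{(l)} = \phi(\boldsymbol{H}_t^{(l-1)} \boldsymbol{W}_{xh}^{(l)} + \boldsymbol{H}_{t-1}^{(l)} \boldsymbol{W}_{hh}^{(l)} + \boldsymbol{b}_h^{(l)}),$$
+
+
+where the weights $\boldsymbol{W}_{xh}^{(l)} \in \mathbb{R}^{h \times h}, \boldsymbol{W}_{hh}^{(l)} \in \mathbb{R}^{h \times h}$ and the bias $\boldsymbol{b}_h^{(l)} \in \mathbb{R}^{1 \times h}$ are the model parameters of the $l$-th hidden layer.
+
+Finally, the output of the output layer depends only on the hidden state of the $L$-th hidden layer:
+
+$$\boldsymbol{O}_t = \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hy} + \boldsymbol{b}_y,$$
+
+where the weight $\boldsymbol{W}_{hy} \in \mathbb{R}^{h \times y}$ and the bias $\boldsymbol{b}_y \in \mathbb{R}^{1 \times y}$ are the model parameters of the output layer.
+
+The structure of a recurrent neural network with multiple hidden layers is shown in Figure 6.3. We will experiment with such networks in the next section.
+
+![Structure of a recurrent neural network with multiple hidden layers.](../img/deep-rnn.svg)
+
+
+
+
+
+
+## Bidirectional Structure
+
+
+![Structure of a bidirectional recurrent neural network. The output layer is omitted here.](../img/bi-rnn.svg)
+
diff --git a/chapter_recurrent-neural-networks/hidden-state.md b/chapter_recurrent-neural-networks/hidden-state.md
index d80e6150a2a8e5166217db1b8601b104a2e61c47..8be223cc93875f701afd73445119951ace5b89fb 100644
--- a/chapter_recurrent-neural-networks/hidden-state.md
+++ b/chapter_recurrent-neural-networks/hidden-state.md
@@ -18,7 +18,7 @@ $$\mathbb{P}(w_t \mid w_{t-(n-1)}, \ldots, w_{t-1}).$$

 $$\boldsymbol{H} = \phi(\boldsymbol{X} \boldsymbol{W}_{xh} + \boldsymbol{b}_h),$$

-where the weight parameter is $\boldsymbol{W}_{xh} \in \mathbb{R}^{x \times h}$, the bias parameter is $\boldsymbol{b}_h \in \mathbb{R}^{1 \times h}$, and $h$ is the number of hidden units. As mentioned earlier, the addition of the two terms above uses broadcasting. Taking the hidden variable $\boldsymbol{H}$ as the input of the output layer, and letting the number of outputs be $y$ (for example, the number of classes in a classification problem), the output of the output layer is
+where the weight parameter is $\boldsymbol{W}_{xh} \in \mathbb{R}^{x \times h}$, the bias parameter is $\boldsymbol{b}_h \in \mathbb{R}^{1 \times h}$, and $h$ is the number of hidden units. The two terms added above have different shapes, so they are added by broadcasting (see the ["Data Manipulation"](../chapter_crashcourse/ndarray.md) section). Taking the hidden variable $\boldsymbol{H}$ as the input of the output layer, and letting the number of outputs be $y$ (for example, the number of classes in a classification problem), the output of the output layer is

 $$\boldsymbol{O} = \boldsymbol{H} \boldsymbol{W}_{hy} + \boldsymbol{b}_y,$$

diff --git a/img/bi-rnn.svg b/img/bi-rnn.svg
index 40eccf128e76c27cd78b8132070a502e1f67d5c5..00ba913ea184545b6f50bef1e33f1ba59aedd23d 100644
--- a/img/bi-rnn.svg
+++ b/img/bi-rnn.svg
@@ -1,2 +1,325 @@
+
-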
[img/bi-rnn.svg, removed version: SVG markup not preserved. Recoverable node labels: forward hidden states h(f)1, h(f)2, ..., h(f)T-1, h(f)T; backward hidden states h(b)1, h(b)2, ..., h(b)T-1, h(b)T; inputs x1, x2, xT-1, xT; hidden states h1, h2, hT-1, hT.]
\ No newline at end of file
[img/bi-rnn.svg, added version: SVG markup not preserved. Produced by OmniGraffle 7.7.1, 2018-06-04 00:14:35 +0000. Recoverable node labels: inputs X_1, X_2, X_{T-1}, X_T; three groups of hidden-state nodes labeled H_1, H_2, H_{T-1}, H_T; outputs O_1, O_2, O_{T-1}, O_T.]
diff --git a/img/deep-rnn.svg b/img/deep-rnn.svg
new file mode 100644
index 0000000000000000000000000000000000000000..18f4d86d3c9dc07d3f7823f6f67ac2d6e3532a23
--- /dev/null
+++ b/img/deep-rnn.svg
@@ -0,0 +1,320 @@
[img/deep-rnn.svg, new file: SVG markup not preserved. Produced by OmniGraffle 7.7.1, 2018-06-04 00:07:24 +0000. Recoverable node labels: inputs X_1, X_2, X_{T-1}, X_T; first-layer hidden states H_1^{(1)}, H_2^{(1)}, H_{T-1}^{(1)}, H_T^{(1)}; last-layer hidden states H_1^{(L)}, H_2^{(L)}, H_{T-1}^{(L)}, H_T^{(L)}; outputs O_1, O_2, O_{T-1}, O_T.]
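
Reviewer note on the multi-hidden-layer recurrence in `bi-deep-rnn.md`: the forward computation can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the book. It assumes `tanh` as the activation $\phi$, uses NumPy rather than the framework the book's experiments rely on, and assumes that each hidden layer's recurrent term uses that same layer's hidden state from the previous time step. The helper names `init_deep_rnn` and `deep_rnn_step` and all sizes below are hypothetical.

```python
import numpy as np


def init_deep_rnn(x, h, y, L, rng):
    """Create parameters for L hidden layers plus the output layer (hypothetical helper)."""
    params = []
    for l in range(L):
        # The first layer maps x inputs to h units; deeper layers map h to h.
        in_dim = x if l == 0 else h
        params.append((
            rng.normal(scale=0.1, size=(in_dim, h)),  # W_xh^(l)
            rng.normal(scale=0.1, size=(h, h)),       # W_hh^(l)
            np.zeros((1, h)),                         # b_h^(l), broadcast over the batch
        ))
    W_hy = rng.normal(scale=0.1, size=(h, y))
    b_y = np.zeros((1, y))
    return params, W_hy, b_y


def deep_rnn_step(X_t, H_prev, params, W_hy, b_y, phi=np.tanh):
    """One time step. H_prev holds every layer's hidden state from step t-1."""
    H_t = []
    layer_input = X_t  # layer 1 reads the input; layer l > 1 reads H_t^(l-1)
    for (W_xh, W_hh, b_h), H_prev_l in zip(params, H_prev):
        H_l = phi(layer_input @ W_xh + H_prev_l @ W_hh + b_h)
        H_t.append(H_l)
        layer_input = H_l
    O_t = H_t[-1] @ W_hy + b_y  # the output depends only on the top layer
    return H_t, O_t


# Assumed toy sizes: batch n, inputs x, hidden units h, outputs y, layers L, steps T.
n, x, h, y, L, T = 2, 3, 4, 5, 2, 6
rng = np.random.default_rng(0)
params, W_hy, b_y = init_deep_rnn(x, h, y, L, rng)
H = [np.zeros((n, h)) for _ in range(L)]  # initial hidden state of every layer
for t in range(T):
    X_t = rng.normal(size=(n, x))  # random stand-in for real minibatch data
    H, O = deep_rnn_step(X_t, H, params, W_hy, b_y)
print(O.shape)  # (2, 5)
```

Keeping one hidden state per layer in a list makes the two directions of information flow explicit: `layer_input` carries information upward within a time step, while `H_prev` carries it forward across time steps.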