finish doc 5.12

1fd528ac · ShusenTang · 469ca244 · 1fd528ac

Showing 100 additions and 60 deletions

docs/chapter05_CNN/5.12_densenet.md +100 -60

--- a/docs/chapter05_CNN/5.12_densenet.md
+++ b/docs/chapter05_CNN/5.12_densenet.md
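@@ -16,112 +16,160 @@ The main building blocks of DenseNet are the dense block and the transition layer (transit
 DenseNet uses the "batch normalization, activation, and convolution" structure from the improved version of ResNet. We first implement this structure in the `conv_block` function.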
-```{.python .input  n=1}
+``` python
-import d2lzh as d2l
+import time
-from mxnet import gluon, init, nd
+import torch
-from mxnet.gluon import nn
+from torch import nn, optim
+import torch.nn.functional as F
-def conv_block(num_channels):
-    blk = nn.Sequential()
+import sys
-    blk.add(nn.BatchNorm(), nn.Activation('relu'),
+sys.path.append("..") 
-            nn.Conv2D(num_channels, kernel_size=3, padding=1))
+import d2lzh_pytorch as d2l
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+def conv_block(in_channels, out_channels):
+    blk = nn.Sequential(nn.BatchNorm2d(in_channels), 
+                        nn.ReLU(),
+                        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
    return blk
 ```
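 A dense block consists of multiple `conv_block`s, each using the same number of output channels. In the forward computation, however, we concatenate the input and output of each block along the channel dimension.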
-```{.python .input  n=2}
+``` python
-class DenseBlock(nn.Block):
+class DenseBlock(nn.Module):
-    def __init__(self, num_convs, num_channels, **kwargs):
+    def __init__(self, num_convs, in_channels, out_channels):
-        super(DenseBlock, self).__init__(**kwargs)
+        super(DenseBlock, self).__init__()
-        self.net = nn.Sequential()
+        net = []
-        for _ in range(num_convs):
+        for i in range(num_convs):
-            self.net.add(conv_block(num_channels))
+            in_c = in_channels + i * out_channels
+            net.append(conv_block(in_c, out_channels))
+        self.net = nn.ModuleList(net)
+        self.out_channels = in_channels + num_convs * out_channels # compute the number of output channels
    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
-            X = nd.concat(X, Y, dim=1)  # concatenate input and output along the channel dimension
+            X = torch.cat((X, Y), dim=1)  # concatenate input and output along the channel dimension
        return X
 ```
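 In the following example, we define a dense block with 2 convolution blocks of 10 output channels each. With a 3-channel input, we get an output with $3+2\times 10=23$ channels. The number of channels of the convolution block controls how much the output channel count grows relative to the input channel count, so it is also called the growth rate.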
-```{.python .input  n=8}
+``` python
-blk = DenseBlock(2, 10)
+blk = DenseBlock(2, 3, 10)
-blk.initialize()
+X = torch.rand(4, 3, 8, 8)
-X = nd.random.uniform(shape=(4, 3, 8, 8))
 Y = blk(X)
-Y.shape
+Y.shape # torch.Size([4, 23, 8, 8])
 ```
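Review note: the channel arithmetic above can be checked without PyTorch. The following is a minimal sketch, assuming only the bookkeeping that the new `DenseBlock.__init__` performs; the helper name `dense_block_channels` is hypothetical and not part of the commit.

```python
def dense_block_channels(in_channels, num_convs, growth_rate):
    """Return (input channels seen by each conv block, final output channels).

    Hypothetical helper that mirrors the bookkeeping in DenseBlock.__init__:
    conv block i receives the original input plus the outputs of the i
    conv blocks before it, all concatenated along the channel dimension.
    """
    per_conv_inputs = [in_channels + i * growth_rate for i in range(num_convs)]
    out_channels = in_channels + num_convs * growth_rate
    return per_conv_inputs, out_channels


per_conv_inputs, out_channels = dense_block_channels(3, 2, 10)
print(per_conv_inputs, out_channels)  # [3, 13] 23
```

The second conv block sees 13 input channels (3 original plus 10 from the first block), which is why the PyTorch version has to pass `in_c` explicitly while the MXNet version inferred it.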
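 ## 5.12.2 Transition Layer
 Because each dense block increases the number of channels, stacking too many of them yields an overly complex model. The transition layer controls model complexity: it reduces the number of channels with a $1\times1$ convolutional layer and halves the height and width with an average pooling layer of stride 2, further lowering model complexity.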
-```{.python .input  n=3}
+``` python
-def transition_block(num_channels):
+def transition_block(in_channels, out_channels):
-    blk = nn.Sequential()
+    blk = nn.Sequential(
-    blk.add(nn.BatchNorm(), nn.Activation('relu'),
+            nn.BatchNorm2d(in_channels), 
-            nn.Conv2D(num_channels, kernel_size=1),
+            nn.ReLU(),
-            nn.AvgPool2D(pool_size=2, strides=2))
+            nn.Conv2d(in_channels, out_channels, kernel_size=1),
+            nn.AvgPool2d(kernel_size=2, stride=2))
    return blk
 ```
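 We apply a transition layer with 10 output channels to the output of the dense block in the previous example. The number of output channels drops to 10, and the height and width are both halved.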
-```{.python .input}
+``` python
-blk = transition_block(10)
+blk = transition_block(23, 10)
-blk.initialize()
+blk(Y).shape # torch.Size([4, 10, 4, 4])
-blk(Y).shape
 ```
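Review note: a rough sketch of what the transition layer does to a shape, plus the parameter count of its $1\times1$ convolution. Both helper names are hypothetical. The parameter count covers only the convolution (with its default bias) and ignores the `BatchNorm2d` parameters.

```python
def transition_output_shape(shape, out_channels):
    """Shape after a 1x1 convolution followed by 2x2 average pooling, stride 2.

    Hypothetical helper; assumes height and width are at least 2.
    """
    batch, _, height, width = shape
    return (batch, out_channels, height // 2, width // 2)


def conv1x1_num_params(in_channels, out_channels, bias=True):
    """Weights (plus optional bias) of a 1x1 convolution."""
    return in_channels * out_channels + (out_channels if bias else 0)


shape_after = transition_output_shape((4, 23, 8, 8), 10)
num_params = conv1x1_num_params(23, 10)
print(shape_after)  # (4, 10, 4, 4)
print(num_params)   # 240
```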
 ## 5.12.3 DenseNet Model
 Next we construct the DenseNet model. DenseNet first uses the same single convolutional layer and max pooling layer as ResNet.
-```{.python .input}
+``` python
-net = nn.Sequential()
+net = nn.Sequential(
-net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
+        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
-        nn.BatchNorm(), nn.Activation('relu'),
+        nn.BatchNorm2d(64), 
-        nn.MaxPool2D(pool_size=3, strides=2, padding=1))
+        nn.ReLU(),
+        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
 ```
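 Similar to the 4 residual blocks that ResNet uses next, DenseNet uses 4 dense blocks. As with ResNet, we can set how many convolutional layers each dense block uses. Here we set it to 4, consistent with the ResNet-18 in the previous section. The number of channels of the convolutional layers in a dense block (i.e., the growth rate) is set to 32, so each dense block adds 128 channels.
 ResNet reduces the height and width between modules with residual blocks of stride 2. Here we instead use transition layers to halve the height and width, and also to halve the number of channels.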
-```{.python .input  n=5}
+``` python
 num_channels, growth_rate = 64, 32  # num_channels is the current number of channels
 num_convs_in_dense_blocks = [4, 4, 4, 4]
 for i, num_convs in enumerate(num_convs_in_dense_blocks):
-    net.add(DenseBlock(num_convs, growth_rate))
+    DB = DenseBlock(num_convs, num_channels, growth_rate)
+    net.add_module("DenseBlosk_%d" % i, DB)
     # number of output channels of the previous dense block
-    num_channels += num_convs * growth_rate
+    num_channels = DB.out_channels
     # add a transition layer that halves the number of channels between dense blocks
    if i != len(num_convs_in_dense_blocks) - 1:
-        net.add(transition_block(num_channels // 2))
+        net.add_module("transition_block_%d" % i, transition_block(num_channels, num_channels // 2))
+        num_channels = num_channels // 2
 ```
 As with ResNet, we finish with a global pooling layer and a fully connected layer to produce the output.
-```{.python .input}
+``` python
-net.add(nn.BatchNorm(), nn.Activation('relu'), nn.GlobalAvgPool2D(),
+net.add_module("BN", nn.BatchNorm2d(num_channels))
-        nn.Dense(10))
+net.add_module("relu", nn.ReLU())
+net.add_module("global_avg_pool", d2l.GlobalAvgPool2d()) # output of GlobalAvgPool2d: (Batch, num_channels, 1, 1)
+net.add_module("fc", nn.Sequential(d2l.FlattenLayer(), nn.Linear(num_channels, 10))) 
+```
+We print the output shape of each submodule to make sure the network is correct:
+``` python
+X = torch.rand((1, 1, 96, 96))
+for name, layer in net.named_children():
+    X = layer(X)
+    print(name, ' output shape:\t', X.shape)
+```
+Output:
+```
+0  output shape:	 torch.Size([1, 64, 48, 48])
+1  output shape:	 torch.Size([1, 64, 48, 48])
+2  output shape:	 torch.Size([1, 64, 48, 48])
+3  output shape:	 torch.Size([1, 64, 24, 24])
+DenseBlosk_0  output shape:	 torch.Size([1, 192, 24, 24])
+transition_block_0  output shape:	 torch.Size([1, 96, 12, 12])
+DenseBlosk_1  output shape:	 torch.Size([1, 224, 12, 12])
+transition_block_1  output shape:	 torch.Size([1, 112, 6, 6])
+DenseBlosk_2  output shape:	 torch.Size([1, 240, 6, 6])
+transition_block_2  output shape:	 torch.Size([1, 120, 3, 3])
+DenseBlosk_3  output shape:	 torch.Size([1, 248, 3, 3])
+BN  output shape:	 torch.Size([1, 248, 3, 3])
+relu  output shape:	 torch.Size([1, 248, 3, 3])
+global_avg_pool  output shape:	 torch.Size([1, 248, 1, 1])
+fc  output shape:	 torch.Size([1, 10])
 ```
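Review note: the printed shapes can be cross-checked by hand. The sketch below assumes the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ for convolution and pooling, and replays the channel and size bookkeeping of the loop in section 5.12.3. The function names and stage labels are hypothetical.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_densenet(input_size=96, num_channels=64, growth_rate=32,
                   num_convs_in_dense_blocks=(4, 4, 4, 4)):
    """Return (stage, channels, height/width) after each stage of the network."""
    size = conv_out_size(input_size, kernel_size=7, stride=2, padding=3)  # stem conv
    size = conv_out_size(size, kernel_size=3, stride=2, padding=1)        # max pool
    trace = [("stem", num_channels, size)]
    last = len(num_convs_in_dense_blocks) - 1
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        # A dense block adds num_convs * growth_rate channels, size unchanged.
        num_channels += num_convs * growth_rate
        trace.append(("dense_%d" % i, num_channels, size))
        if i != last:
            # A transition layer halves both the channels and the height/width.
            num_channels //= 2
            size = conv_out_size(size, kernel_size=2, stride=2, padding=0)
            trace.append(("transition_%d" % i, num_channels, size))
    return trace


trace = trace_densenet()
for stage, channels, size in trace:
    print(stage, channels, size)
```

The final entry is `("dense_3", 248, 3)`, which agrees with the `[1, 248, 3, 3]` shape printed before the `BN` layer.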
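 ## 5.12.4 Getting the Data and Training the Model
 Because we use a fairly deep network here, in this section we reduce the input height and width from 224 to 96 to simplify computation.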
-```{.python .input}
+``` python
-lr, num_epochs, batch_size, ctx = 0.1, 5, 256, d2l.try_gpu()
+batch_size = 256
-net.initialize(ctx=ctx, init=init.Xavier())
+# If an "out of memory" error occurs, reduce batch_size or resize
-trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
 train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
-d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
-              num_epochs)
+lr, num_epochs = 0.001, 5
+optimizer = torch.optim.Adam(net.parameters(), lr=lr)
+d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
+```
+Output:
+```
+training on  cuda
+epoch 1, loss 0.0020, train acc 0.834, test acc 0.749, time 27.7 sec
+epoch 2, loss 0.0011, train acc 0.900, test acc 0.824, time 25.5 sec
+epoch 3, loss 0.0009, train acc 0.913, test acc 0.839, time 23.8 sec
+epoch 4, loss 0.0008, train acc 0.921, test acc 0.889, time 24.9 sec
+epoch 5, loss 0.0008, train acc 0.929, test acc 0.884, time 24.3 sec
 ```
 ## Summary
@@ -129,18 +177,10 @@ d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
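 * For cross-layer connections, unlike ResNet, which adds the input to the output, DenseNet concatenates the input and output along the channel dimension.
 * The main building blocks of DenseNet are dense blocks and transition layers.
-## Exercises
-* One advantage mentioned in the DenseNet paper is that it has fewer model parameters than ResNet. Why is that?
-* One problem DenseNet is criticized for is excessive memory or GPU memory consumption. Is this really the case? Change the input shape to $224\times 224$ to see the actual consumption.
-* Implement the different versions of DenseNet presented in Table 1 of the DenseNet paper [1].
 ## References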
 [1] Huang, G., Liu, Z., Weinberger, K. Q., & van der Maaten, L. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, No. 2).
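-## Scan the QR code to go to the [discussion forum](https://discuss.gluon.ai/t/topic/1664)
+-----------
+> Note: apart from the code, this section is essentially the same as the corresponding section of the original book. [Link to the original book](https://zh.d2l.ai/chapter_convolutional-neural-networks/densenet.html)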
-![](../img/qr_densenet.svg)