revise resnet18 and Residual

406584a6 · Aston Zhang · 4b2a9241
7 changed files
--- a/build/index.md
+++ b/build/index.md
@@ -2,8 +2,8 @@

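 This is a teaching project on deep learning. We use the latest Gluon interface of [Apache MXNet (incubating)](https://github.com/apache/incubator-mxnet) to demonstrate how to implement deep learning algorithms from scratch. We take advantage of the ability of [Jupyter notebook](http://jupyter.org/) to combine text, code, equations, and figures in one place to provide an interactive learning experience. This project can serve as a book, as course material, as a set of live demos, and as a code base that you are free to copy. As far as we know, no existing project both covers deep learning comprehensively and provides interactive, executable code. We will try to fill this gap.

-- [Summary of the 19 video lessons of Season 1](https://discuss.gluon.ai/t/topic/753) (this tutorial is continually being improved; the [version](https://github.com/mli/gluon-tutorials-zh/archive/v0.61.zip) closest to the one in the videos)
-- [The printable PDF version is here](./gluon_tutorials_zh.pdf)
+- [Summary of the 19 video lessons of Season 1](https://discuss.gluon.ai/t/topic/753) (this tutorial is continually being improved; the version closest to the one in the videos is [here](https://github.com/mli/gluon-tutorials-zh/archive/v0.61.zip))
+- The printable PDF version is [here](./gluon_tutorials_zh.pdf)
 - The course source code is on [GitHub](https://github.com/mli/gluon-tutorials-zh) (if you like it, please give it a star)
 - Please use [http://discuss.gluon.ai/](http://discuss.gluon.ai/) for discussion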


--- a/chapter_appendix/gluonbook.md
+++ b/chapter_appendix/gluonbook.md
@@ -36,9 +36,9 @@

 * `read_voc_images`

-* `Residual`
+* `Residual`, [Residual Networks (ResNet)](../chapter_convolutional-neural-networks/resnet.md)

-* `resnet18`
+* `resnet18`, [Gluon Implementation of Multi-GPU Computation](../chapter_computational-performance/multiple-gpus-gluon.md)

 * `semilogy`, [Underfitting, Overfitting, and Model Selection](../chapter_deep-learning-basics/underfit-overfit.md)


--- a/chapter_computational-performance/multiple-gpus-gluon.md
+++ b/chapter_computational-performance/multiple-gpus-gluon.md
@@ -10,16 +10,39 @@ sys.path.append('..')
 import gluonbook as gb
 import mxnet as mx
 from mxnet import autograd, gluon, init, nd
-from mxnet.gluon import loss as gloss, utils as gutils
+from mxnet.gluon import loss as gloss, nn, utils as gutils
 from time import time
 ```

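 ## Initializing Model Parameters on Multiple GPUs

-We use ResNet-18 as the example model in this section.
+We use ResNet-18 as the example model in this section. We define the `resnet18` function in the `gluonbook` package so that later chapters can call it.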

 ```{.python .input  n=1}
-net = gb.resnet18(10)
+def resnet18(num_classes):
+    net = nn.Sequential()
+    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
+            nn.BatchNorm(), nn.Activation('relu'),
+            nn.MaxPool2D(pool_size=3, strides=2, padding=1))
+
+    def resnet_block(num_channels, num_residuals, first_block=False):
+        blk = nn.Sequential()
+        for i in range(num_residuals):
+            if i == 0 and not first_block:
+                blk.add(gb.Residual(num_channels, use_1x1conv=True,
+                                    strides=2))
+            else:
+                blk.add(gb.Residual(num_channels))
+        return blk
+
+    net.add(resnet_block(64, 2, first_block=True),
+            resnet_block(128, 2),
+            resnet_block(256, 2),
+            resnet_block(512, 2))
+    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
+    return net
+
+net = resnet18(10)
 ```

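 Previously, we introduced how to use the `ctx` argument of the `initialize` function to initialize model parameters on the CPU or on a single GPU. In fact, `ctx` also accepts a list of CPUs/GPUs, in which case the initialized model parameters are copied to every CPU/GPU in `ctx`.
@@ -40,7 +63,7 @@ net(gpu_x[0]), net(gpu_x[1])
 Recall the deferred initialization described in the section ["Deferred Initialization of Model Parameters"](../chapter_deep-learning-computation/deferred-init.md). Now we can access the initialized model parameter values through `data`. Note that, by default, `weight.data()` returns the parameter values on the CPU. Because we initialized the model parameters only on 2 GPUs, accessing them without specifying a device fails, so we need to specify which GPU to read from. We can see that the same parameter has identical values on the different GPUs.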

 ```{.python .input}
-weight = net[1].params.get('weight')
+weight = net[0].params.get('weight')
 try:
     weight.data()
 except:

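Using several GPUs in `ctx` typically means giving each device its own slice of every mini-batch. The following is a minimal pure-Python sketch of such an even split along the batch axis. `split_batch` is a hypothetical helper written for illustration, not part of `gluonbook` or MXNet; it assumes the batch size is divisible by the number of devices and does not copy anything to a GPU (in Gluon, `gutils.split_and_load` performs both the split and the copy).

```python
def split_batch(batch, num_devices):
    """Split a list of examples into num_devices equal, contiguous slices.

    Hypothetical helper: it only illustrates the even split along the
    batch axis; it does not move any data to a device.
    """
    if len(batch) % num_devices != 0:
        raise ValueError('batch size must be divisible by the number of devices')
    step = len(batch) // num_devices
    return [batch[i * step:(i + 1) * step] for i in range(num_devices)]


# Example: a batch of 8 examples split across 2 GPUs.
shards = split_batch(list(range(8)), 2)
print(shards)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each slice would then be fed to the copy of the model on its own device, and the per-device gradients combined before the parameter update.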
--- a/chapter_computer-vision/image-augmentation.md
+++ b/chapter_computer-vision/image-augmentation.md
@@ -4,7 +4,7 @@

 First, import the packages or modules needed for the experiments in this section.

-```{.python .input}
+```{.python .input  n=1}
 import sys
 sys.path.insert(0, '..')
 import gluonbook as gb
@@ -18,14 +18,14 @@ from time import time

 We first read an image of size $400\times 500$ as an example.

-```{.python .input}
+```{.python .input  n=2}
 img = image.imread('../img/cat1.jpg')
 gb.plt.imshow(img.asnumpy())
 ```

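 Because most augmentation methods involve randomness, we next define a helper function `apply` that runs the augmentation method `aug` on the input image `img` several times and displays all the results.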

-```{.python .input  n=2}
+```{.python .input  n=3}
 def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):
     Y = [aug(img) for _ in range(num_rows * num_cols)]
     gb.show_images(Y, num_rows, num_cols, scale)
@@ -35,13 +35,13 @@ def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):

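 Flipping an image left to right usually does not change the category of the object; it is one of the earliest and most widely used augmentations. Below we use the `RandomFlipLeftRight` class from the `transforms` module to flip the image left to right with probability 0.5: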

-```{.python .input  n=3}
+```{.python .input  n=4}
 apply(img, gdata.vision.transforms.RandomFlipLeftRight())
 ```

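 Flipping top to bottom is less common than flipping left to right, but at least for this sample image, a top-bottom flip does not make recognition harder.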

-```{.python .input  n=4}
+```{.python .input  n=5}
 apply(img, gdata.vision.transforms.RandomFlipTopBottom())
 ```

@@ -49,7 +49,7 @@ apply(img, gdata.vision.transforms.RandomFlipTopBottom())

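 In the code below, each time we randomly crop a region whose area is 10% to 100% of the original area and whose width-to-height ratio is between 0.5 and 2, and then resize its height and width to 200 pixels.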

-```{.python .input  n=5}
+```{.python .input  n=6}
 shape_aug = gdata.vision.transforms.RandomResizedCrop(
     (200, 200), scale=(0.1, 1), ratio=(0.5, 2))
 apply(img, shape_aug)
@@ -59,19 +59,19 @@ apply(img, shape_aug)

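 Another class of augmentation changes colors. We can change the color of an image along four dimensions: brightness, contrast, saturation, and hue. In the example below, we randomly change the brightness to 50% to 150% of that of the original image.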

-```{.python .input  n=6}
+```{.python .input  n=7}
 apply(img, gdata.vision.transforms.RandomBrightness(0.5))
 ```

 Similarly, we can randomly change the hue.

-```{.python .input  n=7}
+```{.python .input  n=8}
 apply(img, gdata.vision.transforms.RandomHue(0.5))
 ```

 Or we can use `RandomColorJitter` to change all of these together.

-```{.python .input  n=8}
+```{.python .input  n=9}
 color_aug = gdata.vision.transforms.RandomColorJitter(
     brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
 apply(img, color_aug)
@@ -81,7 +81,7 @@ apply(img, color_aug)

 In practice, we combine several augmentations. The `Compose` class chains multiple augmentations together.

-```{.python .input  n=9}
+```{.python .input  n=10}
 augs = gdata.vision.transforms.Compose([
     gdata.vision.transforms.RandomFlipLeftRight(), color_aug, shape_aug])
 apply(img, augs)
@@ -91,13 +91,13 @@ apply(img, augs)

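 Next, we look at an example of applying image augmentation in actual training and compare it with training without augmentation. Here we use the CIFAR-10 dataset instead of the Fashion-MNIST dataset we have been using so far. The reason is that the positions and sizes of objects in Fashion-MNIST are already normalized, whereas objects in CIFAR-10 differ much more in color and size. Below we show the first 32 training images in CIFAR-10.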

-```{.python .input  n=10}
+```{.python .input  n=11}
 gb.show_images(gdata.vision.CIFAR10(train=True)[0:32][0], 4, 8, scale=0.8);
 ```

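 We usually apply image augmentation only to training examples and do not use random augmentation at prediction time. Here we use only the simplest one, a random left-right flip. In addition, we use the `ToTensor` transform to convert images into the format MXNet expects: 32-bit floats laid out as (channel, height, width), which the data loader stacks into mini-batches of shape (batch, channel, height, width).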

-```{.python .input  n=11}
+```{.python .input  n=12}
 train_augs = gdata.vision.transforms.Compose([
     gdata.vision.transforms.RandomFlipLeftRight(),
     gdata.vision.transforms.ToTensor(),
@@ -110,7 +110,7 @@ test_augs = gdata.vision.transforms.Compose([

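 Next, we define a helper function that reads the images and applies the augmentations. Gluon datasets provide the `transform_first` function, which applies augmentation to the first item of each example (an example usually has two items: an image and a label). Because image augmentation adds computational cost, we use two extra CPU processes to speed up the computation.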

-```{.python .input  n=12}
+```{.python .input  n=13}
 def load_cifar10(is_train, augs, batch_size):
     return gdata.DataLoader(
         gdata.vision.CIFAR10(train=is_train).transform_first(augs),
@@ -123,7 +123,7 @@ def load_cifar10(is_train, augs, batch_size):

 First, we define the `try_all_gpus` function so that we can use all available GPUs.

-```{.python .input}
+```{.python .input  n=14}
 def try_all_gpus():
     ctxes = []
     try:
@@ -140,7 +140,7 @@ def try_all_gpus():

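 Then, we define the `evaluate_accuracy` function to evaluate the classification accuracy of a model. Unlike the `evaluate_accuracy` functions described in the sections ["Implementing Softmax Regression from Scratch"](../chapter_deep-learning-basics/softmax-regression-scratch.md) and ["Convolutional Neural Networks (LeNet)"](../chapter_convolutional-neural-networks/lenet.md), when `ctx` contains multiple GPUs, the function defined here uses the helper function `_get_batch` to split each mini-batch and copy the parts to the individual GPUs.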

-```{.python .input}
+```{.python .input  n=15}
 def _get_batch(batch, ctx):
     features, labels = batch
     if labels.dtype != features.dtype:
@@ -167,7 +167,7 @@ def evaluate_accuracy(data_iter, net, ctx=[mx.cpu()]):

 Next, we define the `train` function, which trains and evaluates the model on multiple GPUs.

-```{.python .input}
+```{.python .input  n=16}
 def train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs):
     print('training on', ctx)
     if isinstance(ctx, mx.Context):
@@ -198,8 +198,8 @@ def train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs):

 Now we can define a function that trains the model with image augmentation.

-```{.python .input  n=13}
-def train_with_data_aug(train_augs, test_augs, lr=0.1):
+```{.python .input  n=17}
+def train_with_data_aug(train_augs, test_augs, lr=0.01):
     batch_size = 256
     ctx = try_all_gpus()
     net = gb.resnet18(10)
@@ -214,13 +214,13 @@ def train_with_data_aug(train_augs, test_augs, lr=0.1):

 Let us first look at the results with image augmentation.

-```{.python .input  n=14}
+```{.python .input  n=18}
 train_with_data_aug(train_augs, test_augs)
 ```

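 For comparison, we train with `test_augs` applied to the training data as well, that is, without any random augmentation.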

-```{.python .input  n=15}
+```{.python .input  n=19}
 train_with_data_aug(test_augs, test_augs)
 ```

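The `RandomResizedCrop((200, 200), scale=(0.1, 1), ratio=(0.5, 2))` call in `image-augmentation.md` samples a crop box before resizing. Below is a simplified, pure-Python sketch of how such a box can be sampled; it is not MXNet's implementation. `sample_crop_box` is a hypothetical helper, and drawing the aspect ratio log-uniformly, retrying up to 10 times, and falling back to the whole image are assumptions borrowed from common practice. The ratio is taken as width/height, and the 400×500 sample image is treated as height 400 and width 500.

```python
import math
import random


def sample_crop_box(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0),
                    max_tries=10, rng=random):
    """Sample (x0, y0, w, h) for a random resized crop (simplified sketch).

    The area fraction is drawn uniformly from `scale` and the width/height
    ratio log-uniformly from `ratio`. If no sampled box fits inside the
    image after `max_tries` attempts, the whole image is returned.
    """
    area = height * width
    for _ in range(max_tries):
        target_area = rng.uniform(*scale) * area
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))  # w / h == aspect
        h = int(round(math.sqrt(target_area / aspect)))  # w * h == target_area
        if 0 < w <= width and 0 < h <= height:
            x0 = rng.randint(0, width - w)   # left edge, inclusive range
            y0 = rng.randint(0, height - h)  # top edge, inclusive range
            return x0, y0, w, h
    return 0, 0, width, height


rng = random.Random(0)
box = sample_crop_box(400, 500, rng=rng)
print(box)  # the cropped region would then be resized to 200x200
```

Because both the area fraction and the aspect ratio are random, a box that does not fit inside the image is simply resampled rather than clipped, so the crop never reaches outside the picture.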

--- a/chapter_computer-vision/neural-style.md
+++ b/chapter_computer-vision/neural-style.md
@@ -239,7 +239,7 @@ gb.plt.imsave('neural-style-2.png', postprocess(z).asnumpy())

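 ## Summary

-Matching the outputs of intermediate layers of a neural network can effectively blend the content and style of different images.
+* Matching the outputs of intermediate layers of a neural network can effectively blend the content and style of different images.

 ## Exercises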


--- a/chapter_convolutional-neural-networks/resnet.md
+++ b/chapter_convolutional-neural-networks/resnet.md
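@@ -14,7 +14,7 @@ The basic block of ResNet is called the residual block. As shown in the figure below, it

 ResNet follows VGG's design of using only $3\times 3$ convolutional layers. A residual block first has two $3\times 3$ convolutional layers with the same number of output channels, each followed by a batch normalization layer and a ReLU activation. The input then skips over these two convolutional layers and is added directly before the final ReLU activation. This design requires the output of the two convolutional layers to have the same shape as the input so that the two can be added. If we want to change the number of output channels, we need an additional $1\times 1$ convolutional layer to transform the input into the required shape before the addition.

-The implementation of the residual block is given below. It lets us set the number of output channels, whether to use an additional convolutional layer to change the number of input channels, and the stride of the convolutional layers.
+The residual block is implemented as follows. It lets us set the number of output channels, whether to use an additional convolutional layer to change the number of input channels, and the stride of the convolutional layers. We define the `Residual` class in the `gluonbook` package so that later chapters can call it.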

 ```{.python .input  n=1}
 import sys

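The paragraph on residual blocks says a $1\times 1$ convolution is needed on the skip path when the block changes the number of channels; with `strides=2` it also has to match the halved height and width. The pure-Python sketch below checks that `resnet18` in this commit sets `use_1x1conv=True` exactly where that is needed. `needs_projection` and `resnet18_residual_configs` are hypothetical helpers that replay the `resnet_block` loop; they do not build the MXNet model.

```python
def needs_projection(in_channels, out_channels, strides):
    """True when the input must go through a 1x1 conv before the addition."""
    return in_channels != out_channels or strides != 1


def resnet18_residual_configs():
    """List (in_channels, num_channels, use_1x1conv, strides) for every
    Residual that resnet18 creates, mirroring its resnet_block logic."""
    configs, in_channels = [], 64  # the first 3x3 conv outputs 64 channels
    blocks = [(64, True), (128, False), (256, False), (512, False)]
    for num_channels, first_block in blocks:
        for i in range(2):  # num_residuals=2 for every block
            if i == 0 and not first_block:
                configs.append((in_channels, num_channels, True, 2))
            else:
                configs.append((in_channels, num_channels, False, 1))
            in_channels = num_channels
    return configs


for cin, cout, use_1x1conv, strides in resnet18_residual_configs():
    # The flag chosen by resnet_block must agree with the shape rule.
    assert use_1x1conv == needs_projection(cin, cout, strides)
    print(cin, '->', cout, 'strides', strides, '1x1 conv:', use_1x1conv)
```

Only the first `Residual` of the second, third, and fourth blocks changes the shape, so only those three blocks carry the extra $1\times 1$ convolution.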
--- a/gluonbook/utils.py
+++ b/gluonbook/utils.py
@@ -263,42 +263,50 @@ def read_voc_images(root='../data/VOCdevkit/VOC2012', train=True):
     return data, label


-class Residual(nn.HybridBlock):
+class Residual(nn.Block):
     """The residual block."""
-    def __init__(self, channels, same_shape=True, **kwargs):
+    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
         super(Residual, self).__init__(**kwargs)
-        self.same_shape = same_shape
-        strides = 1 if same_shape else 2
-        self.conv1 = nn.Conv2D(channels, kernel_size=3, padding=1,
+        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1,
                                strides=strides)
+        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
+        if use_1x1conv:
+            self.conv3 = nn.Conv2D(num_channels, kernel_size=1,
+                                   strides=strides)
+        else:
+            self.conv3 = None
         self.bn1 = nn.BatchNorm()
-        self.conv2 = nn.Conv2D(channels, kernel_size=3, padding=1)
         self.bn2 = nn.BatchNorm()
-        if not same_shape:
-            self.conv3 = nn.Conv2D(channels, kernel_size=1, strides=strides)

-    def hybrid_forward(self, F, x):
-        out = F.relu(self.bn1(self.conv1(x)))
-        out = self.bn2(self.conv2(out))
-        if not self.same_shape:
-            x = self.conv3(x)
-        return F.relu(out + x)
+    def forward(self, X):
+        Y = nd.relu(self.bn1(self.conv1(X)))
+        Y = self.bn2(self.conv2(Y))
+        if self.conv3:
+            X = self.conv3(X)
+        return nd.relu(Y + X)


 def resnet18(num_classes):
     """The ResNet-18 model."""
-    net = nn.HybridSequential()
-    net.add(nn.BatchNorm(),
-            nn.Conv2D(64, kernel_size=3, strides=1),
-            nn.MaxPool2D(pool_size=3, strides=2),
-            Residual(64),
-            Residual(64),
-            Residual(128, same_shape=False),
-            Residual(128),
-            Residual(256, same_shape=False),
-            Residual(256),
-            nn.GlobalAvgPool2D(),
-            nn.Dense(num_classes))
+    net = nn.Sequential()
+    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
+            nn.BatchNorm(), nn.Activation('relu'),
+            nn.MaxPool2D(pool_size=3, strides=2, padding=1))
+
+    def resnet_block(num_channels, num_residuals, first_block=False):
+        blk = nn.Sequential()
+        for i in range(num_residuals):
+            if i == 0 and not first_block:
+                blk.add(Residual(num_channels, use_1x1conv=True, strides=2))
+            else:
+                blk.add(Residual(num_channels))
+        return blk
+
+    net.add(resnet_block(64, 2, first_block=True),
+            resnet_block(128, 2),
+            resnet_block(256, 2),
+            resnet_block(512, 2))
+    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
     return net
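The revised `resnet18` adds padding and a ReLU to the first convolution and a fourth block with 512 channels. To see how each stage changes the feature-map size, the sketch below applies the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ (input size $n$, kernel $k$, padding $p$, stride $s$) to the layer hyperparameters above. It assumes a square 32×32 input such as CIFAR-10 and floor rounding in `MaxPool2D` (`ceil_mode=False`, the Gluon default); `conv_out` and `resnet18_shapes` are hypothetical helpers, not part of `gluonbook`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size along one spatial axis of a conv/pool layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_shapes(size=32, num_classes=10):
    """Return (layer, channels, height) triples for a square size x size input.

    Replays the layer hyperparameters of resnet18; it does not build or run
    the MXNet model.
    """
    trace = []
    h = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv, 64 channels
    trace.append(('stem conv', 64, h))
    h = conv_out(h, kernel=3, stride=2, padding=1)     # 3x3 max pool, stride 2
    trace.append(('max pool', 64, h))
    for i, channels in enumerate([64, 128, 256, 512]):
        if i > 0:
            # First Residual of blocks 2-4: 3x3 conv with stride 2 on the main
            # path and a 1x1 conv with stride 2 on the shortcut.
            main = conv_out(h, kernel=3, stride=2, padding=1)
            shortcut = conv_out(h, kernel=1, stride=2)
            assert main == shortcut, 'both paths must match before the addition'
            h = main
        # The remaining 3x3 convs (stride 1, padding 1) keep h unchanged.
        trace.append(('resnet_block(%d)' % channels, channels, h))
    trace.append(('global avg pool', channels, 1))
    trace.append(('dense', num_classes, 1))
    return trace


for name, c, h in resnet18_shapes():
    print('%-18s %4d x %2d x %2d' % (name, c, h, h))
```

For a 32×32 input, the height goes 32 → 16 after the stem, stays 16 in the first block, and halves in each later block to 8, 4, and 2 before global average pooling. The name ResNet-18 counts 18 weighted layers: the stem convolution, the 16 $3\times 3$ convolutions in the eight `Residual` blocks, and the final `Dense` layer; the $1\times 1$ shortcut convolutions are not counted.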