epochs 8 in image aug

d5b80b1b · Aston Zhang · 0a5539e4 · d5b80b1b

Showing 1 changed file with 23 additions and 22 deletions

chapter_computer-vision/image-augmentation.md +23 -22

--- a/chapter_computer-vision/image-augmentation.md
+++ b/chapter_computer-vision/image-augmentation.md
 # Image Augmentation

+
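 As mentioned in the section [“Deep Convolutional Neural Networks (AlexNet)”](../chapter_convolutional-neural-networks/alexnet.md), large-scale datasets are a prerequisite for successfully applying deep neural networks. Image augmentation applies a series of random changes to the training images to produce similar but distinct training examples, thereby enlarging the training dataset. Another way to view image augmentation is that randomly changing the training examples reduces a model's dependence on certain attributes and so improves its ability to generalize. For example, we can crop an image in different ways so that the object of interest appears at different positions, which makes the model less dependent on where the object appears. We can also adjust factors such as brightness and color to make the model less sensitive to color. Image augmentation arguably contributed a great deal to the success of AlexNet. In this section we discuss this technique, which is widely used in computer vision.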

 First, import the packages and modules needed for the experiments in this section.

-```{.python .input  n=21}
+```{.python .input  n=1}
 %matplotlib inline
 import gluonbook as gb
 import mxnet as mx
@@ -18,7 +19,7 @@ from time import time

 We read an image of shape $400\times 500$ to use as the example in our experiments.

-```{.python .input  n=22}
+```{.python .input  n=2}
 gb.set_figsize()
 img = image.imread('../img/cat1.jpg')
 gb.plt.imshow(img.asnumpy())
@@ -26,7 +27,7 @@ gb.plt.imshow(img.asnumpy())

 Next, we define the plotting function `show_images`.

-```{.python .input  n=23}
+```{.python .input  n=3}
 # This function is saved in the gluonbook package for later use.
 def show_images(imgs, num_rows, num_cols, scale=2):
    figsize = (num_cols * scale, num_rows * scale)
@@ -41,7 +42,7 @@ def show_images(imgs, num_rows, num_cols, scale=2):

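 Most image augmentation methods involve some randomness. To make their effect easier to observe, we next define a helper function `apply`. This function runs the augmentation method `aug` on the input image `img` multiple times and displays all the results.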

-```{.python .input  n=24}
+```{.python .input  n=4}
 def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):
    Y = [aug(img) for _ in range(num_rows * num_cols)]
    show_images(Y, num_rows, num_cols, scale)
@@ -51,13 +52,13 @@ def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):

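 Flipping an image left to right usually does not change the category of the object. It is one of the earliest and most widely used image augmentation methods. Below, we use the `transforms` module to create a `RandomFlipLeftRight` instance, which flips an image left to right with probability 0.5.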

-```{.python .input  n=25}
+```{.python .input  n=5}
 apply(img, gdata.vision.transforms.RandomFlipLeftRight())
 ```
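
To make the "probability 0.5" behavior concrete, here is a minimal pure-Python sketch of a random left-right flip. It is an illustration only, not the Gluon implementation: the function name `random_flip_left_right`, the nested-list image format, and the `p` and `rng` parameters are assumptions made for this example.

```python
import random


def random_flip_left_right(img, p=0.5, rng=random):
    """Flip a row-major image (a list of pixel rows) left to right with probability p.

    Hypothetical helper for illustration; it always returns a new list of rows.
    """
    if rng.random() < p:
        # Reversing every row mirrors the image around its vertical axis.
        return [row[::-1] for row in img]
    return [row[:] for row in img]


img = [[1, 2, 3],
       [4, 5, 6]]
flipped = random_flip_left_right(img, p=1.0)  # p=1.0 always flips
same = random_flip_left_right(img, p=0.0)     # p=0.0 never flips
```

A top-to-bottom flip would reverse the order of the rows instead of the order of the pixels within each row.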

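 Flipping top to bottom is less generally applicable than flipping left to right. For the example image at least, however, a top-to-bottom flip does not hinder recognition. Below, we create a `RandomFlipTopBottom` instance, which flips an image top to bottom with probability 0.5.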

-```{.python .input  n=26}
+```{.python .input  n=6}
 apply(img, gdata.vision.transforms.RandomFlipTopBottom())
 ```

@@ -65,7 +66,7 @@ apply(img, gdata.vision.transforms.RandomFlipTopBottom())

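 In the code below, each time we randomly crop a region whose area is 10% to 100% of the original area and whose width-to-height ratio is drawn at random between 0.5 and 2, and then scale both the width and the height of that region to 200 pixels. Unless stated otherwise, a random number between $a$ and $b$ in this section means a continuous value sampled uniformly from the interval $[a,b]$.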
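
The sampling rule in this paragraph can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: `sample_crop_size` is a hypothetical helper, both quantities are drawn uniformly as the text describes, and clipping the crop to the image bounds is a simplification that can slightly change the sampled ratio.

```python
import math
import random


def sample_crop_size(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0), rng=random):
    """Sample a crop (height, width) from an area fraction and a width/height ratio.

    Hypothetical helper: area = crop_w * crop_h and aspect = crop_w / crop_h,
    so crop_w = sqrt(area * aspect) and crop_h = sqrt(area / aspect).
    """
    area = rng.uniform(*scale) * height * width
    aspect = rng.uniform(*ratio)
    crop_w = int(round(math.sqrt(area * aspect)))
    crop_h = int(round(math.sqrt(area / aspect)))
    # Simplification: clip to the image instead of resampling.
    return min(crop_h, height), min(crop_w, width)


crop_h, crop_w = sample_crop_size(400, 500)
```

After a crop of this size is cut out at a random position, it would be resized to $200\times 200$ pixels, which is the step that the `(200, 200)` argument controls.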

-```{.python .input  n=27}
+```{.python .input  n=7}
 shape_aug = gdata.vision.transforms.RandomResizedCrop(
    (200, 200), scale=(0.1, 1), ratio=(0.5, 2))
 apply(img, shape_aug)
@@ -75,19 +76,19 @@ apply(img, shape_aug)

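 Another class of augmentation methods changes color. We can change the color of an image in four respects: brightness, contrast, saturation, and hue. In the example below, we randomly change the brightness of the image to a value between 50% ($1-0.5$) and 150% ($1+0.5$) of the original brightness.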

-```{.python .input  n=28}
+```{.python .input  n=8}
 apply(img, gdata.vision.transforms.RandomBrightness(0.5))
 ```
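
As a rough sketch of what a brightness parameter of 0.5 means, the snippet below scales pixel values by a factor drawn uniformly from $[1-0.5, 1+0.5]$. The helper `random_brightness`, the flat list of pixel values, and the clipping to the range 0 to 255 are assumptions for illustration and may differ from what the library does internally.

```python
import random


def random_brightness(pixels, brightness=0.5, rng=random):
    """Scale all pixel values by one random factor in [1 - brightness, 1 + brightness].

    Hypothetical helper; returns the scaled pixels and the factor that was used.
    """
    alpha = 1.0 + rng.uniform(-brightness, brightness)
    # Assumption: keep the result inside the valid 8-bit range.
    scaled = [min(255.0, max(0.0, p * alpha)) for p in pixels]
    return scaled, alpha


pixels = [0, 64, 128, 255]
scaled, alpha = random_brightness(pixels, brightness=0.5)
```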

 Similarly, we can randomly change the hue of the image.

-```{.python .input  n=29}
+```{.python .input  n=9}
 apply(img, gdata.vision.transforms.RandomHue(0.5))
 ```

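 We can also create a `RandomColorJitter` instance and set how to randomly change the brightness (`brightness`), contrast (`contrast`), saturation (`saturation`), and hue (`hue`) of the image at the same time.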

-```{.python .input  n=30}
+```{.python .input  n=10}
 color_aug = gdata.vision.transforms.RandomColorJitter(
    brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
 apply(img, color_aug)
@@ -97,7 +98,7 @@ apply(img, color_aug)

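 In practice, we combine multiple image augmentation methods. We can use a `Compose` instance to chain the augmentation methods defined above and apply them to each image.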
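
Conceptually, chaining transforms is function composition: the output of each transform is the input of the next. The sketch below shows the idea with a hypothetical `compose` helper and toy numeric transforms standing in for image augmentations; it is not the Gluon `Compose` class.

```python
def compose(transforms):
    """Return a function that applies the given transforms in order (hypothetical helper)."""
    def apply_all(x):
        for transform in transforms:
            x = transform(x)
        return x
    return apply_all


def add_one(x):
    return x + 1


def double(x):
    return x * 2


pipeline = compose([add_one, double])
result = pipeline(3)  # (3 + 1) * 2
```

Order matters: swapping the two toy transforms gives `3 * 2 + 1` instead, just as cropping before or after a color change can produce different images.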

-```{.python .input  n=31}
+```{.python .input  n=11}
 augs = gdata.vision.transforms.Compose([
    gdata.vision.transforms.RandomFlipLeftRight(), color_aug, shape_aug])
 apply(img, augs)
@@ -107,13 +108,13 @@ apply(img, augs)

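 Next, we look at an example that applies image augmentation in actual training. Here we use the CIFAR-10 dataset rather than the Fashion-MNIST dataset we have been using so far. This is because the position and size of the objects in Fashion-MNIST have already been normalized, whereas the color and size of the objects in CIFAR-10 vary much more. The following shows the first 32 training images in the CIFAR-10 dataset.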

-```{.python .input  n=32}
+```{.python .input  n=12}
 show_images(gdata.vision.CIFAR10(train=True)[0:32][0], 4, 8, scale=0.8);
 ```

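 To obtain deterministic results at prediction time, we usually apply image augmentation only to training examples and do not use augmentation with random operations during prediction. Here we use only the simplest random left-right flip. In addition, we use a `ToTensor` instance to convert mini-batches of images into the format MXNet requires: shape (batch size, number of channels, height, width), values between 0 and 1, and 32-bit floating-point type.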
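
The layout change described here can be illustrated for a single image with nested Python lists: a (height, width, channels) image with integer values from 0 to 255 becomes a (channels, height, width) image with floating-point values from 0 to 1. The function `to_tensor` below is a hypothetical stand-in written for this example. It assumes that the batch dimension is added later, when examples are stacked into a mini-batch.

```python
def to_tensor(img_hwc):
    """Convert an HWC image with values in 0..255 to CHW with values in 0..1.

    Hypothetical helper operating on nested lists, for illustration only.
    """
    height = len(img_hwc)
    width = len(img_hwc[0])
    channels = len(img_hwc[0][0])
    return [[[img_hwc[h][w][c] / 255.0 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# A 1 x 2 image with 3 channels (RGB).
img = [[[0, 128, 255], [255, 0, 0]]]
chw = to_tensor(img)  # 3 channels, each of shape 1 x 2
```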

-```{.python .input  n=33}
+```{.python .input  n=13}
 flip_aug = gdata.vision.transforms.Compose([
    gdata.vision.transforms.RandomFlipLeftRight(),
    gdata.vision.transforms.ToTensor()])
@@ -124,7 +125,7 @@ no_aug = gdata.vision.transforms.Compose([

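 Next, we define a helper function to make it easier to read the images and apply image augmentation. The `transform_first` function provided by Gluon datasets applies the augmentation to the first element of each training example (image and label), that is, the image. For a detailed introduction to `DataLoader`, see the earlier section [“Image Classification Dataset (Fashion-MNIST)”](../chapter_deep-learning-basics/fashion-mnist.md).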

-```{.python .input  n=34}
+```{.python .input  n=14}
 def load_cifar10(is_train, augs, batch_size):
    return gdata.DataLoader(
        gdata.vision.CIFAR10(train=is_train).transform_first(augs),
@@ -137,7 +138,7 @@ def load_cifar10(is_train, augs, batch_size):

 First, we define the `try_all_gpus` function to obtain all available GPUs.

-```{.python .input  n=35}
+```{.python .input  n=15}
 def try_all_gpus():  # This function is saved in the gluonbook package for later use.
    ctxes = []
    try:
@@ -154,7 +155,7 @@ def try_all_gpus():  # This function is saved in the gluonbook package for later use

 The helper function `_get_batch` defined below splits the mini-batch of examples `batch` and copies the parts to the GPUs contained in the `ctx` variable.
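
The splitting step can be pictured as cutting the mini-batch into equal contiguous slices, one per device. The sketch below uses a hypothetical `split_batch` helper on a plain list and does no device copying. Requiring the batch size to be divisible by the number of devices is an assumption of this example.

```python
def split_batch(batch, num_devices):
    """Split a list of examples into num_devices equal contiguous slices.

    Hypothetical helper; raises ValueError if the batch cannot be split evenly.
    """
    if num_devices <= 0 or len(batch) % num_devices != 0:
        raise ValueError('batch size must be a positive multiple of num_devices')
    step = len(batch) // num_devices
    return [batch[i * step:(i + 1) * step] for i in range(num_devices)]


shards = split_batch(list(range(8)), num_devices=2)
```

With a batch size of 256, as used later in this section, two devices would each receive 128 examples.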

-```{.python .input}
+```{.python .input  n=16}
 def _get_batch(batch, ctx):
    features, labels = batch
    if labels.dtype != features.dtype:
@@ -167,7 +168,7 @@ def _get_batch(batch, ctx):

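 Then, we define the `evaluate_accuracy` function to evaluate the classification accuracy of the model. Unlike the `evaluate_accuracy` functions described in the sections [“Softmax Regression from Scratch”](../chapter_deep-learning-basics/softmax-regression-scratch.md) and [“Convolutional Neural Networks (LeNet)”](../chapter_convolutional-neural-networks/lenet.md), the function defined here is more general: through the helper function `_get_batch`, it uses all the GPUs contained in the `ctx` variable to evaluate the model.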
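
Independently of how the data is spread across devices, the accuracy itself is the number of correct predictions divided by the number of examples, accumulated over all slices. The following pure-Python sketch shows that bookkeeping. The `accuracy_over_shards` helper and the list-based scores are assumptions for illustration, not the function defined in this section.

```python
def accuracy_over_shards(score_shards, label_shards):
    """Accumulate correct predictions over several slices and return the overall accuracy.

    Hypothetical helper: each row of scores holds one score per class, and the
    predicted class is the index of the largest score.
    """
    correct, total = 0, 0
    for scores, labels in zip(score_shards, label_shards):
        for row, label in zip(scores, labels):
            predicted = max(range(len(row)), key=row.__getitem__)
            correct += int(predicted == label)
            total += 1
    return correct / total


# Two slices, as if they had been evaluated on two devices.
score_shards = [[[0.1, 0.9], [0.8, 0.2]], [[0.3, 0.7]]]
label_shards = [[1, 0], [0]]
acc = accuracy_over_shards(score_shards, label_shards)
```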

-```{.python .input  n=36}
+```{.python .input  n=17}
 # This function is saved in the gluonbook package for later use.
 def evaluate_accuracy(data_iter, net, ctx=[mx.cpu()]):
    if isinstance(ctx, mx.Context):
@@ -186,7 +187,7 @@ def evaluate_accuracy(data_iter, net, ctx=[mx.cpu()]):

 Next, we define the `train` function to train and evaluate the model using multiple GPUs.

-```{.python .input  n=37}
+```{.python .input  n=18}
 # This function is saved in the gluonbook package for later use.
 def train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs):
    print('training on', ctx)
@@ -218,7 +219,7 @@ def train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs):

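 Now we can define the `train_with_data_aug` function to train the model with image augmentation. This function obtains all available GPUs, uses Adam as the optimization algorithm for training, applies image augmentation to the training dataset, and finally calls the `train` function just defined to train and evaluate the model.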

-```{.python .input  n=38}
+```{.python .input  n=19}
 def train_with_data_aug(train_augs, test_augs, lr=0.001):
    batch_size, ctx, net = 256, try_all_gpus(), gb.resnet18(10)
    net.initialize(ctx=ctx, init=init.Xavier())
@@ -227,20 +228,20 @@ def train_with_data_aug(train_augs, test_augs, lr=0.001):
    loss = gloss.SoftmaxCrossEntropyLoss()
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
-    train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs=10)
+    train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs=8)
 ```

 ### Comparative Experiment on Image Augmentation

 We first look at the results with image augmentation.

-```{.python .input  n=39}
+```{.python .input  n=20}
 train_with_data_aug(flip_aug, no_aug)
 ```

 For comparison, we now try training without image augmentation.

-```{.python .input  n=40}
+```{.python .input  n=21}
 train_with_data_aug(no_aug, no_aug)
 ```