convert-markdown-into-html

1a64a4d4 · liaogang · 1f130732 · 1a64a4d4

Showing 137 additions and 195 deletions

image_classification/index.en.html image_classification/index.en.html +137 -195

--- a/image_classification/index.en.html
+++ b/image_classification/index.en.html
@@ -43,7 +43,7 @@
 Image Classification
 =======================

-The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle[Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
+The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle [Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.

 ## Background

@@ -177,146 +177,73 @@ Figure 10. ResNet model for ImageNet
 </p>


-## Data Preparation
-
-### Data description and downloading
+## Dataset

 Commonly used public datasets for image classification are CIFAR(https://www.cs.toronto.edu/~kriz/cifar.html), ImageNet(http://image-net.org/), COCO(http://mscoco.org/), etc. Those used for fine-grained image classification are CUB-200-2011(http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), Stanford Dog(http://vision.stanford.edu/aditya86/ImageNetDogs/), Oxford-flowers(http://www.robots.ox.ac.uk/~vgg/data/flowers/), etc. Among them, ImageNet is the largest, and most research results are reported on ImageNet, as mentioned in the Model Overview section. Since 2010, the ImageNet data has gone through some changes. The commonly used ImageNet-2012 dataset contains 1000 categories. There are 1,281,167 training images, ranging from 732 to 1200 images per category, and 50,000 validation images with 50 images per category on average.

-Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR10 as well as 10 images randomly sampled from each category.
+Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR-10 as well as 10 images randomly sampled from each category.

 <p align="center">
 <img src="image/cifar.png" width="350"><br/>
 Figure 11. CIFAR10 dataset[21]
 </p>

-The following command is used for downloading data and calculating the mean image used for data preprocessing.
-
-```bash
-./data/get_data.sh
-```
+The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`, so there is no need to download and preprocess CIFAR-10 manually.

-###  Data provider for PaddlePaddle
+After issuing the command `python train.py`, training will start immediately. The following sections explain in detail how it works.

-We use Python interface for providing data to PaddlePaddle. The following file dataprovider.py is a complete example for CIFAR10.
+## Model Architecture

- 'initializer' function performs initialization of dataprovider: loading the mean image, defining two input types -- image and label.
+### Initialize PaddlePaddle

- 'process' function sends preprocessed data to PaddlePaddle. Data preprocessing performed in this function includes data perturbation, random horizontal flipping, deducting mean image from the raw image.
+We must first import and initialize PaddlePaddle (enable or disable the GPU, set the number of trainers, etc.).

 ```python
-import numpy as np
-import cPickle
-from paddle.trainer.PyDataProvider2 import *
-
-def initializer(settings, mean_path, is_train, **kwargs):
-    settings.is_train = is_train
-    settings.input_size = 3 * 32 * 32
-    settings.mean = np.load(mean_path)['mean']
-    settings.input_types = {
-        'image': dense_vector(settings.input_size),
-        'label': integer_value(10)
-    }
-
-
-@provider(init_hook=initializer, pool_size=50000)
-def process(settings, file_list):
-    with open(file_list, 'r') as fdata:
-        for fname in fdata:
-            fo = open(fname.strip(), 'rb')
-            batch = cPickle.load(fo)
-            fo.close()
-            images = batch['data']
-            labels = batch['labels']
-            for im, lab in zip(images, labels):
-                if settings.is_train and np.random.randint(2):
-                    im = im.reshape(3, 32, 32)
-                    im = im[:,:,::-1]
-                    im = im.flatten()
-                im = im - settings.mean
-                yield {
-                    'image': im.astype('float32'),
-                    'label': int(lab)
-                }
-```
+import sys
+import paddle.v2 as paddle

-## Model Config
-
-### Data Definition
-
-In model config file, function `define_py_data_sources2` sets argument 'module' to dataprovider file for loading data, 'args' to mean image file. If the config file is used for prediction, then there is no need to set argument 'train_list'.
-
-```python
-from paddle.trainer_config_helpers import *
-
-is_predict = get_config_arg("is_predict", bool, False)
-if not is_predict:
-    define_py_data_sources2(
-        train_list='data/train.list',
-        test_list='data/test.list',
-        module='dataprovider',
-        obj='process',
-        args={'mean_path': 'data/mean.meta'})
-```
-
-### Algorithm Settings
-
-In model config file, function 'settings' specifies optimization algorithm, batch size, learning rate, momentum and L2 regularization.
-
-```python
-settings(
-    batch_size=128,
-    learning_rate=0.1 / 128.0,
-    learning_rate_decay_a=0.1,
-    learning_rate_decay_b=50000 * 100,
-    learning_rate_schedule='discexp',
-    learning_method=MomentumOptimizer(0.9),
-    regularization=L2Regularization(0.0005 * 128),)
+# PaddlePaddle init
+paddle.init(use_gpu=False, trainer_count=1)
 ```

-The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
-$$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
-where $n$ is the number of processed samples, $lr_{0}$ is the learning_rate set in 'settings'.
-
-### Model Architecture
-
-Here we provide the cofig files for VGG and ResNet models.
+As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both VGG and ResNet models.

-#### VGG
+### VGG

-First we define VGG network. Since the image size and amount of CIFAR10 are relatively small comparing to ImageNet, we uses a small version of VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.
+First, we use a VGG network. Since the images in CIFAR10 are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.

 1. Define input data and its dimension

-        The input to the network is defined as `data_layer`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
+        The input to the network is defined as `paddle.layer.data`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.

    ```python
    datadim = 3 * 32 * 32
    classdim = 10
-    data = data_layer(name='image', size=datadim)
+    image = paddle.layer.data(
+        name="image", type=paddle.data_type.dense_vector(datadim))
    ```

 2. Define VGG main module

    ```python
-    net = vgg_bn_drop(data)
+    net = vgg_bn_drop(image)
    ```
-        The input to VGG main module is from data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:
+        The input to VGG main module is from the data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:

    ```python
-    def vgg_bn_drop(input, num_channels):
-        def conv_block(ipt, num_filter, groups, dropouts, num_channels_=None):
-            return img_conv_group(
+    def vgg_bn_drop(input):
+        def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
+            return paddle.networks.img_conv_group(
                input=ipt,
-                num_channels=num_channels_,
+                num_channels=num_channels,
                pool_size=2,
                pool_stride=2,
                conv_num_filter=[num_filter] * groups,
                conv_filter_size=3,
-                conv_act=ReluActivation(),
+                conv_act=paddle.activation.Relu(),
                conv_with_batchnorm=True,
                conv_batchnorm_drop_rate=dropouts,
-                pool_type=MaxPooling())
+                pool_type=paddle.pooling.Max())

        conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
        conv2 = conv_block(conv1, 128, 2, [0.4, 0])
@@ -324,16 +251,17 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
        conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
        conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

-        drop = dropout_layer(input=conv5, dropout_rate=0.5)
-        fc1 = fc_layer(input=drop, size=512, act=LinearActivation())
-        bn = batch_norm_layer(
-            input=fc1, act=ReluActivation(), layer_attr=ExtraAttr(drop_rate=0.5))
-        fc2 = fc_layer(input=bn, size=512, act=LinearActivation())
+        drop = paddle.layer.dropout(input=conv5, dropout_rate=0.5)
+        fc1 = paddle.layer.fc(input=drop, size=512, act=paddle.activation.Linear())
+        bn = paddle.layer.batch_norm(
+            input=fc1,
+            act=paddle.activation.Relu(),
+            layer_attr=paddle.attr.Extra(drop_rate=0.5))
+        fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
        return fc2
-
    ```

-        2.1. First defines a convolution block or conv_block. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability in dropout operation. Function `img_conv_group` is defined in `paddle.trainer_config_helpers` consisting of a series of `Conv->BN->ReLu->Dropout` and a `Pooling`.
+        2.1. First, define a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. `dropouts` specifies the dropout probability for each convolution in the group. The function `img_conv_group`, defined in `paddle.networks`, consists of a series of `Conv->BN->ReLu->Dropout` operations followed by a `Pooling`.


        2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
@@ -351,15 +279,12 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela

 4. Define Loss Function and Outputs

-        In the context of supervised learning, labels of training images are defined in `data_layer`, too. During training, cross-entropy is used as loss function and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
+        In the context of supervised learning, labels of training images are defined in `paddle.layer.data`, too. During training, cross-entropy is used as the loss function and as the output of the network; during testing, the outputs are the probabilities calculated in the classifier.

    ```python
-    if not is_predict:
-        lbl = data_layer(name="label", size=class_num)
-        cost = classification_cost(input=out, label=lbl)
-        outputs(cost)
-    else:
-        outputs(out)
+    lbl = paddle.layer.data(
+        name="label", type=paddle.data_type.integer_value(classdim))
+    cost = paddle.layer.classification_cost(input=out, label=lbl)
    ```

 ### ResNet
@@ -367,13 +292,13 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
 The first, third and fourth steps of a ResNet are the same as a VGG. The second one is the main module.

 ```python
-net = resnet_cifar10(data, depth=56)
+net = resnet_cifar10(image, depth=32)
 ```

 Here are some basic functions used in `resnet_cifar10`:

  - `conv_bn_layer` : convolutional layer followed by BN.
-  - `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output are different; direct connection used otherwise.
+  - `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output is different; direct connection used otherwise.

  - `basicblock` : a basic residual module as shown in the left of Figure 9, consisting of two sequential 3x3 convolutions and one "shortcut" branch.
  - `bottleneck` : a bottleneck module as shown in the right of Figure 9, consisting of two 1x1 convolutions with one 3x3 convolution in between, and a "shortcut" branch.
@@ -385,47 +310,38 @@ def conv_bn_layer(input,
                  filter_size,
                  stride,
                  padding,
-                  active_type=ReluActivation(),
+                  active_type=paddle.activation.Relu(),
                  ch_in=None):
-    tmp = img_conv_layer(
+    tmp = paddle.layer.img_conv(
        input=input,
        filter_size=filter_size,
        num_channels=ch_in,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
-        act=LinearActivation(),
+        act=paddle.activation.Linear(),
        bias_attr=False)
-    return batch_norm_layer(input=tmp, act=active_type)
-
+    return paddle.layer.batch_norm(input=tmp, act=active_type)

 def shortcut(ipt, n_in, n_out, stride):
    if n_in != n_out:
-        return conv_bn_layer(ipt, n_out, 1, stride, 0, LinearActivation())
+        return conv_bn_layer(ipt, n_out, 1, stride, 0,
+                             paddle.activation.Linear())
    else:
        return ipt

 def basicblock(ipt, ch_out, stride):
-    ch_in = ipt.num_filters
+    ch_in = ch_out * 2
    tmp = conv_bn_layer(ipt, ch_out, 3, stride, 1)
-    tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, LinearActivation())
-    short = shortcut(ipt, ch_in, ch_out, stride)
-    return addto_layer(input=[ipt, short], act=ReluActivation())
-
-def bottleneck(ipt, ch_out, stride):
-    ch_in = ipt.num_filter
-    tmp = conv_bn_layer(ipt, ch_out, 1, stride, 0)
-    tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1)
-    tmp = conv_bn_layer(tmp, ch_out * 4, 1, 1, 0, LinearActivation())
+    tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, paddle.activation.Linear())
    short = shortcut(ipt, ch_in, ch_out, stride)
-    return addto_layer(input=[ipt, short], act=ReluActivation())
+    return paddle.layer.addto(input=[tmp, short], act=paddle.activation.Relu())

 def layer_warp(block_func, ipt, features, count, stride):
    tmp = block_func(ipt, features, stride)
    for i in range(1, count):
        tmp = block_func(tmp, features, 1)
    return tmp
-
 ```

 The following are the components of `resnet_cifar10`:
@@ -437,106 +353,132 @@ The following are the components of `resnet_cifar10`:
 Note: besides the first convolutional layer and the last fully-connected layer, the total number of layers in the three `layer_warp` calls should be divisible by 6; that is, the depth of `resnet_cifar10` should satisfy $(depth - 2) % 6 == 0$.

 ```python
-def resnet_cifar10(ipt, depth=56):
+def resnet_cifar10(ipt, depth=32):
    # depth should be one of 20, 32, 44, 56, 110, 1202
    assert (depth - 2) % 6 == 0
    n = (depth - 2) / 6
    nStages = {16, 64, 128}
-    conv1 = conv_bn_layer(ipt,
-        ch_in=3,
-        ch_out=16,
-        filter_size=3,
-        stride=1,
-        padding=1)
+    conv1 = conv_bn_layer(
+        ipt, ch_in=3, ch_out=16, filter_size=3, stride=1, padding=1)
    res1 = layer_warp(basicblock, conv1, 16, n, 1)
    res2 = layer_warp(basicblock, res1, 32, n, 2)
    res3 = layer_warp(basicblock, res2, 64, n, 2)
-    pool = img_pool_layer(input=res3,
-                         pool_size=8,
-                         stride=1,
-                         pool_type=AvgPooling())
+    pool = paddle.layer.img_pool(
+        input=res3, pool_size=8, stride=1, pool_type=paddle.pooling.Avg())
    return pool
 ```

 ## Model Training

-We can train the model by running the script train.sh, which specifies config file, device type, number of threads, number of passes, path to the trained models, etc,
+### Define Parameters

-``` bash
-sh train.sh
-```
+First, we create the model parameters according to the previous model configuration `cost`.

-Here is an example script `train.sh`:
-
-```bash
-#cfg=models/resnet.py
-cfg=models/vgg.py
-output=output
-log=train.log
-
-paddle train \
-    --config=$cfg \
-    --use_gpu=true \
-    --trainer_count=1 \
-    --log_period=100 \
-    --num_passes=300 \
-    --save_dir=$output \
-    2>&1 | tee $log
+```python
+# Create parameters
+parameters = paddle.parameters.create(cost)
 ```

- `--config=$cfg` : specifies config file. The default is `models/vgg.py`.
- `--use_gpu=true` : uses GPU for training. If use CPU，set it to be false.
- `--trainer_count=1` : specifies the number of threads or GPUs.
- `--log_period=100` : specifies the number of batches between two logs.
- `--save_dir=$output` : specifies the path for saving trained models.
+### Create Trainer

-Here is an example log after training for one pass. The average error rates are 0.79958 on training set and 0.7858 on validation set.
+Before creating the trainer, we also need to configure the optimization algorithm.
+Here we specify the `Momentum` optimization algorithm via `paddle.optimizer`.

-```text
-TrainerInternal.cpp:165]  Batch=300 samples=38400 AvgCost=2.07708 CurrentCost=1.96158 Eval: classification_error_evaluator=0.81151  CurrentEval: classification_error_evaluator=0.789297
-TrainerInternal.cpp:181]  Pass=0 Batch=391 samples=50000 AvgCost=2.03348 Eval: classification_error_evaluator=0.79958
-Tester.cpp:115]  Test samples=10000 cost=1.99246 Eval: classification_error_evaluator=0.7858
+```python
+# Create optimizer
+momentum_optimizer = paddle.optimizer.Momentum(
+    momentum=0.9,
+    regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
+    learning_rate=0.1 / 128.0,
+    learning_rate_decay_a=0.1,
+    learning_rate_decay_b=50000 * 100,
+    learning_rate_schedule='discexp',
+    batch_size=128)
+
+# Create trainer
+trainer = paddle.trainer.SGD(cost=cost,
+                             parameters=parameters,
+                             update_equation=momentum_optimizer)
 ```

-Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
+The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
+$$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
+where $n$ is the number of processed samples and $lr_{0}$ is the initial `learning_rate`.

-<p align="center">
-<img src="image/plot_en.png" width="400" ><br/>
-Figure 12. The error rate of VGG model on CIFAR10
-</p>
+### Training

-## Model Application
+`cifar.train10()` yields records during each pass; after shuffling, the records are grouped into batches for training.

-After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`. The script `classify.py` can be used to extract features and to classify an image. The default config file of this script is `models/vgg.py`.
+```python
+reader = paddle.reader.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.cifar.train10(), buf_size=50000),
+    batch_size=128)
+```
+
+`feeding` specifies the correspondence between each yielded record and `paddle.layer.data`. For instance,
+the first column of data generated by `cifar.train10()` corresponds to the `image` layer's features.

+```python
+feeding={'image': 0,
+         'label': 1}
+```

-### Prediction
+The callback function `event_handler` will be called during training when a pre-defined event happens.

-We can run the following script to predict the category of an image. The default device is GPU. If to use CPU, set `-c`.

-```bash
-python classify.py --job=predict --model=output/pass-00299 --data=image/dog.png # -c
+```python
+# event handler to track training and testing process
+def event_handler(event):
+    if isinstance(event, paddle.event.EndIteration):
+        if event.batch_id % 100 == 0:
+            print "\nPass %d, Batch %d, Cost %f, %s" % (
+                event.pass_id, event.batch_id, event.cost, event.metrics)
+        else:
+            sys.stdout.write('.')
+            sys.stdout.flush()
+    if isinstance(event, paddle.event.EndPass):
+        result = trainer.test(
+            reader=paddle.reader.batch(
+                paddle.dataset.cifar.test10(), batch_size=128),
+            reader_dict={'image': 0,
+                         'label': 1})
+        print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```

-Here is the result：
+Finally, we can invoke `trainer.train` to start training:

-```text
-Label of image/dog.png is: 5
+```python
+trainer.train(
+    reader=reader,
+    num_passes=200,
+    event_handler=event_handler,
+    feeding=feeding)
 ```

-### Feature Extraction
+Here is an example log after training for one pass. The error rate is 0.6875 on the last logged training batch (Batch 300) and 0.8852 on the test set.

-We can run the following command to extract features from an image. Here `job` should be `extract` and the default layer is the first convolutional layer. Figure 13 shows the 64 feature maps output from the first convolutional layer of the VGG model.
-
-```bash
-python classify.py --job=extract --model=output/pass-00299 --data=image/dog.png # -c
+```text
+Pass 0, Batch 0, Cost 2.473182, {'classification_error_evaluator': 0.9140625}
+...................................................................................................
+Pass 0, Batch 100, Cost 1.913076, {'classification_error_evaluator': 0.78125}
+...................................................................................................
+Pass 0, Batch 200, Cost 1.783041, {'classification_error_evaluator': 0.7421875}
+...................................................................................................
+Pass 0, Batch 300, Cost 1.668833, {'classification_error_evaluator': 0.6875}
+..........................................................................................
+Test with Pass 0, {'classification_error_evaluator': 0.885200023651123}
 ```

+Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
 <p align="center">
-<img src="image/fea_conv0.png" width="500"><br/>
-Figre 13. Visualization of convolution layer feature maps
+<img src="image/plot_en.png" width="400" ><br/>
+Figure 12. The error rate of VGG model on CIFAR10
 </p>

+
+After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`.
+
 ## Conclusion

 Traditional image classification methods involve multiple stages of processing and the framework is very complicated. In contrast, CNN models can be trained end-to-end with significant increase of classification accuracy. In this chapter, we introduce three models -- VGG, GoogleNet, ResNet, provide PaddlePaddle config files for training VGG and ResNet on CIFAR10, and explain how to perform prediction and feature extraction using PaddlePaddle API. For other datasets such as ImageNet, the procedure for config and training are the same and you are welcome to give it a try.
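
Review note: the `discexp` learning rate schedule described in the Create Trainer section can be checked with a few lines of plain Python. This is only a sketch of the formula $lr = lr_{0} * a^{\lfloor n / b \rfloor}$ as written in the chapter, not PaddlePaddle's own implementation; the function name `discexp_lr` is hypothetical, and the constants are copied from the optimizer settings in this diff.

```python
def discexp_lr(lr0, a, b, n):
    """Discrete exponential decay: lr0 * a ** floor(n / b).

    lr0: initial learning rate, a: decay factor,
    b: number of samples between decays, n: samples processed so far.
    Hypothetical helper for illustration; not part of the PaddlePaddle API.
    """
    return lr0 * a ** (n // b)


# Values taken from the optimizer configuration in the diff.
lr0 = 0.1 / 128.0
a = 0.1
b = 50000 * 100  # 100 passes over the 50000 CIFAR-10 training images

for n in (0, b - 1, b, 2 * b):
    print("samples=%d lr=%g" % (n, discexp_lr(lr0, a, b, n)))
```

With these settings the learning rate stays constant for the first 100 passes and is then multiplied by 0.1 after every further 100 passes, since each pass processes 50000 training images.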