The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle[Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle[Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
## Background
## Background
...
@@ -135,146 +135,73 @@ Figure 10. ResNet model for ImageNet
...
@@ -135,146 +135,73 @@ Figure 10. ResNet model for ImageNet
</p>
</p>
## Data Preparation
## Dataset
### Data description and downloading
Commonly used public datasets for image classification are CIFAR (https://www.cs.toronto.edu/~kriz/cifar.html), ImageNet (http://image-net.org/), COCO (http://mscoco.org/), etc. Those used for fine-grained image classification are CUB-200-2011 (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), Stanford Dog (http://vision.stanford.edu/aditya86/ImageNetDogs/), Oxford-flowers (http://www.robots.ox.ac.uk/~vgg/data/flowers/), etc. Among them, ImageNet is the largest, and most research results are reported on it, as mentioned in the Model Overview section. Since 2010, the ImageNet data has gone through some changes. The commonly used ImageNet-2012 dataset contains 1000 categories. There are 1,281,167 training images, ranging from 732 to 1200 images per category, and 50,000 validation images, with 50 images per category on average.
Commonly used public datasets for image classification are CIFAR (https://www.cs.toronto.edu/~kriz/cifar.html), ImageNet (http://image-net.org/), COCO (http://mscoco.org/), etc. Those used for fine-grained image classification are CUB-200-2011 (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), Stanford Dog (http://vision.stanford.edu/aditya86/ImageNetDogs/), Oxford-flowers (http://www.robots.ox.ac.uk/~vgg/data/flowers/), etc. Among them, ImageNet is the largest, and most research results are reported on it, as mentioned in the Model Overview section. Since 2010, the ImageNet data has gone through some changes. The commonly used ImageNet-2012 dataset contains 1000 categories. There are 1,281,167 training images, ranging from 732 to 1200 images per category, and 50,000 validation images, with 50 images per category on average.
Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR10 as well as 10 images randomly sampled from each category.
Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR-10 as well as 10 images randomly sampled from each category.
<p align="center">
<p align="center">
<img src="image/cifar.png" width="350"><br/>
<img src="image/cifar.png" width="350"><br/>
Figure 11. CIFAR10 dataset[21]
Figure 11. CIFAR10 dataset[21]
</p>
</p>
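As a quick sanity check on the CIFAR-10 numbers quoted above, the following sketch recomputes the dataset split and the flattened input size in plain Python. The helper `flat_index` and the channel-major (channel, row, column) ordering are assumptions made for illustration; they are not part of the PaddlePaddle API.

```python
# Figures quoted in the dataset description.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
CHANNELS, HEIGHT, WIDTH = 3, 32, 32

total_images = NUM_CLASSES * IMAGES_PER_CLASS
input_size = CHANNELS * HEIGHT * WIDTH  # values per flattened image


def flat_index(channel, row, col):
    """Position of one pixel value in a flattened channel-major image (assumed layout)."""
    return (channel * HEIGHT + row) * WIDTH + col


print("total images: %d" % total_images)
print("values per image: %d" % input_size)
```

The last value of the last channel sits at `flat_index(2, 31, 31)`, which is `input_size - 1`.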
The following command is used for downloading data and calculating the mean image used for data preprocessing.
The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`, so there is no need to manually download and preprocess CIFAR-10.
```bash
./data/get_data.sh
```
### Data provider for PaddlePaddle
After running `python train.py`, training starts immediately. The following sections explain in detail how it works.
We use the Python interface to provide data to PaddlePaddle. The file `dataprovider.py` below is a complete example for CIFAR10.
## Model Structure
- 'initializer' function performs initialization of dataprovider: loading the mean image, defining two input types -- image and label.
### Initialize PaddlePaddle
- 'process' function sends preprocessed data to PaddlePaddle. The preprocessing performed in this function includes data perturbation, random horizontal flipping, and subtracting the mean image from the raw image.
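To make two of the preprocessing steps named above concrete (random horizontal flipping and mean-image subtraction), here is a minimal pure-Python sketch. The function names `flip_horizontal` and `preprocess` are hypothetical, the image is assumed to be a flattened channel-major list of 3x32x32 values, and data perturbation is left out. It is an illustration, not the chapter's `dataprovider.py`.

```python
import random

CHANNELS, HEIGHT, WIDTH = 3, 32, 32


def flip_horizontal(flat_image):
    """Mirror every row of a flattened channel-major image left to right."""
    flipped = []
    for start in range(0, len(flat_image), WIDTH):
        row = flat_image[start:start + WIDTH]
        flipped.extend(reversed(row))
    return flipped


def preprocess(flat_image, mean_image, is_train, rng=random):
    """Flip with probability 1/2 during training, then subtract the mean image."""
    if is_train and rng.randint(0, 1):
        flat_image = flip_horizontal(flat_image)
    return [float(p) - float(m) for p, m in zip(flat_image, mean_image)]


# Hypothetical stand-ins for a raw image and a precomputed mean image.
image = list(range(CHANNELS * HEIGHT * WIDTH))
mean = [1] * len(image)
sample = preprocess(image, mean, is_train=True)
```

Because rows are stored contiguously in the assumed layout, reversing each run of `WIDTH` values flips all three channels at once.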
We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc.).
In the model config file, the function `define_py_data_sources2` sets the argument 'module' to the dataprovider file for loading data and 'args' to the mean image file. If the config file is used for prediction, there is no need to set the argument 'train_list'.
In the model config file, the function 'settings' specifies the optimization algorithm, batch size, learning rate, momentum, and L2 regularization.
```python
settings(
batch_size=128,
learning_rate=0.1/128.0,
learning_rate_decay_a=0.1,
learning_rate_decay_b=50000*100,
learning_rate_schedule='discexp',
learning_method=MomentumOptimizer(0.9),
regularization=L2Regularization(0.0005*128),)
```
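The `settings` call above selects the `discexp` schedule, which this chapter defines as $lr = lr_{0} * a^{\lfloor n / b \rfloor}$. The sketch below evaluates that formula with the values from `settings`. The function name `discexp_learning_rate` is hypothetical, and the assumption that one pass processes all 50000 training images is ours.

```python
def discexp_learning_rate(lr0, a, b, n):
    """Return lr0 * a ** floor(n / b), where n counts processed samples."""
    return lr0 * a ** (n // b)


# Values taken from the settings(...) call in this chapter.
lr0 = 0.1 / 128.0
a = 0.1
b = 50000 * 100

# Assumption: one pass processes all 50000 CIFAR-10 training images.
samples_per_pass = 50000

for pass_id in (0, 99, 100, 200):
    n = pass_id * samples_per_pass
    print("pass %d: lr = %g" % (pass_id, discexp_learning_rate(lr0, a, b, n)))
```

Under this reading, the rate stays at `lr0` for the first 100 passes and is multiplied by 0.1 after every further 100 passes.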
```
The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both VGG and ResNet models.
$$ lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
where $n$ is the number of processed samples, $lr_{0}$ is the learning_rate set in 'settings'.
### Model Architecture
Here we provide the config files for VGG and ResNet models.
#### VGG
### VGG
First we define a VGG network. Since CIFAR10 images are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. The convolution groups incorporate BN and dropout operations.
First, we use a VGG network. Since CIFAR10 images are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. The convolution groups incorporate BN and dropout operations.
1. Define input data and its dimension
1. Define input data and its dimension
The input to the network is defined as `data_layer`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
The input to the network is defined as `paddle.layer.data`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
The input to VGG main module is from data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:
The input to VGG main module is from the data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:
2.1. First, define a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability used in the dropout operation. The function `img_conv_group`, defined in `paddle.trainer_config_helpers`, consists of a series of `Conv->BN->ReLU->Dropout` operations followed by a `Pooling` layer.
2.1. First, define a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability used in the dropout operation. The function `img_conv_group`, defined in `paddle.networks`, consists of a series of `Conv->BN->ReLU->Dropout` operations followed by a `Pooling` layer.
2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
...
@@ -309,15 +237,12 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
...
@@ -309,15 +237,12 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
4. Define Loss Function and Outputs
4. Define Loss Function and Outputs
In supervised learning, the labels of training images are also defined in `data_layer`. During training, cross-entropy is used as the loss function and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
In supervised learning, the labels of training images are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
@@ -325,13 +250,13 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
...
@@ -325,13 +250,13 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
The first, third and fourth steps of a ResNet are the same as those of a VGG. The second one is the main module.
The first, third and fourth steps of a ResNet are the same as those of a VGG. The second one is the main module.
```python
```python
net = resnet_cifar10(data, depth=56)
net = resnet_cifar10(data, depth=32)
```
```
Here are some basic functions used in `resnet_cifar10`:
Here are some basic functions used in `resnet_cifar10`:
- `conv_bn_layer` : convolutional layer followed by BN.
- `conv_bn_layer` : convolutional layer followed by BN.
- `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output are different; direct connection used otherwise.
- `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output is different; direct connection used otherwise.
- `basicblock` : a basic residual module, as shown on the left of Figure 9, consisting of two sequential 3x3 convolutions and one "shortcut" branch.
- `basicblock` : a basic residual module, as shown on the left of Figure 9, consisting of two sequential 3x3 convolutions and one "shortcut" branch.
- `bottleneck` : a bottleneck module, as shown on the right of Figure 9, consisting of a branch with two 1x1 convolutions and one 3x3 convolution in between, plus a "shortcut" branch.
- `bottleneck` : a bottleneck module, as shown on the right of Figure 9, consisting of a branch with two 1x1 convolutions and one 3x3 convolution in between, plus a "shortcut" branch.
The following are the components of `resnet_cifar10`:
The following are the components of `resnet_cifar10`:
...
@@ -395,106 +311,131 @@ The following are the components of `resnet_cifar10`:
...
@@ -395,106 +311,131 @@ The following are the components of `resnet_cifar10`:
Note: excluding the first convolutional layer and the last fully-connected layer, the total number of layers in the three `layer_warp` calls must be divisible by 6; that is, the depth of `resnet_cifar10` must satisfy $(depth - 2) % 6 == 0$.
Note: excluding the first convolutional layer and the last fully-connected layer, the total number of layers in the three `layer_warp` calls must be divisible by 6; that is, the depth of `resnet_cifar10` must satisfy $(depth - 2) % 6 == 0$.
```python
```python
def resnet_cifar10(ipt, depth=56):
def resnet_cifar10(ipt, depth=32):
# depth should be one of 20, 32, 44, 56, 110, 1202
# depth should be one of 20, 32, 44, 56, 110, 1202
We can train the model by running the script `train.sh`, which specifies the config file, device type, number of threads, number of passes, path to the trained models, etc.
### Define Parameters
``` bash
First, we create the model parameters according to the previous model configuration `cost`.
sh train.sh
```
Here is an example script `train.sh`:
```python
# Create parameters
```bash
parameters = paddle.parameters.create(cost)
#cfg=models/resnet.py
cfg=models/vgg.py
output=output
log=train.log
paddle train \
--config=$cfg \
--use_gpu=true \
--trainer_count=1 \
--log_period=100 \
--num_passes=300 \
--save_dir=$output \
2>&1 | tee $log
```
```
- `--config=$cfg` : specifies the config file. The default is `models/vgg.py`.
### Create Trainer
- `--use_gpu=true` : uses the GPU for training. To use the CPU, set it to false.
- `--trainer_count=1` : specifies the number of threads or GPUs.
- `--log_period=100` : specifies the number of batches between two logs.
- `--save_dir=$output` : specifies the path for saving trained models.
Here is an example log after training for one pass. The average error rates are 0.79958 on training set and 0.7858 on validation set.
Before creating the trainer, we also need to configure the optimization algorithm.
Here we specify the `Momentum` optimization algorithm via `paddle.optimizer`.
Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
$$ lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
where $n$ is the number of processed samples, $lr_{0}$ is the learning_rate.
<p align="center">
### Training
<img src="image/plot_en.png" width="400"><br/>
Figure 12. The error rate of VGG model on CIFAR10
`cifar.train10()` yields records during each pass; after shuffling, the records are grouped into batches for training.
</p>
## Model Application
```python
reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.cifar.train10(), buf_size=50000),
    batch_size=128)
```
After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`. The script `classify.py` can be used to extract features and to classify an image. The default config file of this script is `models/vgg.py`.
`feeding` specifies the correspondence between the fields of each yielded record and the `paddle.layer.data` layers. For instance,
the first column of the data generated by `cifar.train10()` corresponds to the feature of the `image` layer.
```python
feeding={'image':0,
'label':1}
```
### Prediction
Callback function `event_handler` will be called during training when a pre-defined event happens.
We can run the following script to predict the category of an image. The default device is the GPU. To use the CPU, set `-c`.
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```
```
Here is the result:
Finally, we can invoke `trainer.train` to start training:
```text
```python
Label of image/dog.png is: 5
trainer.train(
reader=reader,
num_passes=200,
event_handler=event_handler,
feeding=feeding)
```
```
### Feature Extraction
Here is an example log after training for one pass. The average error rates are 0.6875 on the training set and 0.8852 on the validation set.
We can run the following command to extract features from an image. Here `job` should be `extract` and the default layer is the first convolutional layer. Figure 13 shows the 64 feature maps output from the first convolutional layer of the VGG model.
Test with Pass 0, {'classification_error_evaluator': 0.885200023651123}
```
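The log line above reports the first pass as `Pass 0`, and the chapter states that the model of Pass 300 is saved in `output/pass-00299`. Both are consistent with zero-based pass IDs. The sketch below spells out that naming; the helper `pass_dir` is hypothetical and only formats a path, it does not save anything.

```python
def pass_dir(pass_number, root="output"):
    """Directory name for the model saved after the given 1-based pass."""
    if pass_number < 1:
        raise ValueError("pass_number is 1-based and must be at least 1")
    return "%s/pass-%05d" % (root, pass_number - 1)


print(pass_dir(300))
```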
```
Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
<p align="center">
<p align="center">
<img src="image/fea_conv0.png" width="500"><br/>
<img src="image/plot_en.png" width="400"><br/>
Figure 13. Visualization of convolution layer feature maps
Figure 12. The error rate of VGG model on CIFAR10
</p>
</p>
After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`.
## Conclusion
## Conclusion
Traditional image classification methods involve multiple stages of processing, and the overall framework is very complicated. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduce three models -- VGG, GoogleNet, ResNet -- provide PaddlePaddle config files for training VGG and ResNet on CIFAR10, and explain how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give it a try.
Traditional image classification methods involve multiple stages of processing, and the overall framework is very complicated. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduce three models -- VGG, GoogleNet, ResNet -- provide PaddlePaddle config files for training VGG and ResNet on CIFAR10, and explain how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give it a try.
The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle[Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle[Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
## Background
## Background
...
@@ -177,146 +177,73 @@ Figure 10. ResNet model for ImageNet
...
@@ -177,146 +177,73 @@ Figure 10. ResNet model for ImageNet
</p>
</p>
## Data Preparation
## Dataset
### Data description and downloading
Commonly used public datasets for image classification are CIFAR (https://www.cs.toronto.edu/~kriz/cifar.html), ImageNet (http://image-net.org/), COCO (http://mscoco.org/), etc. Those used for fine-grained image classification are CUB-200-2011 (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), Stanford Dog (http://vision.stanford.edu/aditya86/ImageNetDogs/), Oxford-flowers (http://www.robots.ox.ac.uk/~vgg/data/flowers/), etc. Among them, ImageNet is the largest, and most research results are reported on it, as mentioned in the Model Overview section. Since 2010, the ImageNet data has gone through some changes. The commonly used ImageNet-2012 dataset contains 1000 categories. There are 1,281,167 training images, ranging from 732 to 1200 images per category, and 50,000 validation images, with 50 images per category on average.
Commonly used public datasets for image classification are CIFAR (https://www.cs.toronto.edu/~kriz/cifar.html), ImageNet (http://image-net.org/), COCO (http://mscoco.org/), etc. Those used for fine-grained image classification are CUB-200-2011 (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), Stanford Dog (http://vision.stanford.edu/aditya86/ImageNetDogs/), Oxford-flowers (http://www.robots.ox.ac.uk/~vgg/data/flowers/), etc. Among them, ImageNet is the largest, and most research results are reported on it, as mentioned in the Model Overview section. Since 2010, the ImageNet data has gone through some changes. The commonly used ImageNet-2012 dataset contains 1000 categories. There are 1,281,167 training images, ranging from 732 to 1200 images per category, and 50,000 validation images, with 50 images per category on average.
Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR10 as well as 10 images randomly sampled from each category.
Since ImageNet is too large to be downloaded and trained efficiently, we use CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html) in this tutorial. The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Figure 11 shows all the classes in CIFAR-10 as well as 10 images randomly sampled from each category.
<p align="center">
<p align="center">
<img src="image/cifar.png" width="350"><br/>
<img src="image/cifar.png" width="350"><br/>
Figure 11. CIFAR10 dataset[21]
Figure 11. CIFAR10 dataset[21]
</p>
</p>
The following command is used for downloading data and calculating the mean image used for data preprocessing.
The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`, so there is no need to manually download and preprocess CIFAR-10.
```bash
./data/get_data.sh
```
### Data provider for PaddlePaddle
We use the Python interface to provide data to PaddlePaddle. The file `dataprovider.py` below is a complete example for CIFAR10.
- 'initializer' function performs initialization of the dataprovider: loading the mean image and defining two input types -- image and label.
- 'process' function sends preprocessed data to PaddlePaddle. The preprocessing performed in this function includes data perturbation, random horizontal flipping, and subtracting the mean image from the raw image.
After running `python train.py`, training starts immediately. The following sections explain in detail how it works.
@provider(init_hook=initializer, pool_size=50000)
## Model Structure
def process(settings, file_list):
    with open(file_list, 'r') as fdata:
        for fname in fdata:
            fo = open(fname.strip(), 'rb')
            batch = cPickle.load(fo)
            fo.close()
            images = batch['data']
            labels = batch['labels']
            for im, lab in zip(images, labels):
                if settings.is_train and np.random.randint(2):
                    im = im.reshape(3, 32, 32)
                    im = im[:,:,::-1]
                    im = im.flatten()
                im = im - settings.mean
                yield {
                    'image': im.astype('float32'),
                    'label': int(lab)
                }
```
## Model Config
### Data Definition
### Initialize PaddlePaddle
In the model config file, the function `define_py_data_sources2` sets the argument 'module' to the dataprovider file for loading data and 'args' to the mean image file. If the config file is used for prediction, there is no need to set the argument 'train_list'.
We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc.).
In the model config file, the function 'settings' specifies the optimization algorithm, batch size, learning rate, momentum, and L2 regularization.
```python
# PaddlePaddle init
settings(
paddle.init(use_gpu=False, trainer_count=1)
batch_size=128,
learning_rate=0.1 / 128.0,
learning_rate_decay_a=0.1,
learning_rate_decay_b=50000 * 100,
learning_rate_schedule='discexp',
learning_method=MomentumOptimizer(0.9),
regularization=L2Regularization(0.0005 * 128),)
```
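The `settings` call above combines a momentum optimizer with L2 regularization. As a rough illustration of how those two terms interact in a single update, here is a textbook momentum step with weight decay on one scalar weight. The function name `momentum_l2_step` is hypothetical, and PaddlePaddle's actual update rule (including how it scales values by batch size) may differ from this sketch.

```python
def momentum_l2_step(weight, velocity, grad, lr, momentum=0.9, l2=0.0005):
    """One textbook SGD step with momentum and L2 weight decay on a scalar."""
    grad_with_decay = grad + l2 * weight  # the L2 term pulls the weight toward 0
    velocity = momentum * velocity - lr * grad_with_decay
    return weight + velocity, velocity


# Hypothetical starting point: weight 1.0, no accumulated velocity.
w, v = 1.0, 0.0
for step in range(3):
    w, v = momentum_l2_step(w, v, grad=0.5, lr=0.1)
    print("step %d: weight = %g, velocity = %g" % (step, w, v))
```

The velocity keeps a decaying sum of past gradients, so repeated gradients of the same sign produce progressively larger steps.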
```
The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both VGG and ResNet models.
$$ lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
where $n$ is the number of processed samples, $lr_{0}$ is the learning_rate set in 'settings'.
### Model Architecture
Here we provide the config files for VGG and ResNet models.
### VGG
#### VGG
First we define a VGG network. Since CIFAR10 images are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. The convolution groups incorporate BN and dropout operations.
First, we use a VGG network. Since CIFAR10 images are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. The convolution groups incorporate BN and dropout operations.
1. Define input data and its dimension
1. Define input data and its dimension
The input to the network is defined as `data_layer`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
The input to the network is defined as `paddle.layer.data`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
The input to VGG main module is from data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:
The input to VGG main module is from the data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail:
2.1. First, define a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability used in the dropout operation. The function `img_conv_group`, defined in `paddle.trainer_config_helpers`, consists of a series of `Conv->BN->ReLU->Dropout` operations followed by a `Pooling` layer.
2.1. First, define a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability used in the dropout operation. The function `img_conv_group`, defined in `paddle.networks`, consists of a series of `Conv->BN->ReLU->Dropout` operations followed by a `Pooling` layer.
2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
...
@@ -351,15 +279,12 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
...
@@ -351,15 +279,12 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
4. Define Loss Function and Outputs
4. Define Loss Function and Outputs
In supervised learning, the labels of training images are also defined in `data_layer`. During training, cross-entropy is used as the loss function and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
In supervised learning, the labels of training images are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
@@ -367,13 +292,13 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
...
@@ -367,13 +292,13 @@ First we define VGG network. Since the image size and amount of CIFAR10 are rela
The first, third and fourth steps of a ResNet are the same as those of a VGG. The second one is the main module.
The first, third and fourth steps of a ResNet are the same as those of a VGG. The second one is the main module.
```python
```python
net = resnet_cifar10(data, depth=56)
net = resnet_cifar10(data, depth=32)
```
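The `depth` argument passed to `resnet_cifar10` is constrained by the chapter's rule $(depth - 2) % 6 == 0$. Assuming three stages built from `basicblock` modules with two 3x3 convolutions each, as described in this section, the number of blocks per stage follows directly from that rule. The helper `blocks_per_stage` is a hypothetical name used only for this sketch.

```python
def blocks_per_stage(depth):
    """Residual blocks in each of the three stages of a CIFAR-10 ResNet.

    Excluding the first convolution and the final fully-connected layer,
    three stages of n blocks with two convolutions each give 6 * n layers.
    """
    if depth < 8 or (depth - 2) % 6 != 0:
        raise ValueError("depth must satisfy (depth - 2) % 6 == 0")
    return (depth - 2) // 6


# The depths listed in the chapter's code comment.
for depth in (20, 32, 44, 56, 110, 1202):
    print("depth %d: %d blocks per stage" % (depth, blocks_per_stage(depth)))
```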
```
Here are some basic functions used in `resnet_cifar10`:
Here are some basic functions used in `resnet_cifar10`:
- `conv_bn_layer` : convolutional layer followed by BN.
- `conv_bn_layer` : convolutional layer followed by BN.
- `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output are different; direct connection used otherwise.
- `shortcut` : the shortcut branch in a residual block. There are two kinds of shortcuts: 1x1 convolution used when the number of channels between input and output is different; direct connection used otherwise.
- `basicblock` : a basic residual module as shown in the left of Figure 9, consisting of two sequential 3x3 convolutions and one "shortcut" branch.
- `basicblock` : a basic residual module as shown in the left of Figure 9, consisting of two sequential 3x3 convolutions and one "shortcut" branch.
- `bottleneck` : a bottleneck module, as shown on the right of Figure 9, consisting of a branch with two 1x1 convolutions and one 3x3 convolution in between, plus a "shortcut" branch.
- `bottleneck` : a bottleneck module, as shown on the right of Figure 9, consisting of a branch with two 1x1 convolutions and one 3x3 convolution in between, plus a "shortcut" branch.
The following are the components of `resnet_cifar10`:
The following are the components of `resnet_cifar10`:
...
@@ -437,106 +353,131 @@ The following are the components of `resnet_cifar10`:
...
@@ -437,106 +353,131 @@ The following are the components of `resnet_cifar10`:
Note: excluding the first convolutional layer and the last fully-connected layer, the total number of layers in the three `layer_warp` calls must be divisible by 6; that is, the depth of `resnet_cifar10` must satisfy $(depth - 2) % 6 == 0$.
Note: excluding the first convolutional layer and the last fully-connected layer, the total number of layers in the three `layer_warp` calls must be divisible by 6; that is, the depth of `resnet_cifar10` must satisfy $(depth - 2) % 6 == 0$.
```python
```python
def resnet_cifar10(ipt, depth=56):
def resnet_cifar10(ipt, depth=32):
# depth should be one of 20, 32, 44, 56, 110, 1202
# depth should be one of 20, 32, 44, 56, 110, 1202
We can train the model by running the script `train.sh`, which specifies the config file, device type, number of threads, number of passes, path to the trained models, etc.
### Define Parameters
``` bash
First, we create the model parameters according to the previous model configuration `cost`.
sh train.sh
```
Here is an example script `train.sh`:
```python
# Create parameters
```bash
parameters = paddle.parameters.create(cost)
#cfg=models/resnet.py
cfg=models/vgg.py
output=output
log=train.log
paddle train \
--config=$cfg \
--use_gpu=true \
--trainer_count=1 \
--log_period=100 \
--num_passes=300 \
--save_dir=$output \
2>&1 | tee $log
```
```
- `--config=$cfg` : specifies config file. The default is `models/vgg.py`.
### Create Trainer
- `--use_gpu=true` : uses the GPU for training. To use the CPU, set it to false.
- `--trainer_count=1` : specifies the number of threads or GPUs.
- `--log_period=100` : specifies the number of batches between two logs.
- `--save_dir=$output` : specifies the path for saving trained models.
Here is an example log after training for one pass. The average error rates are 0.79958 on training set and 0.7858 on validation set.
Before creating the trainer, we also need to configure the optimization algorithm.
Here we specify the `Momentum` optimization algorithm via `paddle.optimizer`.
Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
The learning rate adjustment policy can be defined with variables `learning_rate_decay_a`($a$), `learning_rate_decay_b`($b$) and `learning_rate_schedule`. In this example, discrete exponential method is used for adjusting learning rate. The formula is as follows,
$$ lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
where $n$ is the number of processed samples, $lr_{0}$ is the learning_rate.
<p align="center">
### Training
<img src="image/plot_en.png" width="400"><br/>
Figure 12. The error rate of VGG model on CIFAR10
</p>
## Model Application
`cifar.train10()` yields records during each pass; after shuffling, the records are grouped into batches for training.
After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`. The script `classify.py` can be used to extract features and to classify an image. The default config file of this script is `models/vgg.py`.
```python
reader=paddle.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(), buf_size=50000),
batch_size=128)
```
`feeding` specifies the correspondence between the fields of each yielded record and the `paddle.layer.data` layers. For instance,
the first column of the data generated by `cifar.train10()` corresponds to the feature of the `image` layer.
```python
feeding={'image': 0,
'label': 1}
```
### Prediction
Callback function `event_handler` will be called during training when a pre-defined event happens.
We can run the following script to predict the category of an image. The default device is the GPU. To use the CPU, set `-c`.
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```
```
Here is the result:
Finally, we can invoke `trainer.train` to start training:
```text
```python
Label of image/dog.png is: 5
trainer.train(
reader=reader,
num_passes=200,
event_handler=event_handler,
feeding=feeding)
```
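To make the data flow into `trainer.train` more concrete, the sketch below imitates the reader pipeline in plain Python: a buffered shuffle, followed by batching, with `feeding` used to pick fields out of each record. The functions `shuffle_reader`, `batch_reader`, and `toy_reader` are hypothetical stand-ins written for this illustration. They are not the implementations of `paddle.reader.shuffle`, `paddle.batch`, or `cifar.train10()`, and details such as yielding the final partial batch are assumptions.

```python
import random


def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so records are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        rng.shuffle(buf)  # flush whatever is left at the end of the pass
        for item in buf:
            yield item

    return shuffled


def batch_reader(reader, batch_size):
    """Wrap a reader so it yields lists of up to batch_size records."""
    def batched():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # assumption: the last, smaller batch is kept
            yield batch

    return batched


def toy_reader():
    """Hypothetical stand-in for a dataset reader: yields (image, label) tuples."""
    for i in range(10):
        yield ([float(i)] * 4, i % 3)


feeding = {'image': 0, 'label': 1}
reader = batch_reader(shuffle_reader(toy_reader, buf_size=10, seed=0), batch_size=4)

first_batch = next(reader())
labels = [record[feeding['label']] for record in first_batch]
print("labels in first batch: %s" % labels)
```

Each call to `reader()` starts a new pass over the data, which matches the idea of one reader invocation per training pass.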
```
### Feature Extraction
Here is an example log after training for one pass. The average error rates are 0.6875 on the training set and 0.8852 on the validation set.
We can run the following command to extract features from an image. Here `job` should be `extract` and the default layer is the first convolutional layer. Figure 13 shows the 64 feature maps output from the first convolutional layer of the VGG model.
Test with Pass 0, {'classification_error_evaluator': 0.885200023651123}
```
```
Figure 12 shows the curve of training error rate, which indicates it converges at Pass 200 with error rate 8.54%.
<p align="center">
<p align="center">
<img src="image/fea_conv0.png" width="500"><br/>
<img src="image/plot_en.png" width="400"><br/>
Figure 13. Visualization of convolution layer feature maps
Figure 12. The error rate of VGG model on CIFAR10
</p>
</p>
After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`.
## Conclusion
## Conclusion
Traditional image classification methods involve multiple stages of processing, and the overall framework is very complicated. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduce three models -- VGG, GoogleNet, ResNet -- provide PaddlePaddle config files for training VGG and ResNet on CIFAR10, and explain how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give it a try.
Traditional image classification methods involve multiple stages of processing, and the overall framework is very complicated. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduce three models -- VGG, GoogleNet, ResNet -- provide PaddlePaddle config files for training VGG and ResNet on CIFAR10, and explain how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give it a try.