After running the command `python train.py`, training will start immediately. The following sections describe the details.
## Model Configuration
### Initialize PaddlePaddle
Let's start by importing the Paddle Fluid API package and the helper modules.
```python
import paddle
import paddle.fluid as fluid
import numpy
import sys
from vgg import vgg_bn_drop
from resnet import resnet_cifar10
```
Now we are going to walk you through the implementations of the VGG and ResNet.
### VGG
Let's start with the VGG model. Since the image size and the amount of data in CIFAR10 are relatively small compared to ImageNet, we use a smaller version of the VGG network for CIFAR10. The convolution groups incorporate BN and dropout operations.
1. Define input data and its dimension
The input to the network is defined as `fluid.layers.data`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32), and the number of categories is 10.
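As a minimal sketch (not shown in this document; the `pixel` and `label` names are taken from the `feed_order` used later), the input layers could be declared as:

```python
# Input image: 3 channels x 32 x 32 = 3072 values per sample.
data_shape = [3, 32, 32]
images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')
# Ground-truth label: one of the 10 CIFAR10 categories.
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
```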
2. Define VGG main module
The input to the VGG main module comes from the data layer. `vgg_bn_drop` defines a 16-layer VGG network, with each convolutional layer followed by BN and dropout layers. Here is the definition in detail (a sketch of the full function follows these sub-items):
2.1. Firstly, it defines a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. The dropout rates specify the probability of each dropout operation. The function `img_conv_group` is provided by `fluid.nets` and consists of a series of `Conv->BN->ReLU->Dropout` operations followed by a `Pooling`.
2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.
2.3. The last two layers are fully-connected layers of dimension 512.
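The body of `vgg_bn_drop` is not included in this document; a sketch of what it could look like with the Fluid API, following steps 2.1-2.3 above (the filter numbers and dropout rates are assumptions based on common CIFAR10 VGG configurations), is:

```python
def vgg_bn_drop(input):
    def conv_block(ipt, num_filter, groups, dropouts):
        # A group of `groups` Conv->BN->ReLU->Dropout layers followed by max pooling.
        return fluid.nets.img_conv_group(
            input=ipt,
            pool_size=2,
            pool_stride=2,
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act='relu',
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts,
            pool_type='max')

    # Five convolution groups; the last convolution of each group uses no dropout.
    conv1 = conv_block(input, 64, 2, [0.3, 0])
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    # Two fully-connected layers of dimension 512, each preceded by dropout.
    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    # Final classifier (see step 3 below).
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')
    return predict
```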
3. Define Classifier
The VGG network above extracts high-level features and maps them to a vector of the same size as the number of categories. A softmax layer is then used as the classifier to calculate the probability of the image belonging to each category.
```python
# `net` is the output of the VGG feature extractor (the last 512-dim fc layer);
# `classdim` is the number of categories (10 for CIFAR10).
predict = fluid.layers.fc(input=net, size=classdim, act='softmax')
```
4. Define Loss Function and Outputs
In the context of supervised learning, labels of training images are defined in `fluid.layers.data` as well. During training, the cross-entropy loss function is used and the loss is the output of the network. During testing, the outputs are the probabilities calculated in the classifier.
### ResNet
The first, third and fourth steps of a ResNet are the same as those of VGG. The second step is the main module of ResNet.
Here are some basic functions used in `resnet_cifar10`:
- `conv_bn_layer` : convolutional layer followed by BN.
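The body of `conv_bn_layer` is not shown here; a plausible sketch with the Fluid API (argument names are illustrative) is:

```python
def conv_bn_layer(input, ch_out, filter_size, stride, padding, act='relu',
                  bias_attr=False):
    # Convolution without its own activation ...
    tmp = fluid.layers.conv2d(
        input=input,
        filter_size=filter_size,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
        act=None,
        bias_attr=bias_attr)
    # ... followed by batch normalization, which applies the activation.
    return fluid.layers.batch_norm(input=tmp, act=act)
```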
## Inference Program Configuration
The input to the network is defined as `fluid.layers.data`, or image pixels in the context of image classification. The images in CIFAR10 are 32x32 color images of three channels. Therefore, the size of the input data is 3072 (3x32x32). The inference (prediction) program wires this data layer to the chosen network. A reconstruction of that program from the fragment in this document (the ResNet depth of 32 is an assumption) is:

```python
def inference_program():
    # The image is 32 * 32 with RGB representation.
    data_shape = [3, 32, 32]
    images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')

    predict = resnet_cifar10(images, 32)
    # predict = vgg_bn_drop(images) # un-comment to use vgg net
    return predict
```
## Train Program Configuration
Then we need to set up the `train_program`. It takes the prediction from the inference program first, and during training it calculates `avg_cost` from the prediction and the label. The trainer creates and initializes the model parameters from this program, so there is no separate parameter-creation step.
In the context of supervised learning, labels of training images are defined in `fluid.layers.data` as well. During training, the cross-entropy loss function is used and the loss is the output of the network. During testing, the outputs are the probabilities calculated in the classifier.
**NOTE:** A train program should return an array, and the first returned element has to be `avg_cost`.
The trainer always implicitly uses it to calculate the gradient.
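`train_program` itself is not shown in this document; a minimal sketch consistent with the description above (using the `pixel` and `label` inputs defined earlier) could be:

```python
def train_program():
    predict = inference_program()

    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=predict, label=label)
    # avg_cost must come first; the trainer uses it to compute gradients.
    return [avg_cost, accuracy]
```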
The learning rate adjustment policy can be defined with the variables `learning_rate_decay_a` ($a$), `learning_rate_decay_b` ($b$) and `learning_rate_schedule`. In this example, a discrete exponential method is used to adjust the learning rate. The formula is as follows:
$$ lr = lr_{0} \cdot a^{\lfloor \frac{n}{b} \rfloor} $$
where $n$ is the number of processed samples and $lr_{0}$ is the initial learning rate.
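The `learning_rate_decay_a/b` settings named above belong to the older v2 optimizer configuration. As an illustration only, a comparable staircase exponential schedule could be expressed with the Fluid API as follows (the Fluid scheduler counts optimization steps rather than samples, and the values here are made up):

```python
def optimizer_with_decay_program():
    # lr = 0.1 * 0.5 ** floor(step / 50000): staircase exponential decay,
    # mirroring the formula above with a = 0.5 and b = 50000 (illustrative values).
    lr = fluid.layers.exponential_decay(
        learning_rate=0.1, decay_steps=50000, decay_rate=0.5, staircase=True)
    return fluid.optimizer.Momentum(learning_rate=lr, momentum=0.9)
```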
## Model Training
### Create Trainer
Before creating a training module, it is necessary to set the optimization algorithm. Here we specify the `Adam` optimization algorithm via `fluid.optimizer`.
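The `optimizer_program` passed to the trainer below is not shown in this document; a minimal sketch (the learning rate value is an assumption) is:

```python
def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)
```

The trainer is then created from the train program, this optimizer function, and the execution place: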
```python
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
trainer = fluid.Trainer(
    train_func=train_program,
    optimizer_func=optimizer_program,
    place=place)
```

### Data Feeders Configuration
`cifar.train10()` will yield records during each pass; after shuffling, a batch input is generated for training.

```python
# Reader for training: shuffle the CIFAR10 training set, then batch it.
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.cifar.train10(), buf_size=50000),
    batch_size=128)
```

The `feed_order` argument passed to `trainer.train` below specifies the correspondence between each yielded record and the inputs of the network. For instance, the first column of data generated by `cifar.train10()` corresponds to the image input `pixel`, and the second to `label`.
### Event Handler
We can use an event handler callback to track training progress and save the trained parameters. The following is a sketch that assumes the Fluid `Trainer` end-of-epoch event and that `trainer.test` returns the metrics of the train program in order (`avg_cost`, then accuracy); the parameter directory name is illustrative.

```python
params_dirname = "image_classification.inference.model"

def event_handler(event):
    if isinstance(event, fluid.EndEpochEvent):
        # Evaluate on the CIFAR10 test set at the end of every epoch.
        avg_cost, accuracy = trainer.test(
            reader=paddle.batch(
                paddle.dataset.cifar.test10(), batch_size=128),
            feed_order=['pixel', 'label'])

        print('\nTest with Pass {0}, Loss {1:2.2}, Acc {2:2.2}'.format(
            event.epoch, avg_cost, accuracy))

        # save parameters
        if params_dirname is not None:
            trainer.save_params(params_dirname)
```
### Training
Finally, we can invoke `trainer.train` to start training.
**Note:** On CPU, each epoch will take about 15~20 minutes, so this part may take a while. Please feel free to modify the code to run on a GPU to increase the training speed.
```python
trainer.train(
    reader=train_reader,
    num_epochs=2,
    event_handler=event_handler,
    feed_order=['pixel', 'label'])
```
Here is an example log after training for one pass. The accuracy rates are 0.59 on the training set and 0.6 on the validation set.

```
Test with Pass 0, Loss 1.1, Acc 0.6
```
Figure 12 shows the curve of the training error rate; when trained for 200 passes, the model converges with an error rate of 8.54%.
<p align="center">
Figure 12. The error rate of VGG model on CIFAR10
</p>
## Application
After training is completed, users can use the trained model to classify images. The following code shows how to run inference through the `fluid.Inferencer` interface. You can uncomment some lines from below to change the model name.
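As a sketch of that interface (assuming `inference_program`, `params_dirname` and `place` are defined as in the earlier sections), an inferencer could be created like this:

```python
# Build an inferencer from the inference program and the saved parameters.
inferencer = fluid.Inferencer(
    infer_func=inference_program, param_path=params_dirname, place=place)
```

The prepared image can later be passed to `inferencer.infer` to obtain the per-category probabilities.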
### Generate input data for inference
`dog.png` is an example image of a dog. Turn it into a numpy array to match the data feeder format.
```python
# Prepare testing data.
from PIL import Image
import numpy as np
import os
def load_image(file):
im = Image.open(file)
im = im.resize((32, 32), Image.ANTIALIAS)
im = np.array(im).astype(np.float32)
# The storage order of the loaded image is W(width),
# H(height), C(channel). PaddlePaddle requires
# the CHW order, so transpose them.
im = im.transpose((2, 0, 1)) # CHW
# In the training phase, the channel order of CIFAR
# image is B(Blue), G(green), R(Red). But PIL open
# image in RGB mode. It must swap the channel order.