@@ -90,7 +90,7 @@ Figure 5. A CNN example <a src="#References">[20]</a>
Parameter updates at each layer during training causes input layer distributions to change and in turn requires hyper-parameters to be carefully tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed a Batch Normalization (BN) algorithm \[[14](#References)\], which normalizes the features of each batch in a layer, and enables relatively stable distribution in each layer. Not only does BN algorithm act as a regularizer, but also eliminates the need for meticulous hyper-parameter design. Experiments demonstrate that BN algorithm accelerates the training convergence and has been widely used in further deeper models.
In the following sections, we will take a tour through the following network architectures - VGG, GoogleNet and ResNets.
In the following sections, we will take a tour through the following network architectures - VGG, GoogLeNet and ResNets.
### VGG
...
...
@@ -101,9 +101,9 @@ The Oxford Visual Geometry Group (VGG) proposed the VGG network in ILSVRC 2014 \
Figure 6. VGG16 model for ImageNet
</p>
### GoogleNet
### GoogLeNet
GoogleNet \[[12](#References)\] won the ILSVRC championship in 2014. GoogleNet borrowed some ideas from the Network in Network(NIN) model \[[13](#References)\] and is built on the Inception blocks. Let us first familiarize ourselves with these concepts first.
GoogLeNet \[[12](#References)\] won the ILSVRC championship in 2014. GoogLeNet borrowed some ideas from the Network in Network(NIN) model \[[13](#References)\] and is built on the Inception blocks. Let us first familiarize ourselves with these concepts first.
The two main characteristics of the NIN model are:
...
...
@@ -118,16 +118,16 @@ Figure 7 depicts two Inception blocks. Figure 7(a) is the simplest design. The o
Figure 7. Inception block
</p>
GoogleNet comprises multiple stacked Inception blocks followed by an avg-pooling layer as in NIN instead of traditional fully connected layers. The difference between GoogleNet and NIN is that GoogleNet adds a fully connected layer after avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from middle layers of a GoogleNet are also very discriminative. Therefore, GoogeleNet inserts two auxiliary classifiers in the model for enhancing gradient and regularization when doing back-propagation. The loss function of the whole network is the weighted sum of these three classifiers.
GoogLeNet comprises multiple stacked Inception blocks followed by an avg-pooling layer as in NIN instead of traditional fully connected layers. The difference between GoogLeNet and NIN is that GoogLeNet adds a fully connected layer after avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from middle layers of a GoogLeNet are also very discriminative. Therefore, GoogeleNet inserts two auxiliary classifiers in the model for enhancing gradient and regularization when doing back-propagation. The loss function of the whole network is the weighted sum of these three classifiers.
Figure 8 illustrates the neural architecture of a GoogleNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second group has five, and the third group has two again. Finally, It ends with an average pooling and a fully-connected layer.
Figure 8 illustrates the neural architecture of a GoogLeNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second group has five, and the third group has two again. Finally, It ends with an average pooling and a fully-connected layer.
The model above is the first version of GoogleNet or the so-called GoogelNet-v1. GoogleNet-v2 \[[14](#References)\] introduced BN layer; GoogleNet-v3 \[[16](#References)\] further split some convolutional layers, which increases non-linearity and network depth; GoogelNet-v4 \[[17](#References)\] is inspired by the design idea of ResNet which will be introduced in the next section. The evolution from v1 to v4 improved the accuracy rate consistently. The length of this article being limited, we will not scrutinize the neural architectures of v2 to v4.
The model above is the first version of GoogLeNet or the so-called GoogelNet-v1. GoogLeNet-v2 \[[14](#References)\] introduced BN layer; GoogLeNet-v3 \[[16](#References)\] further split some convolutional layers, which increases non-linearity and network depth; GoogelNet-v4 \[[17](#References)\] is inspired by the design idea of ResNet which will be introduced in the next section. The evolution from v1 to v4 improved the accuracy rate consistently. The length of this article being limited, we will not scrutinize the neural architectures of v2 to v4.
### ResNet
...
...
@@ -574,7 +574,7 @@ with fluid.scope_guard(inference_scope):
## Summary
The traditional image classification method consists of multiple stages. The framework is a little complex. In contrast, the end-to-end CNN model can be implemented in one step, and the accuracy of classification is greatly improved. In this article, we first introduced three classic models, VGG, GoogleNet and ResNet. Then we have introduced how to use PaddlePaddle to configure and train CNN models based on CIFAR10 dataset, especially VGG and ResNet models. Finally, we have guided you how to use PaddlePaddle's API interfaces to predict images and extract features. For other datasets such as ImageNet, the configuration and training process is the same, so you can embark on your adventure on your own.
The traditional image classification method consists of multiple stages. The framework is a little complex. In contrast, the end-to-end CNN model can be implemented in one step, and the accuracy of classification is greatly improved. In this article, we first introduced three classic models, VGG, GoogLeNet and ResNet. Then we have introduced how to use PaddlePaddle to configure and train CNN models based on CIFAR10 dataset, especially VGG and ResNet models. Finally, we have guided you how to use PaddlePaddle's API interfaces to predict images and extract features. For other datasets such as ImageNet, the configuration and training process is the same, so you can embark on your adventure on your own.
@@ -132,7 +132,7 @@ Figure 5. A CNN example <a src="#References">[20]</a>
Parameter updates at each layer during training causes input layer distributions to change and in turn requires hyper-parameters to be carefully tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed a Batch Normalization (BN) algorithm \[[14](#References)\], which normalizes the features of each batch in a layer, and enables relatively stable distribution in each layer. Not only does BN algorithm act as a regularizer, but also eliminates the need for meticulous hyper-parameter design. Experiments demonstrate that BN algorithm accelerates the training convergence and has been widely used in further deeper models.
In the following sections, we will take a tour through the following network architectures - VGG, GoogleNet and ResNets.
In the following sections, we will take a tour through the following network architectures - VGG, GoogLeNet and ResNets.
### VGG
...
...
@@ -143,9 +143,9 @@ The Oxford Visual Geometry Group (VGG) proposed the VGG network in ILSVRC 2014 \
Figure 6. VGG16 model for ImageNet
</p>
### GoogleNet
### GoogLeNet
GoogleNet \[[12](#References)\] won the ILSVRC championship in 2014. GoogleNet borrowed some ideas from the Network in Network(NIN) model \[[13](#References)\] and is built on the Inception blocks. Let us first familiarize ourselves with these concepts first.
GoogLeNet \[[12](#References)\] won the ILSVRC championship in 2014. GoogLeNet borrowed some ideas from the Network in Network(NIN) model \[[13](#References)\] and is built on the Inception blocks. Let us first familiarize ourselves with these concepts first.
The two main characteristics of the NIN model are:
...
...
@@ -160,16 +160,16 @@ Figure 7 depicts two Inception blocks. Figure 7(a) is the simplest design. The o
Figure 7. Inception block
</p>
GoogleNet comprises multiple stacked Inception blocks followed by an avg-pooling layer as in NIN instead of traditional fully connected layers. The difference between GoogleNet and NIN is that GoogleNet adds a fully connected layer after avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from middle layers of a GoogleNet are also very discriminative. Therefore, GoogeleNet inserts two auxiliary classifiers in the model for enhancing gradient and regularization when doing back-propagation. The loss function of the whole network is the weighted sum of these three classifiers.
GoogLeNet comprises multiple stacked Inception blocks followed by an avg-pooling layer as in NIN instead of traditional fully connected layers. The difference between GoogLeNet and NIN is that GoogLeNet adds a fully connected layer after avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from middle layers of a GoogLeNet are also very discriminative. Therefore, GoogeleNet inserts two auxiliary classifiers in the model for enhancing gradient and regularization when doing back-propagation. The loss function of the whole network is the weighted sum of these three classifiers.
Figure 8 illustrates the neural architecture of a GoogleNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second group has five, and the third group has two again. Finally, It ends with an average pooling and a fully-connected layer.
Figure 8 illustrates the neural architecture of a GoogLeNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second group has five, and the third group has two again. Finally, It ends with an average pooling and a fully-connected layer.
The model above is the first version of GoogleNet or the so-called GoogelNet-v1. GoogleNet-v2 \[[14](#References)\] introduced BN layer; GoogleNet-v3 \[[16](#References)\] further split some convolutional layers, which increases non-linearity and network depth; GoogelNet-v4 \[[17](#References)\] is inspired by the design idea of ResNet which will be introduced in the next section. The evolution from v1 to v4 improved the accuracy rate consistently. The length of this article being limited, we will not scrutinize the neural architectures of v2 to v4.
The model above is the first version of GoogLeNet or the so-called GoogelNet-v1. GoogLeNet-v2 \[[14](#References)\] introduced BN layer; GoogLeNet-v3 \[[16](#References)\] further split some convolutional layers, which increases non-linearity and network depth; GoogelNet-v4 \[[17](#References)\] is inspired by the design idea of ResNet which will be introduced in the next section. The evolution from v1 to v4 improved the accuracy rate consistently. The length of this article being limited, we will not scrutinize the neural architectures of v2 to v4.
### ResNet
...
...
@@ -616,7 +616,7 @@ with fluid.scope_guard(inference_scope):
## Summary
The traditional image classification method consists of multiple stages. The framework is a little complex. In contrast, the end-to-end CNN model can be implemented in one step, and the accuracy of classification is greatly improved. In this article, we first introduced three classic models, VGG, GoogleNet and ResNet. Then we have introduced how to use PaddlePaddle to configure and train CNN models based on CIFAR10 dataset, especially VGG and ResNet models. Finally, we have guided you how to use PaddlePaddle's API interfaces to predict images and extract features. For other datasets such as ImageNet, the configuration and training process is the same, so you can embark on your adventure on your own.
The traditional image classification method consists of multiple stages. The framework is a little complex. In contrast, the end-to-end CNN model can be implemented in one step, and the accuracy of classification is greatly improved. In this article, we first introduced three classic models, VGG, GoogLeNet and ResNet. Then we have introduced how to use PaddlePaddle to configure and train CNN models based on CIFAR10 dataset, especially VGG and ResNet models. Finally, we have guided you how to use PaddlePaddle's API interfaces to predict images and extract features. For other datasets such as ImageNet, the configuration and training process is the same, so you can embark on your adventure on your own.