From 49dcb5ca93a8ee85c154c0420881550a5e8cab99 Mon Sep 17 00:00:00 2001 From: cuicheng01 Date: Tue, 7 Dec 2021 02:31:53 +0000 Subject: [PATCH] update some en docs --- .../ImageNet_models_en.md | 0 docs/en/models/{DLA.md => DLA_en.md} | 10 ++- docs/en/models/DPN_DenseNet_en.md | 24 ++++--- docs/en/models/ESNet_en.md | 23 ++++++ .../EfficientNet_and_ResNeXt101_wsl_en.md | 21 ++++-- docs/en/models/HRNet_en.md | 22 ++++-- docs/en/models/{HarDNet.md => HarDNet_en.md} | 11 ++- docs/en/models/Inception_en.md | 22 ++++-- docs/en/models/LeViT_en.md | 11 ++- docs/en/models/MixNet_en.md | 11 ++- docs/en/models/Mobile_en.md | 21 ++++-- docs/en/models/Others_en.md | 21 ++++-- docs/en/models/PP-LCNet_en.md | 70 +++++++++++++------ docs/en/models/ReXNet_en.md | 7 ++ docs/en/models/{RedNet.md => RedNet_en.md} | 12 +++- docs/en/models/RepVGG_en.md | 11 ++- docs/en/models/ResNeSt_RegNet_en.md | 18 +++-- docs/en/models/ResNet_and_vd_en.md | 22 ++++-- docs/en/models/SEResNext_and_Res2Net_en.md | 22 ++++-- docs/en/models/SwinTransformer_en.md | 12 +++- docs/en/models/{TNT.md => TNT_en.md} | 12 +++- docs/en/models/{Twins.md => Twins_en.md} | 11 ++- docs/en/models/ViT_and_DeiT_en.md | 12 +++- .../config_description_en.md | 48 +++++++++++-- .../en/{ => others}/competition_support_en.md | 0 docs/en/{ => others}/update_history_en.md | 0 .../models_training/config_description.md | 1 + 27 files changed, 347 insertions(+), 108 deletions(-) rename docs/en/{ => algorithm_introduction}/ImageNet_models_en.md (100%) rename docs/en/models/{DLA.md => DLA_en.md} (92%) create mode 100644 docs/en/models/ESNet_en.md rename docs/en/models/{HarDNet.md => HarDNet_en.md} (86%) rename docs/en/models/{RedNet.md => RedNet_en.md} (83%) rename docs/en/models/{TNT.md => TNT_en.md} (85%) rename docs/en/models/{Twins.md => Twins_en.md} (88%) rename docs/en/{tutorials => models_training}/config_description_en.md (94%) rename docs/en/{ => others}/competition_support_en.md (100%) rename docs/en/{ => 
others}/update_history_en.md (100%) diff --git a/docs/en/ImageNet_models_en.md b/docs/en/algorithm_introduction/ImageNet_models_en.md similarity index 100% rename from docs/en/ImageNet_models_en.md rename to docs/en/algorithm_introduction/ImageNet_models_en.md diff --git a/docs/en/models/DLA.md b/docs/en/models/DLA_en.md similarity index 92% rename from docs/en/models/DLA.md rename to docs/en/models/DLA_en.md index 176d6d1a..fc5d75c0 100644 --- a/docs/en/models/DLA.md +++ b/docs/en/models/DLA_en.md @@ -1,11 +1,17 @@ # DLA series +--- +## Catalogue +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + ## Overview DLA (Deep Layer Aggregation). Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have been "shallow" themselves, and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484) - -## Accuracy, FLOPS and Parameters + +## 2. 
Accuracy, FLOPS and Parameters | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | |:-----------------:|:----------:|:---------:|:---------:|:---------:| diff --git a/docs/en/models/DPN_DenseNet_en.md b/docs/en/models/DPN_DenseNet_en.md index 3e6aac76..7447d7a1 100644 --- a/docs/en/models/DPN_DenseNet_en.md +++ b/docs/en/models/DPN_DenseNet_en.md @@ -1,6 +1,14 @@ # DPN and DenseNet series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPs and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview DenseNet is a new network structure proposed in 2017 and was the best paper of CVPR. The network has designed a new cross-layer connected block called dense-block. Compared to the bottleneck in ResNet, dense-block has designed a more aggressive dense connection module, that is, connecting all the layers to each other, and each layer will accept all the layers in front of it as its additional input. DenseNet stacks all dense-blocks into a densely connected network. The dense connection makes DenseNet easier to backpropagate, making the network easier to train and converge. The full name of DPN is Dual Path Networks, which is a network composed of DenseNet and ResNeXt, which proves that DenseNet can extract new features from the previous level, and ResNeXt essentially reuses the extracted features . The author further analyzes and finds that ResNeXt has high reuse rate for features, but low redundancy, while DenseNet can create new features, but with high redundancy. Combining the advantages of the two structures, the author designed the DPN network. In the end, the DPN network achieved better results than ResNeXt and DenseNet under the same FLOPS and parameters. @@ -18,10 +26,10 @@ The pretrained models of these two types of models (a total of 10) are open sour For DPN series networks, the larger the model's FLOPs and parameters, the higher the model's accuracy. 
Among them, since the width of DPN107 is the largest, it has the largest number of parameters and FLOPs in this series of networks. + +## 2. Accuracy, FLOPs and Parameters -## Accuracy, FLOPS and Parameters - -| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | +| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| | DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 | | DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 | @@ -36,8 +44,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------|-----------|-------------------|--------------------------| @@ -53,8 +61,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher | DPN131 | 224 | 256 | 28.083 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/ESNet_en.md b/docs/en/models/ESNet_en.md new file mode 100644 index 00000000..77219229 --- /dev/null +++ b/docs/en/models/ESNet_en.md @@ -0,0 +1,23 @@ +# ESNet Series +--- +## Catalogue + +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview + +ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Building on ShuffleNetV2, it combines the advantages of MobileNetV3, GhostNet, and PP-LCNet to form a faster and more accurate network on ARM devices. Because of its excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), launched in PaddleDetection, uses this model as its backbone. Combined with a stronger object detection algorithm, its final mAP set a new state of the art for object detection models on ARM devices. + + +## 2. Accuracy, FLOPS and Parameters + +| Models | Top1 | Top5 | FLOPs
(M) | Params
(M) | +|:--:|:--:|:--:|:--:|:--:| +| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 | +| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 | +| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 | +| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 | + +Please stay tuned for information such as Inference speed. diff --git a/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md b/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md index 07dff3da..6b25f69d 100644 --- a/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md +++ b/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md @@ -1,6 +1,14 @@ # EfficientNet and ResNeXt101_wsl series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview EfficientNet is a lightweight NAS-based network released by Google in 2019. EfficientNetB7 refreshed the classification accuracy of ImageNet-1k at that time. In this paper, the author points out that the traditional methods to improve the performance of neural networks mainly start with the width of the network, the depth of the network, and the resolution of the input picture. However, the author found that balancing these three dimensions is essential for improving accuracy and efficiency through experiments. @@ -21,7 +29,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models At present, there are a total of 14 pretrained models of the two types of models that PaddleClas open source. It can be seen from the above figure that the advantages of the EfficientNet series network are very obvious. The ResNeXt101_wsl series model uses more data, and the final accuracy is also higher. EfficientNet_B0_small removes SE_block based on EfficientNet_B0, which has faster inference speed. -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -40,8 +49,8 @@ At present, there are a total of 14 pretrained models of the two types of models | EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 | | EfficientNetB0_
small | 0.758 | 0.926 | | | 0.720 | 4.650 | - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------------------------|-----------|-------------------|--------------------------| @@ -61,8 +70,8 @@ At present, there are a total of 14 pretrained models of the two types of models | EfficientNetB0_
small | 224 | 256 | 1.692 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/HRNet_en.md b/docs/en/models/HRNet_en.md index 971aa677..847f849a 100644 --- a/docs/en/models/HRNet_en.md +++ b/docs/en/models/HRNet_en.md @@ -1,6 +1,14 @@ # HRNet series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview HRNet is a brand new neural network proposed by Microsoft research Asia in 2019. Different from the previous convolutional neural network, this network can still maintain high resolution in the deep layer of the network, so the heat map of the key points predicted is more accurate, and it is also more accurate in space. In addition, the network performs particularly well in other visual tasks sensitive to resolution, such as detection and segmentation. @@ -16,8 +24,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models At present, there are 7 pretrained models of such models open-sourced by PaddleClas, and their indicators are shown in the figure. Among them, the reason why the accuracy of the HRNet_W48_C indicator is abnormal may be due to fluctuations in training. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -32,8 +40,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC | HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 | | SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 | - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------|-----------|-------------------|--------------------------| @@ -49,8 +57,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/HarDNet.md b/docs/en/models/HarDNet_en.md similarity index 86% rename from docs/en/models/HarDNet.md rename to docs/en/models/HarDNet_en.md index 4201cdba..ba1c2e5e 100644 --- a/docs/en/models/HarDNet.md +++ b/docs/en/models/HarDNet_en.md @@ -1,10 +1,17 @@ # HarDNet series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview HarDNet (Harmonic DenseNet) is a brand new neural network proposed by National Tsing Hua University in 2019, which aims to achieve high efficiency in terms of both low MACs and low memory traffic. The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. We use tools including Nvidia profiler and ARM Scale-Sim to measure the memory traffic and verify that the inference latency is indeed proportional to the memory traffic consumption and that the proposed network consumes low memory traffic. [Paper](https://arxiv.org/abs/1909.00948). -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | |:---------------------:|:----------:|:---------:|:---------:|:---------:| diff --git a/docs/en/models/Inception_en.md b/docs/en/models/Inception_en.md index 1291f992..9b312392 100644 --- a/docs/en/models/Inception_en.md +++ b/docs/en/models/Inception_en.md @@ -1,6 +1,14 @@ # Inception series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1.
Overview GoogLeNet is a new neural network structure designed by Google in 2014, which, together with the VGG network, became the twin champions of the ImageNet challenge that year. GoogLeNet introduces the Inception structure for the first time, and stacks the Inception structure in the network so that the number of network layers reaches 22, the first time a convolutional network exceeded 20 layers. Since 1x1 convolution is used in the Inception structure to reduce the number of channels, and global pooling replaces the traditional practice of processing features with multiple FC layers, the final GoogLeNet network has far fewer FLOPS and parameters than the VGG network, which made it a highlight of neural network design at that time. @@ -22,8 +30,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models The figure above reflects the relationship between the accuracy of Xception series and InceptionV4 and other indicators. Among them, Xception_deeplab is consistent with the structure of the paper, and Xception is an improved model developed by PaddleClas, which improves the accuracy by about 0.6% when the inference speed is basically unchanged. Details of the improved model are being updated, so stay tuned. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -37,8 +45,8 @@ The figure above reflects the relationship between the accuracy of Xception seri | InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 | - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |------------------------|-----------|-------------------|--------------------------| @@ -51,8 +59,8 @@ The figure above reflects the relationship between the accuracy of Xception seri | InceptionV4 | 299 | 320 | 11.141 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/LeViT_en.md b/docs/en/models/LeViT_en.md index 7fd953ac..4d7e5dbb 100644 --- a/docs/en/models/LeViT_en.md +++ b/docs/en/models/LeViT_en.md @@ -1,9 +1,16 @@ # LeViT series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview LeViT is a fast inference hybrid neural network for image classification tasks. Its design considers the performance of the network model on different hardware platforms, so it can better reflect the real scenarios of common applications. Through a large number of experiments, the author found a better way to combine the convolutional neural network and the Transformer system, and proposed an attention-based method to integrate the position information encoding in the Transformer. [Paper](https://arxiv.org/abs/2104.01136)。 -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(M) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| diff --git a/docs/en/models/MixNet_en.md b/docs/en/models/MixNet_en.md index 0734e843..a0faa3dc 100644 --- a/docs/en/models/MixNet_en.md +++ b/docs/en/models/MixNet_en.md @@ -1,6 +1,12 @@ # MixNet series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview MixNet is a lightweight network proposed by Google. The main idea of MixNet is to explore the combination of different size of kernels. The author found that the current network has the following two problems: @@ -9,7 +15,8 @@ MixNet is a lightweight network proposed by Google. The main idea of MixNet is t In order to solve the above two problems, MDConv(mixed depthwise convolution) is proposed. In this method, different size of kernels are mixed in a convolution operation block. And based on AutoML, a series of networks called MixNets are proposed, which have achieved good results on Imagenet. [paper](https://arxiv.org/pdf/1907.09595.pdf) -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | FLOPS
(M) | Params
(M) | | :------: | :---: | :---: | :---------------: | :----------: | ------------- | diff --git a/docs/en/models/Mobile_en.md b/docs/en/models/Mobile_en.md index 6bd7c94c..543bfb9e 100644 --- a/docs/en/models/Mobile_en.md +++ b/docs/en/models/Mobile_en.md @@ -1,6 +1,14 @@ # Mobile and Embedded Vision Applications Network series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed and storage size based on SD855](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview MobileNetV1 is a network launched by Google in 2017 for use on mobile devices or embedded devices. The network replaces the traditional convolution operation with depthwise separable convolution, that is, the combination of depthwise convolution and pointwise convolution. Compared with the traditional convolution operation, this combination greatly reduces the number of parameters and the amount of computation. At the same time, MobileNetV1 can also be used for object detection, image segmentation and other visual tasks. @@ -22,7 +30,8 @@ GhosttNet is a brand-new lightweight network structure proposed by Huawei in 202 Currently there are 32 pretrained models of the mobile series open-sourced by PaddleClas, and their indicators are shown in the figure below. As you can see from the picture, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight neural network architecture. In MobileNetV3, the author used 1x1 convolution after global-avg-pooling in order to obtain higher accuracy. This operation significantly increases the number of parameters but has little impact on the amount of computation, so if the model is evaluated from a storage perspective, MobileNetV3 does not have much advantage, but because of its smaller computation, it has a faster inference speed.
In addition, the SSLD distillation model in our model library performs excellently, refreshing the accuracy of the current lightweight model from various perspectives. Due to the complex structure and many branches of the MobileNetV3 model, which is not GPU friendly, the GPU inference speed is not as good as that of MobileNetV1. -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -64,8 +73,8 @@ Currently there are 32 pretrained models of the mobile series open source by Pad | GhostNet_x1_3 | 0.757 | 0.925 | 0.757 | 0.927 | 0.440 | 7.300 | | GhostNet_x1_3_ssld | 0.794 | 0.945 | 0.757 | 0.927 | 0.440 | 7.300 | - -## Inference speed and storage size based on SD855 + +## 3. Inference speed and storage size based on SD855 | Models | Batch Size=1(ms) | Storage Size(M) | |:--:|:--:|:--:| @@ -107,8 +116,8 @@ Currently there are 32 pretrained models of the mobile series open source by Pad | GhostNet_x1_3 | 19.982 | 29.000 | | GhostNet_x1_3_ssld | 19.982 | 29.000 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------| diff --git a/docs/en/models/Others_en.md b/docs/en/models/Others_en.md index 4511ddb4..e8101b5e 100644 --- a/docs/en/models/Others_en.md +++ b/docs/en/models/Others_en.md @@ -1,6 +1,14 @@ # Other networks +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed and storage size based on SD855](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview In 2012, AlexNet network proposed by Alex et al. won the ImageNet competition by far surpassing the second place, and the convolutional neural network and even deep learning attracted wide attention. AlexNet used relu as the activation function of CNN to solve the gradient dispersion problem of sigmoid when the network is deep. During the training, Dropout was used to randomly lose a part of the neurons, avoiding the overfitting of the model. In the network, overlapping maximum pooling is used to replace the average pooling commonly used in CNN, which avoids the fuzzy effect of average pooling and improves the feature richness. In a sense, AlexNet has exploded the research and application of neural networks. @@ -11,8 +19,8 @@ VGG is a convolutional neural network developed by researchers at Oxford Univers DarkNet53 is designed for object detection by YOLO author in the paper. The network is basically composed of 1x1 and 3x3 kernel, with a total of 53 layers, named DarkNet53. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -26,8 +34,8 @@ DarkNet53 is designed for object detection by YOLO author in the paper. The netw | DarkNet53 | 0.780 | 0.941 | 0.772 | 0.938 | 18.580 | 41.600 | - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | @@ -41,7 +49,8 @@ DarkNet53 is designed for object detection by YOLO author in the paper. The netw | VGG19 | 224 | 256 | 3.076 | | DarkNet53 | 256 | 256 | 3.139 | -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/PP-LCNet_en.md b/docs/en/models/PP-LCNet_en.md index 57e34151..bc1e0759 100644 --- a/docs/en/models/PP-LCNet_en.md +++ b/docs/en/models/PP-LCNet_en.md @@ -1,26 +1,49 @@ # PP-LCNet Series +--- -## Abstract + +## Catalogue + +- [1. Abstract](#1) +- [2. Introduction](#2) +- [3. Method](#3) + - [3.1 Better Activation Function](#3.1) + - [3.2 SE Modules at Appropriate Positions](#3.2) + - [3.3 Larger Convolution Kernels](#3.3) + - [3.4 Larger Dimensional 1 × 1 Conv Layer after GAP](#3.4) +- [4. Experiments](#4) + - [4.1 Image Classification](#4.1) + - [4.2 Object Detection](#4.2) + - [4.3 Semantic Segmentation](#4.3) +- [5. Conclusion](#5) +- [6. Reference](#6) + + +## 1. Abstract In the field of computer vision, the quality of backbone network determines the outcome of the whole vision task. In previous studies, researchers generally focus on the optimization of FLOPs or Params, but inference speed actually serves as an importance indicator of model quality in real-world scenarios. Nevertheless, it is difficult to balance inference speed and accuracy. In view of various CPU-based applications in industry, we are now working to raise the adaptability of the backbone network to Intel CPU, so as to obtain a faster and more accurate lightweight backbone network. At the same time, the performance of downstream vision tasks such as object detection and semantic segmentation are also improved. -## Introduction + +## 2. Introduction Recent years witnessed the emergence of many lightweight backbone networks. In past two years, in particular, there were abundant networks searched by NAS that either enjoy advantages on FLOPs or Params, or have an edge in terms of inference speed on ARM devices. 
However, few of them are dedicated to optimization for Intel CPUs, resulting in suboptimal inference speed on the Intel CPU side. Based on this, we specially design the backbone network PP-LCNet for Intel CPU devices with its acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone network can further improve the performance of the model without increasing the inference time, significantly outperforming the existing SOTA models. A comparison chart with other models is shown below.
-## Method + +## 3. Method The overall structure of the network is shown in the figure below.
Building on extensive experiments, we found that many seemingly less time-consuming operations will increase the latency on Intel CPU-based devices, especially when the MKLDNN acceleration library is enabled. Therefore, we finally chose a block with the leanest possible structure and the fastest possible speed to form our BaseNet (similar to MobileNetV1). Based on BaseNet, we summarized four strategies that can improve the accuracy of the model without increasing the latency, and we combined these four strategies to form PP-LCNet. Each of these four strategies is introduced below: -### Better Activation Function + +### 3.1 Better Activation Function Since convolutional neural networks adopted the ReLU activation function, network performance has improved substantially, and variants of the ReLU activation function have appeared in recent years, such as Leaky-ReLU, P-ReLU, ELU, etc. In 2017, Google Brain found the Swish activation function through automated search, and it performs well on lightweight networks. In 2019, the authors of MobileNetV3 further optimized this activation function to H-Swish, which removes the exponential operation, leading to faster speed and an almost unaffected network accuracy. After many experiments, we also recognized its excellent performance on lightweight networks. Therefore, this activation function is adopted in PP-LCNet. -### SE Modules at Appropriate Positions + +### 3.2 SE Modules at Appropriate Positions The SE module is a channel attention mechanism proposed by SENet, which can effectively improve the accuracy of the model. However, on the Intel CPU side, the module also introduces a large latency, leaving us the task of balancing accuracy and speed. The search for the location of the SE module in NAS-based networks such as MobileNetV3 brings no general conclusions, but we found through our experiments that the closer the SE module is to the tail of the network, the greater the improvement in model accuracy.
The following table also shows some of our experimental results: @@ -33,12 +56,13 @@ The SE module is a channel attention mechanism proposed by SENet, which can effe The option in the third row of the table was chosen for the location of the SE module in PP-LCNet. -### Larger Convolution Kernels + +### 3.3 Larger Convolution Kernels In the paper of MixNet, the author analyzes the effect of convolutional kernel size on model performance and concludes that larger convolutional kernels within a certain range can improve the performance of the model, but beyond this range will be detrimental to the model’s performance. So the author forms MixConv with split-concat paradigm combined, which can improve the performance of the model but is not conducive to inference. We experimentally summarize the role of some larger convolutional kernels at different positions that are similar to those of the SE module, and find that larger convolutional kernels display more prominent roles in the middle and tail of the network. The following table shows the effect of the position of the 5x5 convolutional kernels on the accuracy: -| SE Location | Top-1 Acc(\%) | Latency(ms) | -|-------------------|---------------|-------------| +| Larger Convolution Location | Top-1 Acc(\%) | Latency(ms) | +|----------------------------|---------------|-------------| | 1111111111111 | 63.22 | 2.08 | | 1111111000000 | 62.70 | 2.07 | | 0000001111111 | 63.14 | 2.05 | @@ -46,7 +70,8 @@ In the paper of MixNet, the author analyzes the effect of convolutional kernel s Experiments show that a larger convolutional kernel placed at the middle and tail of the network can achieve the same accuracy as placed at all positions, coupled with faster inference. The option in the third row of the table was the final choice of PP-LCNet. 
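The position strings in the 5x5 kernel table above can be read as per-block masks. The following is a minimal Python sketch of that reading, not the PaddleClas implementation. It assumes that each digit corresponds to one block of the network, that `1` selects the larger 5x5 kernel, and that `0` keeps a 3x3 kernel (the 3x3 default is an assumption made for illustration). `kernel_sizes_from_mask` is a hypothetical helper name.

```python
def kernel_sizes_from_mask(mask, large=5, small=3):
    """Map a position string such as "0000001111111" to per-block kernel sizes.

    Assumption: '1' marks a block that uses the larger kernel and '0' marks a
    block that keeps the smaller one. The 3x3 default is illustrative only.
    """
    if any(ch not in "01" for ch in mask):
        raise ValueError("mask must contain only '0' and '1'")
    return [large if ch == "1" else small for ch in mask]


# Third row of the table: larger kernels only in the middle and tail blocks.
sizes = kernel_sizes_from_mask("0000001111111")
print(sizes)
```

Keeping the mask as a plain string mirrors the table notation, so each experimental row can be passed to the helper unchanged.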
-### Larger Dimensional 1 × 1 Conv Layer after GAP + +### 3.4 Larger Dimensional 1 × 1 Conv Layer after GAP Since the introduction of GoogLeNet, GAP (Global-Average-Pooling) is often directly followed by a classification layer, which fails to result in further integration and processing of features extracted after GAP in the lightweight network. If a larger 1x1 convolutional layer (equivalent to the FC layer) is used after GAP, the extracted features, instead of directly passing through the classification layer, will first be integrated, and then classified. This can greatly improve the accuracy rate without affecting the inference speed of the model. The above four improvements were made to BaseNet to obtain PP-LCNet. The following table further illustrates the impact of each scheme on the results: @@ -58,10 +83,11 @@ Since the introduction of GoogLeNet, GAP (Global-Average-Pooling) is often direc | 1 | 1 | 1 | 0 | 59.91 | 1.85 | | 1 | 1 | 1 | 1 | 63.14 | 2.05 | + +## 4. Experiments -## Experiments - -### Image Classification + +### 4.1 Image Classification For image classification, ImageNet dataset is adopted. Compared with the current mainstream lightweight network, PP-LCNet can obtain faster inference speed with the same accuracy. When using Baidu’s self-developed SSLD distillation strategy, the accuracy is further improved, with the Top-1 Acc of ImageNet exceeding 80% at an inference speed of about 5ms on the Intel CPU side. @@ -75,9 +101,9 @@ For image classification, ImageNet dataset is adopted. 
Compared with the current | PP-LCNet-1.5x | 4.5 | 342 | 73.71 | 91.53 | 3.19 | | PP-LCNet-2x | 6.5 | 590 | 75.18 | 92.27 | 4.27 | | PP-LCNet-2.5x | 9.0 | 906 | 76.60 | 93.00 | 5.39 | -| PP-LCNet-0.25x\* | 1.9 | 47 | 66.10 | 86.46 | 2.05 | -| PP-LCNet-0.25x\* | 3.0 | 161 | 74.39 | 92.09 | 2.46 | -| PP-LCNet-0.25x\* | 9.0 | 906 | 80.82 | 95.33 | 5.39 | +| PP-LCNet-0.5x\* | 1.9 | 47 | 66.10 | 86.46 | 2.05 | +| PP-LCNet-1.0x\* | 3.0 | 161 | 74.39 | 92.09 | 2.46 | +| PP-LCNet-2.5x\* | 9.0 | 906 | 80.82 | 95.33 | 5.39 | \* denotes the model after using SSLD distillation. @@ -98,8 +124,8 @@ Performance comparison with other lightweight networks: | MobileNetV3-small-1.25x | 3.6 | 100 | 70.67 | 89.51 | 3.95 | | PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 | - -### Object Detection + +### 4.2 Object Detection For object detection, we adopt Baidu’s self-developed PicoDet, which focuses on lightweight object detection scenarios. The following table shows the comparison between the results of PP-LCNet and MobileNetV3 on the COCO dataset. PP-LCNet has an obvious advantage in both accuracy and speed. @@ -110,8 +136,8 @@ MobileNetV3-large-0.35x | 19.2 | 8.1 | MobileNetV3-large-0.75x | 25.8 | 11.1 | PP-LCNet-1x | 26.9 | 7.9 | - -### Semantic Segmentation + +### 4.3 Semantic Segmentation For semantic segmentation, DeeplabV3+ is adopted. The following table presents the comparison between PP-LCNet and MobileNetV3 on the Cityscapes dataset, and PP-LCNet also stands out in terms of accuracy and speed. @@ -122,11 +148,13 @@ MobileNetV3-large-0.5x | 55.42 | 135 | MobileNetV3-large-0.75x | 64.53 | 151 | PP-LCNet-1x | 66.03 | 96 | -## Conclusion + +## 5. Conclusion Rather than holding on to perfect FLOPs and Params as academics do, PP-LCNet focuses on analyzing how to add Intel CPU-friendly modules to improve the performance of the model, which can better balance accuracy and inference time. 
The experimental conclusions therein are available to other researchers in network structure design, while providing NAS search researchers with a smaller search space and general conclusions. The finished PP-LCNet can also be better accepted and applied in industry. -## Reference + +## 6. Reference Reference to cite when you use PP-LCNet in a paper: ``` diff --git a/docs/en/models/ReXNet_en.md b/docs/en/models/ReXNet_en.md index df9f2ed4..ab9cd01d 100644 --- a/docs/en/models/ReXNet_en.md +++ b/docs/en/models/ReXNet_en.md @@ -1,9 +1,16 @@ # ReXNet series +--- +## Catalogue +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + ## Overview ReXNet was proposed by NAVER AI Lab and is based on new network design principles. To address the representational bottleneck in existing networks, a set of design principles is proposed. The authors believe that conventional designs produce representational bottlenecks, which affect model performance. To investigate the representational bottleneck, the authors study the matrix rank of the features generated by ten thousand random networks. In addition, the channel configuration across all layers is studied to design more accurate network architectures. In the end, the authors propose a set of simple and effective design principles to mitigate the representational bottleneck. [paper](https://arxiv.org/pdf/2007.00992.pdf) + ## Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | FLOPS
(G) | Params
(M) | diff --git a/docs/en/models/RedNet.md b/docs/en/models/RedNet_en.md similarity index 83% rename from docs/en/models/RedNet.md rename to docs/en/models/RedNet_en.md index b93607f2..b9b5c8ed 100644 --- a/docs/en/models/RedNet.md +++ b/docs/en/models/RedNet_en.md @@ -1,11 +1,17 @@ # RedNet series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) -In the backbone of ResNet and in all bottleneck positions of backbone, the convolution is replaced by Involution, but all convolutions are reserved for channel mapping and fusion. These carefully redesigned entities combine to form a new efficient backbone network, called Rednet. [paper](https://arxiv.org/abs/2103.06255). + +## 1. Overview +In the backbone of ResNet and in all bottleneck positions of backbone, the convolution is replaced by Involution, but all convolutions are reserved for channel mapping and fusion. These carefully redesigned entities combine to form a new efficient backbone network, called Rednet. [paper](https://arxiv.org/abs/2103.06255). -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | |:---------------------:|:----------:|:---------:|:---------:|:---------:| diff --git a/docs/en/models/RepVGG_en.md b/docs/en/models/RepVGG_en.md index f2171a8f..0a028706 100644 --- a/docs/en/models/RepVGG_en.md +++ b/docs/en/models/RepVGG_en.md @@ -1,10 +1,17 @@ # RepVGG series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview RepVGG (Making VGG-style ConvNets Great Again) series model is a simple but powerful convolutional neural network architecture proposed by Tsinghua University (Guiguang Ding's team), MEGVII Technology (Jian Sun et al.), HKUST and Aberystwyth University in 2021. The architecture has an inference time agent similar to VGG. 
The main body is composed of a stack of 3x3 convolutions and ReLU, while the training-time model has a multi-branch topology. The decoupling of the training-time and inference-time architectures is realized by re-parameterization, so the model is called RepVGG. [paper](https://arxiv.org/abs/2101.03697). -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1| FLOPS
(G) | |:--:|:--:|:--:|:--:|:--:| diff --git a/docs/en/models/ResNeSt_RegNet_en.md b/docs/en/models/ResNeSt_RegNet_en.md index a2203ad9..748d075b 100644 --- a/docs/en/models/ResNeSt_RegNet_en.md +++ b/docs/en/models/ResNeSt_RegNet_en.md @@ -1,10 +1,20 @@ -## Overview +# ResNeSt and RegNet series +--- +## Catalogue + +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on T4 GPU](#3) + + +## 1. Overview The ResNeSt series was proposed in 2020. The original resnet network structure has been improved by introducing K groups and adding an attention module similar to SEBlock in different groups, the accuracy is greater than that of the basic model ResNet, but the parameter amount and flops are almost the same as the basic ResNet. RegNet was proposed in 2020 by Facebook to deepen the concept of design space. Based on AnyNetX, the model performance is gradually improved by shared bottleneck ratio, shared group width, adjusting network depth or width and other strategies. What's more, the design space structure is simplified, whose interpretability is also be improved. The quality of design space is improved while its diversity is maintained. Under similar conditions, the performance of the designed RegNet model performs better than EfficientNet and 5 times faster than EfficientNet. -## Accuracy, FLOPs and Parameters + +## 2. Accuracy, FLOPs and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -12,8 +22,8 @@ RegNet was proposed in 2020 by Facebook to deepen the concept of design space. B | ResNeSt50 | 0.8083 | 0.9542| 0.8113 | -| 10.78 | 27.5 | | RegNetX_4GF | 0.7850 | 0.9416| 0.7860 | -| 8.0 | 22.1 | - -## Inference speed based on T4 GPU + +## 3. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/ResNet_and_vd_en.md b/docs/en/models/ResNet_and_vd_en.md index 5a947081..3ffeb292 100644 --- a/docs/en/models/ResNet_and_vd_en.md +++ b/docs/en/models/ResNet_and_vd_en.md @@ -1,6 +1,14 @@ # ResNet and ResNet_vd series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview The ResNet series model was proposed in 2015 and won the championship in the ILSVRC2015 competition with a top5 error rate of 3.57%. The network innovatively proposed the residual structure, and built the ResNet network by stacking multiple residual structures. Experiments show that using residual blocks can improve the convergence speed and accuracy effectively. @@ -23,8 +31,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models As can be seen from the above curves, the higher the number of layers, the higher the accuracy, but the corresponding number of parameters, calculation and latency will increase. ResNet50_vd_ssld further improves the accuracy of top-1 of the ImageNet-1k validation set by using stronger teachers and more data, reaching 82.39%, refreshing the accuracy of ResNet50 series models. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -49,8 +57,8 @@ As can be seen from the above curves, the higher the number of layers, the highe * Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment in training process on the basis of `ResNet50_vd_ssld` training strategy.`Fix_ResNet50_vd_ssld_v2` stopped all parameter updates of `ResNet50_vd_ssld_v2` except the FC layer,and fine-tuned on ImageNet1k dataset, the resolution is 320x320. - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |------------------|-----------|-------------------|--------------------------| @@ -71,8 +79,8 @@ As can be seen from the above curves, the higher the number of layers, the highe | ResNet50_vd_ssld | 224 | 256 | 3.165 | | ResNet101_vd_ssld | 224 | 256 | 5.252 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/SEResNext_and_Res2Net_en.md b/docs/en/models/SEResNext_and_Res2Net_en.md index 4ccbce59..e574fd97 100644 --- a/docs/en/models/SEResNext_and_Res2Net_en.md +++ b/docs/en/models/SEResNext_and_Res2Net_en.md @@ -1,6 +1,14 @@ # SEResNeXt and Res2Net series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) +* [3. Inference speed based on V100 GPU](#3) +* [4. Inference speed based on T4 GPU](#4) + + +## 1. Overview ResNeXt, one of the typical variants of ResNet, was presented at the CVPR conference in 2017. Prior to this, the methods to improve the model accuracy mainly focused on deepening or widening the network, which increased the number of parameters and calculation, and slowed down the inference speed accordingly. The concept of cardinality was proposed in ResNeXt structure. The author found that increasing the number of channel groups was more effective than increasing the depth and width through experiments. It can improve the accuracy without increasing the parameter complexity and reduce the number of parameters at the same time, so it is a more successful variant of ResNet. @@ -23,8 +31,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models At present, there are a total of 24 pretrained models of the three categories open sourced by PaddleClas, and the indicators are shown in the figure. It can be seen from the diagram that under the same Flops and Params, the improved model tends to have higher accuracy, but the inference speed is often inferior to the ResNet series. On the other hand, Res2Net performed better. 
Compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net tends to achieve better accuracy at the same FLOPs, Params and inference speed. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Parameters
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| @@ -57,8 +65,8 @@ At present, there are a total of 24 pretrained models of the three categories op | SENet154_vd | 0.814 | 0.955 | | | 45.830 | 114.290 | - -## Inference speed based on V100 GPU + +## 3. Inference speed based on V100 GPU | Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-----------------------|-----------|-------------------|--------------------------| @@ -87,8 +95,8 @@ At present, there are a total of 24 pretrained models of the three categories op | SE_ResNeXt101_32x4d | 224 | 256 | 19.204 | | SENet154_vd | 224 | 256 | 50.406 | - -## Inference speed based on T4 GPU + +## 4. Inference speed based on T4 GPU | Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | |-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| diff --git a/docs/en/models/SwinTransformer_en.md b/docs/en/models/SwinTransformer_en.md index 11d45d6c..95afaaf6 100644 --- a/docs/en/models/SwinTransformer_en.md +++ b/docs/en/models/SwinTransformer_en.md @@ -1,10 +1,16 @@ # SwinTransformer +--- +## Catalogue -## Overview -Swin Transformer a new vision Transformer, that capably serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. [Paper](https://arxiv.org/abs/2103.14030)。 +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + +## 1. Overview +Swin Transformer a new vision Transformer, that capably serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. [Paper](https://arxiv.org/abs/2103.14030)。 -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| diff --git a/docs/en/models/TNT.md b/docs/en/models/TNT_en.md similarity index 85% rename from docs/en/models/TNT.md rename to docs/en/models/TNT_en.md index 7e20edab..abdcfbaa 100644 --- a/docs/en/models/TNT.md +++ b/docs/en/models/TNT_en.md @@ -1,12 +1,18 @@ # TNT series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) -TNT(Transformer-iN-Transformer) series models were proposed by Huawei-Noah in 2021 for modeling both patch-level and pixel-level representation. In each TNT block, an outer transformer block is utilized to process patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level feature is projected to the space of patch embedding by a linear transformation layer and then added into the patch. By stacking the TNT blocks, we build the TNT model for image recognition. Experiments on ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. For example, our TNT achieves 81.3% top-1 accuracy on ImageNet which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112). + +## 1. Overview +TNT(Transformer-iN-Transformer) series models were proposed by Huawei-Noah in 2021 for modeling both patch-level and pixel-level representation. In each TNT block, an outer transformer block is utilized to process patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level feature is projected to the space of patch embedding by a linear transformation layer and then added into the patch. By stacking the TNT blocks, we build the TNT model for image recognition. Experiments on ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. 
For example, our TNT achieves 81.3% top-1 accuracy on ImageNet which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112). -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | |:---------------------:|:----------:|:---------:|:---------:|:---------:| diff --git a/docs/en/models/Twins.md b/docs/en/models/Twins_en.md similarity index 88% rename from docs/en/models/Twins.md rename to docs/en/models/Twins_en.md index ccd83e44..f86f537c 100644 --- a/docs/en/models/Twins.md +++ b/docs/en/models/Twins_en.md @@ -1,9 +1,16 @@ # Twins +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview The Twins network includes Twins-PCPVT and Twins-SVT, which focuses on the meticulous design of the spatial attention mechanism, resulting in a simple but more effective solution. Since the architecture only involves matrix multiplication, and the current deep learning framework has a high degree of optimization for matrix multiplication, the architecture is very efficient and easy to implement. Moreover, this architecture can achieve excellent performance in a variety of downstream vision tasks such as image classification, target detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840). -## Accuracy, FLOPs and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| diff --git a/docs/en/models/ViT_and_DeiT_en.md b/docs/en/models/ViT_and_DeiT_en.md index ac275d9b..789ad86c 100644 --- a/docs/en/models/ViT_and_DeiT_en.md +++ b/docs/en/models/ViT_and_DeiT_en.md @@ -1,13 +1,19 @@ # ViT and DeiT series +--- +## Catalogue -## Overview +* [1. Overview](#1) +* [2. Accuracy, FLOPS and Parameters](#2) + + +## 1. Overview ViT(Vision Transformer) series models were proposed by Google in 2020. These models only use the standard transformer structure, completely abandon the convolution structure, splits the image into multiple patches and then inputs them into the transformer, showing the potential of transformer in the CV field.。[Paper](https://arxiv.org/abs/2010.11929)。 DeiT(Data-efficient Image Transformers) series models were proposed by Facebook at the end of 2020. Aiming at the problem that the ViT models need large-scale dataset training, the DeiT improved them, and finally achieved 83.1% Top1 accuracy on ImageNet. More importantly, using convolution model as teacher model, and performing knowledge distillation on these models, the Top1 accuracy of 85.2% can be achieved on the ImageNet dataset. - -## Accuracy, FLOPS and Parameters + +## 2. Accuracy, FLOPS and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| diff --git a/docs/en/tutorials/config_description_en.md b/docs/en/models_training/config_description_en.md similarity index 94% rename from docs/en/tutorials/config_description_en.md rename to docs/en/models_training/config_description_en.md index d510df75..d0025476 100644 --- a/docs/en/tutorials/config_description_en.md +++ b/docs/en/models_training/config_description_en.md @@ -8,11 +8,35 @@ The parameters in the PaddleClas configuration file(`ppcls/configs/*.yaml`)are d ## Details +### Catalogue + +- [1. Classification model](#1) + - [1.1 Global Configuration](#1.1) + - [1.2 Architecture](#1.2) + - [1.3 Loss function](#1.3) + - [1.4 Optimizer](#1.4) + - [1.5 Data reading module(DataLoader)](#1.5) + - [1.5.1 dataset](#1.5.1) + - [1.5.2 sampler](#1.5.2) + - [1.5.3 loader](#1.5.3) + - [1.6 Evaluation metric](#1.6) + - [1.7 Inference](#1.7) +- [2. Distillation model](#2) + - [2.1 Architecture](#2.1) + - [2.2 Loss function](#2.2) + - [2.3 Evaluation metric](#2.3) +- [3. Recognition model](#3) + - [3.1 Architechture](#3.1) + - [3.2 Evaluation metric](#3.2) + + + ### 1. Classification model Here the configuration of `ResNet50_vd` on`ImageNet-1k`is used as an example to explain the each parameter in detail. [Configure Path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml). 
-#### 1.1Global Configuration + +#### 1.1 Global Configuration | Parameter name | Specific meaning | Defult value | Optional value | | ------------------ | ------------------------------------------------------- | ---------------- | ----------------- | @@ -31,6 +55,7 @@ Here the configuration of `ResNet50_vd` on`ImageNet-1k`is used as an example to **Note**:The http address of pre-trained model can be filled in the `pretrained_model` + #### 1.2 Architecture | Parameter name | Specific meaning | Defult value | Optional value | @@ -41,6 +66,7 @@ Here the configuration of `ResNet50_vd` on`ImageNet-1k`is used as an example to **Note**: Here pretrained can be set to True or False, so does the path of the weights. In addition, the pretrained is disabled when Global.pretrained_model is also set to the corresponding path. + #### 1.3 Loss function | Parameter name | Specific meaning | Defult value | Optional value | @@ -49,6 +75,7 @@ Here the configuration of `ResNet50_vd` on`ImageNet-1k`is used as an example to | CELoss.weight | The weight of CELoss in the whole Loss | 1.0 | float | | CELoss.epsilon | The epsilon value of label_smooth in CELoss | 0.1 | float,between 0 and 1 | + #### 1.4 Optimizer | Parameter name | Specific meaning | Defult value | Optional value | @@ -73,8 +100,10 @@ Here the configuration of `ResNet50_vd` on`ImageNet-1k`is used as an example to Referring to [learning_rate.py](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/optimizer/learning_rate.py) for adding method and parameters. 
-#### 1.5 Data reading module(DataLoader) + +#### 1.5 Data reading module(DataLoader) + ##### 1.5.1 dataset | Parameter name | Specific meaning | Defult value | Optional value | @@ -106,6 +135,7 @@ The parameter meaning of batch_transform_ops: | ------------- | -------------- | --------------------------------------- | | MixupOperator | alpha | Mixup parameter value,the larger the value, the stronger the augment | + ##### 1.5.2 sampler | Parameter name | Specific meaning | Default value | Optional value | @@ -114,7 +144,7 @@ The parameter meaning of batch_transform_ops: | batch_size | batch size | 64 | int | | drop_last | Whether to drop the last data that does reach the batch-size | False | bool | | shuffle | whether to shuffle the data | True | bool | - + ##### 1.5.3 loader | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -122,12 +152,14 @@ The parameter meaning of batch_transform_ops: | num_workers | Number of data read threads | 4 | int | | use_shared_memory | Whether to use shared memory | True | bool | + #### 1.6 Evaluation metric | Parameter name | Specific meaning | Default meaning | Optional meaning | | -------------- | ---------------- | --------------- | ---------------- | | TopkAcc | TopkAcc | [1, 5] | list, int | + #### 1.7 Inference | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -140,10 +172,12 @@ The parameter meaning of batch_transform_ops: **Note**:The interpretation of `transforms` in the Infer module refers to the interpretation of`transform_ops`in the dataset in the data reading module. -### 2.Distillation model + +### 2. Distillation model **Note**:Here the training configuration of `MobileNetV3_large_x1_0` on `ImageNet-1k` distilled MobileNetV3_small_x1_0 is used as an example to explain the meaning of each parameter in detail. [Configure path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml). 
Only parameters that are distinct from the classification model are introduced here. + #### 2.1 Architecture | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -169,6 +203,7 @@ The parameter meaning of batch_transform_ops: 2.Student's parameters are similar and will not be repeated. + #### 2.2 Loss function | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -180,6 +215,7 @@ The parameter meaning of batch_transform_ops: | DistillationGTCELos.weight | Loss weight | 1.0 | float | | DistillationCELoss.model_names | Model names with real label for cross-entropy | ["Student"] | —— | + #### 2.3 Evaluation metric | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -190,10 +226,12 @@ The parameter meaning of batch_transform_ops: **Note**: `DistillationTopkAcc` has the same meaning as `TopkAcc`, except that it is only used in distillation tasks. + ### 3. Recognition model **Note**:The training configuration of`ResNet50` on`LogoDet-3k` is used here as an example to explain the meaning of each parameter in detail. [configure path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/Logo/ResNet50_ReID.yaml). Only parameters that are distinct from the classification model are presented here. 
+ #### 3.1 Architechture | Parameter name | Specific meaning | Default meaning | Optional meaning | @@ -223,7 +261,7 @@ The parameter meaning of batch_transform_ops: - + #### 3.2 Evaluation metric | Parameter name | Specific meaning | Default meaning | Optional meaning | diff --git a/docs/en/competition_support_en.md b/docs/en/others/competition_support_en.md similarity index 100% rename from docs/en/competition_support_en.md rename to docs/en/others/competition_support_en.md diff --git a/docs/en/update_history_en.md b/docs/en/others/update_history_en.md similarity index 100% rename from docs/en/update_history_en.md rename to docs/en/others/update_history_en.md diff --git a/docs/zh_CN/models_training/config_description.md b/docs/zh_CN/models_training/config_description.md index 6c73b838..8c51d7ab 100644 --- a/docs/zh_CN/models_training/config_description.md +++ b/docs/zh_CN/models_training/config_description.md @@ -22,6 +22,7 @@ - [1.5.2 sampler](#1.5.2) - [1.5.3 loader](#1.5.3) - [1.6 评估指标(Metric)](#1.6) + - [1.7 预测](#1.7) - [2. 蒸馏模型](#2) - [2.1 结构(Arch)](#2.1) - [2.2 损失函数(Loss)](#2.2) -- GitLab