diff --git a/PaddleCV/image_classification/README.md b/PaddleCV/image_classification/README.md
index 570e4ae6b9c67a936820a30735e014fe295bd6a1..ada3b92c98bc7c841f7073462fc8455520db2359 100644
--- a/PaddleCV/image_classification/README.md
+++ b/PaddleCV/image_classification/README.md
@@ -1,24 +1,35 @@
 # Image Classification and Model Zoo
-Image classification, which is an important field of computer vision, is to classify an image into pre-defined labels. Recently, many researchers developed different kinds of neural networks and highly improve the classification performance. This page introduces how to do image classification with PaddlePaddle Fluid.
 ---
 ## Table of Contents
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training a model with flexible parameters](#training-a-model-with-flexible-parameters)
-- [Using Mixed-Precision Training](#using-mixed-precision-training)
-- [Finetuning](#finetuning)
-- [Evaluation](#evaluation)
-- [Inference](#inference)
-- [Supported models and performances](#supported-models-and-performances)
-## Installation
+- [Introduction](#introduction)
+- [Quick Start](#quick-start)
+  - [Installation](#installation)
+  - [Data preparation](#data-preparation)
+  - [Training](#training)
+  - [Finetuning](#finetuning)
+  - [Evaluation](#evaluation)
+  - [Inference](#inference)
+- [Advanced Usage](#advanced-usage)
+  - [Using Mixed-Precision Training](#using-mixed-precision-training)
+  - [CE](#ce)
+- [Supported Models and Performances](#supported-models-and-performances)
+- [FAQ](#faq)
+- [Reference](#reference)
+- [Update](#update)
+- [Contribute](#contribute)
+
+## Introduction
+
+Image classification is an important field of computer vision; its goal is to assign an image to one of a set of pre-defined labels. In recent years, researchers have developed many kinds of neural networks and greatly improved classification performance. This page introduces how to do image classification with PaddlePaddle Fluid.
+
+## Quick Start
-Running sample code in this directory requires PaddelPaddle Fluid v0.13.0 and later, the latest release version is recommended, If the PaddlePaddle on your device is lower than v0.13.0, please follow the instructions in [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) and make an update.
+### Installation
-Note: Please replace [fluid.ParallelExecutor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#parallelexecutor) to [fluid.Executor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#executor) when running the program in the windows & GPU environment.
+Running sample code in this directory requires Python 2.7 or later and PaddlePaddle Fluid v1.5 or later; the latest release version is recommended. If the PaddlePaddle on your device is lower than v1.5, please follow the instructions in the [installation document](http://paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/index_cn.html) and update it.
-## Data preparation
+### Data preparation
 An example for ImageNet classification is as follows. First of all, preparation of imagenet data can be done as:
 ```
@@ -34,24 +45,18 @@ In the shell script ```download_imagenet2012.sh```, there are three steps to pr
 **step-3:** Download training and validation label files. There are two label files which contain train and validation image labels respectively:
-* *train_list.txt*: label file of imagenet-2012 training set, with each line seperated by ```SPACE```, like:
+* train_list.txt: label file of the imagenet-2012 training set, with the fields on each line separated by ```SPACE```, like:
 ```
 train/n02483708/n02483708_2436.jpeg 369
-train/n03998194/n03998194_7015.jpeg 741
-train/n04523525/n04523525_38118.jpeg 884
-...
 ```
-* *val_list.txt*: label file of imagenet-2012 validation set, with each line seperated by ```SPACE```, like.
+* val_list.txt: label file of the imagenet-2012 validation set, with the fields on each line separated by ```SPACE```, like:
 ```
 val/ILSVRC2012_val_00000001.jpeg 65
-val/ILSVRC2012_val_00000002.jpeg 970
-val/ILSVRC2012_val_00000003.jpeg 230
-...
 ```
 You may need to modify the path in reader.py to load data correctly.
-## Training a model with flexible parameters
+### Training
 After data preparation, one can start the training step by:
@@ -69,6 +74,7 @@ python train.py \
 --lr=0.1
 ```
 **parameter introduction:**
+
 * **model**: name model to use. Default: "SE_ResNeXt50_32x4d".
 * **num_epochs**: the number of epochs. Default: 120.
 * **batch_size**: the size of each mini-batch. Default: 256.
@@ -100,65 +106,70 @@ python train.py \
 Or can start the training step by running the ```run.sh```.
-**data reader introduction:** Data reader is defined in ```reader.py```and```reader_cv2.py```, Using CV2 reader can improve the speed of reading. In [training stage](#training-a-model-with-flexible-parameters), random crop and flipping are used, while center crop is used in [Evaluation](#evaluation) and [Inference](#inference) stages. Supported data augmentation includes:
+**data reader introduction:** Data readers are defined in ```reader.py``` (PIL) and ```reader_cv2.py``` (OpenCV); the OpenCV-based reader is the default. In [Training](#training), random crop and flipping are used, while center crop is used in the [Evaluation](#evaluation) and [Inference](#inference) stages. Supported data augmentation includes:
+
 * rotation
-* color jitter
+* color jitter (not yet implemented in ```reader_cv2.py```)
 * random crop
 * center crop
 * resize
 * flipping
-## Using Mixed-Precision Training
+### Finetuning
-You may add `--fp16=1` to start train using mixed precisioin training, which the training process will use float16 and the output model ("master" parameters) is saved as float32. You also may need to pass `--scale_loss` to overcome accuracy issues, usually `--scale_loss=8.0` will do.
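The `piecewise_decay` learning-rate strategy listed among the training parameters above multiplies the learning rate by a fixed factor each time training passes a preset epoch boundary. A minimal sketch of the schedule (the boundaries `(30, 60, 90)` and factor `0.1` are illustrative defaults for ImageNet-style training, not values read from this repo):

```python
def piecewise_decay_lr(base_lr, epoch, boundaries=(30, 60, 90), factor=0.1):
    """Return the learning rate for `epoch` under a piecewise decay schedule."""
    lr = base_lr
    for boundary in boundaries:
        if epoch >= boundary:
            lr *= factor  # decay once for every boundary already passed
    return lr

# With base_lr=0.1: epochs 0-29 train at 0.1, 30-59 at 0.01, and so on.
print(piecewise_decay_lr(0.1, 45))
```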
+Finetuning adapts the weights of a pretrained model to a specific task. One can download [pretrained models](#supported-models-and-performances), set the downloaded path as ```path_to_pretrain_model```, and finetune a model by running the following command:
-Note that currently `--fp16` can not use together with `--with_mem_opt`, so pass `--with_mem_opt=0` to disable memory optimization pass.
-
-## Finetuning
-
-Finetuning is to finetune model weights in a specific task by loading pretrained weights. After initializing ```path_to_pretrain_model```, one can finetune a model as:
 ```
-python train.py
- --model=SE_ResNeXt50_32x4d \
- --pretrained_model=${path_to_pretrain_model} \
- --batch_size=32 \
- --total_images=1281167 \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --model_save_dir=output/ \
- --with_mem_opt=True \
- --lr_strategy=piecewise_decay \
- --lr=0.1
+python train.py \
+ --pretrained_model=${path_to_pretrain_model}
 ```
-## Evaluation
+Note: Add and adjust other parameters according to the specific model and task.
+
+### Evaluation
+
 Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models-and-performances) and set its path to ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
+
 ```
 python eval.py \
- --model=SE_ResNeXt50_32x4d \
- --batch_size=32 \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --with_mem_opt=True \
 --pretrained_model=${path_to_pretrain_model}
 ```
-## Inference
-Inference is used to get prediction score or image features based on trained models.
+Note: Add and adjust other parameters according to the specific model and task.
+
+### Inference
+
+Inference is used to get the prediction score or image features from a trained model. One can download [pretrained models](#supported-models-and-performances) and set the path as ```path_to_pretrain_model```. Run the following command to obtain the prediction score.
+
 ```
 python infer.py \
- --model=SE_ResNeXt50_32x4d \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --with_mem_opt=True \
 --pretrained_model=${path_to_pretrain_model}
 ```
-## Supported models and performances
-The image classification models currently supported in models are listed in the table,and the top-1/top-5 accuracy on the imagenet-2012 validation set of the models and the inference time of Paddle Fluid and Paddle TensorRT based on dynamic link library(test GPU model: Tesla P4) are given. As the activation function swish used by ShuffleNetV2 and the activation function relu6 used by MobileNetV2 are not supported by Paddle TensorRT, inference acceleration is not obvious. Paddle TensorRT will support both op soon. The inference method based on dynamic link library will be also released soon,The inference speed indicator may be updated with the official released tool. Pretrained models can be downloaded by clicking related model names.
+Note: Add and adjust other parameters according to the specific model and task.
+
+## Advanced Usage
+
+### Using Mixed-Precision Training
+
+You may add `--fp16=1` to enable mixed-precision training, in which the training process uses float16 while the output model ("master" parameters) is saved as float32. You may also need to pass `--scale_loss` to overcome accuracy issues; usually `--scale_loss=8.0` will do.
+
+Note that currently `--fp16` cannot be used together with `--with_mem_opt`, so pass `--with_mem_opt=0` to disable the memory optimization pass.
+
+### CE
+
+CE is only for internal testing; there is no need to set it.
+
+## Supported Models and Performances
+
+The image classification models currently supported by PaddlePaddle are listed in the table below, which shows their top-1/top-5 accuracy on the ImageNet-2012 validation set and the inference time of Paddle Fluid and Paddle TensorRT based on a dynamic link library (test GPU model: Tesla P4).
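The top-1/top-5 accuracy reported by `eval.py` and in the table below counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch of the metric for a single image (illustrative only, not the repo's implementation):

```python
def topk_correct(scores, label, k):
    """Return True if `label` is among the k highest-scoring classes."""
    topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return label in topk

# Toy softmax output over 5 classes: class 1 scores highest, class 3 second.
scores = [0.05, 0.60, 0.10, 0.20, 0.05]
print(topk_correct(scores, 1, 1))  # label 1 is the top-1 prediction
print(topk_correct(scores, 3, 2))  # label 3 is within the top-2
```

Averaging this indicator over the whole validation set gives the percentages shown in the accuracy column.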
+Since the activation functions ```swish``` (used in ShuffleNetV2) and ```relu6``` (used in MobileNetV2) are not supported by Paddle TensorRT, the inference acceleration for these two models is not significant. Pretrained models can be downloaded by clicking the related model names.
+
 - Note1: ResNet50_vd_v2 is the distilled version of ResNet50_vd.
-- Note2:In addition to the input image resolution 299x299 adopted by InceptionV4, the resolution used by other models is 224x224.
-- Note3: Calling dynamic link library to infer requires converting the train model to a binary model. The conversion method is as follows: a. Set the save_inference parameter in infer.py to True; b. Execute infer.py
+- Note2: InceptionV4 takes input images at a resolution of ```299x299```; all other models use ```224x224```.
+- Note3: Inference via the dynamic link library requires converting the trained model to a binary model first, which can be done by running the following command:
+
+  ```python infer.py --save_inference=True```
 |model | top-1/top-5 accuracy(CV2) | Paddle Fluid inference time(ms) | Paddle TensorRT inference time(ms) |
 |- |:-: |:-: |:-: |
@@ -185,6 +196,44 @@ The image classification models currently supported in the
 |[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar) | 78.44%/93.96% | 14.916 | 12.126 |
 |[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar) | 79.12%/94.20% | 30.085 | 24.110 |
 |[SE154_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar) | 81.40%/95.48% | 71.892 | 64.855 |
-|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.70%/89.66% | 6.528 | 3.076 |
+|[GoogLeNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.70%/89.66% | 6.528 | 3.076 |
 |[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar) | 70.03%/89.17% | 6.078 | 6.282 |
 |[InceptionV4](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar) | 80.77%/95.26% | 32.413 | 18.154 |
+
+## FAQ
+
+**Q:** How do I solve the following error when training on a 6-class dataset with the pretrained_model parameter set?
+```
+Enforce failed. Expected x_dims[1] == labels_dims[1], but received x_dims[1]:1000 != labels_dims[1]:6.
+```
+
+**A:** It is most likely caused by mismatched dimensions. Please remove the FC parameters from the pretrained model; they are usually named with the prefix ```fc_```.
+
+## Reference
+
+- AlexNet: [imagenet-classification-with-deep-convolutional-neural-networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
+- ResNet: [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385), Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+- ResNeXt: [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431), Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
+- SeResNeXt: [Squeeze-and-Excitation Networks](https://arxiv.org/pdf/1709.01507.pdf), Jie Hu, Li Shen, Samuel Albanie
+- ShuffleNetV1: [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083), Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
+- ShuffleNetV2: [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164), Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
+- MobileNetV1: [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861), Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
+- MobileNetV2: [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/pdf/1801.04381v4.pdf), Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
+- VGG: [Very Deep Convolutional Networks for Large-scale Image Recognition](https://arxiv.org/pdf/1409.1556), Karen Simonyan, Andrew Zisserman
+- GoogLeNet: [Going Deeper with Convolutions](https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf), Christian Szegedy, Wei Liu, Yangqing Jia
+- InceptionV4: [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261), Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
+
+## Update
+
+- 2018/12/03 **Stage1**: Update AlexNet, ResNet50, ResNet101, MobileNetV1
+- 2018/12/23 **Stage2**: Update VGG Series, SeResNeXt50_32x4d, SeResNeXt101_32x4d, ResNet152
+- 2019/01/31 Update MobileNetV2
+- 2019/04/01 **Stage3**: Update ResNet18, ResNet34, GoogLeNet, ShuffleNetV2
+- 2019/06/12 **Stage4**: Update ResNet50_vc, ResNet50_vd, ResNet101_vd, ResNet152_vd, ResNet200_vd, SE154_vd, InceptionV4, ResNeXt101_64x4d, ResNeXt101_vd_64x4d
+- 2019/06/22 Update ResNet50_vd_v2
+
+## Contribute
+
+If you can fix an issue or add a new feature, please submit a PR to us. If your PR is accepted, you will earn points based on its quality and difficulty (0~5); once you have accumulated 10 points, you can contact us for an interview or a recommendation letter.
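The fix suggested in the FAQ above (dropping the final FC parameters from a pretrained checkpoint so the class dimension can change) amounts to filtering parameters by name. A minimal sketch, where `drop_fc_params` is a hypothetical helper and the checkpoint is represented as a plain dict for illustration:

```python
def drop_fc_params(params, prefix="fc_"):
    """Keep every pretrained parameter except the final FC layer, whose
    shape depends on the old class count (e.g. 1000) rather than the new one."""
    return {name: value for name, value in params.items()
            if not name.startswith(prefix)}

# Hypothetical checkpoint: one backbone parameter, two FC parameters.
pretrained = {"conv1_weights": "...", "fc_0.w_0": "...", "fc_0.b_0": "..."}
print(sorted(drop_fc_params(pretrained)))
```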
diff --git a/PaddleCV/image_classification/README_cn.md b/PaddleCV/image_classification/README_cn.md
index 46da59616f8ff29fee3c0b7b4da9abd8a42554dc..0de76b32b1ec4cb5d7b611602fcb9286bc286304 100644
--- a/PaddleCV/image_classification/README_cn.md
+++ b/PaddleCV/image_classification/README_cn.md
@@ -1,24 +1,34 @@
 # 图像分类以及模型库
-图像分类是计算机视觉的重要领域,它的目标是将图像分类到预定义的标签。近期,许多研究者提出很多不同种类的神经网络,并且极大的提升了分类算法的性能。本页将介绍如何使用PaddlePaddle进行图像分类。
 ---
 ## 内容
-- [安装](#安装)
-- [数据准备](#数据准备)
-- [模型训练](#模型训练)
-- [混合精度训练](#混合精度训练)
-- [参数微调](#参数微调)
-- [模型评估](#模型评估)
-- [模型预测](#模型预测)
-- [已有模型及其性能](#已有模型及其性能)
-
-## 安装
+- [简介](#简介)
+- [快速开始](#快速开始)
+  - [安装说明](#安装说明)
+  - [数据准备](#数据准备)
+  - [模型训练](#模型训练)
+  - [参数微调](#参数微调)
+  - [模型评估](#模型评估)
+  - [模型预测](#模型预测)
+- [进阶使用](#进阶使用)
+  - [混合精度训练](#混合精度训练)
+  - [CE测试](#ce测试)
+- [已发布模型及其性能](#已发布模型及其性能)
+- [FAQ](#faq)
+- [参考文献](#参考文献)
+- [版本更新](#版本更新)
+- [如何贡献代码](#如何贡献代码)
+- [反馈](#反馈)
+
+## 简介
+图像分类是计算机视觉的重要领域,它的目标是将图像分类到预定义的标签。近期,许多研究者提出很多不同种类的神经网络,并且极大地提升了分类算法的性能。本页将介绍如何使用PaddlePaddle进行图像分类。
-在当前目录下运行样例代码需要PadddlePaddle Fluid的v0.13.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据 [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) 中的说明来更新PaddlePaddle。
+## 快速开始
-注意:由于windows不支持nccl,当使用Windows GPU环境时候,需要将示例代码中的[fluid.ParallelExecutor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#parallelexecutor)替换为[fluid.Executor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#executor)。
+### 安装说明
+在当前目录下运行样例代码需要Python 2.7及以上版本,以及PaddlePaddle Fluid v1.5或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据 [installation document](http://paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/index_cn.html) 中的说明来更新PaddlePaddle。
-## 数据准备
+### 数据准备
 下面给出了ImageNet分类任务的样例,首先,通过如下的方式进行数据的准备:
 ```
@@ -33,23 +43,17 @@ sh download_imagenet2012.sh
 **步骤三:** 下载训练与验证集合对应的标签文件。下面两个文件分别包含了训练集合与验证集合中图像的标签:
-* *train_list.txt*: ImageNet-2012训练集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如:
+* train_list.txt: ImageNet-2012训练集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如:
 ```
 train/n02483708/n02483708_2436.jpeg 369
-train/n03998194/n03998194_7015.jpeg 741
-train/n04523525/n04523525_38118.jpeg 884
-...
 ```
-* *val_list.txt*: ImageNet-2012验证集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如:
+* val_list.txt: ImageNet-2012验证集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如:
 ```
 val/ILSVRC2012_val_00000001.jpeg 65
-val/ILSVRC2012_val_00000002.jpeg 970
-val/ILSVRC2012_val_00000003.jpeg 230
-...
 ```
-注意:需要根据本地环境调整reader.py相关路径来正确读取数据。
+注意:可能需要根据本地环境调整reader.py相关路径来正确读取数据。
-## 模型训练
+### 模型训练
 数据准备完毕后,可以通过如下的方式启动训练:
 ```
@@ -66,6 +70,7 @@ python train.py \
 --lr=0.1
 ```
 **参数说明:**
+
 * **model**: 模型名称, 默认值: "SE_ResNeXt50_32x4d"
 * **num_epochs**: 训练回合数,默认值: 120
 * **batch_size**: 批大小,默认值: 256
@@ -84,69 +89,76 @@ python train.py \
 * **scale_loss**: 调整混合训练的loss scale值,默认值: 1.0
 * **l2_decay**: l2_decay值,默认值: 1e-4
 * **momentum_rate**: momentum_rate值,默认值: 0.9
+* **use_label_smoothing**: 是否对数据进行label smoothing处理,默认值:False
+* **label_smoothing_epsilon**: label_smoothing的epsilon值,默认值:0.2
+* **lower_scale**: 数据随机裁剪处理时的lower scale值, upper scale值固定为1.0,默认值:0.08
+* **lower_ratio**: 数据随机裁剪处理时的lower ratio值,默认值:3./4.
+* **upper_ration**: 数据随机裁剪处理时的upper ratio值,默认值:4./3.
+* **resize_short_size**: 指定数据处理时改变图像大小的短边值,默认值: 256
+* **use_mixup**: 是否对数据进行mixup处理,默认值:False
+* **mixup_alpha**: 指定mixup处理时的alpha值,默认值: 0.2
+* **is_distill**: 是否进行蒸馏训练,默认值: False
+
+**在```run.sh```中有用于训练的脚本。**
-在```run.sh```中有用于训练的脚本.
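上面的`use_label_smoothing`与`label_smoothing_epsilon`参数所指的label smoothing,可以用如下的方式示意:one-hot标签的每一维先乘以(1 - epsilon),再加上epsilon/K(K为类别数)。以下仅为说明性示例,并非本仓库的实现:

```python
def smooth_labels(one_hot, epsilon=0.2):
    """label smoothing: (1 - epsilon) * one_hot + epsilon / K,K为类别数。"""
    k = len(one_hot)
    return [(1.0 - epsilon) * p + epsilon / k for p in one_hot]

# 4分类、epsilon=0.2时,正确类别概率变为0.85,其余类别均为0.05。
print(smooth_labels([0.0, 1.0, 0.0, 0.0]))
```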
+**数据读取器说明:** 数据读取器定义在PIL:```reader.py```和CV2:```reader_cv2.py```文件中,现在默认使用基于cv2的数据读取器。在[训练阶段](#模型训练),默认采用的增广方式是随机裁剪与水平翻转,而在[模型评估](#模型评估)与[模型预测](#模型预测)阶段默认采用的方式是中心裁剪。当前支持的数据增广方式有:
-**数据读取器说明:** 数据读取器定义在```reader.py```和```reader_cv2.py```中。一般, CV2可以提高数据读取速度, PIL reader可以得到相对更高的精度, 我们现在默认基于cv2的数据读取器, 在[训练阶段](#模型训练), 默认采用的增广方式是随机裁剪与水平翻转, 而在[模型评估](#模型评估)与[模型预测](#模型预测)阶段用的默认方式是中心裁剪。当前支持的数据增广方式有:
 * 旋转
-* 颜色抖动
+* 颜色抖动(cv2暂未实现)
 * 随机裁剪
 * 中心裁剪
 * 长宽调整
 * 水平翻转
-## 混合精度训练
-
-可以通过开启`--fp16=True`启动混合精度训练,这样训练过程会使用float16数据,并输出float32的模型参数("master"参数)。您可能需要同时传入`--scale_loss`来解决fp16训练的精度问题,通常传入`--scale_loss=8.0`即可。
-
-注意,目前混合精度训练不能和内存优化功能同时使用,所以需要传`--with_mem_opt=False`这个参数来禁用内存优化功能。
+### 参数微调
-## 参数微调
-
-参数微调是指在特定任务上微调已训练模型的参数。通过初始化```path_to_pretrain_model```,微调一个模型可以采用如下的命令:
+参数微调是指在特定任务上微调已训练模型的参数。可以下载[已发布模型及其性能](#已发布模型及其性能)中的模型并且设置```path_to_pretrain_model```为模型所在路径,然后通过如下的命令微调模型:
 ```
-python train.py
- --model=SE_ResNeXt50_32x4d \
- --pretrained_model=${path_to_pretrain_model} \
- --batch_size=32 \
- --total_images=1281167 \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --model_save_dir=output/ \
- --with_mem_opt=True \
- --lr_strategy=piecewise_decay \
- --lr=0.1
+python train.py \
+ --pretrained_model=${path_to_pretrain_model}
 ```
+注意:根据具体模型和任务添加并调整其他参数。
-## 模型评估
-模型评估是指对训练完毕的模型评估各类性能指标。用户可以下载[已有模型及其性能](#已有模型及其性能)并且设置```path_to_pretrain_model```为模型所在路径。运行如下的命令,可以获得一个模型top-1/top-5精度:
+### 模型评估
+模型评估是指对训练完毕的模型评估各类性能指标。可以下载[已发布模型及其性能](#已发布模型及其性能)中的模型并且设置```path_to_pretrain_model```为模型所在路径。运行如下的命令,可以获得模型top-1/top-5精度:
 ```
 python eval.py \
- --model=SE_ResNeXt50_32x4d \
- --batch_size=32 \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --with_mem_opt=True \
 --pretrained_model=${path_to_pretrain_model}
 ```
+注意:根据具体模型和任务添加并调整其他参数。
-## 模型预测
-模型预测可以获取一个模型的预测分数或者图像的特征:
+### 模型预测
+模型预测可以获取一个模型的预测分数或者图像的特征。可以下载[已发布模型及其性能](#已发布模型及其性能)中的模型并且设置```path_to_pretrain_model```为模型所在路径,运行如下的命令获得预测分数:
 ```
 python infer.py \
- --model=SE_ResNeXt50_32x4d \
- --class_dim=1000 \
- --image_shape=3,224,224 \
- --with_mem_opt=True \
 --pretrained_model=${path_to_pretrain_model}
 ```
+注意:根据具体模型和任务添加并调整其他参数。
+
+## 进阶使用
+
+### 混合精度训练
+
+可以通过开启`--fp16=True`启动混合精度训练,这样训练过程会使用float16数据,并输出float32的模型参数("master"参数)。您可能需要同时传入`--scale_loss`来解决fp16训练的精度问题,通常传入`--scale_loss=8.0`即可。
+
+注意,目前混合精度训练不能和内存优化功能同时使用,所以需要传`--with_mem_opt=False`这个参数来禁用内存优化功能。
+
+### CE测试
+
+注意:CE相关代码仅用于内部测试,enable_ce默认设置为False。
+
+## 已发布模型及其性能
 表格中列出了在models目录下目前支持的图像分类模型,并且给出了已完成训练的模型在ImageNet-2012验证集合上的top-1/top-5精度,以及Paddle Fluid和Paddle TensorRT基于动态链接库的预测时间(测
-试GPU型号为Tesla P4)。由于Paddle TensorRT对ShuffleNetV2使用的激活函数swish,MobileNetV2使用的激活函数relu6不支持,因此预测加速不明显,Paddle TensorRT不久后添加对这两个op的支持。基于动态链接库的预测方法也将在不久后发布,预测速度指标可能会随着正式发布的工具而更新。可以通过点击相应模型的名称下载对应的预训练模型。
-- 注意1:ResNet50_vd_v2是ResNet50_vd蒸馏版本。
-- 注意2:除了InceptionV4采用的输入图像的分辨率为299x299,其余模型测试时使用的分辨率均为224x224。
-- 注意3:调用动态链接库预测时需要将训练模型转换为二进制模型,转换方法如下:a.将infer.py中参数save_inference设置为True; b.执行infer.py。
+试GPU型号为Tesla P4)。由于Paddle TensorRT不支持ShuffleNetV2使用的激活函数swish和MobileNetV2使用的激活函数relu6,因此这两个模型的预测加速不明显。可以通过点击相应模型的名称下载对应的预训练模型。
+
+- 注意
+  1:ResNet50_vd_v2是ResNet50_vd蒸馏版本。
+  2:除了InceptionV4采用的输入图像的分辨率为299x299,其余模型测试时使用的分辨率均为224x224。
+  3:调用动态链接库预测时需要将训练模型转换为二进制模型,转换命令如下:
+
+  ```python infer.py --save_inference=True```
 |model | top-1/top-5 accuracy(CV2) | Paddle Fluid inference time(ms) | Paddle TensorRT inference time(ms) |
 |- |:-: |:-: |:-: |
@@ -173,6 +185,38 @@ python infer.py \
 |[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar) | 78.44%/93.96% | 14.916 | 12.126 |
 |[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar) | 79.12%/94.20% | 30.085 | 24.110 |
 |[SE154_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar) | 81.40%/95.48% | 71.892 | 64.855 |
-|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.70%/89.66% | 6.528 | 3.076 |
+|[GoogLeNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.70%/89.66% | 6.528 | 3.076 |
 |[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar) | 70.03%/89.17% | 6.078 | 6.282 |
 |[InceptionV4](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar) | 80.77%/95.26% | 32.413 | 18.154 |
+
+## FAQ
+
+**Q:** 加载预训练模型报错:Enforce failed. Expected x_dims[1] == labels_dims[1], but received x_dims[1]:1000 != labels_dims[1]:6.
+
+**A:** 维度不匹配,请删除预训练参数中的FC层参数(通常以```fc_```为前缀)。
+
+## 参考文献
+- AlexNet: [imagenet-classification-with-deep-convolutional-neural-networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
+- ResNet: [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385), Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+- ResNeXt: [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431), Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
+- SeResNeXt: [Squeeze-and-Excitation Networks](https://arxiv.org/pdf/1709.01507.pdf), Jie Hu, Li Shen, Samuel Albanie
+- ShuffleNetV1: [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083), Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
+- ShuffleNetV2: [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164), Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
+- MobileNetV1: [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861), Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
+- MobileNetV2: [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/pdf/1801.04381v4.pdf), Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
+- VGG: [Very Deep Convolutional Networks for Large-scale Image Recognition](https://arxiv.org/pdf/1409.1556), Karen Simonyan, Andrew Zisserman
+- GoogLeNet: [Going Deeper with Convolutions](https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf), Christian Szegedy, Wei Liu, Yangqing Jia
+- InceptionV4: [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261), Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
+
+## 版本更新
+- 2018/12/03 **Stage1**: 更新AlexNet,ResNet50,ResNet101,MobileNetV1
+- 2018/12/23 **Stage2**: 更新VGG系列,SeResNeXt50_32x4d,SeResNeXt101_32x4d,ResNet152
+- 2019/01/31 更新MobileNetV2
+- 2019/04/01 **Stage3**: 更新ResNet18,ResNet34,GoogLeNet,ShuffleNetV2
+- 2019/06/12 **Stage4**: 更新ResNet50_vc,ResNet50_vd,ResNet101_vd,ResNet152_vd,ResNet200_vd,SE154_vd,InceptionV4,ResNeXt101_64x4d,ResNeXt101_vd_64x4d
+- 2019/06/22 更新ResNet50_vd_v2
+
+## 如何贡献代码
+
+如果你可以修复某个issue或者增加一个新功能,欢迎给我们提交PR。如果对应的PR被接受了,我们将根据贡献的质量和难度进行打分(0-5分,越高越好)。如果你累计获得了10分,可以联系我们获得面试机会或者为你写推荐信。
diff --git a/PaddleCV/image_classification/train.py b/PaddleCV/image_classification/train.py
index a50b1253b701fb19947f4a391c48cc3d99886696..e24b0b957384de39b491f8bce9b3463aa2f6ea45 100644
--- a/PaddleCV/image_classification/train.py
+++ b/PaddleCV/image_classification/train.py
@@ -539,7 +539,7 @@ def train(args):
         test_acc5 = np.array(test_info[2]).mean()
 
         if use_mixup:
-            print("End pass {0}, train_loss {1}, test_loss {4}, test_acc1 {5}, test_acc5 {6}".format(
+            print("End pass {0}, train_loss {1}, test_loss {2}, test_acc1 {3}, test_acc5 {4}".format(
                 pass_id, "%.5f"%train_loss, "%.5f"%test_loss, "%.5f"%test_acc1, "%.5f"%test_acc5))
         else:
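The `use_mixup` option whose logging the train.py hunk above fixes trains on convex combinations of pairs of samples, with the mixing weight drawn from a Beta distribution parameterized by `mixup_alpha` (default 0.2). A minimal sketch of the sampling step on plain Python lists (illustrative only, not the repo's implementation):

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples: lam ~ Beta(alpha, alpha); features are mixed by lam."""
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    # During training the loss is combined as lam * loss(y1) + (1 - lam) * loss(y2).
    return mixed, (y1, y2, lam)

random.seed(0)  # make the example deterministic
mixed, (ya, yb, lam) = mixup_pair([0.0, 1.0], 0, [1.0, 0.0], 1)
print(lam, mixed)
```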