未验证 提交 26f2a9bc 编写于 作者: S SunGaofeng 提交者: GitHub

Merge pull request #2 from PaddlePaddle/develop

pull request to update video 
paddle/operators/check_t.save
paddle/operators/check_tensor.ls
paddle/operators/tensor.save
python/paddle/v2/fluid/tests/book/image_classification_resnet.inference.model/
python/paddle/v2/fluid/tests/book/image_classification_vgg.inference.model/
python/paddle/v2/fluid/tests/book/label_semantic_roles.inference.model/
*.DS_Store
*.vs
build/
build_doc/
*.user
.vscode
.idea
.project
.cproject
.pydevproject
.settings/
*.pyc
CMakeSettings.json
Makefile
.test_env/
third_party/
*~
bazel-*
third_party/
build_*
# clion workspace.
cmake-build-*
model_test
\ No newline at end of file
*.vscode
......@@ -7,3 +7,6 @@
[submodule "fluid/PaddleNLP/Senta"]
path = fluid/PaddleNLP/Senta
url = https://github.com/baidu/Senta.git
[submodule "fluid/PaddleNLP/LARK"]
path = fluid/PaddleNLP/LARK
url = https://github.com/PaddlePaddle/LARK
......@@ -16,54 +16,57 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化
## PaddleCV
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[AlexNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
[AlexNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
[VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力|[Very Deep ConvNets for Large-Scale Inage Recognition](https://arxiv.org/pdf/1409.1556.pdf)
[GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
[ResNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
[Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
[MobileNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
[DPN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
[SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
[SSD](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
[Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
[Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
[ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/icnet)|图像实时语义分割模型|即考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
[DCGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
[ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
[CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
[CRNN-CTC模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks)
[Attention模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
[GoogleNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
[ResNet](./fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
[Inception-v4](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
[MobileNet](./fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
[DPN](./fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
[SE-ResNeXt](./fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
[SSD](./fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
[Face Detector: PyramidBox](./fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
[Faster RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
[Mask RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|基于Faster RCNN模型的经典实例分割模型|在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。|[Mask R-CNN](https://arxiv.org/abs/1703.06870)
[ICNet](./fluid/PaddleCV/icnet)|图像实时语义分割模型|即考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
[DCGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
[ConditionalGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
[CycleGAN](./fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
[CRNN-CTC模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks)
[Attention模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
[Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning)|度量学习模型|能够用于分析对象时间的关联、比较关系,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域|-
[TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
[caffe2fluid](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
[TSN](./fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
[视频模型库](./fluid/PaddleCV/video)|视频模型库|给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型||
[caffe2fluid](./fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
## PaddleNLP
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[Transformer](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
[Transformer](./fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
[BERT](https://github.com/PaddlePaddle/LARK/tree/develop/BERT)|语义表示模型|在多个 NLP 任务上取得 SOTA 效果,支持多卡多机训练,支持混合精度训练|[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
[LAC](https://github.com/baidu/lac/blob/master/README.md)|联合的词法分析模型|能够整体性地完成中文分词、词性标注、专名识别任务|[Chinese Lexical Analysis with Deep Bi-GRU-CRF Network](https://arxiv.org/abs/1807.01882)
[Senta](https://github.com/baidu/Senta/blob/master/README.md)|情感倾向分析模型集|百度AI开放平台中情感倾向分析模型|-
[DAM](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
[DAM](./fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
[SimNet](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)|语义匹配框架|使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力|-
[DuReader](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
[Bi-GRU-CRF](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
[DuReader](./fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
[Bi-GRU-CRF](./fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
## PaddleRec
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[TagSpace](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
[SSR](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
[DeepCTR](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
[Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
[TagSpace](./fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
[GRU4Rec](./fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
[SSR](./fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
[DeepCTR](./fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
[Multiview-Simnet](./fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
## Other Models
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[DeepASR](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
[DQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
[DoubleDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Font Size: Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
[DuelingDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
[DeepASR](./fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
[DQN](./fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
[DoubleDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Font Size: Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
[DuelingDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
## License
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
......
# LRC Local Rademachar Complexity Regularization
Regularization of Deep Neural Networks(DNNs) for the sake of improving their generalization capability is important and chllenging. This directory contains image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution by [DARTS](https://arxiv.org/abs/1806.09055) for our research. The regularization by LRC and DARTS are combined in this model on CIFAR-10 dataset. Code accompanying the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
---
# Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)
## Installation
Running sample code in this directory requires PaddelPaddle Fluid v.1.2.0 and later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) and make an update.
## Data preparation
When you want to use the cifar-10 dataset for the first time, you can download the dataset as:
sh ./dataset/download.sh
Please make sure your environment has an internet connection.
The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as the `train.py`. If automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location mentioned above.
## Training
After data preparation, one can start the training step by:
python -u train_mixup.py \
--batch_size=80 \
--auxiliary \
--weight_decay=0.0003 \
--learning_rate=0.025 \
--lrc_loss_lambda=0.7 \
--cutout
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specifiy one GPU to train.
- For more help on arguments:
python train_mixup.py --help
**data reader introduction:**
* Data reader is defined in `reader.py`.
* Reshape the images to 32 * 32.
* In training stage, images are padding to 40 * 40 and cropped randomly to the original size.
* In training stage, images are horizontally random flipped.
* Images are standardized to (0, 1).
* In training stage, cutout images randomly.
* Shuffle the order of the input images during training.
**model configuration:**
* Use auxiliary loss and auxiliary\_weight=0.4.
* Use dropout and drop\_path\_prob=0.2.
* Set lrc\_loss\_lambda=0.7.
**training strategy:**
* Use momentum optimizer with momentum=0.9.
* Weight decay is 0.0003.
* Use cosine decay with init\_lr=0.025.
* Total epoch is 600.
* Use Xaiver initalizer to weight in conv2d, Constant initalizer to weight in batch norm and Normal initalizer to weight in fc.
* Initalize bias in batch norm and fc to zero constant and do not add bias to conv2d.
## Reference
- DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
- Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)
# LRC 局部Rademachar复杂度正则化
为了在深度神经网络中提升泛化能力,正则化的选择十分重要也具有挑战性。本目录包括了一种基于局部rademacher复杂度的新型正则(LRC)的图像分类模型。十分感谢[DARTS](https://arxiv.org/abs/1806.09055)模型对本研究提供的帮助。该模型将LRC正则和DARTS网络相结合,在CIFAR-10数据集中得到了很出色的效果。代码和文章一同发布
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
---
# 内容
- [安装](#安装)
- [数据准备](#数据准备)
- [模型训练](#模型训练)
## 安装
在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.2.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle)中的说明来更新PaddlePaddle。
## 数据准备
第一次使用CIFAR-10数据集时,您可以通过如果命令下载:
sh ./dataset/download.sh
请确保您的环境有互联网连接。数据会下载到`train.py`同目录下的`dataset/cifar/cifar-10-batches-py`。如果下载失败,您可以自行从https://www.cs.toronto.edu/~kriz/cifar.html上下载cifar-10-python.tar.gz并解压到上述位置。
## 模型训练
数据准备好后,可以通过如下命令开始训练:
python -u train_mixup.py \
--batch_size=80 \
--auxiliary \
--weight_decay=0.0003 \
--learning_rate=0.025 \
--lrc_loss_lambda=0.7 \
--cutout
- 通过设置 ```export CUDA_VISIBLE_DEVICES=0```指定单张GPU训练。
- 可选参数见:
python train_mixup.py --help
**数据读取器说明:**
* 数据读取器定义在`reader.py`
* 输入图像尺寸统一变换为32 * 32
* 训练时将图像填充为40 * 40然后随机剪裁为原输入图像大小
* 训练时图像随机水平翻转
* 对图像每个像素做归一化处理
* 训练时对图像做随机遮挡
* 训练时对输入图像做随机洗牌
**模型配置:**
* 使用辅助损失,辅助损失权重为0.4
* 使用dropout,随机丢弃率为0.2
* 设置lrc\_loss\_lambda为0.7
**训练策略:**
* 采用momentum优化算法训练,momentum=0.9
* 权重衰减系数为0.0001
* 采用正弦学习率衰减,初始学习率为0.025
* 总共训练600轮
* 对卷积权重采用Xaiver初始化,对batch norm权重采用固定初始化,对全连接层权重采用高斯初始化
* 对batch norm和全连接层偏差采用固定初始化,不对卷积设置偏差
## 引用
- DARTS: Differentiable Architecture Search [`论文`](https://arxiv.org/abs/1806.09055)
- Differentiable Architecture Search in PyTorch [`代码`](https://github.com/quark0/darts)
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
mkdir cifar
cd cifar
# Download the data.
echo "Downloading..."
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# Extract the data.
echo "Extracting..."
tar zvxf cifar-10-python.tar.gz
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from collections import namedtuple
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')
PRIMITIVES = [
'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3',
'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5'
]
NASNet = Genotype(
normal=[
('sep_conv_5x5', 1),
('sep_conv_3x3', 0),
('sep_conv_5x5', 0),
('sep_conv_3x3', 0),
('avg_pool_3x3', 1),
('skip_connect', 0),
('avg_pool_3x3', 0),
('avg_pool_3x3', 0),
('sep_conv_3x3', 1),
('skip_connect', 1),
],
normal_concat=[2, 3, 4, 5, 6],
reduce=[
('sep_conv_5x5', 1),
('sep_conv_7x7', 0),
('max_pool_3x3', 1),
('sep_conv_7x7', 0),
('avg_pool_3x3', 1),
('sep_conv_5x5', 0),
('skip_connect', 3),
('avg_pool_3x3', 2),
('sep_conv_3x3', 2),
('max_pool_3x3', 1),
],
reduce_concat=[4, 5, 6], )
AmoebaNet = Genotype(
normal=[
('avg_pool_3x3', 0),
('max_pool_3x3', 1),
('sep_conv_3x3', 0),
('sep_conv_5x5', 2),
('sep_conv_3x3', 0),
('avg_pool_3x3', 3),
('sep_conv_3x3', 1),
('skip_connect', 1),
('skip_connect', 0),
('avg_pool_3x3', 1),
],
normal_concat=[4, 5, 6],
reduce=[
('avg_pool_3x3', 0),
('sep_conv_3x3', 1),
('max_pool_3x3', 0),
('sep_conv_7x7', 2),
('sep_conv_7x7', 0),
('avg_pool_3x3', 1),
('max_pool_3x3', 0),
('max_pool_3x3', 1),
('conv_7x1_1x7', 0),
('sep_conv_3x3', 5),
],
reduce_concat=[3, 4, 6])
DARTS_V1 = Genotype(
normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('skip_connect', 0),
('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1),
('sep_conv_3x3', 0), ('skip_connect', 2)],
normal_concat=[2, 3, 4, 5],
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('max_pool_3x3', 0), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('avg_pool_3x3', 0)],
reduce_concat=[2, 3, 4, 5])
DARTS_V2 = Genotype(
normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0),
('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('skip_connect', 0),
('skip_connect', 0), ('dil_conv_3x3', 2)],
normal_concat=[2, 3, 4, 5],
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('max_pool_3x3', 1)],
reduce_concat=[2, 3, 4, 5])
MY_DARTS = Genotype(
normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0),
('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1),
('skip_connect', 0), ('sep_conv_3x3', 1)],
normal_concat=range(2, 6),
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0),
('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('skip_connect', 3)],
reduce_concat=range(2, 6))
DARTS = MY_DARTS
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
import math
from paddle.fluid.initializer import init_on_cpu
def cosine_decay(learning_rate, num_epoch, steps_one_epoch):
"""Applies cosine decay to the learning rate.
lr = 0.5 * (math.cos(epoch * (math.pi / 120)) + 1)
"""
global_step = _decay_step_counter()
with init_on_cpu():
decayed_lr = learning_rate * \
(ops.cos((global_step / steps_one_epoch) \
* math.pi / num_epoch) + 1)/2
return decayed_lr
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import time
import functools
import paddle
import paddle.fluid as fluid
from operations import *
class Cell():
def __init__(self, genotype, C_prev_prev, C_prev, C, reduction,
reduction_prev):
print(C_prev_prev, C_prev, C)
if reduction_prev:
self.preprocess0 = functools.partial(FactorizedReduce, C_out=C)
else:
self.preprocess0 = functools.partial(
ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0)
self.preprocess1 = functools.partial(
ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0)
if reduction:
op_names, indices = zip(*genotype.reduce)
concat = genotype.reduce_concat
else:
op_names, indices = zip(*genotype.normal)
concat = genotype.normal_concat
print(op_names, indices, concat, reduction)
self._compile(C, op_names, indices, concat, reduction)
def _compile(self, C, op_names, indices, concat, reduction):
assert len(op_names) == len(indices)
self._steps = len(op_names) // 2
self._concat = concat
self.multiplier = len(concat)
self._ops = []
for name, index in zip(op_names, indices):
stride = 2 if reduction and index < 2 else 1
op = functools.partial(OPS[name], C=C, stride=stride, affine=True)
self._ops += [op]
self._indices = indices
def forward(self, s0, s1, drop_prob, is_train, name):
self.training = is_train
preprocess0_name = name + 'preprocess0.'
preprocess1_name = name + 'preprocess1.'
s0 = self.preprocess0(s0, name=preprocess0_name)
s1 = self.preprocess1(s1, name=preprocess1_name)
out = [s0, s1]
for i in range(self._steps):
h1 = out[self._indices[2 * i]]
h2 = out[self._indices[2 * i + 1]]
op1 = self._ops[2 * i]
op2 = self._ops[2 * i + 1]
h3 = op1(h1, name=name + '_ops.' + str(2 * i) + '.')
h4 = op2(h2, name=name + '_ops.' + str(2 * i + 1) + '.')
if self.training and drop_prob > 0.:
if h3 != h1:
h3 = fluid.layers.dropout(
h3,
drop_prob,
dropout_implementation='upscale_in_train')
if h4 != h2:
h4 = fluid.layers.dropout(
h4,
drop_prob,
dropout_implementation='upscale_in_train')
s = h3 + h4
out += [s]
return fluid.layers.concat([out[i] for i in self._concat], axis=1)
def AuxiliaryHeadCIFAR(input, num_classes, aux_name='auxiliary_head'):
relu_a = fluid.layers.relu(input)
pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', 3)
conv2d_a = fluid.layers.conv2d(
pool_a,
128,
1,
name=aux_name + '.features.2',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.2.weight'),
bias_attr=False)
bn_a_name = aux_name + '.features.3'
bn_a = fluid.layers.batch_norm(
conv2d_a,
act='relu',
name=bn_a_name,
param_attr=ParamAttr(
initializer=Constant(1.), name=bn_a_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=bn_a_name + '.bias'),
moving_mean_name=bn_a_name + '.running_mean',
moving_variance_name=bn_a_name + '.running_var')
conv2d_b = fluid.layers.conv2d(
bn_a,
768,
2,
name=aux_name + '.features.5',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.5.weight'),
bias_attr=False)
bn_b_name = aux_name + '.features.6'
bn_b = fluid.layers.batch_norm(
conv2d_b,
act='relu',
name=bn_b_name,
param_attr=ParamAttr(
initializer=Constant(1.), name=bn_b_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=bn_b_name + '.bias'),
moving_mean_name=bn_b_name + '.running_mean',
moving_variance_name=bn_b_name + '.running_var')
fc_name = aux_name + '.classifier'
fc = fluid.layers.fc(bn_b,
num_classes,
name=fc_name,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name=fc_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=fc_name + '.bias'))
return fc
def StemConv(input, C_out, kernel_size, padding):
conv_a = fluid.layers.conv2d(
input,
C_out,
kernel_size,
padding=padding,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0), name='stem.0.weight'),
bias_attr=False)
bn_a = fluid.layers.batch_norm(
conv_a,
param_attr=ParamAttr(
initializer=Constant(1.), name='stem.1.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name='stem.1.bias'),
moving_mean_name='stem.1.running_mean',
moving_variance_name='stem.1.running_var')
return bn_a
class NetworkCIFAR(object):
def __init__(self, C, class_num, layers, auxiliary, genotype):
self.class_num = class_num
self._layers = layers
self._auxiliary = auxiliary
stem_multiplier = 3
self.drop_path_prob = 0
C_curr = stem_multiplier * C
C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
self.cells = []
reduction_prev = False
for i in range(layers):
if i in [layers // 3, 2 * layers // 3]:
C_curr *= 2
reduction = True
else:
reduction = False
cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction,
reduction_prev)
reduction_prev = reduction
self.cells += [cell]
C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr
if i == 2 * layers // 3:
C_to_auxiliary = C_prev
def forward(self, init_channel, is_train):
self.training = is_train
self.logits_aux = None
num_channel = init_channel * 3
s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1)
s1 = s0
for i, cell in enumerate(self.cells):
name = 'cells.' + str(i) + '.'
s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train,
name)
if i == int(2 * self._layers // 3):
if self._auxiliary and self.training:
self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num)
out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg")
self.logits = fluid.layers.fc(out,
size=self.class_num,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name='classifier.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
name='classifier.bias'))
return self.logits, self.logits_aux
def build_input(self, image_shape, batch_size, is_train):
if is_train:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1],
[-1, 1], [-1, batch_size, self.class_num - 1]],
lod_levels=[0, 0, 0, 0, 0, 0, 0],
dtypes=[
"float32", "int64", "int64", "float32", "int32", "int32",
"float32"
],
use_double_buffer=True,
name='train_reader')
else:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1]],
lod_levels=[0, 0],
dtypes=["float32", "int64"],
use_double_buffer=True,
name='test_reader')
return py_reader
def train_model(self, py_reader, init_channels, aux, aux_w, batch_size,
loss_lambda):
self.image, self.ya, self.yb, self.lam, self.label_reshape,\
self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader)
self.logits, self.logits_aux = self.forward(init_channels, True)
self.mixup_loss = self.mixup_loss(aux, aux_w)
self.lrc_loss = self.lrc_loss(batch_size)
return self.mixup_loss + loss_lambda * self.lrc_loss
def test_model(self, py_reader, init_channels):
self.image, self.ya = fluid.layers.read_file(py_reader)
self.logits, _ = self.forward(init_channels, False)
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss = fluid.layers.cross_entropy(prob, self.ya)
acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1)
acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5)
return loss, acc_1, acc_5
def mixup_loss(self, auxiliary, auxiliary_weight):
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss_a = fluid.layers.cross_entropy(prob, self.ya)
loss_b = fluid.layers.cross_entropy(prob, self.yb)
loss_a_mean = fluid.layers.reduce_mean(loss_a)
loss_b_mean = fluid.layers.reduce_mean(loss_b)
loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean
if auxiliary:
prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False)
loss_a_aux = fluid.layers.cross_entropy(prob_aux, self.ya)
loss_b_aux = fluid.layers.cross_entropy(prob_aux, self.yb)
loss_a_aux_mean = fluid.layers.reduce_mean(loss_a_aux)
loss_b_aux_mean = fluid.layers.reduce_mean(loss_b_aux)
loss_aux = self.lam * loss_a_aux_mean + (1 - self.lam
) * loss_b_aux_mean
return loss + auxiliary_weight * loss_aux
def lrc_loss(self, batch_size):
y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1))
label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1])
non_label_reshape = fluid.layers.squeeze(
self.non_label_reshape, axes=[1])
label_reshape.stop_gradient = True
non_label_reshape.stop_graident = True
y_diff_label_reshape = fluid.layers.gather(y_diff_reshape,
label_reshape)
y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape,
non_label_reshape)
y_diff_label = fluid.layers.reshape(
y_diff_label_reshape, shape=(-1, batch_size, 1))
y_diff_non_label = fluid.layers.reshape(
y_diff_non_label_reshape,
shape=(-1, batch_size, self.class_num - 1))
y_diff_ = y_diff_non_label - y_diff_label
y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0])
rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0])
rad_y_diff_trans = rad_var_trans * y_diff_
lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1])
lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size *
(self.class_num - 1))
lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_)
return lrc_loss_mean
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import time
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.initializer import Normal
from paddle.fluid.initializer import Constant
OPS = {
'none' : lambda input, C, stride, name, affine: Zero(input, stride, name),
'avg_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'avg', pool_stride=stride, pool_padding=1, name=name),
'max_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'max', pool_stride=stride, pool_padding=1, name=name),
'skip_connect' : lambda input,C, stride, name, affine: Identity(input, name) if stride == 1 else FactorizedReduce(input, C, name=name, affine=affine),
'sep_conv_3x3' : lambda input,C, stride, name, affine: SepConv(input, C, C, 3, stride, 1, name=name, affine=affine),
'sep_conv_5x5' : lambda input,C, stride, name, affine: SepConv(input, C, C, 5, stride, 2, name=name, affine=affine),
'sep_conv_7x7' : lambda input,C, stride, name, affine: SepConv(input, C, C, 7, stride, 3, name=name, affine=affine),
'dil_conv_3x3' : lambda input,C, stride, name, affine: DilConv(input, C, C, 3, stride, 2, 2, name=name, affine=affine),
'dil_conv_5x5' : lambda input,C, stride, name, affine: DilConv(input, C, C, 5, stride, 4, 2, name=name, affine=affine),
'conv_7x1_1x7' : lambda input,C, stride, name, affine: SevenConv(input, C, name=name, affine=affine)
}
def ReLUConvBN(input, C_out, kernel_size, stride, padding, name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out,
kernel_size,
stride,
padding,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False)
if affine:
reluconvbn_out = fluid.layers.batch_norm(
conv2d_a,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.2.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.2.bias'),
moving_mean_name=name + 'op.2.running_mean',
moving_variance_name=name + 'op.2.running_var')
else:
reluconvbn_out = fluid.layers.batch_norm(
conv2d_a,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.2.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.2.bias'),
moving_mean_name=name + 'op.2.running_mean',
moving_variance_name=name + 'op.2.running_var')
return reluconvbn_out
def DilConv(input,
C_in,
C_out,
kernel_size,
stride,
padding,
dilation,
name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_in,
kernel_size,
stride,
padding,
dilation,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_out,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
dilconv_out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
dilconv_out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
return dilconv_out
def SepConv(input,
C_in,
C_out,
kernel_size,
stride,
padding,
name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_in,
kernel_size,
stride,
padding,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_in,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
bn_a = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
bn_a = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
relu_b = fluid.layers.relu(bn_a)
conv2d_d = fluid.layers.conv2d(
relu_b,
C_in,
kernel_size,
1,
padding,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.5.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_e = fluid.layers.conv2d(
conv2d_d,
C_out,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.6.weight'),
bias_attr=False)
if affine:
sepconv_out = fluid.layers.batch_norm(
conv2d_e,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.7.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.7.bias'),
moving_mean_name=name + 'op.7.running_mean',
moving_variance_name=name + 'op.7.running_var')
else:
sepconv_out = fluid.layers.batch_norm(
conv2d_e,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.7.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.7.bias'),
moving_mean_name=name + 'op.7.running_mean',
moving_variance_name=name + 'op.7.running_var')
return sepconv_out
def SevenConv(input, C_out, stride, name='', affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out, (1, 7), (1, stride), (0, 3),
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_out, (7, 1), (stride, 1), (3, 0),
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
def Identity(input, name=''):
return input
def Zero(input, stride, name=''):
ones = np.ones(input.shape[-2:])
ones[::stride, ::stride] = 0
ones = fluid.layers.assign(ones)
return input * ones
def FactorizedReduce(input, C_out, name='', affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out // 2,
1,
2,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'conv_1.weight'),
bias_attr=False)
h_end = relu_a.shape[2]
w_end = relu_a.shape[3]
slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end])
conv2d_b = fluid.layers.conv2d(
slice_a,
C_out // 2,
1,
2,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'conv_2.weight'),
bias_attr=False)
out = fluid.layers.concat([conv2d_a, conv2d_b], axis=1)
if affine:
out = fluid.layers.batch_norm(
out,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'bn.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'bn.bias'),
moving_mean_name=name + 'bn.running_mean',
moving_variance_name=name + 'bn.running_var')
else:
out = fluid.layers.batch_norm(
out,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'bn.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'bn.bias'),
moving_mean_name=name + 'bn.running_mean',
moving_variance_name=name + 'bn.running_var')
return out
# Copyright (c) 2019 PaddlePaddle Authors. All Rig hts Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
"""
CIFAR-10 dataset.
This module will download dataset from
https://www.cs.toronto.edu/~kriz/cifar.html and parse train/test set into
paddle reader creators.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes,
with 6000 images per class. There are 50000 training images and 10000 test images.
"""
from PIL import Image
from PIL import ImageOps
import numpy as np
import cPickle
import random
import utils
import paddle.fluid as fluid
import time
import os
import functools
import paddle.reader
__all__ = ['train10', 'test10']
image_size = 32
image_depth = 3
half_length = 8
CIFAR_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
def generate_reshape_label(label, batch_size, CIFAR_CLASSES=10):
reshape_label = np.zeros((batch_size, 1), dtype='int32')
reshape_non_label = np.zeros(
(batch_size * (CIFAR_CLASSES - 1), 1), dtype='int32')
num = 0
for i in range(batch_size):
label_i = label[i]
reshape_label[i] = label_i + i * CIFAR_CLASSES
for j in range(CIFAR_CLASSES):
if label_i != j:
reshape_non_label[num] = \
j + i * CIFAR_CLASSES
num += 1
return reshape_label, reshape_non_label
def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10):
rcc_iters = 50
rad_var = np.zeros((rcc_iters, batch_size, CIFAR_CLASSES - 1))
for i in range(rcc_iters):
bernoulli_num = np.random.binomial(size=batch_size, n=1, p=0.5)
bernoulli_map = np.array([])
ones = np.ones((CIFAR_CLASSES - 1, 1))
for batch_id in range(batch_size):
num = bernoulli_num[batch_id]
var_id = 2 * ones * num - 1
bernoulli_map = np.append(bernoulli_map, var_id)
rad_var[i] = bernoulli_map.reshape((batch_size, CIFAR_CLASSES - 1))
return rad_var.astype('float32')
def preprocess(sample, is_training, args):
image_array = sample.reshape(3, image_size, image_size)
rgb_array = np.transpose(image_array, (1, 2, 0))
img = Image.fromarray(rgb_array, 'RGB')
if is_training:
# pad and ramdom crop
img = ImageOps.expand(img, (4, 4, 4, 4), fill=0) # pad to 40 * 40 * 3
left_top = np.random.randint(9, size=2) # rand 0 - 8
img = img.crop((left_top[0], left_top[1], left_top[0] + image_size,
left_top[1] + image_size))
if np.random.randint(2):
img = img.transpose(Image.FLIP_LEFT_RIGHT)
img = np.array(img).astype(np.float32)
# per_image_standardization
img_float = img / 255.0
img = (img_float - CIFAR_MEAN) / CIFAR_STD
if is_training and args.cutout:
center = np.random.randint(image_size, size=2)
offset_width = max(0, center[0] - half_length)
offset_height = max(0, center[1] - half_length)
target_width = min(center[0] + half_length, image_size)
target_height = min(center[1] + half_length, image_size)
for i in range(offset_height, target_height):
for j in range(offset_width, target_width):
img[i][j][:] = 0.0
img = np.transpose(img, (2, 0, 1))
return img
def reader_creator_filepath(filename, sub_name, is_training, args):
files = os.listdir(filename)
names = [each_item for each_item in files if sub_name in each_item]
names.sort()
datasets = []
for name in names:
print("Reading file " + name)
batch = cPickle.load(open(filename + name, 'rb'))
data = batch['data']
labels = batch.get('labels', batch.get('fine_labels', None))
assert labels is not None
dataset = zip(data, labels)
datasets.extend(dataset)
random.shuffle(datasets)
def read_batch(datasets, args):
for sample, label in datasets:
im = preprocess(sample, is_training, args)
yield im, [int(label)]
def reader():
batch_data = []
batch_label = []
for data, label in read_batch(datasets, args):
batch_data.append(data)
batch_label.append(label)
if len(batch_data) == args.batch_size:
batch_data = np.array(batch_data, dtype='float32')
batch_label = np.array(batch_label, dtype='int64')
if is_training:
flatten_label, flatten_non_label = \
generate_reshape_label(batch_label, args.batch_size)
rad_var = generate_bernoulli_number(args.batch_size)
mixed_x, y_a, y_b, lam = utils.mixup_data(
batch_data, batch_label, args.batch_size,
args.mix_alpha)
batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \
flatten_non_label, rad_var]]
yield batch_out
else:
batch_out = [[batch_data, batch_label]]
yield batch_out
batch_data = []
batch_label = []
return reader
def train10(args):
"""
CIFAR-10 training set creator.
It returns a reader creator, each sample in the reader is image pixels in
[0, 1] and label in [0, 9].
:return: Training reader creator
:rtype: callable
"""
return reader_creator_filepath(args.data, 'data_batch', True, args)
def test10(args):
"""
CIFAR-10 test set creator.
It returns a reader creator, each sample in the reader is image pixels in
[0, 1] and label in [0, 9].
:return: Test reader creator.
:rtype: callable
"""
return reader_creator_filepath(args.data, 'test_batch', False, args)
CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \
--batch_size=80 \
--auxiliary \
--weight_decay=0.0003 \
--learning_rate=0.025 \
--lrc_loss_lambda=0.7 \
--cutout
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from learning_rate import cosine_decay
import numpy as np
import argparse
from model import NetworkCIFAR as Network
import reader
import sys
import os
import time
import logging
import genotypes
import paddle.fluid as fluid
import shutil
import utils
import cPickle as cp
parser = argparse.ArgumentParser("cifar")
parser.add_argument(
'--data',
type=str,
default='./dataset/cifar/cifar-10-batches-py/',
help='location of the data corpus')
parser.add_argument('--batch_size', type=int, default=96, help='batch size')
parser.add_argument(
'--learning_rate', type=float, default=0.025, help='init learning rate')
parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
parser.add_argument(
'--weight_decay', type=float, default=3e-4, help='weight decay')
parser.add_argument(
'--report_freq', type=float, default=50, help='report frequency')
parser.add_argument(
'--epochs', type=int, default=600, help='num of training epochs')
parser.add_argument(
'--init_channels', type=int, default=36, help='num of init channels')
parser.add_argument(
'--layers', type=int, default=20, help='total number of layers')
parser.add_argument(
'--model_path',
type=str,
default='saved_models',
help='path to save the model')
parser.add_argument(
'--auxiliary',
action='store_true',
default=False,
help='use auxiliary tower')
parser.add_argument(
'--auxiliary_weight',
type=float,
default=0.4,
help='weight for auxiliary loss')
parser.add_argument(
'--cutout', action='store_true', default=False, help='use cutout')
parser.add_argument(
'--cutout_length', type=int, default=16, help='cutout length')
parser.add_argument(
'--drop_path_prob', type=float, default=0.2, help='drop path probability')
parser.add_argument('--save', type=str, default='EXP', help='experiment name')
parser.add_argument(
'--arch', type=str, default='DARTS', help='which architecture to use')
parser.add_argument(
'--grad_clip', type=float, default=5, help='gradient clipping')
parser.add_argument(
'--lr_exp_decay',
action='store_true',
default=False,
help='use exponential_decay learning_rate')
parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha')
parser.add_argument(
'--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda')
parser.add_argument(
'--loss_type',
default=1,
type=float,
help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss')
args = parser.parse_args()
CIFAR_CLASSES = 10
dataset_train_size = 50000
image_size = 32
def main():
image_shape = [3, image_size, image_size]
devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
logging.info("args = %s", args)
genotype = eval("genotypes.%s" % args.arch)
model = Network(args.init_channels, CIFAR_CLASSES, args.layers,
args.auxiliary, genotype)
steps_one_epoch = dataset_train_size / (devices_num * args.batch_size)
train(model, args, image_shape, steps_one_epoch)
def build_program(main_prog, startup_prog, args, is_train, model, im_shape,
steps_one_epoch):
out = []
with fluid.program_guard(main_prog, startup_prog):
py_reader = model.build_input(im_shape, args.batch_size, is_train)
if is_train:
with fluid.unique_name.guard():
loss = model.train_model(py_reader, args.init_channels,
args.auxiliary, args.auxiliary_weight,
args.batch_size, args.lrc_loss_lambda)
optimizer = fluid.optimizer.Momentum(
learning_rate=cosine_decay(args.learning_rate, \
args.epochs, steps_one_epoch),
regularization=fluid.regularizer.L2Decay(\
args.weight_decay),
momentum=args.momentum)
optimizer.minimize(loss)
out = [py_reader, loss]
else:
with fluid.unique_name.guard():
loss, acc_1, acc_5 = model.test_model(py_reader,
args.init_channels)
out = [py_reader, loss, acc_1, acc_5]
return out
def train(model, args, im_shape, steps_one_epoch):
train_startup_prog = fluid.Program()
test_startup_prog = fluid.Program()
train_prog = fluid.Program()
test_prog = fluid.Program()
train_py_reader, loss_train = build_program(train_prog, train_startup_prog,
args, True, model, im_shape,
steps_one_epoch)
test_py_reader, loss_test, acc_1, acc_5 = build_program(
test_prog, test_startup_prog, args, False, model, im_shape,
steps_one_epoch)
test_prog = test_prog.clone(for_test=True)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(train_startup_prog)
exe.run(test_startup_prog)
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 1
train_exe = fluid.ParallelExecutor(
main_program=train_prog,
use_cuda=True,
loss_name=loss_train.name,
exec_strategy=exec_strategy)
train_reader = reader.train10(args)
test_reader = reader.test10(args)
train_py_reader.decorate_paddle_reader(train_reader)
test_py_reader.decorate_paddle_reader(test_reader)
fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip))
fluid.memory_optimize(fluid.default_main_program())
def save_model(postfix, main_prog):
model_path = os.path.join(args.model_path, postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
fluid.io.save_persistables(exe, model_path, main_program=main_prog)
def test(epoch_id):
test_fetch_list = [loss_test, acc_1, acc_5]
objs = utils.AvgrageMeter()
top1 = utils.AvgrageMeter()
top5 = utils.AvgrageMeter()
test_py_reader.start()
test_start_time = time.time()
step_id = 0
try:
while True:
prev_test_start_time = test_start_time
test_start_time = time.time()
loss_test_v, acc_1_v, acc_5_v = exe.run(
test_prog, fetch_list=test_fetch_list)
objs.update(np.array(loss_test_v), args.batch_size)
top1.update(np.array(acc_1_v), args.batch_size)
top5.update(np.array(acc_5_v), args.batch_size)
if step_id % args.report_freq == 0:
print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}".
format(epoch_id, step_id,
np.array(acc_1_v),
np.array(acc_5_v), test_start_time -
prev_test_start_time))
step_id += 1
except fluid.core.EOFException:
test_py_reader.reset()
print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg,
top5.avg))
train_fetch_list = [loss_train]
epoch_start_time = time.time()
for epoch_id in range(args.epochs):
model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs
train_py_reader.start()
epoch_end_time = time.time()
if epoch_id > 0:
print("Epoch {}, total time {}".format(epoch_id - 1, epoch_end_time
- epoch_start_time))
epoch_start_time = epoch_end_time
epoch_end_time
start_time = time.time()
step_id = 0
try:
while True:
prev_start_time = start_time
start_time = time.time()
loss_v, = train_exe.run(
fetch_list=[v.name for v in train_fetch_list])
print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \
np.array(loss_v).mean(), start_time-prev_start_time))
step_id += 1
sys.stdout.flush()
except fluid.core.EOFException:
train_py_reader.reset()
if epoch_id % 50 == 0 or epoch_id == args.epochs - 1:
save_model(str(epoch_id), train_prog)
test(epoch_id)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
import os
import sys
import time
import math
import numpy as np
def mixup_data(x, y, batch_size, alpha=1.0):
'''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda'''
if alpha > 0.:
lam = np.random.beta(alpha, alpha)
else:
lam = 1.
index = np.random.permutation(batch_size)
mixed_x = lam * x + (1 - lam) * x[index, :]
y_a, y_b = y, y[index]
return mixed_x.astype('float32'), y_a.astype('int64'),\
y_b.astype('int64'), np.array(lam, dtype='float32')
class AvgrageMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.avg = 0
self.sum = 0
self.cnt = 0
def update(self, val, n=1):
self.sum += val * n
self.cnt += n
self.avg = self.sum / self.cnt
#-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
import math
from tqdm import tqdm
from utils import fluid_flatten
class DQNModel(object):
......@@ -39,34 +38,51 @@ class DQNModel(object):
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
self._sync_program = self._build_sync_target_network()
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
# define program
self.train_program = fluid.default_main_program()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
vars = list(self.train_program.list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
......@@ -81,50 +97,50 @@ class DQNModel(object):
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
filter_size=4,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
filter_size=3,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
flatten = fluid.layers.flatten(conv4, axis=1)
out = fluid.layers.fc(
input=flatten,
......@@ -133,23 +149,6 @@ class DQNModel(object):
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
......
#-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_argmax, fluid_flatten
class DoubleDQNModel(object):
......@@ -39,41 +38,59 @@ class DoubleDQNModel(object):
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
next_s_predcit_value = self.get_DQN_prediction(next_s)
greedy_action = fluid_argmax(next_s_predcit_value)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
best_v = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value),
dim=1)
best_v.stop_gradient = True
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
next_s_predcit_value = self.get_DQN_prediction(next_s)
greedy_action = fluid.layers.argmax(next_s_predcit_value, axis=1)
greedy_action = fluid.layers.unsqueeze(greedy_action, axes=[1])
self._sync_program = self._build_sync_target_network()
predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
best_v = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value),
dim=1)
best_v.stop_gradient = True
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
# define program
self.train_program = fluid.default_main_program()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
vars = list(self.train_program.list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
......@@ -88,50 +105,50 @@ class DoubleDQNModel(object):
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
filter_size=4,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
filter_size=3,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
flatten = fluid.layers.flatten(conv4, axis=1)
out = fluid.layers.fc(
input=flatten,
......@@ -140,23 +157,6 @@ class DoubleDQNModel(object):
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
......
#-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_flatten
class DuelingDQNModel(object):
......@@ -39,34 +38,51 @@ class DuelingDQNModel(object):
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
self._sync_program = self._build_sync_target_network()
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
# define program
self.train_program = fluid.default_main_program()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
vars = list(self.train_program.list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
......@@ -81,50 +97,50 @@ class DuelingDQNModel(object):
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
filter_size=5,
stride=1,
padding=2,
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
filter_size=4,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
filter_size=3,
stride=1,
padding=1,
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
flatten = fluid.layers.flatten(conv4, axis=1)
value = fluid.layers.fc(
input=flatten,
......@@ -143,24 +159,6 @@ class DuelingDQNModel(object):
advantage, dim=1, keep_dim=True))
return Q
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# The prune API is deprecated, please don't use it any more.
sync_program = sync_program._prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
......@@ -186,12 +184,14 @@ class DuelingDQNModel(object):
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program, \
feed={'state': state.astype('float32'), \
'action': action.astype('int32'), \
'reward': reward, \
'next_s': next_state.astype('float32'), \
'isOver': isOver})
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
......@@ -29,7 +29,7 @@ The average game rewards that can be obtained for the three models as the number
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ paddlepaddle-gpu>=1.0.0
+ ale_python_interface
### Install Dependencies:
......
......@@ -28,7 +28,7 @@
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ paddlepaddle-gpu>=1.0.0
+ ale_python_interface
### 下载依赖:
......
#-*- coding: utf-8 -*-
#File: utils.py
import paddle.fluid as fluid
import numpy as np
def fluid_argmax(x):
"""
Get index of max value for the last dimension
"""
_, max_index = fluid.layers.topk(x, k=1)
return max_index
def fluid_flatten(x):
"""
Flatten fluid variable along the first dimension
"""
return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])])
deeplabv3plus_xception65_initialize.params
deeplabv3plus.params
deeplabv3plus.tar.gz
*.tgz
deeplabv3plus_gn_init*
deeplabv3plus_xception65_initialize*
*.log
*.sh
output*
......@@ -20,7 +20,7 @@ cudaid=${deeplabv3plus_m:=0,1,2,3} # use 0,1,2,3 card as default
export CUDA_VISIBLE_DEVICES=$cudaid
FLAGS_benchmark=true python train.py \
--batch_size=2 \
--batch_size=8 \
--train_crop_size=769 \
--total_step=50 \
--save_weights_path=output4 \
......
DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.0.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。
DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.3.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。
## 代码结构
......@@ -38,15 +38,16 @@ data/cityscape/
# 预训练模型准备
我们为了节约更多的显存,在这里我们使用Group Norm作为我们的归一化手段。
如果需要从头开始训练模型,用户需要下载我们的初始化模型
```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz
tar -xf deeplabv3plus_xception65_initialize.tar.gz && rm deeplabv3plus_xception65_initialize.tar.gz
wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz
tar -xf deeplabv3plus_gn_init.tgz && rm deeplabv3plus_gn_init.tgz
```
如果需要最终训练模型进行fine tune或者直接用于预测,请下载我们的最终模型
```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz
tar -xf deeplabv3plus.tar.gz && rm deeplabv3plus.tar.gz
wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz
tar -xf deeplabv3plus_gn.tgz && rm deeplabv3plus_gn.tgz
```
......@@ -59,6 +60,7 @@ python ./train.py \
--batch_size=1 \
--train_crop_size=769 \
--total_step=50 \
--norm_type=gn \
--init_weights_path=$INIT_WEIGHTS_PATH \
--save_weights_path=$SAVE_WEIGHTS_PATH \
--dataset_path=$DATASET_PATH
......@@ -70,21 +72,26 @@ python train.py --help
以上命令用于测试训练过程是否正常,仅仅迭代了50次并且使用了1的batch size,如果需要复现
原论文的实验,请使用以下设置:
```
CUDA_VISIBLE_DEVICES=0 \
python ./train.py \
--batch_size=8 \
--parallel=true \
--batch_size=4 \
--parallel=True \
--norm_type=gn \
--train_crop_size=769 \
--total_step=90000 \
--init_weights_path=deeplabv3plus_xception65_initialize.params \
--save_weights_path=output/ \
--total_step=500000 \
--base_lr=0.001 \
--init_weights_path=deeplabv3plus_gn_init \
--save_weights_path=output \
--dataset_path=$DATASET_PATH
```
如果您的显存不足,可以尝试减小`batch_size`,同时等比例放大`total_step`, 缩小`base_lr`, 保证相乘的值不变,这得益于Group Norm的特性,改变 `batch_size` 并不会显著影响结果,而且能够节约更多显存, 比如您可以设置`--batch_size=2 --total_step=1000000 --base_lr=0.0005`
### 测试
执行以下命令在`Cityscape`测试数据集上进行测试:
```
python ./eval.py \
--init_weights=deeplabv3plus.params \
--init_weights_path=deeplabv3plus_gn \
--norm_type=gn \
--dataset_path=$DATASET_PATH
```
需要通过选项`--model_path`指定模型文件。测试脚本的输出的评估指标为mean IoU。
......@@ -93,15 +100,16 @@ python ./eval.py \
## 实验结果
训练完成以后,使用`eval.py`在验证集上进行测试,得到以下结果:
```
load from: ../models/deeplabv3p
load from: ../models/deeplabv3plus_gn
total number 500
step: 500, mIoU: 0.7873
step: 500, mIoU: 0.7881
```
## 其他信息
|数据集 | pretrained model | trained model | mean IoU
|---|---|---|---|
|CityScape | [deeplabv3plus_xception65_initialize.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz) | [deeplabv3plus.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz) | 0.7873 |
|数据集 | norm type | pretrained model | trained model | mean IoU
|---|---|---|---|---|
|CityScape | group norm | [deeplabv3plus_gn_init.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz) | [deeplabv3plus_gn.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz) | 0.7881 |
## 参考
......
......@@ -2,7 +2,9 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
if 'FLAGS_fraction_of_gpu_memory_to_use' not in os.environ:
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
os.environ['FLAGS_enable_parallel_graph'] = '1'
import paddle
import paddle.fluid as fluid
......@@ -12,21 +14,20 @@ from reader import CityscapeDataset
import reader
import models
import sys
import utility
parser = argparse.ArgumentParser()
add_arg = lambda *args: utility.add_arguments(*args, argparser=parser)
def add_argument(name, type, default, help):
parser.add_argument('--' + name, default=default, type=type, help=help)
def add_arguments():
add_argument('total_step', int, -1,
"Number of the step to be evaluated, -1 for full evaluation.")
add_argument('init_weights_path', str, None,
"Path of the weights to evaluate.")
add_argument('dataset_path', str, None, "Cityscape dataset path.")
add_argument('verbose', bool, False, "Print mIoU for each step if verbose.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.")
add_argument('num_classes', int, 19, "Number of classes.")
# yapf: disable
add_arg('total_step', int, -1, "Number of the step to be evaluated, -1 for full evaluation.")
add_arg('init_weights_path', str, None, "Path of the weights to evaluate.")
add_arg('dataset_path', str, None, "Cityscape dataset path.")
add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
add_arg('num_classes', int, 19, "Number of classes.")
add_arg('use_py_reader', bool, True, "Use py_reader.")
add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
#yapf: enable
def mean_iou(pred, label):
......@@ -43,7 +44,7 @@ def mean_iou(pred, label):
def load_model():
if args.init_weights_path.endswith('/'):
if os.path.isdir(args.init_weights_path):
fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp)
else:
......@@ -53,13 +54,11 @@ def load_model():
CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args()
models.clean()
models.is_train = False
models.default_norm_type = args.norm_type
deeplabv3p = models.deeplabv3p
image_shape = [1025, 2049]
......@@ -73,8 +72,15 @@ reader.default_config['shuffle'] = False
num_classes = args.num_classes
with fluid.program_guard(tp, sp):
img = fluid.layers.data(name='img', shape=[3, 0, 0], dtype='float32')
label = fluid.layers.data(name='label', shape=eval_shape, dtype='int32')
if args.use_py_reader:
py_reader = fluid.layers.py_reader(capacity=64,
shapes=[[1, 3, 0, 0], [1] + eval_shape],
dtypes=['float32', 'int32'])
img, label = fluid.layers.read_file(py_reader)
else:
img = fluid.layers.data(name='img', shape=[3, 0, 0], dtype='float32')
label = fluid.layers.data(name='label', shape=eval_shape, dtype='int32')
img = fluid.layers.resize_bilinear(img, image_shape)
logit = deeplabv3p(img)
logit = fluid.layers.resize_bilinear(logit, eval_shape)
......@@ -105,24 +111,29 @@ else:
total_step = args.total_step
batches = dataset.get_batch_generator(batch_size, total_step)
if args.use_py_reader:
py_reader.decorate_tensor_provider(lambda :[ (yield b[1],b[2]) for b in batches])
py_reader.start()
sum_iou = 0
all_correct = np.array([0], dtype=np.int64)
all_wrong = np.array([0], dtype=np.int64)
for i, imgs, labels, names in batches:
result = exe.run(tp,
feed={'img': imgs,
'label': labels},
fetch_list=[pred, miou, out_wrong, out_correct])
for i in range(total_step):
if not args.use_py_reader:
_, imgs, labels, names = next(batches)
result = exe.run(tp,
feed={'img': imgs,
'label': labels},
fetch_list=[pred, miou, out_wrong, out_correct])
else:
result = exe.run(tp,
fetch_list=[pred, miou, out_wrong, out_correct])
wrong = result[2][:-1] + all_wrong
right = result[3][:-1] + all_correct
all_wrong = wrong.copy()
all_correct = right.copy()
mp = (wrong + right) != 0
miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp])))
if args.verbose:
print('step: %s, mIoU: %s' % (i + 1, miou2))
else:
print('\rstep: %s, mIoU: %s' % (i + 1, miou2))
sys.stdout.flush()
print('step: %s, mIoU: %s' % (i + 1, miou2))
......@@ -5,6 +5,7 @@ import paddle
import paddle.fluid as fluid
import contextlib
import os
name_scope = ""
decode_channel = 48
......@@ -146,10 +147,12 @@ def bn_relu(data):
def relu(data):
return append_op_result(fluid.layers.relu(data), 'relu')
return append_op_result(
fluid.layers.relu(
data, name=name_scope + 'relu'), 'relu')
def seq_conv(input, channel, stride, filter, dilation=1, act=None):
def seperate_conv(input, channel, stride, filter, dilation=1, act=None):
with scope('depthwise'):
input = conv(
input,
......@@ -187,14 +190,14 @@ def xception_block(input,
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = seq_conv(
data = seperate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = seq_conv(
data = seperate_conv(
data,
channels[i],
strides[i],
......@@ -273,11 +276,11 @@ def encoder(input):
with scope("aspp0"):
aspp0 = bn_relu(conv(input, channel, 1, 1, groups=1, padding=0))
with scope("aspp1"):
aspp1 = seq_conv(input, channel, 1, 3, dilation=6, act=relu)
aspp1 = seperate_conv(input, channel, 1, 3, dilation=6, act=relu)
with scope("aspp2"):
aspp2 = seq_conv(input, channel, 1, 3, dilation=12, act=relu)
aspp2 = seperate_conv(input, channel, 1, 3, dilation=12, act=relu)
with scope("aspp3"):
aspp3 = seq_conv(input, channel, 1, 3, dilation=18, act=relu)
aspp3 = seperate_conv(input, channel, 1, 3, dilation=18, act=relu)
with scope("concat"):
data = append_op_result(
fluid.layers.concat(
......@@ -300,10 +303,10 @@ def decoder(encode_data, decode_shortcut):
[encode_data, decode_shortcut], axis=1)
append_op_result(encode_data, 'concat')
with scope("separable_conv1"):
encode_data = seq_conv(
encode_data = seperate_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = seq_conv(
encode_data = seperate_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu)
return encode_data
......
......@@ -9,7 +9,7 @@ import six
default_config = {
"shuffle": True,
"min_resize": 0.5,
"max_resize": 2,
"max_resize": 4,
"crop_size": 769,
}
......@@ -90,9 +90,21 @@ class CityscapeDataset:
break
if shape == -1:
return img, label, ln
random_scale = np.random.rand(1) * (self.config['max_resize'] -
self.config['min_resize']
) + self.config['min_resize']
if np.random.rand() > 0.5:
range_l = 1
range_r = self.config['max_resize']
else:
range_l = self.config['min_resize']
range_r = 1
if np.random.rand() > 0.5:
assert len(img.shape) == 3 and len(
label.shape) == 3, "{} {}".format(img.shape, label.shape)
img = img[:, :, ::-1]
label = label[:, :, ::-1]
random_scale = np.random.rand(1) * (range_r - range_l) + range_l
crop_size = int(shape / random_scale)
bb = crop_size // 2
......
......@@ -2,7 +2,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
if 'FLAGS_fraction_of_gpu_memory_to_use' not in os.environ:
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
import paddle
import paddle.fluid as fluid
......@@ -12,105 +13,95 @@ from reader import CityscapeDataset
import reader
import models
import time
import contextlib
import paddle.fluid.profiler as profiler
import utility
def add_argument(name, type, default, help):
parser.add_argument('--' + name, default=default, type=type, help=help)
def add_arguments():
add_argument('batch_size', int, 2,
"The number of images in each batch during training.")
add_argument('train_crop_size', int, 769,
"'Image crop size during training.")
add_argument('base_lr', float, 0.0001,
"The base learning rate for model training.")
add_argument('total_step', int, 90000, "Number of the training step.")
add_argument('init_weights_path', str, None,
"Path of the initial weights in paddlepaddle format.")
add_argument('save_weights_path', str, None,
"Path of the saved weights during training.")
add_argument('dataset_path', str, None, "Cityscape dataset path.")
add_argument('parallel', bool, False, "using ParallelExecutor.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.")
add_argument('num_classes', int, 19, "Number of classes.")
parser.add_argument(
'--enable_ce',
action='store_true',
help='If set, run the task with continuous evaluation logs.')
parser = argparse.ArgumentParser()
add_arg = lambda *args: utility.add_arguments(*args, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 4, "The number of images in each batch during training.")
add_arg('train_crop_size', int, 769, "Image crop size during training.")
add_arg('base_lr', float, 0.001, "The base learning rate for model training.")
add_arg('total_step', int, 500000, "Number of the training step.")
add_arg('init_weights_path', str, None, "Path of the initial weights in paddlepaddle format.")
add_arg('save_weights_path', str, None, "Path of the saved weights during training.")
add_arg('dataset_path', str, None, "Cityscape dataset path.")
add_arg('parallel', bool, True, "using ParallelExecutor.")
add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
add_arg('num_classes', int, 19, "Number of classes.")
add_arg('load_logit_layer', bool, True, "Load last logit fc layer or not. If you are training with different number of classes, you should set to False.")
add_arg('memory_optimize', bool, True, "Using memory optimizer.")
add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
add_arg('profile', bool, False, "Enable profiler.")
add_arg('use_py_reader', bool, True, "Use py reader.")
parser.add_argument(
'--enable_ce',
action='store_true',
help='If set, run the task with continuous evaluation logs. Users can ignore this agument.')
#yapf: enable
@contextlib.contextmanager
def profile_context(profile=True):
if profile:
with profiler.profiler('All', 'total', '/tmp/profile_file2'):
yield
else:
yield
def load_model():
myvars = [
x for x in tp.list_vars()
if isinstance(x, fluid.framework.Parameter) and x.name.find('logit') ==
-1
]
if args.init_weights_path.endswith('/'):
if args.num_classes == 19:
if os.path.isdir(args.init_weights_path):
load_vars = [
x for x in tp.list_vars()
if isinstance(x, fluid.framework.Parameter) and x.name.find('logit') ==
-1
]
if args.load_logit_layer:
fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp)
else:
fluid.io.load_vars(exe, dirname=args.init_weights_path, vars=myvars)
fluid.io.load_vars(exe, dirname=args.init_weights_path, vars=load_vars)
else:
if args.num_classes == 19:
fluid.io.load_params(
exe,
dirname="",
filename=args.init_weights_path,
main_program=tp)
else:
fluid.io.load_vars(
exe, dirname="", filename=args.init_weights_path, vars=myvars)
fluid.io.load_params(
exe,
dirname="",
filename=args.init_weights_path,
main_program=tp)
def save_model():
if args.save_weights_path.endswith('/'):
fluid.io.save_params(
exe, dirname=args.save_weights_path, main_program=tp)
else:
fluid.io.save_params(
exe, dirname="", filename=args.save_weights_path, main_program=tp)
assert not os.path.isfile(args.save_weights_path)
fluid.io.save_params(
exe, dirname=args.save_weights_path, main_program=tp)
def loss(logit, label):
label_nignore = (label < num_classes).astype('float32')
label = fluid.layers.elementwise_min(
label,
fluid.layers.assign(np.array(
[num_classes - 1], dtype=np.int32)))
label_nignore = fluid.layers.less_than(
label.astype('float32'),
fluid.layers.assign(np.array([num_classes], 'float32')),
force_cpu=False).astype('float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
label_nignore = fluid.layers.reshape(label_nignore, [-1, 1])
loss = fluid.layers.softmax_with_cross_entropy(logit, label)
loss = loss * label_nignore
no_grad_set.add(label_nignore.name)
no_grad_set.add(label.name)
logit = fluid.layers.softmax(logit, use_cudnn=False)
loss = fluid.layers.cross_entropy(logit, label, ignore_index=255)
label_nignore.stop_gradient = True
label.stop_gradient = True
return loss, label_nignore
def get_cards(args):
if args.enable_ce:
cards = os.environ.get('CUDA_VISIBLE_DEVICES')
num = len(cards.split(","))
return num
else:
return args.num_devices
CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args()
utility.print_arguments(args)
models.clean()
models.bn_momentum = 0.9997
models.dropout_keep_prop = 0.9
models.label_number = args.num_classes
models.default_norm_type = args.norm_type
deeplabv3p = models.deeplabv3p
sp = fluid.Program()
......@@ -133,12 +124,17 @@ weight_decay = 0.00004
base_lr = args.base_lr
total_step = args.total_step
no_grad_set = set()
with fluid.program_guard(tp, sp):
img = fluid.layers.data(
name='img', shape=[3] + image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=image_shape, dtype='int32')
if args.use_py_reader:
batch_size_each = batch_size // fluid.core.get_cuda_device_count()
py_reader = fluid.layers.py_reader(capacity=64,
shapes=[[batch_size_each, 3] + image_shape, [batch_size_each] + image_shape],
dtypes=['float32', 'int32'])
img, label = fluid.layers.read_file(py_reader)
else:
img = fluid.layers.data(
name='img', shape=[3] + image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=image_shape, dtype='int32')
logit = deeplabv3p(img)
pred = fluid.layers.argmax(logit, axis=1).astype('int32')
loss, mask = loss(logit, label)
......@@ -154,11 +150,21 @@ with fluid.program_guard(tp, sp):
lr,
momentum=0.9,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=weight_decay), )
retv = opt.minimize(loss_mean, startup_program=sp, no_grad_set=no_grad_set)
fluid.memory_optimize(
tp, print_log=False, skip_opt_set=set([pred.name, loss_mean.name]), level=1)
regularization_coeff=weight_decay))
optimize_ops, params_grads = opt.minimize(loss_mean, startup_program=sp)
# ir memory optimizer has some issues, we need to seed grad persistable to
# avoid this issue
for p,g in params_grads: g.persistable = True
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if args.memory_optimize:
build_strategy.fuse_relu_depthwise_conv = True
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
place = fluid.CPUPlace()
if args.use_gpu:
......@@ -170,47 +176,58 @@ if args.init_weights_path:
print("load from:", args.init_weights_path)
load_model()
dataset = CityscapeDataset(args.dataset_path, 'train')
dataset = reader.CityscapeDataset(args.dataset_path, 'train')
if args.parallel:
exe_p = fluid.ParallelExecutor(
use_cuda=True, loss_name=loss_mean.name, main_program=tp)
batches = dataset.get_batch_generator(batch_size, total_step)
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
loss_name=loss_mean.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
else:
binary = fluid.compiler.CompiledProgram(main)
if args.use_py_reader:
assert(batch_size % fluid.core.get_cuda_device_count() == 0)
def data_gen():
batches = dataset.get_batch_generator(
batch_size // fluid.core.get_cuda_device_count(),
total_step * fluid.core.get_cuda_device_count())
for b in batches:
yield b[1], b[2]
py_reader.decorate_tensor_provider(data_gen)
py_reader.start()
else:
batches = dataset.get_batch_generator(batch_size, total_step)
total_time = 0.0
epoch_idx = 0
train_loss = 0
for i, imgs, labels, names in batches:
epoch_idx += 1
begin_time = time.time()
prev_start_time = time.time()
if args.parallel:
retv = exe_p.run(fetch_list=[pred.name, loss_mean.name],
feed={'img': imgs,
'label': labels})
else:
retv = exe.run(tp,
feed={'img': imgs,
'label': labels},
fetch_list=[pred, loss_mean])
end_time = time.time()
total_time += end_time - begin_time
if i % 100 == 0:
print("Model is saved to", args.save_weights_path)
save_model()
print("step {:d}, loss: {:.6f}, step_time_cost: {:.3f}".format(
i, np.mean(retv[1]), end_time - prev_start_time))
# only for ce
train_loss = np.mean(retv[1])
with profile_context(args.profile):
for i in range(total_step):
epoch_idx += 1
begin_time = time.time()
prev_start_time = time.time()
if not args.use_py_reader:
_, imgs, labels, names = next(batches)
train_loss, = exe.run(binary,
feed={'img': imgs,
'label': labels}, fetch_list=[loss_mean])
else:
train_loss, = exe.run(binary, fetch_list=[loss_mean])
train_loss = np.mean(train_loss)
end_time = time.time()
total_time += end_time - begin_time
if i % 100 == 0:
print("Model is saved to", args.save_weights_path)
save_model()
print("step {:d}, loss: {:.6f}, step_time_cost: {:.3f}".format(
i, train_loss, end_time - prev_start_time))
print("Training done. Model is saved to", args.save_weights_path)
save_model()
if args.enable_ce:
gpu_num = get_cards(args)
gpu_num = fluid.core.get_cuda_device_count()
print("kpis\teach_pass_duration_card%s\t%s" %
(gpu_num, total_time / epoch_idx))
print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, train_loss))
print("Training done. Model is saved to", args.save_weights_path)
save_model()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import six
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
......@@ -44,11 +44,15 @@ def infer(args, config):
shrink, max_shrink = get_shrink(image.size[1], image.size[0])
det0 = detect_face(image, shrink)
det1 = flip_test(image, shrink)
[det2, det3] = multi_scale_test(image, max_shrink)
det4 = multi_scale_test_pyramid(image, max_shrink)
det = np.row_stack((det0, det1, det2, det3, det4))
dets = bbox_vote(det)
if args.use_gpu:
det1 = flip_test(image, shrink)
[det2, det3] = multi_scale_test(image, max_shrink)
det4 = multi_scale_test_pyramid(image, max_shrink)
det = np.row_stack((det0, det1, det2, det3, det4))
dets = bbox_vote(det)
else:
# when infer on cpu, use a simple case
dets = det0
keep_index = np.where(dets[:, 4] >= args.confs_threshold)[0]
dets = dets[keep_index, :]
......@@ -121,7 +125,7 @@ def detect_face(image, shrink):
return_numpy=False)
detection = np.array(detection)
# layout: xmin, ymin, xmax. ymax, score
if detection.shape == (1, ):
if np.prod(detection.shape) == 1:
print("No face detected")
return np.array([[0, 0, 0, 0, 0]])
det_conf = detection[:, 1]
......
......@@ -74,8 +74,8 @@ env CUDA_VISIBLE_DEVICES=0 python train.py
```
env CUDA_VISIBLE_DEVICE=0 python infer.py \
--init_model="models/1" --input="./data/inputA/*" \
--output="./output"
--init_model="checkpoints/1" --input="./data/inputA/*" \
--input_style A --output="./output"
```
训练150轮的模型预测效果如图2和图3所示:
......
......@@ -26,8 +26,10 @@ def infer(args):
data_shape = [-1, 3, 256, 256]
input = fluid.layers.data(name='input', shape=data_shape, dtype='float32')
if args.input_style == "A":
model_name = 'g_a'
fake = build_generator_resnet_9blocks(input, name="g_A")
elif args.input_style == "B":
model_name = 'g_b'
fake = build_generator_resnet_9blocks(input, name="g_B")
else:
raise "Input with style [%s] is not supported." % args.input_style
......@@ -37,7 +39,7 @@ def infer(args):
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
fluid.io.load_persistables(exe, args.init_model)
fluid.io.load_persistables(exe, args.init_model + "/" + model_name)
if not os.path.exists(args.output):
os.makedirs(args.output)
......
......@@ -3,10 +3,12 @@ import paddle.fluid as fluid
import numpy as np
import os
use_cudnn = True
# cudnn is not better when batch size is 1.
use_cudnn = False
if 'ce_mode' in os.environ:
use_cudnn = False
def cal_padding(img_size, stride, filter_size, dilation=1):
"""Calculate padding size."""
valid_filter_size = dilation * (filter_size - 1) + 1
......@@ -18,6 +20,8 @@ def cal_padding(img_size, stride, filter_size, dilation=1):
def instance_norm(input, name=None):
# TODO(lvmengsi@baidu.com): Check the accuracy when using fluid.layers.layer_norm.
# return fluid.layers.layer_norm(input, begin_norm_axis=2)
helper = fluid.layer_helper.LayerHelper("instance_norm", **locals())
dtype = helper.input_dtype()
epsilon = 1e-5
......
......@@ -17,7 +17,6 @@ import data_reader
from utility import add_arguments, print_arguments, ImagePool
from trainer import *
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
......@@ -36,7 +35,7 @@ add_arg('run_ce', bool, False, "Whether to run for model ce.")
def train(args):
max_images_num = data_reader.max_images_num()
shuffle=True
shuffle = True
if args.run_ce:
np.random.seed(10)
fluid.default_startup_program().random_seed = 90
......@@ -66,9 +65,11 @@ def train(args):
exe.run(fluid.default_startup_program())
A_pool = ImagePool()
B_pool = ImagePool()
A_reader = paddle.batch(data_reader.a_reader(shuffle=shuffle), args.batch_size)()
B_reader = paddle.batch(data_reader.b_reader(shuffle=shuffle), args.batch_size)()
A_reader = paddle.batch(
data_reader.a_reader(shuffle=shuffle), args.batch_size)()
B_reader = paddle.batch(
data_reader.b_reader(shuffle=shuffle), args.batch_size)()
if not args.run_ce:
A_test_reader = data_reader.a_test_reader()
B_test_reader = data_reader.b_test_reader()
......@@ -119,13 +120,13 @@ def train(args):
if not os.path.exists(out_path):
os.makedirs(out_path)
fluid.io.save_persistables(
exe, out_path + "/g_a", main_program=g_A_trainer.program, filename="params")
exe, out_path + "/g_a", main_program=g_A_trainer.program)
fluid.io.save_persistables(
exe, out_path + "/g_b", main_program=g_B_trainer.program, filename="params")
exe, out_path + "/g_b", main_program=g_B_trainer.program)
fluid.io.save_persistables(
exe, out_path + "/d_a", main_program=d_A_trainer.program, filename="params")
exe, out_path + "/d_a", main_program=d_A_trainer.program)
fluid.io.save_persistables(
exe, out_path + "/d_b", main_program=d_B_trainer.program, filename="params")
exe, out_path + "/d_b", main_program=d_B_trainer.program)
print("saved checkpoint to {}".format(out_path))
sys.stdout.flush()
......@@ -144,8 +145,24 @@ def train(args):
if args.init_model:
init_model()
losses=[[], []]
losses = [[], []]
t_time = 0
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
g_A_trainer_program = fluid.CompiledProgram(
g_A_trainer.program).with_data_parallel(
loss_name=g_A_trainer.g_loss_A.name, build_strategy=build_strategy)
g_B_trainer_program = fluid.CompiledProgram(
g_B_trainer.program).with_data_parallel(
loss_name=g_B_trainer.g_loss_B.name, build_strategy=build_strategy)
d_B_trainer_program = fluid.CompiledProgram(
d_B_trainer.program).with_data_parallel(
loss_name=d_B_trainer.d_loss_B.name, build_strategy=build_strategy)
d_A_trainer_program = fluid.CompiledProgram(
d_A_trainer.program).with_data_parallel(
loss_name=d_A_trainer.d_loss_A.name, build_strategy=build_strategy)
for epoch in range(args.epoch):
batch_id = 0
for i in range(max_images_num):
......@@ -158,7 +175,7 @@ def train(args):
s_time = time.time()
# optimize the g_A network
g_A_loss, fake_B_tmp = exe.run(
g_A_trainer.program,
g_A_trainer_program,
fetch_list=[g_A_trainer.g_loss_A, g_A_trainer.fake_B],
feed={"input_A": tensor_A,
"input_B": tensor_B})
......@@ -167,14 +184,14 @@ def train(args):
# optimize the d_B network
d_B_loss = exe.run(
d_B_trainer.program,
d_B_trainer_program,
fetch_list=[d_B_trainer.d_loss_B],
feed={"input_B": tensor_B,
"fake_pool_B": fake_pool_B})[0]
# optimize the g_B network
g_B_loss, fake_A_tmp = exe.run(
g_B_trainer.program,
g_B_trainer_program,
fetch_list=[g_B_trainer.g_loss_B, g_B_trainer.fake_A],
feed={"input_A": tensor_A,
"input_B": tensor_B})
......@@ -183,16 +200,16 @@ def train(args):
# optimize the d_A network
d_A_loss = exe.run(
d_A_trainer.program,
d_A_trainer_program,
fetch_list=[d_A_trainer.d_loss_A],
feed={"input_A": tensor_A,
"fake_pool_A": fake_pool_A})[0]
batch_time = time.time() - s_time
t_time += batch_time
print("epoch{}; batch{}; g_A_loss: {}; d_B_loss: {}; g_B_loss: {}; d_A_loss: {}; "
"Batch_time_cost: {:.2f}".format(
epoch, batch_id, g_A_loss[0], d_B_loss[0], g_B_loss[0],
d_A_loss[0], batch_time))
print(
"epoch{}; batch{}; g_A_loss: {}; d_B_loss: {}; g_B_loss: {}; d_A_loss: {}; "
"Batch_time_cost: {:.2f}".format(epoch, batch_id, g_A_loss[
0], d_B_loss[0], g_B_loss[0], d_A_loss[0], batch_time))
losses[0].append(g_A_loss[0])
losses[1].append(d_A_loss[0])
sys.stdout.flush()
......
......@@ -10,23 +10,28 @@ This is a simple demonstration of re-implementation in [PaddlePaddle.Fluid](http
## Requirements
- Python == 2.7
- PaddlePaddle >= 1.0
- PaddlePaddle >= 1.1.0
- opencv-python >= 3.3
- tqdm >= 4.25
## Environment
The code is developed and tested under 4 Tesla K40 GPUS cards on CentOS with installed CUDA-9.2/8.0 and cuDNN-7.1.
## Known Issues
- The model does not converge with large batch\_size (e.g. = 32) on Tesla P40 / V100 / P100 GPUS cards, because PaddlePaddle uses the batch normalization function of cuDNN. Changing batch\_size into 1 image on each card during training will ease this problem, but not sure the performance. The issue can be tracked at [here](https://github.com/PaddlePaddle/Paddle/issues/14580).
The code is developed and tested under 4 Tesla K40/P40 GPUS cards on CentOS with installed CUDA-9.2/8.0 and cuDNN-7.1.
## Results on MPII Val
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models |
| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:|
| 383x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - |
| 383x384\_pose\_resnet\_50 in Fluid | 96.248 | 95.346 | 89.807 | 84.873 | 88.298 | 83.679 | 78.649 | 88.767 | 37.374 | [`link`](http://paddlemodels.bj.bcebos.com/pose/pose-resnet-50-384x384-mpii.tar.gz) |
| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - |
| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) |
| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - |
| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) |
## Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models |
| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:|
| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - |
| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) |
| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - |
| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) |
### Notes:
......@@ -77,16 +82,16 @@ python2 setup.py install --user
### Perform Validating
Downloading the checkpoints of Pose-ResNet-50 trained on MPII dataset from [here](http://paddlemodels.bj.bcebos.com/pose/pose-resnet-50-384x384-mpii.tar.gz). Extract it into the folder `checkpoints` under the directory root of this repo. Then run
Downloading the checkpoints of Pose-ResNet-50 trained on MPII dataset from [here](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz). Extract it into the folder `checkpoints` under the directory root of this repo. Then run
```bash
python2 val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384'
```
### Perform Training
```bash
python2 train.py --dataset 'mpii' # or coco
python train.py --dataset 'mpii' # or coco
```
**Note**: Configurations for training are aggregated in the `lib/mpii_reader.py` and `lib/coco_reader.py`.
......@@ -96,10 +101,10 @@ python2 train.py --dataset 'mpii' # or coco
Put the images into the folder `test` under the directory root of this repo. Then run
```bash
python2 test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
```
If there are multiple persons in images, detectors such as [Faster R-CNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn), [SSD](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/object_detection) or others should be used first to crop them out. Because the simple baseline for human pose estimation is a top-down method.
If there are multiple persons in images, detectors such as [Faster R-CNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn), [SSD](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/object_detection) or others should be used first to crop them out. Because the simple baseline for human pose estimation is a top-down method.
## Reference
......
# 关键点检测(Simple Baselines for Human Pose Estimation)
## 介绍
本目录包含了对论文[Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18)的复现.
![demo](demo.gif)
> **演示视频**: *Bruno Mars - That’s What I Like [官方视频]*.
## 环境依赖
本目录下的代码均在4卡Tesla K40/P40 GPU,CentOS系统,CUDA-9.2/8.0,cuDNN-7.1环境下测试运行无误
- Python == 2.7
- PaddlePaddle >= 1.1.0
- opencv-python >= 3.3
## MPII Val结果
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models |
| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:|
| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - |
| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) |
| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - |
| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) |
## COCO val2017结果(使用的检测器在COCO val2017数据集上AP为56.4)
| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models |
| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:|
| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - |
| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) |
| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - |
| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) |
### 说明
- 使用Flip test
- 对当前模型结果并没有进行调参选择,使用下面相关实验配置训练后,取最后一个epoch后的模型作为最终模型,即可得到上述实验结果
## 开始
### 数据准备和预训练模型
- 安照[提示](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation)进行数据准备
- 下载预训练好的ResNet-50
```bash
wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar
```
下载完成后,将模型解压、放入到根目录下的'pretrained'文件夹中,默认文件路径树为:
```
${根目录}
`-- pretrained
`-- resnet_50
|-- 115
`-- data
`-- coco
|-- annotations
|-- images
`-- mpii
|-- annot
|-- images
```
### 安装 [COCOAPI](https://github.com/cocodataset/cocoapi)
```bash
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```
### 模型验证(COCO或MPII)
下载COCO/MPII预训练模型(见上表最后一列所附链接),保存到根目录下的'checkpoints'文件夹中,运行:
```bash
python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384'
```
### 模型训练
```bash
python train.py --dataset 'mpii' # or coco
```
**说明** 详细参数配置已保存到`lib/mpii_reader.py``lib/coco_reader.py`文件中,通过设置dataset来选择使用具体的参数配置
### 模型测试(任意图片,使用上述COCO或MPII预训练好的模型)
将测试图片放入根目录下的'test'文件夹中,执行
```bash
python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
```
## 引用
- Simple Baselines for Human Pose Estimation and Tracking in PyTorch [`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation)
......@@ -216,7 +216,7 @@ def data_augmentation(sample, is_train):
joints_vis = sample['joints_3d_vis']
c = sample['center']
s = sample['scale']
# score = sample['score'] if 'score' in sample else 1
score = sample['score'] if 'score' in sample else 1
# imgnum = sample['imgnum'] if 'imgnum' in sample else ''
r = 0
......@@ -261,7 +261,7 @@ def data_augmentation(sample, is_train):
if is_train:
return input, target, target_weight
else:
return input, target, target_weight, c, s
return input, target, target_weight, c, s, score, image_file
# Create a reader
def _reader_creator(root, image_set, shuffle=False, is_train=False, use_gt_bbox=False):
......@@ -316,7 +316,14 @@ def train():
return pop
def valid():
reader, mapper = _reader_creator(cfg.DATAROOT, 'val', shuffle=False, is_train=False)
reader, mapper = _reader_creator(cfg.DATAROOT, 'val', shuffle=False, is_train=False, use_gt_bbox=True)
def pop():
for i, x in enumerate(reader()):
yield mapper(x)
return pop
def test():
reader, mapper = _reader_creator(cfg.DATAROOT, 'test', shuffle=False, is_train=False, use_gt_bbox=True)
def pop():
for i, x in enumerate(reader()):
yield mapper(x)
......
......@@ -119,7 +119,7 @@ def test_data_augmentation(sample):
image_file = sample['image']
filename = sample['filename'] if 'filename' in sample else ''
file_id = int(filename.split('.')[0].split('_')[1])
file_id = int(filename.split('.')[0])
input = cv2.imread(
image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
......@@ -174,19 +174,19 @@ def _reader_creator(root, image_set, shuffle=False, is_train=False):
joints_3d_vis[:, 1] = joints_vis[:]
yield dict(
image = os.path.join(cfg.DATAROOT, cfg.IMAGEDIR, image_name),
center = c,
scale = s,
joints_3d = joints_3d,
joints_3d_vis = joints_3d_vis,
filename = image_name,
test_mode = False,
imagenum = 0)
image=os.path.join(cfg.DATAROOT, cfg.IMAGEDIR, image_name),
center=c,
scale=s,
joints_3d=joints_3d,
joints_3d_vis=joints_3d_vis,
filename=image_name,
test_mode=False,
imagenum=0)
else:
fold = 'test'
for img_name in os.listdir(fold):
yield dict(image = os.path.join(fold, img_name),
filename = img_name)
yield dict(image=os.path.join(fold, img_name),
filename=img_name)
if not image_set == 'test':
mapper = functools.partial(data_augmentation, is_train=is_train)
......
......@@ -23,7 +23,7 @@ import paddle.fluid as fluid
__all__ = ["ResNet", "ResNet50", "ResNet101", "ResNet152"]
# Global parameters
BN_MOMENTUM = 0.1
BN_MOMENTUM = 0.9
class ResNet():
def __init__(self, layers=50, kps_num=16, test_mode=False):
......
......@@ -22,7 +22,6 @@ import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from tqdm import tqdm
from lib import pose_resnet
from utils.transforms import flip_back
from utils.utility import *
......@@ -34,37 +33,24 @@ add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('batch_size', int, 32, "Minibatch size.")
add_arg('dataset', str, 'mpii', "Dataset")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('num_epochs', int, 140, "Number of epochs.")
add_arg('total_images', int, 144406, "Training image number.")
add_arg('kp_dim', int, 16, "Class number.")
add_arg('model_save_dir', str, "output", "Model save directory")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
add_arg('lr', float, 0.001, "Set learning rate.")
add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.")
add_arg('flip_test', bool, True, "Flip test")
add_arg('shift_heatmap', bool, True, "Shift heatmap")
add_arg('post_process', bool, False, "post process")
# yapf: enable
FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
def test(args):
import lib.mpii_reader as reader
if args.dataset == 'coco':
import lib.coco_reader as reader
IMAGE_SIZE = [288, 384]
# HEATMAP_SIZE = [72, 96]
FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
args.kp_dim = 17
args.total_images = 144406 # 149813
elif args.dataset == 'mpii':
import lib.mpii_reader as reader
IMAGE_SIZE = [384, 384]
# HEATMAP_SIZE = [96, 96]
FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
args.kp_dim = 16
args.total_images = 2958 # validation
else:
raise ValueError('The dataset {} is not supported yet.'.format(args.dataset))
......@@ -80,15 +66,6 @@ def test(args):
# Output
output = model.net(input=image, target=None, target_weight=None)
# Parameters from model and arguments
params = {}
params["total_images"] = args.total_images
params["lr"] = args.lr
params["num_epochs"] = args.num_epochs
params["learning_strategy"] = {}
params["learning_strategy"]["batch_size"] = args.batch_size
params["learning_strategy"]["name"] = args.lr_strategy
if args.with_mem_opt:
fluid.memory_optimize(fluid.default_main_program(),
skip_opt_set=[output.name])
......@@ -97,13 +74,6 @@ def test(args):
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
args.pretrained_model = './pretrained/resnet_50/115'
if args.pretrained_model:
def if_exist(var):
exist_flag = os.path.exists(os.path.join(args.pretrained_model, var.name))
return exist_flag
fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
if args.checkpoint is not None:
fluid.io.load_persistables(exe, args.checkpoint)
......@@ -113,12 +83,12 @@ def test(args):
test_exe = fluid.ParallelExecutor(
use_cuda=True if args.use_gpu else False,
main_program=fluid.default_main_program().clone(for_test=False),
main_program=fluid.default_main_program().clone(for_test=True),
loss_name=None)
fetch_list = [image.name, output.name]
for batch_id, data in tqdm(enumerate(test_reader())):
for batch_id, data in enumerate(test_reader()):
num_images = len(data)
file_ids = []
......
......@@ -30,18 +30,18 @@ from utils.utility import *
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 32, "Minibatch size.")
add_arg('dataset', str, 'mpii', "Dataset")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('num_epochs', int, 140, "Number of epochs.")
add_arg('total_images', int, 144406, "Training image number.")
add_arg('kp_dim', int, 16, "Class number.")
add_arg('model_save_dir', str, "output", "Model save directory")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
add_arg('lr', float, 0.001, "Set learning rate.")
add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.")
add_arg('batch_size', int, 128, "Minibatch size totally.")
add_arg('dataset', str, 'mpii', "Dataset, valid value: mpii, coco")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('num_epochs', int, 140, "Number of epochs.")
add_arg('total_images', int, 144406, "Training image number.")
add_arg('kp_dim', int, 16, "Class number.")
add_arg('model_save_dir', str, "output", "Model save directory")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, "pretrained/resnet_50/115", "Whether to use pretrained model.")
add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
add_arg('lr', float, 0.001, "Set learning rate.")
add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.")
# yapf: enable
def optimizer_setting(args, params):
......@@ -54,7 +54,7 @@ def optimizer_setting(args, params):
batch_size = ls["batch_size"]
step = int(total_images / batch_size + 1)
ls['epochs'] = [91, 121]
ls['epochs'] = [90, 120]
print('=> LR will be dropped at the epoch of {}'.format(ls['epochs']))
bd = [step * e for e in ls["epochs"]]
......@@ -85,7 +85,7 @@ def train(args):
elif args.dataset == 'mpii':
import lib.mpii_reader as reader
IMAGE_SIZE = [384, 384]
HEATMAP_SIZE = [96, 96]
HEATMAP_SIZE = [96, 96]
args.kp_dim = 16
args.total_images = 22246
else:
......@@ -125,7 +125,7 @@ def train(args):
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
args.pretrained_model = './pretrained/resnet_50/115'
if args.pretrained_model:
def if_exist(var):
exist_flag = os.path.exists(os.path.join(args.pretrained_model, var.name))
......
# Copyright (c) 2019-present, Baidu, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Interface for evaluation."""
class BaseEvaluator(object):
def __init__(self, root, kp_dim):
"""
:param root: the root dir of dataset
:param kp_dim: the dimension of keypoints
"""
self.root = root
self.kp_dim = kp_dim
def evaluate(self, *args, **kwargs):
"""
Need Implementation for specific task / dataset
"""
raise NotImplementedError
# Copyright (c) 2019-present, Baidu, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Interface for COCO evaluation."""
import os
import json
import numpy as np
from collections import defaultdict
from collections import OrderedDict
import pickle
from utils.base_evaluator import BaseEvaluator
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from nms.nms import oks_nms
class COCOEvaluator(BaseEvaluator):
def __init__(self, root, kp_dim=17):
"""
:param root: the root dir of dataset
:param kp_dim: the dimension of keypoints
"""
super(COCOEvaluator, self).__init__(root, kp_dim)
self.kp_dim = kp_dim
self.in_vis_thre = 0.2
self.oks_thre = 0.9
self.coco = COCO(os.path.join(root, 'annotations', 'person_keypoints_val2017.json'))
cats = [cat['name']
for cat in self.coco.loadCats(self.coco.getCatIds())]
self.classes = ['__background__'] + cats
self.num_classes = len(self.classes)
self._class_to_ind = dict(zip(self.classes, range(self.num_classes)))
self._class_to_coco_ind = dict(zip(cats, self.coco.getCatIds()))
self._coco_ind_to_class_ind = dict([(self._class_to_coco_ind[cls],
self._class_to_ind[cls])
for cls in self.classes[1:]])
def evaluate(self, preds, output_dir, all_boxes, img_path, *args, **kwargs):
"""
:param preds: the predictions to be evaluated
:param output_dir: target directory to save evaluation results
:param all_boxes: ground truth
:param img_path: paths of the original image
:return:
"""
res_folder = os.path.join(output_dir, 'results')
if not os.path.exists(res_folder):
os.makedirs(res_folder)
res_file = os.path.join(res_folder, 'keypoints_coco_results.json')
# person x (keypoints)
_kpts = []
for idx, kpt in enumerate(preds):
_kpts.append({
'keypoints': kpt,
'center': all_boxes[idx][0:2],
'scale': all_boxes[idx][2:4],
'area': all_boxes[idx][4],
'score': all_boxes[idx][5],
'image': int(img_path[idx][-16:-4])
})
# image x person x (keypoints)
kpts = defaultdict(list)
for kpt in _kpts:
kpts[kpt['image']].append(kpt)
# rescoring and oks nms
kp_dim = self.kp_dim
in_vis_thre = self.in_vis_thre
oks_thre = self.oks_thre
oks_nmsed_kpts = []
for img in kpts.keys():
img_kpts = kpts[img]
for n_p in img_kpts:
box_score = n_p['score']
kpt_score = 0
valid_num = 0
for n_jt in range(0, kp_dim):
t_s = n_p['keypoints'][n_jt][2]
if t_s > in_vis_thre:
kpt_score = kpt_score + t_s
valid_num = valid_num + 1
if valid_num != 0:
kpt_score = kpt_score / valid_num
# rescoring
n_p['score'] = kpt_score * box_score
keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))], oks_thre)
if len(keep) == 0:
oks_nmsed_kpts.append(img_kpts)
else:
oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep])
self._write_coco_keypoint_results(oks_nmsed_kpts, res_file)
info_str = self._do_python_keypoint_eval(res_file, res_folder)
name_value = OrderedDict(info_str)
return name_value, name_value['AP']
def _write_coco_keypoint_results(self, keypoints, res_file):
data_pack = [{'cat_id': self._class_to_coco_ind[cls],
'cls_ind': cls_ind,
'cls': cls,
'ann_type': 'keypoints',
'keypoints': keypoints
}
for cls_ind, cls in enumerate(self.classes) if not cls == '__background__']
results = self._coco_keypoint_results_one_category_kernel(data_pack[0])
with open(res_file, 'w') as f:
json.dump(results, f, sort_keys=True, indent=4)
try:
json.load(open(res_file))
except Exception:
content = []
with open(res_file, 'r') as f:
for line in f:
content.append(line)
content[-1] = ']'
with open(res_file, 'w') as f:
for c in content:
f.write(c)
def _coco_keypoint_results_one_category_kernel(self, data_pack):
cat_id = data_pack['cat_id']
keypoints = data_pack['keypoints']
cat_results = []
for img_kpts in keypoints:
if len(img_kpts) == 0:
continue
_key_points = np.array([img_kpts[k]['keypoints']
for k in range(len(img_kpts))])
key_points = np.zeros(
(_key_points.shape[0], self.kp_dim * 3), dtype=np.float)
for ipt in range(self.kp_dim):
key_points[:, ipt * 3 + 0] = _key_points[:, ipt, 0]
key_points[:, ipt * 3 + 1] = _key_points[:, ipt, 1]
key_points[:, ipt * 3 + 2] = _key_points[:, ipt, 2] # keypoints score.
result = [{'image_id': img_kpts[k]['image'],
'category_id': cat_id,
'keypoints': list(key_points[k]),
'score': img_kpts[k]['score'],
'center': list(img_kpts[k]['center']),
'scale': list(img_kpts[k]['scale'])
} for k in range(len(img_kpts))]
cat_results.extend(result)
return cat_results
def _do_python_keypoint_eval(self, res_file, res_folder):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'keypoints')
coco_eval.params.useSegm = None
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
stats_names = ['AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)']
info_str = []
for ind, name in enumerate(stats_names):
info_str.append((name, coco_eval.stats[ind]))
eval_file = os.path.join(res_folder, 'keypoints_val_results.pkl')
with open(eval_file, 'wb') as f:
pickle.dump(coco_eval, f, pickle.HIGHEST_PROTOCOL)
print('=> coco eval results saved to %s' % eval_file)
return info_str
# Copyright (c) 2019-present, Baidu, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Interface for building evaluator."""
from utils.coco_evaluator import COCOEvaluator
from utils.mpii_evaluator import MPIIEvaluator
evaluator_map = {
'coco': COCOEvaluator,
'mpii': MPIIEvaluator
}
def create_evaluator(dataset):
"""
:param dataset: specific dataset to be evaluated
:return:
"""
return evaluator_map[dataset]
# Copyright (c) 2019-present, Baidu, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Interface for MPII evaluation."""
import os
import numpy as np
from collections import OrderedDict
from scipy.io import loadmat, savemat
from utils.base_evaluator import BaseEvaluator
class MPIIEvaluator(BaseEvaluator):
def __init__(self, root, kp_dim=16):
"""
:param root: the root dir of dataset
:param kp_dim: the dimension of keypoints
"""
super(MPIIEvaluator, self).__init__(root, kp_dim)
self.root = root
self.kp_dim = kp_dim
self.sc_bias = 0.6
self.threshold = 0.5
def evaluate(self, preds, output_dir, *args, **kwargs):
"""
:param preds: the predictions to be evaluated
:param output_dir: target directory to save evaluation results
:return:
"""
# Convert 0-based index to 1-based index
preds = preds[:, :, 0:2] + 1.0
if output_dir:
pred_file = os.path.join(output_dir, 'pred.mat')
savemat(pred_file, mdict={'preds': preds})
gt_file = os.path.join(self.root, 'annot', 'gt_valid.mat')
gt_dict = loadmat(gt_file)
dataset_joints = gt_dict['dataset_joints']
jnt_missing = gt_dict['jnt_missing']
pos_gt_src = gt_dict['pos_gt_src']
headboxes_src = gt_dict['headboxes_src']
pos_pred_src = np.transpose(preds, [1, 2, 0])
head = np.where(dataset_joints == 'head')[1][0]
lsho = np.where(dataset_joints == 'lsho')[1][0]
lelb = np.where(dataset_joints == 'lelb')[1][0]
lwri = np.where(dataset_joints == 'lwri')[1][0]
lhip = np.where(dataset_joints == 'lhip')[1][0]
lkne = np.where(dataset_joints == 'lkne')[1][0]
lank = np.where(dataset_joints == 'lank')[1][0]
rsho = np.where(dataset_joints == 'rsho')[1][0]
relb = np.where(dataset_joints == 'relb')[1][0]
rwri = np.where(dataset_joints == 'rwri')[1][0]
rkne = np.where(dataset_joints == 'rkne')[1][0]
rank = np.where(dataset_joints == 'rank')[1][0]
rhip = np.where(dataset_joints == 'rhip')[1][0]
jnt_visible = 1 - jnt_missing
uv_error = pos_pred_src - pos_gt_src
uv_err = np.linalg.norm(uv_error, axis=1)
headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :]
headsizes = np.linalg.norm(headsizes, axis=0)
headsizes *= self.sc_bias
scale = np.multiply(headsizes, np.ones((len(uv_err), 1)))
scaled_uv_err = np.divide(uv_err, scale)
scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible)
jnt_count = np.sum(jnt_visible, axis=1)
less_than_threshold = np.multiply((scaled_uv_err <= self.threshold), jnt_visible)
PCKh = np.divide(100. * np.sum(less_than_threshold, axis=1), jnt_count)
# Save
rng = np.arange(0, 0.5 + 0.01, 0.01)
pckAll = np.zeros((len(rng), self.kp_dim))
for r in range(len(rng)):
thresh = rng[r]
less_than_threshold = np.multiply(scaled_uv_err <= thresh, jnt_visible)
pckAll[r, :] = np.divide(100. * np.sum(less_than_threshold, axis=1), jnt_count)
PCKh = np.ma.array(PCKh, mask=False)
PCKh.mask[6:8] = True
jnt_count = np.ma.array(jnt_count, mask=False)
jnt_count.mask[6:8] = True
jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)
name_value = [
('Head', PCKh[head]),
('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])),
('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])),
('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])),
('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])),
('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])),
('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])),
('Mean', np.sum(PCKh * jnt_ratio)),
('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio))
]
name_value = OrderedDict(name_value)
return name_value, name_value['Mean']
......@@ -18,36 +18,35 @@
import os
import argparse
import functools
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from collections import OrderedDict
from scipy.io import loadmat, savemat
from lib import pose_resnet
from utils.transforms import flip_back
from utils.utility import *
from utils.evaluator_builder import create_evaluator
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 32, "Minibatch size.")
add_arg('dataset', str, 'mpii', "Dataset")
add_arg('batch_size', int, 128, "Minibatch size.")
add_arg('dataset', str, 'coco', "Dataset")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('num_epochs', int, 140, "Number of epochs.")
add_arg('total_images', int, 144406, "Training image number.")
add_arg('kp_dim', int, 16, "Class number.")
add_arg('model_save_dir', str, "output", "Model save directory")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
add_arg('lr', float, 0.001, "Set learning rate.")
add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.")
add_arg('flip_test', bool, True, "Flip test")
add_arg('shift_heatmap', bool, True, "Shift heatmap")
add_arg('post_process', bool, True, "Post process")
add_arg('post_process', bool, True, "Post process")
add_arg('data_root', str, "data/coco", "Root directory of dataset")
# yapf: enable
def valid(args):
......@@ -57,14 +56,14 @@ def valid(args):
HEATMAP_SIZE = [72, 96]
FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
args.kp_dim = 17
args.total_images = 144406 # 149813
args.total_images = 6108
elif args.dataset == 'mpii':
import lib.mpii_reader as reader
IMAGE_SIZE = [384, 384]
HEATMAP_SIZE = [96, 96]
FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
args.kp_dim = 16
args.total_images = 2958 # validation
args.total_images = 2958
else:
raise ValueError('The dataset {} is not supported yet.'.format(args.dataset))
......@@ -101,10 +100,11 @@ def valid(args):
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
args.pretrained_model = './pretrained/resnet_50/115'
if args.pretrained_model:
def if_exist(var):
exist_flag = os.path.exists(os.path.join(args.pretrained_model, var.name))
if exist_flag:
print("Copy pretrianed weights from: %s" % var.name)
return exist_flag
fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
......@@ -117,7 +117,7 @@ def valid(args):
valid_exe = fluid.ParallelExecutor(
use_cuda=True if args.use_gpu else False,
main_program=fluid.default_main_program().clone(for_test=False),
main_program=fluid.default_main_program().clone(for_test=True),
loss_name=loss.name)
fetch_list = [image.name, loss.name, output.name, target.name]
......@@ -127,11 +127,18 @@ def valid(args):
idx = 0
num_samples = args.total_images
all_preds = np.zeros((num_samples, args.kp_dim, 3),
dtype=np.float32)
all_preds = np.zeros((num_samples, args.kp_dim, 3), dtype=np.float32)
all_boxes = np.zeros((num_samples, 6))
for batch_id, data in enumerate(valid_reader()):
image_path = []
for batch_id, meta in enumerate(valid_reader()):
num_images = len(meta)
data = meta
if args.dataset == 'coco':
for i in range(num_images):
image_path.append(meta[i][-1])
data[i] = data[i][:-1]
num_images = len(data)
centers = []
......@@ -198,7 +205,6 @@ def valid(args):
all_boxes[idx:idx + num_images, 2:4] = scales[:, 0:2]
all_boxes[idx:idx + num_images, 4] = np.prod(scales*200, 1)
all_boxes[idx:idx + num_images, 5] = scores
# image_path.extend(meta['image'])
idx += num_images
......@@ -210,106 +216,11 @@ def valid(args):
save_batch_heatmaps(input_image, out_heatmaps, file_name='visualization@val.jpg', normalize=True)
# Evaluate
args.DATAROOT = 'data/mpii'
args.TEST_SET = 'valid'
output_dir = ''
filenames = []
imgnums = []
image_path = []
name_values, perf_indicator = mpii_evaluate(
args, all_preds, output_dir, all_boxes, image_path,
filenames, imgnums)
output_dir = './'
evaluator = create_evaluator(args.dataset)(args.data_root, args.kp_dim)
name_values, perf_indicator = evaluator.evaluate(all_preds, output_dir, all_boxes, image_path)
print_name_value(name_values, perf_indicator)
def mpii_evaluate(cfg, preds, output_dir, *args, **kwargs):
# Convert 0-based index to 1-based index
preds = preds[:, :, 0:2] + 1.0
if output_dir:
pred_file = os.path.join(output_dir, 'pred.mat')
savemat(pred_file, mdict={'preds': preds})
if 'test' in cfg.TEST_SET:
return {'Null': 0.0}, 0.0
SC_BIAS = 0.6
threshold = 0.5
gt_file = os.path.join(cfg.DATAROOT,
'annot',
'gt_{}.mat'.format(cfg.TEST_SET))
gt_dict = loadmat(gt_file)
dataset_joints = gt_dict['dataset_joints']
jnt_missing = gt_dict['jnt_missing']
pos_gt_src = gt_dict['pos_gt_src']
headboxes_src = gt_dict['headboxes_src']
pos_pred_src = np.transpose(preds, [1, 2, 0])
head = np.where(dataset_joints == 'head')[1][0]
lsho = np.where(dataset_joints == 'lsho')[1][0]
lelb = np.where(dataset_joints == 'lelb')[1][0]
lwri = np.where(dataset_joints == 'lwri')[1][0]
lhip = np.where(dataset_joints == 'lhip')[1][0]
lkne = np.where(dataset_joints == 'lkne')[1][0]
lank = np.where(dataset_joints == 'lank')[1][0]
rsho = np.where(dataset_joints == 'rsho')[1][0]
relb = np.where(dataset_joints == 'relb')[1][0]
rwri = np.where(dataset_joints == 'rwri')[1][0]
rkne = np.where(dataset_joints == 'rkne')[1][0]
rank = np.where(dataset_joints == 'rank')[1][0]
rhip = np.where(dataset_joints == 'rhip')[1][0]
jnt_visible = 1 - jnt_missing
uv_error = pos_pred_src - pos_gt_src
uv_err = np.linalg.norm(uv_error, axis=1)
headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :]
headsizes = np.linalg.norm(headsizes, axis=0)
headsizes *= SC_BIAS
scale = np.multiply(headsizes, np.ones((len(uv_err), 1)))
scaled_uv_err = np.divide(uv_err, scale)
scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible)
jnt_count = np.sum(jnt_visible, axis=1)
less_than_threshold = np.multiply((scaled_uv_err <= threshold),
jnt_visible)
PCKh = np.divide(100.*np.sum(less_than_threshold, axis=1), jnt_count)
# Save
rng = np.arange(0, 0.5+0.01, 0.01)
pckAll = np.zeros((len(rng), cfg.kp_dim))
for r in range(len(rng)):
threshold = rng[r]
less_than_threshold = np.multiply(scaled_uv_err <= threshold,
jnt_visible)
pckAll[r, :] = np.divide(100.*np.sum(less_than_threshold, axis=1),
jnt_count)
PCKh = np.ma.array(PCKh, mask=False)
PCKh.mask[6:8] = True
jnt_count = np.ma.array(jnt_count, mask=False)
jnt_count.mask[6:8] = True
jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)
name_value = [
('Head', PCKh[head]),
('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])),
('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])),
('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])),
('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])),
('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])),
('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])),
('Mean', np.sum(PCKh * jnt_ratio)),
('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio))
]
name_value = OrderedDict(name_value)
return name_value, name_value['Mean']
# TODO: coco_evaluate()
if __name__ == '__main__':
args = parser.parse_args()
......
......@@ -103,7 +103,7 @@ python infer.py \
## 其他信息
|数据集 | pretrained model |
|---|---|
|CityScape | [Model]()[md: ] |
|CityScape | [pretrained_model](https://paddle-icnet-models.bj.bcebos.com/model_1000.tar.gz) |
## 参考
......
......@@ -155,6 +155,17 @@ class DataGenerater:
else:
return np.pad(image, ((0, pad_h), (0, pad_w)), 'constant')
def random_crop(self, im, out_shape, is_color=True):
h, w = im.shape[:2]
h_start = np.random.randint(0, h - out_shape[0] + 1)
w_start = np.random.randint(0, w - out_shape[1] + 1)
h_end, w_end = h_start + out_shape[0], w_start + out_shape[1]
if is_color:
im = im[h_start:h_end, w_start:w_end, :]
else:
im = im[h_start:h_end, w_start:w_end]
return im
def resize(self, image, label, out_size):
"""
Resize image and label by padding or cropping.
......@@ -166,8 +177,7 @@ class DataGenerater:
combined = np.concatenate((image, label), axis=2)
combined = self.padding_as(
combined, out_size[0], out_size[1], is_color=True)
combined = dataset.image.random_crop(
combined, out_size[0], is_color=True)
combined = self.random_crop(combined, out_size, is_color=True)
image = combined[:, :, 0:3]
label = combined[:, :, 3:4] + ignore_label
return image, label
......
......@@ -235,12 +235,12 @@ def proj_block(input, filter_num, padding=0, dilation=None, stride=1,
def sub_net_4(input, input_shape):
tmp = interp(input, out_shape=np.ceil(input_shape // 32))
tmp = interp(input, out_shape=(input_shape // 32))
tmp = dilation_convs(tmp)
tmp = pyramis_pooling(tmp, input_shape)
tmp = conv(tmp, 1, 1, 256, 1, 1, name="conv5_4_k1")
tmp = bn(tmp, relu=True)
tmp = interp(tmp, input_shape // 16)
tmp = interp(tmp, out_shape=np.ceil(input_shape / 16))
return tmp
......
# Image Classification and Model Zoo
Image classification, which is an important field of computer vision, is to classify an image into pre-defined labels. Recently, many researchers developed different kinds of neural networks and highly improve the classification performance. This page introduces how to do image classification with PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).
Image classification, which is an important field of computer vision, is to classify an image into pre-defined labels. Recently, many researchers developed different kinds of neural networks and highly improve the classification performance. This page introduces how to do image classification with PaddlePaddle Fluid.
---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model with flexible parameters](#training-a-model)
- [Training a model with flexible parameters](#training-a-model-with-flexible-parameters)
- [Using Mixed-Precision Training](#using-mixed-precision-training)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Supported models and performances](#supported-models)
- [Supported models and performances](#supported-models-and-performances)
## Installation
Running sample code in this directory requires PaddelPaddle Fluid v0.13.0 and later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update.
Running sample code in this directory requires PaddelPaddle Fluid v0.13.0 and later, the latest release version is recommended, If the PaddlePaddle on your device is lower than v0.13.0, please follow the instructions in [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) and make an update.
## Data preparation
......@@ -51,6 +51,8 @@ val/ILSVRC2012_val_00000005.jpeg 516
...
```
You may need to modify the path in reader.py to load data correctly.
## Training a model with flexible parameters
After data preparation, one can start the training step by:
......@@ -76,12 +78,17 @@ python train.py \
* **class_dim**: the class number of the classification task. Default: 1000.
* **image_shape**: input size of the network. Default: "3,224,224".
* **model_save_dir**: the directory to save trained model. Default: "output".
* **with_mem_opt**: whether to use memory optimization or not. Default: False.
* **with_mem_opt**: whether to use memory optimization or not. Default: True.
* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay".
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
* **model_category**: the category of models, ("models"|"models_name"). Default: "models".
* **data_dir**: the data path. Default: "./data/ILSVRC2012".
* **model_category**: the category of models, ("models"|"models_name"). Default: "models_name".
* **fp16**: whether to enable half precisioin training with fp16. Default: False.
* **scale_loss**: scale loss for fp16. Default: 1.0.
* **l2_decay**: L2_decay parameter. Default: 1e-4.
* **momentum_rate**: momentum_rate. Default: 0.9.
Or can start the training step by running the ```run.sh```.
......@@ -138,7 +145,7 @@ python train.py
```
## Evaluation
Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models) and set its path to ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models-and-performances) and set its path to ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
```
python eval.py \
--model=SE_ResNeXt50_32x4d \
......@@ -168,7 +175,6 @@ Inference is used to get prediction score or image features based on trained mod
```
python infer.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
......@@ -209,6 +215,7 @@ Models are trained by starting with learning rate ```0.1``` and decaying it by `
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
......@@ -220,6 +227,8 @@ Models are trained by starting with learning rate ```0.1``` and decaying it by `
- Released models: not specify parameter names
**NOTE: These are trained by using model_category=models**
|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
......
......@@ -79,7 +79,7 @@ python train.py \
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
* **model_category**: the category of models, ("models"|"models_name"). Default:"models".
* **model_category**: the category of models, ("models"|"models_name"). Default:"models_name".
**数据读取器说明:** 数据读取器定义在```reader.py``````reader_cv2.py```中, 一般, CV2 reader可以提高数据读取速度, reader(PIL)可以得到相对更高的精度, 在[训练阶段](#training-a-model), 默认采用的增广方式是随机裁剪与水平翻转, 而在[评估](#inference)[推断](#inference)阶段用的默认方式是中心裁剪。当前支持的数据增广方式有:
* 旋转
......@@ -164,7 +164,6 @@ Testbatch 80,loss 0.0969972759485, acc1 1.0,acc5 1.0,time 0.41 sec
```
python infer.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
......@@ -204,6 +203,7 @@ Models包括两种模型:带有参数名字的模型,和不带有参数名
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
......@@ -212,6 +212,8 @@ Models包括两种模型:带有参数名字的模型,和不带有参数名
- Released models: not specify parameter names
**注意:这是model_category = models 的预训练模型**
|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
......
# PaddlePaddle inference and training script
This directory contains configuration and instructions to run the PaddlePaddle + nGraph for a local training and inference.
# How to build PaddlePaddle framework with NGraph engine
In order to build the PaddlePaddle + nGraph engine and run proper script, follow up a few steps:
1. Install PaddlePaddle project
2. set env exports for nGraph and OpenMP
3. run the inference/training script
Currently supported models:
* ResNet50 (inference and training).
Only support Adam optimizer yet.
Short description of aforementioned steps:
## 1. Install PaddlePaddle
Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you [build from source](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/beginners_guide/install/compile/compile_Ubuntu_en.md), please use the following cmake arguments and ensure to set `-DWITH_NGRAPH=ON`.
```
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_NGRAPH=ON
```
Note: MKLDNN and MKL are required.
## 2. Set env exports for nGraph and OMP
Set the following exports needed for running nGraph:
```
export FLAGS_use_ngraph=true
export OMP_NUM_THREADS=<num_cpu_cores>
```
If multiple threads are used, you may export the following for better performance:
```
export KMP_AFFINITY=granularity=fine,compact,1,0
```
## 3. How the benchmark script might be run.
If everything built successfully, you can run command in ResNet50 nGraph session in script [run.sh](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/image_classification/run.sh) to start the benchmark job locally. You will need to uncomment the `#ResNet50 nGraph` part of script.
Above is training job using the nGraph, to run the inference job using the nGraph:
Please download the pre-trained resnet50 model from [supported models](https://github.com/PaddlePaddle/models/tree/72dcc7c1a8d5de9d19fbd65b4143bd0d661eee2c/fluid/PaddleCV/image_classification#supported-models-and-performances) for inference script.
......@@ -39,6 +39,8 @@ You can test if distributed training works on a single node before deploying to
***NOTE: for best performance, we recommend using multi-process mode, see No.3. And together with fp16.***
***NOTE: for nccl2 distributed mode, you must ensure each node train same number of samples, or set skip_unbalanced_data to 1 to do sync training.***
1. simply run `python dist_train.py` to start local training with default configuratioins.
2. for pserver mode, run `bash run_ps_mode.sh` to start 2 pservers and 2 trainers, these 2 trainers
will use GPU 0 and 1 to simulate 2 workers.
......@@ -90,4 +92,19 @@ The default resnet50 distributed training config is based on this paper: https:/
### Performance
TBD
The below figure shows fluid distributed training performances. We did these on a 4-node V100 GPU cluster,
each has 8 V100 GPU card, with total of 32 GPUs. All modes can reach the "state of the art (choose loss scale carefully when using fp16 mode)" of ResNet50 model with imagenet dataset. The Y axis in the figure shows
the images/s while the X-axis shows the number of GPUs.
<p align="center">
<img src="../images/imagenet_dist_performance.png" width=528> <br />
Performance of Multiple-GPU Training of Resnet50 on Imagenet
</p>
The second figure shows speed-ups when using multiple GPUs according to the above figure.
<p align="center">
<img src="../images/imagenet_dist_speedup.png" width=528> <br />
Speed-ups of Multiple-GPU Training of Resnet50 on Imagenet
</p>
import paddle.fluid as fluid
import numpy as np
def copyback_repeat_bn_params(main_prog):
repeat_vars = set()
......
......@@ -46,7 +46,7 @@ def parse_args():
add_arg('class_dim', int, 1000, "Class number.")
add_arg('image_shape', str, "3,224,224", "input image size")
add_arg('model_save_dir', str, "output", "model save directory")
add_arg('with_mem_opt', bool, False, "Whether to use memory optimization or not.")
add_arg('with_mem_opt', bool, False, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
add_arg('lr', float, 0.1, "set learning rate.")
......@@ -57,6 +57,7 @@ def parse_args():
add_arg('model_category', str, "models", "Whether to use models_name or not, valid value:'models','models_name'" )
add_arg('fp16', bool, False, "Enable half precision training with fp16." )
add_arg('scale_loss', float, 1.0, "Scale loss for fp16." )
add_arg('reduce_master_grad', bool, False, "Whether to allreduce fp32 gradients." )
# for distributed
add_arg('update_method', str, "local", "Can be local, pserver, nccl2.")
add_arg('multi_batch_repeat', int, 1, "Batch merge repeats.")
......@@ -66,6 +67,7 @@ def parse_args():
add_arg('async_mode', bool, False, "Async distributed training, only for pserver mode.")
add_arg('reduce_strategy', str, "allreduce", "Choose from reduce or allreduce.")
add_arg('skip_unbalanced_data', bool, False, "Skip data not if data not balanced on nodes.")
add_arg('enable_sequential_execution', bool, False, "Skip data not if data not balanced on nodes.")
# yapf: enable
args = parser.parse_args()
return args
......@@ -130,7 +132,7 @@ def build_program(is_train, main_prog, startup_prog, args):
if os.getenv("FLAGS_selected_gpus"):
# in multi process mode, "trainer_count" will be total devices
# in the whole cluster, and we need to scale num_of nodes.
end_lr *= device_num_per_worker
end_lr /= device_num_per_worker
total_images = args.total_images / trainer_count
step = int(total_images / (args.batch_size * args.multi_batch_repeat) + 1)
......@@ -158,7 +160,8 @@ def build_program(is_train, main_prog, startup_prog, args):
if args.fp16:
params_grads = optimizer.backward(avg_cost)
master_params_grads = utils.create_master_params_grads(
params_grads, main_prog, startup_prog, args.scale_loss)
params_grads, main_prog, startup_prog, args.scale_loss,
reduce_master_grad = args.reduce_master_grad)
optimizer.apply_gradients(master_params_grads)
utils.master_param_to_train_param(master_params_grads, params_grads, main_prog)
else:
......@@ -187,6 +190,24 @@ def test_single(exe, test_prog, args, pyreader, fetch_list):
test_avg_loss = np.mean(np.array(test_losses))
return test_avg_loss, np.mean(acc1.eval()), np.mean(acc5.eval())
def test_parallel(exe, test_prog, args, pyreader, fetch_list):
acc1 = fluid.metrics.Accuracy()
acc5 = fluid.metrics.Accuracy()
test_losses = []
pyreader.start()
while True:
try:
acc_rets = exe.run(fetch_list=fetch_list)
test_losses.append(acc_rets[0])
acc1.update(value=np.array(acc_rets[1]), weight=args.batch_size)
acc5.update(value=np.array(acc_rets[2]), weight=args.batch_size)
except fluid.core.EOFException:
pyreader.reset()
break
test_avg_loss = np.mean(np.array(test_losses))
return test_avg_loss, np.mean(acc1.eval()), np.mean(acc5.eval())
def run_pserver(train_prog, startup_prog):
server_exe = fluid.Executor(fluid.CPUPlace())
server_exe.run(startup_prog)
......@@ -221,14 +242,17 @@ def train_parallel(args):
append_bn_repeat_init_op(train_prog, startup_prog, args.multi_batch_repeat)
startup_exe.run(startup_prog)
if args.checkpoint:
fluid.io.load_persistables(startup_exe, args.checkpoint, main_program=train_prog)
strategy = fluid.ExecutionStrategy()
strategy.num_threads = args.num_threads
build_strategy = fluid.BuildStrategy()
if args.multi_batch_repeat > 1:
pass_builder = build_strategy._finalize_strategy_and_create_passes()
mypass = pass_builder.insert_pass(
len(pass_builder.all_passes()) - 2, "multi_batch_merge_pass")
mypass.set_int("num_repeats", args.multi_batch_repeat)
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
build_strategy.enable_sequential_execution = bool(args.enable_sequential_execution)
if args.reduce_strategy == "reduce":
build_strategy.reduce_strategy = fluid.BuildStrategy(
).ReduceStrategy.Reduce
......@@ -245,6 +269,15 @@ def train_parallel(args):
else:
num_trainers = args.dist_env["num_trainers"]
trainer_id = args.dist_env["trainer_id"]
# Set this to let build_strategy to add "allreduce_deps_pass" automatically
build_strategy.num_trainers = num_trainers
build_strategy.trainer_id = trainer_id
if args.multi_batch_repeat > 1:
pass_builder = build_strategy._finalize_strategy_and_create_passes()
mypass = pass_builder.insert_pass(
len(pass_builder.all_passes()) - 4, "multi_batch_merge_pass")
mypass.set("num_repeats", args.multi_batch_repeat)
exe = fluid.ParallelExecutor(
True,
......@@ -255,6 +288,14 @@ def train_parallel(args):
num_trainers=num_trainers,
trainer_id=trainer_id)
# Uncomment below lines to use ParallelExecutor to run test.
# test_exe = fluid.ParallelExecutor(
# True,
# main_program=test_prog,
# share_vars_from=exe,
# scope=fluid.global_scope().new_scope()
# )
over_all_start = time.time()
fetch_list = [train_cost.name, train_acc1.name, train_acc5.name]
steps_per_pass = args.total_images / args.batch_size / args.dist_env["num_trainers"]
......@@ -270,8 +311,8 @@ def train_parallel(args):
if batch_id % 30 == 0:
fetch_ret = exe.run(fetch_list)
fetched_data = [np.mean(np.array(d)) for d in fetch_ret]
print("Pass %d, batch %d, loss %s, acc1: %s, acc5: %s, avg batch time %.4f" %
(pass_id, batch_id, fetched_data[0], fetched_data[1],
print("Pass [%d/%d], batch [%d/%d], loss %s, acc1: %s, acc5: %s, avg batch time %.4f" %
(pass_id, args.num_epochs, batch_id, steps_per_pass, fetched_data[0], fetched_data[1],
fetched_data[2], (time.time()-start_time) / batch_id))
else:
fetch_ret = exe.run([])
......@@ -287,15 +328,21 @@ def train_parallel(args):
print_train_time(start_time, time.time(), num_samples)
train_pyreader.reset()
if pass_id > args.start_test_pass:
if pass_id >= args.start_test_pass:
if args.multi_batch_repeat > 1:
copyback_repeat_bn_params(train_prog)
test_fetch_list = [test_cost.name, test_acc1.name, test_acc5.name]
test_ret = test_single(startup_exe, test_prog, args, test_pyreader,test_fetch_list)
# NOTE: switch to below line if you use ParallelExecutor to run test.
# test_ret = test_parallel(test_exe, test_prog, args, test_pyreader,test_fetch_list)
print("Pass: %d, Test Loss %s, test acc1: %s, test acc5: %s\n" %
(pass_id, test_ret[0], test_ret[1], test_ret[2]))
model_path = os.path.join(args.model_save_dir + '/' + args.model,
str(pass_id))
print("saving model to ", model_path)
if not os.path.isdir(model_path):
os.makedirs(model_path)
fluid.io.save_persistables(startup_exe, model_path, main_program=train_prog)
startup_exe.close()
print("total train time: ", time.time() - over_all_start)
......@@ -324,3 +371,4 @@ def main():
if __name__ == "__main__":
main()
......@@ -7,8 +7,6 @@ import time
import sys
import paddle
import paddle.fluid as fluid
#import models
import models_name as models
#import reader_cv2 as reader
import reader as reader
import argparse
......@@ -26,10 +24,21 @@ add_arg('class_dim', int, 1000, "Class number.")
add_arg('image_shape', str, "3,224,224", "Input image size")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid value:'models','models_name'." )
# yapf: enable
model_list = [m for m in dir(models) if "__" not in m]
def set_models(model_category):
global models
assert model_category in ["models", "models_name"
], "{} is not in lists: {}".format(
model_category, ["models", "models_name"])
if model_category == "models_name":
import models_name as models
else:
import models as models
def eval(args):
......@@ -40,6 +49,7 @@ def eval(args):
with_memory_optimization = args.with_mem_opt
image_shape = [int(m) for m in args.image_shape.split(",")]
model_list = [m for m in dir(models) if "__" not in m]
assert model_name in model_list, "{} is not in lists: {}".format(args.model,
model_list)
......@@ -63,11 +73,11 @@ def eval(args):
acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
else:
out = model.net(input=image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label)
cost, pred = fluid.layers.softmax_with_cross_entropy(
out, label, return_softmax=True)
avg_cost = fluid.layers.mean(x=cost)
acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
acc_top1 = fluid.layers.accuracy(input=pred, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=pred, label=label, k=5)
test_program = fluid.default_main_program().clone(for_test=True)
......@@ -125,6 +135,7 @@ def eval(args):
def main():
args = parser.parse_args()
print_arguments(args)
set_models(args.model_category)
eval(args)
......
# PaddlePaddle Fast ImageNet Training
PaddlePaddle Fast ImageNet can train ImageNet dataset with fewer epochs. We implemented the it according to the blog
[Now anyone can train Imagenet in 18 minutes](https://www.fast.ai/2018/08/10/fastai-diu-imagenet/) which published on the [fast.ai] website.
PaddlePaddle Fast ImageNet using the dynmiac batch size, dynamic image size, rectangular images validation and etc... so that the Fast ImageNet can achieve the baseline
(acc1: 75%, acc5: 93%) by 27 epochs on 8 * V100 GPUs.
## Experiment
1. Prepare the training data, resize the images to 160 and 352 using `resize.py`, the prepared data folder should look like:
``` text
`-ImageNet
|-train
|-validation
|-160
|-train
`-validation
`-352
|-train
`-validation
```
1. Install the requirements by `pip install -r requirement.txt`.
1. Launch the training job: `python train.py --data_dir /data/imagenet`
1. Learning curve, we launch the training job on V100 GPU card:
<p align="center">
<img src="src/acc_curve.png" hspace='10' /> <br />
</p>
from PIL import Image
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from functools import partial
import multiprocessing
cpus = multiprocessing.cpu_count()
cpus = min(36,cpus)
PATH = Path('/data/imagenet2')
DEST = Path('/data/imagenet2/sz')
def mkdir(path):
if not path.exists():
path.mkdir()
mkdir(DEST)
szs = (160, 352)
def resize_img(p, im, fn, sz):
w,h = im.size
ratio = min(h/sz,w/sz)
im = im.resize((int(w/ratio), int(h/ratio)), resample=Image.BICUBIC)
new_fn = DEST/str(sz)/fn.relative_to(PATH)
mkdir(new_fn.parent())
im.save(new_fn)
def resizes(p, fn):
im = Image.open(fn)
for sz in szs: resize_img(p, im, fn, sz)
def resize_imgs(p):
files = p.glob('*/*.jpeg')
with ProcessPoolExecutor(cpus) as e: e.map(partial(resizes, p), files)
for sz in szs:
ssz=str(sz)
mkdir((DEST/ssz))
for ds in ('validation','train'): mkdir((DEST/ssz/ds))
for ds in ('train',): mkdir((DEST/ssz/ds))
for ds in ("validation", "train"):
print(PATH/ds)
resize_imgs(PATH/ds)
\ No newline at end of file
import os
import numpy as np
import math
import random
import torch
import torch.utils.data
from torch.utils.data.distributed import DistributedSampler
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data.sampler import Sampler
import torchvision
import pickle
from tqdm import tqdm
import time
import multiprocessing
FINISH_EVENT = "FINISH_EVENT"
class PaddleDataLoader(object):
def __init__(self, torch_dataset, indices=None, concurrent=24, queue_size=3072, shuffle=True, shuffle_seed=0):
self.torch_dataset = torch_dataset
self.data_queue = multiprocessing.Queue(queue_size)
self.indices = indices
self.concurrent = concurrent
self.shuffle = shuffle
self.shuffle_seed=shuffle_seed
def _worker_loop(self, dataset, worker_indices, worker_id):
cnt = 0
for idx in worker_indices:
cnt += 1
img, label = self.torch_dataset[idx]
img = np.array(img).astype('uint8').transpose((2, 0, 1))
self.data_queue.put((img, label))
print("worker: [%d] read [%d] samples. " % (worker_id, cnt))
self.data_queue.put(FINISH_EVENT)
def reader(self):
def _reader_creator():
worker_processes = []
total_img = len(self.torch_dataset)
print("total image: ", total_img)
if self.shuffle:
self.indices = [i for i in xrange(total_img)]
random.seed(self.shuffle_seed)
random.shuffle(self.indices)
print("shuffle indices: %s ..." % self.indices[:10])
imgs_per_worker = int(math.ceil(total_img / self.concurrent))
for i in xrange(self.concurrent):
start = i * imgs_per_worker
end = (i + 1) * imgs_per_worker if i != self.concurrent - 1 else None
sliced_indices = self.indices[start:end]
w = multiprocessing.Process(
target=self._worker_loop,
args=(self.torch_dataset, sliced_indices, i)
)
w.daemon = True
w.start()
worker_processes.append(w)
finish_workers = 0
worker_cnt = len(worker_processes)
while finish_workers < worker_cnt:
sample = self.data_queue.get()
if sample == FINISH_EVENT:
finish_workers += 1
else:
yield sample
return _reader_creator
def train(traindir, sz, min_scale=0.08, shuffle_seed=0):
train_tfms = [
transforms.RandomResizedCrop(sz, scale=(min_scale, 1.0)),
transforms.RandomHorizontalFlip()
]
train_dataset = datasets.ImageFolder(traindir, transforms.Compose(train_tfms))
return PaddleDataLoader(train_dataset, shuffle_seed=shuffle_seed).reader()
def test(valdir, bs, sz, rect_val=False):
if rect_val:
idx_ar_sorted = sort_ar(valdir)
idx_sorted, _ = zip(*idx_ar_sorted)
idx2ar = map_idx2ar(idx_ar_sorted, bs)
ar_tfms = [transforms.Resize(int(sz* 1.14)), CropArTfm(idx2ar, sz)]
val_dataset = ValDataset(valdir, transform=ar_tfms)
return PaddleDataLoader(val_dataset, concurrent=1, indices=idx_sorted, shuffle=False).reader()
val_tfms = [transforms.Resize(int(sz* 1.14)), transforms.CenterCrop(sz)]
val_dataset = datasets.ImageFolder(valdir, transforms.Compose(val_tfms))
return PaddleDataLoader(val_dataset).reader()
class ValDataset(datasets.ImageFolder):
def __init__(self, root, transform=None, target_transform=None):
super(ValDataset, self).__init__(root, transform, target_transform)
def __getitem__(self, index):
path, target = self.imgs[index]
sample = self.loader(path)
if self.transform is not None:
for tfm in self.transform:
if isinstance(tfm, CropArTfm):
sample = tfm(sample, index)
else:
sample = tfm(sample)
if self.target_transform is not None:
target = self.target_transform(target)
return sample, target
class CropArTfm(object):
def __init__(self, idx2ar, target_size):
self.idx2ar, self.target_size = idx2ar, target_size
def __call__(self, img, idx):
target_ar = self.idx2ar[idx]
if target_ar < 1:
w = int(self.target_size / target_ar)
size = (w // 8 * 8, self.target_size)
else:
h = int(self.target_size * target_ar)
size = (self.target_size, h // 8 * 8)
return torchvision.transforms.functional.center_crop(img, size)
def sort_ar(valdir):
idx2ar_file = valdir + '/../sorted_idxar.p'
if os.path.isfile(idx2ar_file):
return pickle.load(open(idx2ar_file, 'rb'))
print('Creating AR indexes. Please be patient this may take a couple minutes...')
val_dataset = datasets.ImageFolder(valdir) # AS: TODO: use Image.open instead of looping through dataset
sizes = [img[0].size for img in tqdm(val_dataset, total=len(val_dataset))]
idx_ar = [(i, round(s[0] * 1.0/ s[1], 5)) for i, s in enumerate(sizes)]
sorted_idxar = sorted(idx_ar, key=lambda x: x[1])
pickle.dump(sorted_idxar, open(idx2ar_file, 'wb'))
print('Done')
return sorted_idxar
def chunks(l, n):
n = max(1, n)
return (l[i:i + n] for i in range(0, len(l), n))
def map_idx2ar(idx_ar_sorted, batch_size):
ar_chunks = list(chunks(idx_ar_sorted, batch_size))
idx2ar = {}
for chunk in ar_chunks:
idxs, ars = list(zip(*chunk))
mean = round(np.mean(ars), 5)
for idx in idxs:
idx2ar[idx] = mean
return idx2ar
......@@ -7,7 +7,6 @@ import time
import sys
import paddle
import paddle.fluid as fluid
import models
import reader
import argparse
import functools
......@@ -23,9 +22,19 @@ add_arg('image_shape', str, "3,224,224", "Input image size")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid value:'models','models_name'." )
# yapf: enable
model_list = [m for m in dir(models) if "__" not in m]
def set_models(model_category):
global models
assert model_category in ["models", "models_name"
], "{} is not in lists: {}".format(
model_category, ["models", "models_name"])
if model_category == "models_name":
import models_name as models
else:
import models as models
def infer(args):
......@@ -35,7 +44,7 @@ def infer(args):
pretrained_model = args.pretrained_model
with_memory_optimization = args.with_mem_opt
image_shape = [int(m) for m in args.image_shape.split(",")]
model_list = [m for m in dir(models) if "__" not in m]
assert model_name in model_list, "{} is not in lists: {}".format(args.model,
model_list)
......@@ -85,6 +94,7 @@ def infer(args):
def main():
args = parser.parse_args()
print_arguments(args)
set_models(args.model_category)
infer(args)
......
......@@ -9,3 +9,4 @@ from .inception_v4 import InceptionV4
from .se_resnext import SE_ResNeXt50_32x4d, SE_ResNeXt101_32x4d, SE_ResNeXt152_32x4d
from .dpn import DPN68, DPN92, DPN98, DPN107, DPN131
from .shufflenet_v2 import ShuffleNetV2_x0_5, ShuffleNetV2_x1_0, ShuffleNetV2_x1_5, ShuffleNetV2_x2_0
from .fast_imagenet import FastImageNet
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import numpy as np
import time
import os
import math
import cProfile, pstats, StringIO
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
import paddle.fluid.profiler as profiler
import utils
__all__ = ["FastImageNet"]
class FastImageNet():
def __init__(self, layers=50, is_train=True):
self.layers = layers
self.is_train = is_train
def net(self, input, class_dim=1000, img_size=224, is_train=True):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1)
pool_size = int(img_size / 32)
pool = fluid.layers.pool2d(
input=conv, pool_size=pool_size, pool_type='avg', global_pooling=True)
out = fluid.layers.fc(input=pool,
size=class_dim,
act=None,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.NormalInitializer(0.0, 0.01),
regularizer=fluid.regularizer.L2Decay(1e-4)),
bias_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(1e-4)))
return out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
bn_init_value=1.0):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
bias_attr=False,
param_attr=fluid.ParamAttr(regularizer=fluid.regularizer.L2Decay(1e-4)))
return fluid.layers.batch_norm(input=conv, act=act, is_test=not self.is_train,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Constant(bn_init_value),
regularizer=None))
def shortcut(self, input, ch_out, stride):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride)
else:
return input
def bottleneck_block(self, input, num_filters, stride):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu')
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu')
# init bn-weight0
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 4, filter_size=1, act=None, bn_init_value=0.0)
short = self.shortcut(input, num_filters * 4, stride)
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def lr_decay(lrs, epochs, bs, total_image):
boundaries = []
values = []
for idx, epoch in enumerate(epochs):
step = total_image // bs[idx]
if step * bs[idx] < total_image:
step += 1
ratio = (lrs[idx][1] - lrs[idx][0])*1.0 / (epoch[1] - epoch[0])
lr_base = lrs[idx][0]
for s in xrange(epoch[0], epoch[1]):
if boundaries:
boundaries.append(boundaries[-1] + step + 1)
else:
boundaries = [step]
lr = lr_base + ratio * (s - epoch[0])
values.append(lr)
print("epoch: [%d], steps: [%d], lr: [%f]" % (s, boundaries[-1], values[-1]))
values.append(lrs[-1])
print("epoch: [%d:], steps: [%d:], lr:[%f]" % (epochs[-1][-1], boundaries[-1], values[-1]))
return boundaries, values
......@@ -156,11 +156,13 @@ def _reader_creator(file_list,
for line in lines:
if mode == 'train' or mode == 'val':
img_path, label = line.split()
img_path = img_path.replace("JPEG", "jpeg")
#img_path = img_path.replace("JPEG", "jpeg")
img_path = os.path.join(data_dir, img_path)
yield img_path, int(label)
elif mode == 'test':
img_path = os.path.join(data_dir, line)
img_path, label = line.split()
#img_path = img_path.replace("JPEG", "jpeg")
img_path = os.path.join(data_dir, img_path)
yield [img_path]
mapper = functools.partial(
......
此差异已折叠。
......@@ -76,7 +76,7 @@ python train_pair.py \
```
## Evaluation
Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models) and set its path to ```path_to_pretrain_model```. Then Recall@Rank-1 can be obtained by running the following command:
Evaluation is to evaluate the performance of a trained model. You should set model path to ```path_to_pretrain_model```. Then Recall@Rank-1 can be obtained by running the following command:
```
python eval.py \
--model=ResNet50 \
......@@ -95,7 +95,7 @@ python infer.py \
## Performances
For comparation, many metric learning models with different neural networks and loss functions are trained using corresponding experiential parameters. Recall@Rank-1 is used as evaluation metric and the performance is listed in the table. Pretrained models can be downloaded by clicking related model names.
For comparation, many metric learning models with different neural networks and loss functions are trained using corresponding experiential parameters. Recall@Rank-1 is used as evaluation metric and the performance is listed in the table.
|pretrain model | softmax | arcmargin
|- | - | -:
......@@ -103,9 +103,11 @@ For comparation, many metric learning models with different neural networks and
|fine-tuned with triplet | 78.37% | 79.21%
|fine-tuned with quadruplet | 78.10% | 79.59%
|fine-tuned with eml | 79.32% | 80.11%
|fine-tuned with npairs | - | 79.81%
## Reference
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698)
- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478)
- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094)
- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [link](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf)
......@@ -103,9 +103,11 @@ python infer.py \
|使用triplet微调 | 78.37% | 79.21%
|使用quadruplet微调 | 78.10% | 79.59%
|使用eml微调 | 79.32% | 80.11%
|使用npairs微调 | - | 79.81%
## 引用
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [链接](https://arxiv.org/abs/1801.07698)
- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [链接](https://arxiv.org/abs/1710.00478)
- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [链接](https://arxiv.org/abs/1212.6094)
- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [链接](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf)
......@@ -6,3 +6,4 @@ from .arcmarginloss import ArcMarginLoss
from .tripletloss import TripletLoss
from .quadrupletloss import QuadrupletLoss
from .emlloss import EmlLoss
from .npairsloss import NpairsLoss
......@@ -38,7 +38,7 @@ class EmlLoss():
loss = loss1 + loss2 - bias
return loss
def loss(self, input):
def loss(self, input, label=None):
samples_each_class = self.samples_each_class
batch_size = self.cal_loss_batch_size
#input = fluid.layers.l2_normalize(input, axis=1)
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册