Unverified commit 6625a04f, authored by SunGaofeng, committed by GitHub

Merge pull request #1 from PaddlePaddle/develop

update to paddle paddle/models
paddle/operators/check_t.save
paddle/operators/check_tensor.ls
paddle/operators/tensor.save
python/paddle/v2/fluid/tests/book/image_classification_resnet.inference.model/
python/paddle/v2/fluid/tests/book/image_classification_vgg.inference.model/
python/paddle/v2/fluid/tests/book/label_semantic_roles.inference.model/
*.DS_Store
*.vs
build/
build_doc/
*.user
.vscode
.idea
.project
.cproject
.pydevproject
.settings/
*.pyc
CMakeSettings.json
Makefile
.test_env/
third_party/
*~
bazel-*
build_*
# clion workspace.
cmake-build-*
model_test
## PaddleCV
Model|Description|Model Strengths|Reference
--|:--:|:--:|:--:
[AlexNet](./fluid/PaddleCV/image_classification/models)|Classic image classification model|First CNN to successfully apply ReLU, Dropout, and LRN, and to use GPUs for compute acceleration|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
[VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|Classic image classification model|Builds on AlexNet with small 3*3 convolution kernels and greater network depth, giving strong generalization|[Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf)
[GoogleNet](./fluid/PaddleCV/image_classification/models)|Classic image classification model|Increases network depth and width without adding computational load, for superior performance|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
[ResNet](./fluid/PaddleCV/image_classification/models)|Residual network|Introduces residual blocks that resolve the accuracy degradation seen as networks get deeper|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
[Inception-v4](./fluid/PaddleCV/image_classification/models)|Classic image classification model|A deeper and wider Inception architecture|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
[MobileNet](./fluid/PaddleCV/image_classification/models)|Lightweight network model|An efficient model designed for mobile and embedded devices|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
[DPN](./fluid/PaddleCV/image_classification/models)|Image classification model|Combines the DenseNet and ResNeXt architectures to improve classification accuracy|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
[SE-ResNeXt](./fluid/PaddleCV/image_classification/models)|Image classification model|Adds SE blocks to ResNeXt, improving model accuracy|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
[SSD](./fluid/PaddleCV/object_detection/README_cn.md)|Single-stage object detector|Detects objects at matching scales on feature maps of different scales, and plugs easily into any standard convolutional network|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
[Face Detector: PyramidBox](./fluid/PaddleCV/face_detection/README_cn.md)|SSD-based single-stage face detector|Uses contextual information to detect hard faces, with strong representational power and robustness|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
[Faster RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|Classic two-stage object detector|Creatively generates region proposals with a convolutional network shared with the detection network, reducing proposal count and raising proposal quality|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
[Mask RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|Classic instance segmentation model based on Faster RCNN|Adds a segmentation branch to Faster RCNN to produce masks, decoupling mask and class prediction|[Mask R-CNN](https://arxiv.org/abs/1703.06870)
[ICNet](./fluid/PaddleCV/icnet)|Real-time semantic segmentation model|Balances speed and accuracy, trading off between accuracy on high-resolution images and the efficiency of low-complexity networks|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
[DCGAN](./fluid/PaddleCV/gan/c_gan)|Image generation model|Deep convolutional GAN that combines GANs with convolutional networks to stabilize GAN training|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
[ConditionalGAN](./fluid/PaddleCV/gan/c_gan)|Image generation model|Conditional GAN that constrains the model with extra information to guide data generation|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
[CycleGAN](./fluid/PaddleCV/gan/cycle_gan)|Image-to-image translation model|Automatically converts images of one class into another; usable for style transfer|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
[CRNN-CTC model](./fluid/PaddleCV/ocr_recognition)|Scene text recognition model|Uses a CTC model to recognize single-line English text in images|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_networks)
[Attention model](./fluid/PaddleCV/ocr_recognition)|Scene text recognition model|Uses attention to recognize single-line English text in images|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
[Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning)|Metric learning model|Analyzes similarity and comparison relations between objects; useful for auxiliary classification and clustering, and widely applied to image retrieval and face recognition|-
[TSN](./fluid/PaddleCV/video_classification)|Video classification model|Models long-range temporal structure, combining sparse temporal sampling with video-level supervision for effective and efficient learning over whole videos|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
[Video model library](./fluid/PaddleCV/video)|Video model library|Gives developers convenient, efficient PaddlePaddle-based deep learning models for video understanding, video editing, video generation, and related tasks|-
[caffe2fluid](./fluid/PaddleCV/caffe2fluid)|Tool for converting Caffe models into Paddle Fluid configuration and model files|-|-
## PaddleNLP
Model|Description|Model Strengths|Reference
--|:--:|:--:|:--:
[Transformer](./fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|Machine translation model|Built on self-attention: low computational complexity, high parallelism, easy learning of long-range dependencies, and better translation quality|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
[LAC](https://github.com/baidu/lac/blob/master/README.md)|Joint lexical analysis model|Performs Chinese word segmentation, part-of-speech tagging, and named entity recognition as a single task|[Chinese Lexical Analysis with Deep Bi-GRU-CRF Network](https://arxiv.org/abs/1807.01882)
[Senta](https://github.com/baidu/Senta/blob/master/README.md)|Sentiment analysis model suite|The sentiment analysis models of the Baidu AI open platform|-
[DAM](./fluid/PaddleNLP/deep_attention_matching_net)|Semantic matching model|Work by Baidu NLP published at ACL 2018, for response selection in multi-turn conversations of retrieval-based chatbots|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
[SimNet](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)|Semantic matching framework|Models built with SimNet can be easily plugged into the AnyQ system to strengthen its semantic matching ability|-
[DuReader](./fluid/PaddleNLP/machine_reading_comprehension/README.md)|Reading comprehension model|Machine reading comprehension model for the Baidu MRC dataset|-
[Bi-GRU-CRF](./fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|Named entity recognition|NER model combining a CRF with a bidirectional GRU|-
## PaddleRec
Model|Description|Model Strengths|Reference
--|:--:|:--:|:--:
[TagSpace](./fluid/PaddleRec/tagspace)|Embedding model for text and tags|Used for industrial-scale tag recommendation, e.g. recommending tags for feed news|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
[GRU4Rec](./fluid/PaddleRec/gru4rec)|Personalized recommendation model|First to apply RNNs (GRU) to session-based recommendation, clearly outperforming traditional KNN and matrix factorization|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
[SSR](./fluid/PaddleRec/ssr)|Sequential semantic retrieval recommendation model|Follows the referenced paper, predicting user behavior at multiple temporal granularities|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
[DeepCTR](./fluid/PaddleRec/ctr/README.cn.md)|Click-through rate prediction model|Implements only the DNN part of the model described in the DeepFM paper; full DeepFM is covered in other examples|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
[Multiview-Simnet](./fluid/PaddleRec/multiview_simnet)|Personalized recommendation model|Multi-view model that merges several feature views of users and items into one unified model|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
## Other Models
Model|Description|Model Strengths|Reference
--|:--:|:--:|:--:
[DeepASR](./fluid/DeepASR/README_cn.md)|Speech recognition system|Uses the Fluid framework to configure and train the acoustic model of a speech recognition system, integrating Kaldi's decoder|-
[DQN](./fluid/DeepQNetwork/README_cn.md)|Deep Q-Network|Value-based reinforcement learning algorithm; the first model to successfully combine deep learning with reinforcement learning|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
[DoubleDQN](./fluid/DeepQNetwork/README_cn.md)|DQN variant|Applies the Double Q idea to DQN to mitigate overestimation|[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
[DuelingDQN](./fluid/DeepQNetwork/README_cn.md)|DQN variant|Improves the DQN architecture, raising model performance|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
## License
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
# LRC: Local Rademacher Complexity Regularization
Regularization of Deep Neural Networks (DNNs) to improve their generalization ability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. This model combines LRC regularization with DARTS on the CIFAR-10 dataset. Code accompanying the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
---
# Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle on your device is older than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) to update it.
## Data preparation
When you want to use the CIFAR-10 dataset for the first time, you can download it with:

    sh ./dataset/download.sh

Please make sure your environment has an internet connection.
The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location mentioned above.
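As a quick sanity check, one downloaded batch can be inspected with `cPickle`, the same module `reader.py` uses (a hypothetical snippet, not part of this repo; it assumes Python 2, which this codebase targets):

    import cPickle

    with open('dataset/cifar/cifar-10-batches-py/data_batch_1', 'rb') as f:
        batch = cPickle.load(f)
    print(batch['data'].shape)   # (10000, 3072): 10000 images, 3*32*32 pixels each
    print(len(batch['labels']))  # 10000 integer labels in [0, 9]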
## Training
After data preparation, you can start training with:

    python -u train_mixup.py \
        --batch_size=80 \
        --auxiliary \
        --weight_decay=0.0003 \
        --learning_rate=0.025 \
        --lrc_loss_lambda=0.7 \
        --cutout
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for training.
- For more help on arguments:

        python train_mixup.py --help
**data reader introduction:**
* The data reader is defined in `reader.py`.
* Images are reshaped to 32 * 32.
* In the training stage, images are padded to 40 * 40 and randomly cropped back to the original size (a standalone sketch follows this list).
* In the training stage, images are randomly flipped horizontally.
* Pixel values are scaled to [0, 1] and standardized per channel.
* In the training stage, cutout is applied to images at random positions.
* The order of the input images is shuffled during training.
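A minimal standalone sketch of this augmentation pipeline on a random input (cutout omitted; `preprocess` in `reader.py` further below is the real implementation):

    import numpy as np
    from PIL import Image, ImageOps

    CIFAR_MEAN = [0.4914, 0.4822, 0.4465]
    CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]

    img = Image.fromarray(
        np.random.randint(0, 256, (32, 32, 3)).astype(np.uint8), 'RGB')
    img = ImageOps.expand(img, (4, 4, 4, 4), fill=0)  # pad 32x32 -> 40x40
    left, top = np.random.randint(9, size=2)          # crop offset in [0, 8]
    img = img.crop((left, top, left + 32, top + 32))  # crop back to 32x32
    if np.random.randint(2):
        img = img.transpose(Image.FLIP_LEFT_RIGHT)    # random horizontal flip
    arr = np.array(img).astype(np.float32) / 255.0    # scale pixels to [0, 1]
    arr = (arr - CIFAR_MEAN) / CIFAR_STD              # per-channel standardization
    print(arr.shape)                                  # (32, 32, 3), HWC layout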
**model configuration:**
* Use the auxiliary loss with auxiliary\_weight=0.4.
* Use dropout with drop\_path\_prob=0.2.
* Set lrc\_loss\_lambda=0.7.
**training strategy:**
* Use the momentum optimizer with momentum=0.9.
* Weight decay is 0.0003.
* Use cosine decay with init\_lr=0.025 (a numeric sketch follows this list).
* Train for 600 epochs in total.
* Use the Xavier initializer for conv2d weights, a constant initializer for batch norm weights, and a normal initializer for fc weights.
* Initialize batch norm and fc biases to zero and do not add a bias to conv2d.
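The cosine schedule evaluates to the following values (a pure-Python sketch; `learning_rate.py` below implements the same curve with fluid ops):

    import math

    def lr_at(epoch, init_lr=0.025, total_epochs=600):
        return init_lr * (math.cos(math.pi * epoch / total_epochs) + 1) / 2

    print(lr_at(0))    # 0.025   at the start of training
    print(lr_at(300))  # 0.0125  halfway through
    print(lr_at(600))  # 0.0     at the end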
## Reference
- DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
- Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)
# LRC: Local Rademacher Complexity Regularization
Choosing a regularization that improves the generalization ability of deep neural networks is important and challenging. This directory contains an image classification model with a novel regularizer based on local Rademacher complexity (LRC). We are grateful to [DARTS](https://arxiv.org/abs/1806.09055) for its help with this research. The model combines LRC regularization with the DARTS network and achieves excellent results on the CIFAR-10 dataset. The code is released together with the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
---
# Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle in your environment is older than this version, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle).
## Data preparation
When using the CIFAR-10 dataset for the first time, you can download it with the following command:

    sh ./dataset/download.sh

Please make sure your environment has an internet connection. The data will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and decompress it to the location above.
## Training
Once the data is ready, start training with:

    python -u train_mixup.py \
        --batch_size=80 \
        --auxiliary \
        --weight_decay=0.0003 \
        --learning_rate=0.025 \
        --lrc_loss_lambda=0.7 \
        --cutout

- Set ```export CUDA_VISIBLE_DEVICES=0``` to train on a single GPU.
- For optional arguments, see:

        python train_mixup.py --help

**data reader notes:**
* The data reader is defined in `reader.py`.
* Input images are resized to 32 * 32.
* During training, images are padded to 40 * 40 and then randomly cropped back to the original input size.
* During training, images are randomly flipped horizontally.
* Every pixel of each image is normalized.
* During training, random cutout is applied to the images.
* The order of the input images is shuffled during training.
**model configuration:**
* Use the auxiliary loss with a weight of 0.4.
* Use dropout with a drop rate of 0.2.
* Set lrc\_loss\_lambda to 0.7.
**training strategy:**
* Train with the momentum optimizer, momentum=0.9.
* The weight decay coefficient is 0.0003.
* Use cosine learning rate decay with an initial learning rate of 0.025.
* Train for 600 epochs in total.
* Use Xavier initialization for convolution weights, constant initialization for batch norm weights, and Gaussian initialization for fully connected weights.
* Initialize batch norm and fully connected biases to constants, and do not add a bias to the convolutions.
## Reference
- DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
- Differentiable Architecture Search in PyTorch [`code`](https://github.com/quark0/darts)
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
mkdir cifar
cd cifar
# Download the data.
echo "Downloading..."
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# Extract the data.
echo "Extracting..."
tar zvxf cifar-10-python.tar.gz
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from collections import namedtuple
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')
PRIMITIVES = [
'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3',
'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5'
]
NASNet = Genotype(
normal=[
('sep_conv_5x5', 1),
('sep_conv_3x3', 0),
('sep_conv_5x5', 0),
('sep_conv_3x3', 0),
('avg_pool_3x3', 1),
('skip_connect', 0),
('avg_pool_3x3', 0),
('avg_pool_3x3', 0),
('sep_conv_3x3', 1),
('skip_connect', 1),
],
normal_concat=[2, 3, 4, 5, 6],
reduce=[
('sep_conv_5x5', 1),
('sep_conv_7x7', 0),
('max_pool_3x3', 1),
('sep_conv_7x7', 0),
('avg_pool_3x3', 1),
('sep_conv_5x5', 0),
('skip_connect', 3),
('avg_pool_3x3', 2),
('sep_conv_3x3', 2),
('max_pool_3x3', 1),
],
reduce_concat=[4, 5, 6], )
AmoebaNet = Genotype(
normal=[
('avg_pool_3x3', 0),
('max_pool_3x3', 1),
('sep_conv_3x3', 0),
('sep_conv_5x5', 2),
('sep_conv_3x3', 0),
('avg_pool_3x3', 3),
('sep_conv_3x3', 1),
('skip_connect', 1),
('skip_connect', 0),
('avg_pool_3x3', 1),
],
normal_concat=[4, 5, 6],
reduce=[
('avg_pool_3x3', 0),
('sep_conv_3x3', 1),
('max_pool_3x3', 0),
('sep_conv_7x7', 2),
('sep_conv_7x7', 0),
('avg_pool_3x3', 1),
('max_pool_3x3', 0),
('max_pool_3x3', 1),
('conv_7x1_1x7', 0),
('sep_conv_3x3', 5),
],
reduce_concat=[3, 4, 6])
DARTS_V1 = Genotype(
normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('skip_connect', 0),
('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1),
('sep_conv_3x3', 0), ('skip_connect', 2)],
normal_concat=[2, 3, 4, 5],
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('max_pool_3x3', 0), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('avg_pool_3x3', 0)],
reduce_concat=[2, 3, 4, 5])
DARTS_V2 = Genotype(
normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0),
('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('skip_connect', 0),
('skip_connect', 0), ('dil_conv_3x3', 2)],
normal_concat=[2, 3, 4, 5],
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('max_pool_3x3', 1)],
reduce_concat=[2, 3, 4, 5])
MY_DARTS = Genotype(
normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0),
('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1),
('skip_connect', 0), ('sep_conv_3x3', 1)],
normal_concat=range(2, 6),
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0),
('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('skip_connect', 3)],
reduce_concat=range(2, 6))
DARTS = MY_DARTS
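# Hypothetical usage sketch (not part of the original file): the network code
# consumes a genotype exactly this way -- Cell.__init__ in model.py below does
# zip(*genotype.normal) / zip(*genotype.reduce).
if __name__ == '__main__':
    op_names, indices = zip(*DARTS.normal)
    print(op_names)                   # 8 ops, two per intermediate node
    print(indices)                    # (0, 1, 0, 1, 0, 1, 0, 1): input node of each op
    print(list(DARTS.normal_concat))  # [2, 3, 4, 5]: nodes concatenated as cell output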
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
import math
from paddle.fluid.initializer import init_on_cpu
def cosine_decay(learning_rate, num_epoch, steps_one_epoch):
"""Applies cosine decay to the learning rate.
lr = 0.5 * (math.cos(epoch * (math.pi / 120)) + 1)
"""
global_step = _decay_step_counter()
with init_on_cpu():
decayed_lr = learning_rate * \
(ops.cos((global_step / steps_one_epoch) \
* math.pi / num_epoch) + 1)/2
return decayed_lr
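# Sketch of how train_mixup.py (further below in this dump) wires this schedule
# into the optimizer; steps_one_epoch=625 is an assumption here (50000 training
# images at batch_size 80).
if __name__ == '__main__':
    optimizer = fluid.optimizer.Momentum(
        learning_rate=cosine_decay(0.025, num_epoch=600, steps_one_epoch=625),
        regularization=fluid.regularizer.L2Decay(3e-4),
        momentum=0.9)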
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import time
import functools
import paddle
import paddle.fluid as fluid
from operations import *
class Cell():
def __init__(self, genotype, C_prev_prev, C_prev, C, reduction,
reduction_prev):
print(C_prev_prev, C_prev, C)
if reduction_prev:
self.preprocess0 = functools.partial(FactorizedReduce, C_out=C)
else:
self.preprocess0 = functools.partial(
ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0)
self.preprocess1 = functools.partial(
ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0)
if reduction:
op_names, indices = zip(*genotype.reduce)
concat = genotype.reduce_concat
else:
op_names, indices = zip(*genotype.normal)
concat = genotype.normal_concat
print(op_names, indices, concat, reduction)
self._compile(C, op_names, indices, concat, reduction)
def _compile(self, C, op_names, indices, concat, reduction):
assert len(op_names) == len(indices)
self._steps = len(op_names) // 2
self._concat = concat
self.multiplier = len(concat)
self._ops = []
for name, index in zip(op_names, indices):
stride = 2 if reduction and index < 2 else 1
op = functools.partial(OPS[name], C=C, stride=stride, affine=True)
self._ops += [op]
self._indices = indices
def forward(self, s0, s1, drop_prob, is_train, name):
self.training = is_train
preprocess0_name = name + 'preprocess0.'
preprocess1_name = name + 'preprocess1.'
s0 = self.preprocess0(s0, name=preprocess0_name)
s1 = self.preprocess1(s1, name=preprocess1_name)
out = [s0, s1]
for i in range(self._steps):
h1 = out[self._indices[2 * i]]
h2 = out[self._indices[2 * i + 1]]
op1 = self._ops[2 * i]
op2 = self._ops[2 * i + 1]
h3 = op1(h1, name=name + '_ops.' + str(2 * i) + '.')
h4 = op2(h2, name=name + '_ops.' + str(2 * i + 1) + '.')
if self.training and drop_prob > 0.:
if h3 != h1:
h3 = fluid.layers.dropout(
h3,
drop_prob,
dropout_implementation='upscale_in_train')
if h4 != h2:
h4 = fluid.layers.dropout(
h4,
drop_prob,
dropout_implementation='upscale_in_train')
s = h3 + h4
out += [s]
return fluid.layers.concat([out[i] for i in self._concat], axis=1)
def AuxiliaryHeadCIFAR(input, num_classes, aux_name='auxiliary_head'):
relu_a = fluid.layers.relu(input)
pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', 3)
conv2d_a = fluid.layers.conv2d(
pool_a,
128,
1,
name=aux_name + '.features.2',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.2.weight'),
bias_attr=False)
bn_a_name = aux_name + '.features.3'
bn_a = fluid.layers.batch_norm(
conv2d_a,
act='relu',
name=bn_a_name,
param_attr=ParamAttr(
initializer=Constant(1.), name=bn_a_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=bn_a_name + '.bias'),
moving_mean_name=bn_a_name + '.running_mean',
moving_variance_name=bn_a_name + '.running_var')
conv2d_b = fluid.layers.conv2d(
bn_a,
768,
2,
name=aux_name + '.features.5',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.5.weight'),
bias_attr=False)
bn_b_name = aux_name + '.features.6'
bn_b = fluid.layers.batch_norm(
conv2d_b,
act='relu',
name=bn_b_name,
param_attr=ParamAttr(
initializer=Constant(1.), name=bn_b_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=bn_b_name + '.bias'),
moving_mean_name=bn_b_name + '.running_mean',
moving_variance_name=bn_b_name + '.running_var')
fc_name = aux_name + '.classifier'
fc = fluid.layers.fc(bn_b,
num_classes,
name=fc_name,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name=fc_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=fc_name + '.bias'))
return fc
def StemConv(input, C_out, kernel_size, padding):
conv_a = fluid.layers.conv2d(
input,
C_out,
kernel_size,
padding=padding,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0), name='stem.0.weight'),
bias_attr=False)
bn_a = fluid.layers.batch_norm(
conv_a,
param_attr=ParamAttr(
initializer=Constant(1.), name='stem.1.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name='stem.1.bias'),
moving_mean_name='stem.1.running_mean',
moving_variance_name='stem.1.running_var')
return bn_a
class NetworkCIFAR(object):
def __init__(self, C, class_num, layers, auxiliary, genotype):
self.class_num = class_num
self._layers = layers
self._auxiliary = auxiliary
stem_multiplier = 3
self.drop_path_prob = 0
C_curr = stem_multiplier * C
C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
self.cells = []
reduction_prev = False
for i in range(layers):
if i in [layers // 3, 2 * layers // 3]:
C_curr *= 2
reduction = True
else:
reduction = False
cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction,
reduction_prev)
reduction_prev = reduction
self.cells += [cell]
C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr
if i == 2 * layers // 3:
C_to_auxiliary = C_prev
def forward(self, init_channel, is_train):
self.training = is_train
self.logits_aux = None
num_channel = init_channel * 3
s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1)
s1 = s0
for i, cell in enumerate(self.cells):
name = 'cells.' + str(i) + '.'
s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train,
name)
if i == int(2 * self._layers // 3):
if self._auxiliary and self.training:
self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num)
out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg")
self.logits = fluid.layers.fc(out,
size=self.class_num,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name='classifier.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
name='classifier.bias'))
return self.logits, self.logits_aux
def build_input(self, image_shape, batch_size, is_train):
if is_train:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1],
[-1, 1], [-1, batch_size, self.class_num - 1]],
lod_levels=[0, 0, 0, 0, 0, 0, 0],
dtypes=[
"float32", "int64", "int64", "float32", "int32", "int32",
"float32"
],
use_double_buffer=True,
name='train_reader')
else:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1]],
lod_levels=[0, 0],
dtypes=["float32", "int64"],
use_double_buffer=True,
name='test_reader')
return py_reader
def train_model(self, py_reader, init_channels, aux, aux_w, batch_size,
loss_lambda):
self.image, self.ya, self.yb, self.lam, self.label_reshape,\
self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader)
self.logits, self.logits_aux = self.forward(init_channels, True)
self.mixup_loss = self.mixup_loss(aux, aux_w)
self.lrc_loss = self.lrc_loss(batch_size)
return self.mixup_loss + loss_lambda * self.lrc_loss
def test_model(self, py_reader, init_channels):
self.image, self.ya = fluid.layers.read_file(py_reader)
self.logits, _ = self.forward(init_channels, False)
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss = fluid.layers.cross_entropy(prob, self.ya)
acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1)
acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5)
return loss, acc_1, acc_5
def mixup_loss(self, auxiliary, auxiliary_weight):
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss_a = fluid.layers.cross_entropy(prob, self.ya)
loss_b = fluid.layers.cross_entropy(prob, self.yb)
loss_a_mean = fluid.layers.reduce_mean(loss_a)
loss_b_mean = fluid.layers.reduce_mean(loss_b)
loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean
        if auxiliary:
            prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False)
            loss_a_aux = fluid.layers.cross_entropy(prob_aux, self.ya)
            loss_b_aux = fluid.layers.cross_entropy(prob_aux, self.yb)
            loss_a_aux_mean = fluid.layers.reduce_mean(loss_a_aux)
            loss_b_aux_mean = fluid.layers.reduce_mean(loss_b_aux)
            loss_aux = self.lam * loss_a_aux_mean + (1 - self.lam
                                                     ) * loss_b_aux_mean
            return loss + auxiliary_weight * loss_aux
        # without the auxiliary head, return the plain mixup loss
        return loss
def lrc_loss(self, batch_size):
y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1))
label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1])
non_label_reshape = fluid.layers.squeeze(
self.non_label_reshape, axes=[1])
label_reshape.stop_gradient = True
        non_label_reshape.stop_gradient = True
y_diff_label_reshape = fluid.layers.gather(y_diff_reshape,
label_reshape)
y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape,
non_label_reshape)
y_diff_label = fluid.layers.reshape(
y_diff_label_reshape, shape=(-1, batch_size, 1))
y_diff_non_label = fluid.layers.reshape(
y_diff_non_label_reshape,
shape=(-1, batch_size, self.class_num - 1))
y_diff_ = y_diff_non_label - y_diff_label
y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0])
rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0])
rad_y_diff_trans = rad_var_trans * y_diff_
lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1])
lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size *
(self.class_num - 1))
lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_)
return lrc_loss_mean
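# Hypothetical numpy mirror of lrc_loss above, for intuition (not part of the
# original file): the loss is the mean, over Rademacher draws, of the absolute
# weighted sum of (non-label logit - label logit) margins.
if __name__ == '__main__':
    batch_size, num_classes, iters = 4, 10, 50
    logits = np.random.rand(batch_size, num_classes)
    labels = np.random.randint(num_classes, size=batch_size)
    margins = np.stack([
        np.delete(logits[i], labels[i]) - logits[i, labels[i]]
        for i in range(batch_size)])                   # (batch, classes-1)
    rad = np.sign(np.random.rand(iters, batch_size, num_classes - 1) - 0.5)
    est = np.abs((rad * margins).sum(axis=(1, 2))) / (
        batch_size * (num_classes - 1))                # one value per draw
    print(est.mean())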
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import time
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.initializer import Normal
from paddle.fluid.initializer import Constant
OPS = {
'none' : lambda input, C, stride, name, affine: Zero(input, stride, name),
'avg_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'avg', pool_stride=stride, pool_padding=1, name=name),
'max_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'max', pool_stride=stride, pool_padding=1, name=name),
'skip_connect' : lambda input,C, stride, name, affine: Identity(input, name) if stride == 1 else FactorizedReduce(input, C, name=name, affine=affine),
'sep_conv_3x3' : lambda input,C, stride, name, affine: SepConv(input, C, C, 3, stride, 1, name=name, affine=affine),
'sep_conv_5x5' : lambda input,C, stride, name, affine: SepConv(input, C, C, 5, stride, 2, name=name, affine=affine),
'sep_conv_7x7' : lambda input,C, stride, name, affine: SepConv(input, C, C, 7, stride, 3, name=name, affine=affine),
'dil_conv_3x3' : lambda input,C, stride, name, affine: DilConv(input, C, C, 3, stride, 2, 2, name=name, affine=affine),
'dil_conv_5x5' : lambda input,C, stride, name, affine: DilConv(input, C, C, 5, stride, 4, 2, name=name, affine=affine),
'conv_7x1_1x7' : lambda input,C, stride, name, affine: SevenConv(input, C, stride, name=name, affine=affine)
}
def ReLUConvBN(input, C_out, kernel_size, stride, padding, name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out,
kernel_size,
stride,
padding,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False)
if affine:
reluconvbn_out = fluid.layers.batch_norm(
conv2d_a,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.2.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.2.bias'),
moving_mean_name=name + 'op.2.running_mean',
moving_variance_name=name + 'op.2.running_var')
else:
reluconvbn_out = fluid.layers.batch_norm(
conv2d_a,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.2.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.2.bias'),
moving_mean_name=name + 'op.2.running_mean',
moving_variance_name=name + 'op.2.running_var')
return reluconvbn_out
def DilConv(input,
C_in,
C_out,
kernel_size,
stride,
padding,
dilation,
name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_in,
kernel_size,
stride,
padding,
dilation,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_out,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
dilconv_out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
dilconv_out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
return dilconv_out
def SepConv(input,
C_in,
C_out,
kernel_size,
stride,
padding,
name='',
affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_in,
kernel_size,
stride,
padding,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_in,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
bn_a = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
bn_a = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
relu_b = fluid.layers.relu(bn_a)
conv2d_d = fluid.layers.conv2d(
relu_b,
C_in,
kernel_size,
1,
padding,
groups=C_in,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.5.weight'),
bias_attr=False,
use_cudnn=False)
conv2d_e = fluid.layers.conv2d(
conv2d_d,
C_out,
1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.6.weight'),
bias_attr=False)
if affine:
sepconv_out = fluid.layers.batch_norm(
conv2d_e,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.7.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.7.bias'),
moving_mean_name=name + 'op.7.running_mean',
moving_variance_name=name + 'op.7.running_var')
else:
sepconv_out = fluid.layers.batch_norm(
conv2d_e,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.7.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.7.bias'),
moving_mean_name=name + 'op.7.running_mean',
moving_variance_name=name + 'op.7.running_var')
return sepconv_out
def SevenConv(input, C_out, stride, name='', affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out, (1, 7), (1, stride), (0, 3),
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.1.weight'),
bias_attr=False)
conv2d_b = fluid.layers.conv2d(
conv2d_a,
C_out, (7, 1), (stride, 1), (3, 0),
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'op.2.weight'),
bias_attr=False)
if affine:
out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
moving_variance_name=name + 'op.3.running_var')
else:
out = fluid.layers.batch_norm(
conv2d_b,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'op.3.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'op.3.bias'),
moving_mean_name=name + 'op.3.running_mean',
            moving_variance_name=name + 'op.3.running_var')
    return out
def Identity(input, name=''):
return input
def Zero(input, stride, name=''):
ones = np.ones(input.shape[-2:])
ones[::stride, ::stride] = 0
ones = fluid.layers.assign(ones)
return input * ones
def FactorizedReduce(input, C_out, name='', affine=True):
relu_a = fluid.layers.relu(input)
conv2d_a = fluid.layers.conv2d(
relu_a,
C_out // 2,
1,
2,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'conv_1.weight'),
bias_attr=False)
h_end = relu_a.shape[2]
w_end = relu_a.shape[3]
slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end])
conv2d_b = fluid.layers.conv2d(
slice_a,
C_out // 2,
1,
2,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=name + 'conv_2.weight'),
bias_attr=False)
out = fluid.layers.concat([conv2d_a, conv2d_b], axis=1)
if affine:
out = fluid.layers.batch_norm(
out,
param_attr=ParamAttr(
initializer=Constant(1.), name=name + 'bn.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=name + 'bn.bias'),
moving_mean_name=name + 'bn.running_mean',
moving_variance_name=name + 'bn.running_var')
else:
out = fluid.layers.batch_norm(
out,
param_attr=ParamAttr(
initializer=Constant(1.),
learning_rate=0.,
name=name + 'bn.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
learning_rate=0.,
name=name + 'bn.bias'),
moving_mean_name=name + 'bn.running_mean',
moving_variance_name=name + 'bn.running_var')
return out
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
"""
CIFAR-10 dataset.
This module will download dataset from
https://www.cs.toronto.edu/~kriz/cifar.html and parse train/test set into
paddle reader creators.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes,
with 6000 images per class. There are 50000 training images and 10000 test images.
"""
from PIL import Image
from PIL import ImageOps
import numpy as np
import cPickle
import random
import utils
import paddle.fluid as fluid
import time
import os
import functools
import paddle.reader
__all__ = ['train10', 'test10']
image_size = 32
image_depth = 3
half_length = 8
CIFAR_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
def generate_reshape_label(label, batch_size, CIFAR_CLASSES=10):
reshape_label = np.zeros((batch_size, 1), dtype='int32')
reshape_non_label = np.zeros(
(batch_size * (CIFAR_CLASSES - 1), 1), dtype='int32')
num = 0
for i in range(batch_size):
label_i = label[i]
reshape_label[i] = label_i + i * CIFAR_CLASSES
for j in range(CIFAR_CLASSES):
if label_i != j:
reshape_non_label[num] = \
j + i * CIFAR_CLASSES
num += 1
return reshape_label, reshape_non_label
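# Toy check (hypothetical): with batch_size=2 and labels [3, 7], sample i's
# true class sits at flat index label + i * CIFAR_CLASSES in a column of
# batch_size * 10 logits; the other 9 indices per sample are its non-labels.
# These flat indices feed fluid.layers.gather in lrc_loss.
if __name__ == '__main__':
    toy_flat, toy_non_flat = generate_reshape_label(np.array([[3], [7]]), 2)
    print(toy_flat.ravel())          # [ 3 17]
    print(toy_non_flat.ravel()[:9])  # [0 1 2 4 5 6 7 8 9]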
def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10):
rcc_iters = 50
rad_var = np.zeros((rcc_iters, batch_size, CIFAR_CLASSES - 1))
for i in range(rcc_iters):
bernoulli_num = np.random.binomial(size=batch_size, n=1, p=0.5)
bernoulli_map = np.array([])
ones = np.ones((CIFAR_CLASSES - 1, 1))
for batch_id in range(batch_size):
num = bernoulli_num[batch_id]
var_id = 2 * ones * num - 1
bernoulli_map = np.append(bernoulli_map, var_id)
rad_var[i] = bernoulli_map.reshape((batch_size, CIFAR_CLASSES - 1))
return rad_var.astype('float32')
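# Quick shape check (hypothetical): 50 Rademacher draws per sample and
# per non-label class, with values in {-1, +1}.
if __name__ == '__main__':
    rad = generate_bernoulli_number(4)
    print(rad.shape)       # (50, 4, 9)
    print(np.unique(rad))  # [-1.  1.]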
def preprocess(sample, is_training, args):
image_array = sample.reshape(3, image_size, image_size)
rgb_array = np.transpose(image_array, (1, 2, 0))
img = Image.fromarray(rgb_array, 'RGB')
if is_training:
        # pad and random crop
img = ImageOps.expand(img, (4, 4, 4, 4), fill=0) # pad to 40 * 40 * 3
left_top = np.random.randint(9, size=2) # rand 0 - 8
img = img.crop((left_top[0], left_top[1], left_top[0] + image_size,
left_top[1] + image_size))
if np.random.randint(2):
img = img.transpose(Image.FLIP_LEFT_RIGHT)
img = np.array(img).astype(np.float32)
# per_image_standardization
img_float = img / 255.0
img = (img_float - CIFAR_MEAN) / CIFAR_STD
if is_training and args.cutout:
center = np.random.randint(image_size, size=2)
offset_width = max(0, center[0] - half_length)
offset_height = max(0, center[1] - half_length)
target_width = min(center[0] + half_length, image_size)
target_height = min(center[1] + half_length, image_size)
for i in range(offset_height, target_height):
for j in range(offset_width, target_width):
img[i][j][:] = 0.0
img = np.transpose(img, (2, 0, 1))
return img
def reader_creator_filepath(filename, sub_name, is_training, args):
files = os.listdir(filename)
names = [each_item for each_item in files if sub_name in each_item]
names.sort()
datasets = []
for name in names:
print("Reading file " + name)
batch = cPickle.load(open(filename + name, 'rb'))
data = batch['data']
labels = batch.get('labels', batch.get('fine_labels', None))
assert labels is not None
dataset = zip(data, labels)
datasets.extend(dataset)
random.shuffle(datasets)
def read_batch(datasets, args):
for sample, label in datasets:
im = preprocess(sample, is_training, args)
yield im, [int(label)]
def reader():
batch_data = []
batch_label = []
for data, label in read_batch(datasets, args):
batch_data.append(data)
batch_label.append(label)
if len(batch_data) == args.batch_size:
batch_data = np.array(batch_data, dtype='float32')
batch_label = np.array(batch_label, dtype='int64')
if is_training:
flatten_label, flatten_non_label = \
generate_reshape_label(batch_label, args.batch_size)
rad_var = generate_bernoulli_number(args.batch_size)
mixed_x, y_a, y_b, lam = utils.mixup_data(
batch_data, batch_label, args.batch_size,
args.mix_alpha)
batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \
flatten_non_label, rad_var]]
yield batch_out
else:
batch_out = [[batch_data, batch_label]]
yield batch_out
batch_data = []
batch_label = []
return reader
def train10(args):
"""
CIFAR-10 training set creator.
It returns a reader creator, each sample in the reader is image pixels in
[0, 1] and label in [0, 9].
:return: Training reader creator
:rtype: callable
"""
return reader_creator_filepath(args.data, 'data_batch', True, args)
def test10(args):
"""
CIFAR-10 test set creator.
It returns a reader creator, each sample in the reader is image pixels in
[0, 1] and label in [0, 9].
:return: Test reader creator.
:rtype: callable
"""
return reader_creator_filepath(args.data, 'test_batch', False, args)
CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \
--batch_size=80 \
--auxiliary \
--weight_decay=0.0003 \
--learning_rate=0.025 \
--lrc_loss_lambda=0.7 \
--cutout
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from learning_rate import cosine_decay
import numpy as np
import argparse
from model import NetworkCIFAR as Network
import reader
import sys
import os
import time
import logging
import genotypes
import paddle.fluid as fluid
import shutil
import utils
import cPickle as cp
parser = argparse.ArgumentParser("cifar")
parser.add_argument(
'--data',
type=str,
default='./dataset/cifar/cifar-10-batches-py/',
help='location of the data corpus')
parser.add_argument('--batch_size', type=int, default=96, help='batch size')
parser.add_argument(
'--learning_rate', type=float, default=0.025, help='init learning rate')
parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
parser.add_argument(
'--weight_decay', type=float, default=3e-4, help='weight decay')
parser.add_argument(
'--report_freq', type=float, default=50, help='report frequency')
parser.add_argument(
'--epochs', type=int, default=600, help='num of training epochs')
parser.add_argument(
'--init_channels', type=int, default=36, help='num of init channels')
parser.add_argument(
'--layers', type=int, default=20, help='total number of layers')
parser.add_argument(
'--model_path',
type=str,
default='saved_models',
help='path to save the model')
parser.add_argument(
'--auxiliary',
action='store_true',
default=False,
help='use auxiliary tower')
parser.add_argument(
'--auxiliary_weight',
type=float,
default=0.4,
help='weight for auxiliary loss')
parser.add_argument(
'--cutout', action='store_true', default=False, help='use cutout')
parser.add_argument(
'--cutout_length', type=int, default=16, help='cutout length')
parser.add_argument(
'--drop_path_prob', type=float, default=0.2, help='drop path probability')
parser.add_argument('--save', type=str, default='EXP', help='experiment name')
parser.add_argument(
'--arch', type=str, default='DARTS', help='which architecture to use')
parser.add_argument(
'--grad_clip', type=float, default=5, help='gradient clipping')
parser.add_argument(
'--lr_exp_decay',
action='store_true',
default=False,
help='use exponential_decay learning_rate')
parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha')
parser.add_argument(
'--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda')
parser.add_argument(
'--loss_type',
default=1,
type=float,
help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss')
args = parser.parse_args()
CIFAR_CLASSES = 10
dataset_train_size = 50000
image_size = 32
def main():
image_shape = [3, image_size, image_size]
devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
logging.info("args = %s", args)
genotype = eval("genotypes.%s" % args.arch)
model = Network(args.init_channels, CIFAR_CLASSES, args.layers,
args.auxiliary, genotype)
steps_one_epoch = dataset_train_size / (devices_num * args.batch_size)
train(model, args, image_shape, steps_one_epoch)
def build_program(main_prog, startup_prog, args, is_train, model, im_shape,
steps_one_epoch):
out = []
with fluid.program_guard(main_prog, startup_prog):
py_reader = model.build_input(im_shape, args.batch_size, is_train)
if is_train:
with fluid.unique_name.guard():
loss = model.train_model(py_reader, args.init_channels,
args.auxiliary, args.auxiliary_weight,
args.batch_size, args.lrc_loss_lambda)
optimizer = fluid.optimizer.Momentum(
learning_rate=cosine_decay(args.learning_rate, \
args.epochs, steps_one_epoch),
regularization=fluid.regularizer.L2Decay(\
args.weight_decay),
momentum=args.momentum)
optimizer.minimize(loss)
out = [py_reader, loss]
else:
with fluid.unique_name.guard():
loss, acc_1, acc_5 = model.test_model(py_reader,
args.init_channels)
out = [py_reader, loss, acc_1, acc_5]
return out
def train(model, args, im_shape, steps_one_epoch):
train_startup_prog = fluid.Program()
test_startup_prog = fluid.Program()
train_prog = fluid.Program()
test_prog = fluid.Program()
train_py_reader, loss_train = build_program(train_prog, train_startup_prog,
args, True, model, im_shape,
steps_one_epoch)
test_py_reader, loss_test, acc_1, acc_5 = build_program(
test_prog, test_startup_prog, args, False, model, im_shape,
steps_one_epoch)
test_prog = test_prog.clone(for_test=True)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(train_startup_prog)
exe.run(test_startup_prog)
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 1
train_exe = fluid.ParallelExecutor(
main_program=train_prog,
use_cuda=True,
loss_name=loss_train.name,
exec_strategy=exec_strategy)
train_reader = reader.train10(args)
test_reader = reader.test10(args)
train_py_reader.decorate_paddle_reader(train_reader)
test_py_reader.decorate_paddle_reader(test_reader)
fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip))
fluid.memory_optimize(fluid.default_main_program())
def save_model(postfix, main_prog):
model_path = os.path.join(args.model_path, postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
fluid.io.save_persistables(exe, model_path, main_program=main_prog)
def test(epoch_id):
test_fetch_list = [loss_test, acc_1, acc_5]
objs = utils.AvgrageMeter()
top1 = utils.AvgrageMeter()
top5 = utils.AvgrageMeter()
test_py_reader.start()
test_start_time = time.time()
step_id = 0
try:
while True:
prev_test_start_time = test_start_time
test_start_time = time.time()
loss_test_v, acc_1_v, acc_5_v = exe.run(
test_prog, fetch_list=test_fetch_list)
objs.update(np.array(loss_test_v), args.batch_size)
top1.update(np.array(acc_1_v), args.batch_size)
top5.update(np.array(acc_5_v), args.batch_size)
if step_id % args.report_freq == 0:
print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}".
format(epoch_id, step_id,
np.array(acc_1_v),
np.array(acc_5_v), test_start_time -
prev_test_start_time))
step_id += 1
except fluid.core.EOFException:
test_py_reader.reset()
print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg,
top5.avg))
train_fetch_list = [loss_train]
epoch_start_time = time.time()
for epoch_id in range(args.epochs):
model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs
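        # linear drop-path schedule: ramps from 0 up to args.drop_path_prob over training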
train_py_reader.start()
epoch_end_time = time.time()
if epoch_id > 0:
print("Epoch {}, total time {}".format(epoch_id - 1, epoch_end_time
- epoch_start_time))
epoch_start_time = epoch_end_time
start_time = time.time()
step_id = 0
try:
while True:
prev_start_time = start_time
start_time = time.time()
loss_v, = train_exe.run(
fetch_list=[v.name for v in train_fetch_list])
print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \
np.array(loss_v).mean(), start_time-prev_start_time))
step_id += 1
sys.stdout.flush()
except fluid.core.EOFException:
train_py_reader.reset()
if epoch_id % 50 == 0 or epoch_id == args.epochs - 1:
save_model(str(epoch_id), train_prog)
test(epoch_id)
if __name__ == '__main__':
main()
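# For context: the `--cutout` / `--cutout_length` flags above are assumed to be
# consumed by the data reader. A minimal NumPy sketch of standard cutout
# (DeVries & Taylor, 2017), zeroing one random square per CHW image:
def cutout_sketch(img, length=16):
    import numpy as np
    _, h, w = img.shape
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, cy - length // 2), min(h, cy + length // 2)
    x1, x2 = max(0, cx - length // 2), min(w, cx + length // 2)
    out = img.copy()
    out[:, y1:y2, x1:x2] = 0.0
    return out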
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
import os
import sys
import time
import math
import numpy as np
def mixup_data(x, y, batch_size, alpha=1.0):
'''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda'''
if alpha > 0.:
lam = np.random.beta(alpha, alpha)
else:
lam = 1.
index = np.random.permutation(batch_size)
mixed_x = lam * x + (1 - lam) * x[index, :]
y_a, y_b = y, y[index]
return mixed_x.astype('float32'), y_a.astype('int64'),\
y_b.astype('int64'), np.array(lam, dtype='float32')
class AvgrageMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.avg = 0
self.sum = 0
self.cnt = 0
def update(self, val, n=1):
self.sum += val * n
self.cnt += n
self.avg = self.sum / self.cnt
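# Illustrative usage of the two helpers above (a sketch with made-up numbers,
# not part of the original file):
if __name__ == '__main__':
    xs = np.random.rand(8, 3, 32, 32)
    ys = np.random.randint(0, 10, size=(8,))
    mixed_x, y_a, y_b, lam = mixup_data(xs, ys, batch_size=8, alpha=0.5)
    # the mixup loss is conventionally lam * loss(y_a) + (1 - lam) * loss(y_b)
    meter = AvgrageMeter()
    meter.update(float(lam) * 0.9 + (1.0 - float(lam)) * 1.1, n=8)
    print('lam = %.3f, running avg loss = %.3f' % (float(lam), meter.avg))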
#-*- coding: utf-8 -*- #-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
import numpy as np
import math
from tqdm import tqdm from tqdm import tqdm
from utils import fluid_flatten
class DQNModel(object): class DQNModel(object):
...@@ -39,9 +38,17 @@ class DQNModel(object): ...@@ -39,9 +38,17 @@ class DQNModel(object):
name='isOver', shape=[], dtype='bool') name='isOver', shape=[], dtype='bool')
def _build_net(self): def _build_net(self):
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs() state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state) self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0) reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
...@@ -49,7 +56,7 @@ class DQNModel(object): ...@@ -49,7 +56,7 @@ class DQNModel(object):
action_onehot = fluid.layers.cast(action_onehot, dtype='float32') action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum( pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
...@@ -60,13 +67,22 @@ class DQNModel(object): ...@@ -60,13 +67,22 @@ class DQNModel(object):
cost = fluid.layers.square_error_cost(pred_action_value, target) cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost) cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost) optimizer.minimize(cost)
# define program vars = list(self.train_program.list_vars())
self.train_program = fluid.default_main_program() policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe # fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
...@@ -81,50 +97,50 @@ class DQNModel(object): ...@@ -81,50 +97,50 @@ class DQNModel(object):
conv1 = fluid.layers.conv2d( conv1 = fluid.layers.conv2d(
input=image, input=image,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d( max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d( conv2 = fluid.layers.conv2d(
input=max_pool1, input=max_pool1,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d( max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d( conv3 = fluid.layers.conv2d(
input=max_pool2, input=max_pool2,
num_filters=64, num_filters=64,
filter_size=[4, 4], filter_size=4,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d( max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d( conv4 = fluid.layers.conv2d(
input=max_pool3, input=max_pool3,
num_filters=64, num_filters=64,
filter_size=[3, 3], filter_size=3,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4) flatten = fluid.layers.flatten(conv4, axis=1)
out = fluid.layers.fc( out = fluid.layers.fc(
input=flatten, input=flatten,
...@@ -133,23 +149,6 @@ class DQNModel(object): ...@@ -133,23 +149,6 @@ class DQNModel(object):
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test): def act(self, state, train_or_test):
sample = np.random.random() sample = np.random.random()
......
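The refactor above replaces the old `default_main_program().clone()` plus deprecated `prune()` pattern with a dedicated sync Program built alongside the train/predict Programs. A minimal standalone sketch of that policy-to-target copy (assuming, as in these models, parameter names prefixed with 'policy'/'target'):
import paddle.fluid as fluid

def build_sync_program(train_program):
    # build a separate Program whose only ops copy policy weights onto the target net
    sync_program = fluid.Program()
    all_vars = list(train_program.list_vars())
    policy_vars = sorted(
        [v for v in all_vars if 'GRAD' not in v.name and 'policy' in v.name],
        key=lambda v: v.name)
    target_vars = sorted(
        [v for v in all_vars if 'GRAD' not in v.name and 'target' in v.name],
        key=lambda v: v.name)
    with fluid.program_guard(sync_program):
        for src, dst in zip(policy_vars, target_vars):
            fluid.layers.assign(src, dst)
    return sync_program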
#-*- coding: utf-8 -*- #-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm from tqdm import tqdm
import math
from utils import fluid_argmax, fluid_flatten
class DoubleDQNModel(object): class DoubleDQNModel(object):
...@@ -39,9 +38,17 @@ class DoubleDQNModel(object): ...@@ -39,9 +38,17 @@ class DoubleDQNModel(object):
name='isOver', shape=[], dtype='bool') name='isOver', shape=[], dtype='bool')
def _build_net(self): def _build_net(self):
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs() state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state) self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0) reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
...@@ -49,12 +56,13 @@ class DoubleDQNModel(object): ...@@ -49,12 +56,13 @@ class DoubleDQNModel(object):
action_onehot = fluid.layers.cast(action_onehot, dtype='float32') action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum( pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
next_s_predcit_value = self.get_DQN_prediction(next_s) next_s_predcit_value = self.get_DQN_prediction(next_s)
greedy_action = fluid_argmax(next_s_predcit_value) greedy_action = fluid.layers.argmax(next_s_predcit_value, axis=1)
greedy_action = fluid.layers.unsqueeze(greedy_action, axes=[1])
predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim) predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
best_v = fluid.layers.reduce_sum( best_v = fluid.layers.reduce_sum(
...@@ -67,13 +75,22 @@ class DoubleDQNModel(object): ...@@ -67,13 +75,22 @@ class DoubleDQNModel(object):
cost = fluid.layers.square_error_cost(pred_action_value, target) cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost) cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost) optimizer.minimize(cost)
# define program vars = list(self.train_program.list_vars())
self.train_program = fluid.default_main_program() policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe # fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
...@@ -88,50 +105,50 @@ class DoubleDQNModel(object): ...@@ -88,50 +105,50 @@ class DoubleDQNModel(object):
conv1 = fluid.layers.conv2d( conv1 = fluid.layers.conv2d(
input=image, input=image,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d( max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d( conv2 = fluid.layers.conv2d(
input=max_pool1, input=max_pool1,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d( max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d( conv3 = fluid.layers.conv2d(
input=max_pool2, input=max_pool2,
num_filters=64, num_filters=64,
filter_size=[4, 4], filter_size=4,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d( max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d( conv4 = fluid.layers.conv2d(
input=max_pool3, input=max_pool3,
num_filters=64, num_filters=64,
filter_size=[3, 3], filter_size=3,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4) flatten = fluid.layers.flatten(conv4, axis=1)
out = fluid.layers.fc( out = fluid.layers.fc(
input=flatten, input=flatten,
...@@ -140,23 +157,6 @@ class DoubleDQNModel(object): ...@@ -140,23 +157,6 @@ class DoubleDQNModel(object):
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test): def act(self, state, train_or_test):
sample = np.random.random() sample = np.random.random()
......
#-*- coding: utf-8 -*- #-*- coding: utf-8 -*-
import math
import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm from tqdm import tqdm
import math
from utils import fluid_flatten
class DuelingDQNModel(object): class DuelingDQNModel(object):
...@@ -39,9 +38,17 @@ class DuelingDQNModel(object): ...@@ -39,9 +38,17 @@ class DuelingDQNModel(object):
name='isOver', shape=[], dtype='bool') name='isOver', shape=[], dtype='bool')
def _build_net(self): def _build_net(self):
self.predict_program = fluid.Program()
self.train_program = fluid.Program()
self._sync_program = fluid.Program()
with fluid.program_guard(self.predict_program):
state, action, reward, next_s, isOver = self._get_inputs() state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state) self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
with fluid.program_guard(self.train_program):
state, action, reward, next_s, isOver = self._get_inputs()
pred_value = self.get_DQN_prediction(state)
reward = fluid.layers.clip(reward, min=-1.0, max=1.0) reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
...@@ -49,7 +56,7 @@ class DuelingDQNModel(object): ...@@ -49,7 +56,7 @@ class DuelingDQNModel(object):
action_onehot = fluid.layers.cast(action_onehot, dtype='float32') action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum( pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
...@@ -60,13 +67,22 @@ class DuelingDQNModel(object): ...@@ -60,13 +67,22 @@ class DuelingDQNModel(object):
cost = fluid.layers.square_error_cost(pred_action_value, target) cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost) cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost) optimizer.minimize(cost)
# define program vars = list(self.train_program.list_vars())
self.train_program = fluid.default_main_program() policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
with fluid.program_guard(self._sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# fluid exe # fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
...@@ -81,50 +97,50 @@ class DuelingDQNModel(object): ...@@ -81,50 +97,50 @@ class DuelingDQNModel(object):
conv1 = fluid.layers.conv2d( conv1 = fluid.layers.conv2d(
input=image, input=image,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d( max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv1, pool_size=2, pool_stride=2, pool_type='max')
conv2 = fluid.layers.conv2d( conv2 = fluid.layers.conv2d(
input=max_pool1, input=max_pool1,
num_filters=32, num_filters=32,
filter_size=[5, 5], filter_size=5,
stride=[1, 1], stride=1,
padding=[2, 2], padding=2,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d( max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv2, pool_size=2, pool_stride=2, pool_type='max')
conv3 = fluid.layers.conv2d( conv3 = fluid.layers.conv2d(
input=max_pool2, input=max_pool2,
num_filters=64, num_filters=64,
filter_size=[4, 4], filter_size=4,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d( max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') input=conv3, pool_size=2, pool_stride=2, pool_type='max')
conv4 = fluid.layers.conv2d( conv4 = fluid.layers.conv2d(
input=max_pool3, input=max_pool3,
num_filters=64, num_filters=64,
filter_size=[3, 3], filter_size=3,
stride=[1, 1], stride=1,
padding=[1, 1], padding=1,
act='relu', act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4) flatten = fluid.layers.flatten(conv4, axis=1)
value = fluid.layers.fc( value = fluid.layers.fc(
input=flatten, input=flatten,
...@@ -143,24 +159,6 @@ class DuelingDQNModel(object): ...@@ -143,24 +159,6 @@ class DuelingDQNModel(object):
advantage, dim=1, keep_dim=True)) advantage, dim=1, keep_dim=True))
return Q return Q
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
# The prune API is deprecated, please don't use it any more.
sync_program = sync_program._prune(sync_ops)
return sync_program
def act(self, state, train_or_test): def act(self, state, train_or_test):
sample = np.random.random() sample = np.random.random()
...@@ -186,12 +184,14 @@ class DuelingDQNModel(object): ...@@ -186,12 +184,14 @@ class DuelingDQNModel(object):
self.global_step += 1 self.global_step += 1
action = np.expand_dims(action, -1) action = np.expand_dims(action, -1)
self.exe.run(self.train_program, \ self.exe.run(self.train_program,
feed={'state': state.astype('float32'), \ feed={
'action': action.astype('int32'), \ 'state': state.astype('float32'),
'reward': reward, \ 'action': action.astype('int32'),
'next_s': next_state.astype('float32'), \ 'reward': reward,
'isOver': isOver}) 'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self): def sync_target_network(self):
self.exe.run(self._sync_program) self.exe.run(self._sync_program)
...@@ -29,7 +29,7 @@ The average game rewards that can be obtained for the three models as the number ...@@ -29,7 +29,7 @@ The average game rewards that can be obtained for the three models as the number
+ gym + gym
+ tqdm + tqdm
+ opencv-python + opencv-python
+ paddlepaddle-gpu>=0.12.0 + paddlepaddle-gpu>=1.0.0
+ ale_python_interface + ale_python_interface
### Install Dependencies: ### Install Dependencies:
......
...@@ -28,7 +28,7 @@ ...@@ -28,7 +28,7 @@
+ gym + gym
+ tqdm + tqdm
+ opencv-python + opencv-python
+ paddlepaddle-gpu>=0.12.0 + paddlepaddle-gpu>=1.0.0
+ ale_python_interface + ale_python_interface
### Install Dependencies: ### Install Dependencies:
......
#-*- coding: utf-8 -*-
#File: utils.py
import paddle.fluid as fluid
import numpy as np
def fluid_argmax(x):
"""
Get index of max value for the last dimension
"""
_, max_index = fluid.layers.topk(x, k=1)
return max_index
def fluid_flatten(x):
"""
    Flatten a fluid variable to 2-D, keeping the first (batch) dimension
"""
return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])])
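# Note: these helpers predate the built-in layers that the diffs above migrate
# to. For a 4-D input, fluid_flatten(x) matches fluid.layers.flatten(x, axis=1)
# (e.g. [N, 64, 10, 10] -> [N, 6400]), and fluid_argmax(x) matches
# fluid.layers.argmax(x, axis=1) followed by unsqueeze(axes=[1]) to keep the
# k=1 axis.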
DeepLab: the example programs in this directory require PaddlePaddle Fluid v1.0.0 or above. If your installed PaddlePaddle is below this requirement, please update it following the installation documentation; when using a GPU, the program requires cuDNN v7. DeepLab: the example programs in this directory require PaddlePaddle Fluid v1.3.0 or above. If your installed PaddlePaddle is below this requirement, please update it following the installation documentation; when using a GPU, the program requires cuDNN v7.
## Code Structure ## Code Structure
...@@ -38,15 +38,16 @@ data/cityscape/ ...@@ -38,15 +38,16 @@ data/cityscape/
# Pretrained Model Preparation # Pretrained Model Preparation
To save more GPU memory, we use Group Norm as our normalization method here.
If you want to train the model from scratch, you need to download our initialization model If you want to train the model from scratch, you need to download our initialization model
``` ```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz
tar -xf deeplabv3plus_xception65_initialize.tar.gz && rm deeplabv3plus_xception65_initialize.tar.gz tar -xf deeplabv3plus_gn_init.tgz && rm deeplabv3plus_gn_init.tgz
``` ```
To fine-tune from the final trained model, or to use it directly for prediction, please download our final model To fine-tune from the final trained model, or to use it directly for prediction, please download our final model
``` ```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz
tar -xf deeplabv3plus.tar.gz && rm deeplabv3plus.tar.gz tar -xf deeplabv3plus_gn.tgz && rm deeplabv3plus_gn.tgz
``` ```
...@@ -59,6 +60,7 @@ python ./train.py \ ...@@ -59,6 +60,7 @@ python ./train.py \
--batch_size=1 \ --batch_size=1 \
--train_crop_size=769 \ --train_crop_size=769 \
--total_step=50 \ --total_step=50 \
--norm_type=gn \
--init_weights_path=$INIT_WEIGHTS_PATH \ --init_weights_path=$INIT_WEIGHTS_PATH \
--save_weights_path=$SAVE_WEIGHTS_PATH \ --save_weights_path=$SAVE_WEIGHTS_PATH \
--dataset_path=$DATASET_PATH --dataset_path=$DATASET_PATH
...@@ -72,19 +74,25 @@ python train.py --help ...@@ -72,19 +74,25 @@ python train.py --help
``` ```
python ./train.py \ python ./train.py \
--batch_size=8 \ --batch_size=8 \
--parallel=true \ --parallel=True \
--norm_type=gn \
--train_crop_size=769 \ --train_crop_size=769 \
--total_step=90000 \ --total_step=90000 \
--init_weights_path=deeplabv3plus_xception65_initialize.params \ --base_lr=0.001 \
--save_weights_path=output/ \ --init_weights_path=deeplabv3plus_gn_init \
--save_weights_path=output \
--dataset_path=$DATASET_PATH --dataset_path=$DATASET_PATH
``` ```
If you run short of GPU memory, try reducing `batch_size` while scaling `total_step` up proportionally so that their product stays constant. Thanks to the properties of Group Norm, changing `batch_size` does not noticeably affect the results while saving more memory; for example, you can set `--batch_size=4 --total_step=180000`.
If you want to train with multiple GPUs, increase `batch_size` and decrease `total_step` by the same factor, as checked in the snippet below; for example, if single-GPU training uses `--batch_size=4 --total_step=180000`, then 4-GPU training uses `--batch_size=16 --total_step=45000`.
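A quick sanity check on the two rules above (an illustrative helper, not part of the repository): the product `batch_size * total_step` must stay constant.
```
def rescale_schedule(batch_size, total_step, new_batch_size):
    # keep batch_size * total_step constant, per the Group Norm note above
    assert (batch_size * total_step) % new_batch_size == 0
    return new_batch_size, batch_size * total_step // new_batch_size

print(rescale_schedule(8, 90000, 4))    # (4, 180000)
print(rescale_schedule(4, 180000, 16))  # (16, 45000)
```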
### Testing ### Testing
Run the following command to evaluate on the `Cityscape` test dataset: Run the following command to evaluate on the `Cityscape` test dataset:
``` ```
python ./eval.py \ python ./eval.py \
--init_weights=deeplabv3plus.params \ --init_weights=deeplabv3plus_gn \
--norm_type=gn \
--dataset_path=$DATASET_PATH --dataset_path=$DATASET_PATH
``` ```
Specify the model file via the `--init_weights` option. The evaluation metric reported by the test script is mean IoU. Specify the model file via the `--init_weights` option. The evaluation metric reported by the test script is mean IoU.
...@@ -93,15 +101,17 @@ python ./eval.py \ ...@@ -93,15 +101,17 @@ python ./eval.py \
## Experimental Results ## Experimental Results
After training completes, run `eval.py` on the validation set to obtain the following results: After training completes, run `eval.py` on the validation set to obtain the following results:
``` ```
load from: ../models/deeplabv3p load from: ../models/deeplabv3plus_gn
total number 500 total number 500
step: 500, mIoU: 0.7873 step: 500, mIoU: 0.7881
``` ```
## Other Information ## Other Information
|Dataset | pretrained model | trained model | mean IoU
|---|---|---|---| |Dataset | norm type | pretrained model | trained model | mean IoU
|CityScape | [deeplabv3plus_xception65_initialize.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz) | [deeplabv3plus.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz) | 0.7873 | |---|---|---|---|---|
|CityScape | batch norm | [deeplabv3plus_xception65_initialize.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_xception65_initialize.tgz) | [deeplabv3plus.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus.tgz) | 0.7873 |
|CityScape | group norm | [deeplabv3plus_gn_init.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz) | [deeplabv3plus_gn.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz) | 0.7881 |
## References ## References
......
...@@ -2,7 +2,9 @@ from __future__ import absolute_import ...@@ -2,7 +2,9 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import os import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98' if 'FLAGS_fraction_of_gpu_memory_to_use' not in os.environ:
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
os.environ['FLAGS_enable_parallel_graph'] = '1'
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
...@@ -12,21 +14,21 @@ from reader import CityscapeDataset ...@@ -12,21 +14,21 @@ from reader import CityscapeDataset
import reader import reader
import models import models
import sys import sys
import utility
parser = argparse.ArgumentParser()
add_arg = lambda *args: utility.add_arguments(*args, argparser=parser)
def add_argument(name, type, default, help): # yapf: disable
parser.add_argument('--' + name, default=default, type=type, help=help) add_arg('total_step', int, -1, "Number of the step to be evaluated, -1 for full evaluation.")
add_arg('init_weights_path', str, None, "Path of the weights to evaluate.")
add_arg('dataset_path', str, None, "Cityscape dataset path.")
def add_arguments(): add_arg('verbose', bool, False, "Print mIoU for each step if verbose.")
add_argument('total_step', int, -1, add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
"Number of the step to be evaluated, -1 for full evaluation.") add_arg('num_classes', int, 19, "Number of classes.")
add_argument('init_weights_path', str, None, add_arg('use_py_reader', bool, True, "Use py_reader.")
"Path of the weights to evaluate.") add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
add_argument('dataset_path', str, None, "Cityscape dataset path.") #yapf: enable
add_argument('verbose', bool, False, "Print mIoU for each step if verbose.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.")
add_argument('num_classes', int, 19, "Number of classes.")
def mean_iou(pred, label): def mean_iou(pred, label):
...@@ -43,7 +45,7 @@ def mean_iou(pred, label): ...@@ -43,7 +45,7 @@ def mean_iou(pred, label):
def load_model(): def load_model():
if args.init_weights_path.endswith('/'): if os.path.isdir(args.init_weights_path):
fluid.io.load_params( fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp) exe, dirname=args.init_weights_path, main_program=tp)
else: else:
...@@ -53,13 +55,11 @@ def load_model(): ...@@ -53,13 +55,11 @@ def load_model():
CityscapeDataset = reader.CityscapeDataset CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args() args = parser.parse_args()
models.clean() models.clean()
models.is_train = False models.is_train = False
models.default_norm_type = args.norm_type
deeplabv3p = models.deeplabv3p deeplabv3p = models.deeplabv3p
image_shape = [1025, 2049] image_shape = [1025, 2049]
...@@ -73,8 +73,15 @@ reader.default_config['shuffle'] = False ...@@ -73,8 +73,15 @@ reader.default_config['shuffle'] = False
num_classes = args.num_classes num_classes = args.num_classes
with fluid.program_guard(tp, sp): with fluid.program_guard(tp, sp):
if args.use_py_reader:
py_reader = fluid.layers.py_reader(capacity=64,
shapes=[[1, 3, 0, 0], [1] + eval_shape],
dtypes=['float32', 'int32'])
img, label = fluid.layers.read_file(py_reader)
else:
img = fluid.layers.data(name='img', shape=[3, 0, 0], dtype='float32') img = fluid.layers.data(name='img', shape=[3, 0, 0], dtype='float32')
label = fluid.layers.data(name='label', shape=eval_shape, dtype='int32') label = fluid.layers.data(name='label', shape=eval_shape, dtype='int32')
img = fluid.layers.resize_bilinear(img, image_shape) img = fluid.layers.resize_bilinear(img, image_shape)
logit = deeplabv3p(img) logit = deeplabv3p(img)
logit = fluid.layers.resize_bilinear(logit, eval_shape) logit = fluid.layers.resize_bilinear(logit, eval_shape)
...@@ -105,16 +112,25 @@ else: ...@@ -105,16 +112,25 @@ else:
total_step = args.total_step total_step = args.total_step
batches = dataset.get_batch_generator(batch_size, total_step) batches = dataset.get_batch_generator(batch_size, total_step)
if args.use_py_reader:
    def data_gen():
        for b in batches:
            yield b[1], b[2]
    py_reader.decorate_tensor_provider(data_gen)
py_reader.start()
sum_iou = 0 sum_iou = 0
all_correct = np.array([0], dtype=np.int64) all_correct = np.array([0], dtype=np.int64)
all_wrong = np.array([0], dtype=np.int64) all_wrong = np.array([0], dtype=np.int64)
for i, imgs, labels, names in batches: for i in range(total_step):
if not args.use_py_reader:
_, imgs, labels, names = next(batches)
result = exe.run(tp, result = exe.run(tp,
feed={'img': imgs, feed={'img': imgs,
'label': labels}, 'label': labels},
fetch_list=[pred, miou, out_wrong, out_correct]) fetch_list=[pred, miou, out_wrong, out_correct])
else:
result = exe.run(tp,
fetch_list=[pred, miou, out_wrong, out_correct])
wrong = result[2][:-1] + all_wrong wrong = result[2][:-1] + all_wrong
right = result[3][:-1] + all_correct right = result[3][:-1] + all_correct
all_wrong = wrong.copy() all_wrong = wrong.copy()
...@@ -122,7 +138,6 @@ for i, imgs, labels, names in batches: ...@@ -122,7 +138,6 @@ for i, imgs, labels, names in batches:
mp = (wrong + right) != 0 mp = (wrong + right) != 0
miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp]))) miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp])))
if args.verbose: if args.verbose:
print('step: %s, mIoU: %s' % (i + 1, miou2)) print('step: %s, mIoU: %s' % (i + 1, miou2), flush=True)
else: else:
print('\rstep: %s, mIoU: %s' % (i + 1, miou2)) print('\rstep: %s, mIoU: %s' % (i + 1, miou2), end='\r', flush=True)
sys.stdout.flush()
...@@ -5,6 +5,7 @@ import paddle ...@@ -5,6 +5,7 @@ import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import contextlib import contextlib
import os
name_scope = "" name_scope = ""
decode_channel = 48 decode_channel = 48
...@@ -146,10 +147,12 @@ def bn_relu(data): ...@@ -146,10 +147,12 @@ def bn_relu(data):
def relu(data): def relu(data):
return append_op_result(fluid.layers.relu(data), 'relu') return append_op_result(
fluid.layers.relu(
data, name=name_scope + 'relu'), 'relu')
def seq_conv(input, channel, stride, filter, dilation=1, act=None): def seperate_conv(input, channel, stride, filter, dilation=1, act=None):
with scope('depthwise'): with scope('depthwise'):
input = conv( input = conv(
input, input,
...@@ -187,14 +190,14 @@ def xception_block(input, ...@@ -187,14 +190,14 @@ def xception_block(input,
with scope('separable_conv' + str(i + 1)): with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv: if not activation_fn_in_separable_conv:
data = relu(data) data = relu(data)
data = seq_conv( data = seperate_conv(
data, data,
channels[i], channels[i],
strides[i], strides[i],
filters[i], filters[i],
dilation=dilation) dilation=dilation)
else: else:
data = seq_conv( data = seperate_conv(
data, data,
channels[i], channels[i],
strides[i], strides[i],
...@@ -273,11 +276,11 @@ def encoder(input): ...@@ -273,11 +276,11 @@ def encoder(input):
with scope("aspp0"): with scope("aspp0"):
aspp0 = bn_relu(conv(input, channel, 1, 1, groups=1, padding=0)) aspp0 = bn_relu(conv(input, channel, 1, 1, groups=1, padding=0))
with scope("aspp1"): with scope("aspp1"):
aspp1 = seq_conv(input, channel, 1, 3, dilation=6, act=relu) aspp1 = seperate_conv(input, channel, 1, 3, dilation=6, act=relu)
with scope("aspp2"): with scope("aspp2"):
aspp2 = seq_conv(input, channel, 1, 3, dilation=12, act=relu) aspp2 = seperate_conv(input, channel, 1, 3, dilation=12, act=relu)
with scope("aspp3"): with scope("aspp3"):
aspp3 = seq_conv(input, channel, 1, 3, dilation=18, act=relu) aspp3 = seperate_conv(input, channel, 1, 3, dilation=18, act=relu)
with scope("concat"): with scope("concat"):
data = append_op_result( data = append_op_result(
fluid.layers.concat( fluid.layers.concat(
...@@ -300,10 +303,10 @@ def decoder(encode_data, decode_shortcut): ...@@ -300,10 +303,10 @@ def decoder(encode_data, decode_shortcut):
[encode_data, decode_shortcut], axis=1) [encode_data, decode_shortcut], axis=1)
append_op_result(encode_data, 'concat') append_op_result(encode_data, 'concat')
with scope("separable_conv1"): with scope("separable_conv1"):
encode_data = seq_conv( encode_data = seperate_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu) encode_data, encode_channel, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"): with scope("separable_conv2"):
encode_data = seq_conv( encode_data = seperate_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu) encode_data, encode_channel, 1, 3, dilation=1, act=relu)
return encode_data return encode_data
......
...@@ -2,7 +2,8 @@ from __future__ import absolute_import ...@@ -2,7 +2,8 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import os import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98' if 'FLAGS_fraction_of_gpu_memory_to_use' not in os.environ:
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
...@@ -12,105 +13,94 @@ from reader import CityscapeDataset ...@@ -12,105 +13,94 @@ from reader import CityscapeDataset
import reader import reader
import models import models
import time import time
import contextlib
import paddle.fluid.profiler as profiler
import utility
parser = argparse.ArgumentParser()
def add_argument(name, type, default, help): add_arg = lambda *args: utility.add_arguments(*args, argparser=parser)
parser.add_argument('--' + name, default=default, type=type, help=help)
# yapf: disable
add_arg('batch_size', int, 2, "The number of images in each batch during training.")
def add_arguments(): add_arg('train_crop_size', int, 769, "Image crop size during training.")
add_argument('batch_size', int, 2, add_arg('base_lr', float, 0.0001, "The base learning rate for model training.")
"The number of images in each batch during training.") add_arg('total_step', int, 90000, "Number of the training step.")
add_argument('train_crop_size', int, 769, add_arg('init_weights_path', str, None, "Path of the initial weights in paddlepaddle format.")
"'Image crop size during training.") add_arg('save_weights_path', str, None, "Path of the saved weights during training.")
add_argument('base_lr', float, 0.0001, add_arg('dataset_path', str, None, "Cityscape dataset path.")
"The base learning rate for model training.") add_arg('parallel', bool, True, "using ParallelExecutor.")
add_argument('total_step', int, 90000, "Number of the training step.") add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
add_argument('init_weights_path', str, None, add_arg('num_classes', int, 19, "Number of classes.")
"Path of the initial weights in paddlepaddle format.") add_arg('load_logit_layer', bool, True, "Load last logit fc layer or not. If you are training with different number of classes, you should set to False.")
add_argument('save_weights_path', str, None, add_arg('memory_optimize', bool, True, "Using memory optimizer.")
"Path of the saved weights during training.") add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
add_argument('dataset_path', str, None, "Cityscape dataset path.") add_arg('profile', bool, False, "Enable profiler.")
add_argument('parallel', bool, False, "using ParallelExecutor.") add_arg('use_py_reader', bool, True, "Use py reader.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.") parser.add_argument(
add_argument('num_classes', int, 19, "Number of classes.")
parser.add_argument(
'--enable_ce', '--enable_ce',
action='store_true', action='store_true',
help='If set, run the task with continuous evaluation logs.') help='If set, run the task with continuous evaluation logs.')
#yapf: enable
@contextlib.contextmanager
def profile_context(profile=True):
if profile:
with profiler.profiler('All', 'total', '/tmp/profile_file2'):
yield
else:
yield
def load_model(): def load_model():
myvars = [ if os.path.isdir(args.init_weights_path):
load_vars = [
x for x in tp.list_vars() x for x in tp.list_vars()
if isinstance(x, fluid.framework.Parameter) and x.name.find('logit') == if isinstance(x, fluid.framework.Parameter) and x.name.find('logit') ==
-1 -1
] ]
if args.init_weights_path.endswith('/'): if args.load_logit_layer:
if args.num_classes == 19:
fluid.io.load_params( fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp) exe, dirname=args.init_weights_path, main_program=tp)
else: else:
fluid.io.load_vars(exe, dirname=args.init_weights_path, vars=myvars) fluid.io.load_vars(exe, dirname=args.init_weights_path, vars=load_vars)
else: else:
if args.num_classes == 19:
fluid.io.load_params( fluid.io.load_params(
exe, exe,
dirname="", dirname="",
filename=args.init_weights_path, filename=args.init_weights_path,
main_program=tp) main_program=tp)
else:
fluid.io.load_vars(
exe, dirname="", filename=args.init_weights_path, vars=myvars)
def save_model(): def save_model():
if args.save_weights_path.endswith('/'): assert not os.path.isfile(args.save_weights_path)
fluid.io.save_params( fluid.io.save_params(
exe, dirname=args.save_weights_path, main_program=tp) exe, dirname=args.save_weights_path, main_program=tp)
else:
fluid.io.save_params(
exe, dirname="", filename=args.save_weights_path, main_program=tp)
def loss(logit, label): def loss(logit, label):
label_nignore = (label < num_classes).astype('float32') label_nignore = fluid.layers.less_than(
label = fluid.layers.elementwise_min( label.astype('float32'),
label, fluid.layers.assign(np.array([num_classes], 'float32')),
fluid.layers.assign(np.array( force_cpu=False).astype('float32')
[num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1]) logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes]) logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1]) label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64') label = fluid.layers.cast(label, 'int64')
label_nignore = fluid.layers.reshape(label_nignore, [-1, 1]) label_nignore = fluid.layers.reshape(label_nignore, [-1, 1])
loss = fluid.layers.softmax_with_cross_entropy(logit, label) loss = fluid.layers.softmax_with_cross_entropy(logit, label, ignore_index=255, numeric_stable_mode=True)
loss = loss * label_nignore label_nignore.stop_gradient = True
no_grad_set.add(label_nignore.name) label.stop_gradient = True
no_grad_set.add(label.name)
return loss, label_nignore return loss, label_nignore
def get_cards(args):
if args.enable_ce:
cards = os.environ.get('CUDA_VISIBLE_DEVICES')
num = len(cards.split(","))
return num
else:
return args.num_devices
CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args() args = parser.parse_args()
utility.print_arguments(args)
models.clean() models.clean()
models.bn_momentum = 0.9997 models.bn_momentum = 0.9997
models.dropout_keep_prop = 0.9 models.dropout_keep_prop = 0.9
models.label_number = args.num_classes models.label_number = args.num_classes
models.default_norm_type = args.norm_type
deeplabv3p = models.deeplabv3p deeplabv3p = models.deeplabv3p
sp = fluid.Program() sp = fluid.Program()
...@@ -133,9 +123,14 @@ weight_decay = 0.00004 ...@@ -133,9 +123,14 @@ weight_decay = 0.00004
base_lr = args.base_lr base_lr = args.base_lr
total_step = args.total_step total_step = args.total_step
no_grad_set = set()
with fluid.program_guard(tp, sp): with fluid.program_guard(tp, sp):
if args.use_py_reader:
batch_size_each = batch_size // fluid.core.get_cuda_device_count()
py_reader = fluid.layers.py_reader(capacity=64,
shapes=[[batch_size_each, 3] + image_shape, [batch_size_each] + image_shape],
dtypes=['float32', 'int32'])
img, label = fluid.layers.read_file(py_reader)
else:
img = fluid.layers.data( img = fluid.layers.data(
name='img', shape=[3] + image_shape, dtype='float32') name='img', shape=[3] + image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=image_shape, dtype='int32') label = fluid.layers.data(name='label', shape=image_shape, dtype='int32')
...@@ -154,11 +149,21 @@ with fluid.program_guard(tp, sp): ...@@ -154,11 +149,21 @@ with fluid.program_guard(tp, sp):
lr, lr,
momentum=0.9, momentum=0.9,
regularization=fluid.regularizer.L2DecayRegularizer( regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=weight_decay), ) regularization_coeff=weight_decay))
retv = opt.minimize(loss_mean, startup_program=sp, no_grad_set=no_grad_set) optimize_ops, params_grads = opt.minimize(loss_mean, startup_program=sp)
# ir memory optimizer has some issues, we need to seed grad persistable to
fluid.memory_optimize( # avoid this issue
tp, print_log=False, skip_opt_set=set([pred.name, loss_mean.name]), level=1) for p,g in params_grads: g.persistable = True
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if args.memory_optimize:
build_strategy.fuse_relu_depthwise_conv = True
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
place = fluid.CPUPlace() place = fluid.CPUPlace()
if args.use_gpu: if args.use_gpu:
...@@ -170,47 +175,58 @@ if args.init_weights_path: ...@@ -170,47 +175,58 @@ if args.init_weights_path:
print("load from:", args.init_weights_path) print("load from:", args.init_weights_path)
load_model() load_model()
dataset = CityscapeDataset(args.dataset_path, 'train') dataset = reader.CityscapeDataset(args.dataset_path, 'train')
if args.parallel: if args.parallel:
exe_p = fluid.ParallelExecutor( binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
use_cuda=True, loss_name=loss_mean.name, main_program=tp) loss_name=loss_mean.name,
build_strategy=build_strategy,
batches = dataset.get_batch_generator(batch_size, total_step) exec_strategy=exec_strategy)
else:
    binary = fluid.compiler.CompiledProgram(tp)
if args.use_py_reader:
assert(batch_size % fluid.core.get_cuda_device_count() == 0)
def data_gen():
batches = dataset.get_batch_generator(
batch_size // fluid.core.get_cuda_device_count(),
total_step * fluid.core.get_cuda_device_count())
for b in batches:
yield b[1], b[2]
py_reader.decorate_tensor_provider(data_gen)
py_reader.start()
else:
batches = dataset.get_batch_generator(batch_size, total_step)
total_time = 0.0 total_time = 0.0
epoch_idx = 0 epoch_idx = 0
train_loss = 0 train_loss = 0
for i, imgs, labels, names in batches: with profile_context(args.profile):
for i in range(total_step):
epoch_idx += 1 epoch_idx += 1
begin_time = time.time() begin_time = time.time()
prev_start_time = time.time() prev_start_time = time.time()
if args.parallel: if not args.use_py_reader:
retv = exe_p.run(fetch_list=[pred.name, loss_mean.name], _, imgs, labels, names = next(batches)
train_loss, = exe.run(binary,
feed={'img': imgs, feed={'img': imgs,
'label': labels}) 'label': labels}, fetch_list=[loss_mean])
else: else:
retv = exe.run(tp, train_loss, = exe.run(binary, fetch_list=[loss_mean])
feed={'img': imgs, train_loss = np.mean(train_loss)
'label': labels},
fetch_list=[pred, loss_mean])
end_time = time.time() end_time = time.time()
total_time += end_time - begin_time total_time += end_time - begin_time
if i % 100 == 0: if i % 100 == 0:
print("Model is saved to", args.save_weights_path) print("Model is saved to", args.save_weights_path)
save_model() save_model()
print("step {:d}, loss: {:.6f}, step_time_cost: {:.3f}".format( print("step {:d}, loss: {:.6f}, step_time_cost: {:.3f}".format(
i, np.mean(retv[1]), end_time - prev_start_time)) i, train_loss, end_time - prev_start_time))
# only for ce print("Training done. Model is saved to", args.save_weights_path)
train_loss = np.mean(retv[1]) save_model()
if args.enable_ce: if args.enable_ce:
gpu_num = get_cards(args) gpu_num = fluid.core.get_cuda_device_count()
print("kpis\teach_pass_duration_card%s\t%s" % print("kpis\teach_pass_duration_card%s\t%s" %
(gpu_num, total_time / epoch_idx)) (gpu_num, total_time / epoch_idx))
print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, train_loss)) print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, train_loss))
print("Training done. Model is saved to", args.save_weights_path)
save_model()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import six
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
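# Illustrative usage of the two helpers above (a sketch, not part of the file):
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    add_arguments("use_gpu", bool, True, "Whether to use GPU.", parser)
    add_arguments("batch_size", int, 2, "Images per batch.", parser)
    print_arguments(parser.parse_args([]))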
...@@ -121,7 +121,7 @@ def detect_face(image, shrink): ...@@ -121,7 +121,7 @@ def detect_face(image, shrink):
return_numpy=False) return_numpy=False)
detection = np.array(detection) detection = np.array(detection)
# layout: xmin, ymin, xmax. ymax, score # layout: xmin, ymin, xmax. ymax, score
if detection.shape == (1, ): if np.prod(detection.shape) == 1:
print("No face detected") print("No face detected")
return np.array([[0, 0, 0, 0, 0]]) return np.array([[0, 0, 0, 0, 0]])
det_conf = detection[:, 1] det_conf = detection[:, 1]
......
...@@ -103,7 +103,7 @@ python infer.py \ ...@@ -103,7 +103,7 @@ python infer.py \
## Other Information ## Other Information
|Dataset | pretrained model | |Dataset | pretrained model |
|---|---| |---|---|
|CityScape | [Model]()[md: ] | |CityScape | [pretrained_model](https://paddle-icnet-models.bj.bcebos.com/model_1000.tar.gz) |
## References ## References
......
...@@ -155,6 +155,17 @@ class DataGenerater: ...@@ -155,6 +155,17 @@ class DataGenerater:
else: else:
return np.pad(image, ((0, pad_h), (0, pad_w)), 'constant') return np.pad(image, ((0, pad_h), (0, pad_w)), 'constant')
def random_crop(self, im, out_shape, is_color=True):
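        """Randomly crop `im` to `out_shape` = (height, width); handles both
        HWC color images and HW grayscale images."""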
h, w = im.shape[:2]
h_start = np.random.randint(0, h - out_shape[0] + 1)
w_start = np.random.randint(0, w - out_shape[1] + 1)
h_end, w_end = h_start + out_shape[0], w_start + out_shape[1]
if is_color:
im = im[h_start:h_end, w_start:w_end, :]
else:
im = im[h_start:h_end, w_start:w_end]
return im
def resize(self, image, label, out_size): def resize(self, image, label, out_size):
""" """
Resize image and label by padding or cropping. Resize image and label by padding or cropping.
...@@ -166,8 +177,7 @@ class DataGenerater: ...@@ -166,8 +177,7 @@ class DataGenerater:
combined = np.concatenate((image, label), axis=2) combined = np.concatenate((image, label), axis=2)
combined = self.padding_as( combined = self.padding_as(
combined, out_size[0], out_size[1], is_color=True) combined, out_size[0], out_size[1], is_color=True)
combined = dataset.image.random_crop( combined = self.random_crop(combined, out_size, is_color=True)
combined, out_size[0], is_color=True)
image = combined[:, :, 0:3] image = combined[:, :, 0:3]
label = combined[:, :, 3:4] + ignore_label label = combined[:, :, 3:4] + ignore_label
return image, label return image, label
......
...@@ -235,12 +235,12 @@ def proj_block(input, filter_num, padding=0, dilation=None, stride=1, ...@@ -235,12 +235,12 @@ def proj_block(input, filter_num, padding=0, dilation=None, stride=1,
def sub_net_4(input, input_shape): def sub_net_4(input, input_shape):
tmp = interp(input, out_shape=np.ceil(input_shape // 32)) tmp = interp(input, out_shape=(input_shape // 32))
tmp = dilation_convs(tmp) tmp = dilation_convs(tmp)
tmp = pyramis_pooling(tmp, input_shape) tmp = pyramis_pooling(tmp, input_shape)
tmp = conv(tmp, 1, 1, 256, 1, 1, name="conv5_4_k1") tmp = conv(tmp, 1, 1, 256, 1, 1, name="conv5_4_k1")
tmp = bn(tmp, relu=True) tmp = bn(tmp, relu=True)
tmp = interp(tmp, input_shape // 16) tmp = interp(tmp, out_shape=np.ceil(input_shape / 16))
return tmp return tmp
......
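The `interp` change above matters when the input sides are not multiples of 32: `np.ceil(x // 32)` is a no-op on an already floor-divided integer array, so the old ceil was dead code, while the fixed upsample path needs a true ceiling. An illustrative check:
import numpy as np
shape = np.array([1025, 2049])
print(np.ceil(shape // 32))  # [32. 64.]  - ceil of an already-floored value
print(shape // 32)           # [32 64]    - identical, so the old ceil changed nothing
print(np.ceil(shape / 16))   # [ 65. 129.] - true ceiling, used by the fixed code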
@@ -81,7 +81,7 @@ python train.py \
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
* **model_category**: the category of models, ("models"|"models_name"). Default: "models_name".

Alternatively, you can start training by running ```run.sh```.
@@ -209,6 +209,7 @@ Models are trained by starting with learning rate 0.1 and decaying it by
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
@@ -220,6 +221,8 @@ Models are trained by starting with learning rate 0.1 and decaying it by
- Released models: do not specify parameter names

**NOTE: these models were trained with model_category=models.**

|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
...
@@ -79,7 +79,7 @@ python train.py \
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
* **model_category**: the category of models, ("models"|"models_name"). Default: "models_name".

**Data reader notes:** The data readers are defined in ```reader.py``` and ```reader_cv2.py```. In general, the CV2 reader loads data faster, while the PIL reader gives slightly higher accuracy. During the [training stage](#training-a-model), the default augmentations are random cropping and horizontal flipping; the [evaluation](#inference) and [inference](#inference) stages default to center cropping. The currently supported augmentations are:
* Rotation
@@ -204,6 +204,7 @@ Models include two kinds of models: models with parameter names and models without parameter names
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
@@ -212,6 +213,8 @@ Models include two kinds of models: models with parameter names and models without parameter names
- Released models: do not specify parameter names

**Note: these pretrained models were trained with model_category=models.**

|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
...
@@ -39,6 +39,8 @@ You can test if distributed training works on a single node before deploying to
***NOTE: for best performance, we recommend using multi-process mode, see No.3. And together with fp16.***

***NOTE: for nccl2 distributed mode, you must ensure each node trains the same number of samples, or set skip_unbalanced_data to 1 to do sync training.***

1. simply run `python dist_train.py` to start local training with default configurations.
2. for pserver mode, run `bash run_ps_mode.sh` to start 2 pservers and 2 trainers, these 2 trainers
   will use GPU 0 and 1 to simulate 2 workers.
@@ -90,4 +92,19 @@ The default resnet50 distributed training config is based on this paper: https:/
### Performance

The figure below shows Fluid distributed training performance. We ran these experiments on a 4-node V100 GPU cluster, each node with 8 V100 GPU cards (32 GPUs in total). All modes reach the state-of-the-art accuracy of the ResNet50 model on the ImageNet dataset (choose the loss scale carefully when using fp16 mode). The Y-axis in the figure shows images/s while the X-axis shows the number of GPUs.

<p align="center">
<img src="../images/imagenet_dist_performance.png" width=528> <br />
Performance of Multiple-GPU Training of Resnet50 on Imagenet
</p>

The second figure shows the speed-ups when using multiple GPUs, derived from the figure above.

<p align="center">
<img src="../images/imagenet_dist_speedup.png" width=528> <br />
Speed-ups of Multiple-GPU Training of Resnet50 on Imagenet
</p>
@@ -7,8 +7,6 @@ import time
import sys
import paddle
import paddle.fluid as fluid
#import reader_cv2 as reader
import reader as reader
import argparse
@@ -27,9 +25,20 @@ add_arg('image_shape', str, "3,224,224", "Input image size")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid values: 'models', 'models_name'.")
# yapf: enable


def set_models(model_category):
    global models
    assert model_category in ["models", "models_name"
                              ], "{} is not in lists: {}".format(
                                  model_category, ["models", "models_name"])
    if model_category == "models_name":
        import models_name as models
    else:
        import models as models
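`set_models` swaps the whole model package at runtime: the `import ... as models` inside the function rebinds the module-level name because of the `global models` declaration. An equivalent, arguably clearer sketch using `importlib` (an alternative pattern, not what this file does):

```python
import importlib

def set_models(model_category):
    # Rebind the module-level name "models" to the chosen package.
    assert model_category in ("models", "models_name")
    global models
    models = importlib.import_module(model_category)
```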
def eval(args):
@@ -40,6 +49,7 @@ def eval(args):
    with_memory_optimization = args.with_mem_opt
    image_shape = [int(m) for m in args.image_shape.split(",")]

    model_list = [m for m in dir(models) if "__" not in m]
    assert model_name in model_list, "{} is not in lists: {}".format(args.model,
                                                                     model_list)
@@ -63,11 +73,11 @@ def eval(args):
        acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
    else:
        out = model.net(input=image, class_dim=class_dim)
        cost, pred = fluid.layers.softmax_with_cross_entropy(
            out, label, return_softmax=True)
        avg_cost = fluid.layers.mean(x=cost)
        acc_top1 = fluid.layers.accuracy(input=pred, label=label, k=1)
        acc_top5 = fluid.layers.accuracy(input=pred, label=label, k=5)

    test_program = fluid.default_main_program().clone(for_test=True)
@@ -125,6 +135,7 @@ def eval(args):
def main():
    args = parser.parse_args()
    print_arguments(args)
    set_models(args.model_category)
    eval(args)
...
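The loss in `eval` now uses the fused `softmax_with_cross_entropy` with `return_softmax=True`, so one op yields both the loss and the probabilities that feed the accuracy layers (the previous `cross_entropy` expects probabilities rather than raw logits). A numpy sketch of what the fused op computes, assuming integer labels of shape (N, 1):

```python
import numpy as np

def softmax_with_cross_entropy(logits, label):
    # Numerically stable softmax, then per-row cross-entropy; mirrors
    # fluid.layers.softmax_with_cross_entropy(..., return_softmax=True).
    shifted = logits - logits.max(axis=1, keepdims=True)
    softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    rows = np.arange(logits.shape[0])
    loss = -np.log(softmax[rows, label.ravel()])
    return loss[:, None], softmax

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
label = np.array([[0], [1]])
loss, pred = softmax_with_cross_entropy(logits, label)  # pred feeds accuracy
```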
@@ -7,7 +7,6 @@ import time
import sys
import paddle
import paddle.fluid as fluid
import reader
import argparse
import functools
@@ -23,9 +22,19 @@ add_arg('image_shape', str, "3,224,224", "Input image size")
add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid values: 'models', 'models_name'.")
# yapf: enable


def set_models(model_category):
    global models
    assert model_category in ["models", "models_name"
                              ], "{} is not in lists: {}".format(
                                  model_category, ["models", "models_name"])
    if model_category == "models_name":
        import models_name as models
    else:
        import models as models


def infer(args):
@@ -35,7 +44,7 @@ def infer(args):
    pretrained_model = args.pretrained_model
    with_memory_optimization = args.with_mem_opt
    image_shape = [int(m) for m in args.image_shape.split(",")]

    model_list = [m for m in dir(models) if "__" not in m]
    assert model_name in model_list, "{} is not in lists: {}".format(args.model,
                                                                     model_list)
@@ -85,6 +94,7 @@ def infer(args):
def main():
    args = parser.parse_args()
    print_arguments(args)
    set_models(args.model_category)
    infer(args)
...
#Hyperparameters config
#Example: SE_ResNeXt50_32x4d
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=400 \
       --total_images=1281167 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --model_save_dir=output/ \
       --with_mem_opt=True \
       --lr_strategy=cosine_decay \
       --lr=0.1 \
       --num_epochs=200 \
       --l2_decay=1.2e-4 \
       --model_category=models_name \
# >log_SE_ResNeXt50_32x4d.txt 2>&1 &

#AlexNet:
#python train.py \
#       --model=AlexNet \
@@ -19,39 +22,41 @@ python train.py \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --with_mem_opt=True \
#       --model_category=models_name \
#       --lr_strategy=piecewise_decay \
#       --num_epochs=120 \
#       --lr=0.01 \
#       --l2_decay=1e-4

#MobileNet v1:
#python train.py \
#       --model=MobileNet \
#       --batch_size=256 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --with_mem_opt=True \
#       --model_category=models_name \
#       --lr_strategy=piecewise_decay \
#       --num_epochs=120 \
#       --lr=0.1 \
#       --l2_decay=3e-5

#MobileNet v2:
#python train.py \
#       --model=MobileNetV2 \
#       --batch_size=500 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --model_category=models_name \
#       --with_mem_opt=True \
#       --lr_strategy=cosine_decay \
#       --num_epochs=240 \
#       --lr=0.1 \
#       --l2_decay=4e-5

#ResNet50:
#python train.py \
#       --model=ResNet50 \
@@ -60,10 +65,12 @@ python train.py \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --with_mem_opt=True \
#       --model_category=models_name \
#       --lr_strategy=piecewise_decay \
#       --num_epochs=120 \
#       --lr=0.1 \
#       --l2_decay=1e-4

#ResNet101:
#python train.py \
@@ -73,44 +80,58 @@ python train.py \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --model_category=models_name \
#       --with_mem_opt=True \
#       --lr_strategy=piecewise_decay \
#       --num_epochs=120 \
#       --lr=0.1 \
#       --l2_decay=1e-4

#ResNet152:
#python train.py \
#       --model=ResNet152 \
#       --batch_size=256 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --model_save_dir=output/ \
#       --lr_strategy=piecewise_decay \
#       --model_category=models_name \
#       --with_mem_opt=True \
#       --lr=0.1 \
#       --num_epochs=120 \
#       --l2_decay=1e-4

#SE_ResNeXt50_32x4d:
#python train.py \
#       --model=SE_ResNeXt50_32x4d \
#       --batch_size=400 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --lr_strategy=cosine_decay \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --lr=0.1 \
#       --num_epochs=200 \
#       --with_mem_opt=True \
#       --l2_decay=1.2e-4

#SE_ResNeXt101_32x4d:
#python train.py \
#       --model=SE_ResNeXt101_32x4d \
#       --batch_size=400 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --lr_strategy=cosine_decay \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --lr=0.1 \
#       --num_epochs=200 \
#       --with_mem_opt=True \
#       --l2_decay=1.5e-5

#VGG11:
#python train.py \
@@ -119,17 +140,55 @@ python train.py \
#       --total_images=1281167 \
#       --image_shape=3,224,224 \
#       --lr_strategy=cosine_decay \
#       --class_dim=1000 \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --lr=0.1 \
#       --num_epochs=90 \
#       --with_mem_opt=True \
#       --l2_decay=2e-4

#VGG13:
#python train.py \
#       --model=VGG13 \
#       --batch_size=256 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --lr_strategy=cosine_decay \
#       --lr=0.01 \
#       --num_epochs=90 \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --with_mem_opt=True \
#       --l2_decay=3e-4

#VGG16:
#python train.py \
#       --model=VGG16 \
#       --batch_size=256 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --lr_strategy=cosine_decay \
#       --image_shape=3,224,224 \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --lr=0.01 \
#       --num_epochs=90 \
#       --with_mem_opt=True \
#       --l2_decay=3e-4

#VGG19:
#python train.py \
#       --model=VGG19 \
#       --batch_size=256 \
#       --total_images=1281167 \
#       --class_dim=1000 \
#       --image_shape=3,224,224 \
#       --lr_strategy=cosine_decay \
#       --lr=0.01 \
#       --num_epochs=90 \
#       --with_mem_opt=True \
#       --model_category=models_name \
#       --model_save_dir=output/ \
#       --l2_decay=3e-4
@@ -10,7 +10,6 @@ import math
import paddle
import paddle.fluid as fluid
import paddle.dataset.flowers as flowers
import reader
import argparse
import functools
@@ -19,8 +18,8 @@ import utils
from utils.learning_rate import cosine_decay
from utils.fp16_utils import create_master_params_grads, master_param_to_train_param
from utility import add_arguments, print_arguments

IMAGENET1000 = 1281167

parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
@@ -40,25 +39,32 @@ add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate
add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
add_arg('enable_ce', bool, False, "If set True, enable continuous evaluation job.")
add_arg('data_dir', str, "./data/ILSVRC2012", "The ImageNet dataset root dir.")
add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid values: 'models', 'models_name'.")
add_arg('fp16', bool, False, "Enable half precision training with fp16.")
add_arg('scale_loss', float, 1.0, "Scale loss for fp16.")
add_arg('l2_decay', float, 1e-4, "L2_decay parameter.")
add_arg('momentum_rate', float, 0.9, "momentum_rate.")
# yapf: enable


def set_models(model_category):
    global models
    assert model_category in ["models", "models_name"
                              ], "{} is not in lists: {}".format(
                                  model_category, ["models", "models_name"])
    if model_category == "models_name":
        import models_name as models
    else:
        import models as models


def optimizer_setting(params):
    ls = params["learning_strategy"]
    l2_decay = params["l2_decay"]
    momentum_rate = params["momentum_rate"]
    if ls["name"] == "piecewise_decay":
        if "total_images" not in params:
            total_images = IMAGENET1000
        else:
            total_images = params["total_images"]
        batch_size = ls["batch_size"]
@@ -71,16 +77,17 @@ def optimizer_setting(params):
        optimizer = fluid.optimizer.Momentum(
            learning_rate=fluid.layers.piecewise_decay(
                boundaries=bd, values=lr),
            momentum=momentum_rate,
            regularization=fluid.regularizer.L2Decay(l2_decay))

    elif ls["name"] == "cosine_decay":
        if "total_images" not in params:
            total_images = IMAGENET1000
        else:
            total_images = params["total_images"]
        batch_size = ls["batch_size"]
        l2_decay = params["l2_decay"]
        momentum_rate = params["momentum_rate"]
        step = int(total_images / batch_size + 1)

        lr = params["lr"]
@@ -89,43 +96,42 @@ def optimizer_setting(params):
        optimizer = fluid.optimizer.Momentum(
            learning_rate=cosine_decay(
                learning_rate=lr, step_each_epoch=step, epochs=num_epochs),
            momentum=momentum_rate,
            regularization=fluid.regularizer.L2Decay(l2_decay))

    elif ls["name"] == "linear_decay":
        if "total_images" not in params:
            total_images = IMAGENET1000
        else:
            total_images = params["total_images"]
        batch_size = ls["batch_size"]
        num_epochs = params["num_epochs"]
        start_lr = params["lr"]
        l2_decay = params["l2_decay"]
        momentum_rate = params["momentum_rate"]
        end_lr = 0
        total_step = int((total_images / batch_size) * num_epochs)
        lr = fluid.layers.polynomial_decay(
            start_lr, total_step, end_lr, power=1)
        optimizer = fluid.optimizer.Momentum(
            learning_rate=lr,
            momentum=momentum_rate,
            regularization=fluid.regularizer.L2Decay(l2_decay))
    else:
        lr = params["lr"]
        l2_decay = params["l2_decay"]
        momentum_rate = params["momentum_rate"]
        optimizer = fluid.optimizer.Momentum(
            learning_rate=lr,
            momentum=momentum_rate,
            regularization=fluid.regularizer.L2Decay(l2_decay))

    return optimizer
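The former `exponential_decay` branch is replaced by `linear_decay`, built from `polynomial_decay` with `power=1`: the learning rate falls linearly from `lr` to 0 over the whole run. A plain-Python sketch of the resulting schedule (assuming `cycle=False`, the default, so progress is capped at 1):

```python
def linear_decay_lr(step, start_lr, total_steps, end_lr=0.0, power=1.0):
    # Mirrors polynomial_decay with cycle=False: progress saturates at 1.
    progress = min(step, total_steps) / float(total_steps)
    return (start_lr - end_lr) * (1.0 - progress) ** power + end_lr

total_step = int((1281167 / 256) * 120)  # IMAGENET1000 images, batch 256, 120 epochs
print(linear_decay_lr(0, 0.1, total_step))                # 0.1
print(linear_decay_lr(total_step // 2, 0.1, total_step))  # ~0.05
```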
def net_config(image, label, model, args):
    model_list = [m for m in dir(models) if "__" not in m]
    assert args.model in model_list, "{} is not in lists: {}".format(
        args.model, model_list)

    class_dim = args.class_dim
    model_name = args.model
@@ -149,7 +155,8 @@ def net_config(image, label, model, args):
        acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
    else:
        out = model.net(input=image, class_dim=class_dim)
        cost, pred = fluid.layers.softmax_with_cross_entropy(
            out, label, return_softmax=True)
        if args.scale_loss > 1:
            avg_cost = fluid.layers.mean(x=cost) * float(args.scale_loss)
        else:
@@ -190,18 +197,24 @@ def build_program(is_train, main_prog, startup_prog, args):
            params["num_epochs"] = args.num_epochs
            params["learning_strategy"]["batch_size"] = args.batch_size
            params["learning_strategy"]["name"] = args.lr_strategy
            params["l2_decay"] = args.l2_decay
            params["momentum_rate"] = args.momentum_rate

            optimizer = optimizer_setting(params)

            if args.fp16:
                params_grads = optimizer.backward(avg_cost)
                master_params_grads = create_master_params_grads(
                    params_grads, main_prog, startup_prog, args.scale_loss)
                optimizer.apply_gradients(master_params_grads)
                master_param_to_train_param(master_params_grads,
                                            params_grads, main_prog)
            else:
                optimizer.minimize(avg_cost)
            global_lr = optimizer._global_learning_rate()

    if is_train:
        return py_reader, avg_cost, acc_top1, acc_top5, global_lr
    else:
        return py_reader, avg_cost, acc_top1, acc_top5
@@ -220,7 +233,7 @@ def train(args):
        startup_prog.random_seed = 1000
        train_prog.random_seed = 1000

    train_py_reader, train_cost, train_acc1, train_acc5, global_lr = build_program(
        is_train=True,
        main_prog=train_prog,
        startup_prog=startup_prog,
@@ -255,7 +268,8 @@ def train(args):
    if visible_device:
        device_num = len(visible_device.split(','))
    else:
        device_num = subprocess.check_output(
            ['nvidia-smi', '-L']).decode().count('\n')

    train_batch_size = args.batch_size / device_num
    test_batch_size = 16
@@ -283,11 +297,12 @@ def train(args):
        use_cuda=bool(args.use_gpu),
        loss_name=train_cost.name)

    train_fetch_list = [
        train_cost.name, train_acc1.name, train_acc5.name, global_lr.name
    ]
    test_fetch_list = [test_cost.name, test_acc1.name, test_acc5.name]

    params = models.__dict__[args.model]().params

    for pass_id in range(params["num_epochs"]):

        train_py_reader.start()
@@ -299,7 +314,9 @@ def train(args):
        try:
            while True:
                t1 = time.time()
                loss, acc1, acc5, lr = train_exe.run(
                    fetch_list=train_fetch_list)
                t2 = time.time()
                period = t2 - t1
                loss = np.mean(np.array(loss))
@@ -308,12 +325,14 @@ def train(args):
                train_info[0].append(loss)
                train_info[1].append(acc1)
                train_info[2].append(acc5)
                lr = np.mean(np.array(lr))
                train_time.append(period)

                if batch_id % 10 == 0:
                    print("Pass {0}, trainbatch {1}, loss {2}, \
                        acc1 {3}, acc5 {4}, lr {5}, time {6}"
                          .format(pass_id, batch_id, loss, acc1, acc5, "%.5f" %
                                  lr, "%2.2f sec" % period))
                    sys.stdout.flush()
                batch_id += 1
        except fluid.core.EOFException:
@@ -322,7 +341,8 @@ def train(args):
        train_loss = np.array(train_info[0]).mean()
        train_acc1 = np.array(train_info[1]).mean()
        train_acc5 = np.array(train_info[2]).mean()
        train_speed = np.array(train_time).mean() / (train_batch_size *
                                                     device_num)

        test_py_reader.start()
@@ -394,10 +414,7 @@ def train(args):
def main():
    args = parser.parse_args()
    set_models(args.model_category)
    print_arguments(args)
    train(args)
...
@@ -202,5 +202,5 @@ env CUDA_VISIBLE_DEVICE=0 python infer.py \
|Model| Error rate|
|- |:-: |
|[ocr_ctc_params](https://paddle-ocr-models.bj.bcebos.com/ocr_ctc.zip) | 22.3% |
|[ocr_attention_params](https://paddle-ocr-models.bj.bcebos.com/ocr_attention.zip) | 15.8%|
# RCNN Object Detection
---
## Table of Contents
@@ -9,7 +9,6 @@
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference and Visualization](#inference-and-visualization)

## Installation
@@ -17,17 +16,20 @@ Running sample code in this directory requires PaddlePaddle Fluid v1.0.0 and la
## Introduction

Region Convolutional Neural Network (RCNN) models are two-stage detectors: they generate region proposals, extract features from them, and then predict classes and more precise boxes.

The RCNN family here contains two typical models: Faster RCNN and Mask RCNN.

[Faster RCNN](https://arxiv.org/abs/1506.01497): the overall network can be divided into four parts:

1. Base conv layer. As a CNN object detector, Faster RCNN extracts feature maps using a basic convolutional network. The feature maps are then shared by the RPN and fc layers. This sample uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the base conv layer.
2. Region Proposal Network (RPN). RPN generates proposals for detection. This block generates anchors from a set of sizes and ratios, classifies anchors into foreground and background by softmax, and then refines the anchors with box regression to obtain more precise proposals.
3. RoI Align. This layer takes feature maps and proposals as input. The proposals are mapped to the feature maps and pooled to the same size, and the output is sent to fc layers for classification and regression. Either RoIPool or RoIAlign can be used for this layer; set roi\_func in config.py.
4. Detection layer. Uses the output of RoI pooling to compute the class and location of each proposal in two fc layers.

[Mask RCNN](https://arxiv.org/abs/1703.06870) is a classic instance segmentation model and an extension of Faster RCNN.

Mask RCNN is a two-stage model as well. The first stage generates proposals from the input image; the second stage produces the class result and bbox, plus a mask from a segmentation branch added to the original Faster RCNN model. This decouples mask prediction from classification.

## Data preparation

Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as below:
@@ -62,12 +64,24 @@ To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
After data preparation, one can start the training step by:

- Faster RCNN

      python train.py \
         --model_save_dir=output/ \
         --pretrained_model=${path_to_pretrain_model} \
         --data_dir=${path_to_data} \
         --MASK_ON=False

- Mask RCNN

      python train.py \
         --model_save_dir=output/ \
         --pretrained_model=${path_to_pretrain_model} \
         --data_dir=${path_to_data} \
         --MASK_ON=True

- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs to train.
- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.
- For more help on arguments:

      python train.py --help
@@ -93,7 +107,6 @@ After data preparation, one can start the training step by:
* In the first 500 iterations, the learning rate increases linearly from 0.00333 to 0.01. Then the lr is decayed at iterations 120000 and 160000 with multipliers 0.1 and 0.01, and the maximum iteration is 180000. We also released a 2x model trained for 360000 iterations with lr decayed at 240000 and 320000. These configurations can be set by max_iter and lr_steps in config.py (a sketch of the resulting schedule follows this list).
* Set the learning rate of bias to twice the global lr in non-basic convolutional layers.
* In basic convolutional layers, parameters of affine layers and the res body are not updated.
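Put together, the warmup-plus-piecewise schedule described above can be written out as (a sketch; the real schedule is driven by max_iter and lr_steps in config.py):

```python
def rcnn_lr(it, base_lr=0.01, warmup_start=0.00333, warmup_iters=500,
            lr_steps=(120000, 160000), multipliers=(0.1, 0.01)):
    # Linear warmup over the first 500 iterations, then piecewise decay.
    if it < warmup_iters:
        return warmup_start + (base_lr - warmup_start) * it / warmup_iters
    lr = base_lr
    for step, m in zip(lr_steps, multipliers):
        if it >= step:
            lr = base_lr * m
    return lr
```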
## Evaluation
@@ -101,14 +114,27 @@ Evaluation is to evaluate the performance of a trained model. This sample provid
`eval_coco_map.py` is the main executor for evaluation; one can start the evaluation step by:

- Faster RCNN

      python eval_coco_map.py \
          --dataset=coco2017 \
          --pretrained_model=${path_to_pretrain_model} \
          --MASK_ON=False

- Mask RCNN

      python eval_coco_map.py \
          --dataset=coco2017 \
          --pretrained_model=${path_to_pretrain_model} \
          --MASK_ON=True

- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation.
- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.

Evaluation results are shown below:

Faster RCNN:

| Model | RoI function | Batch size | Max iteration | mAP |
| :--------------- | :--------: | :------------: | :------------------: |------: |
| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 |
@@ -121,6 +147,14 @@ Evaluation result is shown as below:
* Fluid RoIAlign no padding: Images without padding.
* Fluid RoIAlign no padding 2x: Images without padding, trained for 360000 iterations, learning rate decayed at 240000 and 320000.

Mask RCNN:

| Model | Batch size | Max iteration | box mAP | mask mAP |
| :--------------- | :--------: | :------------: | :--------: |------: |
| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 |

* Fluid mask no padding: Uses RoIAlign; images without padding.

## Inference and Visualization

Inference is used to get prediction scores or image features based on trained models. `infer.py` is the main executor for inference; one can start the infer step by:
@@ -135,8 +169,12 @@ Inference is used to get prediction score or image features based on trained mod
Visualization of the infer results is shown below:

<p align="center">
<img src="image/000000000139.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517.jpg" height=300 width=400 hspace='10'/> <br />
<img src="image/000000203864.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000515077.jpg" height=300 width=400 hspace='10'/> <br />
Faster RCNN Visualization Examples
</p>

<p align="center">
<img src="image/000000000139_mask.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517_mask.jpg" height=300 width=400 hspace='10'/> <br />
Mask RCNN Visualization Examples
</p>
# RCNN Series Object Detection
---
## Contents
@@ -9,25 +9,27 @@
- [Model training](#模型训练)
- [Model evaluation](#模型评估)
- [Inference and visualization](#模型推断及可视化)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v1.0.0 or above. If your environment has a lower PaddlePaddle version, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/install/install_doc.html#paddlepaddle).

## Introduction

Region Convolutional Neural Network (RCNN) models are two-stage object detectors: they generate candidate regions from the image, extract features, classify each region, and refine the box positions.

The RCNN series currently contains two representative models: Faster RCNN and Mask RCNN.

[Faster RCNN](https://arxiv.org/abs/1506.01497): the overall network can be divided into four main parts:

1. Base convolution layer. As a CNN-based object detector, Faster RCNN first extracts feature maps of the image with a basic convolutional network. The feature maps are shared by the subsequent RPN and fully connected layers. This example uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the base convolution layer.
2. Region Proposal Network (RPN). The RPN generates candidate regions (proposals). It derives a set of anchors from fixed sizes and ratios, classifies each anchor as foreground or background with softmax, and refines the anchors by box regression to obtain precise proposals.
3. RoI Align. This layer takes the feature maps and candidate regions as input, maps the candidate regions onto the feature maps, and pools them into region feature maps of a uniform size, which are sent to fully connected layers to determine the object class. Either RoIPool or RoIAlign can be used; set roi\_func in config.py.
4. Detection layer. Computes the class of each candidate region from its region feature map, and runs box regression again to obtain the final precise position of the detection box.

[Mask RCNN](https://arxiv.org/abs/1703.06870) extends Faster RCNN and is a classic instance segmentation model.

Mask RCNN is likewise a two-stage framework: the first stage scans the image and generates proposals; the second stage classifies the proposals and produces bounding boxes, while a segmentation branch added to the original Faster RCNN model produces masks, decoupling mask prediction from class prediction.

## Data preparation

Train on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as follows:
@@ -61,12 +63,24 @@ Faster RCNN object detection model
After data preparation, training can be started as follows:

- Faster RCNN

      python train.py \
         --model_save_dir=output/ \
         --pretrained_model=${path_to_pretrain_model} \
         --data_dir=${path_to_data} \
         --MASK_ON=False

- Mask RCNN

      python train.py \
         --model_save_dir=output/ \
         --pretrained_model=${path_to_pretrain_model} \
         --data_dir=${path_to_data} \
         --MASK_ON=True

- Set export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7 to train with 8 GPUs.
- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.
- For the optional arguments, see:

      python train.py --help
@@ -83,11 +97,10 @@ Faster RCNN object detection model
**Training strategy:**

* Train with the momentum optimizer, momentum=0.9.
* The weight decay is 0.0001. In the first 500 iterations, the learning rate increases linearly from 0.00333 to 0.01; it is then decayed at iterations 120000 and 160000 with multipliers 0.1 and 0.01, and training runs for at most 180000 iterations. We also provide a 2x model trained with more iterations: 360000 in total, with the learning rate decayed at 240000 and 320000 and the other settings unchanged. The maximum iterations and the learning rate schedule can be set via max_iter and lr_steps in config.py.
* The learning rate of the bias in non-base convolution layers is twice the global learning rate.
* In the base convolution layers, the affine_layers parameters and the res2 parameters are not updated.

## Model evaluation
@@ -95,14 +108,27 @@ Faster RCNN object detection model
`eval_coco_map.py` is the main executor of the evaluation module; example usage:

- Faster RCNN

      python eval_coco_map.py \
          --dataset=coco2017 \
          --pretrained_model=${path_to_pretrain_model} \
          --MASK_ON=False

- Mask RCNN

      python eval_coco_map.py \
          --dataset=coco2017 \
          --pretrained_model=${path_to_pretrain_model} \
          --MASK_ON=True

- Set export CUDA\_VISIBLE\_DEVICES=0 to evaluate on a single GPU.
- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.

The table below shows the evaluation results:

Faster RCNN:

| Model | RoI function | Batch size | Max iterations | mAP |
| :--------------- | :--------: | :------------: | :------------------: |------: |
| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 |
@@ -117,6 +143,14 @@ Faster RCNN object detection model
* Fluid RoIAlign no padding: uses RoIAlign, no image padding.
* Fluid RoIAlign no padding 2x: uses RoIAlign, no image padding; trained for 360000 iterations with the learning rate decayed at 240000 and 320000.

Mask RCNN:

| Model | Batch size | Max iterations | box mAP | mask mAP |
| :--------------- | :--------: | :------------: | :--------: |------: |
| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 |

* Fluid mask no padding: uses RoIAlign, no image padding.

## Inference and visualization

Model inference obtains the objects in an image and their classes; `infer.py` is the main executor, example usage:
@@ -131,8 +165,12 @@ Faster RCNN object detection model
The figures below show visualized prediction results:

<p align="center">
<img src="image/000000000139.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517.jpg" height=300 width=400 hspace='10'/> <br />
<img src="image/000000203864.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000515077.jpg" height=300 width=400 hspace='10'/> <br />
Faster RCNN prediction visualization
</p>

<p align="center">
<img src="image/000000000139_mask.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517_mask.jpg" height=300 width=400 hspace='10'/> <br />
Mask RCNN prediction visualization
</p>
@@ -6,10 +6,11 @@ sys.path.append(os.environ['ceroot'])
from kpi import CostKpi
from kpi import DurationKpi

each_pass_duration_card1_kpi = DurationKpi(
    'each_pass_duration_card1', 0.08, 0, actived=True)
train_loss_card1_kpi = CostKpi('train_loss_card1', 0.08, 0)
each_pass_duration_card4_kpi = DurationKpi(
    'each_pass_duration_card4', 0.08, 0, actived=True)
train_loss_card4_kpi = CostKpi('train_loss_card4', 0.08, 0)

tracking_kpis = [
@@ -17,7 +18,7 @@ tracking_kpis = [
    train_loss_card1_kpi,
    each_pass_duration_card4_kpi,
    train_loss_card4_kpi,
]


def parse_log(log):
...
@@ -69,6 +69,7 @@ def clip_xyxy_to_image(x1, y1, x2, y2, height, width):
    y2 = np.minimum(height - 1., np.maximum(0., y2))
    return x1, y1, x2, y2


def nms(dets, thresh):
    """Apply classic DPM-style greedy NMS."""
    if dets.shape[0] == 0:
@@ -123,3 +124,21 @@ def nms(dets, thresh):
    return np.where(suppressed == 0)[0]


def expand_boxes(boxes, scale):
    """Expand an array of boxes by a given scale."""
    w_half = (boxes[:, 2] - boxes[:, 0]) * .5
    h_half = (boxes[:, 3] - boxes[:, 1]) * .5
    x_c = (boxes[:, 2] + boxes[:, 0]) * .5
    y_c = (boxes[:, 3] + boxes[:, 1]) * .5

    w_half *= scale
    h_half *= scale

    boxes_exp = np.zeros(boxes.shape)
    boxes_exp[:, 0] = x_c - w_half
    boxes_exp[:, 2] = x_c + w_half
    boxes_exp[:, 1] = y_c - h_half
    boxes_exp[:, 3] = y_c + h_half

    return boxes_exp
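`expand_boxes` scales each box about its own center, the kind of helper Mask RCNN post-processing uses before pasting a predicted mask back onto the image. For example:

```python
import numpy as np

# One box [x1, y1, x2, y2] centered at (15, 15), scaled by 2:
boxes = np.array([[10., 10., 20., 20.]])
print(expand_boxes(boxes, 2.0))  # [[ 5.  5. 25. 25.]] -- width and height doubled
```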
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# Detectron
# Copyright (c) 2017-present, Facebook, Inc.
# Licensed under the Apache License, Version 2.0;
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
def colormap(rgb=False):
color_list = np.array([
0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494,
0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078,
0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000,
1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000,
0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667,
0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000,
0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000,
1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000,
0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500,
0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667,
0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333,
0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000,
0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333,
0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000,
1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000,
1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167,
0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000,
0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000,
0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000,
0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000,
0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833,
0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286,
0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714,
0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000
]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
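A typical use of `colormap` is assigning a stable, distinct color to each detected instance, wrapping around the palette when there are more instances than entries:

```python
colors = colormap(rgb=True)  # shape (N, 3), values in [0, 255]
for inst_id in range(5):
    color = colors[inst_id % len(colors)]
    print(inst_id, color)
```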
...@@ -90,6 +90,9 @@ _C.TRAIN.freeze_at = 2 ...@@ -90,6 +90,9 @@ _C.TRAIN.freeze_at = 2
# min area of ground truth box # min area of ground truth box
_C.TRAIN.gt_min_area = -1 _C.TRAIN.gt_min_area = -1
# Use horizontally-flipped images during training?
_C.TRAIN.use_flipped = True
# #
# Inference options # Inference options
# #
...@@ -120,7 +123,7 @@ _C.TEST.rpn_post_nms_top_n = 1000 ...@@ -120,7 +123,7 @@ _C.TEST.rpn_post_nms_top_n = 1000
_C.TEST.rpn_min_size = 0.0 _C.TEST.rpn_min_size = 0.0
# max number of detections # max number of detections
_C.TEST.detectiions_per_im = 100 _C.TEST.detections_per_im = 100
# NMS threshold used on RPN proposals # NMS threshold used on RPN proposals
_C.TEST.rpn_nms_thresh = 0.7 _C.TEST.rpn_nms_thresh = 0.7
...@@ -129,6 +132,9 @@ _C.TEST.rpn_nms_thresh = 0.7 ...@@ -129,6 +132,9 @@ _C.TEST.rpn_nms_thresh = 0.7
# Model options # Model options
# #
# Whether use mask rcnn head
_C.MASK_ON = True
# weight for bbox regression targets # weight for bbox regression targets
_C.bbox_reg_weights = [0.1, 0.1, 0.2, 0.2] _C.bbox_reg_weights = [0.1, 0.1, 0.2, 0.2]
...@@ -156,6 +162,15 @@ _C.roi_resolution = 14 ...@@ -156,6 +162,15 @@ _C.roi_resolution = 14
# spatial scale # spatial scale
_C.spatial_scale = 1. / 16. _C.spatial_scale = 1. / 16.
# resolution to represent mask labels
_C.resolution = 14
# Number of channels in the mask head
_C.dim_reduced = 256
# Threshold for converting soft masks to hard masks
_C.mrcnn_thresh_binarize = 0.5
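# Worked example for the three options above (illustrative): the mask head
# predicts an M x M soft mask with M = resolution; before pasting it back
# into the image, segm_results expands each box by (M + 2.0) / M = 16 / 14
# (about 1.143) so the one-pixel padding survives the resize, then
# binarizes the sigmoid output at mrcnn_thresh_binarize:
#
#   scale = (cfg.resolution + 2.0) / cfg.resolution
#   hard_mask = (soft_mask > cfg.mrcnn_thresh_binarize).astype('uint8')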
# #
# SOLVER options # SOLVER options
# #
...@@ -204,12 +219,6 @@ _C.pixel_means = [102.9801, 115.9465, 122.7717] ...@@ -204,12 +219,6 @@ _C.pixel_means = [102.9801, 115.9465, 122.7717]
# clip box to prevent overflowing # clip box to prevent overflowing
_C.bbox_clip = np.log(1000. / 16.) _C.bbox_clip = np.log(1000. / 16.)
# dataset path
_C.train_file_list = 'annotations/instances_train2017.json'
_C.train_data_dir = 'train2017'
_C.val_file_list = 'annotations/instances_val2017.json'
_C.val_data_dir = 'val2017'
def merge_cfg_from_args(args, mode): def merge_cfg_from_args(args, mode):
"""Merge config keys, values in args into the global config.""" """Merge config keys, values in args into the global config."""
......
...@@ -18,8 +18,7 @@ from __future__ import print_function ...@@ -18,8 +18,7 @@ from __future__ import print_function
import os import os
import time import time
import numpy as np import numpy as np
from eval_helper import get_nmsed_box from eval_helper import *
from eval_helper import get_dt_res
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import reader import reader
...@@ -30,21 +29,21 @@ import json ...@@ -30,21 +29,21 @@ import json
from pycocotools.coco import COCO from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval, Params from pycocotools.cocoeval import COCOeval, Params
from config import cfg from config import cfg
from roidbs import DatasetPath
def eval(): def eval():
if '2014' in cfg.dataset:
test_list = 'annotations/instances_val2014.json' data_path = DatasetPath('val')
elif '2017' in cfg.dataset: test_list = data_path.get_file_list()
test_list = 'annotations/instances_val2017.json'
image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size] image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num class_nums = cfg.class_num
devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(",")) devices_num = len(devices.split(","))
total_batch_size = devices_num * cfg.TRAIN.im_per_batch total_batch_size = devices_num * cfg.TRAIN.im_per_batch
cocoGt = COCO(os.path.join(cfg.data_dir, test_list)) cocoGt = COCO(test_list)
numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} num_id_to_cat_id_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
category_ids = cocoGt.getCatIds() category_ids = cocoGt.getCatIds()
label_list = { label_list = {
item['id']: item['name'] item['id']: item['name']
...@@ -52,51 +51,88 @@ def eval(): ...@@ -52,51 +51,88 @@ def eval():
} }
label_list[0] = ['background'] label_list[0] = ['background']
model = model_builder.FasterRCNN( model = model_builder.RCNN(
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=False, use_pyreader=False,
is_train=False) mode='val')
model.build_model(image_shape) model.build_model(image_shape)
rpn_rois, confs, locs = model.eval_out() pred_boxes = model.eval_bbox_out()
if cfg.MASK_ON:
masks = model.eval_mask_out()
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace() place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# yapf: disable # yapf: disable
if cfg.pretrained_model: if cfg.pretrained_model:
def if_exist(var): def if_exist(var):
return os.path.exists(os.path.join(cfg.pretrained_model, var.name)) return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist) fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist)
# yapf: enable # yapf: enable
test_reader = reader.test(total_batch_size) test_reader = reader.test(total_batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
dts_res = [] dts_res = []
fetch_list = [rpn_rois, confs, locs] segms_res = []
if cfg.MASK_ON:
fetch_list = [pred_boxes, masks]
else:
fetch_list = [pred_boxes]
eval_start = time.time()
for batch_id, batch_data in enumerate(test_reader()): for batch_id, batch_data in enumerate(test_reader()):
start = time.time() start = time.time()
im_info = [] im_info = []
for data in batch_data: for data in batch_data:
im_info.append(data[1]) im_info.append(data[1])
rpn_rois_v, confs_v, locs_v = exe.run( results = exe.run(fetch_list=[v.name for v in fetch_list],
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(batch_data), feed=feeder.feed(batch_data),
return_numpy=False) return_numpy=False)
new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v,
class_nums, im_info,
numId_to_catId_map)
dts_res += get_dt_res(total_batch_size, new_lod, nmsed_out, batch_data) pred_boxes_v = results[0]
if cfg.MASK_ON:
masks_v = results[1]
new_lod = pred_boxes_v.lod()
nmsed_out = pred_boxes_v
dts_res += get_dt_res(total_batch_size, new_lod[0], nmsed_out,
batch_data, num_id_to_cat_id_map)
if cfg.MASK_ON and np.array(masks_v).shape != (1, 1):
segms_out = segm_results(nmsed_out, masks_v, im_info)
segms_res += get_segms_res(total_batch_size, new_lod[0], segms_out,
batch_data, num_id_to_cat_id_map)
end = time.time() end = time.time()
print('batch id: {}, time: {}'.format(batch_id, end - start)) print('batch id: {}, time: {}'.format(batch_id, end - start))
with open("detection_result.json", 'w') as outfile: eval_end = time.time()
total_time = eval_end - eval_start
    print('average eval time per batch: {}'.format(total_time / (batch_id + 1)))
    assert len(dts_res) > 0, "The number of valid bboxes detected is zero.\n \
        Please check the model and the input data."
    assert len(segms_res) > 0, "The number of valid masks detected is zero.\n \
        Please check the model and the input data."
with open("detection_bbox_result.json", 'w') as outfile:
json.dump(dts_res, outfile) json.dump(dts_res, outfile)
print("start evaluate using coco api") print("start evaluate bbox using coco api")
cocoDt = cocoGt.loadRes("detection_result.json") cocoDt = cocoGt.loadRes("detection_bbox_result.json")
cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.evaluate() cocoEval.evaluate()
cocoEval.accumulate() cocoEval.accumulate()
cocoEval.summarize() cocoEval.summarize()
if cfg.MASK_ON:
with open("detection_segms_result.json", 'w') as outfile:
json.dump(segms_res, outfile)
print("start evaluate mask using coco api")
cocoDt = cocoGt.loadRes("detection_segms_result.json")
cocoEval = COCOeval(cocoGt, cocoDt, 'segm')
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
......
...@@ -21,6 +21,10 @@ from PIL import Image ...@@ -21,6 +21,10 @@ from PIL import Image
from PIL import ImageDraw from PIL import ImageDraw
from PIL import ImageFont from PIL import ImageFont
from config import cfg from config import cfg
import pycocotools.mask as mask_util
import six
from colormap import colormap
import cv2
def box_decoder(deltas, boxes, weights): def box_decoder(deltas, boxes, weights):
...@@ -80,8 +84,7 @@ def clip_tiled_boxes(boxes, im_shape): ...@@ -80,8 +84,7 @@ def clip_tiled_boxes(boxes, im_shape):
return boxes return boxes
def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info, def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info):
numId_to_catId_map):
lod = rpn_rois.lod()[0] lod = rpn_rois.lod()[0]
rpn_rois_v = np.array(rpn_rois) rpn_rois_v = np.array(rpn_rois)
variance_v = np.array(cfg.bbox_reg_weights) variance_v = np.array(cfg.bbox_reg_weights)
...@@ -106,38 +109,41 @@ def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info, ...@@ -106,38 +109,41 @@ def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info,
inds = np.where(scores_n[:, j] > cfg.TEST.score_thresh)[0] inds = np.where(scores_n[:, j] > cfg.TEST.score_thresh)[0]
scores_j = scores_n[inds, j] scores_j = scores_n[inds, j]
rois_j = rois_n[inds, j * 4:(j + 1) * 4] rois_j = rois_n[inds, j * 4:(j + 1) * 4]
dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype( dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
np.float32, copy=False) np.float32, copy=False)
keep = box_utils.nms(dets_j, cfg.TEST.nms_thresh) keep = box_utils.nms(dets_j, cfg.TEST.nms_thresh)
nms_dets = dets_j[keep, :] nms_dets = dets_j[keep, :]
#add labels #add labels
cat_id = numId_to_catId_map[j] label = np.array([j for _ in range(len(keep))])
label = np.array([cat_id for _ in range(len(keep))])
nms_dets = np.hstack((nms_dets, label[:, np.newaxis])).astype( nms_dets = np.hstack((nms_dets, label[:, np.newaxis])).astype(
np.float32, copy=False) np.float32, copy=False)
cls_boxes[j] = nms_dets cls_boxes[j] = nms_dets
# Limit to max_per_image detections **over all classes** # Limit to max_per_image detections **over all classes**
image_scores = np.hstack( image_scores = np.hstack(
[cls_boxes[j][:, -2] for j in range(1, class_nums)]) [cls_boxes[j][:, 1] for j in range(1, class_nums)])
if len(image_scores) > cfg.TEST.detectiions_per_im: if len(image_scores) > cfg.TEST.detections_per_im:
image_thresh = np.sort(image_scores)[-cfg.TEST.detectiions_per_im] image_thresh = np.sort(image_scores)[-cfg.TEST.detections_per_im]
for j in range(1, class_nums): for j in range(1, class_nums):
keep = np.where(cls_boxes[j][:, -2] >= image_thresh)[0] keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0]
cls_boxes[j] = cls_boxes[j][keep, :] cls_boxes[j] = cls_boxes[j][keep, :]
im_results_n = np.vstack([cls_boxes[j] for j in range(1, class_nums)]) im_results_n = np.vstack([cls_boxes[j] for j in range(1, class_nums)])
im_results[i] = im_results_n im_results[i] = im_results_n
new_lod.append(len(im_results_n) + new_lod[-1]) new_lod.append(len(im_results_n) + new_lod[-1])
boxes = im_results_n[:, :-2] boxes = im_results_n[:, 2:]
scores = im_results_n[:, -2] scores = im_results_n[:, 1]
labels = im_results_n[:, -1] labels = im_results_n[:, 0]
im_results = np.vstack([im_results[k] for k in range(len(lod) - 1)]) im_results = np.vstack([im_results[k] for k in range(len(lod) - 1)])
return new_lod, im_results return new_lod, im_results
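# Layout sketch (illustrative): after this change every detection row is
# ordered [label, score, xmin, ymin, xmax, ymax], the same layout that
# fluid.layers.multiclass_nms emits, instead of the old
# [xmin, ymin, xmax, ymax, score, label], so downstream code unpacks it as:
#
#   label, score, xmin, ymin, xmax, ymax = im_results[k].tolist()
#   labels, scores, boxes = im_results[:, 0], im_results[:, 1], im_results[:, 2:]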
def get_dt_res(batch_size, lod, nmsed_out, data): def get_dt_res(batch_size, lod, nmsed_out, data, num_id_to_cat_id_map):
dts_res = [] dts_res = []
nmsed_out_v = np.array(nmsed_out) nmsed_out_v = np.array(nmsed_out)
    if nmsed_out_v.shape == (1, 1):
        return dts_res
assert (len(lod) == batch_size + 1), \ assert (len(lod) == batch_size + 1), \
"Error Lod Tensor offset dimension. Lod({}) vs. batch_size({})"\ "Error Lod Tensor offset dimension. Lod({}) vs. batch_size({})"\
.format(len(lod), batch_size) .format(len(lod), batch_size)
...@@ -150,7 +156,8 @@ def get_dt_res(batch_size, lod, nmsed_out, data): ...@@ -150,7 +156,8 @@ def get_dt_res(batch_size, lod, nmsed_out, data):
for j in range(dt_num_this_img): for j in range(dt_num_this_img):
dt = nmsed_out_v[k] dt = nmsed_out_v[k]
k = k + 1 k = k + 1
xmin, ymin, xmax, ymax, score, category_id = dt.tolist() num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
category_id = num_id_to_cat_id_map[num_id]
w = xmax - xmin + 1 w = xmax - xmin + 1
h = ymax - ymin + 1 h = ymax - ymin + 1
bbox = [xmin, ymin, w, h] bbox = [xmin, ymin, w, h]
...@@ -164,24 +171,131 @@ def get_dt_res(batch_size, lod, nmsed_out, data): ...@@ -164,24 +171,131 @@ def get_dt_res(batch_size, lod, nmsed_out, data):
return dts_res return dts_res
def draw_bounding_box_on_image(image_path, nms_out, draw_threshold, label_list): def get_segms_res(batch_size, lod, segms_out, data, num_id_to_cat_id_map):
segms_res = []
segms_out_v = np.array(segms_out)
k = 0
for i in range(batch_size):
dt_num_this_img = lod[i + 1] - lod[i]
image_id = int(data[i][-1])
for j in range(dt_num_this_img):
dt = segms_out_v[k]
k = k + 1
segm, num_id, score = dt.tolist()
cat_id = num_id_to_cat_id_map[num_id]
if six.PY3:
if 'counts' in segm:
segm['counts'] = segm['counts'].decode("utf8")
segm_res = {
'image_id': image_id,
'category_id': cat_id,
'segmentation': segm,
'score': score
}
segms_res.append(segm_res)
return segms_res
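# Example records (field values are illustrative): get_dt_res and
# get_segms_res emit COCO-style result dicts that COCO.loadRes accepts:
#
#   {'image_id': 139, 'category_id': 72, 'bbox': [x, y, w, h], 'score': 0.98}
#   {'image_id': 139, 'category_id': 72,
#    'segmentation': {'size': [h, w], 'counts': '<RLE string>'}, 'score': 0.98}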
def draw_bounding_box_on_image(image_path,
nms_out,
draw_threshold,
label_list,
num_id_to_cat_id_map,
image=None):
if image is None:
image = Image.open(image_path) image = Image.open(image_path)
draw = ImageDraw.Draw(image) draw = ImageDraw.Draw(image)
im_width, im_height = image.size im_width, im_height = image.size
for dt in nms_out: for dt in np.array(nms_out):
xmin, ymin, xmax, ymax, score, category_id = dt.tolist() num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
category_id = num_id_to_cat_id_map[num_id]
if score < draw_threshold: if score < draw_threshold:
continue continue
bbox = dt[:4]
xmin, ymin, xmax, ymax = bbox
draw.line( draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)], (xmin, ymin)],
width=4, width=2,
fill='red') fill='red')
if image.mode == 'RGB': if image.mode == 'RGB':
draw.text((xmin, ymin), label_list[int(category_id)], (255, 255, 0)) draw.text((xmin, ymin), label_list[int(category_id)], (255, 255, 0))
image_name = image_path.split('/')[-1] image_name = image_path.split('/')[-1]
print("image with bbox drawed saved as {}".format(image_name)) print("image with bbox drawed saved as {}".format(image_name))
image.save(image_name) image.save(image_name)
def draw_mask_on_image(image_path, segms_out, draw_threshold, alpha=0.7):
image = Image.open(image_path)
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
mask_color_id = 0
w_ratio = .4
image = np.array(image).astype('float32')
for dt in np.array(segms_out):
segm, num_id, score = dt.tolist()
if score < draw_threshold:
continue
mask = mask_util.decode(segm) * 255
color_list = colormap(rgb=True)
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
idx = np.nonzero(mask)
image[idx[0], idx[1], :] *= 1.0 - alpha
image[idx[0], idx[1], :] += alpha * color_mask
image = Image.fromarray(image.astype('uint8'))
return image
def segm_results(im_results, masks, im_info):
im_results = np.array(im_results)
class_num = cfg.class_num
M = cfg.resolution
scale = (M + 2.0) / M
lod = masks.lod()[0]
masks_v = np.array(masks)
boxes = im_results[:, 2:]
labels = im_results[:, 0]
segms_results = [[] for _ in range(len(lod) - 1)]
sum = 0
for i in range(len(lod) - 1):
im_results_n = im_results[lod[i]:lod[i + 1]]
cls_segms = []
masks_n = masks_v[lod[i]:lod[i + 1]]
boxes_n = boxes[lod[i]:lod[i + 1]]
labels_n = labels[lod[i]:lod[i + 1]]
im_h = int(round(im_info[i][0] / im_info[i][2]))
im_w = int(round(im_info[i][1] / im_info[i][2]))
boxes_n = box_utils.expand_boxes(boxes_n, scale)
boxes_n = boxes_n.astype(np.int32)
padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32)
for j in range(len(im_results_n)):
class_id = int(labels_n[j])
padded_mask[1:-1, 1:-1] = masks_n[j, class_id, :, :]
ref_box = boxes_n[j, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > cfg.mrcnn_thresh_binarize, dtype=np.uint8)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
x_0 = max(ref_box[0], 0)
x_1 = min(ref_box[2] + 1, im_w)
y_0 = max(ref_box[1], 0)
y_1 = min(ref_box[3] + 1, im_h)
            im_mask[y_0:y_1, x_0:x_1] = mask[
                (y_0 - ref_box[1]):(y_1 - ref_box[1]),
                (x_0 - ref_box[0]):(x_1 - ref_box[0])]
sum += im_mask.sum()
rle = mask_util.encode(
np.array(
im_mask[:, :, np.newaxis], order='F'))[0]
cls_segms.append(rle)
segms_results[i] = np.array(cls_segms)[:, np.newaxis]
segms_results = np.vstack([segms_results[k] for k in range(len(lod) - 1)])
im_results = np.hstack([segms_results, im_results])
return im_results[:, :3]
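def _rle_round_trip_demo():
    # Minimal sketch of the pycocotools RLE calls used by segm_results above
    # (an illustrative addition, not part of the original file; np and
    # mask_util are already imported at the top of this module).
    # encode expects a Fortran-ordered H x W x N uint8 array.
    demo_mask = np.zeros((4, 4), dtype=np.uint8)
    demo_mask[1:3, 1:3] = 1
    rle = mask_util.encode(np.array(demo_mask[:, :, np.newaxis], order='F'))[0]
    # decode of a single RLE dict gives back the original H x W mask
    assert (mask_util.decode(rle) == demo_mask).all()
    return rle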
import os import os
import time import time
import numpy as np import numpy as np
from eval_helper import get_nmsed_box from eval_helper import *
from eval_helper import get_dt_res
from eval_helper import draw_bounding_box_on_image
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import reader import reader
...@@ -14,17 +12,16 @@ import json ...@@ -14,17 +12,16 @@ import json
from pycocotools.coco import COCO from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval, Params from pycocotools.cocoeval import COCOeval, Params
from config import cfg from config import cfg
from roidbs import DatasetPath
def infer(): def infer():
if '2014' in cfg.dataset: data_path = DatasetPath('val')
test_list = 'annotations/instances_val2014.json' test_list = data_path.get_file_list()
elif '2017' in cfg.dataset:
test_list = 'annotations/instances_val2017.json'
cocoGt = COCO(os.path.join(cfg.data_dir, test_list)) cocoGt = COCO(test_list)
numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} num_id_to_cat_id_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
category_ids = cocoGt.getCatIds() category_ids = cocoGt.getCatIds()
label_list = { label_list = {
item['id']: item['name'] item['id']: item['name']
...@@ -34,13 +31,15 @@ def infer(): ...@@ -34,13 +31,15 @@ def infer():
image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size] image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num class_nums = cfg.class_num
model = model_builder.FasterRCNN( model = model_builder.RCNN(
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=False, use_pyreader=False,
is_train=False) mode='infer')
model.build_model(image_shape) model.build_model(image_shape)
rpn_rois, confs, locs = model.eval_out() pred_boxes = model.eval_bbox_out()
if cfg.MASK_ON:
masks = model.eval_mask_out()
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace() place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
# yapf: disable # yapf: disable
...@@ -53,17 +52,29 @@ def infer(): ...@@ -53,17 +52,29 @@ def infer():
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
dts_res = [] dts_res = []
fetch_list = [rpn_rois, confs, locs] segms_res = []
if cfg.MASK_ON:
fetch_list = [pred_boxes, masks]
else:
fetch_list = [pred_boxes]
data = next(infer_reader()) data = next(infer_reader())
im_info = [data[0][1]] im_info = [data[0][1]]
rpn_rois_v, confs_v, locs_v = exe.run( result = exe.run(fetch_list=[v.name for v in fetch_list],
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data), feed=feeder.feed(data),
return_numpy=False) return_numpy=False)
new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v, class_nums, pred_boxes_v = result[0]
im_info, numId_to_catId_map) if cfg.MASK_ON:
masks_v = result[1]
new_lod = pred_boxes_v.lod()
nmsed_out = pred_boxes_v
path = os.path.join(cfg.image_path, cfg.image_name) path = os.path.join(cfg.image_path, cfg.image_name)
draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list) image = None
if cfg.MASK_ON:
segms_out = segm_results(nmsed_out, masks_v, im_info)
image = draw_mask_on_image(path, segms_out, cfg.draw_threshold)
draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list,
num_id_to_cat_id_map, image)
if __name__ == '__main__': if __name__ == '__main__':
......
...@@ -16,23 +16,23 @@ import paddle.fluid as fluid ...@@ -16,23 +16,23 @@ import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant from paddle.fluid.initializer import Constant
from paddle.fluid.initializer import Normal from paddle.fluid.initializer import Normal
from paddle.fluid.initializer import MSRA
from paddle.fluid.regularizer import L2Decay from paddle.fluid.regularizer import L2Decay
from config import cfg from config import cfg
class FasterRCNN(object): class RCNN(object):
def __init__(self, def __init__(self,
add_conv_body_func=None, add_conv_body_func=None,
add_roi_box_head_func=None, add_roi_box_head_func=None,
is_train=True, mode='train',
use_pyreader=True, use_pyreader=True,
use_random=True): use_random=True):
self.add_conv_body_func = add_conv_body_func self.add_conv_body_func = add_conv_body_func
self.add_roi_box_head_func = add_roi_box_head_func self.add_roi_box_head_func = add_roi_box_head_func
self.is_train = is_train self.mode = mode
self.use_pyreader = use_pyreader self.use_pyreader = use_pyreader
self.use_random = use_random self.use_random = use_random
#self.py_reader = None
def build_model(self, image_shape): def build_model(self, image_shape):
self.build_input(image_shape) self.build_input(image_shape)
...@@ -41,31 +41,62 @@ class FasterRCNN(object): ...@@ -41,31 +41,62 @@ class FasterRCNN(object):
self.rpn_heads(body_conv) self.rpn_heads(body_conv)
# Fast RCNN # Fast RCNN
self.fast_rcnn_heads(body_conv) self.fast_rcnn_heads(body_conv)
if self.mode != 'train':
self.eval_bbox()
# Mask RCNN
if cfg.MASK_ON:
self.mask_rcnn_heads(body_conv)
def loss(self): def loss(self):
losses = []
# Fast RCNN loss # Fast RCNN loss
loss_cls, loss_bbox = self.fast_rcnn_loss() loss_cls, loss_bbox = self.fast_rcnn_loss()
# RPN loss # RPN loss
rpn_cls_loss, rpn_reg_loss = self.rpn_loss() rpn_cls_loss, rpn_reg_loss = self.rpn_loss()
return loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss, losses = [loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss]
rkeys = ['loss', 'loss_cls', 'loss_bbox', \
'loss_rpn_cls', 'loss_rpn_bbox',]
if cfg.MASK_ON:
loss_mask = self.mask_rcnn_loss()
losses = losses + [loss_mask]
rkeys = rkeys + ["loss_mask"]
loss = fluid.layers.sum(losses)
rloss = [loss] + losses
return rloss, rkeys
def eval_out(self): def eval_mask_out(self):
cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False) return self.mask_fcn_logits
return [self.rpn_rois, cls_prob, self.bbox_pred]
def eval_bbox_out(self):
return self.pred_result
def build_input(self, image_shape): def build_input(self, image_shape):
if self.use_pyreader: if self.use_pyreader:
in_shapes = [[-1] + image_shape, [-1, 4], [-1, 1], [-1, 1],
[-1, 3], [-1, 1]]
lod_levels = [0, 1, 1, 1, 0, 0]
dtypes = [
'float32', 'float32', 'int32', 'int32', 'float32', 'int32'
]
if cfg.MASK_ON:
in_shapes.append([-1, 2])
lod_levels.append(3)
dtypes.append('float32')
self.py_reader = fluid.layers.py_reader( self.py_reader = fluid.layers.py_reader(
capacity=64, capacity=64,
shapes=[[-1] + image_shape, [-1, 4], [-1, 1], [-1, 1], [-1, 3], shapes=in_shapes,
[-1, 1]], lod_levels=lod_levels,
lod_levels=[0, 1, 1, 1, 0, 0], dtypes=dtypes,
dtypes=[
"float32", "float32", "int32", "int32", "float32", "int32"
],
use_double_buffer=True) use_double_buffer=True)
self.image, self.gt_box, self.gt_label, self.is_crowd, \ ins = fluid.layers.read_file(self.py_reader)
self.im_info, self.im_id = fluid.layers.read_file(self.py_reader) self.image = ins[0]
self.gt_box = ins[1]
self.gt_label = ins[2]
self.is_crowd = ins[3]
self.im_info = ins[4]
self.im_id = ins[5]
if cfg.MASK_ON:
self.gt_masks = ins[6]
else: else:
self.image = fluid.layers.data( self.image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32') name='image', shape=image_shape, dtype='float32')
...@@ -74,24 +105,57 @@ class FasterRCNN(object): ...@@ -74,24 +105,57 @@ class FasterRCNN(object):
self.gt_label = fluid.layers.data( self.gt_label = fluid.layers.data(
name='gt_label', shape=[1], dtype='int32', lod_level=1) name='gt_label', shape=[1], dtype='int32', lod_level=1)
self.is_crowd = fluid.layers.data( self.is_crowd = fluid.layers.data(
name='is_crowd', name='is_crowd', shape=[1], dtype='int32', lod_level=1)
shape=[-1],
dtype='int32',
lod_level=1,
append_batch_size=False)
self.im_info = fluid.layers.data( self.im_info = fluid.layers.data(
name='im_info', shape=[3], dtype='float32') name='im_info', shape=[3], dtype='float32')
self.im_id = fluid.layers.data( self.im_id = fluid.layers.data(
name='im_id', shape=[1], dtype='int32') name='im_id', shape=[1], dtype='int32')
if cfg.MASK_ON:
self.gt_masks = fluid.layers.data(
name='gt_masks', shape=[2], dtype='float32', lod_level=3)
def feeds(self): def feeds(self):
if not self.is_train: if self.mode == 'infer':
return [self.image, self.im_info]
if self.mode == 'val':
return [self.image, self.im_info, self.im_id] return [self.image, self.im_info, self.im_id]
if not cfg.MASK_ON:
return [
self.image, self.gt_box, self.gt_label, self.is_crowd,
self.im_info, self.im_id
]
return [ return [
self.image, self.gt_box, self.gt_label, self.is_crowd, self.im_info, self.image, self.gt_box, self.gt_label, self.is_crowd, self.im_info,
self.im_id self.im_id, self.gt_masks
] ]
def eval_bbox(self):
self.im_scale = fluid.layers.slice(
self.im_info, [1], starts=[2], ends=[3])
im_scale_lod = fluid.layers.sequence_expand(self.im_scale,
self.rpn_rois)
boxes = self.rpn_rois / im_scale_lod
cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False)
bbox_pred_reshape = fluid.layers.reshape(self.bbox_pred,
(-1, cfg.class_num, 4))
decoded_box = fluid.layers.box_coder(
prior_box=boxes,
prior_box_var=cfg.bbox_reg_weights,
target_box=bbox_pred_reshape,
code_type='decode_center_size',
box_normalized=False,
axis=1)
cliped_box = fluid.layers.box_clip(
input=decoded_box, im_info=self.im_info)
self.pred_result = fluid.layers.multiclass_nms(
bboxes=cliped_box,
scores=cls_prob,
score_threshold=cfg.TEST.score_thresh,
nms_top_k=-1,
nms_threshold=cfg.TEST.nms_thresh,
keep_top_k=cfg.TEST.detections_per_im,
normalized=False)
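    # Reference math for the box_coder call above (a numpy-style sketch of
    # the standard R-CNN 'decode_center_size' decoding, under the variance
    # convention used here where deltas are multiplied by bbox_reg_weights;
    # the +/-1 pixel adjustments for unnormalized boxes are omitted):
    #
    #   w, h   = x2 - x1 + 1, y2 - y1 + 1            # proposal size
    #   cx, cy = x1 + 0.5 * w, y1 + 0.5 * h          # proposal center
    #   dx, dy, dw, dh = wx * tx, wy * ty, ww * tw, wh * th
    #                       # (wx, wy, ww, wh) = cfg.bbox_reg_weights
    #   px, py = cx + dx * w, cy + dy * h            # predicted center
    #   pw, ph = w * exp(dw), h * exp(dh)            # predicted size
    #   box = [px - 0.5 * pw, py - 0.5 * ph, px + 0.5 * pw, py + 0.5 * ph]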
def rpn_heads(self, rpn_input): def rpn_heads(self, rpn_input):
# RPN hidden representation # RPN hidden representation
dim_out = rpn_input.shape[1] dim_out = rpn_input.shape[1]
...@@ -151,13 +215,13 @@ class FasterRCNN(object): ...@@ -151,13 +215,13 @@ class FasterRCNN(object):
rpn_cls_score_prob = fluid.layers.sigmoid( rpn_cls_score_prob = fluid.layers.sigmoid(
self.rpn_cls_score, name='rpn_cls_score_prob') self.rpn_cls_score, name='rpn_cls_score_prob')
param_obj = cfg.TRAIN if self.is_train else cfg.TEST param_obj = cfg.TRAIN if self.mode == 'train' else cfg.TEST
pre_nms_top_n = param_obj.rpn_pre_nms_top_n pre_nms_top_n = param_obj.rpn_pre_nms_top_n
post_nms_top_n = param_obj.rpn_post_nms_top_n post_nms_top_n = param_obj.rpn_post_nms_top_n
nms_thresh = param_obj.rpn_nms_thresh nms_thresh = param_obj.rpn_nms_thresh
min_size = param_obj.rpn_min_size min_size = param_obj.rpn_min_size
eta = param_obj.rpn_eta eta = param_obj.rpn_eta
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals( self.rpn_rois, self.rpn_roi_probs = fluid.layers.generate_proposals(
scores=rpn_cls_score_prob, scores=rpn_cls_score_prob,
bbox_deltas=self.rpn_bbox_pred, bbox_deltas=self.rpn_bbox_pred,
im_info=self.im_info, im_info=self.im_info,
...@@ -168,10 +232,9 @@ class FasterRCNN(object): ...@@ -168,10 +232,9 @@ class FasterRCNN(object):
nms_thresh=nms_thresh, nms_thresh=nms_thresh,
min_size=min_size, min_size=min_size,
eta=eta) eta=eta)
self.rpn_rois = rpn_rois if self.mode == 'train':
if self.is_train:
outs = fluid.layers.generate_proposal_labels( outs = fluid.layers.generate_proposal_labels(
rpn_rois=rpn_rois, rpn_rois=self.rpn_rois,
gt_classes=self.gt_label, gt_classes=self.gt_label,
is_crowd=self.is_crowd, is_crowd=self.is_crowd,
gt_boxes=self.gt_box, gt_boxes=self.gt_box,
...@@ -191,27 +254,28 @@ class FasterRCNN(object): ...@@ -191,27 +254,28 @@ class FasterRCNN(object):
self.bbox_inside_weights = outs[3] self.bbox_inside_weights = outs[3]
self.bbox_outside_weights = outs[4] self.bbox_outside_weights = outs[4]
if cfg.MASK_ON:
mask_out = fluid.layers.generate_mask_labels(
im_info=self.im_info,
gt_classes=self.gt_label,
is_crowd=self.is_crowd,
gt_segms=self.gt_masks,
rois=self.rois,
labels_int32=self.labels_int32,
num_classes=cfg.class_num,
resolution=cfg.resolution)
self.mask_rois = mask_out[0]
self.roi_has_mask_int32 = mask_out[1]
self.mask_int32 = mask_out[2]
def fast_rcnn_heads(self, roi_input): def fast_rcnn_heads(self, roi_input):
if self.is_train: if self.mode == 'train':
pool_rois = self.rois pool_rois = self.rois
else: else:
pool_rois = self.rpn_rois pool_rois = self.rpn_rois
if cfg.roi_func == 'RoIPool': self.res5_2_sum = self.add_roi_box_head_func(roi_input, pool_rois)
pool = fluid.layers.roi_pool( rcnn_out = fluid.layers.pool2d(
input=roi_input, self.res5_2_sum, pool_type='avg', pool_size=7, name='res5_pool')
rois=pool_rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale)
elif cfg.roi_func == 'RoIAlign':
pool = fluid.layers.roi_align(
input=roi_input,
rois=pool_rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale,
sampling_ratio=cfg.sampling_ratio)
rcnn_out = self.add_roi_box_head_func(pool)
self.cls_score = fluid.layers.fc(input=rcnn_out, self.cls_score = fluid.layers.fc(input=rcnn_out,
size=cfg.class_num, size=cfg.class_num,
act=None, act=None,
...@@ -237,15 +301,87 @@ class FasterRCNN(object): ...@@ -237,15 +301,87 @@ class FasterRCNN(object):
learning_rate=2., learning_rate=2.,
regularizer=L2Decay(0.))) regularizer=L2Decay(0.)))
def SuffixNet(self, conv5):
mask_out = fluid.layers.conv2d_transpose(
input=conv5,
num_filters=cfg.dim_reduced,
filter_size=2,
stride=2,
act='relu',
param_attr=ParamAttr(
name='conv5_mask_w', initializer=MSRA(uniform=False)),
bias_attr=ParamAttr(
name='conv5_mask_b', learning_rate=2., regularizer=L2Decay(0.)))
act_func = None
if self.mode != 'train':
act_func = 'sigmoid'
mask_fcn_logits = fluid.layers.conv2d(
input=mask_out,
num_filters=cfg.class_num,
filter_size=1,
act=act_func,
param_attr=ParamAttr(
name='mask_fcn_logits_w', initializer=MSRA(uniform=False)),
bias_attr=ParamAttr(
name="mask_fcn_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
if self.mode != 'train':
mask_fcn_logits = fluid.layers.lod_reset(mask_fcn_logits,
self.pred_result)
return mask_fcn_logits
def mask_rcnn_heads(self, mask_input):
if self.mode == 'train':
conv5 = fluid.layers.gather(self.res5_2_sum,
self.roi_has_mask_int32)
self.mask_fcn_logits = self.SuffixNet(conv5)
else:
self.eval_bbox()
pred_res_shape = fluid.layers.shape(self.pred_result)
shape = fluid.layers.reduce_prod(pred_res_shape)
shape = fluid.layers.reshape(shape, [1, 1])
ones = fluid.layers.fill_constant([1, 1], value=1, dtype='int32')
cond = fluid.layers.equal(x=shape, y=ones)
ie = fluid.layers.IfElse(cond)
with ie.true_block():
pred_res_null = ie.input(self.pred_result)
ie.output(pred_res_null)
with ie.false_block():
pred_res = ie.input(self.pred_result)
pred_boxes = fluid.layers.slice(
pred_res, [1], starts=[2], ends=[6])
im_scale_lod = fluid.layers.sequence_expand(self.im_scale,
pred_boxes)
mask_rois = pred_boxes * im_scale_lod
conv5 = self.add_roi_box_head_func(mask_input, mask_rois)
mask_fcn = self.SuffixNet(conv5)
ie.output(mask_fcn)
self.mask_fcn_logits = ie()[0]
def mask_rcnn_loss(self):
mask_label = fluid.layers.cast(x=self.mask_int32, dtype='float32')
reshape_dim = cfg.class_num * cfg.resolution * cfg.resolution
mask_fcn_logits_reshape = fluid.layers.reshape(self.mask_fcn_logits,
(-1, reshape_dim))
loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits(
x=mask_fcn_logits_reshape,
label=mask_label,
ignore_index=-1,
normalize=True)
loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask')
return loss_mask
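    # Numpy reference for the mask loss above (an illustrative sketch):
    # per-pixel binary cross-entropy on the logits, where label -1 marks
    # ignored pixels; with normalize=True the op divides by the number of
    # non-ignored pixels, so the reduce_sum yields a mean over valid pixels.
    #
    #   p = 1.0 / (1.0 + np.exp(-logits))
    #   valid = labels != -1
    #   bce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    #   loss = bce[valid].sum() / max(valid.sum(), 1)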
def fast_rcnn_loss(self): def fast_rcnn_loss(self):
labels_int64 = fluid.layers.cast(x=self.labels_int32, dtype='int64') labels_int64 = fluid.layers.cast(x=self.labels_int32, dtype='int64')
labels_int64.stop_gradient = True labels_int64.stop_gradient = True
#loss_cls = fluid.layers.softmax_with_cross_entropy( loss_cls = fluid.layers.softmax_with_cross_entropy(
# logits=cls_score, logits=self.cls_score,
# label=labels_int64 label=labels_int64,
# ) numeric_stable_mode=True, )
cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False)
loss_cls = fluid.layers.cross_entropy(cls_prob, labels_int64)
loss_cls = fluid.layers.reduce_mean(loss_cls) loss_cls = fluid.layers.reduce_mean(loss_cls)
loss_bbox = fluid.layers.smooth_l1( loss_bbox = fluid.layers.smooth_l1(
x=self.bbox_pred, x=self.bbox_pred,
...@@ -303,5 +439,4 @@ class FasterRCNN(object): ...@@ -303,5 +439,4 @@ class FasterRCNN(object):
norm = fluid.layers.reduce_prod(score_shape) norm = fluid.layers.reduce_prod(score_shape)
norm.stop_gradient = True norm.stop_gradient = True
rpn_reg_loss = rpn_reg_loss / norm rpn_reg_loss = rpn_reg_loss / norm
return rpn_cls_loss, rpn_reg_loss return rpn_cls_loss, rpn_reg_loss
...@@ -160,8 +160,22 @@ def add_ResNet50_conv4_body(body_input): ...@@ -160,8 +160,22 @@ def add_ResNet50_conv4_body(body_input):
return res4 return res4
def add_ResNet_roi_conv5_head(head_input): def add_ResNet_roi_conv5_head(head_input, rois):
res5 = layer_warp(bottleneck, head_input, 512, 3, 2, name="res5") if cfg.roi_func == 'RoIPool':
res5_pool = fluid.layers.pool2d( pool = fluid.layers.roi_pool(
res5, pool_type='avg', pool_size=7, name='res5_pool') input=head_input,
return res5_pool rois=rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale)
elif cfg.roi_func == 'RoIAlign':
pool = fluid.layers.roi_align(
input=head_input,
rois=rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale,
sampling_ratio=cfg.sampling_ratio)
res5 = layer_warp(bottleneck, pool, 512, 3, 2, name="res5")
return res5
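# Worked example for spatial_scale (illustrative numbers): the conv4 feature
# map is 16x smaller than the input image, so an RoI of
# (x1, y1, x2, y2) = (32, 64, 224, 160) in image pixels lands at
# (2, 4, 14, 10) on the feature map (each coordinate times
# cfg.spatial_scale = 1/16) before being pooled to a
# cfg.roi_resolution x cfg.roi_resolution (14 x 14) output.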
...@@ -37,18 +37,15 @@ def train(): ...@@ -37,18 +37,15 @@ def train():
devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(",")) devices_num = len(devices.split(","))
total_batch_size = devices_num * cfg.TRAIN.im_per_batch total_batch_size = devices_num * cfg.TRAIN.im_per_batch
model = model_builder.FasterRCNN( model = model_builder.RCNN(
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=cfg.use_pyreader, use_pyreader=cfg.use_pyreader,
use_random=False) use_random=False)
model.build_model(image_shape) model.build_model(image_shape)
loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss = model.loss() losses, keys = model.loss()
loss_cls.persistable = True loss = losses[0]
loss_bbox.persistable = True fetch_list = [loss]
rpn_cls_loss.persistable = True
rpn_reg_loss.persistable = True
loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
boundaries = cfg.lr_steps boundaries = cfg.lr_steps
gamma = cfg.lr_gamma gamma = cfg.lr_gamma
...@@ -95,8 +92,6 @@ def train(): ...@@ -95,8 +92,6 @@ def train():
train_reader = reader.train(batch_size=total_batch_size, shuffle=False) train_reader = reader.train(batch_size=total_batch_size, shuffle=False)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
fetch_list = [loss, loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss]
def run(iterations): def run(iterations):
reader_time = [] reader_time = []
run_time = [] run_time = []
...@@ -109,20 +104,16 @@ def train(): ...@@ -109,20 +104,16 @@ def train():
reader_time.append(end_time - start_time) reader_time.append(end_time - start_time)
start_time = time.time() start_time = time.time()
if cfg.parallel: if cfg.parallel:
losses = train_exe.run(fetch_list=[v.name for v in fetch_list], outs = train_exe.run(fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data)) feed=feeder.feed(data))
else: else:
losses = exe.run(fluid.default_main_program(), outs = exe.run(fluid.default_main_program(),
fetch_list=[v.name for v in fetch_list], fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data)) feed=feeder.feed(data))
end_time = time.time() end_time = time.time()
run_time.append(end_time - start_time) run_time.append(end_time - start_time)
total_images += len(data) total_images += len(data)
print("Batch {:d}, loss {:.6f} ".format(batch_id, np.mean(outs[0])))
lr = np.array(fluid.global_scope().find_var('learning_rate')
.get_tensor())
print("Batch {:d}, lr {:.6f}, loss {:.6f} ".format(batch_id, lr[0],
losses[0][0]))
return reader_time, run_time, total_images return reader_time, run_time, total_images
def run_pyreader(iterations): def run_pyreader(iterations):
...@@ -135,18 +126,16 @@ def train(): ...@@ -135,18 +126,16 @@ def train():
for batch_id in range(iterations): for batch_id in range(iterations):
start_time = time.time() start_time = time.time()
if cfg.parallel: if cfg.parallel:
losses = train_exe.run( outs = train_exe.run(
fetch_list=[v.name for v in fetch_list]) fetch_list=[v.name for v in fetch_list])
else: else:
losses = exe.run(fluid.default_main_program(), outs = exe.run(fluid.default_main_program(),
fetch_list=[v.name for v in fetch_list]) fetch_list=[v.name for v in fetch_list])
end_time = time.time() end_time = time.time()
run_time.append(end_time - start_time) run_time.append(end_time - start_time)
total_images += devices_num total_images += devices_num
lr = np.array(fluid.global_scope().find_var('learning_rate') print("Batch {:d}, loss {:.6f} ".format(batch_id,
.get_tensor()) np.mean(outs[0])))
print("Batch {:d}, lr {:.6f}, loss {:.6f} ".format(batch_id, lr[
0], losses[0][0]))
except fluid.core.EOFException: except fluid.core.EOFException:
py_reader.reset() py_reader.reset()
......
...@@ -27,6 +27,48 @@ from collections import deque ...@@ -27,6 +27,48 @@ from collections import deque
from roidbs import JsonDataset from roidbs import JsonDataset
import data_utils import data_utils
from config import cfg from config import cfg
import segm_utils
def roidb_reader(roidb, mode):
im, im_scales = data_utils.get_image_blob(roidb, mode)
im_id = roidb['id']
im_height = np.round(roidb['height'] * im_scales)
im_width = np.round(roidb['width'] * im_scales)
im_info = np.array([im_height, im_width, im_scales], dtype=np.float32)
if mode == 'infer':
return im, im_info
if mode == 'val':
return im, im_info, im_id
gt_boxes = roidb['gt_boxes'].astype('float32')
gt_classes = roidb['gt_classes'].astype('int32')
is_crowd = roidb['is_crowd'].astype('int32')
outs = (im, gt_boxes, gt_classes, is_crowd, im_info, im_id)
if cfg.MASK_ON:
gt_masks = []
valid = True
segms = roidb['segms']
assert len(segms) == is_crowd.shape[0]
for i in range(len(roidb['segms'])):
segm, iscrowd = segms[i], is_crowd[i]
gt_segm = []
if iscrowd:
gt_segm.append([[0, 0]])
else:
for poly in segm:
if len(poly) == 0:
valid = False
break
gt_segm.append(np.array(poly).reshape(-1, 2))
if (not valid) or len(gt_segm) == 0:
break
gt_masks.append(gt_segm)
outs = outs + (gt_masks, )
return outs
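# Shape note with a tiny example (values are illustrative): each COCO polygon
# is a flat [x1, y1, x2, y2, ...] list, reshaped above into an (N, 2) array
# of points, while crowd instances get the dummy single-point polygon [[0, 0]]:
#
#   poly = [20.5, 30.0, 45.0, 30.0, 45.0, 60.5]   # one triangle
#   np.array(poly).reshape(-1, 2)                 # shape (3, 2)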
def coco(mode, def coco(mode,
...@@ -34,48 +76,16 @@ def coco(mode, ...@@ -34,48 +76,16 @@ def coco(mode,
total_batch_size=None, total_batch_size=None,
padding_total=False, padding_total=False,
shuffle=False): shuffle=False):
if 'coco2014' in cfg.dataset:
cfg.train_file_list = 'annotations/instances_train2014.json'
cfg.train_data_dir = 'train2014'
cfg.val_file_list = 'annotations/instances_val2014.json'
cfg.val_data_dir = 'val2014'
elif 'coco2017' in cfg.dataset:
cfg.train_file_list = 'annotations/instances_train2017.json'
cfg.train_data_dir = 'train2017'
cfg.val_file_list = 'annotations/instances_val2017.json'
cfg.val_data_dir = 'val2017'
else:
raise NotImplementedError('Dataset {} not supported'.format(
cfg.dataset))
cfg.mean_value = np.array(cfg.pixel_means)[np.newaxis, cfg.mean_value = np.array(cfg.pixel_means)[np.newaxis,
np.newaxis, :].astype('float32') np.newaxis, :].astype('float32')
total_batch_size = total_batch_size if total_batch_size else batch_size total_batch_size = total_batch_size if total_batch_size else batch_size
if mode != 'infer': if mode != 'infer':
assert total_batch_size % batch_size == 0 assert total_batch_size % batch_size == 0
if mode == 'train': json_dataset = JsonDataset(mode)
cfg.train_file_list = os.path.join(cfg.data_dir, cfg.train_file_list)
cfg.train_data_dir = os.path.join(cfg.data_dir, cfg.train_data_dir)
elif mode == 'test' or mode == 'infer':
cfg.val_file_list = os.path.join(cfg.data_dir, cfg.val_file_list)
cfg.val_data_dir = os.path.join(cfg.data_dir, cfg.val_data_dir)
json_dataset = JsonDataset(train=(mode == 'train'))
roidbs = json_dataset.get_roidb() roidbs = json_dataset.get_roidb()
print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs))) print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs)))
def roidb_reader(roidb, mode):
im, im_scales = data_utils.get_image_blob(roidb, mode)
im_id = roidb['id']
im_height = np.round(roidb['height'] * im_scales)
im_width = np.round(roidb['width'] * im_scales)
im_info = np.array([im_height, im_width, im_scales], dtype=np.float32)
if mode == 'test' or mode == 'infer':
return im, im_info, im_id
gt_boxes = roidb['gt_boxes'].astype('float32')
gt_classes = roidb['gt_classes'].astype('int32')
is_crowd = roidb['is_crowd'].astype('int32')
return im, gt_boxes, gt_classes, is_crowd, im_info, im_id
def padding_minibatch(batch_data): def padding_minibatch(batch_data):
if len(batch_data) == 1: if len(batch_data) == 1:
return batch_data return batch_data
...@@ -93,39 +103,53 @@ def coco(mode, ...@@ -93,39 +103,53 @@ def coco(mode,
def reader(): def reader():
if mode == "train": if mode == "train":
if shuffle:
roidb_perm = deque(np.random.permutation(roidbs)) roidb_perm = deque(np.random.permutation(roidbs))
else:
roidb_perm = deque(roidbs)
roidb_cur = 0 roidb_cur = 0
count = 0
batch_out = [] batch_out = []
            device_num = total_batch_size // batch_size  # integer division so range() below works on Python 3
while True: while True:
roidb = roidb_perm[0] roidb = roidb_perm[0]
roidb_cur += 1 roidb_cur += 1
roidb_perm.rotate(-1) roidb_perm.rotate(-1)
if roidb_cur >= len(roidbs): if roidb_cur >= len(roidbs):
if shuffle:
roidb_perm = deque(np.random.permutation(roidbs)) roidb_perm = deque(np.random.permutation(roidbs))
else:
roidb_perm = deque(roidbs)
roidb_cur = 0 roidb_cur = 0
im, gt_boxes, gt_classes, is_crowd, im_info, im_id = roidb_reader( # im, gt_boxes, gt_classes, is_crowd, im_info, im_id, gt_masks
roidb, mode) datas = roidb_reader(roidb, mode)
if gt_boxes.shape[0] == 0: if datas[1].shape[0] == 0:
continue continue
batch_out.append( if cfg.MASK_ON:
(im, gt_boxes, gt_classes, is_crowd, im_info, im_id)) if len(datas[-1]) != datas[1].shape[0]:
continue
batch_out.append(datas)
if not padding_total: if not padding_total:
if len(batch_out) == batch_size: if len(batch_out) == batch_size:
yield padding_minibatch(batch_out) yield padding_minibatch(batch_out)
count += 1
batch_out = [] batch_out = []
else: else:
if len(batch_out) == total_batch_size: if len(batch_out) == total_batch_size:
batch_out = padding_minibatch(batch_out) batch_out = padding_minibatch(batch_out)
for i in range(total_batch_size / batch_size): for i in range(device_num):
sub_batch_out = [] sub_batch_out = []
for j in range(batch_size): for j in range(batch_size):
sub_batch_out.append(batch_out[i * batch_size + sub_batch_out.append(batch_out[i * batch_size +
j]) j])
yield sub_batch_out yield sub_batch_out
count += 1
sub_batch_out = [] sub_batch_out = []
batch_out = [] batch_out = []
iter_id = count // device_num
elif mode == "test": if iter_id >= cfg.max_iter:
return
elif mode == "val":
batch_out = [] batch_out = []
for roidb in roidbs: for roidb in roidbs:
im, im_info, im_id = roidb_reader(roidb, mode) im, im_info, im_id = roidb_reader(roidb, mode)
...@@ -140,8 +164,8 @@ def coco(mode, ...@@ -140,8 +164,8 @@ def coco(mode,
for roidb in roidbs: for roidb in roidbs:
if cfg.image_name not in roidb['image']: if cfg.image_name not in roidb['image']:
continue continue
im, im_info, im_id = roidb_reader(roidb, mode) im, im_info = roidb_reader(roidb, mode)
batch_out = [(im, im_info, im_id)] batch_out = [(im, im_info)]
yield batch_out yield batch_out
return reader return reader
...@@ -153,7 +177,7 @@ def train(batch_size, total_batch_size=None, padding_total=False, shuffle=True): ...@@ -153,7 +177,7 @@ def train(batch_size, total_batch_size=None, padding_total=False, shuffle=True):
def test(batch_size, total_batch_size=None, padding_total=False): def test(batch_size, total_batch_size=None, padding_total=False):
return coco('test', batch_size, total_batch_size, shuffle=False) return coco('val', batch_size, total_batch_size, shuffle=False)
def infer(): def infer():
......
...@@ -36,24 +36,39 @@ import matplotlib ...@@ -36,24 +36,39 @@ import matplotlib
matplotlib.use('Agg') matplotlib.use('Agg')
from pycocotools.coco import COCO from pycocotools.coco import COCO
import box_utils import box_utils
import segm_utils
from config import cfg from config import cfg
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class DatasetPath(object):
def __init__(self, mode):
self.mode = mode
mode_name = 'train' if mode == 'train' else 'val'
if cfg.dataset != 'coco2014' and cfg.dataset != 'coco2017':
raise NotImplementedError('Dataset {} not supported'.format(
cfg.dataset))
self.sub_name = mode_name + cfg.dataset[-4:]
def get_data_dir(self):
return os.path.join(cfg.data_dir, self.sub_name)
def get_file_list(self):
sfile_list = 'annotations/instances_' + self.sub_name + '.json'
return os.path.join(cfg.data_dir, sfile_list)
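# Usage sketch (assuming cfg.dataset = 'coco2017' and
# cfg.data_dir = 'dataset/coco'; both values are illustrative):
#
#   path = DatasetPath('train')
#   path.get_data_dir()   # -> 'dataset/coco/train2017'
#   path.get_file_list()  # -> 'dataset/coco/annotations/instances_train2017.json'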
class JsonDataset(object): class JsonDataset(object):
"""A class representing a COCO json dataset.""" """A class representing a COCO json dataset."""
def __init__(self, train=False): def __init__(self, mode):
print('Creating: {}'.format(cfg.dataset)) print('Creating: {}'.format(cfg.dataset))
self.name = cfg.dataset self.name = cfg.dataset
self.is_train = train self.is_train = mode == 'train'
if self.is_train: data_path = DatasetPath(mode)
data_dir = cfg.train_data_dir data_dir = data_path.get_data_dir()
file_list = cfg.train_file_list file_list = data_path.get_file_list()
else:
data_dir = cfg.val_data_dir
file_list = cfg.val_file_list
self.image_directory = data_dir self.image_directory = data_dir
self.COCO = COCO(file_list) self.COCO = COCO(file_list)
# Set up dataset classes # Set up dataset classes
...@@ -91,6 +106,7 @@ class JsonDataset(object): ...@@ -91,6 +106,7 @@ class JsonDataset(object):
end_time = time.time() end_time = time.time()
print('_add_gt_annotations took {:.3f}s'.format(end_time - print('_add_gt_annotations took {:.3f}s'.format(end_time -
start_time)) start_time))
if cfg.TRAIN.use_flipped:
print('Appending horizontally-flipped training examples...') print('Appending horizontally-flipped training examples...')
self._extend_with_flipped_entries(roidb) self._extend_with_flipped_entries(roidb)
print('Loaded dataset: {:s}'.format(self.name)) print('Loaded dataset: {:s}'.format(self.name))
...@@ -111,6 +127,7 @@ class JsonDataset(object): ...@@ -111,6 +127,7 @@ class JsonDataset(object):
entry['gt_classes'] = np.empty((0), dtype=np.int32) entry['gt_classes'] = np.empty((0), dtype=np.int32)
entry['gt_id'] = np.empty((0), dtype=np.int32) entry['gt_id'] = np.empty((0), dtype=np.int32)
entry['is_crowd'] = np.empty((0), dtype=np.bool) entry['is_crowd'] = np.empty((0), dtype=np.bool)
entry['segms'] = []
# Remove unwanted fields that come from the json file (if they exist) # Remove unwanted fields that come from the json file (if they exist)
for k in ['date_captured', 'url', 'license', 'file_name']: for k in ['date_captured', 'url', 'license', 'file_name']:
if k in entry: if k in entry:
...@@ -126,9 +143,15 @@ class JsonDataset(object): ...@@ -126,9 +143,15 @@ class JsonDataset(object):
objs = self.COCO.loadAnns(ann_ids) objs = self.COCO.loadAnns(ann_ids)
# Sanitize bboxes -- some are invalid # Sanitize bboxes -- some are invalid
valid_objs = [] valid_objs = []
valid_segms = []
width = entry['width'] width = entry['width']
height = entry['height'] height = entry['height']
for obj in objs: for obj in objs:
if isinstance(obj['segmentation'], list):
# Valid polygons have >= 3 points, so require >= 6 coordinates
obj['segmentation'] = [
p for p in obj['segmentation'] if len(p) >= 6
]
if obj['area'] < cfg.TRAIN.gt_min_area: if obj['area'] < cfg.TRAIN.gt_min_area:
continue continue
if 'ignore' in obj and obj['ignore'] == 1: if 'ignore' in obj and obj['ignore'] == 1:
...@@ -141,6 +164,8 @@ class JsonDataset(object): ...@@ -141,6 +164,8 @@ class JsonDataset(object):
if obj['area'] > 0 and x2 > x1 and y2 > y1: if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2] obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objs.append(obj) valid_objs.append(obj)
valid_segms.append(obj['segmentation'])
num_valid_objs = len(valid_objs) num_valid_objs = len(valid_objs)
gt_boxes = np.zeros((num_valid_objs, 4), dtype=entry['gt_boxes'].dtype) gt_boxes = np.zeros((num_valid_objs, 4), dtype=entry['gt_boxes'].dtype)
...@@ -158,6 +183,7 @@ class JsonDataset(object): ...@@ -158,6 +183,7 @@ class JsonDataset(object):
entry['gt_classes'] = np.append(entry['gt_classes'], gt_classes) entry['gt_classes'] = np.append(entry['gt_classes'], gt_classes)
entry['gt_id'] = np.append(entry['gt_id'], gt_id) entry['gt_id'] = np.append(entry['gt_id'], gt_id)
entry['is_crowd'] = np.append(entry['is_crowd'], is_crowd) entry['is_crowd'] = np.append(entry['is_crowd'], is_crowd)
entry['segms'].extend(valid_segms)
def _extend_with_flipped_entries(self, roidb): def _extend_with_flipped_entries(self, roidb):
"""Flip each entry in the given roidb and return a new roidb that is the """Flip each entry in the given roidb and return a new roidb that is the
...@@ -175,11 +201,13 @@ class JsonDataset(object): ...@@ -175,11 +201,13 @@ class JsonDataset(object):
gt_boxes[:, 2] = width - oldx1 - 1 gt_boxes[:, 2] = width - oldx1 - 1
assert (gt_boxes[:, 2] >= gt_boxes[:, 0]).all() assert (gt_boxes[:, 2] >= gt_boxes[:, 0]).all()
flipped_entry = {} flipped_entry = {}
dont_copy = ('gt_boxes', 'flipped') dont_copy = ('gt_boxes', 'flipped', 'segms')
for k, v in entry.items(): for k, v in entry.items():
if k not in dont_copy: if k not in dont_copy:
flipped_entry[k] = v flipped_entry[k] = v
flipped_entry['gt_boxes'] = gt_boxes flipped_entry['gt_boxes'] = gt_boxes
flipped_entry['segms'] = segm_utils.flip_segms(
entry['segms'], entry['height'], entry['width'])
flipped_entry['flipped'] = True flipped_entry['flipped'] = True
flipped_roidb.append(flipped_entry) flipped_roidb.append(flipped_entry)
roidb.extend(flipped_roidb) roidb.extend(flipped_roidb)
......
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0
model=$1 # faster_rcnn, mask_rcnn
if [ "$model" = "faster_rcnn" ]; then
mask_on="--MASK_ON False"
elif [ "$model" = "mask_rcnn" ]; then
mask_on="--MASK_ON True"
else
echo "Invalid model provided. Please use one of {faster_rcnn, mask_rcnn}"
exit 1
fi
python -u ../eval_coco_map.py \
$mask_on \
--pretrained_model=../output/model_iter179999 \
--data_dir=../dataset/coco/ \
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0
model=$1 # faster_rcnn, mask_rcnn
if [ "$model" = "faster_rcnn" ]; then
mask_on="--MASK_ON False"
elif [ "$model" = "mask_rcnn" ]; then
mask_on="--MASK_ON True"
else
echo "Invalid model provided. Please use one of {faster_rcnn, mask_rcnn}"
exit 1
fi
python -u ../infer.py \
$mask_on \
--pretrained_model=../output/model_iter179999 \
--image_path=../dataset/coco/val2017/ \
--image_name=000000000139.jpg \
--draw_threshold=0.6
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
model=$1 # faster_rcnn, mask_rcnn
if [ "$model" = "faster_rcnn" ]; then
mask_on="--MASK_ON False"
elif [ "$model" = "mask_rcnn" ]; then
mask_on="--MASK_ON True"
else
echo "Invalid model provided. Please use one of {faster_rcnn, mask_rcnn}"
exit 1
fi
python -u ../train.py \
$mask_on \
--model_save_dir=../output/ \
--pretrained_model=../imagenet_resnet50_fusebn/ \
--data_dir=../dataset/coco/ \
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# Detectron
# Copyright (c) 2017-present, Facebook, Inc.
# Licensed under the Apache License, Version 2.0;
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pycocotools.mask as mask_util
import cv2
def is_poly(segm):
"""Determine if segm is a polygon. Valid segm expected (polygon or RLE)."""
assert isinstance(segm, (list, dict)), \
'Invalid segm type: {}'.format(type(segm))
return isinstance(segm, list)
def segms_to_rle(segms, height, width):
rle = segms
if isinstance(segms, list):
# polygon -- a single object might consist of multiple parts
# we merge all parts into one mask rle code
rles = mask_util.frPyObjects(segms, height, width)
rle = mask_util.merge(rles)
elif isinstance(segms['counts'], list):
# uncompressed RLE
rle = mask_util.frPyObjects(segms, height, width)
return rle
def segms_to_mask(segms, iscrowd, height, width):
if iscrowd:
return [[0 for i in range(width)] for j in range(height)]
rle = segms_to_rle(segms, height, width)
mask = mask_util.decode(rle)
return mask
def flip_segms(segms, height, width):
"""Left/right flip each mask in a list of masks."""
def _flip_poly(poly, width):
flipped_poly = np.array(poly)
flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
return flipped_poly.tolist()
def _flip_rle(rle, height, width):
if 'counts' in rle and type(rle['counts']) == list:
# Magic RLE format handling painfully discovered by looking at the
# COCO API showAnns function.
rle = mask_util.frPyObjects([rle], height, width)
mask = mask_util.decode(rle)
mask = mask[:, ::-1, :]
rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
return rle
flipped_segms = []
for segm in segms:
if is_poly(segm):
# Polygon format
flipped_segms.append([_flip_poly(poly, width) for poly in segm])
else:
# RLE format
flipped_segms.append(_flip_rle(segm, height, width))
return flipped_segms
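if __name__ == '__main__':
    # Tiny self-check of the flip math (an illustrative addition, not part
    # of the original file): with width = 10, each x maps to width - x - 1,
    # so the triangle (2, 3), (5, 3), (5, 6) flips to (7, 3), (4, 3), (4, 6).
    flipped = flip_segms([[[2, 3, 5, 3, 5, 6]]], height=8, width=10)
    assert flipped == [[[7, 3, 4, 3, 4, 6]]]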
...@@ -20,7 +20,8 @@ import sys
import numpy as np
import time
import shutil
from utility import parse_args, print_arguments, SmoothedValue, TrainingStats, now_time
import collections
import paddle
import paddle.fluid as fluid
...@@ -35,7 +36,7 @@ def train():
    learning_rate = cfg.learning_rate
    image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]

    if cfg.enable_ce:
        fluid.default_startup_program().random_seed = 1000
        fluid.default_main_program().random_seed = 1000
        import random
...@@ -49,36 +50,36 @@ def train():
    use_random = True
    if cfg.enable_ce:
        use_random = False
    model = model_builder.RCNN(
        add_conv_body_func=resnet.add_ResNet50_conv4_body,
        add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
        use_pyreader=cfg.use_pyreader,
        use_random=use_random)
    model.build_model(image_shape)
    losses, keys = model.loss()
    loss = losses[0]
    fetch_list = losses

    boundaries = cfg.lr_steps
    gamma = cfg.lr_gamma
    step_num = len(cfg.lr_steps)
    values = [learning_rate * (gamma**i) for i in range(step_num + 1)]

    lr = exponential_with_warmup_decay(
        learning_rate=learning_rate,
        boundaries=boundaries,
        values=values,
        warmup_iter=cfg.warm_up_iter,
        warmup_factor=cfg.warm_up_factor)
    optimizer = fluid.optimizer.Momentum(
        learning_rate=lr,
        regularization=fluid.regularizer.L2Decay(cfg.weight_decay),
        momentum=cfg.momentum)
    optimizer.minimize(loss)
    fetch_list = fetch_list + [lr]

    fluid.memory_optimize(
        fluid.default_main_program(), skip_opt_set=set(fetch_list))

    place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
...@@ -107,7 +108,8 @@ def train():
        py_reader = model.py_reader
        py_reader.decorate_paddle_reader(train_reader)
    else:
        train_reader = reader.train(
            batch_size=total_batch_size, shuffle=shuffle)
        feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())

    def save_model(postfix):
...@@ -116,79 +118,64 @@ def train():
            shutil.rmtree(model_path)
        fluid.io.save_persistables(exe, model_path)

    def train_loop_pyreader():
        py_reader.start()
        train_stats = TrainingStats(cfg.log_window, keys)
        try:
            start_time = time.time()
            prev_start_time = start_time
            for iter_id in range(cfg.max_iter):
                prev_start_time = start_time
                start_time = time.time()
                outs = train_exe.run(fetch_list=[v.name for v in fetch_list])
                stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])}
                train_stats.update(stats)
                logs = train_stats.log()
                strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format(
                    now_time(), iter_id,
                    np.mean(outs[-1]), logs, start_time - prev_start_time)
                print(strs)
                sys.stdout.flush()
                if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
                    save_model("model_iter{}".format(iter_id))
            end_time = time.time()
            total_time = end_time - start_time
            last_loss = np.array(outs[0]).mean()
            if cfg.enable_ce:
                gpu_num = devices_num
                epoch_idx = iter_id + 1
                loss = last_loss
                print("kpis\teach_pass_duration_card%s\t%s" %
                      (gpu_num, total_time / epoch_idx))
                print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, loss))
        except (StopIteration, fluid.core.EOFException):
            py_reader.reset()

    def train_loop():
        start_time = time.time()
        prev_start_time = start_time
        start = start_time
        train_stats = TrainingStats(cfg.log_window, keys)
        for iter_id, data in enumerate(train_reader()):
            prev_start_time = start_time
            start_time = time.time()
            outs = train_exe.run(fetch_list=[v.name for v in fetch_list],
                                 feed=feeder.feed(data))
            stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])}
            train_stats.update(stats)
            logs = train_stats.log()
            strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format(
                now_time(), iter_id,
                np.mean(outs[-1]), logs, start_time - prev_start_time)
            print(strs)
            sys.stdout.flush()
            if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
                save_model("model_iter{}".format(iter_id))
            if (iter_id + 1) == cfg.max_iter:
                break
        end_time = time.time()
        total_time = end_time - start_time
        last_loss = np.array(outs[0]).mean()
        # only for ce
        if cfg.enable_ce:
            gpu_num = devices_num
...@@ -196,8 +183,7 @@ def train():
            loss = last_loss
            print("kpis\teach_pass_duration_card%s\t%s" %
                  (gpu_num, total_time / epoch_idx))
            print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, loss))

        return np.mean(every_pass_loss)
...@@ -22,7 +22,9 @@ import sys
import distutils.util
import numpy as np
import six
import collections
from collections import deque
import datetime
from paddle.fluid import core
import argparse
import functools
...@@ -85,6 +87,37 @@ class SmoothedValue(object):
        return np.median(self.deque)


def now_time():
    return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')


class TrainingStats(object):
    def __init__(self, window_size, stats_keys):
        self.smoothed_losses_and_metrics = {
            key: SmoothedValue(window_size)
            for key in stats_keys
        }

    def update(self, stats):
        for k, v in self.smoothed_losses_and_metrics.items():
            v.add_value(stats[k])

    def get(self, extras=None):
        stats = collections.OrderedDict()
        if extras:
            for k, v in extras.items():
                stats[k] = v
        for k, v in self.smoothed_losses_and_metrics.items():
            stats[k] = round(v.get_median_value(), 3)
        return stats

    def log(self, extras=None):
        d = self.get(extras)
        strs = ', '.join(str(dict({x: y})).strip('{}') for x, y in d.items())
        return strs


def parse_args():
    """return all args
    """
...@@ -108,14 +141,15 @@ def parse_args():
    add_arg('learning_rate', float, 0.01, "Learning rate.")
    add_arg('max_iter', int, 180000, "Iter number.")
    add_arg('log_window', int, 20, "Log smooth window, set 1 for debug, set 20 for train.")
    # RCNN
    # RPN
    add_arg('anchor_sizes', int, [32,64,128,256,512], "The size of anchors.")
    add_arg('aspect_ratios', float, [0.5,1.0,2.0], "The ratio of anchors.")
    add_arg('variance', float, [1.,1.,1.,1.], "The variance of anchors.")
    add_arg('rpn_stride', float, [16.,16.], "Stride of the feature map that RPN is attached.")
    add_arg('rpn_nms_thresh', float, 0.7, "NMS threshold used on RPN proposals")
    # TRAIN VAL INFER
    add_arg('MASK_ON', bool, False, "Option for different models. If False, choose faster_rcnn. If True, choose mask_rcnn")
    add_arg('im_per_batch', int, 1, "Minibatch size.")
    add_arg('max_size', int, 1333, "The resized image height.")
    add_arg('scales', int, [800], "The resized image height.")
...@@ -124,7 +158,6 @@ def parse_args():
    add_arg('nms_thresh', float, 0.5, "NMS threshold.")
    add_arg('score_thresh', float, 0.05, "score threshold for NMS.")
    add_arg('snapshot_stride', int, 10000, "save model every snapshot stride.")
    # SINGLE EVAL AND DRAW
    add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.")
    add_arg('image_path', str, 'dataset/coco/val2017', "The image path used to inference and visualize.")
...@@ -138,5 +171,5 @@ def parse_args():
    if 'train' in file_name or 'profile' in file_name:
        merge_cfg_from_args(args, 'train')
    else:
        merge_cfg_from_args(args, 'val')
    return args
checkpoints
output*
*.pyc
*.swp
*_result
## Introduction
This library aims to give developers convenient, efficient PaddlePaddle-based deep learning models for video understanding, video editing, video generation, and related tasks. It currently contains video classification models and will be extended to more scenarios over time.
The video classification models currently included are:
| Model | Category | Description |
| :--------------- | :--------: | :------------: |
| [Attention Cluster](./models/attention_cluster/README.md) | Video classification | Attention clustering for fusing multimodal video features, proposed at CVPR'18 |
| [Attention LSTM](./models/attention_lstm/README.md) | Video classification | Widely used model, fast with high accuracy |
| [NeXtVLAD](./models/nextvlad/README.md) | Video classification | Best single model in the 2nd YouTube-8M challenge |
| [StNet](./models/stnet/README.md) | Video classification | Joint spatial-temporal modeling method for video, proposed at AAAI'19 |
| [TSN](./models/tsn/README.md) | Video classification | Classic 2D-CNN-based solution, proposed at ECCV'16 |
### Key features
- Covers several leading models for video classification: Attention LSTM, Attention Cluster, and NeXtVLAD are popular feature-sequence models, while TSN and StNet are end-to-end video classification models. Attention LSTM is fast with high accuracy, NeXtVLAD is the best single model from the 2nd YouTube-8M challenge, and TSN is the classic 2D-CNN-based solution. Attention Cluster and StNet are Baidu's own models, published at CVPR 2018 and AAAI 2019 respectively, and were used in the winning entry of the Kinetics-600 challenge.
- Provides a general skeleton suited to video classification tasks, so users can configure a model and run training and evaluation with a single command.
## Installation
Running the sample code in this library requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle version in your environment is lower, please update it following the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html).
## Data preparation
The video model library uses the YouTube-8M and Kinetics datasets; see the [data instructions](./dataset/README.md) for details.
## Quick start
The library provides a common train/test/infer framework: specify the model name, configuration file, and other parameters to `train.py`/`test.py`/`infer.py` to launch training or prediction with a single command.
Take the StNet model as an example:
Single-GPU training:
``` bash
export CUDA_VISIBLE_DEVICES=0
python train.py --model-name=STNET \
                --config=./configs/stnet.txt \
                --save-dir=checkpoints
```
Multi-GPU training:
``` bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model-name=STNET \
                --config=./configs/stnet.txt \
                --save-dir=checkpoints
```
The library also provides quick training scripts under the `scripts/train` directory; training can be launched with:
``` bash
bash scripts/train/train_stnet.sh
```
- Please adjust the `num_gpus` and `batch_size` settings in the `config` file to match the number of GPUs specified by `CUDA_VISIBLE_DEVICES`.
## Library structure
### Code structure
```
configs/
stnet.txt
tsn.txt
...
dataset/
youtube/
kinetics/
datareader/
feature_reader.py
kinetics_reader.py
...
metrics/
kinetics/
youtube8m/
...
models/
stnet/
tsn/
...
scripts/
train/
test/
train.py
test.py
infer.py
```
- `configs`: configuration file templates for each model
- `datareader`: data readers for the YouTube-8M and Kinetics datasets
- `metrics`: evaluation scripts for the YouTube-8M and Kinetics datasets
- `models`: network definitions for each model
- `scripts`: quick training and evaluation scripts for each model
- `train.py`: one-command training script; specify the model name and configuration file to launch training (see the sketch after this list)
- `test.py`: one-command evaluation script; specify the model name, configuration file, and model weights to launch evaluation
- `infer.py`: one-command inference script; specify the model name, configuration file, model weights, and the list of files to run inference on
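The following sketch is illustrative only, not an official entry point: it assumes the `configs/tsn.txt` template shipped with the library and shows how `parse_config`/`merge_configs` (from `config.py`) and `get_reader` (from the `datareader` package, both included later in this document) fit together; the override value is arbitrary.
``` python
# Illustrative sketch: wire a config template to a data reader using
# parse_config/merge_configs (config.py) and get_reader (datareader package).
from config import parse_config, merge_configs
from datareader import get_reader

cfg = parse_config('configs/tsn.txt')                  # INI template -> AttrDict
cfg = merge_configs(cfg, 'train', {'batch_size': 64})  # apply CLI-style overrides
train_reader = get_reader('TSN', 'train', cfg)         # returns a batch generator
for batch in train_reader():
    imgs, label = batch[0]  # first (imgs, label) sample of the first batch
    break
```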
## Model Zoo
- Models trained on the YouTube-8M dataset:
| Model | Batch Size | Hardware | cuDNN Version | GAP | Download |
| :-------: | :---: | :---------: | :-----: | :----: | :----------: |
| Attention Cluster | 2048 | 8x P40 | 7.1 | 0.84 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) |
| Attention LSTM | 1024 | 8x P40 | 7.1 | 0.86 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) |
| NeXtVLAD | 160 | 4x P40 | 7.1 | 0.87 | [model](https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz) |
- Models trained on the Kinetics dataset:
| Model | Batch Size | Hardware | cuDNN Version | Top-1 | Download |
| :-------: | :---: | :---------: | :----: | :----: | :----------: |
| StNet | 128 | 8x P40 | 5.1 | 0.69 | [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) |
| TSN | 256 | 8x P40 | 7.1 | 0.67 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) |
## References
- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan
- [StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
## Release notes
- 3/2019: Added this model library, releasing five video classification models: Attention Cluster, Attention LSTM, NeXtVLAD, StNet, and TSN.
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
try:
from configparser import ConfigParser
except:
from ConfigParser import ConfigParser
from utils import AttrDict
CONFIG_SECS = [
'train',
'valid',
'test',
'infer',
]
def parse_config(cfg_file):
parser = ConfigParser()
cfg = AttrDict()
parser.read(cfg_file)
for sec in parser.sections():
sec_dict = AttrDict()
for k, v in parser.items(sec):
try:
v = eval(v)
except:
pass
setattr(sec_dict, k, v)
setattr(cfg, sec.upper(), sec_dict)
return cfg
def merge_configs(cfg, sec, args_dict):
assert sec in CONFIG_SECS, "invalid config section {}".format(sec)
sec_dict = getattr(cfg, sec.upper())
for k, v in args_dict.items():
if v is None:
continue
try:
if hasattr(sec_dict, k):
setattr(sec_dict, k, v)
except:
pass
return cfg
[MODEL]
name = "AttentionCluster"
dataset = "YouTube-8M"
bone_network = None
drop_rate = 0.5
feature_num = 2
feature_names = ['rgb', 'audio']
feature_dims = [1024, 128]
seg_num = 100
cluster_nums = [32, 32]
num_classes = 3862
topk = 20
[TRAIN]
epoch = 5
learning_rate = 0.001
pretrain_base = None
batch_size = 2048
use_gpu = True
num_gpus = 8
filelist = "dataset/youtube8m/train.list"
[VALID]
batch_size = 2048
filelist = "dataset/youtube8m/val.list"
[TEST]
batch_size = 256
filelist = "dataset/youtube8m/test.list"
[INFER]
batch_size = 1
filelist = "dataset/youtube8m/infer.list"
[MODEL]
name = "AttentionLSTM"
dataset = "YouTube-8M"
bone_network = None
drop_rate = 0.5
feature_num = 2
feature_names = ['rgb', 'audio']
feature_dims = [1024, 128]
embedding_size = 512
lstm_size = 1024
num_classes = 3862
topk = 20
[TRAIN]
epoch = 10
learning_rate = 0.001
decay_epochs = [5]
decay_gamma = 0.1
weight_decay = 0.0008
num_samples = 5000000
pretrain_base = None
batch_size = 1024
use_gpu = True
num_gpus = 8
filelist = "dataset/youtube8m/train.list"
[VALID]
batch_size = 1024
filelist = "dataset/youtube8m/val.list"
[TEST]
batch_size = 128
filelist = "dataset/youtube8m/test.list"
[INFER]
batch_size = 1
filelist = "dataset/youtube8m/infer.list"
[MODEL]
name = "NEXTVLAD"
num_classes = 3862
topk = 20
video_feature_size = 1024
audio_feature_size = 128
cluster_size = 128
hidden_size = 2048
groups = 8
expansion = 2
drop_rate = 0.5
gating_reduction = 8
eigen_file = "./dataset/youtube8m/yt8m_pca/eigenvals.npy"
[TRAIN]
epoch = 6
learning_rate = 0.0002
lr_boundary_examples = 2000000
max_iter = 700000
learning_rate_decay = 0.8
l2_penalty = 1e-5
gradient_clip_norm = 1.0
use_gpu = True
num_gpus = 4
batch_size = 160
filelist = "./dataset/youtube8m/train.list"
[VALID]
batch_size = 160
filelist = "./dataset/youtube8m/val.list"
[TEST]
batch_size = 40
filelist = "./dataset/youtube8m/test.list"
[INFER]
batch_size = 1
filelist = "./dataset/youtube8m/infer.list"
[MODEL]
name = "STNET"
format = "pkl"
num_classes = 400
seg_num = 7
seglen = 5
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
num_layers = 50
[TRAIN]
epoch = 60
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 128
num_gpus = 8
use_gpu = True
filelist = "./dataset/kinetics/train.list"
learning_rate = 0.01
learning_rate_decay = 0.1
l2_weight_decay = 1e-4
momentum = 0.9
total_videos = 224684
pretrain_base = "./dataset/pretrained/ResNet50_pretrained"
[VALID]
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 128
filelist = "./dataset/kinetics/val.list"
[TEST]
short_size = 256
target_size = 256
num_reader_threads = 12
buf_size = 1024
batch_size = 16
filelist = "./dataset/kinetics/test.list"
[INFER]
short_size = 256
target_size = 256
num_reader_threads = 12
buf_size = 1024
batch_size = 1
filelist = "./dataset/kinetics/infer.list"
[MODEL]
name = "TSN"
format = "pkl"
num_classes = 400
seg_num = 3
seglen = 1
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
num_layers = 50
[TRAIN]
epoch = 45
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 256
use_gpu = True
num_gpus = 8
filelist = "./dataset/kinetics/train.list"
learning_rate = 0.01
learning_rate_decay = 0.1
l2_weight_decay = 1e-4
momentum = 0.9
total_videos = 224684
[VALID]
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 256
filelist = "./dataset/kinetics/val.list"
[TEST]
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 32
filelist = "./dataset/kinetics/test.list"
[INFER]
short_size = 256
target_size = 224
num_reader_threads = 12
buf_size = 1024
batch_size = 1
filelist = "./dataset/kinetics/infer.list"
from .reader_utils import regist_reader, get_reader
from .feature_reader import FeatureReader
from .kinetics_reader import KineticsReader
from .nonlocal_reader import NonlocalReader
regist_reader("ATTENTIONCLUSTER", FeatureReader)
regist_reader("NEXTVLAD", FeatureReader)
regist_reader("ATTENTIONLSTM", FeatureReader)
regist_reader("TSN", KineticsReader)
regist_reader("TSM", KineticsReader)
regist_reader("STNET", KineticsReader)
regist_reader("NONLOCAL", NonlocalReader)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import sys
from .reader_utils import DataReader
try:
import cPickle as pickle
from cStringIO import StringIO
except ImportError:
import pickle
from io import BytesIO
import numpy as np
import random
python_ver = sys.version_info
class FeatureReader(DataReader):
"""
Data reader for the YouTube-8M dataset, which is stored as features extracted by prior networks.
This is for the three models: lstm, attention cluster, nextvlad.
dataset cfg: num_classes
batch_size
list
NextVlad only: eigen_file
"""
def __init__(self, name, mode, cfg):
self.name = name
self.mode = mode
self.num_classes = cfg.MODEL.num_classes
# set batch size and file list
self.batch_size = cfg[mode.upper()]['batch_size']
self.filelist = cfg[mode.upper()]['filelist']
self.eigen_file = cfg.MODEL.get('eigen_file', None)
self.seg_num = cfg.MODEL.get('seg_num', None)
def create_reader(self):
fl = open(self.filelist).readlines()
fl = [line.strip() for line in fl if line.strip() != '']
if self.mode == 'train':
random.shuffle(fl)
def reader():
batch_out = []
for filepath in fl:
if python_ver < (3, 0):
data = pickle.load(open(filepath, 'rb'))
else:
data = pickle.load(open(filepath, 'rb'), encoding='bytes')
indexes = list(range(len(data)))
if self.mode == 'train':
random.shuffle(indexes)
for i in indexes:
record = data[i]
nframes = record[b'nframes']
rgb = record[b'feature'].astype(float)
audio = record[b'audio'].astype(float)
if self.mode != 'infer':
label = record[b'label']
one_hot_label = make_one_hot(label, self.num_classes)
video = record[b'video']
rgb = rgb[0:nframes, :]
audio = audio[0:nframes, :]
rgb = dequantize(
rgb, max_quantized_value=2., min_quantized_value=-2.)
audio = dequantize(
audio, max_quantized_value=2, min_quantized_value=-2)
if self.name == 'NEXTVLAD':
# add the effect of eigen values
eigen_file = self.eigen_file
eigen_val = np.sqrt(np.load(eigen_file)
[:1024, 0]).astype(np.float32)
eigen_val = eigen_val + 1e-4
rgb = (rgb - 4. / 512) * eigen_val
if self.name == 'ATTENTIONCLUSTER':
sample_inds = generate_random_idx(rgb.shape[0],
self.seg_num)
rgb = rgb[sample_inds]
audio = audio[sample_inds]
if self.mode != 'infer':
batch_out.append((rgb, audio, one_hot_label))
else:
batch_out.append((rgb, audio, video))
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
return reader
def dequantize(feat_vector, max_quantized_value=2., min_quantized_value=-2.):
"""
Dequantize the feature from the byte format to the float format
"""
assert max_quantized_value > min_quantized_value
quantized_range = max_quantized_value - min_quantized_value
scalar = quantized_range / 255.0
bias = (quantized_range / 512.0) + min_quantized_value
return feat_vector * scalar + bias
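
# Editor's note (not in the original file): a quick numeric check of the
# mapping above with the default [-2, 2] range, where scalar = 4/255 and
# bias = 4/512 - 2:
#     dequantize(np.array([0., 128., 255.]))
#     -> approximately [-1.9922, 0.0157, 2.0078]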
def make_one_hot(label, dim=3862):
one_hot_label = np.zeros(dim)
one_hot_label = one_hot_label.astype(float)
for ind in label:
one_hot_label[int(ind)] = 1
return one_hot_label
def generate_random_idx(feature_len, seg_num):
idxs = []
stride = float(feature_len) / seg_num
for i in range(seg_num):
pos = (i + np.random.random()) * stride
idxs.append(min(feature_len - 1, int(pos)))
return idxs
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import math
import random
import functools
try:
import cPickle as pickle
from cStringIO import StringIO
except ImportError:
import pickle
from io import BytesIO
import numpy as np
import cv2  # required by mp4_loader below; missing from the original imports
import paddle
from PIL import Image, ImageEnhance
import logging
from .reader_utils import DataReader
logger = logging.getLogger(__name__)
python_ver = sys.version_info
class KineticsReader(DataReader):
"""
Data reader for the Kinetics dataset, which comes in two formats, mp4 and pkl:
1. mp4, the original format of Kinetics-400
2. pkl, where the mp4 videos were decoded beforehand and stored as pkl
In both cases, load the data, then return the frame data as numpy arrays and the label as an integer.
dataset cfg: format
num_classes
seg_num
short_size
target_size
num_reader_threads
buf_size
image_mean
image_std
batch_size
list
"""
def __init__(self, name, mode, cfg):
self.name = name
self.mode = mode
self.format = cfg.MODEL.format
self.num_classes = cfg.MODEL.num_classes
self.seg_num = cfg.MODEL.seg_num
self.seglen = cfg.MODEL.seglen
self.short_size = cfg[mode.upper()]['short_size']
self.target_size = cfg[mode.upper()]['target_size']
self.num_reader_threads = cfg[mode.upper()]['num_reader_threads']
self.buf_size = cfg[mode.upper()]['buf_size']
self.img_mean = np.array(cfg.MODEL.image_mean).reshape(
[3, 1, 1]).astype(np.float32)
self.img_std = np.array(cfg.MODEL.image_std).reshape(
[3, 1, 1]).astype(np.float32)
# set batch size and file list
self.batch_size = cfg[mode.upper()]['batch_size']
self.filelist = cfg[mode.upper()]['filelist']
def create_reader(self):
_reader = _reader_creator(self.filelist, self.mode, seg_num=self.seg_num, seglen = self.seglen, \
short_size = self.short_size, target_size = self.target_size, \
img_mean = self.img_mean, img_std = self.img_std, \
shuffle = (self.mode == 'train'), \
num_threads = self.num_reader_threads, \
buf_size = self.buf_size, format = self.format)
def _batch_reader():
batch_out = []
for imgs, label in _reader():
if imgs is None:
continue
batch_out.append((imgs, label))
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
return _batch_reader
def _reader_creator(pickle_list,
mode,
seg_num,
seglen,
short_size,
target_size,
img_mean,
img_std,
shuffle=False,
num_threads=1,
buf_size=1024,
format='pkl'):
def reader():
with open(pickle_list) as flist:
lines = [line.strip() for line in flist]
if shuffle:
random.shuffle(lines)
for line in lines:
pickle_path = line.strip()
yield [pickle_path]
if format == 'pkl':
decode_func = decode_pickle
elif format == 'mp4':
decode_func = decode_mp4
else:
raise NotImplementedError("Not implemented format {}".format(format))
mapper = functools.partial(
decode_func,
mode=mode,
seg_num=seg_num,
seglen=seglen,
short_size=short_size,
target_size=target_size,
img_mean=img_mean,
img_std=img_std)
return paddle.reader.xmap_readers(mapper, reader, num_threads, buf_size)
def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size, img_mean,
img_std):
sample = sample[0].split(' ')
mp4_path = sample[0]
# when infer, we store vid as label
label = int(sample[1])
try:
imgs = mp4_loader(mp4_path, seg_num, seglen, mode)
if len(imgs) < 1:
logger.error('{} frame length {} less than 1.'.format(mp4_path,
len(imgs)))
return None, None
except:
logger.error('Error when loading {}'.format(mp4_path))
return None, None
return imgs_transform(imgs, label, mode, seg_num, seglen, \
short_size, target_size, img_mean, img_std)
def decode_pickle(sample, mode, seg_num, seglen, short_size, target_size,
img_mean, img_std):
pickle_path = sample[0]
try:
if python_ver < (3, 0):
data_loaded = pickle.load(open(pickle_path, 'rb'))
else:
data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
vid, label, frames = data_loaded
if len(frames) < 1:
logger.error('{} frame length {} less than 1.'.format(pickle_path,
len(frames)))
return None, None
except:
logger.info('Error when loading {}'.format(pickle_path))
return None, None
if mode == 'train' or mode == 'valid' or mode == 'test':
ret_label = label
elif mode == 'infer':
ret_label = vid
imgs = video_loader(frames, seg_num, seglen, mode)
return imgs_transform(imgs, ret_label, mode, seg_num, seglen, \
short_size, target_size, img_mean, img_std)
def imgs_transform(imgs, label, mode, seg_num, seglen, short_size, target_size,
img_mean, img_std):
imgs = group_scale(imgs, short_size)
if mode == 'train':
imgs = group_random_crop(imgs, target_size)
imgs = group_random_flip(imgs)
else:
imgs = group_center_crop(imgs, target_size)
np_imgs = (np.array(imgs[0]).astype('float32').transpose(
(2, 0, 1))).reshape(1, 3, target_size, target_size) / 255
for i in range(len(imgs) - 1):
img = (np.array(imgs[i + 1]).astype('float32').transpose(
(2, 0, 1))).reshape(1, 3, target_size, target_size) / 255
np_imgs = np.concatenate((np_imgs, img))
imgs = np_imgs
imgs -= img_mean
imgs /= img_std
imgs = np.reshape(imgs, (seg_num, seglen * 3, target_size, target_size))
return imgs, label
def group_random_crop(img_group, target_size):
w, h = img_group[0].size
th, tw = target_size, target_size
assert (w >= target_size) and (h >= target_size), \
"image width({}) and height({}) should be larger than crop size".format(w, h, target_size)
out_images = []
x1 = random.randint(0, w - tw)
y1 = random.randint(0, h - th)
for img in img_group:
if w == tw and h == th:
out_images.append(img)
else:
out_images.append(img.crop((x1, y1, x1 + tw, y1 + th)))
return out_images
def group_random_flip(img_group):
v = random.random()
if v < 0.5:
ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
return ret
else:
return img_group
def group_center_crop(img_group, target_size):
img_crop = []
for img in img_group:
w, h = img.size
th, tw = target_size, target_size
assert (w >= target_size) and (h >= target_size), \
"image width({}) and height({}) should be larger than crop size".format(w, h, target_size)
x1 = int(round((w - tw) / 2.))
y1 = int(round((h - th) / 2.))
img_crop.append(img.crop((x1, y1, x1 + tw, y1 + th)))
return img_crop
def group_scale(imgs, target_size):
resized_imgs = []
for i in range(len(imgs)):
img = imgs[i]
w, h = img.size
if (w <= h and w == target_size) or (h <= w and h == target_size):
resized_imgs.append(img)
continue
if w < h:
ow = target_size
oh = int(target_size * 4.0 / 3.0)
resized_imgs.append(img.resize((ow, oh), Image.BILINEAR))
else:
oh = target_size
ow = int(target_size * 4.0 / 3.0)
resized_imgs.append(img.resize((ow, oh), Image.BILINEAR))
return resized_imgs
def imageloader(buf):
if isinstance(buf, str):
img = Image.open(StringIO(buf))
else:
img = Image.open(BytesIO(buf))
return img.convert('RGB')
def video_loader(frames, nsample, seglen, mode):
videolen = len(frames)
average_dur = int(videolen / nsample)
imgs = []
for i in range(nsample):
idx = 0
if mode == 'train':
if average_dur >= seglen:
idx = random.randint(0, average_dur - seglen)
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
else:
if average_dur >= seglen:
idx = (average_dur - seglen) // 2
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
for jj in range(idx, idx + seglen):
imgbuf = frames[int(jj % videolen)]
img = imageloader(imgbuf)
imgs.append(img)
return imgs
def mp4_loader(filepath, nsample, seglen, mode):
cap = cv2.VideoCapture(filepath)
videolen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
average_dur = int(videolen / nsample)
sampledFrames = []
for i in range(videolen):
ret, frame = cap.read()
# maybe first frame is empty
if ret == False:
continue
img = frame[:, :, ::-1]
sampledFrames.append(img)
imgs = []
for i in range(nsample):
idx = 0
if mode == 'train':
if average_dur >= seglen:
idx = random.randint(0, average_dur - seglen)
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
else:
if average_dur >= seglen:
idx = (average_dur - 1) // 2
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
for jj in range(idx, idx + seglen):
imgbuf = sampledFrames[int(jj % videolen)]
img = Image.fromarray(imgbuf, mode='RGB')
imgs.append(img)
return imgs
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import random
import time
import multiprocessing
import numpy as np
import cv2
import logging
from .reader_utils import DataReader
logger = logging.getLogger(__name__)
class NonlocalReader(DataReader):
"""
Data reader for the Kinetics dataset, which reads mp4 files and decodes them into numpy arrays.
This is for the nonlocal neural network model.
cfg: num_classes
num_reader_threads
image_mean
image_std
batch_size
list
crop_size
sample_rate
video_length
jitter_scales
Test only cfg: num_test_clips
use_multi_crop
"""
def __init__(self, name, mode, cfg):
self.name = name
self.mode = mode
self.cfg = cfg
def create_reader(self):
cfg = self.cfg
mode = self.mode
num_reader_threads = cfg[mode.upper()]['num_reader_threads']
assert num_reader_threads >=1, \
"number of reader threads({}) should be a positive integer".format(num_reader_threads)
if num_reader_threads == 1:
reader_func = make_reader
else:
reader_func = make_multi_reader
dataset_args = {}
dataset_args['image_mean'] = cfg.MODEL.image_mean
dataset_args['image_std'] = cfg.MODEL.image_std
dataset_args['crop_size'] = cfg[mode.upper()]['crop_size']
dataset_args['sample_rate'] = cfg[mode.upper()]['sample_rate']
dataset_args['video_length'] = cfg[mode.upper()]['video_length']
dataset_args['min_size'] = cfg[mode.upper()]['jitter_scales'][0]
dataset_args['max_size'] = cfg[mode.upper()]['jitter_scales'][1]
dataset_args['num_reader_threads'] = num_reader_threads
filelist = cfg[mode.upper()]['list']
batch_size = cfg[mode.upper()]['batch_size']
if self.mode == 'train':
sample_times = 1
return reader_func(filelist, batch_size, sample_times, True, True,
**dataset_args)
elif self.mode == 'valid':
sample_times = 1
return reader_func(filelist, batch_size, sample_times, False, False,
**dataset_args)
elif self.mode == 'test':
sample_times = cfg['TEST']['num_test_clips']
if cfg['TEST']['use_multi_crop'] == 1:
sample_times = int(sample_times / 3)
if cfg['TEST']['use_multi_crop'] == 2:
sample_times = int(sample_times / 6)
return reader_func(filelist, batch_size, sample_times, False, False,
**dataset_args)
else:
logger.info('Not implemented')
raise NotImplementedError
def video_fast_get_frame(video_path,
sampling_rate=1,
length=64,
start_frm=-1,
sample_times=1):
cap = cv2.VideoCapture(video_path)
frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
sampledFrames = []
video_output = np.ndarray(shape=[length, height, width, 3], dtype=np.uint8)
use_start_frm = start_frm
if start_frm < 0:
if (frame_cnt - length * sampling_rate > 0):
use_start_frm = random.randint(0,
frame_cnt - length * sampling_rate)
else:
use_start_frm = 0
else:
frame_gaps = float(frame_cnt) / float(sample_times)
use_start_frm = int(frame_gaps * start_frm) % frame_cnt
for i in range(frame_cnt):
ret, frame = cap.read()
# maybe first frame is empty
if ret == False:
continue
img = frame[:, :, ::-1]
sampledFrames.append(img)
for idx in range(length):
i = use_start_frm + idx * sampling_rate
i = i % len(sampledFrames)
video_output[idx] = sampledFrames[i]
cap.release()
return video_output
def apply_resize(rgbdata, min_size, max_size):
length, height, width, channel = rgbdata.shape
ratio = 1.0
# generate random scale between [min_size, max_size]
if min_size == max_size:
side_length = min_size
else:
side_length = np.random.randint(min_size, max_size)
if height > width:
ratio = float(side_length) / float(width)
else:
ratio = float(side_length) / float(height)
out_height = int(height * ratio)
out_width = int(width * ratio)
outdata = np.zeros(
(length, out_height, out_width, channel), dtype=rgbdata.dtype)
for i in range(length):
outdata[i] = cv2.resize(rgbdata[i], (out_width, out_height))
return outdata
def crop_mirror_transform(rgbdata,
mean,
std,
cropsize=224,
use_mirror=True,
center_crop=False,
spatial_pos=-1):
channel, length, height, width = rgbdata.shape
assert height >= cropsize, "crop size should not be larger than video height"
assert width >= cropsize, "crop size should not be larger than video width"
# crop to specific scale
if center_crop:
h_off = int((height - cropsize) / 2)
w_off = int((width - cropsize) / 2)
if spatial_pos >= 0:
now_pos = spatial_pos % 3
if h_off > 0:
h_off = h_off * now_pos
else:
w_off = w_off * now_pos
else:
h_off = np.random.randint(0, height - cropsize)
w_off = np.random.randint(0, width - cropsize)
outdata = np.zeros(
(channel, length, cropsize, cropsize), dtype=rgbdata.dtype)
outdata[:, :, :, :] = rgbdata[:, :, h_off:h_off + cropsize, w_off:w_off +
cropsize]
# apply mirror
mirror_indicator = (np.random.rand() > 0.5)
mirror_me = use_mirror and mirror_indicator
if spatial_pos > 0:
mirror_me = (int(spatial_pos / 3) > 0)
if mirror_me:
outdata = outdata[:, :, :, ::-1]
# subtract the mean and divide by the std
outdata = outdata.astype(np.float32)
outdata = (outdata - mean) / std
return outdata
def make_reader(filelist, batch_size, sample_times, is_training, shuffle,
**dataset_args):
# should add sample_times param
fl = open(filelist).readlines()
fl = [line.strip() for line in fl if line.strip() != '']
if shuffle:
random.shuffle(fl)
def reader():
batch_out = []
for line in fl:
# start_time = time.time()
line_items = line.split(' ')
fn = line_items[0]
label = int(line_items[1])
if len(line_items) > 2:
start_frm = int(line_items[2])
spatial_pos = int(line_items[3])
in_sample_times = sample_times
else:
start_frm = -1
spatial_pos = -1
in_sample_times = 1
label = np.array([label]).astype(np.int64)
# 1, get rgb data for fixed length of frames
try:
rgbdata = video_fast_get_frame(fn, \
sampling_rate = dataset_args['sample_rate'], length = dataset_args['video_length'], \
start_frm = start_frm, sample_times = in_sample_times)
except:
logger.info('Error when loading {}, just skip this file'.format(
fn))
continue
# add preprocessing
# 2, resize to a random scale between [min_size, max_size] when training, or cfg.TEST.SCALE at inference
min_size = dataset_args['min_size']
max_size = dataset_args['max_size']
rgbdata = apply_resize(rgbdata, min_size, max_size)
# transform [length, height, width, channel] to [channel, length, height, width]
rgbdata = np.transpose(rgbdata, [3, 0, 1, 2])
# 3 crop, mirror and transform
rgbdata = crop_mirror_transform(rgbdata, mean = dataset_args['image_mean'], \
std = dataset_args['image_std'], cropsize = dataset_args['crop_size'], \
use_mirror = is_training, center_crop = (not is_training), \
spatial_pos = spatial_pos)
batch_out.append((rgbdata, label))
if len(batch_out) == batch_size:
yield batch_out
batch_out = []
return reader
def make_multi_reader(filelist, batch_size, sample_times, is_training, shuffle,
**dataset_args):
fl = open(filelist).readlines()
fl = [line.strip() for line in fl if line.strip() != '']
if shuffle:
random.shuffle(fl)
n = dataset_args['num_reader_threads']
queue_size = 20
reader_lists = [None] * n
file_num = int(len(fl) // n)
for i in range(n):
if i < len(reader_lists) - 1:
tmp_list = fl[i * file_num:(i + 1) * file_num]
else:
tmp_list = fl[i * file_num:]
reader_lists[i] = tmp_list
def read_into_queue(flq, queue):
batch_out = []
for line in flq:
line_items = line.split(' ')
fn = line_items[0]
label = int(line_items[1])
if len(line_items) > 2:
start_frm = int(line_items[2])
spatial_pos = int(line_items[3])
in_sample_times = sample_times
else:
start_frm = -1
spatial_pos = -1
in_sample_times = 1
label = np.array([label]).astype(np.int64)
# 1, get rgb data for fixed length of frames
try:
rgbdata = video_fast_get_frame(fn, \
sampling_rate = dataset_args['sample_rate'], length = dataset_args['video_length'], \
start_frm = start_frm, sample_times = in_sample_times)
except:
logger.info('Error when loading {}, just skip this file'.format(
fn))
continue
# add preprocessing
# 2, resize to a random scale between [min_size, max_size] when training, or cfg.TEST.SCALE at inference
min_size = dataset_args['min_size']
max_size = dataset_args['max_size']
rgbdata = apply_resize(rgbdata, min_size, max_size)
# transform [length, height, width, channel] to [channel, length, height, width]
rgbdata = np.transpose(rgbdata, [3, 0, 1, 2])
# 3 crop, mirror and transform
rgbdata = crop_mirror_transform(rgbdata, mean = dataset_args['image_mean'], \
std = dataset_args['image_std'], cropsize = dataset_args['crop_size'], \
use_mirror = is_training, center_crop = (not is_training), \
spatial_pos = spatial_pos)
batch_out.append((rgbdata, label))
if len(batch_out) == batch_size:
queue.put(batch_out)
batch_out = []
queue.put(None)
def queue_reader():
queue = multiprocessing.Queue(queue_size)
p_list = [None] * len(reader_lists)
# for reader_list in reader_lists:
for i in range(len(reader_lists)):
reader_list = reader_lists[i]
p_list[i] = multiprocessing.Process(
target=read_into_queue, args=(reader_list, queue))
p_list[i].start()
reader_num = len(reader_lists)
finish_num = 0
while finish_num < reader_num:
sample = queue.get()
if sample is None:
finish_num += 1
else:
yield sample
for i in range(len(p_list)):
p_list[i].terminate()
p_list[i].join()
return queue_reader
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import pickle
import cv2
import numpy as np
import random
class ReaderNotFoundError(Exception):
"Error: reader not found"
def __init__(self, reader_name, avail_readers):
super(ReaderNotFoundError, self).__init__()
self.reader_name = reader_name
self.avail_readers = avail_readers
def __str__(self):
msg = "Reader {} Not Found.\nAvailable readers:\n".format(
self.reader_name)
for reader in self.avail_readers:
msg += " {}\n".format(reader)
return msg
class DataReader(object):
"""data reader for video input"""
def __init__(self, model_name, mode, cfg):
"""Not implemented"""
pass
def create_reader(self):
"""Not implemented"""
pass
class ReaderZoo(object):
def __init__(self):
self.reader_zoo = {}
def regist(self, name, reader):
assert reader.__base__ == DataReader, "Unknown reader type {}".format(
type(reader))
self.reader_zoo[name] = reader
def get(self, name, mode, cfg):
for k, v in self.reader_zoo.items():
if k == name:
return v(name, mode, cfg)
raise ReaderNotFoundError(name, self.reader_zoo.keys())
# singleton reader_zoo
reader_zoo = ReaderZoo()
def regist_reader(name, reader):
reader_zoo.regist(name, reader)
def get_reader(name, mode, cfg):
reader_model = reader_zoo.get(name, mode, cfg)
return reader_model.create_reader()
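
# Editor's sketch (not part of the original file): a hypothetical example of
# registering a custom reader through the ReaderZoo singleton above.
#
#     class ToyReader(DataReader):
#         def __init__(self, name, mode, cfg):
#             self.batch_size = 4
#
#         def create_reader(self):
#             def _reader():
#                 yield [('fake_sample', 0)] * self.batch_size
#             return _reader
#
#     regist_reader("TOYREADER", ToyReader)
#     batch_reader = get_reader("TOYREADER", 'train', cfg=None)
#     for batch in batch_reader():
#         print(len(batch))  # 4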
# Data usage instructions
- [YouTube-8M](#youtube-8m-dataset)
- [Kinetics](#kinetics-dataset)
## YouTube-8M dataset
We use the YouTube-8M dataset as updated in 2018. The official dataset is used, with the TFRecord files converted to pickle files so PaddlePaddle can consume them. YouTube-8M officially provides both frame-level and video-level features; only the frame-level features are needed here.
### Downloading the data
Please download the [training set](http://us.data.yt8m.org/2/frame/train/index.html) and [validation set](http://us.data.yt8m.org/2/frame/validate/index.html) from the official YouTube-8M links. Each link lists download addresses for 3844 files; the official [download script](https://research.google.com/youtube8m/download.html) can also be used. After downloading, you will have 3844 training files and 3844 validation files (in TFRecord format).
Assuming the root directory of the video model code is Code\_Root, enter the dataset/youtube8m directory

    cd dataset/youtube8m

create the directories tf/train and tf/val under youtube8m

    mkdir tf && cd tf
    mkdir train && mkdir val

and place the downloaded train and validate data in them respectively.
### Converting the data format
To be usable for PaddlePaddle training, the downloaded TFRecord files must be converted offline to pickle format; use the conversion script [dataset/youtube8m/tf2pkl.py](./youtube8m/tf2pkl.py).
Create the directories pkl/train and pkl/val under dataset/youtube8m

    cd dataset/youtube8m
    mkdir pkl && cd pkl
    mkdir train && mkdir val

To convert the files (TFRecord -> pkl), enter the dataset/youtube8m directory and run

    python tf2pkl.py ./tf/train ./pkl/train
    python tf2pkl.py ./tf/val ./pkl/val

to convert the train and validate sets to pkl files respectively. tf2pkl.py takes two arguments: the directory holding the source tf files and the directory for the converted pkl files.
Note: reading TFRecord files requires TensorFlow, so install TensorFlow first, or do the conversion in an environment that has TensorFlow and then copy the result into the dataset/youtube8m/pkl directory. To avoid conflicts with the PaddlePaddle environment, it is recommended to convert the data elsewhere and copy it over afterwards.
### Generating the file lists
In the dataset/youtube8m directory, run

    ls $Code_Root/dataset/youtube8m/pkl/train/* > train.list
    ls $Code_Root/dataset/youtube8m/pkl/val/* > val.list

This creates two files under dataset/youtube8m, train.list and val.list, each line of which holds the absolute path of one pkl file.
## Kinetics dataset
Kinetics is a large-scale video action recognition dataset released by DeepMind, available in two versions, Kinetics-400 and Kinetics-600. Kinetics-400 is used here; the preprocessing steps are as follows.
### Downloading the mp4 videos
Create the folders under Code\_Root

    cd $Code_Root/dataset && mkdir kinetics
    cd kinetics && mkdir data_k400 && cd data_k400
    mkdir train_mp4 && mkdir val_mp4

ActivityNet provides an official download tool for Kinetics; see its [official repo](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics) to download the Kinetics-400 mp4 video collection. Download the Kinetics-400 training and validation sets into dataset/kinetics/data\_k400/train\_mp4 and dataset/kinetics/data\_k400/val\_mp4 respectively.
### Preprocessing the mp4 files
To speed up data loading, the mp4 files are decoded into frames and packed into pickle files ahead of time, and the data loader reads from these pkl files (at the cost of extra storage). Each pkl file packs a tuple (video-id, label, [frame1, frame2, ..., frameN]), where the frames are JPEG-encoded byte strings; a quick way to inspect one of these files is sketched below.
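The following minimal sketch (the file path is a placeholder) loads one converted pkl under Python 3 and decodes its first frame; it relies only on the packing described above and mirrors decode_pickle in datareader/kinetics_reader.py:

    import pickle
    from io import BytesIO
    from PIL import Image

    # hypothetical path to one converted file
    with open('data_k400/train_pkl/some_video.pkl', 'rb') as f:
        vid, label, frames = pickle.load(f, encoding='bytes')

    print(vid, label, len(frames))
    first_frame = Image.open(BytesIO(frames[0])).convert('RGB')
    print(first_frame.size)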
Create the directories train\_pkl and val\_pkl under dataset/kinetics/data\_k400

    cd $Code_Root/dataset/kinetics/data_k400
    mkdir train_pkl && mkdir val_pkl

Enter the $Code\_Root/dataset/kinetics directory and use the video2pkl.py script for the conversion. First download the file lists of the [train](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics/data/kinetics-400_train.csv) and [validation](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics/data/kinetics-400_val.csv) sets.
First generate the dataset label file needed for preprocessing

    python generate_label.py kinetics-400_train.csv kinetics400_label.txt

then run the following program:

    python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8  # e.g. with 8 processes

- This script depends on `ffmpeg`; please install `ffmpeg` beforehand.
For the train data,

    Source_dir = $Code_Root/dataset/kinetics/data_k400/train_mp4
    Target_dir = $Code_Root/dataset/kinetics/data_k400/train_pkl

and for the val data,

    Source_dir = $Code_Root/dataset/kinetics/data_k400/val_mp4
    Target_dir = $Code_Root/dataset/kinetics/data_k400/val_pkl

This decodes the mp4 files and saves them as pkl files.
### Generating the train and validation lists

    cd $Code_Root/dataset/kinetics
    ls $Code_Root/dataset/kinetics/data_k400/train_pkl/* > train.list
    ls $Code_Root/dataset/kinetics/data_k400/val_pkl/* > val.list

This generates the file lists; each line of train.list and val.list is the absolute path of one pkl file.
import sys
# kinetics-400_train.csv should be downloaded first and set as sys.argv[1]
# sys.argv[2] can be set as kinetics400_label.txt
# python generate_label.py kinetics-400_train.csv kinetics400_label.txt
num_classes = 400
fname = sys.argv[1]
outname = sys.argv[2]
fl = open(fname).readlines()
fl = fl[1:]
outf = open(outname, 'w')
label_list = []
for line in fl:
label = line.strip().split(',')[0].strip('"')
if label in label_list:
continue
else:
label_list.append(label)
assert len(label_list) == num_classes, \
    "there should be {} labels in list, but got {}".format(
        num_classes, len(label_list))
label_list.sort()
for i in range(num_classes):
outf.write('{} {}'.format(label_list[i], i) + '\n')
outf.close()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import glob
import cPickle
from multiprocessing import Pool
# example command line: python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8
#
# kinetics-400_train.csv is the training set file of K400 official release
# each line contains label,youtube_id,time_start,time_end,split,is_cc
assert (len(sys.argv) == 5)
f = open(sys.argv[1])
source_dir = sys.argv[2]
target_dir = sys.argv[3]
num_threads = sys.argv[4]
all_video_entries = [x.strip().split(',') for x in f.readlines()]
all_video_entries = all_video_entries[1:]
f.close()
category_label_map = {}
f = open('kinetics400_label.txt')
for line in f:
ens = line.strip().split(' ')
category = " ".join(ens[0:-1])
label = int(ens[-1])
category_label_map[category] = label
f.close()
def generate_pkl(entry):
mode = entry[4]
category = entry[0].strip('"')
category_dir = category
video_path = os.path.join(
'./',
entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4")
video_path = os.path.join(source_dir, category_dir, video_path)
label = category_label_map[category]
vid = './' + video_path.split('/')[-1].split('.')[0]
if os.path.exists(video_path):
if not os.path.exists(vid):
os.makedirs(vid)
os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg')
else:
print("File does not exist: {}".format(video_path))
return
images = sorted(glob.glob(vid + '/*.jpg'))
ims = []
for img in images:
f = open(img)
ims.append(f.read())
f.close()
output_pkl = vid + ".pkl"
output_pkl = os.path.join(target_dir, output_pkl)
f = open(output_pkl, 'w')
cPickle.dump((vid, label, ims), f, -1)
f.close()
os.system('rm -rf %s' % vid)
pool = Pool(processes=int(sys.argv[4]))
pool.map(generate_pkl, all_video_entries)
pool.close()
pool.join()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
"""Provides readers configured for different datasets."""
import os, sys
import numpy as np
import tensorflow as tf
from tensorflow import logging
import cPickle
from tensorflow.python.platform import gfile
assert (len(sys.argv) == 3)
source_dir = sys.argv[1]
target_dir = sys.argv[2]
def Dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2):
"""Dequantize the feature from the byte format to the float format.
Args:
feat_vector: the input 1-d vector.
max_quantized_value: the maximum of the quantized value.
min_quantized_value: the minimum of the quantized value.
Returns:
A float vector which has the same shape as feat_vector.
"""
assert max_quantized_value > min_quantized_value
quantized_range = max_quantized_value - min_quantized_value
scalar = quantized_range / 255.0
bias = (quantized_range / 512.0) + min_quantized_value
return feat_vector * scalar + bias
def resize_axis(tensor, axis, new_size, fill_value=0):
"""Truncates or pads a tensor to new_size on on a given axis.
Truncate or extend tensor such that tensor.shape[axis] == new_size. If the
size increases, the padding will be performed at the end, using fill_value.
Args:
tensor: The tensor to be resized.
axis: An integer representing the dimension to be sliced.
new_size: An integer or 0d tensor representing the new value for
tensor.shape[axis].
fill_value: Value to use to fill any new entries in the tensor. Will be
cast to the type of tensor.
Returns:
The resized tensor.
"""
tensor = tf.convert_to_tensor(tensor)
shape = tf.unstack(tf.shape(tensor))
pad_shape = shape[:]
pad_shape[axis] = tf.maximum(0, new_size - shape[axis])
shape[axis] = tf.minimum(shape[axis], new_size)
shape = tf.stack(shape)
resized = tf.concat([
tf.slice(tensor, tf.zeros_like(shape), shape),
tf.fill(tf.stack(pad_shape), tf.cast(fill_value, tensor.dtype))
], axis)
# Update shape.
new_shape = tensor.get_shape().as_list() # A copy is being made.
new_shape[axis] = new_size
resized.set_shape(new_shape)
return resized
class BaseReader(object):
"""Inherit from this class when implementing new readers."""
def prepare_reader(self, unused_filename_queue):
"""Create a thread for generating prediction and label tensors."""
raise NotImplementedError()
class YT8MFrameFeatureReader(BaseReader):
"""Reads TFRecords of SequenceExamples.
The TFRecords must contain SequenceExamples with the sparse int64 'labels'
context feature and a fixed length byte-quantized feature vector, obtained
from the features in 'feature_names'. The quantized features will be mapped
back into a range between min_quantized_value and max_quantized_value.
"""
def __init__(self,
num_classes=3862,
feature_sizes=[1024],
feature_names=["inc3"],
max_frames=300):
"""Construct a YT8MFrameFeatureReader.
Args:
num_classes: a positive integer for the number of classes.
feature_sizes: positive integer(s) for the feature dimensions as a list.
feature_names: the feature name(s) in the tensorflow record as a list.
max_frames: the maximum number of frames to process.
"""
assert len(feature_names) == len(feature_sizes), \
"length of feature_names (={}) != length of feature_sizes (={})".format( \
len(feature_names), len(feature_sizes))
self.num_classes = num_classes
self.feature_sizes = feature_sizes
self.feature_names = feature_names
self.max_frames = max_frames
def get_video_matrix(self, features, feature_size, max_frames,
max_quantized_value, min_quantized_value):
"""Decodes features from an input string and quantizes it.
Args:
features: raw feature values
feature_size: length of each frame feature vector
max_frames: number of frames (rows) in the output feature_matrix
max_quantized_value: the maximum of the quantized value.
min_quantized_value: the minimum of the quantized value.
Returns:
feature_matrix: matrix of all frame-features
num_frames: number of frames in the sequence
"""
decoded_features = tf.reshape(
tf.cast(tf.decode_raw(features, tf.uint8), tf.float32),
[-1, feature_size])
num_frames = tf.minimum(tf.shape(decoded_features)[0], max_frames)
feature_matrix = decoded_features
return feature_matrix, num_frames
def prepare_reader(self,
filename_queue,
max_quantized_value=2,
min_quantized_value=-2):
"""Creates a single reader thread for YouTube8M SequenceExamples.
Args:
filename_queue: A tensorflow queue of filename locations.
max_quantized_value: the maximum of the quantized value.
min_quantized_value: the minimum of the quantized value.
Returns:
A tuple of video indexes, video features, labels, and padding data.
"""
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
contexts, features = tf.parse_single_sequence_example(
serialized_example,
context_features={
"id": tf.FixedLenFeature([], tf.string),
"labels": tf.VarLenFeature(tf.int64)
},
sequence_features={
feature_name: tf.FixedLenSequenceFeature(
[], dtype=tf.string)
for feature_name in self.feature_names
})
# read ground truth labels
labels = (tf.cast(
tf.sparse_to_dense(
contexts["labels"].values, (self.num_classes, ),
1,
validate_indices=False),
tf.bool))
# loads (potentially) different types of features and concatenates them
num_features = len(self.feature_names)
assert num_features > 0, "No feature selected: feature_names is empty!"
assert len(self.feature_names) == len(self.feature_sizes), \
"length of feature_names (={}) != length of feature_sizes (={})".format( \
len(self.feature_names), len(self.feature_sizes))
num_frames = -1 # the number of frames in the video
feature_matrices = [None] * num_features  # an array of different features
for feature_index in range(num_features):
feature_matrix, num_frames_in_this_feature = self.get_video_matrix(
features[self.feature_names[feature_index]],
self.feature_sizes[feature_index], self.max_frames,
max_quantized_value, min_quantized_value)
if num_frames == -1:
num_frames = num_frames_in_this_feature
#else:
# tf.assert_equal(num_frames, num_frames_in_this_feature)
feature_matrices[feature_index] = feature_matrix
# cap the number of frames at self.max_frames
num_frames = tf.minimum(num_frames, self.max_frames)
# concatenate different features
video_matrix = feature_matrices[0]
audio_matrix = feature_matrices[1]
return contexts["id"], video_matrix, audio_matrix, labels, num_frames
def main(files_pattern):
data_files = gfile.Glob(files_pattern)
filename_queue = tf.train.string_input_producer(
data_files, num_epochs=1, shuffle=False)
reader = YT8MFrameFeatureReader(
feature_sizes=[1024, 128], feature_names=["rgb", "audio"])
vals = reader.prepare_reader(filename_queue)
with tf.Session() as sess:
sess.run(tf.initialize_local_variables())
sess.run(tf.initialize_all_variables())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
vid_num = 0
all_data = []
try:
while not coord.should_stop():
vid, features, audios, labels, nframes = sess.run(vals)
label_index = np.where(labels)[0].tolist()
vid_num += 1
#print vid, features.shape, audios.shape, label_index, nframes
features_int = features.astype(np.uint8)
audios_int = audios.astype(np.uint8)
value_dict = {}
value_dict['video'] = vid
value_dict['feature'] = features_int
value_dict['audio'] = audios_int
value_dict['label'] = label_index
value_dict['nframes'] = nframes
all_data.append(value_dict)
except tf.errors.OutOfRangeError:
print('Finished extracting.')
finally:
coord.request_stop()
coord.join(threads)
print(vid_num)
record_name = files_pattern.split('/')[-1].split('.')[0]
outputdir = target_dir
fn = '%s.pkl' % record_name
outp = open(os.path.join(outputdir, fn), 'wb')
cPickle.dump(all_data, outp, protocol=cPickle.HIGHEST_PROTOCOL)
outp.close()
if __name__ == '__main__':
record_dir = source_dir
record_files = os.listdir(record_dir)
for f in record_files:
record_path = os.path.join(record_dir, f)
main(record_path)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import paddle.fluid as fluid
from config import *
import models
from datareader import get_reader
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model-name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--use-gpu', type=ast.literal_eval, default=True, help='whether to use gpu or not.')
parser.add_argument(
'--weights',
type=str,
default=None,
help='weight path, None to use weights from Paddle.')
parser.add_argument(
'--batch-size',
type=int,
default=1,
help='sample number in a batch for inference.')
parser.add_argument(
'--filelist',
type=str,
default=None,
help='path to inference data filelist.')
parser.add_argument(
'--log-interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--infer-topk',
type=int,
default=20,
help='topk predictions to restore.')
parser.add_argument(
'--save-dir', type=str, default='./', help='directory to store results')
args = parser.parse_args()
return args
def infer(args):
# parse config
config = parse_config(args.config)
infer_config = merge_configs(config, 'infer', vars(args))
infer_model = models.get_model(args.model_name, infer_config, mode='infer')
infer_model.build_input(use_pyreader=False)
infer_model.build_model()
infer_feeds = infer_model.feeds()
infer_outputs = infer_model.outputs()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
filelist = args.filelist or infer_config.INFER.filelist
assert os.path.exists(filelist), "{} does not exist.".format(filelist)
# get infer reader
infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
# if no weight files specified, download weights from paddle
weights = args.weights or infer_model.get_weights()
def if_exist(var):
return os.path.exists(os.path.join(weights, var.name))
fluid.io.load_vars(exe, weights, predicate=if_exist)
infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds)
fetch_list = [x.name for x in infer_outputs]
periods = []
results = []
cur_time = time.time()
for infer_iter, data in enumerate(infer_reader()):
data_feed_in = [items[:-1] for items in data]
video_id = [items[-1] for items in data]
infer_outs = exe.run(fetch_list=fetch_list,
feed=infer_feeder.feed(data_feed_in))
predictions = np.array(infer_outs[0])
for i in range(len(predictions)):
topk_inds = predictions[i].argsort()[-args.infer_topk:]
topk_inds = topk_inds[::-1]
preds = predictions[i][topk_inds]
results.append(
(video_id[i], preds.tolist(), topk_inds.tolist()))
prev_time = cur_time
cur_time = time.time()
period = cur_time - prev_time
periods.append(period)
if args.log_interval > 0 and infer_iter % args.log_interval == 0:
logger.info('Processed {} samples'.format((infer_iter + 1) * len(predictions)))
logger.info('[INFER] infer finished. average time: {}'.format(
np.mean(periods)))
if not os.path.isdir(args.save_dir):
os.mkdir(args.save_dir)
result_file_name = os.path.join(args.save_dir,
"{}_infer_result".format(args.model_name))
pickle.dump(results, open(result_file_name, 'wb'))
if __name__ == "__main__":
args = parse_args()
logger.info(args)
infer(args)
from .metrics_util import get_metrics
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import numpy as np
import datetime
import logging
logger = logging.getLogger(__name__)
class MetricsCalculator():
def __init__(self, name, mode):
self.name = name
self.mode = mode # 'train', 'val', 'test'
self.reset()
def reset(self):
logger.info('Resetting {} metrics...'.format(self.mode))
self.aggr_acc1 = 0.0
self.aggr_acc5 = 0.0
self.aggr_loss = 0.0
self.aggr_batch_size = 0
def finalize_metrics(self):
self.avg_acc1 = self.aggr_acc1 / self.aggr_batch_size
self.avg_acc5 = self.aggr_acc5 / self.aggr_batch_size
self.avg_loss = self.aggr_loss / self.aggr_batch_size
def get_computed_metrics(self):
json_stats = {}
json_stats['avg_loss'] = self.avg_loss
json_stats['avg_acc1'] = self.avg_acc1
json_stats['avg_acc5'] = self.avg_acc5
return json_stats
def calculate_metrics(self, loss, softmax, labels):
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
return accuracy1, accuracy5
def accumulate(self, loss, softmax, labels):
cur_batch_size = softmax.shape[0]
# if returned loss is None for e.g. test, just set loss to be 0.
if loss is None:
cur_loss = 0.
else:
cur_loss = np.mean(np.array(loss))
self.aggr_batch_size += cur_batch_size
self.aggr_loss += cur_loss * cur_batch_size
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
self.aggr_acc1 += accuracy1 * cur_batch_size
self.aggr_acc5 += accuracy5 * cur_batch_size
return
# ----------------------------------------------
# other utils
# ----------------------------------------------
def compute_topk_correct_hits(top_k, preds, labels):
'''Compute the number of correct hits'''
batch_size = preds.shape[0]
top_k_preds = np.zeros((batch_size, top_k), dtype=np.float32)
for i in range(batch_size):
top_k_preds[i, :] = np.argsort(-preds[i, :])[:top_k]
correctness = np.zeros(batch_size, dtype=np.int32)
for i in range(batch_size):
if labels[i] in top_k_preds[i, :].astype(np.int32).tolist():
correctness[i] = 1
correct_hits = sum(correctness)
return correct_hits
def compute_topk_accuracy(softmax, labels, top_k):
computed_metrics = {}
assert labels.shape[0] == softmax.shape[0], "Batch size mismatch."
aggr_batch_size = labels.shape[0]
aggr_top_k_correct_hits = compute_topk_correct_hits(top_k, softmax, labels)
# normalize results
computed_metrics = \
float(aggr_top_k_correct_hits) / aggr_batch_size
return computed_metrics
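# Illustrative usage example (added; not part of the original file), assuming
# a batch of two samples over three classes:
#   softmax = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
#   labels = np.array([1, 2])
#   compute_topk_accuracy(softmax, labels, top_k=1)  # -> 0.5 (first sample hits, second misses)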
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import logging
import numpy as np
from metrics.youtube8m import eval_util as youtube8m_metrics
from metrics.kinetics import accuracy_metrics as kinetics_metrics
from metrics.multicrop_test import multicrop_test_metrics as multicrop_test_metrics
logger = logging.getLogger(__name__)
class Metrics(object):
def __init__(self, name, mode, metrics_args):
"""Not implemented"""
pass
def calculate_and_log_out(self, loss, pred, label, info=''):
"""Not implemented"""
pass
def accumulate(self, loss, pred, label, info=''):
"""Not implemented"""
pass
def finalize_and_log_out(self, info=''):
"""Not implemented"""
pass
def reset(self):
"""Not implemented"""
pass
class Youtube8mMetrics(Metrics):
def __init__(self, name, mode, metrics_args):
self.name = name
self.mode = mode
self.num_classes = metrics_args['MODEL']['num_classes']
self.topk = metrics_args['MODEL']['topk']
self.calculator = youtube8m_metrics.EvaluationMetrics(self.num_classes,
self.topk)
def calculate_and_log_out(self, loss, pred, label, info=''):
loss = np.mean(np.array(loss))
hit_at_one = youtube8m_metrics.calculate_hit_at_one(pred, label)
perr = youtube8m_metrics.calculate_precision_at_equal_recall_rate(pred,
label)
gap = youtube8m_metrics.calculate_gap(pred, label)
logger.info(info + ' , loss = {0}, Hit@1 = {1}, PERR = {2}, GAP = {3}'.format(\
'%.6f' % loss, '%.2f' % hit_at_one, '%.2f' % perr, '%.2f' % gap))
def accumulate(self, loss, pred, label, info=''):
self.calculator.accumulate(loss, pred, label)
def finalize_and_log_out(self, info=''):
epoch_info_dict = self.calculator.get()
logger.info(info + '\tavg_hit_at_one: {0},\tavg_perr: {1},\tavg_loss :{2},\taps: {3},\tgap:{4}'\
.format(epoch_info_dict['avg_hit_at_one'], epoch_info_dict['avg_perr'], \
epoch_info_dict['avg_loss'], epoch_info_dict['aps'], epoch_info_dict['gap']))
def reset(self):
self.calculator.clear()
class Kinetics400Metrics(Metrics):
def __init__(self, name, mode, metrics_args):
self.name = name
self.mode = mode
self.calculator = kinetics_metrics.MetricsCalculator(name, mode.lower())
def calculate_and_log_out(self, loss, pred, label, info=''):
if loss is not None:
loss = np.mean(np.array(loss))
else:
loss = 0.
acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label)
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def accumulate(self, loss, pred, label, info=''):
self.calculator.accumulate(loss, pred, label)
def finalize_and_log_out(self, info=''):
self.calculator.finalize_metrics()
metrics_dict = self.calculator.get_computed_metrics()
loss = metrics_dict['avg_loss']
acc1 = metrics_dict['avg_acc1']
acc5 = metrics_dict['avg_acc5']
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def reset(self):
self.calculator.reset()
class MulticropMetrics(Metrics):
def __init__(self, name, mode, metrics_args):
self.name = name
self.mode = mode
if mode == 'test':
args = {}
args['num_test_clips'] = metrics_args.TEST.num_test_clips
args['dataset_size'] = metrics_args.TEST.dataset_size
args['filename_gt'] = metrics_args.TEST.filename_gt
args['checkpoint_dir'] = metrics_args.TEST.checkpoint_dir
args['num_classes'] = metrics_args.MODEL.num_classes
self.calculator = multicrop_test_metrics.MetricsCalculator(
name, mode.lower(), **args)
else:
self.calculator = kinetics_metrics.MetricsCalculator(name,
mode.lower())
def calculate_and_log_out(self, loss, pred, label, info=''):
if self.mode == 'test':
pass
else:
if loss is not None:
loss = np.mean(np.array(loss))
else:
loss = 0.
acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label)
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def accumulate(self, loss, pred, label, info=''):
self.calculator.accumulate(loss, pred, label)
def finalize_and_log_out(self, info=''):
if self.mode == 'test':
self.calculator.finalize_metrics()
else:
self.calculator.finalize_metrics()
metrics_dict = self.calculator.get_computed_metrics()
loss = metrics_dict['avg_loss']
acc1 = metrics_dict['avg_acc1']
acc5 = metrics_dict['avg_acc5']
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def reset(self):
self.calculator.reset()
class MetricsZoo(object):
def __init__(self):
self.metrics_zoo = {}
def regist(self, name, metrics):
assert metrics.__base__ == Metrics, "Unknown metrics type {}".format(
type(metrics))
self.metrics_zoo[name] = metrics
def get(self, name, mode, cfg):
for k, v in self.metrics_zoo.items():
if k == name:
return v(name, mode, cfg)
raise MetricsNotFoundError(name, self.metrics_zoo.keys())
# singleton metrics_zoo
metrics_zoo = MetricsZoo()
def regist_metrics(name, metrics):
metrics_zoo.regist(name, metrics)
def get_metrics(name, mode, cfg):
return metrics_zoo.get(name, mode, cfg)
regist_metrics("NEXTVLAD", Youtube8mMetrics)
regist_metrics("ATTENTIONLSTM", Youtube8mMetrics)
regist_metrics("ATTENTIONCLUSTER", Youtube8mMetrics)
regist_metrics("TSN", Kinetics400Metrics)
regist_metrics("TSM", Kinetics400Metrics)
regist_metrics("STNET", Kinetics400Metrics)
regist_metrics("NONLOCAL", MulticropMetrics)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import sys
import os
import numpy as np
import datetime
import logging
from collections import defaultdict
import pickle
logger = logging.getLogger(__name__)
class MetricsCalculator():
def __init__(self, name, mode, **metrics_args):
"""
metrics args:
num_test_clips, number of clips of each video when test
dataset_size, total number of videos in the dataset
filename_gt, a file in which each line stores the ground truth of each video
checkpoint_dir, dir where to store the test results
num_classes, number of classes of the dataset
"""
self.name = name
self.mode = mode # 'train', 'val', 'test'
self.metrics_args = metrics_args
self.num_test_clips = metrics_args['num_test_clips']
self.dataset_size = metrics_args['dataset_size']
self.filename_gt = metrics_args['filename_gt']
self.checkpoint_dir = metrics_args['checkpoint_dir']
self.num_classes = metrics_args['num_classes']
self.reset()
def reset(self):
logger.info('Resetting {} metrics...'.format(self.mode))
self.aggr_acc1 = 0.0
self.aggr_acc5 = 0.0
self.aggr_loss = 0.0
self.aggr_batch_size = 0
self.seen_inds = defaultdict(int)
self.results = []
def calculate_metrics(self, loss, pred, labels):
pass
def accumulate(self, loss, pred, labels):
labels = labels.astype(int)
for i in range(pred.shape[0]):
probs = pred[i, :].tolist()
vid = labels[i]
self.seen_inds[vid] += 1
if self.seen_inds[vid] > self.num_test_clips:
logger.warning('Video id {} has been seen. Skip.'.format(vid))
continue
save_pairs = [vid, probs]
self.results.append(save_pairs)
logger.info("({0} / {1}) videos".format(\
len(self.seen_inds), self.dataset_size))
def finalize_metrics(self):
if self.filename_gt is not None:
evaluate_results(self.results, self.filename_gt, self.dataset_size, \
self.num_classes, self.num_test_clips)
# save temporary file
pkl_path = os.path.join(self.checkpoint_dir, "results_probs.pkl")
with open(pkl_path, 'wb') as f:
pickle.dump(self.results, f)
logger.info('Temporary file saved to: {}'.format(pkl_path))
def read_groundtruth(filename_gt):
f = open(filename_gt, 'r')
labels = []
for line in f:
rows = line.split()
labels.append(int(rows[1]))
f.close()
return labels
def evaluate_results(results, filename_gt, test_dataset_size, num_classes,
num_test_clips):
gt_labels = read_groundtruth(filename_gt)
sample_num = test_dataset_size
class_num = num_classes
sample_video_times = num_test_clips
counts = np.zeros(sample_num, dtype=np.int32)
probs = np.zeros((sample_num, class_num))
assert (len(gt_labels) == sample_num)
"""
clip_accuracy: the (e.g.) 10*19761 clips' average accuracy
clip1_accuracy: the 1st clip's accuracy (starting from frame 0)
"""
clip_accuracy = 0
clip1_accuracy = 0
clip1_count = 0
seen_inds = defaultdict(int)
# evaluate
for entry in results:
vid = entry[0]
prob = np.array(entry[1])
probs[vid] += prob[0:class_num]
counts[vid] += 1
idx = prob.argmax()
if idx == gt_labels[vid]:
# clip accuracy
clip_accuracy += 1
# clip1 accuracy
seen_inds[vid] += 1
if seen_inds[vid] == 1:
clip1_count += 1
if idx == gt_labels[vid]:
clip1_accuracy += 1
# sanity check
max_clips = 0
min_clips = sys.maxsize
count_empty = 0
count_corrupted = 0
for i in range(sample_num):
max_clips = max(max_clips, counts[i])
min_clips = min(min_clips, counts[i])
if counts[i] != sample_video_times:
count_corrupted += 1
logger.warning('Id: {} count: {}'.format(i, counts[i]))
if counts[i] == 0:
count_empty += 1
logger.info('Num of empty videos: {}'.format(count_empty))
logger.info('Num of corrupted videos: {}'.format(count_corrupted))
logger.info('Max num of clips in a video: {}'.format(max_clips))
logger.info('Min num of clips in a video: {}'.format(min_clips))
# clip1 accuracy for sanity (# print clip1 first as it is lowest)
logger.info('Clip1 accuracy: {:.2f} percent ({}/{})'.format(
100. * clip1_accuracy / clip1_count, clip1_accuracy, clip1_count))
# clip accuracy for sanity
logger.info('Clip accuracy: {:.2f} percent ({}/{})'.format(
100. * clip_accuracy / len(results), clip_accuracy, len(results)))
# compute accuracy
accuracy = 0
accuracy_top5 = 0
for i in range(sample_num):
prob = probs[i]
# top-1
idx = prob.argmax()
if idx == gt_labels[i] and counts[i] > 0:
accuracy = accuracy + 1
ids = np.argsort(prob)[::-1]
for j in range(5):
if ids[j] == gt_labels[i] and counts[i] > 0:
accuracy_top5 = accuracy_top5 + 1
break
accuracy = float(accuracy) / float(sample_num)
accuracy_top5 = float(accuracy_top5) / float(sample_num)
logger.info('-' * 80)
logger.info('top-1 accuracy: {:.2f} percent'.format(accuracy * 100))
logger.info('top-5 accuracy: {:.2f} percent'.format(accuracy_top5 * 100))
logger.info('-' * 80)
return
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Calculate or keep track of the interpolated average precision.
It provides an interface for calculating interpolated average precision for an
entire list or the top-n ranked items. For the definition of the
(non-)interpolated average precision:
http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf
Example usages:
1) Use it as a static function call to directly calculate average precision for
a short ranked list in the memory.
```
import random
import numpy as np
p = np.array([random.random() for _ in range(10)])
a = np.array([random.choice([0, 1]) for _ in range(10)])
ap = average_precision_calculator.AveragePrecisionCalculator.ap(p, a)
```
2) Use it as an object for long ranked list that cannot be stored in memory or
the case where partial predictions can be observed at a time (Tensorflow
predictions). In this case, we first call the function accumulate many times
to process parts of the ranked list. After processing all the parts, we call
peek_interpolated_ap_at_n.
```
p1 = np.array([random.random() for _ in range(5)])
a1 = np.array([random.choice([0, 1]) for _ in range(5)])
p2 = np.array([random.random() for _ in range(5)])
a2 = np.array([random.choice([0, 1]) for _ in range(5)])
# interpolated average precision at 10 using 1000 break points
calculator = average_precision_calculator.AveragePrecisionCalculator(10)
calculator.accumulate(p1, a1)
calculator.accumulate(p2, a2)
ap3 = calculator.peek_ap_at_n()
```
"""
import heapq
import random
import numbers
import numpy
class AveragePrecisionCalculator(object):
"""Calculate the average precision and average precision at n."""
def __init__(self, top_n=None):
"""Construct an AveragePrecisionCalculator to calculate average precision.
This class is used to calculate the average precision for a single label.
Args:
top_n: A positive Integer specifying the average precision at n, or
None to use all provided data points.
Raises:
ValueError: An error occurred when the top_n is not a positive integer.
"""
if not ((isinstance(top_n, int) and top_n > 0) or top_n is None):
raise ValueError("top_n must be a positive integer or None.")
self._top_n = top_n # average precision at n
self._total_positives = 0  # total number of positives seen so far
self._heap = [] # max heap of (prediction, actual)
@property
def heap_size(self):
"""Gets the heap size maintained in the class."""
return len(self._heap)
@property
def num_accumulated_positives(self):
"""Gets the number of positive samples that have been accumulated."""
return self._total_positives
def accumulate(self, predictions, actuals, num_positives=None):
"""Accumulate the predictions and their ground truth labels.
After the function call, we may call peek_ap_at_n to actually calculate
the average precision.
Note predictions and actuals must have the same shape.
Args:
predictions: a list storing the prediction scores.
actuals: a list storing the ground truth labels. Any value
larger than 0 will be treated as positives, otherwise as negatives.
num_positives: If the 'predictions' and 'actuals' inputs aren't complete,
then it's possible some true positives were missed in them. In that case,
you can provide 'num_positives' in order to accurately track recall.
Raises:
ValueError: An error occurred when the format of the input is not the
numpy 1-D array or the shape of predictions and actuals does not match.
"""
if len(predictions) != len(actuals):
raise ValueError(
"the shape of predictions and actuals does not match.")
if not num_positives is None:
if not isinstance(num_positives,
numbers.Number) or num_positives < 0:
raise ValueError(
"'num_positives' was provided but it wan't a nonzero number."
)
if not num_positives is None:
self._total_positives += num_positives
else:
self._total_positives += numpy.size(numpy.where(actuals > 0))
topk = self._top_n
heap = self._heap
for i in range(numpy.size(predictions)):
if topk is None or len(heap) < topk:
heapq.heappush(heap, (predictions[i], actuals[i]))
else:
if predictions[i] > heap[0][0]: # heap[0] is the smallest
heapq.heappop(heap)
heapq.heappush(heap, (predictions[i], actuals[i]))
def clear(self):
"""Clear the accumulated predictions."""
self._heap = []
self._total_positives = 0
def peek_ap_at_n(self):
"""Peek the non-interpolated average precision at n.
Returns:
The non-interpolated average precision at n (default 0).
If n is larger than the length of the ranked list,
the average precision will be returned.
"""
if self.heap_size <= 0:
return 0
predlists = numpy.array(list(zip(*self._heap)))
ap = self.ap_at_n(
predlists[0],
predlists[1],
n=self._top_n,
total_num_positives=self._total_positives)
return ap
@staticmethod
def ap(predictions, actuals):
"""Calculate the non-interpolated average precision.
Args:
predictions: a numpy 1-D array storing the sparse prediction scores.
actuals: a numpy 1-D array storing the ground truth labels. Any value
larger than 0 will be treated as positives, otherwise as negatives.
Returns:
The non-interpolated average precision at n.
If n is larger than the length of the ranked list,
the average precision will be returned.
Raises:
ValueError: An error occurred when the format of the input is not the
numpy 1-D array or the shape of predictions and actuals does not match.
"""
return AveragePrecisionCalculator.ap_at_n(predictions, actuals, n=None)
@staticmethod
def ap_at_n(predictions, actuals, n=20, total_num_positives=None):
"""Calculate the non-interpolated average precision.
Args:
predictions: a numpy 1-D array storing the sparse prediction scores.
actuals: a numpy 1-D array storing the ground truth labels. Any value
larger than 0 will be treated as positives, otherwise as negatives.
n: the top n items to be considered in ap@n.
total_num_positives : (optionally) you can specify the number of total
positive
in the list. If specified, it will be used in calculation.
Returns:
The non-interpolated average precision at n.
If n is larger than the length of the ranked list,
the average precision will be returned.
Raises:
ValueError: An error occurred when
1) the format of the input is not the numpy 1-D array;
2) the shape of predictions and actuals does not match;
3) the input n is not a positive integer.
"""
if len(predictions) != len(actuals):
raise ValueError(
"the shape of predictions and actuals does not match.")
if n is not None:
if not isinstance(n, int) or n <= 0:
raise ValueError("n must be 'None' or a positive integer."
" It was '%s'." % n)
ap = 0.0
predictions = numpy.array(predictions)
actuals = numpy.array(actuals)
# add a shuffler to avoid overestimating the ap
predictions, actuals = AveragePrecisionCalculator._shuffle(predictions,
actuals)
sortidx = sorted(
range(len(predictions)), key=lambda k: predictions[k], reverse=True)
if total_num_positives is None:
numpos = numpy.size(numpy.where(actuals > 0))
else:
numpos = total_num_positives
if numpos == 0:
return 0
if n is not None:
numpos = min(numpos, n)
delta_recall = 1.0 / numpos
poscount = 0.0
# calculate the ap
r = len(sortidx)
if n is not None:
r = min(r, n)
for i in range(r):
if actuals[sortidx[i]] > 0:
poscount += 1
ap += poscount / (i + 1) * delta_recall
return ap
@staticmethod
def _shuffle(predictions, actuals):
random.seed(0)
suffidx = random.sample(range(len(predictions)), len(predictions))
predictions = predictions[suffidx]
actuals = actuals[suffidx]
return predictions, actuals
@staticmethod
def _zero_one_normalize(predictions, epsilon=1e-7):
"""Normalize the predictions to the range between 0.0 and 1.0.
For some predictions like SVM predictions, we need to normalize them before
calculate the interpolated average precision. The normalization will not
change the rank in the original list and thus won't change the average
precision.
Args:
predictions: a numpy 1-D array storing the sparse prediction scores.
epsilon: a small constant to avoid denominator being zero.
Returns:
The normalized prediction.
"""
denominator = numpy.max(predictions) - numpy.min(predictions)
ret = (predictions - numpy.min(predictions)) / numpy.maximum(denominator,
epsilon)
return ret
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Provides functions to help with evaluating models."""
import datetime
import numpy
from . import mean_average_precision_calculator as map_calculator
from . import average_precision_calculator as ap_calculator
def flatten(l):
""" Merges a list of lists into a single list. """
return [item for sublist in l for item in sublist]
def calculate_hit_at_one(predictions, actuals):
"""Performs a local (numpy) calculation of the hit at one.
Args:
predictions: Matrix containing the outputs of the model.
Dimensions are 'batch' x 'num_classes'.
actuals: Matrix containing the ground truth labels.
Dimensions are 'batch' x 'num_classes'.
Returns:
float: The average hit at one across the entire batch.
"""
top_prediction = numpy.argmax(predictions, 1)
hits = actuals[numpy.arange(actuals.shape[0]), top_prediction]
return numpy.average(hits)
def calculate_precision_at_equal_recall_rate(predictions, actuals):
"""Performs a local (numpy) calculation of the PERR.
Args:
predictions: Matrix containing the outputs of the model.
Dimensions are 'batch' x 'num_classes'.
actuals: Matrix containing the ground truth labels.
Dimensions are 'batch' x 'num_classes'.
Returns:
float: The average precision at equal recall rate across the entire batch.
"""
aggregated_precision = 0.0
num_videos = actuals.shape[0]
for row in numpy.arange(num_videos):
num_labels = int(numpy.sum(actuals[row]))
top_indices = numpy.argpartition(predictions[row],
-num_labels)[-num_labels:]
item_precision = 0.0
for label_index in top_indices:
if predictions[row][label_index] > 0:
item_precision += actuals[row][label_index]
item_precision /= top_indices.size
aggregated_precision += item_precision
aggregated_precision /= num_videos
return aggregated_precision
def calculate_gap(predictions, actuals, top_k=20):
"""Performs a local (numpy) calculation of the global average precision.
Only the top_k predictions are taken for each of the videos.
Args:
predictions: Matrix containing the outputs of the model.
Dimensions are 'batch' x 'num_classes'.
actuals: Matrix containing the ground truth labels.
Dimensions are 'batch' x 'num_classes'.
top_k: How many predictions to use per video.
Returns:
float: The global average precision.
"""
gap_calculator = ap_calculator.AveragePrecisionCalculator()
sparse_predictions, sparse_labels, num_positives = top_k_by_class(
predictions, actuals, top_k)
gap_calculator.accumulate(
flatten(sparse_predictions), flatten(sparse_labels), sum(num_positives))
return gap_calculator.peek_ap_at_n()
def top_k_by_class(predictions, labels, k=20):
"""Extracts the top k predictions for each video, sorted by class.
Args:
predictions: A numpy matrix containing the outputs of the model.
Dimensions are 'batch' x 'num_classes'.
labels: A numpy matrix containing the ground truth labels.
Dimensions are 'batch' x 'num_classes'.
k: the top k non-zero entries to preserve in each prediction.
Returns:
A tuple (predictions,labels, true_positives). 'predictions' and 'labels'
are lists of lists of floats. 'true_positives' is a list of scalars. The
length of the lists are equal to the number of classes. The entries in the
predictions variable are probability predictions, and
the corresponding entries in the labels variable are the ground truth for
those predictions. The entries in 'true_positives' are the number of true
positives for each class in the ground truth.
Raises:
ValueError: An error occurred when the k is not a positive integer.
"""
if k <= 0:
raise ValueError("k must be a positive integer.")
k = min(k, predictions.shape[1])
num_classes = predictions.shape[1]
prediction_triplets = []
for video_index in range(predictions.shape[0]):
prediction_triplets.extend(
top_k_triplets(predictions[video_index], labels[video_index], k))
out_predictions = [[] for v in range(num_classes)]
out_labels = [[] for v in range(num_classes)]
for triplet in prediction_triplets:
out_predictions[triplet[0]].append(triplet[1])
out_labels[triplet[0]].append(triplet[2])
out_true_positives = [numpy.sum(labels[:, i]) for i in range(num_classes)]
return out_predictions, out_labels, out_true_positives
def top_k_triplets(predictions, labels, k=20):
"""Get the top_k for a 1-d numpy array. Returns a sparse list of tuples in
(prediction, class) format"""
m = len(predictions)
k = min(k, m)
indices = numpy.argpartition(predictions, -k)[-k:]
return [(index, predictions[index], labels[index]) for index in indices]
class EvaluationMetrics(object):
"""A class to store the evaluation metrics."""
def __init__(self, num_class, top_k):
"""Construct an EvaluationMetrics object to store the evaluation metrics.
Args:
num_class: A positive integer specifying the number of classes.
top_k: A positive integer specifying how many predictions are considered per video.
Raises:
ValueError: An error occurred when MeanAveragePrecisionCalculator cannot
be constructed.
"""
self.sum_hit_at_one = 0.0
self.sum_perr = 0.0
self.sum_loss = 0.0
self.map_calculator = map_calculator.MeanAveragePrecisionCalculator(
num_class)
self.global_ap_calculator = ap_calculator.AveragePrecisionCalculator()
self.top_k = top_k
self.num_examples = 0
def accumulate(self, loss, predictions, labels):
"""Accumulate the metrics calculated locally for this mini-batch.
Args:
predictions: A numpy matrix containing the outputs of the model.
Dimensions are 'batch' x 'num_classes'.
labels: A numpy matrix containing the ground truth labels.
Dimensions are 'batch' x 'num_classes'.
loss: A numpy array containing the loss for each sample.
Returns:
dictionary: A dictionary storing the metrics for the mini-batch.
Raises:
ValueError: An error occurred when the shape of predictions and actuals
does not match.
"""
batch_size = labels.shape[0]
mean_hit_at_one = calculate_hit_at_one(predictions, labels)
mean_perr = calculate_precision_at_equal_recall_rate(predictions,
labels)
mean_loss = numpy.mean(loss)
# Take the top 20 predictions.
sparse_predictions, sparse_labels, num_positives = top_k_by_class(
predictions, labels, self.top_k)
self.map_calculator.accumulate(sparse_predictions, sparse_labels,
num_positives)
self.global_ap_calculator.accumulate(
flatten(sparse_predictions),
flatten(sparse_labels), sum(num_positives))
self.num_examples += batch_size
self.sum_hit_at_one += mean_hit_at_one * batch_size
self.sum_perr += mean_perr * batch_size
self.sum_loss += mean_loss * batch_size
return {
"hit_at_one": mean_hit_at_one,
"perr": mean_perr,
"loss": mean_loss
}
def get(self):
"""Calculate the evaluation metrics for the whole epoch.
Raises:
ValueError: If no examples were accumulated.
Returns:
dictionary: a dictionary storing the evaluation metrics for the epoch. The
dictionary has the fields: avg_hit_at_one, avg_perr, avg_loss, and
aps (default nan).
"""
if self.num_examples <= 0:
raise ValueError("total_sample must be positive.")
avg_hit_at_one = self.sum_hit_at_one / self.num_examples
avg_perr = self.sum_perr / self.num_examples
avg_loss = self.sum_loss / self.num_examples
aps = self.map_calculator.peek_map_at_n()
gap = self.global_ap_calculator.peek_ap_at_n()
return {
"avg_hit_at_one": avg_hit_at_one,
"avg_perr": avg_perr,
"avg_loss": avg_loss,
"aps": aps,
"gap": gap
}
def clear(self):
"""Clear the evaluation metrics and reset the EvaluationMetrics object."""
self.sum_hit_at_one = 0.0
self.sum_perr = 0.0
self.sum_loss = 0.0
self.map_calculator.clear()
self.global_ap_calculator.clear()
self.num_examples = 0
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Calculate the mean average precision.
It provides an interface for calculating mean average precision
for an entire list or the top-n ranked items.
Example usages:
We first call the function accumulate many times to process parts of the ranked
list. After processing all the parts, we call peek_map_at_n
to calculate the mean average precision.
```
import random
import numpy as np
p = np.array([[random.random() for _ in range(50)] for _ in range(1000)])
a = np.array([[random.choice([0, 1]) for _ in range(50)]
for _ in range(1000)])
# mean average precision for 50 classes.
calculator = mean_average_precision_calculator.MeanAveragePrecisionCalculator(
num_class=50)
calculator.accumulate(p, a)
aps = calculator.peek_map_at_n()
```
"""
import numpy
from . import average_precision_calculator
class MeanAveragePrecisionCalculator(object):
"""This class is to calculate mean average precision.
"""
def __init__(self, num_class):
"""Construct a calculator to calculate the (macro) average precision.
Args:
num_class: A positive Integer specifying the number of classes.
Raises:
ValueError: An error occurred when num_class is not an integer greater
than 1.
"""
if not isinstance(num_class, int) or num_class <= 1:
raise ValueError("num_class must be a positive integer.")
self._ap_calculators = [] # member of AveragePrecisionCalculator
self._num_class = num_class # total number of classes
for i in range(num_class):
self._ap_calculators.append(
average_precision_calculator.AveragePrecisionCalculator())
def accumulate(self, predictions, actuals, num_positives=None):
"""Accumulate the predictions and their ground truth labels.
Args:
predictions: A list of lists storing the prediction scores. The outer
dimension corresponds to classes.
actuals: A list of lists storing the ground truth labels. The dimensions
should correspond to the predictions input. Any value
larger than 0 will be treated as positives, otherwise as negatives.
num_positives: If provided, it is a list of numbers representing the
number of true positives for each class. If not provided, the number of
true positives will be inferred from the 'actuals' array.
Raises:
ValueError: An error occurred when the shape of predictions and actuals
does not match.
"""
if not num_positives:
num_positives = [None for _ in range(len(predictions))]
calculators = self._ap_calculators
for i in range(len(predictions)):
calculators[i].accumulate(predictions[i], actuals[i],
num_positives[i])
def clear(self):
for calculator in self._ap_calculators:
calculator.clear()
def is_empty(self):
return ([calculator.heap_size for calculator in self._ap_calculators] ==
[0 for _ in range(self._num_class)])
def peek_map_at_n(self):
"""Peek the non-interpolated mean average precision at n.
Returns:
An array of non-interpolated average precision at n (default 0) for each
class.
"""
aps = [
self._ap_calculators[i].peek_ap_at_n()
for i in range(self._num_class)
]
return aps
from .model import regist_model, get_model
from .attention_cluster import AttentionCluster
from .nextvlad import NEXTVLAD
from .tsn import TSN
from .stnet import STNET
from .attention_lstm import AttentionLSTM
# regist models
regist_model("AttentionCluster", AttentionCluster)
regist_model("NEXTVLAD", NEXTVLAD)
regist_model("TSN", TSN)
regist_model("STNET", STNET)
regist_model("AttentionLSTM", AttentionLSTM)
# Attention Cluster Video Classification Model
---
## Table of Contents
- [Model Overview](#model-overview)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [References](#references)
## Model Overview
The Attention Cluster model was the best sequence model in the ActivityNet Kinetics Challenge 2017. It processes pre-extracted RGB, Flow and Audio features with attention clusters equipped with a shifting operation; the structure is shown below.
<p align="center">
<img src="../../images/attention_cluster.png" height=300 width=400 hspace='10'/> <br />
Multimodal Attention Cluster with Shifting Operation
</p>
For details, please refer to [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550).
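
To make the shifting operation concrete before the Fluid implementation further below (`models/attention_cluster/shifting_attention.py`), here is a minimal NumPy sketch of the per-cluster pooling: each cluster computes softmax attention weights over the frames, takes the weighted sum, scales and shifts it with learnable `w`/`b`, L2-normalizes, and divides by sqrt(C) before concatenation. Shapes and names here are illustrative assumptions, not the exact Fluid code.

```python
import numpy as np

def shifting_attention(x, scores, w, b):
    """One multi-cluster shifting-attention block (NumPy sketch)."""
    num_clusters = scores.shape[0]
    # softmax over the frame axis: per-cluster weights sum to 1
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)            # [C, L]
    outs = []
    for k in range(num_clusters):
        agg = (weights[k][:, None] * x).sum(axis=0)       # weighted sum, [F]
        shifted = agg * w[k] + b[k]                       # shifting operation
        shifted /= np.linalg.norm(shifted) + 1e-12        # L2 normalization
        outs.append(shifted / np.sqrt(num_clusters))      # scale by 1/sqrt(C)
    return np.concatenate(outs)                           # [C * F]

# toy input: 100 frames of 8-d features, 4 attention clusters
x = np.random.rand(100, 8)
scores = np.random.rand(4, 100)
print(shifting_attention(x, scores, w=np.ones(4), b=np.zeros(4)).shape)  # (32,)
```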
## Data Preparation
The Attention Cluster model uses the 2nd-YouTube-8M dataset. For downloading and preparing the data, please refer to the [data documentation](../../dataset/README.md).
## Training
Once the data is ready, training can be launched in either of the following two ways:

```bash
python train.py --model-name=AttentionCluster \
                --config=./configs/attention_cluster.txt \
                --save-dir=checkpoints \
                --log-interval=10 \
                --valid-interval=1

bash scripts/train/train_attention_cluster.sh
```

- You can download the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) and pass the path of the saved weights via `--resume` for finetuning and further development.

**Data reader:** the model reads the pre-extracted `rgb` and `audio` features of the YouTube-8M dataset and uniformly samples 100 frames per video; this value is set by the `seg_num` parameter in the config file (see the sampling sketch after the training notes below).

**Model configuration:** the main configurable parameters are `cluster_nums` and `seg_num`. With `cluster_nums` set to 32 and `seg_num` set to 100, a single Nvidia Tesla P40 card can run `batch_size=256`.

**Training strategy:**
* Adam optimizer with an initial learning\_rate of 0.001.
* No weight decay is used during training.
* Parameters are mainly initialized with MSRA initialization.
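
As a minimal illustration of the uniform sampling mentioned in the data reader note above, the hypothetical helper below spreads `seg_num` frame indices evenly over a video of `nframes` frames; the function name and the handling of short videos are illustrative assumptions, and the actual reader in `datareader.py` may differ in details.

```python
import numpy as np

def uniform_sample_indices(nframes, seg_num=100):
    """Pick seg_num frame indices spread evenly over [0, nframes)."""
    if nframes >= seg_num:
        return np.linspace(0, nframes - 1, seg_num).astype(int)
    # hypothetical fallback for short videos: wrap around
    return np.arange(seg_num) % nframes

print(uniform_sample_indices(300)[:5])  # [0 3 6 9 12]
```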
## Evaluation
The model can be evaluated in either of the following two ways:

```bash
python test.py --model-name=AttentionCluster \
               --config=configs/attention_cluster.txt \
               --log-interval=1 \
               --weights=$PATH_TO_WEIGHTS

bash scripts/test/test_attention_cluster.sh
```

- When evaluating with `scripts/test/test_attention_cluster.sh`, modify the `--weights` argument in the script to point to the weights to be evaluated.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) and evaluates it.
With the following parameter settings:

| Parameter | Value |
| :---------: | :----: |
| cluster\_nums | 32 |
| seg\_num | 100 |
| batch\_size | 2048 |
| nums\_gpu | 7 |

the evaluation accuracy on the 2nd-YouTube-8M dataset is:

| Metric | Accuracy |
| :---------: | :----: |
| Hit@1 | 0.87 |
| PERR | 0.78 |
| GAP | 0.84 |
## Inference
Run inference with the following command:

```bash
python infer.py --model-name=attention_cluster \
                --config=configs/attention_cluster.txt \
                --log-interval=1 \
                --weights=$PATH_TO_WEIGHTS \
                --filelist=$FILELIST
```

- Inference results are stored in `AttentionCluster_infer_result` in `pickle` format.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) and runs inference with it.
## References
- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
from __future__ import absolute_import
from .attention_cluster import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
from ..model import ModelBase
from .shifting_attention import ShiftingAttentionModel
from .logistic_model import LogisticModel
__all__ = ["AttentionCluster"]
class AttentionCluster(ModelBase):
def __init__(self, name, cfg, mode='train'):
super(AttentionCluster, self).__init__(name, cfg, mode)
self.get_config()
def get_config(self):
# get model configs
self.feature_num = self.cfg.MODEL.feature_num
self.feature_names = self.cfg.MODEL.feature_names
self.feature_dims = self.cfg.MODEL.feature_dims
self.cluster_nums = self.cfg.MODEL.cluster_nums
self.seg_num = self.cfg.MODEL.seg_num
self.class_num = self.cfg.MODEL.num_classes
self.drop_rate = self.cfg.MODEL.drop_rate
if self.mode == 'train':
self.learning_rate = self.get_config_from_sec('train',
'learning_rate', 1e-3)
def build_input(self, use_pyreader):
if use_pyreader:
assert self.mode != 'infer', \
'pyreader is not recommended for inference, please set use_pyreader to False.'
shapes = []
for dim in self.feature_dims:
shapes.append([-1, self.seg_num, dim])
shapes.append([-1, self.class_num]) # label
self.py_reader = fluid.layers.py_reader(
capacity=1024,
shapes=shapes,
lod_levels=[0] * (self.feature_num + 1),
dtypes=['float32'] * (self.feature_num + 1),
name='train_py_reader'
if self.is_training else 'test_py_reader',
use_double_buffer=True)
inputs = fluid.layers.read_file(self.py_reader)
self.feature_input = inputs[:self.feature_num]
self.label_input = inputs[-1]
else:
self.feature_input = []
for name, dim in zip(self.feature_names, self.feature_dims):
self.feature_input.append(
fluid.layers.data(
shape=[self.seg_num, dim], dtype='float32', name=name))
if self.mode == 'infer':
self.label_input = None
else:
self.label_input = fluid.layers.data(
shape=[self.class_num], dtype='float32', name='label')
def build_model(self):
att_outs = []
for i, (input_dim, cluster_num, feature) in enumerate(
zip(self.feature_dims, self.cluster_nums, self.feature_input)):
att = ShiftingAttentionModel(input_dim, self.seg_num, cluster_num,
"satt{}".format(i))
att_out = att.forward(feature)
att_outs.append(att_out)
out = fluid.layers.concat(att_outs, axis=1)
if self.drop_rate > 0.:
out = fluid.layers.dropout(
out, self.drop_rate, is_test=(not self.is_training))
fc1 = fluid.layers.fc(
out,
size=1024,
act='tanh',
param_attr=ParamAttr(
name="fc1.weights",
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=ParamAttr(
name="fc1.bias", initializer=fluid.initializer.MSRA()))
fc2 = fluid.layers.fc(
fc1,
size=4096,
act='tanh',
param_attr=ParamAttr(
name="fc2.weights",
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=ParamAttr(
name="fc2.bias", initializer=fluid.initializer.MSRA()))
aggregate_model = LogisticModel()
self.output, self.logit = aggregate_model.build_model(
model_input=fc2,
vocab_size=self.class_num,
is_training=self.is_training)
def optimizer(self):
assert self.mode == 'train', "optimizer can only be obtained in train mode"
return fluid.optimizer.AdamOptimizer(self.learning_rate)
def loss(self):
assert self.mode != 'infer', "loss cannot be calculated in infer mode"
cost = fluid.layers.sigmoid_cross_entropy_with_logits(
x=self.logit, label=self.label_input)
cost = fluid.layers.reduce_sum(cost, dim=-1)
self.loss_ = fluid.layers.mean(x=cost)
return self.loss_
def outputs(self):
return [self.output, self.logit]
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def weights_info(self):
return (
"attention_cluster_youtube8m",
"https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz"
)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle
import paddle.fluid as fluid
class LogisticModel(object):
"""Logistic model."""
def build_model(self,
model_input,
vocab_size,
**unused_params):
"""Creates a logistic model.
Args:
model_input: 'batch' x 'num_features' matrix of input features.
vocab_size: The number of classes in the dataset.
Returns:
A tuple (output, logit), where 'output' holds the sigmoid probability
predictions of the model with dimensions batch_size x num_classes and
'logit' holds the pre-sigmoid activations."""
logit = fluid.layers.fc(
input=model_input,
size=vocab_size,
act=None,
name='logits_clf',
param_attr=fluid.ParamAttr(
name='logistic.weights',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=fluid.ParamAttr(
name='logistic.bias',
initializer=fluid.initializer.MSRA(uniform=False)))
output = fluid.layers.sigmoid(logit)
return output, logit
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
import numpy as np
class ShiftingAttentionModel(object):
"""Shifting Attention Model"""
def __init__(self, input_dim, seg_num, n_att, name):
self.n_att = n_att
self.input_dim = input_dim
self.seg_num = seg_num
self.name = name
self.gnorm = np.sqrt(n_att)
def softmax_m1(self, x):
x_shape = fluid.layers.shape(x)
x_shape.stop_gradient = True
flat_x = fluid.layers.reshape(x, shape=(-1, self.seg_num))
flat_softmax = fluid.layers.softmax(flat_x)
return fluid.layers.reshape(
flat_softmax, shape=x.shape, actual_shape=x_shape)
def glorot(self, n):
return np.sqrt(1.0 / np.sqrt(n))
def forward(self, x):
"""Forward shifting attention model.
Args:
x: input features in shape of [N, L, F].
Returns:
out: output features in shape of [N, F * C]
"""
trans_x = fluid.layers.transpose(x, perm=[0, 2, 1])
# scores and weight in shape [N, C, L], sum(weights, -1) = 1
trans_x = fluid.layers.unsqueeze(trans_x, [-1])
scores = fluid.layers.conv2d(
trans_x,
self.n_att,
filter_size=1,
param_attr=ParamAttr(
name=self.name + ".conv.weight",
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=ParamAttr(
name=self.name + ".conv.bias",
initializer=fluid.initializer.MSRA()))
scores = fluid.layers.squeeze(scores, [-1])
weights = self.softmax_m1(scores)
glrt = self.glorot(self.n_att)
self.w = fluid.layers.create_parameter(
shape=(self.n_att, ),
dtype=x.dtype,
attr=ParamAttr(self.name + ".shift_w"),
default_initializer=fluid.initializer.Normal(0.0, glrt))
self.b = fluid.layers.create_parameter(
shape=(self.n_att, ),
dtype=x.dtype,
attr=ParamAttr(name=self.name + ".shift_b"),
default_initializer=fluid.initializer.Normal(0.0, glrt))
outs = []
for i in range(self.n_att):
# slice weight and expand to shape [N, L, C]
weight = fluid.layers.slice(
weights, axes=[1], starts=[i], ends=[i + 1])
weight = fluid.layers.transpose(weight, perm=[0, 2, 1])
weight = fluid.layers.expand(weight, [1, 1, self.input_dim])
w_i = fluid.layers.slice(self.w, axes=[0], starts=[i], ends=[i + 1])
b_i = fluid.layers.slice(self.b, axes=[0], starts=[i], ends=[i + 1])
shift = fluid.layers.reduce_sum(x * weight, dim=1) * w_i + b_i
l2_norm = fluid.layers.l2_normalize(shift, axis=-1)
outs.append(l2_norm / self.gnorm)
out = fluid.layers.concat(outs, axis=1)
return out
# AttentionLSTM Video Classification Model
---
## Table of Contents
- [Model Overview](#model-overview)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [References](#references)
## Model Overview
Recurrent neural networks (RNNs) are widely used for sequence data and can model the temporal information across consecutive video frames, making them a common baseline for video classification. This model encodes all frame features of a video with a bidirectional long short-term memory network (LSTM). Unlike the traditional approach of directly taking the LSTM output at the last timestep, it adds an attention layer: the hidden state at every timestep receives an adaptive weight, and the final feature vector is the weighted linear combination of these states. The original paper implements a two-layer LSTM, whereas this code implements a bidirectional LSTM with attention; for the attention layer, see [AttentionCluster](https://arxiv.org/abs/1711.09550).
For details, please refer to [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909).
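
A minimal NumPy sketch of the attention pooling described above: each timestep's hidden state gets one scalar score, the scores are softmax-normalized into adaptive weights, and the states are linearly combined. The scoring vector `v` and all shapes are illustrative assumptions; the Fluid implementation lives in `models/attention_lstm/lstm_attention.py`.

```python
import numpy as np

def attention_pool(hidden_states, v):
    """Attention pooling over LSTM hidden states (NumPy sketch)."""
    scores = hidden_states @ v                 # one scalar score per step, [T]
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                        # adaptive weights, sum to 1
    return (alpha[:, None] * hidden_states).sum(axis=0)  # weighted sum, [D]

h = np.random.rand(50, 16)  # e.g. 50 timesteps of 16-d bi-LSTM outputs
v = np.random.rand(16)      # learned scoring vector (illustrative)
print(attention_pool(h, v).shape)  # (16,)
```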
## Data Preparation
The AttentionLSTM model uses the 2nd-YouTube-8M dataset. For the data, please refer to the [data documentation](../../dataset/README.md).
## Training
### Training from random initialization
Once the data is ready, training can be launched in either of the following two ways:

```bash
python train.py --model-name=AttentionLSTM \
                --config=./configs/attention_lstm.txt \
                --save-dir=checkpoints \
                --log-interval=10 \
                --valid-interval=1

bash scripts/train/train_attention_lstm.sh
```

- The AttentionLSTM model is trained on 8 Nvidia Tesla P40 cards with a total batch size of 1024.
### Finetuning from a pretrained model
Please download the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) first, then add `--resume` with the path of the saved pretrained model to the script above.
## Evaluation
The model can be evaluated in either of the following two ways:

```bash
python test.py --model-name=AttentionLSTM \
               --config=configs/attention_lstm.txt \
               --log-interval=1 \
               --weights=$PATH_TO_WEIGHTS

bash scripts/test/test_attention_lstm.sh
```

- When evaluating with `scripts/test/test_attention_lstm.sh`, modify the `--weights` argument in the script to point to the weights to be evaluated.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) and evaluates it.
The model parameters are listed below:

| Parameter | Value |
| :---------: | :----: |
| embedding\_size | 512 |
| lstm\_size | 1024 |
| drop\_rate | 0.5 |

The evaluation metrics are:

| Metric | Accuracy |
| :---------: | :----: |
| Hit@1 | 0.8885 |
| PERR | 0.8012 |
| GAP | 0.8594 |
## Inference
Run inference with the following command:

```bash
python infer.py --model-name=attention_lstm \
                --config=configs/attention_lstm.txt \
                --log-interval=1 \
                --weights=$PATH_TO_WEIGHTS \
                --filelist=$FILELIST
```

- Inference results are stored in `AttentionLSTM_infer_result` in `pickle` format.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) and runs inference with it.
## References
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909), Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
from ..model import ModelBase
from .lstm_attention import LSTMAttentionModel
__all__ = ["AttentionLSTM"]
class AttentionLSTM(ModelBase):
def __init__(self, name, cfg, mode='train'):
super(AttentionLSTM, self).__init__(name, cfg, mode)
self.get_config()
def get_config(self):
# get model configs
self.feature_num = self.cfg.MODEL.feature_num
self.feature_names = self.cfg.MODEL.feature_names
self.feature_dims = self.cfg.MODEL.feature_dims
self.num_classes = self.cfg.MODEL.num_classes
self.embedding_size = self.cfg.MODEL.embedding_size
self.lstm_size = self.cfg.MODEL.lstm_size
self.drop_rate = self.cfg.MODEL.drop_rate
# get mode configs
self.batch_size = self.get_config_from_sec(self.mode, 'batch_size', 1)
self.num_gpus = self.get_config_from_sec(self.mode, 'num_gpus', 1)
if self.mode == 'train':
self.learning_rate = self.get_config_from_sec('train',
'learning_rate', 1e-3)
self.weight_decay = self.get_config_from_sec('train',
'weight_decay', 8e-4)
self.num_samples = self.get_config_from_sec('train', 'num_samples',
5000000)
self.decay_epochs = self.get_config_from_sec('train',
'decay_epochs', [5])
self.decay_gamma = self.get_config_from_sec('train', 'decay_gamma',
0.1)
def build_input(self, use_pyreader):
if use_pyreader:
            assert self.mode != 'infer', \
                'pyreader is not recommended for inference, please set use_pyreader to False.'
shapes = []
for dim in self.feature_dims:
shapes.append([-1, dim])
shapes.append([-1, self.num_classes]) # label
self.py_reader = fluid.layers.py_reader(
capacity=1024,
shapes=shapes,
lod_levels=[1] * self.feature_num + [0],
dtypes=['float32'] * (self.feature_num + 1),
name='train_py_reader'
if self.is_training else 'test_py_reader',
use_double_buffer=True)
inputs = fluid.layers.read_file(self.py_reader)
self.feature_input = inputs[:self.feature_num]
self.label_input = inputs[-1]
else:
self.feature_input = []
for name, dim in zip(self.feature_names, self.feature_dims):
self.feature_input.append(
fluid.layers.data(
shape=[dim], lod_level=1, dtype='float32', name=name))
if self.mode == 'infer':
self.label_input = None
else:
self.label_input = fluid.layers.data(
shape=[self.num_classes], dtype='float32', name='label')
def build_model(self):
att_outs = []
for i, (input_dim, feature
) in enumerate(zip(self.feature_dims, self.feature_input)):
att = LSTMAttentionModel(input_dim, self.embedding_size,
self.lstm_size, self.drop_rate)
att_out = att.forward(feature, is_training=(self.mode == 'train'))
att_outs.append(att_out)
out = fluid.layers.concat(att_outs, axis=1)
fc1 = fluid.layers.fc(
input=out,
size=8192,
act='relu',
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
fc2 = fluid.layers.fc(
input=fc1,
size=4096,
act='tanh',
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
self.logit = fluid.layers.fc(input=fc2, size=self.num_classes, act=None, \
bias_attr=ParamAttr(regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
self.output = fluid.layers.sigmoid(self.logit)
def optimizer(self):
        assert self.mode == 'train', "optimizer can only be obtained in train mode"
values = [
self.learning_rate * (self.decay_gamma**i)
for i in range(len(self.decay_epochs) + 1)
]
        iter_per_epoch = self.num_samples // self.batch_size  # integer steps per epoch
boundaries = [e * iter_per_epoch for e in self.decay_epochs]
return fluid.optimizer.RMSProp(
learning_rate=fluid.layers.piecewise_decay(
values=values, boundaries=boundaries),
centered=True,
regularization=fluid.regularizer.L2Decay(self.weight_decay))
def loss(self):
        assert self.mode != 'infer', "loss cannot be calculated in infer mode"
cost = fluid.layers.sigmoid_cross_entropy_with_logits(
x=self.logit, label=self.label_input)
cost = fluid.layers.reduce_sum(cost, dim=-1)
sum_cost = fluid.layers.reduce_sum(cost)
self.loss_ = fluid.layers.scale(
sum_cost, scale=self.num_gpus, bias_after_scale=False)
return self.loss_
def outputs(self):
return [self.output, self.logit]
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def weights_info(self):
return ('attention_lstm_youtube8m',
'https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
import numpy as np
class LSTMAttentionModel(object):
"""LSTM Attention Model"""
def __init__(self,
                 input_dim,  # input feature dimension (unused; the fc layers below infer it)
embedding_size=512,
lstm_size=1024,
drop_rate=0.5):
self.lstm_size = lstm_size
self.embedding_size = embedding_size
self.drop_rate = drop_rate
def forward(self, input, is_training):
input_fc = fluid.layers.fc(
input=input,
size=self.embedding_size,
act='tanh',
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
lstm_forward_fc = fluid.layers.fc(
input=input_fc,
size=self.lstm_size * 4,
act=None,
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
lstm_forward, _ = fluid.layers.dynamic_lstm(
input=lstm_forward_fc, size=self.lstm_size * 4, is_reverse=False)
        lstm_backward_fc = fluid.layers.fc(
input=input_fc,
size=self.lstm_size * 4,
act=None,
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
lstm_backward, _ = fluid.layers.dynamic_lstm(
            input=lstm_backward_fc, size=self.lstm_size * 4, is_reverse=True)
lstm_concat = fluid.layers.concat(
input=[lstm_forward, lstm_backward], axis=1)
lstm_dropout = fluid.layers.dropout(
x=lstm_concat, dropout_prob=self.drop_rate, is_test=(not is_training))
lstm_weight = fluid.layers.fc(
input=lstm_dropout,
size=1,
act='sequence_softmax',
bias_attr=ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)))
scaled = fluid.layers.elementwise_mul(
x=lstm_dropout, y=lstm_weight, axis=0)
lstm_pool = fluid.layers.sequence_pool(input=scaled, pool_type='sum')
return lstm_pool
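# A minimal usage sketch (hypothetical names and shapes), kept as a comment
# so the module stays import-only: feed a variable-length sequence of
# 1024-d frame features through the bi-LSTM attention model.
#
#     feat = fluid.layers.data(
#         name='rgb', shape=[1024], dtype='float32', lod_level=1)
#     att = LSTMAttentionModel(1024, embedding_size=512,
#                              lstm_size=1024, drop_rate=0.5)
#     out = att.forward(feat, is_training=False)  # [batch, 2 * lstm_size]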
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import logging
try:
from configparser import ConfigParser
except ImportError:
from ConfigParser import ConfigParser
import paddle.fluid as fluid
from datareader import get_reader
from metrics import get_metrics
from .utils import download, AttrDict
WEIGHT_DIR = os.path.expanduser("~/.paddle/weights")
logger = logging.getLogger(__name__)
class NotImplementError(Exception):
"Error: model function not implement"
def __init__(self, model, function):
super(NotImplementError, self).__init__()
self.model = model.__class__.__name__
self.function = function.__name__
def __str__(self):
return "Function {}() is not implemented in model {}".format(
self.function, self.model)
class ModelNotFoundError(Exception):
"Error: model not found"
def __init__(self, model_name, avail_models):
super(ModelNotFoundError, self).__init__()
self.model_name = model_name
self.avail_models = avail_models
def __str__(self):
msg = "Model {} Not Found.\nAvailiable models:\n".format(
self.model_name)
for model in self.avail_models:
msg += " {}\n".format(model)
return msg
class ModelBase(object):
def __init__(self, name, cfg, mode='train'):
assert mode in ['train', 'valid', 'test', 'infer'], \
"Unknown mode type {}".format(mode)
self.name = name
self.is_training = (mode == 'train')
self.mode = mode
self.py_reader = None
# parse config
# assert os.path.exists(cfg), \
# "Config file {} not exists".format(cfg)
# self._config = ModelConfig(cfg)
# self._config.parse()
# if args and isinstance(args, dict):
# self._config.merge_configs(mode, args)
# self.cfg = self._config.get_configs()
self.cfg = cfg
def build_model(self):
"build model struct"
raise NotImplementError(self, self.build_model)
def build_input(self, use_pyreader):
"build input Variable"
raise NotImplementError(self, self.build_input)
def optimizer(self):
"get model optimizer"
raise NotImplementError(self, self.optimizer)
    def outputs(self):
        "get output variable"
        raise NotImplementError(self, self.outputs)
def loss(self):
"get loss variable"
        raise NotImplementError(self, self.loss)
def feeds(self):
"get feed inputs list"
raise NotImplementError(self, self.feeds)
def weights_info(self):
"get model weight default path and download url"
raise NotImplementError(self, self.weights_info)
def get_weights(self):
"get model weight file path, download weight from Paddle if not exist"
path, url = self.weights_info()
path = os.path.join(WEIGHT_DIR, path)
if os.path.exists(path):
return path
logger.info("Download weights of {} from {}".format(self.name, url))
download(url, path)
return path
def pyreader(self):
return self.py_reader
def epoch_num(self):
"get train epoch num"
return self.cfg.TRAIN.epoch
def pretrain_info(self):
"get pretrain base model directory"
return (None, None)
def get_pretrain_weights(self):
"get model weight file path, download weight from Paddle if not exist"
path, url = self.pretrain_info()
if not path:
return None
path = os.path.join(WEIGHT_DIR, path)
if os.path.exists(path):
return path
logger.info("Download pretrain weights of {} from {}".format(
self.name, url))
download(url, path)
return path
def load_pretrain_params(self, exe, pretrain, prog, place):
logger.info("Load pretrain weights from {}".format(pretrain))
fluid.io.load_params(exe, pretrain, main_program=prog)
def get_config_from_sec(self, sec, item, default=None):
if sec.upper() not in self.cfg:
return default
return self.cfg[sec.upper()].get(item, default)
class ModelZoo(object):
def __init__(self):
self.model_zoo = {}
def regist(self, name, model):
        assert model.__base__ == ModelBase, "Unknown model type {}".format(
type(model))
self.model_zoo[name] = model
def get(self, name, cfg, mode='train'):
for k, v in self.model_zoo.items():
if k == name:
return v(name, cfg, mode)
raise ModelNotFoundError(name, self.model_zoo.keys())
# singleton model_zoo
model_zoo = ModelZoo()
def regist_model(name, model):
model_zoo.regist(name, model)
def get_model(name, cfg, mode='train'):
return model_zoo.get(name, cfg, mode)
# NeXtVLAD Video Classification Model
---
## Table of Contents
- [Model Overview](#model-overview)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Reference Papers](#reference-papers)
## Model Overview
NeXtVLAD was the best-performing single model in the 2nd YouTube-8M video understanding challenge, reaching a GAP above 0.87 with fewer than 80M parameters. The model offers a way to convert and compress frame-level video features into a fixed-size feature vector suitable for classifying large video files. Its key idea, building on NetVLAD, is to first split the high-dimensional features into groups and then aggregate the temporal information with an attention mechanism, which preserves accuracy while using far fewer parameters. For details, see [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014).
This code implements the single-model architecture from the paper, using the 2nd-YouTube-8M train split for training and the val split for testing.
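As a rough illustration of the aggregation, the following numpy sketch (single video, random stand-in weights, no batch norm or gating, and no final normalization; not the fluid implementation included below) expands the frame features, splits them into groups, softly assigns each group to clusters under a per-group attention factor, and accumulates cluster residuals:
```python
import numpy as np

T, D, G, K, expansion = 30, 1024, 8, 64, 2   # frames, feat dim, groups, clusters
rng = np.random.RandomState(0)
x = rng.randn(T, D) @ (rng.randn(D, expansion * D) / np.sqrt(D))   # expansion FC
group_dim = expansion * D // G
attn = 1.0 / (1.0 + np.exp(-x @ (rng.randn(expansion * D, G) / np.sqrt(expansion * D))))
logits = (x @ (rng.randn(expansion * D, G * K) / np.sqrt(expansion * D))).reshape(T, G, K)
assign = np.exp(logits - logits.max(-1, keepdims=True))
assign /= assign.sum(-1, keepdims=True)          # softmax over clusters
assign *= attn[:, :, None]                       # per-group attention
centers = rng.randn(K, group_dim)
grouped = x.reshape(T, G, group_dim)
vlad = np.einsum('tgk,tgd->kd', assign, grouped) - assign.sum((0, 1))[:, None] * centers
video_descriptor = vlad.reshape(-1)              # (K * group_dim,) = (16384,)
```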
## Data Preparation
NeXtVLAD uses the 2nd-YouTube-8M dataset; for downloading and preparing the data, see the [data instructions](../../dataset/README.md).
## Training
### Training from scratch
Run the following script in the video directory:
```bash
bash ./scripts/train/train_nextvlad.sh
```
### Finetuning from a pretrained model
First download the provided pretrained [model](https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz), then add `--resume` to the script above, pointing to the directory where the pretrained model is stored.
Training uses 4 Nvidia Tesla P40 cards with a total batch size of 160.
### Training strategy
* Adam optimizer with an initial learning\_rate of 0.0002
* The learning rate decays by learning\_rate\_decay = 0.8 every 2,000,000 examples
* L2 regularization with l2\_weight\_decay = 1e-5
## Evaluation
Download the released pretrained weights, or use your own trained weights, and set the `--weights` argument in `./scripts/test/test_nextvlad.sh` to the directory holding them. Then run:
```bash
bash ./scripts/test/test_nextvlad.sh
```
Since the test split released by YouTube-8M carries no ground-truth labels, the validation split is used for testing here.
The model parameters are as follows:
| Parameter | Value |
| :---------: | :----: |
| cluster\_size | 128 |
| hidden\_size | 2048 |
| groups | 8 |
| expansion | 2 |
| drop\_rate | 0.5 |
| gating\_reduction | 8 |
The evaluation metrics are as follows:
| Metric | Value |
| :---------: | :----: |
| Hit@1 | 0.8960 |
| PERR | 0.8132 |
| GAP | 0.8709 |
## Inference
Download the released pretrained weights, or use your own trained weights, set the `--weights` argument in `./scripts/infer/infer_nextvlad.sh` to the directory holding them, and run:
```bash
bash ./scripts/infer/infer_nextvlad.sh
```
Inference results are saved in `NEXTVLAD_infer_result` in `pickle` format.
## Reference Papers
- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan
from __future__ import absolute_import
from .nextvlad import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle
import paddle.fluid as fluid
class LogisticModel(object):
"""Logistic model with L2 regularization."""
def create_model(self,
model_input,
vocab_size,
l2_penalty=None,
**unused_params):
"""Creates a logistic model.
Args:
model_input: 'batch' x 'num_features' matrix of input features.
vocab_size: The number of classes in the dataset.
Returns:
A dictionary with a tensor containing the probability predictions of the
model in the 'predictions' key. The dimensions of the tensor are
batch_size x num_classes."""
logits = fluid.layers.fc(
input=model_input,
size=vocab_size,
act=None,
name='logits_clf',
param_attr=fluid.ParamAttr(
name='logits_clf_weights',
initializer=fluid.initializer.MSRA(uniform=False),
regularizer=fluid.regularizer.L2DecayRegularizer(l2_penalty)),
bias_attr=fluid.ParamAttr(
name='logits_clf_bias',
regularizer=fluid.regularizer.L2DecayRegularizer(l2_penalty)))
output = fluid.layers.sigmoid(logits)
return {'predictions': output, 'logits': logits}
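# A minimal usage sketch (hypothetical shapes; 3862 is the 2nd YouTube-8M
# class count, assumed here only for illustration), kept as a comment:
#
#     feats = fluid.layers.data(name='video_feats', shape=[2048], dtype='float32')
#     out = LogisticModel().create_model(feats, vocab_size=3862, l2_penalty=1e-5)
#     probs, logits = out['predictions'], out['logits']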
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
from ..model import ModelBase
from .clf_model import LogisticModel
from . import nextvlad_model
__all__ = ["NEXTVLAD"]
class NEXTVLAD(ModelBase):
def __init__(self, name, cfg, mode='train'):
super(NEXTVLAD, self).__init__(name, cfg, mode=mode)
self.get_config()
def get_config(self):
# model params
self.num_classes = self.get_config_from_sec('model', 'num_classes')
self.video_feature_size = self.get_config_from_sec('model',
'video_feature_size')
self.audio_feature_size = self.get_config_from_sec('model',
'audio_feature_size')
self.cluster_size = self.get_config_from_sec('model', 'cluster_size')
self.hidden_size = self.get_config_from_sec('model', 'hidden_size')
self.groups = self.get_config_from_sec('model', 'groups')
self.expansion = self.get_config_from_sec('model', 'expansion')
self.drop_rate = self.get_config_from_sec('model', 'drop_rate')
self.gating_reduction = self.get_config_from_sec('model',
'gating_reduction')
self.eigen_file = self.get_config_from_sec('model', 'eigen_file')
# training params
self.base_learning_rate = self.get_config_from_sec('train',
'learning_rate')
self.lr_boundary_examples = self.get_config_from_sec(
'train', 'lr_boundary_examples')
self.max_iter = self.get_config_from_sec('train', 'max_iter')
self.learning_rate_decay = self.get_config_from_sec(
'train', 'learning_rate_decay')
self.l2_penalty = self.get_config_from_sec('train', 'l2_penalty')
self.gradient_clip_norm = self.get_config_from_sec('train',
'gradient_clip_norm')
self.use_gpu = self.get_config_from_sec('train', 'use_gpu')
self.num_gpus = self.get_config_from_sec('train', 'num_gpus')
# other params
self.batch_size = self.get_config_from_sec(self.mode, 'batch_size')
def build_input(self, use_pyreader=True):
rgb_shape = [self.video_feature_size]
audio_shape = [self.audio_feature_size]
label_shape = [self.num_classes]
if use_pyreader:
            assert self.mode != 'infer', \
                'pyreader is not recommended for inference, please set use_pyreader to False.'
py_reader = fluid.layers.py_reader(
capacity=100,
shapes=[[-1] + rgb_shape, [-1] + audio_shape,
[-1] + label_shape],
lod_levels=[1, 1, 0],
dtypes=['float32', 'float32', 'float32'],
name='train_py_reader'
if self.is_training else 'test_py_reader',
use_double_buffer=True)
rgb, audio, label = fluid.layers.read_file(py_reader)
self.py_reader = py_reader
else:
rgb = fluid.layers.data(
name='train_rgb' if self.is_training else 'test_rgb',
shape=rgb_shape,
dtype='float32',
lod_level=1)
audio = fluid.layers.data(
name='train_audio' if self.is_training else 'test_audio',
shape=audio_shape,
dtype='float32',
lod_level=1)
if self.mode == 'infer':
label = None
else:
label = fluid.layers.data(
name='train_label' if self.is_training else 'test_label',
shape=label_shape,
dtype='float32')
self.feature_input = [rgb, audio]
self.label_input = label
def create_model_args(self):
model_args = {}
model_args['class_dim'] = self.num_classes
model_args['cluster_size'] = self.cluster_size
model_args['hidden_size'] = self.hidden_size
model_args['groups'] = self.groups
model_args['expansion'] = self.expansion
model_args['drop_rate'] = self.drop_rate
model_args['gating_reduction'] = self.gating_reduction
model_args['l2_penalty'] = self.l2_penalty
return model_args
def build_model(self):
model_args = self.create_model_args()
videomodel = nextvlad_model.NeXtVLADModel()
rgb = self.feature_input[0]
audio = self.feature_input[1]
out = videomodel.create_model(
rgb, audio, is_training=(self.mode == 'train'), **model_args)
self.logits = out['logits']
self.predictions = out['predictions']
self.network_outputs = [out['predictions']]
def optimizer(self):
        assert self.mode == 'train', "optimizer can only be obtained in train mode"
im_per_batch = self.batch_size
lr_bounds, lr_values = get_learning_rate_decay_list(
self.base_learning_rate, self.learning_rate_decay, self.max_iter,
self.lr_boundary_examples, im_per_batch)
return fluid.optimizer.AdamOptimizer(
learning_rate=fluid.layers.piecewise_decay(
boundaries=lr_bounds, values=lr_values))
def loss(self):
        assert self.mode != 'infer', "loss cannot be calculated in infer mode"
cost = fluid.layers.sigmoid_cross_entropy_with_logits(
x=self.logits, label=self.label_input)
cost = fluid.layers.reduce_sum(cost, dim=-1)
self.loss_ = fluid.layers.mean(x=cost)
return self.loss_
def outputs(self):
return self.network_outputs
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def weights_info(self):
return ('nextvlad_youtube8m',
'https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz')
def get_learning_rate_decay_list(base_learning_rate, decay, max_iter,
decay_examples, total_batch_size):
decay_step = decay_examples // total_batch_size
lr_bounds = []
lr_values = [base_learning_rate]
i = 1
while True:
if i * decay_step >= max_iter:
break
lr_bounds.append(i * decay_step)
lr_values.append(base_learning_rate * (decay**i))
i += 1
return lr_bounds, lr_values
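# Worked example of the schedule above (max_iter is assumed here; the other
# numbers are the README defaults: base lr 0.0002, decay 0.8, decay every
# 2,000,000 examples, total batch size 160):
#
#     bounds, values = get_learning_rate_decay_list(
#         2e-4, 0.8, max_iter=50000, decay_examples=2000000,
#         total_batch_size=160)
#     # decay_step = 2000000 // 160 = 12500
#     # bounds -> [12500, 25000, 37500]
#     # values -> [2e-4, 1.6e-4, 1.28e-4, 1.024e-4]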
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import numpy as np
import paddle
import paddle.fluid as fluid
from . import clf_model
class NeXtVLAD(object):
"""
This is a paddlepaddle implementation of the NeXtVLAD model. For more
information, please refer to the paper,
https://static.googleusercontent.com/media/research.google.com/zh-CN//youtube8m/workshop2018/p_c03.pdf
"""
def __init__(self,
feature_size,
cluster_size,
is_training=True,
expansion=2,
groups=None,
inputname='video'):
self.feature_size = feature_size
self.cluster_size = cluster_size
self.is_training = is_training
self.expansion = expansion
self.groups = groups
self.name = inputname + '_'
def forward(self, input):
input = fluid.layers.fc(
input=input,
size=self.expansion * self.feature_size,
act=None,
name=self.name + 'fc_expansion',
param_attr=fluid.ParamAttr(
name=self.name + 'fc_expansion_w',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=fluid.ParamAttr(
name=self.name + 'fc_expansion_b',
initializer=fluid.initializer.Constant(value=0.)))
        # attention factor for each group
attention = fluid.layers.fc(
input=input,
size=self.groups,
act='sigmoid',
name=self.name + 'fc_group_attention',
param_attr=fluid.ParamAttr(
name=self.name + 'fc_group_attention_w',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=fluid.ParamAttr(
name=self.name + 'fc_group_attention_b',
initializer=fluid.initializer.Constant(value=0.)))
        # calculate the activation factor for each group and each cluster
feature_size = self.feature_size * self.expansion // self.groups
cluster_weights = fluid.layers.create_parameter(
shape=[
self.expansion * self.feature_size,
self.groups * self.cluster_size
],
dtype=input.dtype,
attr=fluid.ParamAttr(name=self.name + 'cluster_weights'),
default_initializer=fluid.initializer.MSRA(uniform=False))
activation = fluid.layers.matmul(input, cluster_weights)
activation = fluid.layers.batch_norm(
activation, is_test=(not self.is_training))
# reshape of activation
activation = fluid.layers.reshape(activation,
[-1, self.groups, self.cluster_size])
# softmax on per cluster
activation = fluid.layers.softmax(activation)
activation = fluid.layers.elementwise_mul(activation, attention, axis=0)
a_sum = fluid.layers.sequence_pool(activation, 'sum')
a_sum = fluid.layers.reduce_sum(a_sum, dim=1)
# create cluster_weights2
cluster_weights2 = fluid.layers.create_parameter(
shape=[self.cluster_size, feature_size],
dtype=input.dtype,
attr=fluid.ParamAttr(name=self.name + 'cluster_weights2'),
default_initializer=fluid.initializer.MSRA(uniform=False))
# expand a_sum dimension from [-1, self.cluster_size] to be [-1, self.cluster_size, feature_size]
a_sum = fluid.layers.reshape(a_sum, [-1, self.cluster_size, 1])
a_sum = fluid.layers.expand(a_sum, [1, 1, feature_size])
# element wise multiply a_sum and cluster_weights2
a = fluid.layers.elementwise_mul(
a_sum, cluster_weights2,
axis=1) # output shape [-1, self.cluster_size, feature_size]
# transpose activation from [-1, self.groups, self.cluster_size] to [-1, self.cluster_size, self.groups]
activation2 = fluid.layers.transpose(activation, perm=[0, 2, 1])
        # the transpose op clears the LoD information, so it must be reset
activation = fluid.layers.lod_reset(activation2, activation)
# reshape input from [-1, self.expansion * self.feature_size] to [-1, self.groups, feature_size]
reshaped_input = fluid.layers.reshape(input,
[-1, self.groups, feature_size])
# mat multiply activation and reshaped_input
vlad = fluid.layers.matmul(
activation,
reshaped_input) # output shape [-1, self.cluster_size, feature_size]
vlad = fluid.layers.sequence_pool(vlad, 'sum')
vlad = fluid.layers.elementwise_sub(vlad, a)
# l2_normalization
vlad = fluid.layers.transpose(vlad, [0, 2, 1])
vlad = fluid.layers.l2_normalize(vlad, axis=1)
# reshape and batch norm
vlad = fluid.layers.reshape(vlad,
[-1, self.cluster_size * feature_size])
vlad = fluid.layers.batch_norm(vlad, is_test=(not self.is_training))
return vlad
class NeXtVLADModel(object):
"""
Creates a NeXtVLAD based model.
Args:
model_input: A LoDTensor of [-1, N] for the input video frames.
vocab_size: The number of classes in the dataset.
"""
def __init__(self):
pass
def create_model(self,
video_input,
audio_input,
is_training=True,
class_dim=None,
cluster_size=None,
hidden_size=None,
groups=None,
expansion=None,
drop_rate=None,
gating_reduction=None,
l2_penalty=None,
**unused_params):
        # calculate VLAD descriptors for the video and audio features
video_nextvlad = NeXtVLAD(
1024,
cluster_size,
is_training,
expansion=expansion,
groups=groups,
inputname='video')
audio_nextvlad = NeXtVLAD(
128,
cluster_size,
is_training,
expansion=expansion,
groups=groups,
inputname='audio')
vlad_video = video_nextvlad.forward(video_input)
vlad_audio = audio_nextvlad.forward(audio_input)
# concat video and audio
vlad = fluid.layers.concat([vlad_video, vlad_audio], axis=1)
# drop out
if drop_rate > 0.:
vlad = fluid.layers.dropout(
vlad, drop_rate, is_test=(not is_training))
# add fc
activation = fluid.layers.fc(
input=vlad,
size=hidden_size,
act=None,
name='hidden1_fc',
param_attr=fluid.ParamAttr(
name='hidden1_fc_weights',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=False)
activation = fluid.layers.batch_norm(
activation, is_test=(not is_training))
# add fc, gate 1
gates = fluid.layers.fc(
input=activation,
size=hidden_size // gating_reduction,
act=None,
name='gating_fc1',
param_attr=fluid.ParamAttr(
name='gating_fc1_weights',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=False)
gates = fluid.layers.batch_norm(
gates, is_test=(not is_training), act='relu')
# add fc, gate 2
gates = fluid.layers.fc(
input=gates,
size=hidden_size,
act='sigmoid',
name='gating_fc2',
param_attr=fluid.ParamAttr(
name='gating_fc2_weights',
initializer=fluid.initializer.MSRA(uniform=False)),
bias_attr=False)
activation = fluid.layers.elementwise_mul(activation, gates)
aggregate_model = clf_model.LogisticModel # set classification model
return aggregate_model().create_model(
model_input=activation,
vocab_size=class_dim,
l2_penalty=l2_penalty,
is_training=is_training,
**unused_params)
# StNet Video Classification Model
---
## Table of Contents
- [Model Overview](#model-overview)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Reference Papers](#reference-papers)
## Model Overview
The StNet framework is the backbone network that won the ActivityNet Kinetics Challenge 2018. This release implements StNet on a ResNet50 backbone; variants based on other backbones can be configured in the same way. The model introduces the notion of a "super-image": 2D convolution is applied to super-images to model the local spatio-temporal correlations within a video. A temporal modeling block then captures the global spatio-temporal dependencies, and finally a temporal Xception block performs long-range temporal modeling over the extracted feature sequence. The overall StNet architecture is shown below:
<p align="center">
<img src="../../images/StNet.png" height=300 width=500 hspace='10'/> <br />
StNet Framework Overview
</p>
For details, see the AAAI 2019 paper [StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549).
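The tensor layout behind the super-image is simple to state; the following numpy sketch (shapes chosen only for illustration) stacks `seglen` consecutive frames along the channel axis and folds the segment axis into the batch axis, which is exactly the reshape the model performs before its 2D convolutions:
```python
import numpy as np

B, seg_num, seglen, C, H, W = 2, 7, 5, 3, 224, 224
video = np.random.randn(B, seg_num, seglen * C, H, W)        # seglen frames stacked per segment
super_images = video.reshape(B * seg_num, seglen * C, H, W)  # input to the 2D backbone
print(super_images.shape)                                    # (14, 15, 224, 224)
```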
## Data Preparation
StNet is trained on the Kinetics-400 action recognition dataset released by DeepMind. For downloading and preparing the data, see the [data instructions](../../dataset/README.md).
## Training
Once the data is ready, training can be started in either of the following two ways:
```bash
python train.py --model-name=STNET \
        --config=./configs/stnet.txt \
        --save-dir=checkpoints \
        --log-interval=10 \
        --valid-interval=1

bash scripts/train/train_stnet.sh
```
- You can download the released [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) and pass its path via `--resume` to finetune or build on it.
**Data reader:** the model reads `mp4` files from Kinetics-400. Each sample is split into `seg_num` segments, `seglen` frames are drawn from each segment, and each frame is randomly augmented and resized to `target_size`.
**Training strategy:**
* Momentum optimizer with momentum=0.9
* Weight decay of 1e-4
* The learning rate decays by a factor of 0.1 at 1/3 and 2/3 of the total number of training epochs
**Notes:**
* StNet was trained with PaddlePaddle Fluid 1.3 + cuDNN 5.1. With cuDNN 7.0 and above, batch norm computes abnormal moving mean and moving average statistics; this issue is still being fixed. It is recommended to pin the cuDNN version when installing PaddlePaddle:
```bash
pip install paddlepaddle_gpu==1.3.0.post85
```
Alternatively, download the cuda8.0\_cudnn5\_avx\_mkl whl from the PaddlePaddle [whl release page](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/Tables.html/#permalink-4--whl-release) and install it.
For detailed instructions on installing PaddlePaddle, see the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html).
## Evaluation
The model can be evaluated in either of the following two ways:
```bash
python test.py --model-name=STNET \
        --config=configs/stnet.txt \
        --log-interval=1 \
        --weights=$PATH_TO_WEIGHTS

bash scripts/test/test_stnet.sh
```
- When evaluating with `scripts/test/test_stnet.sh`, set the `--weights` argument in the script to the weights to be evaluated.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) and evaluates it.
With the following parameters:
| Parameter | Value |
| :---------: | :----: |
| seg\_num | 25 |
| seglen | 5 |
| target\_size | 256 |
the evaluation accuracy on the Kinetics-400 validation set is:
| Metric | Accuracy |
| :---------: | :----: |
| TOP\_1 | 0.69 |
## Inference
Inference can be run with the following command (the model name must match the registered name `STNET` exactly):
```bash
python infer.py --model-name=STNET \
        --config=configs/stnet.txt \
        --log-interval=1 \
        --weights=$PATH_TO_WEIGHTS \
        --filelist=$FILELIST
```
- Inference results are stored in `STNET_infer_result` in `pickle` format.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) and runs inference with it.
## Reference Papers
- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import numpy as np
import paddle.fluid as fluid
from ..model import ModelBase
from .stnet_res_model import StNet_ResNet
import logging
logger = logging.getLogger(__name__)
__all__ = ["STNET"]
class STNET(ModelBase):
def __init__(self, name, cfg, mode='train'):
super(STNET, self).__init__(name, cfg, mode=mode)
self.get_config()
def get_config(self):
self.num_classes = self.get_config_from_sec('model', 'num_classes')
self.seg_num = self.get_config_from_sec('model', 'seg_num')
self.seglen = self.get_config_from_sec('model', 'seglen')
self.image_mean = self.get_config_from_sec('model', 'image_mean')
self.image_std = self.get_config_from_sec('model', 'image_std')
self.num_layers = self.get_config_from_sec('model', 'num_layers')
self.num_epochs = self.get_config_from_sec('train', 'epoch')
self.total_videos = self.get_config_from_sec('train', 'total_videos')
self.base_learning_rate = self.get_config_from_sec('train',
'learning_rate')
self.learning_rate_decay = self.get_config_from_sec(
'train', 'learning_rate_decay')
self.l2_weight_decay = self.get_config_from_sec('train',
'l2_weight_decay')
self.momentum = self.get_config_from_sec('train', 'momentum')
self.target_size = self.get_config_from_sec(self.mode, 'target_size')
self.batch_size = self.get_config_from_sec(self.mode, 'batch_size')
def build_input(self, use_pyreader=True):
image_shape = [3, self.target_size, self.target_size]
image_shape[0] = image_shape[0] * self.seglen
image_shape = [self.seg_num] + image_shape
self.use_pyreader = use_pyreader
if use_pyreader:
            assert self.mode != 'infer', \
                'pyreader is not recommended for inference, please set use_pyreader to False.'
py_reader = fluid.layers.py_reader(
capacity=100,
shapes=[[-1] + image_shape, [-1] + [1]],
dtypes=['float32', 'int64'],
name='train_py_reader'
if self.is_training else 'test_py_reader',
use_double_buffer=True)
image, label = fluid.layers.read_file(py_reader)
self.py_reader = py_reader
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
if self.mode != 'infer':
label = fluid.layers.data(
name='label', shape=[1], dtype='int64')
else:
label = None
self.feature_input = [image]
self.label_input = label
def create_model_args(self):
cfg = {}
cfg['layers'] = self.num_layers
cfg['class_dim'] = self.num_classes
cfg['seg_num'] = self.seg_num
cfg['seglen'] = self.seglen
return cfg
def build_model(self):
cfg = self.create_model_args()
videomodel = StNet_ResNet(layers = cfg['layers'], seg_num = cfg['seg_num'], \
seglen = cfg['seglen'], is_training = (self.mode == 'train'))
out = videomodel.net(input=self.feature_input[0],
class_dim=cfg['class_dim'])
self.network_outputs = [out]
def optimizer(self):
        epoch_points = [self.num_epochs // 3, self.num_epochs * 2 // 3]
total_videos = self.total_videos
step = int(total_videos / self.batch_size + 1)
bd = [e * step for e in epoch_points]
base_lr = self.base_learning_rate
lr_decay = self.learning_rate_decay
lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
l2_weight_decay = self.l2_weight_decay
momentum = self.momentum
optimizer = fluid.optimizer.Momentum(
learning_rate=fluid.layers.piecewise_decay(
boundaries=bd, values=lr),
momentum=momentum,
regularization=fluid.regularizer.L2Decay(l2_weight_decay))
return optimizer
def loss(self):
cost = fluid.layers.cross_entropy(input=self.network_outputs[0], \
label=self.label_input, ignore_index=-1)
self.loss_ = fluid.layers.mean(x=cost)
return self.loss_
def outputs(self):
return self.network_outputs
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def pretrain_info(self):
return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz')
def weights_info(self):
return ('stnet_kinetics',
'https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz')
def load_pretrain_params(self, exe, pretrain, prog, place):
def is_parameter(var):
            return isinstance(var, fluid.framework.Parameter) \
                   and ("fc_0" not in var.name) and ("batch_norm" not in var.name) \
                   and ("xception" not in var.name) and ("conv3d" not in var.name)
logger.info("Load pretrain weights from {}, exclude fc, batch_norm, xception, conv3d layers.".format(pretrain))
        vars = list(filter(is_parameter, prog.list_vars()))
fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog)
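        # Inflate the pretrained 2D conv1 kernel [64, 3, 7, 7] so it accepts
        # seglen stacked RGB frames: average over the RGB channel axis,
        # divide by seglen to keep the response scale, then tile the result
        # 3 * seglen times along the channel axis.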
param_tensor = fluid.global_scope().find_var(
"conv1_weights").get_tensor()
param_numpy = np.array(param_tensor)
param_numpy = np.mean(param_numpy, axis=1, keepdims=True) / self.seglen
param_numpy = np.repeat(param_numpy, 3 * self.seglen, axis=1)
param_tensor.set(param_numpy.astype(np.float32), place)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import time
import sys
import paddle.fluid as fluid
import math
class StNet_ResNet():
def __init__(self, layers=50, seg_num=7, seglen=5, is_training=True):
self.layers = layers
self.seglen = seglen
self.seg_num = seg_num
self.is_training = is_training
def temporal_conv_bn(
self,
input, #(B*seg_num, c, h, w)
num_filters,
filter_size=(3, 1, 1),
padding=(1, 0, 0)):
#(B, seg_num, c, h, w)
in_reshape = fluid.layers.reshape(
x=input,
shape=[
-1, self.seg_num, input.shape[-3], input.shape[-2],
input.shape[-1]
])
in_transpose = fluid.layers.transpose(in_reshape, perm=[0, 2, 1, 3, 4])
conv = fluid.layers.conv3d(
input=in_transpose,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
groups=1,
padding=padding,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0.0)))
out = fluid.layers.batch_norm(
input=conv,
act=None,
is_test=(not self.is_training),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=1.0)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0.0)))
out = out + in_transpose
out = fluid.layers.transpose(out, perm=[0, 2, 1, 3, 4])
out = fluid.layers.reshape(x=out, shape=input.shape)
return out
def xception(self, input): #(B, C, seg_num,1)
bn = fluid.layers.batch_norm(
input=input,
act=None,
name="xception_bn",
is_test=(not self.is_training),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=1.0)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0.0)))
att_conv = fluid.layers.conv2d(
input=bn,
num_filters=2048,
filter_size=[3, 1],
stride=[1, 1],
padding=[1, 0],
groups=2048,
name="xception_att_conv",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0)))
att_2 = fluid.layers.conv2d(
input=att_conv,
num_filters=1024,
filter_size=[1, 1],
stride=[1, 1],
name="xception_att_2",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0)))
bndw = fluid.layers.batch_norm(
input=att_2,
act="relu",
name="xception_bndw",
is_test=(not self.is_training),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=1.0)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0.0)))
att1 = fluid.layers.conv2d(
input=bndw,
num_filters=1024,
filter_size=[3, 1],
stride=[1, 1],
padding=[1, 0],
groups=1024,
name="xception_att1",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0)))
att1_2 = fluid.layers.conv2d(
input=att1,
num_filters=1024,
filter_size=[1, 1],
stride=[1, 1],
name="xception_att1_2",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0)))
dw = fluid.layers.conv2d(
input=bn,
num_filters=1024,
filter_size=[1, 1],
stride=[1, 1],
name="xception_dw",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.MSRAInitializer()),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0)))
add_to = dw + att1_2
bn2 = fluid.layers.batch_norm(
input=add_to,
act=None,
name='xception_bn2',
is_test=(not self.is_training),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=1.0)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.ConstantInitializer(value=0.0)))
return fluid.layers.relu(bn2)
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
bias_attr=False,
#name = name+".conv2d.output.1"
)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
is_test=(not self.is_training),
#name=bn_name+'.output.1',
param_attr=fluid.param_attr.ParamAttr(name=bn_name + "_scale"),
bias_attr=fluid.param_attr.ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + "_mean",
moving_variance_name=bn_name + '_variance')
def shortcut(self, input, ch_out, stride, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input, num_filters * 4, stride, name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short,
y=conv2,
act='relu',
#name=".add.output.5"
)
def net(self, input, class_dim=101):
layers = self.layers
seg_num = self.seg_num
seglen = self.seglen
supported_layers = [50, 101, 152]
        assert layers in supported_layers, \
            "supported layers are {} but input layer is {}".format(
                supported_layers, layers)
# reshape input
# [B, seg_num, seglen*c, H, W] --> [B*seg_num, seglen*c, H, W]
channels = input.shape[2]
short_size = input.shape[3]
input = fluid.layers.reshape(
x=input, shape=[-1, channels, short_size, short_size])
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name='conv1')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name)
if block == 1:
#insert the first temporal modeling block
conv = self.temporal_conv_bn(input=conv, num_filters=512)
if block == 2:
#insert the second temporal modeling block
conv = self.temporal_conv_bn(input=conv, num_filters=1024)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
feature = fluid.layers.reshape(
x=pool, shape=[-1, seg_num, pool.shape[1], 1])
feature = fluid.layers.transpose(feature, perm=[0, 2, 1, 3])
#append the temporal Xception block
xfeat = self.xception(feature) #(B, 1024, seg_num, 1)
out = fluid.layers.pool2d(
input=xfeat,
pool_size=(seg_num, 1),
pool_type='max',
global_pooling=True)
out = fluid.layers.reshape(x=out, shape=[-1, 1024])
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=out,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
return out
# TSN Video Classification Model
---
## Table of Contents
- [Model Overview](#model-overview)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Reference Papers](#reference-papers)
## Model Overview
Temporal Segment Network (TSN) is a classic 2D-CNN based solution for video classification. It targets long-range temporal modeling: by sampling frames sparsely instead of densely, it captures the global information of a video while removing redundancy and reducing computation. The per-frame features are averaged into a single video-level feature that is then used for classification. This code implements the single-stream (RGB) TSN with a ResNet-50 backbone.
For details, see the ECCV 2016 paper [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859).
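A minimal numpy sketch of this sampling-and-fusion scheme (illustrative shapes only; random features stand in for the per-frame CNN) samples one frame per segment and averages the per-segment features before classification, mirroring the `reduce_mean` in the network code below:
```python
import numpy as np

num_frames, seg_num, feat_dim = 300, 7, 2048
bounds = np.linspace(0, num_frames, seg_num + 1, dtype=int)
sampled = [np.random.randint(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
frame_feats = np.random.randn(seg_num, feat_dim)  # stand-in for per-frame CNN features
video_feat = frame_feats.mean(axis=0)             # average fusion across segments
```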
## Data Preparation
TSN is trained on the Kinetics-400 action recognition dataset released by DeepMind. For downloading and preparing the data, see the [data instructions](../../dataset/README.md).
## Training
Once the data is ready, training can be started in either of the following two ways:
```bash
python train.py --model-name=TSN \
        --config=./configs/tsn.txt \
        --save-dir=checkpoints \
        --log-interval=10 \
        --valid-interval=1

bash scripts/train/train_tsn.sh
```
- You can download the released [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) and pass its path via `--resume` to finetune or build on it.
**Data reader:** the model reads `mp4` files from Kinetics-400. Each sample is split into `seg_num` segments, 1 frame is drawn from each segment, and each frame is randomly augmented and resized to `target_size`.
**Training strategy:**
* Momentum optimizer with momentum=0.9
* Weight decay of 1e-4
* The learning rate decays by a factor of 0.1 at 1/3 and 2/3 of the total number of training epochs
## Evaluation
The model can be evaluated in either of the following two ways:
```bash
python test.py --model-name=TSN \
        --config=configs/tsn.txt \
        --log-interval=1 \
        --weights=$PATH_TO_WEIGHTS

bash scripts/test/test_tsn.sh
```
- When evaluating with `scripts/test/test_tsn.sh`, set the `--weights` argument in the script to the weights to be evaluated.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) and evaluates it.
With the following parameters, the evaluation accuracy on the Kinetics-400 validation set is:
| seg\_num | target\_size | Top-1 |
| :------: | :----------: | :----: |
| 3 | 224 | 0.66 |
| 7 | 224 | 0.67 |
## Inference
Inference can be run with the following command:
```bash
python infer.py --model-name=TSN \
        --config=configs/tsn.txt \
        --log-interval=1 \
        --weights=$PATH_TO_WEIGHTS \
        --filelist=$FILELIST
```
- Inference results are stored in `TSN_infer_result` in `pickle` format.
- If `--weights` is not specified, the script downloads the released [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) and runs inference with it.
## Reference Papers
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
from ..model import ModelBase
from .tsn_res_model import TSN_ResNet
import logging
logger = logging.getLogger(__name__)
__all__ = ["TSN"]
class TSN(ModelBase):
def __init__(self, name, cfg, mode='train'):
super(TSN, self).__init__(name, cfg, mode=mode)
self.get_config()
def get_config(self):
self.num_classes = self.get_config_from_sec('model', 'num_classes')
self.seg_num = self.get_config_from_sec('model', 'seg_num')
self.seglen = self.get_config_from_sec('model', 'seglen')
self.image_mean = self.get_config_from_sec('model', 'image_mean')
self.image_std = self.get_config_from_sec('model', 'image_std')
self.num_layers = self.get_config_from_sec('model', 'num_layers')
self.num_epochs = self.get_config_from_sec('train', 'epoch')
self.total_videos = self.get_config_from_sec('train', 'total_videos')
self.base_learning_rate = self.get_config_from_sec('train',
'learning_rate')
self.learning_rate_decay = self.get_config_from_sec(
'train', 'learning_rate_decay')
self.l2_weight_decay = self.get_config_from_sec('train',
'l2_weight_decay')
self.momentum = self.get_config_from_sec('train', 'momentum')
self.target_size = self.get_config_from_sec(self.mode, 'target_size')
self.batch_size = self.get_config_from_sec(self.mode, 'batch_size')
def build_input(self, use_pyreader=True):
image_shape = [3, self.target_size, self.target_size]
image_shape[0] = image_shape[0] * self.seglen
image_shape = [self.seg_num] + image_shape
self.use_pyreader = use_pyreader
if use_pyreader:
            assert self.mode != 'infer', \
                'pyreader is not recommended for inference, please set use_pyreader to False.'
py_reader = fluid.layers.py_reader(
capacity=100,
shapes=[[-1] + image_shape, [-1] + [1]],
dtypes=['float32', 'int64'],
name='train_py_reader'
if self.is_training else 'test_py_reader',
use_double_buffer=True)
image, label = fluid.layers.read_file(py_reader)
self.py_reader = py_reader
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
if self.mode != 'infer':
label = fluid.layers.data(
name='label', shape=[1], dtype='int64')
else:
label = None
self.feature_input = [image]
self.label_input = label
def create_model_args(self):
cfg = {}
cfg['layers'] = self.num_layers
cfg['class_dim'] = self.num_classes
cfg['seg_num'] = self.seg_num
return cfg
def build_model(self):
cfg = self.create_model_args()
videomodel = TSN_ResNet(
layers=cfg['layers'],
seg_num=cfg['seg_num'],
is_training=(self.mode == 'train'))
out = videomodel.net(input=self.feature_input[0],
class_dim=cfg['class_dim'])
self.network_outputs = [out]
def optimizer(self):
        assert self.mode == 'train', "optimizer can only be obtained in train mode"
        epoch_points = [self.num_epochs // 3, self.num_epochs * 2 // 3]
total_videos = self.total_videos
step = int(total_videos / self.batch_size + 1)
bd = [e * step for e in epoch_points]
base_lr = self.base_learning_rate
lr_decay = self.learning_rate_decay
lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
l2_weight_decay = self.l2_weight_decay
momentum = self.momentum
optimizer = fluid.optimizer.Momentum(
learning_rate=fluid.layers.piecewise_decay(
boundaries=bd, values=lr),
momentum=momentum,
regularization=fluid.regularizer.L2Decay(l2_weight_decay))
return optimizer
def loss(self):
        assert self.mode != 'infer', "loss cannot be calculated in infer mode"
cost = fluid.layers.cross_entropy(input=self.network_outputs[0], \
label=self.label_input, ignore_index=-1)
self.loss_ = fluid.layers.mean(x=cost)
return self.loss_
def outputs(self):
return self.network_outputs
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def pretrain_info(self):
return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz')
def weights_info(self):
return ('tsn_kinetics',
'https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz')
def load_pretrain_params(self, exe, pretrain, prog, place):
def is_parameter(var):
            return isinstance(var, fluid.framework.Parameter) and ("fc_0" not in var.name)
logger.info("Load pretrain weights from {}, exclude fc layer.".format(pretrain))
        vars = list(filter(is_parameter, prog.list_vars()))
fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import time
import sys
import paddle.fluid as fluid
import math
class TSN_ResNet():
def __init__(self, layers=50, seg_num=7, is_training=True):
self.layers = layers
self.seg_num = seg_num
self.is_training = is_training
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
is_test=(not self.is_training),
param_attr=fluid.param_attr.ParamAttr(name=bn_name + "_scale"),
bias_attr=fluid.param_attr.ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + "_mean",
moving_variance_name=bn_name + '_variance')
def shortcut(self, input, ch_out, stride, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input, num_filters * 4, stride, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def net(self, input, class_dim=101):
layers = self.layers
seg_num = self.seg_num
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
# reshape input
channels = input.shape[2]
short_size = input.shape[3]
input = fluid.layers.reshape(
x=input, shape=[-1, channels, short_size, short_size])
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name='conv1')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
feature = fluid.layers.reshape(
x=pool, shape=[-1, seg_num, pool.shape[1]])
out = fluid.layers.reduce_mean(feature, dim=1)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=out,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
return out
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import wget
import tarfile
__all__ = ['decompress', 'download', 'AttrDict']
def decompress(path):
t = tarfile.open(path)
t.extractall(path='/'.join(path.split('/')[:-1]))
t.close()
os.remove(path)
def download(url, path):
weight_dir = '/'.join(path.split('/')[:-1])
if not os.path.exists(weight_dir):
os.makedirs(weight_dir)
path = path + ".tar.gz"
wget.download(url, path)
decompress(path)
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
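# Example: AttrDict allows attribute-style access to plain dict entries.
#
#     cfg = AttrDict()
#     cfg.batch_size = 128      # stored as cfg['batch_size']
#     assert cfg.batch_size == cfg['batch_size'] == 128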
python infer.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt \
--filelist=./data/youtube8m/infer.list \
--weights=./checkpoints/AttentionCluster_epoch0 \
--save-dir="./save"
python infer.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt \
--filelist=./data/youtube8m/infer.list \
--weights=./checkpoints/AttentionLSTM_epoch0 \
--save-dir="./save"
python infer.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt --filelist=./data/youtube8m/infer.list \
--weights=./checkpoints/NEXTVLAD_epoch0 \
--save-dir="./save"
python infer.py --model-name="STNET" --config=./configs/stnet.txt --filelist=./data/kinetics/infer.list \
--log-interval=10 --weights=./checkpoints/STNET_epoch0 --save-dir=./save
python infer.py --model-name="TSN" --config=./configs/tsn.txt --filelist=./data/kinetics/infer.list \
--log-interval=10 --weights=./checkpoints/TSN_epoch0 --save-dir=./save
python test.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt \
--log-interval=5 --weights=./checkpoints/AttentionCluster_epoch0
python test.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt \
--log-interval=5 --weights=./checkpoints/AttentionLSTM_epoch0
python test.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt \
--log-interval=10 --weights=./checkpoints/NEXTVLAD_epoch0
python test.py --model-name="STNET" --config=./configs/stnet.txt \
--log-interval=10 --weights=./checkpoints/STNET_epoch0
python test.py --model-name="TSN" --config=./configs/tsn.txt \
--log-interval=10 --weights=./checkpoints/TSN_epoch0
python train.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt --epoch-num=5 \
--valid-interval=1 --log-interval=10
python train.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt --epoch-num=10 \
--valid-interval=1 --log-interval=10
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt --epoch-num=6 \
--valid-interval=1 --log-interval=10
python train.py --model-name="STNET" --config=./configs/stnet.txt --epoch-num=60 \
--valid-interval=1 --log-interval=10
python train.py --model-name="TSN" --config=./configs/tsn.txt --epoch-num=45 \
--valid-interval=1 --log-interval=10
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
import paddle.fluid as fluid
from config import *
import models
from datareader import get_reader
from metrics import get_metrics
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model-name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch-size',
type=int,
default=None,
        help='training batch size per GPU. None to use config file setting.')
    parser.add_argument(
        '--use-gpu',
        type=ast.literal_eval,
        default=True,
        help='whether to use gpu (argparse cannot parse bool values reliably, '
        'so accept a Python literal such as True/False).')
parser.add_argument(
'--weights',
type=str,
default=None,
help='weight path, None to use weights from Paddle.')
parser.add_argument(
'--log-interval',
type=int,
default=1,
help='mini-batch interval to log.')
args = parser.parse_args()
return args
def test(args):
# parse config
config = parse_config(args.config)
test_config = merge_configs(config, 'test', vars(args))
# build model
test_model = models.get_model(args.model_name, test_config, mode='test')
test_model.build_input(use_pyreader=False)
test_model.build_model()
test_feeds = test_model.feeds()
test_outputs = test_model.outputs()
loss = test_model.loss()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
weights = args.weights or test_model.get_weights()
def if_exist(var):
return os.path.exists(os.path.join(weights, var.name))
fluid.io.load_vars(exe, weights, predicate=if_exist)
# get reader and metrics
test_reader = get_reader(args.model_name.upper(), 'test', test_config)
test_metrics = get_metrics(args.model_name.upper(), 'test', test_config)
test_feeder = fluid.DataFeeder(place=place, feed_list=test_feeds)
fetch_list = [loss.name] + [x.name
for x in test_outputs] + [test_feeds[-1].name]
epoch_period = []
for test_iter, data in enumerate(test_reader()):
cur_time = time.time()
test_outs = exe.run(fetch_list=fetch_list,
feed=test_feeder.feed(data))
period = time.time() - cur_time
epoch_period.append(period)
loss = np.array(test_outs[0])
pred = np.array(test_outs[1])
label = np.array(test_outs[-1])
test_metrics.accumulate(loss, pred, label)
# metric here
if args.log_interval > 0 and test_iter % args.log_interval == 0:
info_str = '[EVAL] Batch {}'.format(test_iter)
test_metrics.calculate_and_log_out(loss, pred, label, info_str)
test_metrics.finalize_and_log_out("[EVAL] eval finished. ")
if __name__ == "__main__":
args = parse_args()
logger.info(args)
test(args)
import os
import time
import numpy as np
import paddle
import paddle.fluid as fluid
import logging
import shutil
logger = logging.getLogger(__name__)
def test_without_pyreader(test_exe,
test_reader,
test_feeder,
test_fetch_list,
test_metrics,
log_interval=0):
test_metrics.reset()
for test_iter, data in enumerate(test_reader()):
test_outs = test_exe.run(test_fetch_list, feed=test_feeder.feed(data))
loss = np.array(test_outs[0])
pred = np.array(test_outs[1])
label = np.array(test_outs[-1])
test_metrics.accumulate(loss, pred, label)
if log_interval > 0 and test_iter % log_interval == 0:
test_metrics.calculate_and_log_out(loss, pred, label, \
info = '[TEST] test_iter {} '.format(test_iter))
test_metrics.finalize_and_log_out("[TEST] Finish")
def test_with_pyreader(test_exe,
test_pyreader,
test_fetch_list,
test_metrics,
log_interval=0):
    if not test_pyreader:
        logger.error("[TEST] get pyreader failed.")
        return
test_pyreader.start()
test_metrics.reset()
test_iter = 0
try:
while True:
test_outs = test_exe.run(fetch_list=test_fetch_list)
loss = np.array(test_outs[0])
pred = np.array(test_outs[1])
label = np.array(test_outs[-1])
test_metrics.accumulate(loss, pred, label)
if log_interval > 0 and test_iter % log_interval == 0:
test_metrics.calculate_and_log_out(loss, pred, label, \
info = '[TEST] test_iter {} '.format(test_iter))
test_iter += 1
except fluid.core.EOFException:
test_metrics.finalize_and_log_out("[TEST] Finish")
finally:
test_pyreader.reset()
def train_without_pyreader(exe, train_prog, train_exe, train_reader, train_feeder, \
train_fetch_list, train_metrics, epochs = 10, \
log_interval = 0, valid_interval = 0, save_dir = './', \
save_model_name = 'model', test_exe = None, test_reader = None, \
test_feeder = None, test_fetch_list = None, test_metrics = None):
for epoch in range(epochs):
epoch_periods = []
for train_iter, data in enumerate(train_reader()):
cur_time = time.time()
train_outs = train_exe.run(train_fetch_list,
feed=train_feeder.feed(data))
period = time.time() - cur_time
epoch_periods.append(period)
loss = np.array(train_outs[0])
pred = np.array(train_outs[1])
label = np.array(train_outs[-1])
if log_interval > 0 and (train_iter % log_interval == 0):
# eval here
train_metrics.calculate_and_log_out(loss, pred, label, \
info = '[TRAIN] Epoch {}, iter {} '.format(epoch, train_iter))
logger.info('[TRAIN] Epoch {} training finished, average time: {}'.
format(epoch, np.mean(epoch_periods)))
save_model(exe, train_prog, save_dir, save_model_name,
"_epoch{}".format(epoch))
if test_exe and valid_interval > 0 and (epoch + 1) % valid_interval == 0:
test_without_pyreader(test_exe, test_reader, test_feeder,
test_fetch_list, test_metrics, log_interval)
def train_with_pyreader(exe, train_prog, train_exe, train_pyreader, \
train_fetch_list, train_metrics, epochs = 10, \
log_interval = 0, valid_interval = 0, \
save_dir = './', save_model_name = 'model', \
test_exe = None, test_pyreader = None, \
test_fetch_list = None, test_metrics = None):
    if not train_pyreader:
        logger.error("[TRAIN] get pyreader failed.")
        return
for epoch in range(epochs):
train_pyreader.start()
train_metrics.reset()
try:
train_iter = 0
epoch_periods = []
while True:
cur_time = time.time()
train_outs = train_exe.run(fetch_list=train_fetch_list)
period = time.time() - cur_time
epoch_periods.append(period)
loss = np.array(train_outs[0])
pred = np.array(train_outs[1])
label = np.array(train_outs[-1])
if log_interval > 0 and (train_iter % log_interval == 0):
# eval here
train_metrics.calculate_and_log_out(loss, pred, label, \
info = '[TRAIN] Epoch {}, iter {} '.format(epoch, train_iter))
train_iter += 1
except fluid.core.EOFException:
# eval here
logger.info('[TRAIN] Epoch {} training finished, average time: {}'.
format(epoch, np.mean(epoch_periods)))
save_model(exe, train_prog, save_dir, save_model_name,
"_epoch{}".format(epoch))
if test_exe and valid_interval > 0 and (epoch + 1) % valid_interval == 0:
test_with_pyreader(test_exe, test_pyreader, test_fetch_list,
test_metrics, log_interval)
finally:
epoch_period = []
train_pyreader.reset()
def save_model(exe, program, save_dir, model_name, postfix=None):
model_path = os.path.join(save_dir, model_name + postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
fluid.io.save_persistables(exe, model_path, main_program=program)
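save_model above writes all persistables into save_dir/model_name + postfix; restoring goes through the symmetric fluid.io.load_persistables. A minimal sketch, assuming a checkpoint ./model_epoch0 produced by save_model(exe, prog, './', 'model', '_epoch0') and a rebuilt program of the same structure (the tiny fc network here only stands in for the real one):

```python
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)
main_prog = fluid.Program()
startup = fluid.Program()
with fluid.program_guard(main_prog, startup):
    x = fluid.layers.data(name='x', shape=[16], dtype='float32')
    y = fluid.layers.fc(input=x, size=2)  # placeholder for the saved network
exe.run(startup)
# Load the persistables written by save_model into the rebuilt program.
fluid.io.load_persistables(exe, './model_epoch0', main_program=main_prog)
```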
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import argparse
import ast
import logging
import numpy as np
import paddle.fluid as fluid
from tools.train_utils import train_with_pyreader, train_without_pyreader
import models
from config import *
from datareader import get_reader
from metrics import get_metrics
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser("Paddle Video train script")
parser.add_argument(
'--model-name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch-size',
type=int,
default=None,
help='training batch size. None to use config file setting.')
parser.add_argument(
'--learning-rate',
type=float,
default=None,
help='learning rate use for training. None to use config file setting.')
parser.add_argument(
'--pretrain',
type=str,
default=None,
help='path to pretrain weights. None to use default weights path in ~/.paddle/weights.'
)
parser.add_argument(
'--resume',
type=str,
default=None,
help='path to resume training based on previous checkpoints. '
'None for not resuming any checkpoints.'
)
    parser.add_argument(
        '--use-gpu',
        type=ast.literal_eval,
        default=True,
        help='whether to use gpu (argparse cannot parse bool values reliably, '
        'so accept a Python literal such as True/False).')
parser.add_argument(
'--no-use-pyreader',
action='store_true',
default=False,
help='whether to use pyreader')
parser.add_argument(
'--no-memory-optimize',
action='store_true',
default=False,
help='whether to use memory optimize in train')
parser.add_argument(
'--epoch-num',
type=int,
default=0,
        help='epoch number, 0 to read it from the config file')
parser.add_argument(
'--valid-interval',
type=int,
default=1,
help='validation epoch interval, 0 for no validation.')
parser.add_argument(
'--save-dir',
type=str,
default='checkpoints',
        help='directory name to save training snapshots')
parser.add_argument(
'--log-interval',
type=int,
default=10,
help='mini-batch interval to log.')
args = parser.parse_args()
return args
def train(args):
# parse config
config = parse_config(args.config)
train_config = merge_configs(config, 'train', vars(args))
valid_config = merge_configs(config, 'valid', vars(args))
train_model = models.get_model(args.model_name, train_config, mode='train')
valid_model = models.get_model(args.model_name, valid_config, mode='valid')
# build model
startup = fluid.Program()
train_prog = fluid.Program()
with fluid.program_guard(train_prog, startup):
with fluid.unique_name.guard():
train_model.build_input(not args.no_use_pyreader)
train_model.build_model()
            # The input has the form [data1, data2, ..., label], so train_feeds[-1] is the label.
train_feeds = train_model.feeds()
train_feeds[-1].persistable = True
            # The output of the classification model has the form [pred].
train_outputs = train_model.outputs()
for output in train_outputs:
output.persistable = True
train_loss = train_model.loss()
train_loss.persistable = True
# outputs, loss, label should be fetched, so set persistable to be true
optimizer = train_model.optimizer()
optimizer.minimize(train_loss)
train_pyreader = train_model.pyreader()
if not args.no_memory_optimize:
fluid.memory_optimize(train_prog)
valid_prog = fluid.Program()
with fluid.program_guard(valid_prog, startup):
with fluid.unique_name.guard():
valid_model.build_input(not args.no_use_pyreader)
valid_model.build_model()
valid_feeds = valid_model.feeds()
valid_outputs = valid_model.outputs()
valid_loss = valid_model.loss()
valid_pyreader = valid_model.pyreader()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup)
if args.resume:
# if resume weights is given, load resume weights directly
assert os.path.exists(args.resume), \
"Given resume weight dir {} not exist.".format(args.resume)
def if_exist(var):
return os.path.exists(os.path.join(args.resume, var.name))
fluid.io.load_vars(exe, args.resume, predicate=if_exist, main_program=train_prog)
else:
# if not in resume mode, load pretrain weights
if args.pretrain:
assert os.path.exists(args.pretrain), \
"Given pretrain weight dir {} not exist.".format(args.pretrain)
pretrain = args.pretrain or train_model.get_pretrain_weights()
if pretrain:
train_model.load_pretrain_params(exe, pretrain, train_prog, place)
train_exe = fluid.ParallelExecutor(
use_cuda=args.use_gpu,
loss_name=train_loss.name,
main_program=train_prog)
valid_exe = fluid.ParallelExecutor(
use_cuda=args.use_gpu,
share_vars_from=train_exe,
main_program=valid_prog)
# get reader
bs_denominator = 1
if (not args.no_use_pyreader) and args.use_gpu:
bs_denominator = train_config.TRAIN.num_gpus
train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
bs_denominator)
valid_config.VALID.batch_size = int(valid_config.VALID.batch_size /
bs_denominator)
train_reader = get_reader(args.model_name.upper(), 'train', train_config)
valid_reader = get_reader(args.model_name.upper(), 'valid', valid_config)
# get metrics
train_metrics = get_metrics(args.model_name.upper(), 'train', train_config)
valid_metrics = get_metrics(args.model_name.upper(), 'valid', valid_config)
train_fetch_list = [train_loss.name] + [x.name for x in train_outputs
] + [train_feeds[-1].name]
valid_fetch_list = [valid_loss.name] + [x.name for x in valid_outputs
] + [valid_feeds[-1].name]
epochs = args.epoch_num or train_model.epoch_num()
if args.no_use_pyreader:
train_feeder = fluid.DataFeeder(place=place, feed_list=train_feeds)
valid_feeder = fluid.DataFeeder(place=place, feed_list=valid_feeds)
train_without_pyreader(exe, train_prog, train_exe, train_reader, train_feeder,
train_fetch_list, train_metrics, epochs = epochs,
log_interval = args.log_interval, valid_interval = args.valid_interval,
save_dir = args.save_dir, save_model_name = args.model_name,
test_exe = valid_exe, test_reader = valid_reader, test_feeder = valid_feeder,
test_fetch_list = valid_fetch_list, test_metrics = valid_metrics)
else:
train_pyreader.decorate_paddle_reader(train_reader)
valid_pyreader.decorate_paddle_reader(valid_reader)
train_with_pyreader(exe, train_prog, train_exe, train_pyreader, train_fetch_list, train_metrics,
epochs = epochs, log_interval = args.log_interval,
valid_interval = args.valid_interval,
save_dir = args.save_dir, save_model_name = args.model_name,
test_exe = valid_exe, test_pyreader = valid_pyreader,
test_fetch_list = valid_fetch_list, test_metrics = valid_metrics)
if __name__ == "__main__":
args = parse_args()
logger.info(args)
if not os.path.exists(args.save_dir):
os.makedirs(args.save_dir)
train(args)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
__all__ = ['AttrDict']
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
-Subproject commit d2fc9e0b45b4e6cfc93e73054026fc5a8abfbfb9
+Subproject commit a4eb73b2fb64d8aab8499a1184edf4fc386f8268
-Subproject commit 733c1d02085a3092dd262c4f396563962a514c3e
+Subproject commit dc1af6a83dd1372055158ac6d17f6d14b3a0f0f8
-Subproject commit 60b698a294c34420a7f0aab3112f27649aed1445
+Subproject commit 57b93859aa070ae6d96f10a470b1bdf2cfaea052
...
@@ -25,7 +25,7 @@ def parse_args():
     parser.add_argument(
         '--model_path',
         type=str,
-        default='model/params_pass_0',
+        default='output/params_pass_0',
         help='A path to the model. (default: %(default)s)')
     parser.add_argument(
         '--test_data_dir',
...
...
@@ -130,13 +130,13 @@ def test(args):
         loss, logits = dam.create_network()
         loss.persistable = True
+        logits.persistable = True
 
         # gradient clipping
         fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
             max=1.0, min=-1.0))
 
         test_program = fluid.default_main_program().clone(for_test=True)
         optimizer = fluid.optimizer.Adam(
             learning_rate=fluid.layers.exponential_decay(
                 learning_rate=args.learning_rate,
@@ -145,7 +145,6 @@ def test(args):
                 staircase=True))
         optimizer.minimize(loss)
 
-        # The fethced loss is wrong when mem opt is enabled
         fluid.memory_optimize(fluid.default_main_program())
 
     if args.use_cuda:
@@ -173,8 +172,10 @@ def test(args):
     if args.ext_eval:
         import utils.douban_evaluation as eva
+        eval_metrics = ["MAP", "MRR", "P@1", "R_{10}@1", "R_{10}@2", "R_{10}@5"]
     else:
         import utils.evaluation as eva
+        eval_metrics = ["R_2@1", "R_{10}@1", "R_{10}@2", "R_{10}@5"]
 
     test_batches = reader.build_batches(test_data, data_conf)
@@ -214,8 +215,8 @@ def test(args):
     result = eva.evaluate(score_path)
     result_file_path = os.path.join(args.save_path, 'result.txt')
     with open(result_file_path, 'w') as out_file:
-        for p_at in result:
-            out_file.write(str(p_at) + '\n')
+        for metric, p_at in zip(eval_metrics, result):
+            out_file.write(metric + ": " + str(p_at) + '\n')
     print('finish test')
     print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
...
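To illustrate the changed writing loop at the end of the hunk: with the non-ext_eval metric list and dummy scores, the result file now carries metric names rather than bare numbers (a sketch, not real evaluation output):

```python
eval_metrics = ["R_2@1", "R_{10}@1", "R_{10}@2", "R_{10}@5"]
result = [0.76, 0.54, 0.72, 0.89]  # dummy scores, same order as eval_metrics

with open('result.txt', 'w') as out_file:
    for metric, p_at in zip(eval_metrics, result):
        out_file.write(metric + ": " + str(p_at) + '\n')
# result.txt now reads e.g. "R_2@1: 0.76" per line.
```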
...
@@ -523,8 +523,8 @@ def evaluate(logger, args):
         inference_program = main_program.clone(for_test=True)
 
         eval_loss, bleu_rouge = validation(
-            inference_program, avg_cost, s_probs, e_probs, feed_order,
-            place, dev_count, vocab, brc_data, logger, args)
+            inference_program, avg_cost, s_probs, e_probs, match,
+            feed_order, place, dev_count, vocab, brc_data, logger, args)
         logger.info('Dev eval loss {}'.format(eval_loss))
         logger.info('Dev eval result: {}'.format(bleu_rouge))
         logger.info('Predicted answers are saved to {}'.format(
...
-Running the example models in this directory requires PaddlePaddle Fluid 1.0. If your installed PaddlePaddle version is lower than this requirement, please update it following the instructions in the [installation guide](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
+Running the example models in this directory requires PaddlePaddle Fluid 1.0. If your installed PaddlePaddle version is lower than this requirement, please update it following the instructions in the [installation guide](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html).
 
 # Machine Translation: RNN Search
@@ -24,7 +24,7 @@
 The implementation in this directory aims to show how to use Paddle Fluid to build an RNN model with an attention mechanism for Seq2Seq problems, and how to use a decoder with the beam search algorithm. If you simply need a model with good translation quality, we recommend the [Paddle Fluid implementation of Transformer](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer).
 
 ## Model Overview
-The RNN Search model uses the classic encoder-decoder framework to solve Seq2Seq problems. It first encodes the source sequence into a vector with the encoder, then decodes that vector into the target sequence with the decoder. This mimics human behavior in translation tasks: first parse the source language and understand its meaning, then write the target-language sentence according to that meaning. Encoders and decoders are usually implemented with RNNs. For the principles and mathematical formulation of this method, see [Deep Learning 101](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/basics/machine_translation/index.html).
+The RNN Search model uses the classic encoder-decoder framework to solve Seq2Seq problems. It first encodes the source sequence into a vector with the encoder, then decodes that vector into the target sequence with the decoder. This mimics human behavior in translation tasks: first parse the source language and understand its meaning, then write the target-language sentence according to that meaning. Encoders and decoders are usually implemented with RNNs. For the principles and mathematical formulation of this method, see [Deep Learning 101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html).
 
 In this model, the encoder is implemented with a bi-directional recurrent neural network; the decoder is an RNN decoder with an attention mechanism, and a decoder without attention is also provided for comparison; for prediction we use the beam search algorithm to generate the translated target sentence. These methods are introduced below.
@@ -45,7 +45,7 @@ The RNN Search model uses the classic encoder-decoder framework
 ### Attention Mechanism
 If the output of the encoding stage is a single fixed-dimensional vector, two problems arise: 1) whether the source sequence is 5 words or 50 words long, encoding its semantics and syntactic structure into a fixed-dimensional vector places very high demands on the model, especially for long sequences; 2) intuitively, when a human translates a sentence, they pay more attention to the source fragments most relevant to the current output, and the focus shifts as translation proceeds, whereas a fixed-dimensional vector effectively gives equal attention to all source information at all times, which is unreasonable. Therefore, Bahdanau et al. \[[4](#references)\] introduced the attention mechanism, which decodes against encoded context fragments to address feature learning for long sentences. The decoder structure under attention is described below.
 
-Unlike the simple decoder, here $z_i$ is computed as:
+Unlike the simple decoder, here $z_i$ is computed as (since GitHub does not natively render LaTeX formulas, please see [here](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)):
 
 $$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$
...
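For background on the $c_i$ appearing in the excerpt above: in the standard Bahdanau formulation (notation follows the cited tutorial; this is reference material, not code from this repo), the context vector is a softmax-weighted sum of the encoder states:

$$c_i=\sum_{j=1}^{T}a_{ij}h_j,\qquad a_{ij}=\frac{\exp(e_{ij})}{\sum_{k=1}^{T}\exp(e_{ik})},\qquad e_{ij}=\mathrm{align}(z_i,h_j)$$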
...
@@ -69,9 +69,9 @@ The WMT dataset is the recognized mainstream dataset for machine translation; the [WMT'16 EN-DE
     └── subword-nmt                 # code for BPE encoding
 ```
-`gen_data/wmt16_ende_data_bpe` contains the English-German translation data we finally use, where `train.tok.clean.bpe.32000.en-de` is the training data, `newstest2016.tok.bpe.32000.en-de` etc. are the validation and test data, and `vocab_all.bpe.32000` is the corresponding vocabulary file (the three special tokens `<s>`, `<e>` and `<unk>` are already added; source and target languages share this vocabulary file).
+`gen_data/wmt16_ende_data_bpe` contains the English-German translation data we finally use, where `train.tok.clean.bpe.32000.en-de` is the training data, `newstest2016.tok.bpe.32000.en-de` etc. are the validation and test data, and `vocab_all.bpe.32000` is the corresponding vocabulary file (the three special tokens `<s>`, `<e>` and `<unk>` are already added; source and target languages share this vocabulary file). We also provide a preprocessed copy of the WMT'16 EN-DE data for [download](https://transformer-res.bj.bcebos.com/wmt16_ende_data_bpe_clean.tar.gz) (it contains the BPE data and vocabulary needed for training, plus the BPE and tokenized data needed for prediction and evaluation).
 
-For other custom data, convert it to the same format as `train.tok.clean.bpe.32000.en-de` (tab-separated source and target sentence pairs, with tokens separated by spaces); if BPE encoding is needed, refer to, or process similarly to WMT using `gen_data.sh`.
+For other custom data, convert it to the same format as `train.tok.clean.bpe.32000.en-de` (tab-separated source and target sentence pairs, with tokens separated by spaces); if BPE encoding is needed, you can also use the format of the raw WMT'16 EN-DE data and process it following `gen_data.sh`.
 
 ### Model Training
@@ -110,11 +110,9 @@ python -u train.py \
   --batch_size 3200 \
   --sort_type pool \
   --pool_size 200000 \
-  n_layer 6 \
   n_head 16 \
   d_model 1024 \
   d_inner_hid 4096 \
-  n_head 16 \
   prepostprocess_dropout 0.3
 ```
 For more detailed information about these parameters, please refer to the comments in `config.py`.
@@ -144,30 +142,53 @@ python -u infer.py \
   --token_delimiter ' ' \
   --batch_size 32 \
   model_path trained_models/iter_100000.infer.model \
-  beam_size 4 \
+  beam_size 5 \
   max_out_len 255
 ```
-Similar to training, prediction also requires setting the data and reader related parameters, and you can run `python infer.py --help` to see their descriptions (some parameters differ slightly in meaning from training); model hyperparameters can likewise be set in the prediction command, but they should be consistent with the training settings; in addition, compared with training, prediction has some extra parameters: `model_path` must be set to give the model directory, and `beam_size` and `max_out_len` can be set to specify the search width and maximum depth (translation length) of the beam search algorithm; these parameters can also be looked up and changed in `InferTaskConfig` in `config.py`.
+Similar to training, prediction also requires setting the data and reader related parameters, and you can run `python infer.py --help` to see their descriptions (some parameters differ slightly in meaning from training); model hyperparameters can likewise be set in the prediction command, but they should be consistent with the training settings. For example, if the big model settings were used for training, the corresponding prediction command looks like:
+```sh
+python -u infer.py \
+  --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
+  --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
+  --special_token '<s>' '<e>' '<unk>' \
+  --test_file_pattern gen_data/wmt16_ende_data_bpe/newstest2016.tok.bpe.32000.en-de \
+  --token_delimiter ' ' \
+  --batch_size 32 \
+  model_path trained_models/iter_100000.infer.model \
+  n_head 16 \
+  d_model 1024 \
+  d_inner_hid 4096 \
+  prepostprocess_dropout 0.3 \
+  beam_size 5 \
+  max_out_len 255
+```
+In addition, compared with training, prediction has some extra parameters: `model_path` must be set to give the model directory, and `beam_size` and `max_out_len` can be set to specify the search width and maximum depth (translation length) of the beam search algorithm; these parameters can also be looked up and changed in `InferTaskConfig` in `config.py`.
 
 Running the above prediction command prints the translations to standard output, one highest-scoring translation per input line. For the English-German data using BPE, the predicted translations are also in BPE representation and must be restored to the original (here, tokenized) data for correct evaluation. The following command restores the translations in `predict.txt` into `predict.tok.txt` (no further tokenization needed):
 ```sh
 sed -r 's/(@@ )|(@@ ?$)//g' predict.txt > predict.tok.txt
 ```
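Should `sed` be unavailable, the same restoration can be done in Python with the identical regular expression (a sketch, assuming `predict.txt` in the working directory):

```python
import re

with open('predict.txt') as fin, open('predict.tok.txt', 'w') as fout:
    for line in fin:
        # Drop BPE continuation markers: "@@ " inside a line, "@@"/"@@ " at line end.
        fout.write(re.sub(r'(@@ )|(@@ ?$)', '', line))
```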
-Next you can evaluate the translations against the reference using the BLEU metric. For the English-German `newstest2016.tok.de` data, run the following command:
+Next you can evaluate the translations against the reference using the BLEU metric. The evaluation uses a script from mosesdecoder, which can be obtained with:
+```sh
+git clone https://github.com/moses-smt/mosesdecoder.git
+```
+Taking the English-German `newstest2014.tok.de` data as an example, after obtaining mosesdecoder, run `multi-bleu.perl` as follows to evaluate the translations:
 ```sh
-perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2016.tok.de < predict.tok.txt
+perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2014.tok.de < predict.tok.txt
 ```
-You should see results similar to the following (predictions of a model trained for 200K iterations on two GPUs on a single machine).
+You should see results similar to the following:
 ```
-BLEU = 33.08, 64.2/39.2/26.4/18.5 (BP=0.994, ratio=0.994, hyp_len=61971, ref_len=62362)
+BLEU = 26.35, 57.7/32.1/20.0/13.0 (BP=1.000, ratio=1.013, hyp_len=63903, ref_len=63078)
 ```
-Currently, without model average, the test BLEU of the English-German base model after training 100K iterations on eight GPUs is as follows:
+Currently, without model average, the test BLEU of the English-German base model and big model after training 100K iterations on eight GPUs is as follows:
 
 | Test set | newstest2014 | newstest2015 | newstest2016 |
 |-|-|-|-|
-| BLEU | 26.25 | 29.15 | 33.64 |
+| Base | 26.35 | 29.07 | 33.30 |
+| Big | 27.07 | 30.09 | 34.38 |
+
+We also provide the above [base model](https://transformer-res.bj.bcebos.com/base_model.tar.gz) and [big model](https://transformer-res.bj.bcebos.com/big_model.tar.gz) for download.
 
 ### Distributed Training
...
...
@@ -164,7 +164,10 @@ input_descs = {
     # [batch_size * max_trg_len_in_batch, 1]
     "lbl_weight": [(batch_size * seq_len, 1), "float32"],
     # This input is used in beam-search decoder.
-    "init_score": [(batch_size, 1), "float32"],
+    "init_score": [(batch_size, 1), "float32", 2],
+    # This input is used in beam-search decoder for the first gather
+    # (cell states updation)
+    "init_idx": [(batch_size, ), "int32"],
 }
 
 # Names of word embedding table which might be reused for weight sharing.
@@ -194,4 +197,5 @@ label_data_input_fields = (
 fast_decoder_data_input_fields = (
     "trg_word",
     "init_score",
+    "init_idx",
     "trg_src_attn_bias", )
 import argparse
 import ast
+import multiprocessing
 import numpy as np
+import os
 from functools import partial
 
 import paddle
 import paddle.fluid as fluid
 
 import model
+import reader
+from config import *
 from model import wrap_encoder as encoder
 from model import wrap_decoder as decoder
 from model import fast_decode as fast_decoder
-from config import *
-from train import pad_batch_data
-import reader
+from train import pad_batch_data, prepare_data_generator
 
 
 def parse_args():
@@ -54,6 +56,21 @@ def parse_args():
         default=" ",
         help="The delimiter used to split tokens in source or target sentences. "
         "For EN-DE BPE data we provided, use spaces as token delimiter. ")
+    parser.add_argument(
+        "--use_mem_opt",
+        type=ast.literal_eval,
+        default=True,
+        help="The flag indicating whether to use memory optimization.")
+    parser.add_argument(
+        "--use_py_reader",
+        type=ast.literal_eval,
+        default=True,
+        help="The flag indicating whether to use py_reader.")
+    parser.add_argument(
+        "--use_parallel_exe",
+        type=ast.literal_eval,
+        default=False,
+        help="The flag indicating whether to use ParallelExecutor.")
     parser.add_argument(
         'opts',
         help='See config.py for all options',
@@ -123,55 +140,152 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, bos_idx, n_head,
             trg_word, dtype="float32").reshape(-1, 1),
         place, [range(trg_word.shape[0] + 1)] * 2)
     trg_word = to_lodtensor(trg_word, place, [range(trg_word.shape[0] + 1)] * 2)
+    init_idx = np.asarray(range(len(insts)), dtype="int32")
 
     data_input_dict = dict(
         zip(data_input_names, [
             src_word, src_pos, src_slf_attn_bias, trg_word, init_score,
-            trg_src_attn_bias
+            init_idx, trg_src_attn_bias
         ]))
-    input_dict = dict(data_input_dict.items())
-    return input_dict
+    return data_input_dict
+
+
+def prepare_feed_dict_list(data_generator, count, place):
+    """
+    Prepare the list of feed dict for multi-devices.
+    """
+    feed_dict_list = []
+    if data_generator is not None:  # use_py_reader == False
+        data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields
+        data = next(data_generator)
+        for idx, data_buffer in enumerate(data):
+            data_input_dict = prepare_batch_input(
+                data_buffer, data_input_names, ModelHyperParams.eos_idx,
+                ModelHyperParams.bos_idx, ModelHyperParams.n_head,
+                ModelHyperParams.d_model, place)
+            feed_dict_list.append(data_input_dict)
+    return feed_dict_list if len(feed_dict_list) == count else None
+
+
+def py_reader_provider_wrapper(data_reader, place):
+    """
+    Data provider needed by fluid.layers.py_reader.
+    """
+
+    def py_reader_provider():
+        data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields
+        for batch_id, data in enumerate(data_reader()):
+            data_input_dict = prepare_batch_input(
+                data, data_input_names, ModelHyperParams.eos_idx,
+                ModelHyperParams.bos_idx, ModelHyperParams.n_head,
+                ModelHyperParams.d_model, place)
+            yield [data_input_dict[item] for item in data_input_names]
+
+    return py_reader_provider
 
 
-def fast_infer(test_data, trg_idx2word):
+def fast_infer(args):
     """
     Inference by beam search decoder based solely on Fluid operators.
     """
-    place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-
-    out_ids, out_scores = fast_decoder(
-        ModelHyperParams.src_vocab_size, ModelHyperParams.trg_vocab_size,
-        ModelHyperParams.max_length + 1, ModelHyperParams.n_layer,
-        ModelHyperParams.n_head, ModelHyperParams.d_key,
-        ModelHyperParams.d_value, ModelHyperParams.d_model,
-        ModelHyperParams.d_inner_hid, ModelHyperParams.prepostprocess_dropout,
-        ModelHyperParams.attention_dropout, ModelHyperParams.relu_dropout,
-        ModelHyperParams.preprocess_cmd, ModelHyperParams.postprocess_cmd,
-        ModelHyperParams.weight_sharing, InferTaskConfig.beam_size,
-        InferTaskConfig.max_out_len, ModelHyperParams.eos_idx)
+    out_ids, out_scores, pyreader = fast_decoder(
+        ModelHyperParams.src_vocab_size,
+        ModelHyperParams.trg_vocab_size,
+        ModelHyperParams.max_length + 1,
+        ModelHyperParams.n_layer,
+        ModelHyperParams.n_head,
+        ModelHyperParams.d_key,
+        ModelHyperParams.d_value,
+        ModelHyperParams.d_model,
+        ModelHyperParams.d_inner_hid,
+        ModelHyperParams.prepostprocess_dropout,
+        ModelHyperParams.attention_dropout,
+        ModelHyperParams.relu_dropout,
+        ModelHyperParams.preprocess_cmd,
+        ModelHyperParams.postprocess_cmd,
+        ModelHyperParams.weight_sharing,
+        InferTaskConfig.beam_size,
+        InferTaskConfig.max_out_len,
+        ModelHyperParams.eos_idx,
+        use_py_reader=args.use_py_reader)
+
+    # This is used here to set dropout to the test mode.
+    infer_program = fluid.default_main_program().clone(for_test=True)
+
+    if args.use_mem_opt:
+        fluid.memory_optimize(infer_program)
+
+    if InferTaskConfig.use_gpu:
+        place = fluid.CUDAPlace(0)
+        dev_count = fluid.core.get_cuda_device_count()
+    else:
+        place = fluid.CPUPlace()
+        dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
 
     fluid.io.load_vars(
         exe,
         InferTaskConfig.model_path,
         vars=[
-            var for var in fluid.default_main_program().list_vars()
+            var for var in infer_program.list_vars()
             if isinstance(var, fluid.framework.Parameter)
         ])
 
-    # This is used here to set dropout to the test mode.
-    infer_program = fluid.default_main_program().clone(for_test=True)
+    exec_strategy = fluid.ExecutionStrategy()
+    # For faster executor
+    exec_strategy.use_experimental_executor = True
+    exec_strategy.num_threads = 1
+    build_strategy = fluid.BuildStrategy()
+    infer_exe = fluid.ParallelExecutor(
+        use_cuda=TrainTaskConfig.use_gpu,
+        main_program=infer_program,
+        build_strategy=build_strategy,
+        exec_strategy=exec_strategy)
+
+    # data reader settings for inference
+    args.train_file_pattern = args.test_file_pattern
+    args.use_token_batch = False
+    args.sort_type = reader.SortType.NONE
+    args.shuffle = False
+    args.shuffle_batch = False
+    test_data = prepare_data_generator(
+        args,
+        is_test=False,
+        count=dev_count,
+        pyreader=pyreader,
+        py_reader_provider_wrapper=py_reader_provider_wrapper,
+        place=place)
+    if args.use_py_reader:
+        pyreader.start()
+        data_generator = None
+    else:
+        data_generator = test_data()
+    trg_idx2word = reader.DataReader.load_dict(
+        dict_path=args.trg_vocab_fpath, reverse=True)
 
-    for batch_id, data in enumerate(test_data.batch_generator()):
-        data_input = prepare_batch_input(
-            data, encoder_data_input_fields + fast_decoder_data_input_fields,
-            ModelHyperParams.eos_idx, ModelHyperParams.bos_idx,
-            ModelHyperParams.n_head, ModelHyperParams.d_model, place)
-        seq_ids, seq_scores = exe.run(infer_program,
-                                      feed=data_input,
-                                      fetch_list=[out_ids, out_scores],
-                                      return_numpy=False)
+    while True:
+        try:
+            feed_dict_list = prepare_feed_dict_list(data_generator, dev_count,
+                                                    place)
+            if args.use_parallel_exe:
+                seq_ids, seq_scores = infer_exe.run(
+                    fetch_list=[out_ids.name, out_scores.name],
+                    feed=feed_dict_list,
+                    return_numpy=False)
+            else:
+                seq_ids, seq_scores = exe.run(
+                    program=infer_program,
+                    fetch_list=[out_ids.name, out_scores.name],
+                    feed=feed_dict_list[0]
+                    if feed_dict_list is not None else None,
+                    return_numpy=False,
+                    use_program_cache=True)
+            seq_ids_list, seq_scores_list = [seq_ids], [
+                seq_scores
+            ] if isinstance(
+                seq_ids, paddle.fluid.core.LoDTensor) else (seq_ids, seq_scores)
+            for seq_ids, seq_scores in zip(seq_ids_list, seq_scores_list):
                 # How to parse the results:
                 # Suppose the lod of seq_ids is:
                 # [[0, 3, 6], [0, 12, 24, 40, 54, 67, 82]]
@@ -180,9 +294,10 @@ def fast_infer(args):
                 # from lod[1]:
                 # the first source sentence has 3 hyps; the lengths are 12, 12, 16
                 # the second source sentence has 3 hyps; the lengths are 14, 13, 15
-        hyps = [[] for i in range(len(data))]
-        scores = [[] for i in range(len(data))]
-        for i in range(len(seq_ids.lod()[0]) - 1):  # for each source sentence
+                hyps = [[] for i in range(len(seq_ids.lod()[0]) - 1)]
+                scores = [[] for i in range(len(seq_scores.lod()[0]) - 1)]
+                for i in range(len(seq_ids.lod()[0]) -
+                               1):  # for each source sentence
                     start = seq_ids.lod()[0][i]
                     end = seq_ids.lod()[0][i + 1]
                     for j in range(end - start):  # for each candidate
@@ -197,32 +312,13 @@ def fast_infer(args):
                         print(hyps[i][-1])
                         if len(hyps[i]) >= InferTaskConfig.n_best:
                             break
+        except (StopIteration, fluid.core.EOFException):
+            # The data pass is over.
+            if args.use_py_reader:
+                pyreader.reset()
+            break
-
-
-def infer(args, inferencer=fast_infer):
-    place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace()
-    test_data = reader.DataReader(
-        src_vocab_fpath=args.src_vocab_fpath,
-        trg_vocab_fpath=args.trg_vocab_fpath,
-        fpattern=args.test_file_pattern,
-        token_delimiter=args.token_delimiter,
-        use_token_batch=False,
-        batch_size=args.batch_size,
-        pool_size=args.pool_size,
-        sort_type=reader.SortType.NONE,
-        shuffle=False,
-        shuffle_batch=False,
-        start_mark=args.special_token[0],
-        end_mark=args.special_token[1],
-        unk_mark=args.special_token[2],
-        # count start and end tokens out
-        max_length=ModelHyperParams.max_length - 2,
-        clip_last_batch=False)
-    trg_idx2word = test_data.load_dict(
-        dict_path=args.trg_vocab_fpath, reverse=True)
-    inferencer(test_data, trg_idx2word)
 
 
 if __name__ == "__main__":
     args = parse_args()
-    infer(args)
+    fast_infer(args)
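The "How to parse the results" comment in the hunk above walks a two-level LoD. A self-contained sketch of that bookkeeping, with plain lists standing in for LoDTensor metadata (the numbers are taken from the comment, not from a real run):

```python
# Two-level LoD from the comment: 2 source sentences, 3 hyps each.
lod = [[0, 3, 6], [0, 12, 24, 40, 54, 67, 82]]

n_sentences = len(lod[0]) - 1
hyp_lengths = [[] for _ in range(n_sentences)]
for i in range(n_sentences):          # for each source sentence
    start, end = lod[0][i], lod[0][i + 1]
    for j in range(start, end):       # for each candidate hypothesis
        sub_start, sub_end = lod[1][j], lod[1][j + 1]
        hyp_lengths[i].append(sub_end - sub_start)

print(hyp_lengths)  # [[12, 12, 16], [14, 13, 15]]
```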
...@@ -7,6 +7,43 @@ import paddle.fluid.layers as layers ...@@ -7,6 +7,43 @@ import paddle.fluid.layers as layers
from config import * from config import *
def wrap_layer_with_block(layer, block_idx):
"""
Make layer define support indicating block, by which we can add layers
to other blocks within current block. This will make it easy to define
cache among while loop.
"""
class BlockGuard(object):
"""
BlockGuard class.
BlockGuard class is used to switch to the given block in a program by
using the Python `with` keyword.
"""
def __init__(self, block_idx=None, main_program=None):
self.main_program = fluid.default_main_program(
) if main_program is None else main_program
self.old_block_idx = self.main_program.current_block().idx
self.new_block_idx = block_idx
def __enter__(self):
self.main_program.current_block_idx = self.new_block_idx
def __exit__(self, exc_type, exc_val, exc_tb):
self.main_program.current_block_idx = self.old_block_idx
if exc_type is not None:
return False # re-raise exception
return True
def layer_wrapper(*args, **kwargs):
with BlockGuard(block_idx):
return layer(*args, **kwargs)
return layer_wrapper
def position_encoding_init(n_position, d_pos_vec): def position_encoding_init(n_position, d_pos_vec):
""" """
Generate the initial values for the sinusoid position encoding table. Generate the initial values for the sinusoid position encoding table.
...@@ -35,7 +72,9 @@ def multi_head_attention(queries, ...@@ -35,7 +72,9 @@ def multi_head_attention(queries,
d_model, d_model,
n_head=1, n_head=1,
dropout_rate=0., dropout_rate=0.,
cache=None): cache=None,
gather_idx=None,
static_kv=False):
""" """
Multi-Head Attention. Note that attn_bias is added to the logit before Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that computing softmax activiation to mask certain selected positions so that
...@@ -56,42 +95,86 @@ def multi_head_attention(queries, ...@@ -56,42 +95,86 @@ def multi_head_attention(queries,
size=d_key * n_head, size=d_key * n_head,
bias_attr=False, bias_attr=False,
num_flatten_dims=2) num_flatten_dims=2)
k = layers.fc(input=keys, # For encoder-decoder attention in inference, insert the ops and vars
# into global block to use as cache among beam search.
fc_layer = wrap_layer_with_block(
layers.fc, fluid.default_main_program().current_block()
.parent_idx) if cache is not None and static_kv else layers.fc
k = fc_layer(
input=keys,
size=d_key * n_head, size=d_key * n_head,
bias_attr=False, bias_attr=False,
num_flatten_dims=2) num_flatten_dims=2)
v = layers.fc(input=values, v = fc_layer(
input=values,
size=d_value * n_head, size=d_value * n_head,
bias_attr=False, bias_attr=False,
num_flatten_dims=2) num_flatten_dims=2)
return q, k, v return q, k, v
def __split_heads(x, n_head): def __split_heads_qkv(queries, keys, values, n_head, d_key, d_value):
""" """
Reshape the last dimension of inpunt tensor x so that it becomes two Reshape input tensors at the last dimension to split multi-heads
dimensions and then transpose. Specifically, input a tensor with shape and then transpose. Specifically, transform the input tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor [bs, max_sequence_length, n_head * hidden_dim] to the output tensor
with shape [bs, n_head, max_sequence_length, hidden_dim]. with shape [bs, n_head, max_sequence_length, hidden_dim].
""" """
if n_head == 1:
return x
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension # The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size. # size of the input as the output dimension size.
reshaped = layers.reshape( reshaped_q = layers.reshape(
x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True) x=queries, shape=[0, 0, n_head, d_key], inplace=True)
# permuate the dimensions into: # permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head] # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3]) q = layers.transpose(x=reshaped_q, perm=[0, 2, 1, 3])
# For encoder-decoder attention in inference, insert the ops and vars
# into global block to use as cache among beam search.
reshape_layer = wrap_layer_with_block(
layers.reshape,
fluid.default_main_program().current_block()
.parent_idx) if cache is not None and static_kv else layers.reshape
transpose_layer = wrap_layer_with_block(
layers.transpose,
fluid.default_main_program().current_block().
parent_idx) if cache is not None and static_kv else layers.transpose
reshaped_k = reshape_layer(
x=keys, shape=[0, 0, n_head, d_key], inplace=True)
k = transpose_layer(x=reshaped_k, perm=[0, 2, 1, 3])
reshaped_v = reshape_layer(
x=values, shape=[0, 0, n_head, d_value], inplace=True)
v = transpose_layer(x=reshaped_v, perm=[0, 2, 1, 3])
if cache is not None: # only for faster inference
if static_kv: # For encoder-decoder attention in inference
cache_k, cache_v = cache["static_k"], cache["static_v"]
# To init the static_k and static_v in cache.
# Maybe we can use condition_op(if_else) to do these at the first
# step in while loop to replace these, however it might be less
# efficient.
static_cache_init = wrap_layer_with_block(
layers.assign,
fluid.default_main_program().current_block().parent_idx)
static_cache_init(k, cache_k)
static_cache_init(v, cache_v)
else: # For decoder self-attention in inference
cache_k, cache_v = cache["k"], cache["v"]
# gather cell states corresponding to selected parent
select_k = layers.gather(cache_k, index=gather_idx)
select_v = layers.gather(cache_v, index=gather_idx)
if not static_kv:
# For self attention in inference, use cache and concat time steps.
select_k = layers.concat([select_k, k], axis=2)
select_v = layers.concat([select_v, v], axis=2)
# update cell states(caches) cached in global block
layers.assign(select_k, cache_k)
layers.assign(select_v, cache_v)
return q, select_k, select_v
return q, k, v
def __combine_heads(x): def __combine_heads(x):
""" """
Transpose and then reshape the last two dimensions of inpunt tensor x Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads. so that it becomes one dimension, which is reverse to __split_heads.
""" """
if len(x.shape) == 3: return x
if len(x.shape) != 4: if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.") raise ValueError("Input(x) should be a 4-D Tensor.")
...@@ -107,8 +190,7 @@ def multi_head_attention(queries, ...@@ -107,8 +190,7 @@ def multi_head_attention(queries,
""" """
Scaled Dot-Product Attention Scaled Dot-Product Attention
""" """
scaled_q = layers.scale(x=q, scale=d_key**-0.5) product = layers.matmul(x=q, y=k, transpose_y=True, alpha=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
if attn_bias: if attn_bias:
product += attn_bias product += attn_bias
weights = layers.softmax(product) weights = layers.softmax(product)
...@@ -122,23 +204,7 @@ def multi_head_attention(queries, ...@@ -122,23 +204,7 @@ def multi_head_attention(queries,
return out return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value) q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
q, k, v = __split_heads_qkv(q, k, v, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat(
[layers.reshape(
cache["k"], shape=[0, 0, d_key * n_head]), k],
axis=1)
v = cache["v"] = layers.concat(
[layers.reshape(
cache["v"], shape=[0, 0, d_value * n_head]), v],
axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_model, ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_model,
dropout_rate) dropout_rate)
...@@ -327,7 +393,8 @@ def decoder_layer(dec_input, ...@@ -327,7 +393,8 @@ def decoder_layer(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
cache=None): cache=None,
gather_idx=None):
""" The layer to be stacked in decoder part. """ The layer to be stacked in decoder part.
The structure of this module is similar to that in the encoder part except The structure of this module is similar to that in the encoder part except
a multi-head attention is added to implement encoder-decoder attention. a multi-head attention is added to implement encoder-decoder attention.
...@@ -342,7 +409,8 @@ def decoder_layer(dec_input, ...@@ -342,7 +409,8 @@ def decoder_layer(dec_input,
d_model, d_model,
n_head, n_head,
attention_dropout, attention_dropout,
cache, ) cache=cache,
gather_idx=gather_idx)
slf_attn_output = post_process_layer( slf_attn_output = post_process_layer(
dec_input, dec_input,
slf_attn_output, slf_attn_output,
...@@ -358,7 +426,10 @@ def decoder_layer(dec_input, ...@@ -358,7 +426,10 @@ def decoder_layer(dec_input,
d_value, d_value,
d_model, d_model,
n_head, n_head,
attention_dropout, ) attention_dropout,
cache=cache,
gather_idx=gather_idx,
static_kv=True)
enc_attn_output = post_process_layer( enc_attn_output = post_process_layer(
slf_attn_output, slf_attn_output,
enc_attn_output, enc_attn_output,
...@@ -393,7 +464,8 @@ def decoder(dec_input, ...@@ -393,7 +464,8 @@ def decoder(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
caches=None): caches=None,
gather_idx=None):
""" """
The decoder is composed of a stack of identical decoder_layer layers. The decoder is composed of a stack of identical decoder_layer layers.
""" """
...@@ -413,7 +485,8 @@ def decoder(dec_input, ...@@ -413,7 +485,8 @@ def decoder(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
cache=None if caches is None else caches[i]) cache=None if caches is None else caches[i],
gather_idx=gather_idx)
dec_input = dec_output dec_input = dec_output
dec_output = pre_process_layer(dec_output, preprocess_cmd, dec_output = pre_process_layer(dec_output, preprocess_cmd,
prepostprocess_dropout) prepostprocess_dropout)
...@@ -610,7 +683,8 @@ def wrap_decoder(trg_vocab_size, ...@@ -610,7 +683,8 @@ def wrap_decoder(trg_vocab_size,
weight_sharing, weight_sharing,
dec_inputs=None, dec_inputs=None,
enc_output=None, enc_output=None,
caches=None): caches=None,
gather_idx=None):
""" """
The wrapper assembles together all needed layers for the decoder. The wrapper assembles together all needed layers for the decoder.
""" """
...@@ -646,7 +720,8 @@ def wrap_decoder(trg_vocab_size, ...@@ -646,7 +720,8 @@ def wrap_decoder(trg_vocab_size,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
caches=caches) caches=caches,
gather_idx=gather_idx)
# Reshape to 2D tensor to use GEMM instead of BatchedGEMM # Reshape to 2D tensor to use GEMM instead of BatchedGEMM
dec_output = layers.reshape( dec_output = layers.reshape(
dec_output, shape=[-1, dec_output.shape[-1]], inplace=True) dec_output, shape=[-1, dec_output.shape[-1]], inplace=True)
...@@ -666,8 +741,7 @@ def wrap_decoder(trg_vocab_size, ...@@ -666,8 +741,7 @@ def wrap_decoder(trg_vocab_size,
return predict return predict
def fast_decode( def fast_decode(src_vocab_size,
src_vocab_size,
trg_vocab_size, trg_vocab_size,
max_in_len, max_in_len,
n_layer, n_layer,
...@@ -684,62 +758,93 @@ def fast_decode( ...@@ -684,62 +758,93 @@ def fast_decode(
weight_sharing, weight_sharing,
beam_size, beam_size,
max_out_len, max_out_len,
eos_idx, ): eos_idx,
use_py_reader=False):
""" """
Use beam search to decode. Caches will be used to store states of history Use beam search to decode. Caches will be used to store states of history
steps which can make the decoding faster. steps which can make the decoding faster.
""" """
data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields
if use_py_reader:
all_inputs, reader = make_all_py_reader_inputs(data_input_names)
else:
all_inputs = make_all_inputs(data_input_names)
enc_inputs_len = len(encoder_data_input_fields)
dec_inputs_len = len(fast_decoder_data_input_fields)
enc_inputs = all_inputs[0:enc_inputs_len]
dec_inputs = all_inputs[enc_inputs_len:enc_inputs_len + dec_inputs_len]
enc_output = wrap_encoder( enc_output = wrap_encoder(
src_vocab_size, max_in_len, n_layer, n_head, d_key, d_value, d_model, src_vocab_size,
d_inner_hid, prepostprocess_dropout, attention_dropout, relu_dropout, max_in_len,
preprocess_cmd, postprocess_cmd, weight_sharing) n_layer,
start_tokens, init_scores, trg_src_attn_bias = make_all_inputs( n_head,
fast_decoder_data_input_fields) d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
preprocess_cmd,
postprocess_cmd,
weight_sharing,
enc_inputs, )
start_tokens, init_scores, parent_idx, trg_src_attn_bias = dec_inputs
def beam_search(): def beam_search():
max_len = layers.fill_constant( max_len = layers.fill_constant(
shape=[1], dtype=start_tokens.dtype, value=max_out_len) shape=[1],
dtype=start_tokens.dtype,
value=max_out_len,
force_cpu=True)
step_idx = layers.fill_constant( step_idx = layers.fill_constant(
shape=[1], dtype=start_tokens.dtype, value=0) shape=[1], dtype=start_tokens.dtype, value=0, force_cpu=True)
cond = layers.less_than(x=step_idx, y=max_len) cond = layers.less_than(x=step_idx, y=max_len) # default force_cpu=True
while_op = layers.While(cond) while_op = layers.While(cond)
# array states will be stored for each step. # array states will be stored for each step.
ids = layers.array_write( ids = layers.array_write(
layers.reshape(start_tokens, (-1, 1)), step_idx) layers.reshape(start_tokens, (-1, 1)), step_idx)
scores = layers.array_write(init_scores, step_idx) scores = layers.array_write(init_scores, step_idx)
# cell states will be overwrited at each step. # cell states will be overwrited at each step.
# caches contains states of history steps to reduce redundant # caches contains states of history steps in decoder self-attention
# computation in decoder. # and static encoder output projections in encoder-decoder attention
caches = [{ # to reduce redundant computation.
"k": layers.fill_constant_batch_size_like( caches = [
{
"k": # for self attention
layers.fill_constant_batch_size_like(
input=start_tokens, input=start_tokens,
shape=[-1, 0, d_model], shape=[-1, n_head, 0, d_key],
dtype=enc_output.dtype, dtype=enc_output.dtype,
value=0), value=0),
"v": layers.fill_constant_batch_size_like( "v": # for self attention
layers.fill_constant_batch_size_like(
input=start_tokens, input=start_tokens,
shape=[-1, 0, d_model], shape=[-1, n_head, 0, d_value],
dtype=enc_output.dtype, dtype=enc_output.dtype,
value=0) value=0),
} for i in range(n_layer)] "static_k": # for encoder-decoder attention
layers.create_tensor(dtype=enc_output.dtype),
"static_v": # for encoder-decoder attention
layers.create_tensor(dtype=enc_output.dtype)
} for i in range(n_layer)
]
        with while_op.block():
            pre_ids = layers.array_read(array=ids, i=step_idx)
-            pre_ids = layers.reshape(pre_ids, (-1, 1, 1))
+            # Since beam_search_op doesn't enforce pre_ids' shape, we can do
+            # an inplace reshape here, which actually changes the shape of pre_ids.
+            pre_ids = layers.reshape(pre_ids, (-1, 1, 1), inplace=True)
            pre_scores = layers.array_read(array=scores, i=step_idx)
-            # sequence_expand can gather sequences according to lod and thus can be
-            # used in beam search to sift states corresponding to selected ids.
-            pre_src_attn_bias = layers.sequence_expand(
-                x=trg_src_attn_bias, y=pre_scores)
-            pre_enc_output = layers.sequence_expand(x=enc_output, y=pre_scores)
-            pre_caches = [{
-                "k": layers.sequence_expand(
-                    x=cache["k"], y=pre_scores),
-                "v": layers.sequence_expand(
-                    x=cache["v"], y=pre_scores),
-            } for cache in caches]
+            # gather cell states corresponding to the selected parent
+            pre_src_attn_bias = layers.gather(
+                trg_src_attn_bias, index=parent_idx)
            pre_pos = layers.elementwise_mul(
                x=layers.fill_constant_batch_size_like(
-                    input=pre_enc_output,  # can't use pre_ids here since it has lod
+                    input=pre_src_attn_bias,  # can't use a lod tensor here
                    value=1,
                    shape=[-1, 1, 1],
                    dtype=pre_ids.dtype),
@@ -761,35 +866,33 @@ def fast_decode(
                postprocess_cmd,
                weight_sharing,
                dec_inputs=(pre_ids, pre_pos, None, pre_src_attn_bias),
-                enc_output=pre_enc_output,
-                caches=pre_caches)
+                enc_output=enc_output,
+                caches=caches,
+                gather_idx=parent_idx)
+            # intra-beam topK
            topk_scores, topk_indices = layers.topk(
                input=layers.softmax(logits), k=beam_size)
            accu_scores = layers.elementwise_add(
-                x=layers.log(topk_scores),
-                y=layers.reshape(
-                    pre_scores, shape=[-1]),
-                axis=0)
-            # beam_search op uses lod to distinguish branches.
+                x=layers.log(topk_scores), y=pre_scores, axis=0)
+            # beam_search op uses lod to differentiate branches.
            topk_indices = layers.lod_reset(topk_indices, pre_ids)
-            selected_ids, selected_scores = layers.beam_search(
+            # topK reduction across beams; also contains special handling of
+            # ended beams and ended sentences (batch reduction)
+            selected_ids, selected_scores, gather_idx = layers.beam_search(
                pre_ids=pre_ids,
                pre_scores=pre_scores,
                ids=topk_indices,
                scores=accu_scores,
                beam_size=beam_size,
-                end_id=eos_idx)
+                end_id=eos_idx,
+                return_parent_idx=True)
            layers.increment(x=step_idx, value=1.0, in_place=True)
-            # update states
+            # cell states (caches) have been updated inside wrap_decoder,
+            # so only the beam search states need updating here.
            layers.array_write(selected_ids, i=step_idx, array=ids)
            layers.array_write(selected_scores, i=step_idx, array=scores)
+            layers.assign(gather_idx, parent_idx)
            layers.assign(pre_src_attn_bias, trg_src_attn_bias)
-            layers.assign(pre_enc_output, enc_output)
-            for i in range(n_layer):
-                layers.assign(pre_caches[i]["k"], caches[i]["k"])
-                layers.assign(pre_caches[i]["v"], caches[i]["v"])
            length_cond = layers.less_than(x=step_idx, y=max_len)
            finish_cond = layers.logical_not(layers.is_empty(x=selected_ids))
            layers.logical_and(x=length_cond, y=finish_cond, out=cond)
@@ -799,4 +902,4 @@ def fast_decode(
        return finished_ids, finished_scores

    finished_ids, finished_scores = beam_search()
-    return finished_ids, finished_scores
+    return finished_ids, finished_scores, reader if use_py_reader else None
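For reference, the selection that `layers.beam_search` performs each step, including the `parent_idx` it now returns, can be sketched in plain Python (toy scores, not the Fluid operator). The `parent_idx` here plays the role of the index the rewritten loop feeds to `layers.gather` when pruning beam states:

```
import numpy as np

# One toy beam-search step: 2 beams, vocabulary of 5, beam_size 2.
beam_size = 2
pre_scores = np.array([-0.1, -0.5])      # accumulated log-probs per beam
step_probs = np.array([
    [0.5, 0.2, 0.1, 0.1, 0.1],           # next-token probs for beam 0
    [0.1, 0.6, 0.1, 0.1, 0.1],           # next-token probs for beam 1
])

# accumulate log-probs, then take the global top-k over (beam, token) pairs
total = pre_scores[:, None] + np.log(step_probs)
flat = total.reshape(-1)
top = np.argsort(flat)[::-1][:beam_size]
parent_idx = top // step_probs.shape[1]   # which beam each winner came from
selected_ids = top % step_probs.shape[1]  # which token was chosen
selected_scores = flat[top]
print(parent_idx, selected_ids, selected_scores)
# [0 1] [0 1] [-0.79314718 -1.01082562]
```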
@@ -186,7 +186,7 @@ def main(args):
    # Since the token number differs among devices, customize gradient scale to
    # use token-average cost among multi-devices, and the gradient scale is
    # `1 / token_number` for average cost.
-    build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized
+    # build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized
    train_exe = fluid.ParallelExecutor(
        use_cuda=TrainTaskConfig.use_gpu,
        loss_name=avg_cost.name,
...
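The now-disabled strategy matters because devices see different token counts, so a per-device average followed by a cross-device average is not the token average. A small illustration with made-up numbers:

```
# Two devices with different token counts; made-up per-token losses.
dev_losses = [[1.0, 2.0, 3.0], [4.0, 5.0]]

# mean per device, then mean across devices (plain data-parallel scaling)
per_dev = sum(sum(l) / len(l) for l in dev_losses) / len(dev_losses)

# global token average, which `1 / token_number` gradient scaling matches
all_tokens = [x for l in dev_losses for x in l]
token_avg = sum(all_tokens) / len(all_tokens)

print(per_dev)    # 3.25
print(token_avg)  # 3.0
```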
@@ -10,7 +10,6 @@ import time

import numpy as np
import paddle.fluid as fluid
-from paddle.fluid.transpiler.details import program_to_code

import reader
from config import *
@@ -258,7 +257,12 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, trg_pad_idx,
    return data_input_dict, np.asarray([num_token], dtype="float32")


-def prepare_data_generator(args, is_test, count, pyreader):
+def prepare_data_generator(args,
+                           is_test,
+                           count,
+                           pyreader,
+                           py_reader_provider_wrapper,
+                           place=None):
    """
    Data generator wrapper for DataReader. If py_reader is used, set the data
    provider for py_reader.
    """
@@ -319,7 +323,7 @@ def prepare_data_generator(args, is_test, count, pyreader):
        data_reader = split(data_reader, count)
    if args.use_py_reader:
        pyreader.decorate_tensor_provider(
-            py_reader_provider_wrapper(data_reader))
+            py_reader_provider_wrapper(data_reader, place))
        data_reader = None
    else:  # Data generator for multi-devices
        data_reader = stack(data_reader, count)
@@ -357,7 +361,7 @@ def prepare_feed_dict_list(data_generator, init_flag, count):
    return feed_dict_list if len(feed_dict_list) == count else None


-def py_reader_provider_wrapper(data_reader):
+def py_reader_provider_wrapper(data_reader, place):
    """
    Data provider needed by fluid.layers.py_reader.
    """
@@ -370,8 +374,7 @@ def py_reader_provider_wrapper(data_reader):
                data, data_input_names, ModelHyperParams.eos_idx,
                ModelHyperParams.eos_idx, ModelHyperParams.n_head,
                ModelHyperParams.d_model)
-            total_dict = dict(data_input_dict.items())
-            yield [total_dict[item] for item in data_input_names]
+            yield [data_input_dict[item] for item in data_input_names]

    return py_reader_provider
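The wrapper above is a closure factory: `decorate_tensor_provider` expects a zero-argument generator function, so extra state such as the reader and the new `place` argument has to be captured at wrap time. A generic sketch of the pattern, with hypothetical names:

```
def make_provider(data_reader, place):
    """Capture the reader and device place in a closure; the provider
    itself takes no arguments, which is what py_reader expects."""

    def provider():
        for fields in data_reader():
            # arrange the raw fields into the order of the feed list;
            # `place` would be used here to build device-side tensors
            yield list(fields)

    return provider

# usage (hypothetical reader_fn):
# pyreader.decorate_tensor_provider(make_provider(reader_fn, place))
```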
@@ -406,7 +409,11 @@ def test_context(exe, train_exe, dev_count):
        is_test=True)
    test_prog = test_prog.clone(for_test=True)
    test_data = prepare_data_generator(
-        args, is_test=True, count=dev_count, pyreader=pyreader)
+        args,
+        is_test=True,
+        count=dev_count,
+        pyreader=pyreader,
+        py_reader_provider_wrapper=py_reader_provider_wrapper)

    exe.run(startup_prog)  # to init pyreader for testing

    if TrainTaskConfig.ckpt_path:
@@ -477,7 +484,11 @@ def train_loop(exe,
    logging.info("begin reader")
    train_data = prepare_data_generator(
-        args, is_test=False, count=dev_count, pyreader=pyreader)
+        args,
+        is_test=False,
+        count=dev_count,
+        pyreader=pyreader,
+        py_reader_provider_wrapper=py_reader_provider_wrapper)

    # For faster executor
    exec_strategy = fluid.ExecutionStrategy()
...
@@ -136,6 +136,7 @@ def main(train_data_file,
                     " pass_f1_score:" + str(test_pass_f1_score))

        save_dirname = os.path.join(model_save_dir, "params_pass_%d" % pass_id)
-        fluid.io.save_inference_model(save_dirname, ['word', 'mark'],
-                                      crf_decode, exe)
+        if "CE_MODE_X" not in os.environ:
+            fluid.io.save_inference_model(save_dirname, ['word', 'mark'],
+                                          crf_decode, exe)
...
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Build raw data
"""
from __future__ import print_function
import sys
import os
import random
import re
data_type = sys.argv[1]
if not (data_type == "train" or data_type == "test"):
print("python %s [test/train]" % sys.argv[0], file=sys.stderr)
sys.exit(-1)
pos_folder = "aclImdb/" + data_type + "/pos/"
neg_folder = "aclImdb/" + data_type + "/neg/"
pos_train_list = [(pos_folder + x, "1") for x in os.listdir(pos_folder)]
neg_train_list = [(neg_folder + x, "0") for x in os.listdir(neg_folder)]
all_train_list = pos_train_list + neg_train_list
random.shuffle(all_train_list)
def load_dict(dictfile):
"""
Load word id dict
"""
vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
vocab[line.strip()] = str(wid)
wid += 1
return vocab
vocab = load_dict("aclImdb/imdb.vocab")
unk_id = str(len(vocab))
print("vocab size: ", len(vocab), file=sys.stderr)
pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
for fitem in all_train_list:
label = str(fitem[1])
fname = fitem[0]
with open(fname) as f:
sent = f.readline().lower().replace("<br />", " ").strip()
out_s = "%s | %s" % (sent, label)
print(out_s, file=sys.stdout)
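The compiled `pattern` above splits on punctuation and whitespace; in this excerpt it is compiled but never applied to `sent`. If it were used for tokenization, the effect would be as follows (a plain `re.split` demonstration, not part of the original script):

```
import re

pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
sent = "a good movie, really!"
# split keeps the delimiters because the pattern is one capture group;
# dropping empty/whitespace pieces leaves the tokens
tokens = [t for t in pattern.split(sent) if t.strip()]
print(tokens)  # ['a', 'good', 'movie', ',', 'really', '!']
```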
#!/bin/bash
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
#cudaid=${face_detection:=0} # use 0-th card as default
#export CUDA_VISIBLE_DEVICES=$cudaid
export CPU_NUM=1
export NUM_THREADS=1
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
export CPU_NUM=1
export NUM_THREADS=8
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
export CPU_NUM=8
export NUM_THREADS=8
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
# this file is only used for continuous evaluation test!
import os
import sys
sys.path.append(os.environ['ceroot'])
from kpi import CostKpi
from kpi import DurationKpi
from kpi import AccKpi
each_pass_duration_cpu1_thread1_kpi = DurationKpi('each_pass_duration_cpu1_thread1', 0.08, 0, actived=True)
train_loss_cpu1_thread1_kpi = CostKpi('train_loss_cpu1_thread1', 0.08, 0)
train_auc_val_cpu1_thread1_kpi = AccKpi('train_auc_val_cpu1_thread1', 0.08, 0)
train_batch_auc_val_cpu1_thread1_kpi = AccKpi('train_batch_auc_val_cpu1_thread1', 0.08, 0)
each_pass_duration_cpu1_thread8_kpi = DurationKpi('each_pass_duration_cpu1_thread8', 0.08, 0, actived=True)
train_loss_cpu1_thread8_kpi = CostKpi('train_loss_cpu1_thread8', 0.08, 0)
train_auc_val_cpu1_thread8_kpi = AccKpi('train_auc_val_cpu1_thread8', 0.08, 0)
train_batch_auc_val_cpu1_thread8_kpi = AccKpi('train_batch_auc_val_cpu1_thread8', 0.08, 0)
each_pass_duration_cpu8_thread8_kpi = DurationKpi('each_pass_duration_cpu8_thread8', 0.08, 0, actived=True)
train_loss_cpu8_thread8_kpi = CostKpi('train_loss_cpu8_thread8', 0.08, 0)
train_auc_val_cpu8_thread8_kpi = AccKpi('train_auc_val_cpu8_thread8', 0.08, 0)
train_batch_auc_val_cpu8_thread8_kpi = AccKpi('train_batch_auc_val_cpu8_thread8', 0.08, 0)
tracking_kpis = [
each_pass_duration_cpu1_thread1_kpi,
train_loss_cpu1_thread1_kpi,
train_auc_val_cpu1_thread1_kpi,
train_batch_auc_val_cpu1_thread1_kpi,
each_pass_duration_cpu1_thread8_kpi,
train_loss_cpu1_thread8_kpi,
train_auc_val_cpu1_thread8_kpi,
train_batch_auc_val_cpu1_thread8_kpi,
each_pass_duration_cpu8_thread8_kpi,
train_loss_cpu8_thread8_kpi,
train_auc_val_cpu8_thread8_kpi,
train_batch_auc_val_cpu8_thread8_kpi,
]
def parse_log(log):
    '''
    This method should be implemented by model developers.
    Each KPI line in the log must carry three tab-separated fields,
    "kpis\t<kpi_name>\t<kpi_value>", for example:
    "
    kpis\ttrain_cost\t1.0
    kpis\ttest_cost\t1.0
    kpis\ttrain_acc\t1.2
    "
    '''
for line in log.split('\n'):
fs = line.strip().split('\t')
print(fs)
if len(fs) == 3 and fs[0] == 'kpis':
kpi_name = fs[1]
kpi_value = float(fs[2])
yield kpi_name, kpi_value
def log_to_ce(log):
kpi_tracker = {}
for kpi in tracking_kpis:
kpi_tracker[kpi.name] = kpi
for (kpi_name, kpi_value) in parse_log(log):
print(kpi_name, kpi_value)
kpi_tracker[kpi_name].add_record(kpi_value)
kpi_tracker[kpi_name].persist()
if __name__ == '__main__':
log = sys.stdin.read()
log_to_ce(log)
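The `kpis\t<name>\t<value>` lines that the training scripts print (see the CE blocks later in this diff) are exactly what `parse_log` picks out; anything else is echoed and skipped. A quick demonstration, assuming the `parse_log` defined above is in scope:

```
sample_log = "\n".join([
    "pass_id: 0, pass_time_cost: 12.3",             # not a KPI line: skipped
    "kpis\ttrain_loss_cpu1_thread1\t0.42",          # parsed
    "kpis\teach_pass_duration_cpu1_thread1\t12.3",  # parsed
])

# parse_log echoes every split line, then yields only the KPI pairs
for name, value in parse_log(sample_log):
    print(name, value)
# -> train_loss_cpu1_thread1 0.42
# -> each_pass_duration_cpu1_thread1 12.3
```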
@@ -61,14 +61,14 @@ def infer():
    startup_program = fluid.framework.Program()
    test_program = fluid.framework.Program()
    with fluid.framework.program_guard(test_program, startup_program):
-        loss, data_list, auc_var, batch_auc_var = ctr_dnn_model(args.embedding_size, args.sparse_feature_dim)
+        loss, auc_var, batch_auc_var, _, data_list = ctr_dnn_model(args.embedding_size, args.sparse_feature_dim, False)

    exe = fluid.Executor(place)

    feeder = fluid.DataFeeder(feed_list=data_list, place=place)

-    with fluid.scope_guard(inference_scope):
-        [inference_program, _, fetch_targets] = fluid.io.load_inference_model(args.model_path, exe)
+    fluid.io.load_persistables(executor=exe, dirname=args.model_path,
+                               main_program=fluid.default_main_program())

    def set_zero(var_name):
        param = inference_scope.var(var_name).get_tensor()
@@ -80,9 +80,9 @@ def infer():
            set_zero(name)

        for batch_id, data in enumerate(test_reader()):
-            loss_val, auc_val = exe.run(inference_program,
+            loss_val, auc_val = exe.run(test_program,
                                        feed=feeder.feed(data),
-                                        fetch_list=fetch_targets)
+                                        fetch_list=[loss, auc_var])
            if batch_id % 100 == 0:
                logger.info("TEST --> batch: {} loss: {} auc: {}".format(batch_id, loss_val / args.batch_size, auc_val))
...
@@ -104,7 +104,7 @@ def ctr_deepfm_model(factor_size, sparse_feature_dim, dense_feature_dim, sparse_
    return avg_cost, auc_var, batch_auc_var, py_reader


-def ctr_dnn_model(embedding_size, sparse_feature_dim):
+def ctr_dnn_model(embedding_size, sparse_feature_dim, use_py_reader=True):

    def embedding_layer(input):
        return fluid.layers.embedding(
@@ -126,10 +126,12 @@ def ctr_dnn_model(embedding_size, sparse_feature_dim):
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

-    datas = [dense_input] + sparse_input_ids + [label]
+    words = [dense_input] + sparse_input_ids + [label]

-    py_reader = fluid.layers.create_py_reader_by_data(capacity=64,
-                                                      feed_list=datas,
-                                                      name='py_reader',
-                                                      use_double_buffer=True)
-    words = fluid.layers.read_file(py_reader)
+    py_reader = None
+    if use_py_reader:
+        py_reader = fluid.layers.create_py_reader_by_data(capacity=64,
+                                                          feed_list=words,
+                                                          name='py_reader',
+                                                          use_double_buffer=True)
+        words = fluid.layers.read_file(py_reader)
@@ -156,4 +158,4 @@ def ctr_dnn_model(embedding_size, sparse_feature_dim):
    auc_var, batch_auc_var, auc_states = \
        fluid.layers.auc(input=predict, label=words[-1], num_thresholds=2 ** 12, slide_steps=20)

-    return avg_cost, auc_var, batch_auc_var, py_reader
+    return avg_cost, auc_var, batch_auc_var, py_reader, words
@@ -46,7 +46,7 @@ class CriteoDataset(Dataset):
        return self._reader_creator(file_list, True, trainer_num, trainer_id)

    def test(self, file_list):
-        return self._reader_creator(file_list, False, -1)
+        return self._reader_creator(file_list, False, 1, 0)

    def infer(self, file_list):
-        return self._reader_creator(file_list, False, -1)
+        return self._reader_creator(file_list, False, 1, 0)
@@ -107,12 +107,22 @@ def parse_args():
        type=int,
        default=1,
        help='The num of trainers, (default: 1)')
+    parser.add_argument(
+        '--enable_ce',
+        action='store_true',
+        help='If set, run the task with continuous evaluation logs.')

    return parser.parse_args()


def train_loop(args, train_program, py_reader, loss, auc_var, batch_auc_var,
               trainer_num, trainer_id):
+    if args.enable_ce:
+        SEED = 102
+        train_program.random_seed = SEED
+        fluid.default_startup_program().random_seed = SEED
+
    dataset = reader.CriteoDataset(args.sparse_feature_dim)
    train_reader = paddle.batch(
        paddle.reader.shuffle(
@@ -146,6 +156,7 @@ def train_loop(args, train_program, py_reader, loss, auc_var, batch_auc_var,
    exe.run(fluid.default_startup_program())

+    total_time = 0
    for pass_id in range(args.num_passes):
        pass_start = time.time()
        batch_id = 0
@@ -163,15 +174,32 @@ def train_loop(args, train_program, py_reader, loss, auc_var, batch_auc_var,
                if batch_id % 1000 == 0 and batch_id != 0:
                    model_dir = args.model_output_dir + '/batch-' + str(batch_id)
                    if args.trainer_id == 0:
-                        fluid.io.save_inference_model(model_dir, data_name_list, [loss, auc_var], exe)
+                        fluid.io.save_persistables(executor=exe, dirname=model_dir,
+                                                   main_program=fluid.default_main_program())
                batch_id += 1
        except fluid.core.EOFException:
            py_reader.reset()
        print("pass_id: %d, pass_time_cost: %f" % (pass_id, time.time() - pass_start))
+        total_time += time.time() - pass_start

        model_dir = args.model_output_dir + '/pass-' + str(pass_id)
        if args.trainer_id == 0:
-            fluid.io.save_inference_model(model_dir, data_name_list, [loss, auc_var], exe)
+            fluid.io.save_persistables(executor=exe, dirname=model_dir,
+                                       main_program=fluid.default_main_program())
+
+    # only for ce
+    if args.enable_ce:
+        threads_num, cpu_num = get_cards(args)
+        epoch_idx = args.num_passes
+        print("kpis\teach_pass_duration_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, total_time / epoch_idx))
+        print("kpis\ttrain_loss_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, loss_val / args.batch_size))
+        print("kpis\ttrain_auc_val_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, auc_val))
+        print("kpis\ttrain_batch_auc_val_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, batch_auc_val))


def train():
@@ -180,7 +208,7 @@ def train():
    if not os.path.isdir(args.model_output_dir):
        os.mkdir(args.model_output_dir)
-    loss, auc_var, batch_auc_var, py_reader = ctr_dnn_model(args.embedding_size, args.sparse_feature_dim)
+    loss, auc_var, batch_auc_var, py_reader, _ = ctr_dnn_model(args.embedding_size, args.sparse_feature_dim)
    optimizer = fluid.optimizer.Adam(learning_rate=1e-4)
    optimizer.minimize(loss)
    if args.cloud_train:
@@ -224,5 +252,11 @@ def train():
    )

+
+def get_cards(args):
+    threads_num = os.environ.get('NUM_THREADS', 1)
+    cpu_num = os.environ.get('CPU_NUM', 1)
+    return int(threads_num), int(cpu_num)
+
if __name__ == '__main__':
    train()
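The switch from `save_inference_model` to `save_persistables` changes what is stored: persistables are the raw parameter variables of a program, so the consumer must rebuild the same network and load into it, which is what the revised infer.py does. A condensed sketch of the pairing, using a toy one-layer network and assuming the same Fluid 1.x environment as the rest of this diff:

```
import paddle.fluid as fluid

# toy network standing in for ctr_dnn_model
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.fc(input=x, size=1)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# training side: dump raw parameter variables of the main program
fluid.io.save_persistables(executor=exe, dirname='model_dir',
                           main_program=fluid.default_main_program())

# inference side: rebuild the *same* network first, then load into it
fluid.io.load_persistables(executor=exe, dirname='model_dir',
                           main_program=fluid.default_main_program())
```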
@@ -79,7 +79,7 @@ SessionId ItemId Time
2 214757407 1396850438.247
```

-The data format needs to be converted; run the script
+The data format needs to be converted; run the following script:
```
python convert_format.py
```
@@ -101,7 +101,7 @@ python convert_format.py

Generate the dictionary and the corresponding paddle input files from the training and test files:

-Note that the training files must be placed in one directory and the test files in another; multiple training files are supported.
+Place the training files in the raw_train_data directory and the test files in the raw_test_data directory, then generate the corresponding train_data, test_data and vocab.txt files.
```
python text2paddle.py raw_train_data/ raw_test_data/ train_data test_data vocab.txt
```
...
@@ -13,22 +13,26 @@ import net

SEED = 102


def parse_args():
    parser = argparse.ArgumentParser("gru4rec benchmark.")
    parser.add_argument(
-        '--train_dir', type=str, default='train_data', help='train file address')
-    parser.add_argument(
-        '--vocab_path', type=str, default='vocab.txt', help='vocab file address')
-    parser.add_argument(
-        '--is_local', type=int, default=1, help='whether local')
-    parser.add_argument(
-        '--hid_size', type=int, default=100, help='hid size')
+        '--train_dir',
+        type=str,
+        default='train_data',
+        help='train file address')
+    parser.add_argument(
+        '--vocab_path',
+        type=str,
+        default='vocab.txt',
+        help='vocab file address')
+    parser.add_argument('--is_local', type=int, default=1, help='whether local')
+    parser.add_argument('--hid_size', type=int, default=100, help='hid size')
    parser.add_argument(
        '--model_dir', type=str, default='model_recall20', help='model dir')
    parser.add_argument(
        '--batch_size', type=int, default=5, help='num of batch size')
-    parser.add_argument(
-        '--pass_num', type=int, default=10, help='num of epoch')
+    parser.add_argument('--pass_num', type=int, default=10, help='num of epoch')
    parser.add_argument(
        '--print_batch', type=int, default=10, help='num of print batch')
    parser.add_argument(
@@ -40,19 +44,33 @@ def parse_args():
    parser.add_argument(
        '--role', type=str, default='pserver', help='trainer or pserver')
    parser.add_argument(
-        '--endpoints', type=str, default='127.0.0.1:6000', help='The pserver endpoints, like: 127.0.0.1:6000, 127.0.0.1:6001')
-    parser.add_argument(
-        '--current_endpoint', type=str, default='127.0.0.1:6000', help='The current_endpoint')
-    parser.add_argument(
-        '--trainer_id', type=int, default=0, help='trainer id; only trainer_id=0 saves the model')
-    parser.add_argument(
-        '--trainers', type=int, default=1, help='The num of trainers, (default: 1)')
+        '--endpoints',
+        type=str,
+        default='127.0.0.1:6000',
+        help='The pserver endpoints, like: 127.0.0.1:6000, 127.0.0.1:6001')
+    parser.add_argument(
+        '--current_endpoint',
+        type=str,
+        default='127.0.0.1:6000',
+        help='The current_endpoint')
+    parser.add_argument(
+        '--trainer_id',
+        type=int,
+        default=0,
+        help='trainer id; only trainer_id=0 saves the model')
+    parser.add_argument(
+        '--trainers',
+        type=int,
+        default=1,
+        help='The num of trainers, (default: 1)')
    args = parser.parse_args()
    return args


def get_cards(args):
    return args.num_devices


def train():
    """ do training """
    args = parse_args()
@@ -67,7 +85,8 @@ def train():
        buffer_size=1000, word_freq_threshold=0, is_train=True)

    # Train program
-    src_wordseq, dst_wordseq, avg_cost, acc = net.network(vocab_size=vocab_size, hid_size=hid_size)
+    src_wordseq, dst_wordseq, avg_cost, acc = net.all_vocab_network(
+        vocab_size=vocab_size, hid_size=hid_size)

    # Optimization to minimize loss
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=args.base_lr)
@@ -97,8 +116,10 @@ def train():
            lod_dst_wordseq = utils.to_lodtensor([dat[1] for dat in data],
                                                 place)
            ret_avg_cost = exe.run(main_program,
-                                   feed={ "src_wordseq": lod_src_wordseq,
-                                          "dst_wordseq": lod_dst_wordseq},
+                                   feed={
+                                       "src_wordseq": lod_src_wordseq,
+                                       "dst_wordseq": lod_dst_wordseq
+                                   },
                                   fetch_list=fetch_list)
            avg_ppl = np.exp(ret_avg_cost[0])
            newest_ppl = np.mean(avg_ppl)
@@ -113,7 +134,8 @@ def train():
            feed_var_names = ["src_wordseq", "dst_wordseq"]
            fetch_vars = [avg_cost, acc]
            if args.trainer_id == 0:
-                fluid.io.save_inference_model(save_dir, feed_var_names, fetch_vars, exe)
+                fluid.io.save_inference_model(save_dir, feed_var_names,
+                                              fetch_vars, exe)
                print("model saved in %s" % save_dir)
    print("finish training")
@@ -123,7 +145,8 @@ def train():
    else:
        print("run distribute training")
        t = fluid.DistributeTranspiler()
-        t.transpile(args.trainer_id, pservers=args.endpoints, trainers=args.trainers)
+        t.transpile(
+            args.trainer_id, pservers=args.endpoints, trainers=args.trainers)
        if args.role == "pserver":
            print("run pserver")
            pserver_prog = t.get_pserver_program(args.current_endpoint)
@@ -136,5 +159,6 @@ def train():
        print("run trainer")
        train_loop(t.get_trainer_program())

+
if __name__ == "__main__":
    train()
@@ -11,6 +11,7 @@ import paddle
import utils


def parse_args():
    parser = argparse.ArgumentParser("gru4rec benchmark.")
    parser.add_argument(
@@ -22,12 +23,15 @@ def parse_args():
    parser.add_argument(
        '--model_dir', type=str, default='model_recall20', help='model dir')
    parser.add_argument(
-        '--use_cuda', type=int, default='1', help='whether use cuda')
+        '--use_cuda', type=int, default='0', help='whether use cuda')
    parser.add_argument(
        '--batch_size', type=int, default='5', help='batch_size')
+    parser.add_argument(
+        '--vocab_path', type=str, default='vocab.txt', help='vocab file')
    args = parser.parse_args()
    return args


def infer(test_reader, use_cuda, model_path):
    """ inference function """
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
@@ -72,11 +76,16 @@ if __name__ == "__main__":
    test_dir = args.test_dir
    model_dir = args.model_dir
    batch_size = args.batch_size
+    vocab_path = args.vocab_path
    use_cuda = True if args.use_cuda else False
-    print("start index: ", start_index, " last_index:" ,last_index)
+    print("start index: ", start_index, " last_index:", last_index)
    vocab_size, test_reader = utils.prepare_data(
-        test_dir, "", batch_size=batch_size,
-        buffer_size=1000, word_freq_threshold=0, is_train=False)
+        test_dir,
+        vocab_path,
+        batch_size=batch_size,
+        buffer_size=1000,
+        word_freq_threshold=0,
+        is_train=False)
    for epoch in range(start_index, last_index + 1):
        epoch_path = model_dir + "/epoch_" + str(epoch)
...
@@ -171,7 +171,8 @@ def train_cross_entropy_network(vocab_size, neg_size, hid_size, drop_out=0.2):
    ele_mul = fluid.layers.elementwise_mul(emb_label_drop, gru)
    red_sum = fluid.layers.reduce_sum(input=ele_mul, dim=1, keep_dim=True)

-    pre = fluid.layers.sequence_reshape(input=red_sum, new_dim=(neg_size + 1))
+    pre_ = fluid.layers.sequence_reshape(input=red_sum, new_dim=(neg_size + 1))
+    pre = fluid.layers.softmax(input=pre_)

    cost = fluid.layers.cross_entropy(input=pre, label=pos_label)
    cost_sum = fluid.layers.reduce_sum(input=cost)
...
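The added `softmax` matters because `cross_entropy` expects a normalized probability distribution; feeding raw scores silently distorts the loss. In NumPy terms, the corrected computation is roughly:

```
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

# toy scores for (neg_size + 1) = 4 candidates, positive item at index 0
scores = np.array([[2.0, 0.5, 0.1, -1.0]])
probs = softmax(scores)        # rows now sum to 1
loss = -np.log(probs[0, 0])    # cross-entropy with label 0
print(probs.sum(), loss)
```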
@@ -68,9 +68,11 @@ def train():

    # Train program
    if args.loss == 'bpr':
+        print('bpr loss')
        src, pos_label, label, avg_cost = net.train_bpr_network(
            neg_size=args.neg_size, vocab_size=vocab_size, hid_size=hid_size)
    else:
+        print('cross-entropy loss')
        src, pos_label, label, avg_cost = net.train_cross_entropy_network(
            neg_size=args.neg_size, vocab_size=vocab_size, hid_size=hid_size)
...
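For contrast between the two branches: BPR loss scores each positive item against sampled negatives and pushes the positive above them, while the cross-entropy branch normalizes over the whole candidate set. A NumPy sketch of the standard BPR formulation, with toy scores, not `net.train_bpr_network` itself:

```
import numpy as np

def bpr_loss(pos_score, neg_scores):
    # standard BPR: -mean(log(sigmoid(pos - neg))) over sampled negatives
    diff = pos_score - neg_scores
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-diff))))

pos = 2.0                          # score of the observed (positive) item
negs = np.array([0.5, 1.0, -0.3])  # scores of sampled negatives
print(bpr_loss(pos, negs))         # small when the positive outranks them
```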
@@ -45,8 +45,8 @@ def to_lodtensor_bpr(raw_data, neg_size, vocab_size, place):
        neg_data = np.tile(pos_data, neg_size)
        np.random.shuffle(neg_data)
        for ii in range(length * neg_size):
-            if neg_data[ii] == pos_data[ii / neg_size]:
-                neg_data[ii] = pos_data[length - 1 - ii / neg_size]
+            if neg_data[ii] == pos_data[ii // neg_size]:
+                neg_data[ii] = pos_data[length - 1 - ii // neg_size]
        label_data = np.column_stack(
            (pos_data.reshape(length, 1), neg_data.reshape(length, neg_size)))
...
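The `/` to `//` change is a Python 3 port: `/` became true division, so using its result as an index raises a TypeError. A two-line check of what the fix restores:

```
# Under Python 3, / is true division: 7 / 2 == 3.5, a float that cannot
# be used as an index; 7 // 2 == 3 keeps the old Python 2 behavior.
ii, neg_size = 7, 2
print(ii / neg_size, ii // neg_size)  # 3.5 3
```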
#!/bin/bash
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
export CPU_NUM=1
export NUM_THREADS=1
FLAGS_benchmark=true python train.py --enable_ce | python _ce.py
# this file is only used for continuous evaluation test!
import os
import sys
sys.path.append(os.environ['ceroot'])
from kpi import CostKpi
from kpi import DurationKpi
from kpi import AccKpi
each_pass_duration_cpu1_thread1_kpi = DurationKpi('each_pass_duration_cpu1_thread1', 0.08, 0, actived=True)
train_loss_cpu1_thread1_kpi = CostKpi('train_loss_cpu1_thread1', 0.08, 0)
tracking_kpis = [
each_pass_duration_cpu1_thread1_kpi,
train_loss_cpu1_thread1_kpi,
]
def parse_log(log):
    '''
    This method should be implemented by model developers.
    Each KPI line in the log must carry three tab-separated fields,
    "kpis\t<kpi_name>\t<kpi_value>", for example:
    "
    kpis\ttrain_cost\t1.0
    kpis\ttest_cost\t1.0
    kpis\ttrain_acc\t1.2
    "
    '''
for line in log.split('\n'):
fs = line.strip().split('\t')
print(fs)
if len(fs) == 3 and fs[0] == 'kpis':
kpi_name = fs[1]
kpi_value = float(fs[2])
yield kpi_name, kpi_value
def log_to_ce(log):
kpi_tracker = {}
for kpi in tracking_kpis:
kpi_tracker[kpi.name] = kpi
for (kpi_name, kpi_value) in parse_log(log):
print(kpi_name, kpi_value)
kpi_tracker[kpi_name].add_record(kpi_value)
kpi_tracker[kpi_name].persist()
if __name__ == '__main__':
log = sys.stdin.read()
log_to_ce(log)
@@ -81,10 +81,19 @@ def parse_args():
        "for index processing")
    parser.add_argument(
        "--hidden_size", type=int, default=128, help="Hidden dim")
+    parser.add_argument(
+        '--enable_ce',
+        action='store_true',
+        help='If set, run the task with continuous evaluation logs.')

    return parser.parse_args()


def start_train(args):
+    if args.enable_ce:
+        SEED = 102
+        fluid.default_main_program().random_seed = SEED  # seed both programs
+        fluid.default_startup_program().random_seed = SEED
+
    dataset = reader.SyntheticDataset(args.sparse_feature_dim, args.query_slots,
                                      args.title_slots)
    train_reader = paddle.batch(
@@ -115,7 +124,10 @@ def start_train(args):
    exe = fluid.Executor(place)
    exe.run(startup_program)

+    total_time = 0
+    ce_info = []
    for pass_id in range(args.epochs):
+        start_time = time.time()
        for batch_id, data in enumerate(train_reader()):
            loss_val, correct_val = exe.run(loop_program,
                                            feed=feeder.feed(data),
@@ -123,10 +135,34 @@ def start_train(args):
            logger.info("TRAIN --> pass: {} batch_id: {} avg_cost: {}, acc: {}"
                        .format(pass_id, batch_id, loss_val,
                                float(correct_val) / args.batch_size))
+            ce_info.append(loss_val[0])
+        end_time = time.time()
+        total_time += end_time - start_time
    fluid.io.save_inference_model(args.model_output_dir,
                                  [val.name for val in all_slots],
                                  [avg_cost, correct], exe)
+    # only for ce
+    if args.enable_ce:
+        threads_num, cpu_num = get_cards(args)
+        epoch_idx = args.epochs
+        ce_loss = 0
+        try:
+            ce_loss = ce_info[-2]
+        except:
+            logger.error("ce info error")
+        print("kpis\teach_pass_duration_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, total_time / epoch_idx))
+        print("kpis\ttrain_loss_cpu%s_thread%s\t%s" %
+              (cpu_num, threads_num, ce_loss))
+
+
+def get_cards(args):
+    threads_num = os.environ.get('NUM_THREADS', 1)
+    cpu_num = os.environ.get('CPU_NUM', 1)
+    return int(threads_num), int(cpu_num)


def main():
    args = parse_args()
...
@@ -81,7 +81,7 @@ def infer(args, vocab_size, test_reader):
    start_up_program = fluid.Program()
    with fluid.program_guard(main_program, start_up_program):
        acc = model(vocab_size, emb_size, hid_size)
-    for epoch in xrange(start_index, last_index + 1):
+    for epoch in range(start_index, last_index + 1):
        copy_program = main_program.clone()
        model_path = model_dir + "/epoch_" + str(epoch)
        fluid.io.load_params(
...
@@ -33,11 +33,14 @@ tools for Fluid model configuration and parameter files.
VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`__\ and `MS COCO <http://cocodataset.org/#home>`__\ data to train general object detection models. The SSD algorithm (Single Shot MultiBox Detector) presented here is one of the newer and better-performing algorithms in object detection, combining fast detection speed with high detection accuracy.

Detecting faces in open environments, especially small, blurred, and partially occluded faces, is also a challenging task. We also describe how to train Baidu's self-developed PyramidBox face detection model on the `WIDER FACE <http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/>`_ data; in March 2018 this algorithm took `first place <http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html>`_ in multiple WIDER FACE evaluations.

+The RCNN family are classic two-stage object detectors. Compared with traditional region-extraction methods, the RPN network in RCNN greatly improves region-extraction efficiency by sharing convolutional layer parameters, and it proposes high-quality candidate regions. Typical models include Faster RCNN and Mask RCNN.
+
-  `Single Shot MultiBox Detector <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md>`__
-  `Face Detector: PyramidBox <https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md>`_
+-  `RCNN <https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md>`_

Image semantic segmentation
---------------------------
...
@@ -28,11 +28,14 @@ tools for Fluid model configuration and parameter files.
Detecting faces in open environments, especially small, blurred, and partially occluded faces, is also a challenging task. We also describe how to train Baidu's self-developed PyramidBox face detection model on the [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) data; in March 2018 this algorithm took [first place](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) in multiple WIDER FACE evaluations.

-Faster RCNN is a classic two-stage object detector. Compared with traditional region-extraction methods, the RPN network in Faster RCNN greatly improves region-extraction efficiency by sharing convolutional layer parameters, and it proposes high-quality candidate regions.
+The Faster RCNN model is a classic two-stage object detector. Compared with traditional region-extraction methods, it greatly improves region-extraction efficiency by sharing convolutional layer parameters through the RPN network, and it proposes high-quality candidate regions.
+
+The Mask RCNN model is a classic instance segmentation model based on Faster RCNN; it adds a segmentation branch on top of the original Faster RCNN model to produce mask results, decoupling mask prediction from class prediction.

- [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md)
- [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md)
-- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn/README_cn.md)
+- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)
+- [Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)

Image semantic segmentation
---------------------------
...