diff --git a/README.md b/README.md
index c86268805acbce4e78f26c95eb2c9f8d97cbcbbe..98cef358dae468f9b16209299de2776210970a99 100644
--- a/README.md
+++ b/README.md
@@ -16,56 +16,56 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化
 ## PaddleCV
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[AlexNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
+[AlexNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
 [VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力|[Very Deep ConvNets for Large-Scale Inage Recognition](https://arxiv.org/pdf/1409.1556.pdf)
-[GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
-[ResNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
-[Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
-[MobileNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
-[DPN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
-[SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
-[SSD](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
-[Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
-[Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
-[Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)|基于Faster RCNN模型的经典实例分割模型|在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。|[Mask R-CNN](https://arxiv.org/abs/1703.06870)
-[ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/icnet)|图像实时语义分割模型|即考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
-[DCGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
-[ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
-[CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
-[CRNN-CTC模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks)
-[Attention模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
+[GoogleNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
+[ResNet](./fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
+[Inception-v4](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
+[MobileNet](./fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
+[DPN](./fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
+[SE-ResNeXt](./fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
+[SSD](./fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
+[Face Detector: PyramidBox](./fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
+[Faster RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
+[Mask RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|基于Faster RCNN模型的经典实例分割模型|在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解耦。|[Mask R-CNN](https://arxiv.org/abs/1703.06870)
+[ICNet](./fluid/PaddleCV/icnet)|图像实时语义分割模型|既考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
+[DCGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
+[ConditionalGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
+[CycleGAN](./fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
+[CRNN-CTC模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks)
+[Attention模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
 [Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning)|度量学习模型|能够用于分析对象时间的关联、比较关系,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域|-
-[TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
-[视频模型库](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/video)|视频模型库|给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型||
-[caffe2fluid](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
+[TSN](./fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
+[视频模型库](./fluid/PaddleCV/video)|视频模型库|给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型||
+[caffe2fluid](./fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
 ## PaddleNLP
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[Transformer](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+[Transformer](./fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
 [LAC](https://github.com/baidu/lac/blob/master/README.md)|联合的词法分析模型|能够整体性地完成中文分词、词性标注、专名识别任务|[Chinese Lexical Analysis with Deep Bi-GRU-CRF Network](https://arxiv.org/abs/1807.01882)
 [Senta](https://github.com/baidu/Senta/blob/master/README.md)|情感倾向分析模型集|百度AI开放平台中情感倾向分析模型|-
-[DAM](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
+[DAM](./fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
 [SimNet](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)|语义匹配框架|使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力|-
-[DuReader](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
-[Bi-GRU-CRF](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
+[DuReader](./fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
+[Bi-GRU-CRF](./fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
 ## PaddleRec
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[TagSpace](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
-[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
-[SSR](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
-[DeepCTR](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
-[Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
+[TagSpace](./fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
+[GRU4Rec](./fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
+[SSR](./fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
+[DeepCTR](./fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
+[Multiview-Simnet](./fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
 ## Other Models
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[DeepASR](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
-[DQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
-[DoubleDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Font Size: Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
-[DuelingDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
+[DeepASR](./fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
+[DQN](./fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
+[DoubleDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+[DuelingDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
 ## License
 This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
diff --git a/fluid/PaddleCV/deeplabv3+/README.md b/fluid/PaddleCV/deeplabv3+/README.md
index 3a075c89c60279e2304a5a204745fdffc2462d7b..b9990a20845f637eb3611f4875213a657b4491d8 100644
--- a/fluid/PaddleCV/deeplabv3+/README.md
+++ b/fluid/PaddleCV/deeplabv3+/README.md
@@ -1,4 +1,4 @@
-DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.0.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。
+DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.3.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。
 ## 代码结构
@@ -38,15 +38,16 @@ data/cityscape/
 # 预训练模型准备
+为了节约更多的显存,这里我们使用Group Norm作为归一化手段。
 如果需要从头开始训练模型,用户需要下载我们的初始化模型
 ```
-wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_xception65_initialize.tgz
-tar -xf deeplabv3plus_xception65_initialize.tgz && rm deeplabv3plus_xception65_initialize.tgz
+wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz
+tar -xf deeplabv3plus_gn_init.tgz && rm deeplabv3plus_gn_init.tgz
 ```
 如果需要最终训练模型进行fine tune或者直接用于预测,请下载我们的最终模型
 ```
-wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus.tgz
-tar -xf deeplabv3plus.tgz && rm deeplabv3plus.tgz
+wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz
+tar -xf deeplabv3plus_gn.tgz && rm deeplabv3plus_gn.tgz
 ```
@@ -59,6 +60,7 @@ python ./train.py \
     --batch_size=1 \
     --train_crop_size=769 \
     --total_step=50 \
+    --norm_type=gn \
     --init_weights_path=$INIT_WEIGHTS_PATH \
     --save_weights_path=$SAVE_WEIGHTS_PATH \
     --dataset_path=$DATASET_PATH
@@ -72,19 +74,25 @@ python train.py --help
 ```
 python ./train.py \
     --batch_size=8 \
-    --parallel=true \
+    --parallel=True \
+    --norm_type=gn \
     --train_crop_size=769 \
     --total_step=90000 \
-    --init_weights_path=deeplabv3plus_xception65_initialize.params \
-    --save_weights_path=output/ \
+    --base_lr=0.001 \
+    --init_weights_path=deeplabv3plus_gn_init \
+    --save_weights_path=output \
    --dataset_path=$DATASET_PATH
 ```
+如果您的显存不足,可以尝试减小`batch_size`,同时等比例放大`total_step`,保证两者乘积不变。得益于Group Norm的特性,改变`batch_size`并不会显著影响结果,而且能够节约更多显存,比如您可以设置`--batch_size=4 --total_step=180000`。
+
+如果您希望使用多卡进行训练,可以同比增加`batch_size`,减小`total_step`,比如原来单卡训练是`--batch_size=4 --total_step=180000`,使用4卡训练则是`--batch_size=16 --total_step=45000`。
 ### 测试
 执行以下命令在`Cityscape`测试数据集上进行测试:
 ```
 python ./eval.py \
-    --init_weights=deeplabv3plus.params \
+    --init_weights=deeplabv3plus_gn \
+    --norm_type=gn \
     --dataset_path=$DATASET_PATH
 ```
 需要通过选项`--model_path`指定模型文件。测试脚本的输出的评估指标为mean IoU。
@@ -93,16 +101,17 @@ python ./eval.py \
 ## 实验结果
 训练完成以后,使用`eval.py`在验证集上进行测试,得到以下结果:
 ```
-load from: ../models/deeplabv3p
+load from: ../models/deeplabv3plus_gn
 total number 500
-step: 500, mIoU: 0.7873
+step: 500, mIoU: 0.7881
 ```
 ## 其他信息
-|数据集 | pretrained model | trained model | mean IoU
-|---|---|---|---|
-|CityScape | [deeplabv3plus_xception65_initialize.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_xception65_initialize.tgz) | [deeplabv3plus.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus.tgz) | 0.7873 |
+|数据集 | norm type | pretrained model | trained model | mean IoU
+|---|---|---|---|---|
+|CityScape | batch norm | [deeplabv3plus_xception65_initialize.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_xception65_initialize.tgz) | [deeplabv3plus.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus.tgz) | 0.7873 |
+|CityScape | group norm | [deeplabv3plus_gn_init.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz) | [deeplabv3plus_gn.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz) | 0.7881 |
 ## 参考
diff --git a/fluid/PaddleCV/deeplabv3+/eval.py b/fluid/PaddleCV/deeplabv3+/eval.py
index 6137af412f05c8d664e8f67cfc5ce6a3edad8574..4620dd5d7cb9c91735619478a550c3a4cc747aa6 100644
--- a/fluid/PaddleCV/deeplabv3+/eval.py
+++ b/fluid/PaddleCV/deeplabv3+/eval.py
@@ -27,6 +27,7 @@ add_arg('verbose', bool, False, "Print mIoU for each step if ver
 add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
 add_arg('num_classes', int, 19, "Number of classes.")
 add_arg('use_py_reader', bool, True, "Use py_reader.")
+add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
 #yapf: enable
@@ -58,6 +59,7 @@ args = parser.parse_args()
 models.clean()
 models.is_train = False
+models.default_norm_type = args.norm_type
 deeplabv3p = models.deeplabv3p
 image_shape = [1025, 2049]
diff --git a/fluid/PaddleCV/deeplabv3+/train.py b/fluid/PaddleCV/deeplabv3+/train.py
index 9a0f9f6ca2c63cb57a29d10cb66779c752b73126..4a3d22e0c97132314cee0430feae7276d371bf01 100755
--- a/fluid/PaddleCV/deeplabv3+/train.py
+++ b/fluid/PaddleCV/deeplabv3+/train.py
@@ -4,7 +4,6 @@ from __future__ import print_function
 import os
 if 'FLAGS_fraction_of_gpu_memory_to_use' not in os.environ:
     os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
-os.environ['FLAGS_enable_parallel_graph'] = '1'
 import paddle
 import paddle.fluid as fluid
@@ -34,7 +33,7 @@ add_arg('use_gpu', bool, True, "Whether use GPU or CPU.")
 add_arg('num_classes', int, 19, "Number of classes.")
 add_arg('load_logit_layer', bool, True, "Load last logit fc layer or not. If you are training with different number of classes, you should set to False.")
 add_arg('memory_optimize', bool, True, "Using memory optimizer.")
-add_arg('norm_type', str, 'bn', "Normalization type, should be bn or gn.")
+add_arg('norm_type', str, 'bn', "Normalization type, should be 'bn' or 'gn'.")
 add_arg('profile', bool, False, "Enable profiler.")
 add_arg('use_py_reader', bool, True, "Use py reader.")
 parser.add_argument(
@@ -52,6 +51,13 @@ def profile_context(profile=True):
         yield
 def load_model():
+    load_vars = [
+        x for x in tp.list_vars()
+        if isinstance(x, fluid.framework.Parameter) and x.name.find('image_pool') ==
+        -1
+    ]
+    fluid.io.load_vars(exe, dirname=args.init_weights_path, vars=load_vars)
+    return
     if os.path.isdir(args.init_weights_path):
         load_vars = [
             x for x in tp.list_vars()
@@ -225,7 +231,6 @@ with profile_context(args.profile):
 print("Training done. Model is saved to", args.save_weights_path)
 save_model()
-py_reader.stop()
 if args.enable_ce:
     gpu_num = fluid.core.get_cuda_device_count()
diff --git a/fluid/PaddleCV/image_classification/README.md b/fluid/PaddleCV/image_classification/README.md
index 02500e90e9ff519029952680cb6942ebe38b0a6a..3a20bdf4aa44e2752b8939bfa49886e7c0d5f6f0 100644
--- a/fluid/PaddleCV/image_classification/README.md
+++ b/fluid/PaddleCV/image_classification/README.md
@@ -81,7 +81,7 @@ python train.py \
 * **lr**: initialized learning rate. Default: 0.1.
 * **pretrained_model**: model path for pretraining. Default: None.
 * **checkpoint**: the checkpoint path to resume. Default: None.
-* **model_category**: the category of models, ("models"|"models_name"). Default: "models".
+* **model_category**: the category of models, ("models"|"models_name"). Default: "models_name".
 Or can start the training step by running the ```run.sh```.
@@ -221,6 +221,8 @@ Models are trained by starting with learning rate ```0.1``` and decaying it by `
 - Released models: not specify parameter names
+**NOTE: These models are trained with model_category=models**
 |model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
 |- |:-: |:-:|
 |[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
diff --git a/fluid/PaddleCV/image_classification/README_cn.md b/fluid/PaddleCV/image_classification/README_cn.md
index 17e911596f2cb9830a51efaa839d4fa5fff38a74..c9f553e311ff07e76eab2274646270f5de9b8fa2 100644
--- a/fluid/PaddleCV/image_classification/README_cn.md
+++ b/fluid/PaddleCV/image_classification/README_cn.md
@@ -79,7 +79,7 @@ python train.py \
 * **lr**: initialized learning rate. Default: 0.1.
 * **pretrained_model**: model path for pretraining. Default: None.
 * **checkpoint**: the checkpoint path to resume. Default: None.
-* **model_category**: the category of models, ("models"|"models_name"). Default:"models".
+* **model_category**: the category of models, ("models"|"models_name"). Default:"models_name".
 **数据读取器说明:** 数据读取器定义在```reader.py```和```reader_cv2.py```中, 一般, CV2 reader可以提高数据读取速度, reader(PIL)可以得到相对更高的精度, 在[训练阶段](#training-a-model), 默认采用的增广方式是随机裁剪与水平翻转, 而在[评估](#inference)与[推断](#inference)阶段用的默认方式是中心裁剪。当前支持的数据增广方式有:
 * 旋转
@@ -213,6 +213,8 @@ Models包括两种模型:带有参数名字的模型,和不带有参数名
 - Released models: not specify parameter names
+**注意:这是model_category = models 的预训练模型**
 |model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
 |- |:-: |:-:|
 |[ResNet152](http://paddle-imagenet-models.bj.bcebos.com/ResNet152_pretrained.zip) | 78.18%/93.93% | 78.11%/94.04% |
diff --git a/fluid/PaddleCV/image_classification/eval.py b/fluid/PaddleCV/image_classification/eval.py
index ddce243fe1fcae81ee6064c7ff185fb8a045a402..0660efe13750467ad6bf964b484c9db0ab44b1ee 100644
--- a/fluid/PaddleCV/image_classification/eval.py
+++ b/fluid/PaddleCV/image_classification/eval.py
@@ -7,8 +7,6 @@ import time
 import sys
 import paddle
 import paddle.fluid as fluid
-#import models
-import models_name as models
 #import reader_cv2 as reader
 import reader as reader
 import argparse
@@ -26,10 +24,21 @@ add_arg('class_dim', int, 1000, "Class number.")
 add_arg('image_shape', str, "3,224,224", "Input image size")
 add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
 add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
-add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
+add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
+add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid value:'models','models_name'." )
+
 # yapf: enable
-model_list = [m for m in dir(models) if "__" not in m]
+
+def set_models(model_category):
+    global models
+    assert model_category in ["models", "models_name"
+                              ], "{} is not in lists: {}".format(
+                                  model_category, ["models", "models_name"])
+    if model_category == "models_name":
+        import models_name as models
+    else:
+        import models as models
 def eval(args):
@@ -40,6 +49,7 @@ def eval(args):
     with_memory_optimization = args.with_mem_opt
     image_shape = [int(m) for m in args.image_shape.split(",")]
+    model_list = [m for m in dir(models) if "__" not in m]
     assert model_name in model_list, "{} is not in lists: {}".format(args.model, model_list)
@@ -63,11 +73,11 @@ def eval(args):
         acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
     else:
         out = model.net(input=image, class_dim=class_dim)
-        cost = fluid.layers.cross_entropy(input=out, label=label)
-
+        cost, pred = fluid.layers.softmax_with_cross_entropy(
+            out, label, return_softmax=True)
         avg_cost = fluid.layers.mean(x=cost)
-        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
-        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
+        acc_top1 = fluid.layers.accuracy(input=pred, label=label, k=1)
+        acc_top5 = fluid.layers.accuracy(input=pred, label=label, k=5)
     test_program = fluid.default_main_program().clone(for_test=True)
@@ -125,6 +135,7 @@ def eval(args):
 def main():
     args = parser.parse_args()
     print_arguments(args)
+    set_models(args.model_category)
     eval(args)
diff --git a/fluid/PaddleCV/image_classification/infer.py b/fluid/PaddleCV/image_classification/infer.py
index e89c08d923cdc37596c76dc7146a2666b719844d..88ccf42912b67035895cd81f5f982edca1bd0a3e 100644
--- a/fluid/PaddleCV/image_classification/infer.py
+++ b/fluid/PaddleCV/image_classification/infer.py
@@ -7,7 +7,6 @@ import time
 import sys
 import paddle
 import paddle.fluid as fluid
-import models
 import reader
 import argparse
 import functools
@@ -23,9 +22,19 @@
add_arg('image_shape', str, "3,224,224", "Input image size") add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") add_arg('pretrained_model', str, None, "Whether to use pretrained model.") add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.") +add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid value:'models','models_name'." ) # yapf: enable -model_list = [m for m in dir(models) if "__" not in m] + +def set_models(model_category): + global models + assert model_category in ["models", "models_name" + ], "{} is not in lists: {}".format( + model_category, ["models", "models_name"]) + if model_category == "models_name": + import models_name as models + else: + import models as models def infer(args): @@ -35,7 +44,7 @@ def infer(args): pretrained_model = args.pretrained_model with_memory_optimization = args.with_mem_opt image_shape = [int(m) for m in args.image_shape.split(",")] - + model_list = [m for m in dir(models) if "__" not in m] assert model_name in model_list, "{} is not in lists: {}".format(args.model, model_list) @@ -85,6 +94,7 @@ def infer(args): def main(): args = parser.parse_args() print_arguments(args) + set_models(args.model_category) infer(args) diff --git a/fluid/PaddleCV/image_classification/run.sh b/fluid/PaddleCV/image_classification/run.sh index 41e5e493468a8d8bffbf6c6aabc0d7e7947e989b..b0cc2255b03db82bc88397f625ed68023280d2f0 100644 --- a/fluid/PaddleCV/image_classification/run.sh +++ b/fluid/PaddleCV/image_classification/run.sh @@ -1,16 +1,19 @@ #Hyperparameters config +#Example: SE_ResNext50_32x4d python train.py \ --model=SE_ResNeXt50_32x4d \ - --batch_size=32 \ + --batch_size=400 \ --total_images=1281167 \ --class_dim=1000 \ --image_shape=3,224,224 \ --model_save_dir=output/ \ --with_mem_opt=True \ - --lr_strategy=piecewise_decay \ - --lr=0.1 + --lr_strategy=cosine_decay \ + --lr=0.1 \ + --num_epochs=200 \ + --l2_decay=1.2e-4 \ + --model_category=models_name \ # >log_SE_ResNeXt50_32x4d.txt 2>&1 & - #AlexNet: #python train.py \ # --model=AlexNet \ @@ -20,23 +23,11 @@ python train.py \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ # --with_mem_opt=True \ +# --model_category=models_name \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ -# --lr=0.01 - -#VGG11: -#python train.py \ -# --model=VGG11 \ -# --batch_size=512 \ -# --total_images=1281167 \ -# --class_dim=1000 \ -# --image_shape=3,224,224 \ -# --model_save_dir=output/ \ -# --with_mem_opt=True \ -# --lr_strategy=piecewise_decay \ -# --num_epochs=120 \ -# --lr=0.1 - +# --lr=0.01 \ +# --l2_decay=1e-4 #MobileNet v1: #python train.py \ @@ -47,9 +38,11 @@ python train.py \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ # --with_mem_opt=True \ +# --model_category=models_name \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ -# --lr=0.1 +# --lr=0.1 \ +# --l2_decay=3e-5 #python train.py \ # --model=MobileNetV2 \ @@ -58,10 +51,12 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ +# --model_category=models_name \ # --with_mem_opt=True \ # --lr_strategy=cosine_decay \ -# --num_epochs=200 \ -# --lr=0.1 +# --num_epochs=240 \ +# --lr=0.1 \ +# --l2_decay=4e-5 #ResNet50: #python train.py \ # --model=ResNet50 \ @@ -71,9 +66,11 @@ python train.py \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ # --with_mem_opt=True \ +# --model_category=models_name \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ -# --lr=0.1 +# --lr=0.1 \ +# --l2_decay=1e-4 
#ResNet101: #python train.py \ @@ -83,44 +80,58 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ -# --with_mem_opt=False \ +# --model_category=models_name \ +# --with_mem_opt=True \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ -# --lr=0.1 +# --lr=0.1 \ +# --l2_decay=1e-4 #ResNet152: #python train.py \ # --model=ResNet152 \ # --batch_size=256 \ # --total_images=1281167 \ +# --class_dim=1000 \ # --image_shape=3,224,224 \ +# --model_save_dir=output/ \ # --lr_strategy=piecewise_decay \ +# --model_category=models_name \ +# --with_mem_opt=True \ # --lr=0.1 \ # --num_epochs=120 \ # --l2_decay=1e-4 -#SE_ResNeXt50: +#SE_ResNeXt50_32x4d: #python train.py \ -# --model=SE_ResNeXt50 \ +# --model=SE_ResNeXt50_32x4d \ # --batch_size=400 \ # --total_images=1281167 \ +# --class_dim=1000 \ # --image_shape=3,224,224 \ # --lr_strategy=cosine_decay \ +# --model_category=models_name \ +# --model_save_dir=output/ \ # --lr=0.1 \ # --num_epochs=200 \ -# --l2_decay=12e-5 +# --with_mem_opt=True \ +# --l2_decay=1.2e-4 -#SE_ResNeXt101: +#SE_ResNeXt101_32x4d: #python train.py \ -# --model=SE_ResNeXt101 \ +# --model=SE_ResNeXt101_32x4d \ # --batch_size=400 \ # --total_images=1281167 \ +# --class_dim=1000 \ # --image_shape=3,224,224 \ # --lr_strategy=cosine_decay \ +# --model_category=models_name \ +# --model_save_dir=output/ \ # --lr=0.1 \ # --num_epochs=200 \ -# --l2_decay=15e-5 +# --with_mem_opt=True \ +# --l2_decay=1.5e-5 #VGG11: #python train.py \ @@ -129,8 +140,12 @@ python train.py \ # --total_images=1281167 \ # --image_shape=3,224,224 \ # --lr_strategy=cosine_decay \ +# --class_dim=1000 \ +# --model_category=models_name \ +# --model_save_dir=output/ \ # --lr=0.1 \ # --num_epochs=90 \ +# --with_mem_opt=True \ # --l2_decay=2e-4 #VGG13: @@ -138,8 +153,42 @@ python train.py \ # --model=VGG13 \ # --batch_size=256 \ # --total_images=1281167 \ +# --class_dim=1000 \ # --image_shape=3,224,224 \ # --lr_strategy=cosine_decay \ # --lr=0.01 \ # --num_epochs=90 \ +# --model_category=models_name \ +# --model_save_dir=output/ \ +# --with_mem_opt=True \ # --l2_decay=3e-4 + +#VGG16: +#python train.py +# --model=VGG16 \ +# --batch_size=256 \ +# --total_images=1281167 \ +# --class_dim=1000 \ +# --lr_strategy=cosine_decay \ +# --image_shape=3,224,224 \ +# --model_category=models_name \ +# --model_save_dir=output/ \ +# --lr=0.01 \ +# --num_epochs=90 \ +# --with_mem_opt=True \ +# --l2_decay=3e-4 + +#VGG19: +#python train.py +# --model=VGG19 \ +# --batch_size=256 \ +# --total_images=1281167 \ +# --class_dim=1000 \ +# --image_shape=3,224,224 \ +# --lr_strategy=cosine_decay \ +# --lr=0.01 \ +# --num_epochs=90 \ +# --with_mem_opt=True \ +# --model_category=models_name \ +# --model_save_dir=output/ \ +# --l2_decay=3e-4 diff --git a/fluid/PaddleCV/image_classification/train.py b/fluid/PaddleCV/image_classification/train.py index 87cef29e0eed4c9ed0944a7f5b3a14b76766d579..145b288620bbbb27693bacbb7145e4df8371a4c2 100644 --- a/fluid/PaddleCV/image_classification/train.py +++ b/fluid/PaddleCV/image_classification/train.py @@ -39,7 +39,7 @@ add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.") add_arg('enable_ce', bool, False, "If set True, enable continuous evaluation job.") add_arg('data_dir', str, "./data/ILSVRC2012", "The ImageNet dataset root dir.") -add_arg('model_category', str, "models", "Whether to use models_name or not, valid value:'models','models_name'." 
) +add_arg('model_category', str, "models_name", "Whether to use models_name or not, valid value:'models','models_name'." ) add_arg('fp16', bool, False, "Enable half precision training with fp16." ) add_arg('scale_loss', float, 1.0, "Scale loss for fp16." ) add_arg('l2_decay', float, 1e-4, "L2_decay parameter.")