Commit 4b47b5d6 · Author: wuzewu

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into develop

Too many changes to display.

To preserve performance only 1000 of 1000+ files are displayed.
[style]
based_on_style = pep8
-column_limit = 80
+column_limit = 120
......@@ -27,6 +27,7 @@ install:
else
pip install --upgrade paddlepaddle;
pip install -r requirements.txt;
pip install yapf==0.26.0;
fi
notifications:
......
......@@ -6,3 +6,5 @@
| Steffy-zxf | Xuefei Zhang |
| kinghuin | Jinxuan Qiu |
| ShenYuhan | Yuhan Shen |
|haoyuying|Yuying Hao|
|KPatr1ck|Xiaojie Chen|
Simplified Chinese | [English](README.md)
<p align="center">
<img src="./docs/imgs/paddlehub_logo.jpg" align="middle">
<p align="center">
<div align="center">
<h3> <a href=#QuickStart> Quick Start </a> | <a href="https://paddlehub.readthedocs.io/zh_CN/release-v2.1//"> Tutorials </a> | <a href="https://www.paddlepaddle.org.cn/hublist"> Model Search </a> | <a href="https://www.paddlepaddle.org.cn/hub"> Demo </a>
</h3>
</div>
------------------------------------------------------------------------------------------
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleHub/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleHub?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.6+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/paddlehub?color=c77"></a>
</p>
<p align="center">
<a href="https://github.com/PaddlePaddle/PaddleHub/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleHub?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleHub/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleHub?color=3af"></a>
<a href="https://pypi.org/project/paddlehub/"><img src="https://img.shields.io/pypi/dm/paddlehub?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleHub/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleHub?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleHub/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleHub?color=ccf"></a>
</p>
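## Introduction and Features
- PaddleHub aims to provide developers with rich, high-quality, ready-to-use pre-trained models.
- **Rich model variety**: 300+ pre-trained models covering the five main categories of CV, NLP, Audio, Video, and industrial applications, all open source, downloadable, and runnable offline.
- **Very low barrier to entry**: no deep learning background, data, or training process is needed to start using AI models.
- **One-command model prediction**: call a model with a single command line or a minimal Python API to quickly try out its results.
- **One-command model serving**: a single command sets up API service deployment for a deep learning model.
- **Transfer learning in ten lines of code**: ten lines of code complete transfer learning for image classification and text classification tasks.
- **Cross-platform compatibility**: runs on Linux, Windows, macOS, and other operating systems.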
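## Recent Updates
- **2021.05.12**: Added the lightweight Chinese dialogue model [plato-mini](https://www.paddlepaddle.org.cn/hubdetail?name=plato-mini&en_category=TextGeneration), which can be combined with wechaty to build a WeChat chatbot; see the [demo](https://github.com/KPatr1ck/paddlehub-wechaty-demo).
- **2021.04.27**: Released v2.1.0. (1) Added 2 high-accuracy semantic segmentation models based on the VOC dataset and 3 audio classification models. (2) Added Fine-Tune capability, along with the corresponding datasets, for tasks such as image semantic segmentation, text semantic matching, and audio classification. Deployment improvements: (3) Added export to model formats such as ONNX and PaddleInference. (4) Added [BentoML](https://github.com/bentoml/BentoML) cloud-native serving deployment, which supports a unified multi-framework workflow for model management and model deployment; see the [detailed tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/release/v2.1/demo/serving/bentoml/cloud-native-model-serving-with-bentoml.ipynb). For more, see the BentoML v0.12.1 [release notes](https://github.com/bentoml/BentoML/releases/tag/v0.12.1). (Thanks to @[parano](https://github.com/parano), @[cqvu](https://github.com/cqvu), and @[deehrlic](https://github.com/deehrlic) for their contributions and support.) (5) The total number of pre-trained models reached [**300**](https://www.paddlepaddle.org.cn/hublist).
- **2021.02.18**: Released v2.0.0. (1) Model development and debugging are simpler, and the finetune interface is more flexible and easier to use. Transfer learning for vision tasks is fully upgraded, supporting tasks such as [image classification](./demo/image_classification/README.md), [image colorization](./demo/colorization/README.md), and [style transfer](./demo/style_transfer/README.md); Transformer models such as BERT, ERNIE, and RoBERTa are upgraded to dynamic graph mode and support Fine-Tune for [text classification](./demo/text_classification/README.md) and [sequence labeling](./demo/sequence_labeling/README.md). (2) Improved the Serving deployment capability: multi-GPU prediction and automatic load balancing are supported, with a large performance gain. (3) Added the automatic data augmentation capability [Auto Augment](./demo/autoaug/README.md), which efficiently searches for data augmentation policy combinations suited to a dataset. (4) Added 61 [word embedding models](./modules/text/embedding), including 51 Chinese models and 10 English models; added 4 [image segmentation](./modules/thirdparty/image/semantic_segmentation) models, 2 [depth estimation models](./modules/thirdparty/image/depth_estimation), 7 [image generation](./modules/thirdparty/image/Image_gan/style_transfer) models, and 3 [text generation](./modules/thirdparty/text/text_generation) models. (5) The total number of pre-trained models reached [**274**](https://www.paddlepaddle.org.cn/hublist).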
- [More](./docs/docs_ch/release.md)
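## **Featured Model Demos [(More)](./docs/docs_ch/visualization.md)**
### **Image (161 models)**
- Includes image classification, face detection, mask detection, vehicle detection, face/body/hand keypoint detection, portrait segmentation, text recognition for 80+ languages, image super-resolution/colorization/animation, and more.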
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_all.gif" width = "530" height = "400" />
</div>
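- Thanks to CopyRight@[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN), [AnimeGAN](https://github.com/TachibanaYoshino/AnimeGANv2), [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose), [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg), and [Zhengxia Zou](https://github.com/jiupinjia/SkyAR) for providing the pre-trained models. Training capability is open; you are welcome to try it.
### **Text (129 models)**
- Includes Chinese word segmentation, part-of-speech tagging and named entity recognition, syntactic parsing, AI-generated poems/couplets/love words/acrostic poems, sentiment analysis of Chinese reviews, moderation of Chinese pornographic text, and more.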
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_all.gif" width = "640" height = "240" />
</div>
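- Thanks to CopyRight@[ERNIE](https://github.com/PaddlePaddle/ERNIE), [LAC](https://github.com/baidu/LAC), and [DDParser](https://github.com/baidu/DDParser) for providing the pre-trained models. Training capability is open; you are welcome to try it.
### **Audio (3 models)**
- TTS speech synthesis, with several algorithms to choose from.
- Thanks to CopyRight@[Parakeet](https://github.com/PaddlePaddle/Parakeet) for providing the pre-trained models. Training capability is open; you are welcome to try it.
- Input: `Life was like a box of chocolates, you never know what you're gonna get.`
- Synthesized results: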
<div align="center">
<table>
<thead>
</thead>
<tbody>
<tr>
<th>deepvoice3 </th>
<th>fastspeech </th>
<th>transformer</th>
</tr>
<tr>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/deepvoice3_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/fastspeech_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
<th>
<a href="https://paddlehub.bj.bcebos.com/resources/transformer_tts_ljspeech-0.wav">
<img src="./docs/imgs/Readme_Related/audio_icon.png" width=250 /></a><br>
</th>
</tr>
</tbody>
</table>
</div>
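### **Video (8 models)**
- Includes short-video classification with 3000+ label categories and TOP-K label output, with several algorithms to choose from.
- Thanks to CopyRight@[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo) for providing the pre-trained models. Training capability is open; you are welcome to try it.
- `Example: given a short video of swimming, the algorithm outputs "swimming"`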
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_Video.gif" width = "400" height = "400" />
</div>
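## ===Key Points===
- All of the pre-trained models above are open source, and the number of models keeps growing. **⭐Star⭐** the repository to follow along.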
<div align="center">
<a href="https://github.com/PaddlePaddle/PaddleHub/stargazers">
<img src="./docs/imgs/Readme_Related/star.png" width = "411" height = "100" /></a>
</div>
<a name="欢迎加入PaddleHub技术交流群"></a>
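## Join the PaddleHub Technical Discussion Group
- If you run into any problem while using the models, you can join the official WeChat group to get answers faster and exchange ideas with developers from all industries. We look forward to having you.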
<div align="center">
<img src="./docs/imgs/joinus.PNG" width = "200" height = "200" />
</div>
If scanning the QR code fails, add WeChat ID 15704308458 with the note "Hub", and the operations staff will invite you to the group.
<div id="QuickStart">
## Quick Start
</div>
```python
!pip install --upgrade paddlepaddle -i https://mirror.baidu.com/pypi/simple
!pip install --upgrade paddlehub -i https://mirror.baidu.com/pypi/simple
import paddlehub as hub
lac = hub.Module(name="lac")
test_text = ["今天是个好天气。"]
results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
print(results)
#{'word': ['今天', '是', '个', '好天气', '。'], 'tag': ['TIME', 'v', 'q', 'n', 'w']}
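# Start the serving service with one command; for more models, search https://www.paddlepaddle.org.cn/hublist
# (this is a shell command; the leading "!" runs it from a notebook cell, like the pip lines above)
!hub serving start -m lac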
```
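The quick start prints parallel `word` and `tag` lists. As a minimal sketch of consuming that output, the snippet below pairs each word with its tag and filters by tag. The helper name `pair_words_with_tags` is hypothetical and not part of PaddleHub, and the sample dict is copied from the printed result above rather than produced by a live model call.

```python
def pair_words_with_tags(result):
    """Zip the parallel 'word' and 'tag' lists of one LAC result into pairs.

    `result` is assumed to have the shape shown in the quick start output:
    {'word': [...], 'tag': [...]} with lists of equal length.
    """
    words, tags = result["word"], result["tag"]
    if len(words) != len(tags):
        raise ValueError("'word' and 'tag' must have the same length")
    return list(zip(words, tags))


# Sample copied from the quick start output (no model is loaded here).
sample = {
    'word': ['今天', '是', '个', '好天气', '。'],
    'tag': ['TIME', 'v', 'q', 'n', 'w'],
}

pairs = pair_words_with_tags(sample)
# Keep only the words tagged as time expressions.
time_words = [word for word, tag in pairs if tag == "TIME"]

print(pairs)
print(time_words)
```

Pairing the lists up front keeps downstream filtering a single comprehension instead of index bookkeeping across two lists.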
<a name="许可证书"></a>
## License
This project is released under the <a href="./LICENSE">Apache 2.0 license</a>.
<a name="致谢"></a>
## Acknowledgements to Contributors
<p align="center">
<a href="https://github.com/nepeplwu"><img src="https://avatars.githubusercontent.com/u/45024560?v=4" width=75 height=75></a>
<a href="https://github.com/Steffy-zxf"><img src="https://avatars.githubusercontent.com/u/48793257?v=4" width=75 height=75></a>
<a href="https://github.com/ZeyuChen"><img src="https://avatars.githubusercontent.com/u/1371212?v=4" width=75 height=75></a>
<a href="https://github.com/ShenYuhan"><img src="https://avatars.githubusercontent.com/u/28444161?v=4" width=75 height=75></a>
<a href="https://github.com/kinghuin"><img src="https://avatars.githubusercontent.com/u/11913168?v=4" width=75 height=75></a>
<a href="https://github.com/haoyuying"><img src="https://avatars.githubusercontent.com/u/35907364?v=4" width=75 height=75></a>
<a href="https://github.com/grasswolfs"><img src="https://avatars.githubusercontent.com/u/23690325?v=4" width=75 height=75></a>
<a href="https://github.com/sjtubinlong"><img src="https://avatars.githubusercontent.com/u/2063170?v=4" width=75 height=75></a>
<a href="https://github.com/KPatr1ck"><img src="https://avatars.githubusercontent.com/u/22954146?v=4" width=75 height=75></a>
<a href="https://github.com/jm12138"><img src="https://avatars.githubusercontent.com/u/15712990?v=4" width=75 height=75></a>
<a href="https://github.com/DesmonDay"><img src="https://avatars.githubusercontent.com/u/20554008?v=4" width=75 height=75></a>
<a href="https://github.com/adaxiadaxi"><img src="https://avatars.githubusercontent.com/u/58928121?v=4" width=75 height=75></a>
<a href="https://github.com/chunzhang-hub"><img src="https://avatars.githubusercontent.com/u/63036966?v=4" width=75 height=75></a>
<a href="https://github.com/linshuliang"><img src="https://avatars.githubusercontent.com/u/15993091?v=4" width=75 height=75></a>
<a href="https://github.com/eepgxxy"><img src="https://avatars.githubusercontent.com/u/15946195?v=4" width=75 height=75></a>
<a href="https://github.com/houj04"><img src="https://avatars.githubusercontent.com/u/35131887?v=4" width=75 height=75></a>
<a href="https://github.com/paopjian"><img src="https://avatars.githubusercontent.com/u/20377352?v=4" width=75 height=75></a>
<a href="https://github.com/zbp-xxxp"><img src="https://avatars.githubusercontent.com/u/58476312?v=4" width=75 height=75></a>
<a href="https://github.com/dxxxp"><img src="https://avatars.githubusercontent.com/u/15886898?v=4" width=75 height=75></a>
<a href="https://github.com/1084667371"><img src="https://avatars.githubusercontent.com/u/50902619?v=4" width=75 height=75></a>
<a href="https://github.com/Channingss"><img src="https://avatars.githubusercontent.com/u/12471701?v=4" width=75 height=75></a>
<a href="https://github.com/Austendeng"><img src="https://avatars.githubusercontent.com/u/16330293?v=4" width=75 height=75></a>
<a href="https://github.com/BurrowsWang"><img src="https://avatars.githubusercontent.com/u/478717?v=4" width=75 height=75></a>
<a href="https://github.com/cqvu"><img src="https://avatars.githubusercontent.com/u/37096589?v=4" width=75 height=75></a>
<a href="https://github.com/DeepGeGe"><img src="https://avatars.githubusercontent.com/u/51083814?v=4" width=75 height=75></a>
<a href="https://github.com/Haijunlv"><img src="https://avatars.githubusercontent.com/u/28926237?v=4" width=75 height=75></a>
<a href="https://github.com/holyseven"><img src="https://avatars.githubusercontent.com/u/13829174?v=4" width=75 height=75></a>
<a href="https://github.com/MRXLT"><img src="https://avatars.githubusercontent.com/u/16594411?v=4" width=75 height=75></a>
<a href="https://github.com/cclauss"><img src="https://avatars.githubusercontent.com/u/3709715?v=4" width=75 height=75></a>
<a href="https://github.com/hu-qi"><img src="https://avatars.githubusercontent.com/u/17986122?v=4" width=75 height=75></a>
<a href="https://github.com/jayhenry"><img src="https://avatars.githubusercontent.com/u/4285375?v=4" width=75 height=75></a>
<a href="https://github.com/hlmu"><img src="https://avatars.githubusercontent.com/u/30133236?v=4" width=75 height=75></a>
<a href="https://github.com/yma-admin"><img src="https://avatars.githubusercontent.com/u/40477813?v=4" width=75 height=75></a>
<a href="https://github.com/brooklet"><img src="https://avatars.githubusercontent.com/u/1585799?v=4" width=75 height=75></a>
</p>
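We warmly welcome code contributions to PaddleHub and are very grateful for your feedback.
* Many thanks to [肖培楷](https://github.com/jm12138) for contributing modules for street-scene animation, portrait animation, hand keypoint detection, sky replacement, depth estimation, portrait segmentation, and more
* Many thanks to [Austendeng](https://github.com/Austendeng) for the PR that fixed SequenceLabelReader
* Many thanks to [cclauss](https://github.com/cclauss) for the PR that improved the travis-ci checks
* Many thanks to [奇想天外](http://www.cheerthink.com/) for contributing the mask detection demo
* Many thanks to [mhlwsk](https://github.com/mhlwsk) for the PR that fixed the sequence labeling prediction demo
* Many thanks to [zbp-xxxp](https://github.com/zbp-xxxp) and [七年期限](https://github.com/1084667371) for jointly contributing the Mid-Autumn Festival special edition of the image-to-poem module, as well as the rumor prediction and leave-request generation modules
* Many thanks to [livingbody](https://github.com/livingbody) for contributing the style transfer and Mid-Autumn image-to-poem WeChat mini programs built on PaddleHub
* Many thanks to [BurrowsWang](https://github.com/BurrowsWang) for fixing a Markdown table rendering issue
* Many thanks to [huqi](https://github.com/hu-qi) for fixing typos in the readme
* Many thanks to [parano](https://github.com/parano), [cqvu](https://github.com/cqvu), and [deehrlic](https://github.com/deehrlic) for their contributions and support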
## `v1.7.0`
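* Enriched the pre-trained models and improved applicability
  * Added the VENUS series of vision pre-trained models [yolov3_darknet53_venus](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_venus&en_category=ObjectDetection) and [faster_rcnn_resnet50_fpn_venus](https://www.paddlepaddle.org.cn/hubdetail?name=faster_rcnn_resnet50_fpn_venus&en_category=ObjectDetection), which substantially improve Fine-tune results on image classification and object detection tasks
  * Added the industrial-grade short-video classification model [videotag_tsn_lstm](https://paddlepaddle.org.cn/hubdetail?name=videotag_tsn_lstm&en_category=VideoClassification), which recognizes 3000 classes of Chinese labels
  * Added the lightweight Chinese OCR models [chinese_ocr_db_rcnn](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_rcnn&en_category=TextRecognition) and [chinese_text_detection_db](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_text_detection_db&en_category=TextRecognition), supporting one-command OCR
  * Added industrial-grade models for pedestrian detection, vehicle detection, animal recognition, Object, and more
* Upgraded the Fine-tune API
  * Added 6 preset networks for text classification tasks, including CNN, BOW, LSTM, BiLSTM, DPCNN, etc.
  * Training and evaluation performance data can be visualized with VisualDL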
## `v1.6.2`
* Fixed a runtime error in image classification on Windows
## `v1.6.1`
* Fixed the missing config.json file when installing PaddleHub on Windows
# `v1.6.0`
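* Fully upgraded the NLP Modules for better applicability and flexibility
  * lac, the senta series (bow, cnn, bilstm, gru, lstm), simnet_bow, and the porn_detection series (cnn, gru, lstm) now use high-performance prediction, with performance improved by up to 50%
  * Transformer semantic models such as ERNIE, BERT, and RoBERTa gain the get_embedding interface for retrieving pre-trained embeddings, which makes them easier to plug into downstream tasks
  * Added the 3-layer Transformer models [rbt3](https://www.paddlepaddle.org.cn/hubdetail?name=rbt3&en_category=SemanticModel) and [rbtl3](https://www.paddlepaddle.org.cn/hubdetail?name=rbtl3&en_category=SemanticModel), obtained by compressing the RoBERTa model structure
* The Task predict interface gains the high-performance prediction mode accelerate_mode, with performance improved by up to 90%
* Opened up the PaddleHub Module creation workflow, with support for converting Fine-tune models, improving applicability and flexibility across the board
  * [Tutorial: converting a pre-trained model into a PaddleHub Module](./docs/contribution/contri_pretrained_model.md)
  * [Tutorial: converting a Fine-tune model into a PaddleHub Module](./docs/tutorial/finetuned_model_to_module.md)
* [PaddleHub Serving](/docs/tutorial/serving.md) has an improved startup method and supports more flexible parameter configuration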
# `v1.5.4`
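* Fixed the failure to resume training from a checkpoint file after Fine-tune was interrupted
# `v1.5.3`
* Improved the output of the mask detection models, providing more flexible deployment and calling options
# `v1.5.2`
* Improved the serving deployment performance of the pyramidbox_lite_server_mask and pyramidbox_lite_mobile_mask models
# `v1.5.1`
* Fixed the missing cache directory when loading a module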
# `v1.5.0`
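* Upgraded PaddleHub Serving for better performance and usability
  * Added the text embedding service [Bert Service](./tutorial/bert_service.md) for easily obtaining text embeddings
    * Concise and easy to use: one command on the server and one on the client are enough to obtain text embeddings
    * Higher performance and efficiency: the computation graph is optimized through the Paddle AnalysisPredictor API, improving speed and reducing GPU memory usage
    * Flexible scaling: the number of servers can be increased according to machine resources and actual needs, with support for multi-GPU, multi-model workloads
  * Improved the concurrency model: multi-threading is used in multi-core environments to raise overall QPS
* Improved the Task feature for building PaddleHub transfer learning networks, for better usability
  * Added a Hook mechanism, supporting [modifying Task built-in methods](https://github.com/PaddlePaddle/PaddleHub/wiki/%E5%A6%82%E4%BD%95%E4%BF%AE%E6%94%B9Task%E5%86%85%E7%BD%AE%E6%96%B9%E6%B3%95%EF%BC%9F)
  * Added colorlog for colored log output
  * Switched to the save_inference_model interface for saving models, which eases model deployment
  * Improved the predict interface with a return_result parameter so that users can get prediction results directly
* Improved the PaddleHub Dataset base class; loading [custom data](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub%E9%80%82%E9%85%8D%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E5%AE%8C%E6%88%90FineTune) now takes less and simpler code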
# `v1.4.1`
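* Fixed compatibility with paddle 1.6 when using Transformer models for sequence labeling tasks
* Windows compatibility extended to python >= 3.6
# `v1.4.0`
* Added the pre-trained model ERNIE tiny
* Added datasets: INEWS, BQ, DRCD, CMRC2018, THUCNEWS, supporting all ChineseGLUE (CLUE) V0 tasks
* Fixed compatibility issues between modules and PaddlePaddle versions
* Improved the Hub Serving startup process and model loading flow for faster service response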
# `v1.3.0`
* Added PaddleHub Serving service deployment
  * Added the [hub serving](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub-Serving%E4%B8%80%E9%94%AE%E6%9C%8D%E5%8A%A1%E9%83%A8%E7%BD%B2) command, which starts a Module prediction service with one command
* Added pre-trained models:
  * roberta_wwm_ext_chinese_L-24_H-1024_A-16
  * roberta_wwm_ext_chinese_L-12_H-768_A-12
  * bert_wwm_ext_chinese_L-12_H-768_A-12
  * bert_wwm_chinese_L-12_H-768_A-12
* Improved the AutoDL Finetuner user experience
  * Model performance can be reported back through an interface
  * Improved visualization, with support for displaying multiple trials
# `v1.2.1`
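* Added **Auto Fine-tune hyperparameter optimization**: given a hyperparameter search space, PaddleHub automatically finds a good hyperparameter combination
  * Supports two hyperparameter optimization algorithms: HAZero and PSHE2
  * Supports two evaluation methods: FullTrail and PopulationBased
* Added the Fine-tune **optimization strategy ULMFiT**, with the following three settings
  * Slanted triangular learning rates: the learning rate first increases linearly and then decays slowly
  * Discriminative fine-tuning: the computation graph is split into n segments, and each segment gets its own learning rate
  * Gradual unfreezing: layers are unfrozen one by one following the topology of the computation graph
* Added support for user-defined PaddleHub configuration, including
  * The address of the pre-trained model management server
  * The logging level
* Upgraded the Fine-tune API for better flexibility and usability
  * Added the **reading comprehension Fine-tune task** and the **regression Fine-tune task**
  * Added multi-metric evaluation
  * Improved the predict interface
  * The visualization tool now supports tensorboard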
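
The slanted triangular learning rate listed under the ULMFiT strategy above (a short linear warm-up followed by a slow decay) can be sketched as below. This follows the schedule formula from the ULMFiT paper; the function name, the defaults (`lr_max=0.01`, `cut_frac=0.1`, `ratio=32`), and the step counts are illustrative assumptions, and PaddleHub's own implementation may differ in detail.

```python
import math


def slanted_triangular_lr(step, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at `step` under a slanted triangular schedule.

    cut_frac: fraction of steps spent increasing the rate (assumed default).
    ratio: how much smaller the lowest rate is than lr_max (assumed default).
    """
    # Step at which the schedule switches from increasing to decreasing.
    cut = max(1, math.floor(total_steps * cut_frac))
    if step < cut:
        p = step / cut  # linear warm-up from 0 to 1
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))  # slow linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio


total = 1000
schedule = [slanted_triangular_lr(s, total) for s in (0, 50, 100, 550, 1000)]
print(schedule)
```

The peak is reached after `cut_frac` of the steps, so most of the training budget is spent in the decay phase.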
# `v1.1.2`
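* PaddleHub supports changing the pre-trained model storage path via ${HUB_HOME}
# `v1.1.1`
* PaddleHub supports running offline
* Fixed the failure to install PaddleHub under python2
# `v1.1.0`
* PaddleHub **added the pre-trained model ERNIE 2.0**
* Upgraded the Reader to automatically feed data to Ernie 1.0/2.0
* Added the GLUE datasets (MRPC, QQP, SST-2, CoLA, QNLI, RTE, MNLI)
# `v1.0.1`
* When a model is installed, the version matching the installed paddlepaddle version is selected automatically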
# `v1.0.0`
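* Launched the new PaddleHub website, with usability improved across the board
  * Added the site https://www.paddlepaddle.org.cn/hub, which documents how to use the pre-trained models of the PaddlePaddle ecosystem
  * The transfer learning demos are available on AI Studio and AI Book, so they can be tried without installing anything
* Added 29 pre-trained models covering text, image, and video; 40 official pre-trained models are now available
  * CV pre-trained models:
    * Added 11 image classification pre-trained models: SE_ResNeXt, GoogleNet, ShuffleNet, etc.
    * Added the object detection models Faster-RCNN and YOLOv3
    * Added the image generation model CycleGAN
    * Added the face detection model Pyramidbox
    * Added 4 video classification models: TSN, TSM, StNet, Non-Local
  * NLP pre-trained models:
    * Added the semantic model ELMo
    * Added 5 sentiment analysis models: Senta-BOW, Senta-CNN, Senta-GRNN, Senta-LSTM, EmoTect
    * Added the Chinese semantic similarity model SimNet
    * Upgraded the LAC lexical analysis model with dictionary intervention, supporting user-defined word segmentation
* Upgraded the Fine-tune API, with flexibility and performance improved across the board
  * Supports multi-GPU parallelism and PyReader multi-threaded IO; Fine-tune speed improved by 60%
  * Simplified the usage logic of finetune, evaluate, predict, etc., for better usability
  * Added event callbacks so that users can quickly implement custom transfer learning tasks
  * Added the multi-label classification Fine-tune task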
# `v0.5.0`
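Officially released PaddleHub, a pre-trained model management tool that helps users manage models more efficiently and carry out transfer learning.
**Pre-trained model management**: the hub command line handles downloading, searching, and version management of pre-trained models in the PaddlePaddle ecosystem.
**One-command usage**: without writing code, pre-trained models can be used for prediction directly from the command line, making it quick to evaluate how a model performs. This version supports the following models: lexical analysis LAC; sentiment analysis Senta; object detection SSD; image classification ResNet, MobileNet, NASNet, etc.
**Transfer learning**: a Fine-tune API built on pre-trained models lets users complete transfer learning with a small amount of code, including BERT/ERNIE text classification, sequence labeling, image classification transfer, and more.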
# DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks
## Introduction
This page implements the [DELTA](https://arxiv.org/abs/1901.09229) algorithm in [PaddlePaddle](https://www.paddlepaddle.org.cn).
> Li, Xingjian, et al. "DELTA: Deep learning transfer using feature map with attention for convolutional networks." ICLR 2019.
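
As a rough sketch of the objective optimized by the training script in this commit, the loss is the task loss plus `delta_reg` times the mean squared difference between the teacher's and the student's feature maps. The NumPy code below mirrors only that unweighted form; the function names, array shapes, and sample values are illustrative assumptions, and the channel attention weighting described in the paper is not modeled here.

```python
import numpy as np


def delta_regularizer(student_feat, teacher_feat):
    """Mean squared distance between student and teacher feature maps."""
    return float(np.mean(np.square(teacher_feat - student_feat)))


def total_loss(task_loss, student_feat, teacher_feat, delta_reg=0.1):
    """Task loss plus the feature-map regularizer scaled by delta_reg."""
    return task_loss + delta_reg * delta_regularizer(student_feat, teacher_feat)


rng = np.random.default_rng(0)
# Hypothetical feature maps with layout [batch, channels, height, width].
teacher = rng.normal(size=(2, 8, 7, 7))
student = teacher + 0.5  # student drifted by 0.5 everywhere

reg = delta_regularizer(student, teacher)
loss = total_loss(task_loss=1.0, student_feat=student, teacher_feat=teacher)
print(reg, loss)
```

A constant drift of 0.5 gives a regularizer of about 0.25, so with `delta_reg=0.1` the penalty adds roughly 0.025 to the task loss.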
## Preparation of Data and Pre-trained Model
- Download transfer learning target datasets, like [Caltech-256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/), [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) or others. Arrange the dataset in this way:
```
root/train/dog/xxy.jpg
root/train/dog/xxz.jpg
...
root/train/cat/nsdf3.jpg
root/train/cat/asd932_.jpg
...
root/test/dog/xxx.jpg
...
root/test/cat/123.jpg
...
```
- Download [the pretrained models](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification#resnet-series). We give the results of ResNet-101 below.
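
To illustrate how the `root/<split>/<class>/<image>` layout above maps to integer labels, here is a small standalone sketch. The helper `index_split` and the temporary files are hypothetical and only echo the file names from the listing; class indices are assigned in sorted folder order.

```python
from pathlib import Path
import tempfile

IMG_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_split(root, split):
    """Return (samples, class_to_idx) for root/<split>/<class>/<image>."""
    split_dir = Path(root) / split
    # One class per sub-directory, indexed in sorted order.
    classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (str(path), class_to_idx[name])
        for name in classes
        for path in sorted((split_dir / name).iterdir())
        if path.suffix.lower() in IMG_SUFFIXES
    ]
    return samples, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder files that follow the documented layout.
    for rel in ("train/dog/xxy.jpg", "train/dog/xxz.jpg", "train/cat/nsdf3.jpg"):
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    samples, class_to_idx = index_split(root, "train")
    labels = [label for _, label in samples]

print(class_to_idx)
print(labels)
```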
## Running Scripts
Set `global_data_path` in `datasets/data_path.py` to the root directory that contains the datasets.
```bash
python -u main.py --dataset Caltech30 --delta_reg 0.1 --wd_rate 1e-4 --batch_size 64 --outdir outdir --num_epoch 100 --use_cuda 0
python -u main.py --dataset CUB_200_2011 --delta_reg 0.1 --wd_rate 1e-4 --batch_size 64 --outdir outdir --num_epoch 100 --use_cuda 0
```
These scripts give the results below:
Dataset | l2 | delta
---|---|---
Caltech-256|79.86|84.71
CUB_200|77.41|80.05
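
For reference, the training script in this commit uses a step-wise learning rate: 0.01 for the first two thirds of all training steps, then 0.001. The plain-Python sketch below reproduces that arithmetic; the function name and the image count of 6400 are hypothetical, while the batch size and epoch count match the commands above.

```python
def piecewise_lr(step, num_images, num_epoch=100, batch_size=64,
                 values=(0.01, 0.001)):
    """Learning rate at a global step: values[0] before the boundary, values[1] after."""
    total_steps = num_images * num_epoch // batch_size
    boundary = int(total_steps * 2 / 3)  # decay after two thirds of training
    return values[0] if step < boundary else values[1]


# Hypothetical dataset of 6400 training images: 10000 steps, decay at step 6666.
before = piecewise_lr(6665, num_images=6400)
after = piecewise_lr(6666, num_images=6400)
print(before, after)
```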
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--prefix', default=None, type=str, help='prefix for model id')
parser.add_argument('--dataset', default='PetImages', type=str, help='dataset')
parser.add_argument(
    '--seed',
    default=None,
    type=int,
    help='random seed (default: None, i.e., not fix the randomness).')
parser.add_argument('--batch_size', default=20, type=int, help='batch_size.')
parser.add_argument('--delta_reg', default=0.1, type=float, help='delta_reg.')
parser.add_argument('--wd_rate', default=1e-4, type=float, help='wd_rate.')
parser.add_argument(
    '--use_cuda', default=0, type=int, help='use_cuda device. -1 cpu.')
parser.add_argument('--num_epoch', default=100, type=int, help='num_epoch.')
parser.add_argument('--outdir', default='outdir', type=str, help='outdir')
parser.add_argument(
    '--pretrained_model',
    default='./pretrained_models/ResNet101_pretrained',
    type=str,
    help='pretrained model pathname')
args = parser.parse_args()
global_data_path = '[root_path]/datasets'
import cv2
import numpy as np
import six
import os
import glob
def resize_short(img, target_size, interpolation=None):
    """resize image
    Args:
        img: image data
        target_size: resize short target size
        interpolation: interpolation mode
    Returns:
        resized image data
    """
    percent = float(target_size) / min(img.shape[0], img.shape[1])
    resized_width = int(round(img.shape[1] * percent))
    resized_height = int(round(img.shape[0] * percent))
    if interpolation:
        resized = cv2.resize(
            img, (resized_width, resized_height), interpolation=interpolation)
    else:
        resized = cv2.resize(img, (resized_width, resized_height))
    return resized
def crop_image(img, target_size, center):
    """crop image
    Args:
        img: images data
        target_size: crop target size
        center: crop mode
    Returns:
        img: cropped image data
    """
    height, width = img.shape[:2]
    size = target_size
    if center:
        # deterministic center crop (used for testing)
        w_start = (width - size) // 2
        h_start = (height - size) // 2
    else:
        # random crop (used for training)
        w_start = np.random.randint(0, width - size + 1)
        h_start = np.random.randint(0, height - size + 1)
    w_end = w_start + size
    h_end = h_start + size
    img = img[h_start:h_end, w_start:w_end, :]
    return img
def preprocess_image(img, random_mirror=True):
    """
    centered, scaled by 1/255.
    :param img: np.array: shape: [ns, h, w, 3], color order: rgb.
    :return: np.array: shape: [ns, 3, h, w]
    """
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    # transpose to [ns, 3, h, w]
    img = img.astype('float32').transpose((0, 3, 1, 2)) / 255
    img_mean = np.array(mean).reshape((3, 1, 1))
    img_std = np.array(std).reshape((3, 1, 1))
    img -= img_mean
    img /= img_std
    if random_mirror:
        mirror = int(np.random.uniform(0, 2))
        if mirror == 1:
            img = img[:, :, ::-1, :]
    return img
def _find_classes(dir):
    # Faster and available in Python 3.5 and above
    classes = [d.name for d in os.scandir(dir) if d.is_dir()]
    classes.sort()
    class_to_idx = {classes[i]: i for i in range(len(classes))}
    return classes, class_to_idx
class ReaderConfig():
    """
    A generic data loader where the images are arranged in this way:
        root/train/dog/xxy.jpg
        root/train/dog/xxz.jpg
        ...
        root/train/cat/nsdf3.jpg
        root/train/cat/asd932_.jpg
        ...
        root/test/dog/xxx.jpg
        ...
        root/test/cat/123.jpg
        ...
    """

    def __init__(self, dataset_dir, is_test):
        image_paths, labels, self.num_classes = self.reader_creator(
            dataset_dir, is_test)
        random_per = np.random.permutation(range(len(image_paths)))
        self.image_paths = image_paths[random_per]
        self.labels = labels[random_per]
        self.is_test = is_test

    def get_reader(self):
        def reader():
            IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
                              '.tif', '.tiff', '.webp')
            target_size = 256
            crop_size = 224
            for i, img_path in enumerate(self.image_paths):
                if not img_path.lower().endswith(IMG_EXTENSIONS):
                    continue
                img = cv2.imread(img_path)
                if img is None:
                    print(img_path)
                    continue
                img = resize_short(img, target_size, interpolation=None)
                img = crop_image(img, crop_size, center=self.is_test)
                img = img[:, :, ::-1]
                img = np.expand_dims(img, axis=0)
                img = preprocess_image(img, not self.is_test)
                yield img, self.labels[i]

        return reader

    def reader_creator(self, dataset_dir, is_test=False):
        IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
                          '.tif', '.tiff', '.webp')
        # read
        if is_test:
            datasubset_dir = os.path.join(dataset_dir, 'test')
        else:
            datasubset_dir = os.path.join(dataset_dir, 'train')
        class_names, class_to_idx = _find_classes(datasubset_dir)
        # num_classes = len(class_names)
        image_paths = []
        labels = []
        for class_name in class_names:
            classes_dir = os.path.join(datasubset_dir, class_name)
            for img_path in glob.glob(os.path.join(classes_dir, '*')):
                if not img_path.lower().endswith(IMG_EXTENSIONS):
                    continue
                image_paths.append(img_path)
                labels.append(class_to_idx[class_name])
        image_paths = np.array(image_paths)
        labels = np.array(labels)
        return image_paths, labels, len(class_names)
import os
import time
import sys
import math
import numpy as np
import functools
import re
import logging
import glob
import paddle
import paddle.fluid as fluid
from models.resnet import ResNet101
from datasets.readers import ReaderConfig
# import cv2
# import skimage
# import matplotlib.pyplot as plt
# from paddle.fluid.core import PaddleTensor
# from paddle.fluid.core import AnalysisConfig
# from paddle.fluid.core import create_paddle_predictor
from args import args
from datasets.data_path import global_data_path
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
if args.seed is not None:
    np.random.seed(args.seed)
print(os.environ.get('LD_LIBRARY_PATH', None))
print(os.environ.get('PATH', None))
class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
def load_vars_by_dict(executor, name_var_dict, main_program=None):
    from paddle.fluid.framework import Program, Variable
    from paddle.fluid import core
    load_prog = Program()
    load_block = load_prog.global_block()
    if main_program is None:
        main_program = fluid.default_main_program()
    if not isinstance(main_program, Program):
        raise TypeError("program should be as Program type or None")
    for each_var_name in name_var_dict.keys():
        assert isinstance(name_var_dict[each_var_name], Variable)
        if name_var_dict[each_var_name].type == core.VarDesc.VarType.RAW:
            continue
        load_block.append_op(
            type='load',
            inputs={},
            outputs={'Out': [name_var_dict[each_var_name]]},
            attrs={'file_path': each_var_name})
    executor.run(load_prog)
def get_model_id():
    prefix = ''
    if args.prefix is not None:
        prefix = args.prefix + '-'  # for some notes.
    model_id = prefix + args.dataset + \
        '-epo_' + str(args.num_epoch) + \
        '-b_' + str(args.batch_size) + \
        '-reg_' + str(args.delta_reg) + \
        '-wd_' + str(args.wd_rate)
    return model_id
def train():
    dataset = args.dataset
    image_shape = [3, 224, 224]
    pretrained_model = args.pretrained_model
    class_map_path = f'{global_data_path}/{dataset}/readable_label.txt'
    if os.path.exists(class_map_path):
        logger.info(
            "The map of readable label and numerical label has been found!")
        with open(class_map_path) as f:
            label_dict = {}
            strinfo = re.compile(r"\d+ ")
            for item in f.readlines():
                key = int(item.split(" ")[0])
                value = [
                    strinfo.sub("", l).replace("\n", "")
                    for l in item.split(", ")
                ]
                label_dict[key] = value[0]
    assert os.path.isdir(
        pretrained_model), "please load right pretrained model path for infer"
    # data reader
    batch_size = args.batch_size
    reader_config = ReaderConfig(f'{global_data_path}/{dataset}', is_test=False)
    reader = reader_config.get_reader()
    train_reader = paddle.batch(
        paddle.reader.shuffle(reader, buf_size=batch_size),
        batch_size,
        drop_last=True)
    # model ops
    image = fluid.data(
        name='image', shape=[None] + image_shape, dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')
    model = ResNet101(is_test=False)
    features, logits = model.net(
        input=image, class_dim=reader_config.num_classes)
    out = fluid.layers.softmax(logits)
    # loss, metric
    cost = fluid.layers.mean(fluid.layers.cross_entropy(out, label))
    accuracy = fluid.layers.accuracy(input=out, label=label)
    # delta regularization
    # teacher model pre-trained on Imagenet, 1000 classes.
    global_name = 't_'
    t_model = ResNet101(is_test=True, global_name=global_name)
    t_features, _ = t_model.net(input=image, class_dim=1000)
    for f in t_features.keys():
        t_features[f].stop_gradient = True
    # delta loss. hard code for the layer name, which is just before global pooling.
    delta_loss = fluid.layers.square(t_features['t_res5c.add.output.5.tmp_0'] -
                                     features['res5c.add.output.5.tmp_0'])
    delta_loss = fluid.layers.reduce_mean(delta_loss)
    params = fluid.default_main_program().global_block().all_parameters()
    parameters = []
    for param in params:
        if param.trainable:
            if global_name in param.name:
                # teacher parameters stay fixed
                print('\tfixing', param.name)
            else:
                print('\ttraining', param.name)
                parameters.append(param.name)
    # optimizer, with piecewise_decay learning rate.
    total_steps = len(reader_config.image_paths) * args.num_epoch // batch_size
    boundaries = [int(total_steps * 2 / 3)]
    print('\ttotal learning steps:', total_steps)
    print('\tlr decays at:', boundaries)
    values = [0.01, 0.001]
    optimizer = fluid.optimizer.Momentum(
        learning_rate=fluid.layers.piecewise_decay(
            boundaries=boundaries, values=values),
        momentum=0.9,
        parameter_list=parameters,
        regularization=fluid.regularizer.L2Decay(args.wd_rate))
    cur_lr = optimizer._global_learning_rate()
    optimizer.minimize(
        cost + args.delta_reg * delta_loss, parameter_list=parameters)
    # data reader
    feed_order = ['image', 'label']
    # executor (session)
    place = fluid.CUDAPlace(
        args.use_cuda) if args.use_cuda >= 0 else fluid.CPUPlace()
    exe = fluid.Executor(place)
    # running
    main_program = fluid.default_main_program()
    start_program = fluid.default_startup_program()
    feed_var_list_loop = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
    exe.run(start_program)
    # load pretrained weights for student and teacher; the fc layer is skipped
    loading_parameters = {}
    t_loading_parameters = {}
    for p in main_program.all_parameters():
        if 'fc' not in p.name:
            if global_name in p.name:
                new_name = os.path.join(pretrained_model,
                                        p.name.split(global_name)[-1])
                t_loading_parameters[new_name] = p
                print(new_name, p.name)
            else:
                name = os.path.join(pretrained_model, p.name)
                loading_parameters[name] = p
                print(name, p.name)
        else:
            print(f'not loading {p.name}')
    load_vars_by_dict(exe, loading_parameters, main_program=main_program)
    load_vars_by_dict(exe, t_loading_parameters, main_program=main_program)
    step = 0
    # test_data = reader_creator_all_in_memory('./datasets/PetImages', is_test=True)
    for e_id in range(args.num_epoch):
        avg_delta_loss = AverageMeter()
        avg_loss = AverageMeter()
        avg_accuracy = AverageMeter()
        batch_time = AverageMeter()
        end = time.time()
        for step_id, data_train in enumerate(train_reader()):
            wrapped_results = exe.run(
                main_program,
                feed=feeder.feed(data_train),
                fetch_list=[cost, accuracy, delta_loss, cur_lr])
            # print(avg_loss_value[2])
            batch_time.update(time.time() - end)
            end = time.time()
            avg_loss.update(wrapped_results[0][0], len(data_train))
            avg_accuracy.update(wrapped_results[1][0], len(data_train))
            avg_delta_loss.update(wrapped_results[2][0], len(data_train))
            if step % 100 == 0:
                print(
                    f"\tEpoch {e_id}, Global_Step {step}, Batch_Time {batch_time.avg: .2f},"
                    f" LR {wrapped_results[3][0]}, "
                    f"Loss {avg_loss.avg: .4f}, Acc {avg_accuracy.avg: .4f}, Delta_Loss {avg_delta_loss.avg: .4f}"
                )
            step += 1
        if args.outdir is not None:
            try:
                os.makedirs(args.outdir, exist_ok=True)
                fluid.io.save_params(
                    executor=exe, dirname=args.outdir + '/' + get_model_id())
            except Exception:
                print('\t Not saving trained parameters.')
        if e_id == args.num_epoch - 1:
            print("kpis\ttrain_cost\t%f" % avg_loss.avg)
            print("kpis\ttrain_acc\t%f" % avg_accuracy.avg)
def test():
    image_shape = [3, 224, 224]
    pretrained_model = args.outdir + '/' + get_model_id()
    # data reader
    batch_size = args.batch_size
    reader_config = ReaderConfig(
        f'{global_data_path}/{args.dataset}', is_test=True)
    reader = reader_config.get_reader()
    test_reader = paddle.batch(reader, batch_size)
    # model ops
    image = fluid.data(
        name='image', shape=[None] + image_shape, dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')
    model = ResNet101(is_test=True)
    _, logits = model.net(input=image, class_dim=reader_config.num_classes)
    out = fluid.layers.softmax(logits)
    # loss, metric
    cost = fluid.layers.mean(fluid.layers.cross_entropy(out, label))
    accuracy = fluid.layers.accuracy(input=out, label=label)
    # data reader
    feed_order = ['image', 'label']
    # executor (session)
    place = fluid.CUDAPlace(
        args.use_cuda) if args.use_cuda >= 0 else fluid.CPUPlace()
    exe = fluid.Executor(place)
    # running
    main_program = fluid.default_main_program()
    start_program = fluid.default_startup_program()
    feed_var_list_loop = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
    exe.run(start_program)
    fluid.io.load_params(exe, pretrained_model)
    step = 0
    avg_loss = AverageMeter()
    avg_accuracy = AverageMeter()
    for step_id, data_train in enumerate(test_reader()):
        avg_loss_value = exe.run(
            main_program,
            feed=feeder.feed(data_train),
            fetch_list=[cost, accuracy])
        avg_loss.update(avg_loss_value[0], len(data_train))
        avg_accuracy.update(avg_loss_value[1], len(data_train))
        if step_id % 10 == 0:
            print("\nBatch %d, Loss %f, Acc %f" % (step_id, avg_loss.avg,
                                                   avg_accuracy.avg))
        step += 1
    print("test counts:", avg_loss.count)
    print("test_cost\t%f" % avg_loss.avg)
    print("test_acc\t%f" % avg_accuracy.avg)
if __name__ == '__main__':
print(args)
train()
test()
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
# from https://github.com/PaddlePaddle/models/blob/release/1.7/PaddleCV/image_classification/models/resnet.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
class ResNet():
def __init__(self, layers=50, is_test=True, global_name=''):
self.layers = layers
self.is_test = is_test
self.features = {}
self.global_name = global_name
def net(self, input, class_dim=1000, data_format="NCHW"):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name="conv1",
data_format=data_format)
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max',
name=self.global_name + 'poo1',
data_format=data_format)
self.features[conv.name] = conv
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name,
data_format=data_format)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling',
data_format=data_format)
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name,
data_format=data_format)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling',
data_format=data_format)
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return self.features, out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None,
data_format='NCHW'):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=self.global_name + name + "_weights"),
bias_attr=False,
name=name + '.conv2d.output.1',
data_format=data_format)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
name=self.global_name + bn_name + '.output.1',
param_attr=ParamAttr(self.global_name + bn_name + '_scale'),
bias_attr=ParamAttr(self.global_name + bn_name + '_offset'),
moving_mean_name=self.global_name + bn_name + '_mean',
moving_variance_name=self.global_name + bn_name + '_variance',
data_layout=data_format,
use_global_stats=self.is_test)
def shortcut(self, input, ch_out, stride, is_first, name, data_format):
if data_format == 'NCHW':
ch_in = input.shape[1]
else:
ch_in = input.shape[-1]
if ch_in != ch_out or stride != 1 or is_first:
return self.conv_bn_layer(
input, ch_out, 1, stride, name=name, data_format=data_format)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, data_format):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a",
data_format=data_format)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b",
data_format=data_format)
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c",
data_format=data_format)
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1",
data_format=data_format)
return fluid.layers.elementwise_add(
x=short,
y=conv2,
act='relu',
name=self.global_name + name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name,
data_format):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a",
data_format=data_format)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b",
data_format=data_format)
short = self.shortcut(
input,
num_filters,
stride,
is_first,
name=name + "_branch1",
data_format=data_format)
return fluid.layers.elementwise_add(
x=short,
y=conv1,
act='relu',
name=self.global_name + name + ".add.output.5")
def ResNet18(is_test=True, global_name=''):
model = ResNet(layers=18, is_test=is_test, global_name=global_name)
return model
def ResNet34(is_test=True, global_name=''):
model = ResNet(layers=34, is_test=is_test, global_name=global_name)
return model
def ResNet50(is_test=True, global_name=''):
model = ResNet(layers=50, is_test=is_test, global_name=global_name)
return model
def ResNet101(is_test=True, global_name=''):
model = ResNet(layers=101, is_test=is_test, global_name=global_name)
return model
def ResNet152(is_test=True, global_name=''):
model = ResNet(layers=152, is_test=is_test, global_name=global_name)
return model
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
# from https://github.com/PaddlePaddle/models/blob/release/1.7/PaddleCV/image_classification/models/resnet_vc.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = ["ResNet", "ResNet50_vc", "ResNet101_vc", "ResNet152_vc"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50, is_test=False, global_name=''):
self.params = train_parameters
self.layers = layers
self.is_test = is_test
self.features = {}
self.global_name = global_name
def net(self, input, class_dim=1000):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=32,
filter_size=3,
stride=2,
act='relu',
name='conv1_1')
conv = self.conv_bn_layer(
input=conv,
num_filters=32,
filter_size=3,
stride=1,
act='relu',
name='conv1_2')
conv = self.conv_bn_layer(
input=conv,
num_filters=64,
filter_size=3,
stride=1,
act='relu',
name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max',
name=self.global_name + 'poo1')
self.features[conv.name] = conv
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling')
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return self.features, out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=self.global_name + name + "_weights"),
bias_attr=False,
name=self.global_name + name + '.conv2d.output.1')
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
name=self.global_name + bn_name + '.output.1',
param_attr=ParamAttr(self.global_name + bn_name + '_scale'),
bias_attr=ParamAttr(self.global_name + bn_name + '_offset'),
moving_mean_name=self.global_name + bn_name + '_mean',
moving_variance_name=self.global_name + bn_name + '_variance',
use_global_stats=self.is_test)
def shortcut(self, input, ch_out, stride, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input, num_filters * 4, stride, name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short,
y=conv2,
act='relu',
name=self.global_name + name + ".add.output.5")
def ResNet50_vc(is_test=True, global_name=''):
model = ResNet(layers=50, is_test=is_test, global_name=global_name)
return model
def ResNet101_vc(is_test=True, global_name=''):
model = ResNet(layers=101, is_test=is_test, global_name=global_name)
return model
def ResNet152_vc(is_test=True, global_name=''):
model = ResNet(layers=152, is_test=is_test, global_name=global_name)
return model
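# PaddleHub Demo Overview
PaddleHub currently provides the following task examples:
* [Mask detection](./mask_detection)
Provides a complete video-level demo built on a full face-mask detection and classification model, plus C++ and Python deployment solutions based on the PaddlePaddle high-performance inference library.
* [Image classification](./image_classification)
Shows how PaddleHub uses ResNet50, ResNet101, ResNet152, MobileNet, NasNet and PNasNet as pretrained models for FineTune and prediction on image classification datasets such as Flowers, DogCat, Indoor67, Food101 and StanfordDogs.
* [Chinese lexical analysis](./lac)
Shows how PaddleHub runs prediction with the Chinese lexical analysis model LAC.
* [Sentiment analysis](./senta)
Shows how PaddleHub uses the Chinese sentiment analysis model Senta for FineTune and prediction.
* [Sequence labeling](./sequence_labeling)
Shows how PaddleHub uses Transformer models such as ERNIE/BERT as pretrained models for FineTune and prediction of sequence labeling on the MSRA_NER dataset.
* [Object detection](./ssd)
Shows how PaddleHub uses SSD as a pretrained model for object detection prediction on the PascalVOC dataset.
* [Text classification](./text_classification)
Shows how PaddleHub uses Transformer models such as ERNIE/BERT as pretrained models for FineTune and prediction of text classification on datasets such as GLUE and ChnSentiCorp.
**This example also shows how to convert a model saved by Fine-tune into a PaddleHub Module.** Make sure you use PaddleHub 1.6.0 or later for the conversion.
* [Multi-label classification](./multi_label_classification)
Shows how PaddleHub uses BERT as a pretrained model for FineTune and prediction of multi-label classification on the Toxic dataset.
* [Regression](./regression)
Shows how PaddleHub uses BERT as a pretrained model for FineTune and prediction of a regression task on the GLUE-STSB dataset.
* [Reading comprehension](./reading_comprehension)
Shows how PaddleHub uses BERT as a pretrained model for FineTune and prediction of reading comprehension on the SQuAD dataset.
* [Retrieval-based question answering](./qa_classfication)
Shows how PaddleHub uses ERNIE and BERT as pretrained models for FineTune and prediction of retrieval-based question answering on datasets such as NLPCC-DBQA.
* [Sentence semantic similarity](./sentence_similarity)
Shows how PaddleHub uses word2vec_skipgram to compute the semantic similarity of two texts.
* [Using AutoDL Finetuner for hyperparameter optimization](./autofinetune)
Shows how to use PaddleHub's AutoDL Finetuner, with examples that automatically search for good hyperparameters for image classification and text classification tasks.
* [Using Hub Serving for service deployment](./serving)
The examples in this folder show how to use Hub Serving to deploy the prediction-capable Modules supported by PaddleHub as a service.
* [Converting a pretrained model into a PaddleHub Module](./senta_module_sample)
Shows how to convert a pretrained model into a PaddleHub Module so that it can be loaded in one line with `hub.Module(name="module_name")`.
Make sure you use PaddleHub 1.6.0 or later for the conversion.
**NOTE:**
All of the task examples above use datasets provided by PaddleHub. To run a task on your own dataset, see [Adapting PaddleHub to custom data for Fine-tune](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub%E9%80%82%E9%85%8D%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E5%AE%8C%E6%88%90FineTune)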
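## Try it online
We provide demos as IPython notebooks on AI Studio that you can run directly on the platform:
|Pretrained model|Task type|Dataset|AIStudio link|Notes|
|-|-|-|-|-|
|ResNet|Image classification|DogCat (cats and dogs dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147010)||
|ERNIE|Text classification|ChnSentiCorp (Chinese sentiment classification dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147006)||
|ERNIE|Text classification|THUNEWS (Chinese news classification dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/221999)|This tutorial explains how to load a custom dataset and use the Fine-tune API for text classification transfer learning.|
|ERNIE|Sequence labeling|MSRA_NER (Chinese sequence labeling dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147009)||
|ERNIE|Sequence labeling|Express (Chinese express waybill dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/184200)|This tutorial explains how to load a custom dataset and use the Fine-tune API for sequence labeling transfer learning.|
|ERNIE Tiny|Text classification|ChnSentiCorp (Chinese sentiment classification dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/186443)||
|Senta|Text classification|ChnSentiCorp (Chinese sentiment classification dataset)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/216846)|This tutorial explains how to use Senta and the Fine-tune API for sentiment classification transfer learning.|
|Senta|Sentiment analysis prediction|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215814)||
|LAC|Lexical analysis|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215711)||
|Ultra-Light-Fast-Generic-Face-Detector-1MB|Face detection|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215962)||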
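## AutoDL Finetuner hyperparameter optimization
PaddleHub also provides hyperparameter tuning, which automatically searches for the best model hyperparameters to obtain better model performance. For details, see the [AutoDL Finetuner hyperparameter optimization tutorial](../docs/tutorial/autofinetune.md)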
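# PaddleHub Sound Classification
This example shows how to use the PaddleHub Fine-tune API with pretrained models such as CNN14 for sound classification and tagging.
For details on CNN14 and the other pretrained models, see the paper [PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf) and the code [audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn)
## How to start Fine-tune
We use the public environmental sound classification dataset [ESC50](https://github.com/karolpiczak/ESC-50) as the example dataset. The command below trains the model on the training set (train.npz) and validates it on the development set (dev.npz).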
```shell
# Select the GPU card to use
export CUDA_VISIBLE_DEVICES=0
python train.py
```
## Code walkthrough
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step1: Choose a model
```python
import paddle
import paddlehub as hub
from paddlehub.datasets import ESC50
model = hub.Module(name='panns_cnn14', version='1.0.0', task='sound-cls', num_class=ESC50.num_class)
```
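The parameters are:
- `name`: model name; one of `panns_cnn14`, `panns_cnn10` or `panns_cnn6`. The available models are listed in the table below.
- `version`: module version number.
- `task`: the task the model performs. `sound-cls` selects sound classification; `None` selects Audio Tagging.
- `num_class`: number of classes in the current sound classification task, determined by the dataset used.
Pretrained models currently available:
Model | PaddleHub Module
-----------| :------:
CNN14 | `hub.Module(name='panns_cnn14')`
CNN10 | `hub.Module(name='panns_cnn10')`
CNN6 | `hub.Module(name='panns_cnn6')`
### Step2: Load the dataset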
```python
train_dataset = ESC50(mode='train')
dev_dataset = ESC50(mode='dev')
```
### Step3: Choose the optimization strategy and run configuration
```python
optimizer = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='./', use_gpu=True)
```
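#### Optimization strategy
Paddle 2.0 provides a range of optimizers, such as `SGD`, `AdamW` and `Adamax`; see [Strategy](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html) for details.
For `AdamW`:
- `learning_rate`: global learning rate, 1e-3 by default;
- `parameters`: the model parameters to optimize.
For the other configurable parameters, see [AdamW](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/adamw/AdamW_cn.html#cn-api-paddle-optimizer-adamw)
#### Run configuration
`Trainer` controls the Fine-tune training and takes the following parameters:
- `model`: the model to optimize;
- `optimizer`: the optimizer to use;
- `use_vdl`: whether to visualize the training process with vdl;
- `checkpoint_dir`: directory where model parameters are saved;
- `compare_metrics`: the metric used to decide which model is saved as the best one.
### Step4: Run training and model evaluation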
```python
trainer.train(
    train_dataset,
    epochs=50,
    batch_size=16,
    eval_dataset=dev_dataset,
    save_interval=10,
)
trainer.evaluate(dev_dataset, batch_size=16)
```
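`trainer.train` runs model training; its parameters control the training process. The main ones are:
- `train_dataset`: dataset used for training;
- `epochs`: number of training epochs;
- `batch_size`: number of samples used in each training step; when using a GPU, adjust batch_size to fit your hardware;
- `num_workers`: number of workers, 0 by default;
- `eval_dataset`: validation set;
- `log_interval`: logging interval, counted in training batches;
- `save_interval`: model saving interval, counted in training epochs.
`trainer.evaluate` runs model evaluation. The main parameters are:
- `eval_dataset`: dataset used for evaluation;
- `batch_size`: number of samples used in each evaluation step; when using a GPU, adjust batch_size to fit your hardware.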
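To make the interplay of `epochs`, `batch_size`, `log_interval` and `save_interval` concrete, here is a minimal sketch in plain Python. It does not call PaddleHub: the helper name `plan_training`, the sample count of 1600 and the assumption that the last partial batch of an epoch is kept are all illustrative, not taken from the `Trainer` implementation.

```python
import math


def plan_training(num_samples, epochs, batch_size, log_interval, save_interval):
    """Return (steps_per_epoch, log_steps, save_epochs) for a hypothetical run.

    Assumes the final partial batch of each epoch is kept, that logging is
    counted in global training steps and saving in epochs (both 1-based).
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    log_steps = [s for s in range(1, total_steps + 1) if s % log_interval == 0]
    save_epochs = [e for e in range(1, epochs + 1) if e % save_interval == 0]
    return steps_per_epoch, log_steps, save_epochs


# Illustrative numbers only: 1600 training samples with the settings used above.
steps_per_epoch, log_steps, save_epochs = plan_training(
    num_samples=1600, epochs=50, batch_size=16, log_interval=10, save_interval=10)
print(steps_per_epoch)  # 100
print(len(log_steps))   # 500
print(save_epochs)      # [10, 20, 30, 40, 50]
```

Under these assumptions, doubling `batch_size` halves the number of steps per epoch, so a fixed `log_interval` produces half as many log lines, while the checkpoints still land on the same epochs.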
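## Model prediction
After Fine-tune finishes, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune.
The following code takes the local audio file `./cat.wav` as input, classifies it with the trained model and prints the result.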
```python
import os
import librosa
import paddlehub as hub
from paddlehub.datasets import ESC50
wav = './cat.wav'  # local wav file to run prediction on
sr = 44100  # sample rate of the audio file
checkpoint = './best_model/model.pdparams'  # model checkpoint
label_map = {idx: label for idx, label in enumerate(ESC50.label_list)}
model = hub.Module(name='panns_cnn14',
                   version='1.0.0',
                   task='sound-cls',
                   num_class=ESC50.num_class,
                   label_map=label_map,
                   load_checkpoint=checkpoint)
data = [librosa.load(wav, sr=sr)[0]]
result = model.predict(data, sample_rate=sr, batch_size=1, feat_type='mel', use_gpu=True)
print(result[0])  # result[0] holds the probability of the audio file belonging to each class
```
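## Audio Tagging
The model used here is pretrained on the [Audioset dataset](https://research.google.com/audioset/). Besides fine-tuning on a specific sound classification dataset as described above, it also supports tagging with the 527 Audioset labels.
The following code takes the local audio file `./cat.wav` as input, scores it with the pretrained model and prints the top 10 labels with their scores.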
```python
import os
import librosa
import numpy as np
import paddlehub as hub
from paddlehub.env import MODULE_HOME
wav = './cat.wav'  # local wav file to run prediction on
sr = 44100  # sample rate of the audio file
topk = 10  # show the 10 highest-scoring labels and their scores
# Read the label file of the audioset dataset
label_file = os.path.join(MODULE_HOME, 'panns_cnn14', 'audioset_labels.txt')
label_map = {}
with open(label_file, 'r') as f:
    for i, l in enumerate(f.readlines()):
        label_map[i] = l.strip()
model = hub.Module(name='panns_cnn14', version='1.0.0', task=None, label_map=label_map)
data = [librosa.load(wav, sr=sr)[0]]
result = model.predict(data, sample_rate=sr, batch_size=1, feat_type='mel', use_gpu=True)
# Print the topk classes and their scores
msg = ''
for label, score in list(result[0].items())[:topk]:
    msg += f'{label}: {score}\n'
print(msg)
```
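The loop above takes the first `topk` items of `result[0]`, which only yields the highest scores if that dictionary is already ordered by score. Whether `predict` guarantees this ordering is not stated here, so the sketch below sorts explicitly. The helper name `top_k_labels` and the sample scores are made up for illustration.

```python
def top_k_labels(scores, k):
    """Return the k (label, score) pairs with the highest scores, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores standing in for one entry of `result`.
example = {'Speech': 0.02, 'Cat': 0.91, 'Meow': 0.85, 'Animal': 0.60}
for label, score in top_k_labels(example, k=2):
    print(f'{label}: {score}')
```

If the dictionary is already sorted, the explicit sort is redundant but harmless, and it keeps the output correct should the ordering ever change.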
### Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.1.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import ast
import os
import librosa
import numpy as np
import paddlehub as hub
from paddlehub.env import MODULE_HOME
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--wav", type=str, required=True, help="Audio file to infer.")
parser.add_argument("--sr", type=int, default=32000, help="Sample rate of inference audio.")
parser.add_argument("--model_type", type=str, default='panns_cnn14', help="Select model to use for inference.")
parser.add_argument("--topk", type=int, default=10, help="Show top k results of audioset labels.")
args = parser.parse_args()
if __name__ == '__main__':
    # Read the Audioset label file shipped with the module.
    label_file = os.path.join(MODULE_HOME, args.model_type, 'audioset_labels.txt')
    label_map = {}
    with open(label_file, 'r') as f:
        for i, l in enumerate(f.readlines()):
            label_map[i] = l.strip()

    model = hub.Module(name=args.model_type, version='1.0.0', task=None, label_map=label_map)
    data = [librosa.load(args.wav, sr=args.sr)[0]]  # mono waveform, shape (num_samples,)
    result = model.predict(data, sample_rate=args.sr, batch_size=1, feat_type='mel', use_gpu=True)

    msg = f'[{args.wav}]\n'
    for label, score in list(result[0].items())[:args.topk]:
        msg += f'{label}: {score}\n'
    print(msg)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import ast
import os
import librosa
import paddlehub as hub
from paddlehub.datasets import ESC50
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--wav", type=str, required=True, help="Audio file to infer.")
parser.add_argument("--sr", type=int, default=44100, help="Sample rate of inference audio.")
parser.add_argument("--model_type", type=str, default='panns_cnn14', help="Select model to use for inference.")
parser.add_argument("--topk", type=int, default=1, help="Show top k results of prediction labels.")
parser.add_argument(
    "--checkpoint", type=str, default='./checkpoint/best_model/model.pdparams', help="Checkpoint of model.")
args = parser.parse_args()
if __name__ == '__main__':
    label_map = {idx: label for idx, label in enumerate(ESC50.label_list)}

    model = hub.Module(
        name=args.model_type,
        version='1.0.0',
        task='sound-cls',
        num_class=ESC50.num_class,
        label_map=label_map,
        load_checkpoint=args.checkpoint)
    data = [librosa.load(args.wav, sr=args.sr)[0]]
    result = model.predict(data, sample_rate=args.sr, batch_size=1, feat_type='mel', use_gpu=True)

    msg = f'[{args.wav}]\n'
    for label, score in list(result[0].items())[:args.topk]:
        msg += f'{label}: {score}\n'
    print(msg)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import ast
import paddle
import paddlehub as hub
from paddlehub.datasets import ESC50
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=50, help="Number of epochs for fine-tuning.")
parser.add_argument(
    "--use_gpu",
    type=ast.literal_eval,
    default=True,
    help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--checkpoint_dir", type=str, default='./checkpoint', help="Directory to model checkpoint")
parser.add_argument("--save_interval", type=int, default=10, help="Save checkpoint every n epoch.")
args = parser.parse_args()
if __name__ == "__main__":
    model = hub.Module(name='panns_cnn14', task='sound-cls', num_class=ESC50.num_class)

    train_dataset = ESC50(mode='train')
    dev_dataset = ESC50(mode='dev')

    optimizer = paddle.optimizer.AdamW(learning_rate=args.learning_rate, parameters=model.parameters())

    trainer = hub.Trainer(model, optimizer, checkpoint_dir=args.checkpoint_dir, use_gpu=args.use_gpu)
    trainer.train(
        train_dataset,
        epochs=args.num_epoch,
        batch_size=args.batch_size,
        eval_dataset=dev_dataset,
        save_interval=args.save_interval,
    )
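# PaddleHub Auto Data Augmentation
This example shows how to use PaddleHub to search for the data augmentation policy best suited to your data and apply it to model training.
## Dependencies
Install the auto-augment package with pip beforehand:
```
pip install auto-augment
```
## Overview of auto-augment
The auto-augment package currently supports Paddle image classification and object detection tasks.
Its use is split into two stages: search and train.
**The search stage searches over combinations of operators on a preset model and outputs the best data augmentation schedule.**
**The train stage applies that best data augmentation schedule to a specific model.**
For details on using auto-augment and its benchmarks, see the readme under auto_augment/doc.
## Supported tasks
auto-augment currently supports PaddleHub image classification tasks.
Other tasks will be added later.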
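## Image classification task
### Parameter configuration
Configuration can be written in YAML or JSON; the project only ships YAML templates, all under the configs/ path.
User-configurable parameters are grouped into task_config (task configuration), data_config (data configuration), resource_config (resource configuration), algo_config (algorithm configuration) and search_space (search space configuration).
#### task_config (task configuration)
Task configuration details, including the task type and model details. The fields are:
- run_mode: ["ray", "automl_service"]. The backend service to use; the single-machine ray framework is currently supported.
- work_space: the user's workspace.
- task_type: ["classifier"]. The task type. PaddleHub currently supports single-label image classification; for augmentation of single-label object detection tasks, see auto_augment/doc.
- classifier: configuration details for the specific task type.
##### classifier task configuration details
- model_name: PaddleHub model name
- epochs: int, number of search epochs for the task. **Required**; this parameter must be set explicitly.
- Input_size: model input size
- scale_size: data preprocessing size
- no_cache_image: do not cache data, default False
- use_class_map: use the label_list mapping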
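#### data_config (data configuration)
Data configuration supports several input formats, including the txt annotation format for image classification and the VOC and COCO annotation formats for object detection.
- train_img_prefix: str, path prefix of the training data
- train_ann_file: str, annotation file of the training set
- val_img_prefix: str, path prefix of the validation data
- val_ann_file: str, annotation file of the validation set
- label_list: str, label file
- delimiter: ",", the delimiter used in the annotation files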
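As a rough illustration of how a path prefix, an annotation file and a delimiter fit together, the sketch below parses annotation text into `(image_path, label)` pairs. The exact layout of the txt annotation format is not specified here, so the one-sample-per-line `relative_path<delimiter>label` layout, the helper name `read_annotations` and the sample file names are assumptions.

```python
import csv
import io
import os


def read_annotations(ann_text, img_prefix, delimiter=','):
    """Parse 'relative_path<delimiter>label' lines into (full_path, label) pairs.

    The line layout is an assumed example, not the documented auto-augment format.
    """
    samples = []
    for row in csv.reader(io.StringIO(ann_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        rel_path, label = row[0].strip(), row[1].strip()
        samples.append((os.path.join(img_prefix, rel_path), label))
    return samples


# Hypothetical annotation content; a real run would read train_ann_file instead.
ann_text = "roses/001.jpg,roses\ntulips/002.jpg,tulips\n"
for path, label in read_annotations(ann_text, img_prefix='./dataset/flower_photos'):
    print(path, label)
```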
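#### resource_config (resource configuration)
- gpu: float, GPU resources allocated to each search process; fractional allocation is supported when run_mode == "ray"
- cpu: float, CPU resources allocated to each search process; fractional allocation is supported when run_mode == "ray"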
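#### algo_config (algorithm configuration)
Only PBA is currently supported; more algorithms will be added later.
##### PBA configuration
- algo_name: str, ["PBA"], the search algorithm
- algo_param:
  - perturbation_interval: search perturbation interval
  - num_samples: number of search processes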
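To give a feel for what `perturbation_interval` and `num_samples` control, here is a minimal sketch of a generic population-based training loop in plain Python. It is not the auto-augment package's PBA code: the 25% quantile, the ±0.1 perturbation, the toy score function and the name `exploit_and_explore` are all assumptions chosen for illustration.

```python
import random


def exploit_and_explore(population, rng, quantile=0.25, step=0.1):
    """One PBT-style step: the weakest trials copy a strong trial's policy, then perturb it.

    `population` is a list of dicts with keys 'score' and 'policy' (floats in [0, 1]).
    Returns a new list; the input is not modified.
    """
    ranked = sorted(population, key=lambda t: t['score'], reverse=True)
    cutoff = max(1, int(len(ranked) * quantile))
    top = ranked[:cutoff]
    top_ids = {id(t) for t in top}
    bottom_ids = {id(t) for t in ranked[-cutoff:]} - top_ids
    updated = []
    for trial in population:
        if id(trial) in bottom_ids:
            donor = rng.choice(top)  # exploit: copy a well-performing policy
            policy = [min(1.0, max(0.0, p + rng.choice([-step, step])))  # explore
                      for p in donor['policy']]
        else:
            policy = list(trial['policy'])
        updated.append({'score': trial['score'], 'policy': policy})
    return updated


perturbation_interval = 3  # epochs between perturbations (illustrative value)
num_samples = 8            # number of parallel search trials (illustrative value)
rng = random.Random(0)
population = [{'score': 0.0, 'policy': [rng.random(), rng.random()]}
              for _ in range(num_samples)]
for epoch in range(1, 10):
    for trial in population:
        # Stand-in for "train one epoch and evaluate": a toy score of the policy.
        trial['score'] = 1.0 - abs(trial['policy'][0] - 0.5)
    if epoch % perturbation_interval == 0:
        population = exploit_and_explore(population, rng)
print(max(t['score'] for t in population))
```

In this sketch a smaller `perturbation_interval` means policies are replaced more often, and a larger `num_samples` means more trials compete at each perturbation step.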
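#### search_space (search space configuration)
Defines the search space. It is required in the policy search stage and ignored when a policy is applied during training.
- operators_repeat: int, default 1, the number of times the search operators are repeated.
- operator_space: the operator space to search
1. Custom operator mode:
htype: str, ["choice"], the hyperparameter type; the choice enumeration is currently supported
value: list, e.g. [0,0.5,1], the enumerated values
![image-20200707162627074](./doc/operators.png)
2. Shorthand operator mode:
The user only specifies the operators to search; the search spaces for prob and magnitude use the system defaults, which lie between 0 and 1.
![image-20200707162709253](./doc/short_operators.png)
Modes 1 and 2 can be mixed in one definition.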
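The two modes can be pictured with a small sketch that mixes an explicit entry and a shorthand entry and then samples one policy from the `choice` spaces. The key names `name`, `prob` and `magnitude`, the default value list and both helper functions are hypothetical; the real schema is the one shown in the configuration templates.

```python
import random

# Assumed default; the text only says the default space lies between 0 and 1.
DEFAULT_CHOICES = [0, 0.5, 1]


def normalize_operator(entry):
    """Expand a shorthand operator name into the explicit (custom) form."""
    if isinstance(entry, str):
        return {
            'name': entry,
            'prob': {'htype': 'choice', 'value': list(DEFAULT_CHOICES)},
            'magnitude': {'htype': 'choice', 'value': list(DEFAULT_CHOICES)},
        }
    return entry


def sample_policy(operator_space, operators_repeat, rng):
    """Draw one (name, prob, magnitude) triple per operator, repeated operators_repeat times."""
    policy = []
    for _ in range(operators_repeat):
        for entry in operator_space:
            op = normalize_operator(entry)
            policy.append((op['name'],
                           rng.choice(op['prob']['value']),
                           rng.choice(op['magnitude']['value'])))
    return policy


operator_space = [
    # Mode 1: custom operator with explicit choice spaces.
    {'name': 'Rotate',
     'prob': {'htype': 'choice', 'value': [0, 0.5, 1]},
     'magnitude': {'htype': 'choice', 'value': [0, 0.5, 1]}},
    # Mode 2: shorthand, only the operator name is given.
    'Cutout',
]
print(sample_policy(operator_space, operators_repeat=1, rng=random.Random(0)))
```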
##### Image classification operators
["Sharpness", "Rotate", "Invert", "Brightness", "Cutout", "Equalize","TranslateY", "AutoContrast", "Color","TranslateX", "Solarize", "ShearX","Contrast", "Posterize", "ShearY", "FlipLR"]
### Search stage
Searches for a data augmentation policy.
### Train stage
Applies the searched data augmentation policy during training.
### Example demo
#### Preparing the Flower data
```
cd PaddleHub/demo/autaug/
mkdir -p ./dataset
cd dataset
wget https://bj.bcebos.com/paddlehub-dataset/flower_photos.tar.gz
tar -xvf flower_photos.tar.gz
```
#### Search workflow
```
cd PaddleHub/demo/autaug/
bash search.sh
# The result is dumped as JSON into the workspace; this JSON file can then be used for training
```
#### Train stage
```
cd PaddleHub/demo/autaug/
bash train.sh
```
# -*- coding: utf-8 -*-
#*******************************************************************************
#
# Copyright (c) 2020 Baidu.com, Inc. All Rights Reserved
#
#*******************************************************************************
"""
Authors: lvhaijun01@baidu.com
Date: 2020-11-24 20:43
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import six
import os
from typing import Dict, List, Optional, Union, Tuple
from auto_augment.autoaug.utils import log
import logging
logger = log.get_logger(level=logging.INFO)
import auto_augment
auto_augment_path = auto_augment.__file__
class HubFitterClassifer(object):
"""Trains an instance of the Model class."""
def __init__(self, hparams: dict) -> None:
"""
Define the data and model for the classification task.
Args:
hparams:
"""
def set_paddle_flags(**kwargs):
for key, value in kwargs.items():
if os.environ.get(key, None) is None:
os.environ[key] = str(value)
# NOTE(paddle-dev): All of these flags should be set before
# `import paddle`. Otherwise, it would not take any effect.
set_paddle_flags(
# limit the fraction of GPU memory this process pre-allocates
FLAGS_fraction_of_gpu_memory_to_use=hparams.resource_config.gpu, )
import paddle
import paddlehub as hub
from paddlehub_utils.trainer import CustomTrainer
from paddlehub_utils.reader import _init_loader
# todo now does not support fleet distribute training
# from paddle.fluid.incubate.fleet.base import role_maker
# from paddle.fluid.incubate.fleet.collective import fleet
# role = role_maker.PaddleCloudRoleMaker(is_collective=True)
# fleet.init(role)
logger.info("classification data augment search begin")
self.hparams = hparams
# param compatible
self._fit_param(show=True)
paddle.disable_static(paddle.CUDAPlace(paddle.distributed.get_rank()))
train_dataset, eval_dataset = _init_loader(self.hparams)
model = hub.Module(
name=hparams["task_config"]["classifier"]["model_name"],
label_list=self.class_to_id_dict.keys(),
load_checkpoint=None)
optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
trainer = CustomTrainer(model=model, optimizer=optimizer, checkpoint_dir='img_classification_ckpt')
self.model = model
self.optimizer = optimizer
trainer.init_train_and_eval(
train_dataset, epochs=100, batch_size=32, eval_dataset=eval_dataset, save_interval=1)
self.trainer = trainer
def _fit_param(self, show: bool = False) -> None:
"""
param fit
Args:
hparams:
Returns:
"""
hparams = self.hparams
self._get_label_info(hparams)
def _get_label_info(self, hparams: dict) -> None:
"""
Args:
hparams:
Returns:
"""
from paddlehub_utils.reader import _read_classes
data_config = hparams.data_config
label_list = data_config.label_list
if os.path.isfile(label_list):
class_to_id_dict = _read_classes(label_list)
else:
raise FileNotFoundError("label_list: {} does not exist".format(label_list))
self.num_classes = len(class_to_id_dict)
self.class_to_id_dict = class_to_id_dict
def reset_config(self, new_hparams: dict) -> None:
"""
reset config, used by search stage
Args:
new_hparams:
Returns:
"""
self.hparams = new_hparams
self.trainer.train_loader.dataset.reset_policy(new_hparams.search_space)
return None
def save_model(self, checkpoint_dir: str, step: Optional[str] = None) -> str:
"""Dumps model into the backup_dir.
Args:
step: If provided, creates a checkpoint with the given step
number, instead of overwriting the existing checkpoints.
"""
checkpoint_path = os.path.join(checkpoint_dir, 'epoch') + '-' + str(step)
logger.info('Saving model checkpoint to {}'.format(checkpoint_path))
self.trainer.save_model(os.path.join(checkpoint_path, "checkpoint"))
return checkpoint_path
def extract_model_spec(self, checkpoint_path: str) -> None:
"""Loads a checkpoint with the architecture structure stored in the name."""
ckpt_path = os.path.join(checkpoint_path, "checkpoint")
self.trainer.load_model(ckpt_path)
logger.info('Loaded child model checkpoint from {}'.format(checkpoint_path))
def eval_child_model(self, mode: str, pass_id: int = 0) -> dict:
"""Evaluate the child model on the trainer's eval loader.
Args:
mode: "val" or "test"; selects the key under which the accuracy is returned.
pass_id: index of the current epoch (currently unused).
Returns:
Dict holding the top-1 accuracy under "val_acc" or "test_acc".
"""
eval_loader = self.trainer.eval_loader
res = self.trainer.evaluate_process(eval_loader)
top1_acc = res["metrics"]["acc"]
if mode == "val":
return {"val_acc": top1_acc}
elif mode == "test":
return {"test_acc": top1_acc}
else:
raise NotImplementedError
def train_one_epoch(self, pass_id: int) -> dict:
"""
Args:
model:
train_loader:
optimizer:
Returns:
"""
from paddlehub.utils.utils import Timer
batch_sampler = self.trainer.batch_sampler
train_loader = self.trainer.train_loader
steps_per_epoch = len(batch_sampler)
task_config = self.hparams.task_config
task_type = task_config.task_type
epochs = task_config.classifier.epochs
timer = Timer(steps_per_epoch * epochs)
timer.start()
self.trainer.train_one_epoch(
loader=train_loader,
timer=timer,
current_epoch=pass_id,
epochs=epochs,
log_interval=10,
steps_per_epoch=steps_per_epoch)
return {"train_acc": 0}
def _run_training_loop(self, curr_epoch: int) -> dict:
"""Trains the model `m` for one epoch."""
start_time = time.time()
train_acc = self.train_one_epoch(curr_epoch)
logger.info('Epoch:{} time(min): {}'.format(curr_epoch, (time.time() - start_time) / 60.0))
return train_acc
def _compute_final_accuracies(self, iteration: int) -> dict:
"""Run once training is finished to compute final test accuracy."""
task_config = self.hparams.task_config
task_type = task_config.task_type
if (iteration >= task_config[task_type].epochs - 1):
test_acc = self.eval_child_model('test', iteration)
pass
else:
test_acc = {"test_acc": 0}
logger.info('Test acc: {}'.format(test_acc))
return test_acc
def run_model(self, epoch: int) -> dict:
"""Trains and evaluates the image model."""
self._fit_param()
train_acc = self._run_training_loop(epoch)
valid_acc = self.eval_child_model(mode="val", pass_id=epoch)
logger.info('valid acc: {}'.format(valid_acc))
all_metric = {}
all_metric.update(train_acc)
all_metric.update(valid_acc)
return all_metric
# -*- coding: utf-8 -*-
#*******************************************************************************
#
# Copyright (c) 2019 Baidu.com, Inc. All Rights Reserved
#
#*******************************************************************************
"""
Authors: lvhaijun01@baidu.com
Date: 2019-09-17 14:15
"""
# coding:utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
# *******************************************************************************
#
# Copyright (c) 2020 Baidu.com, Inc. All Rights Reserved
#
# *******************************************************************************
"""
Authors: lvhaijun01@baidu.com
Date: 2019-06-30 00:10
"""
import re
import numpy as np
from typing import Dict, List, Optional, Union, Tuple
import six
import cv2
import os
import paddle
import paddlehub.vision.transforms as transforms
from PIL import ImageFile
from auto_augment.autoaug.transform.autoaug_transform import AutoAugTransform
ImageFile.LOAD_TRUNCATED_IMAGES = True
__imagenet_stats = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]}
class PbaAugment(object):
"""
PbaAugment transform for PyTorch classification
"""
def __init__(self,
input_size: int = 224,
scale_size: int = 256,
normalize: Optional[dict] = None,
pre_transform: bool = True,
stage: str = "search",
**kwargs) -> None:
"""
Args:
input_size:
scale_size:
normalize:
pre_transform:
**kwargs:
"""
if normalize is None:
normalize = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]}
policy = kwargs["policy"]
assert stage in ["search", "train"]
train_epochs = kwargs["hp_policy_epochs"]
self.auto_aug_transform = AutoAugTransform.create(policy, stage=stage, train_epochs=train_epochs)
#self.auto_aug_transform = PbtAutoAugmentClassiferTransform(conf)
# always define the attribute so that __call__ can test it safely
self.pre_transform = transforms.Resize(input_size) if pre_transform else None
self.post_transform = transforms.Compose(
transforms=[transforms.Permute(),
transforms.Normalize(**normalize, channel_first=True)],
channel_first=False)
self.cur_epoch = 0
def set_epoch(self, indx: int) -> None:
"""
Args:
indx:
Returns:
"""
self.auto_aug_transform.set_epoch(indx)
def reset_policy(self, new_hparams: dict) -> None:
"""
Args:
new_hparams:
Returns:
"""
self.auto_aug_transform.reset_policy(new_hparams)
def __call__(self, img: np.ndarray):
"""
Args:
img: image as a numpy array in HWC layout
Returns:
"""
# pre-transform: resize
if self.pre_transform:
img = self.pre_transform(img)
img = self.auto_aug_transform.apply(img)
img = img.astype(np.uint8)
img = self.post_transform(img)
return img
class PicRecord(object):
"""
PicRecord
"""
def __init__(self, row: list) -> None:
"""
Args:
row:
"""
self._data = row
@property
def sub_path(self) -> str:
"""
Returns:
"""
return self._data[0]
@property
def label(self) -> str:
"""
Returns:
"""
return self._data[1]
class PicReader(paddle.io.Dataset):
"""
PicReader
"""
def __init__(self,
root_path: str,
list_file: str,
meta: bool = False,
transform: Optional[callable] = None,
class_to_id_dict: Optional[dict] = None,
cache_img: bool = False,
**kwargs) -> None:
"""
Args:
root_path:
list_file:
meta:
transform:
class_to_id_dict:
cache_img:
**kwargs:
"""
self.root_path = root_path
self.list_file = list_file
self.transform = transform
self.meta = meta
self.class_to_id_dict = class_to_id_dict
self.train_type = kwargs["conf"].get("train_type", "single_label")
self.class_num = kwargs["conf"].get("class_num", 0)
self._parse_list(**kwargs)
self.cache_img = cache_img
self.cache_img_buff = dict()
if self.cache_img:
self._get_all_img(**kwargs)
def _get_all_img(self, **kwargs) -> None:
"""
Cache the images after pre-resizing them to reduce memory usage
Returns:
"""
scale_size = kwargs.get("scale_size", 256)
for idx in range(len(self)):
record = self.pic_list[idx]
relative_path = record.sub_path
if self.root_path is not None:
image_path = os.path.join(self.root_path, relative_path)
else:
image_path = relative_path
try:
img = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (scale_size, scale_size))
self.cache_img_buff[image_path] = img
except Exception:
print("img_path:{} cannot be read by cv2".format(image_path))
def _load_image(self, directory: str) -> np.ndarray:
"""
Args:
directory:
Returns:
"""
if not self.cache_img:
img = cv2.imread(directory, cv2.IMREAD_COLOR).astype('float32')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# img = Image.open(directory).convert('RGB')
else:
if directory in self.cache_img_buff:
img = self.cache_img_buff[directory]
else:
img = cv2.imread(directory, cv2.IMREAD_COLOR).astype('float32')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# img = Image.open(directory).convert('RGB')
return img
def _parse_list(self, **kwargs) -> None:
"""
Args:
**kwargs:
Returns:
"""
delimiter = kwargs.get("delimiter", " ")
self.pic_list = []
with open(self.list_file) as f:
lines = f.read().splitlines()
print("PicReader:: found {} picture in `{}'".format(len(lines), self.list_file))
for i, line in enumerate(lines):
record = re.split(delimiter, line)
# record = line.split()
assert len(record) == 2, "length of record is not 2!"
if not os.path.splitext(record[0])[1]:
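# handle online classification data whose file names have no extension
record[0] = record[0] + ".jpg"
# keep the online single-label case compatible with multi-label input; to be removed later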
record[1] = re.split(",", record[1])[0]
self.pic_list.append(PicRecord(record))
def __getitem__(self, index: int):
"""
Args:
index:
Returns:
"""
record = self.pic_list[index]
return self.get(record)
def get(self, record: PicRecord) -> tuple:
"""
Args:
record:
Returns:
"""
relative_path = record.sub_path
if self.root_path is not None:
image_path = os.path.join(self.root_path, relative_path)
else:
image_path = relative_path
img = self._load_image(image_path)
# print("org img sum:{}".format(np.sum(np.asarray(img))))
process_data = self.transform(img)
if self.train_type == "single_label":
if self.class_to_id_dict:
label = self.class_to_id_dict[record.label]
else:
label = int(record.label)
elif self.train_type == "multi_labels":
label_tensor = np.zeros((1, self.class_num))
for label in record.label.split(","):
label_tensor[0, int(label)] = 1
label_tensor = np.squeeze(label_tensor)
label = label_tensor
if self.meta:
return process_data, label, relative_path
else:
return process_data, label
def __len__(self) -> int:
"""
Returns:
"""
return len(self.pic_list)
def set_meta(self, meta: bool) -> None:
"""
Args:
meta:
Returns:
"""
self.meta = meta
def set_epoch(self, epoch: int) -> None:
"""
Args:
epoch:
Returns:
"""
if self.transform is not None:
self.transform.set_epoch(epoch)
# only use in search
def reset_policy(self, new_hparams: dict) -> None:
"""
Args:
new_hparams:
Returns:
"""
if self.transform is not None:
self.transform.reset_policy(new_hparams)
def _parse(value: str, function: callable, fmt: str):
"""
Parse a string into a value, and format a nice ValueError if it fails.
Returns `function(value)`.
Any `ValueError` raised is caught and a new `ValueError` is raised
with message `fmt.format(e)`, where `e` is the caught `ValueError`.
"""
try:
return function(value)
except ValueError as e:
six.raise_from(ValueError(fmt.format(e)), None)
def _read_classes(csv_file: str) -> dict:
""" Parse the classes file.
"""
result = {}
with open(csv_file) as csv_reader:
for line, row in enumerate(csv_reader):
try:
class_name = row.strip()
# print(class_id, class_name)
except ValueError:
six.raise_from(ValueError('line {}: format should be \'class_name\''.format(line)), None)
class_id = _parse(line, int, 'line {}: malformed class ID: {{}}'.format(line))
if class_name in result:
raise ValueError('line {}: duplicate class name: \'{}\''.format(line, class_name))
result[class_name] = class_id
return result
def _init_loader(hparams: dict, TrainTransform=None) -> tuple:
"""
Args:
hparams:
Returns:
"""
train_data_root = hparams.data_config.train_img_prefix
val_data_root = hparams.data_config.val_img_prefix
train_list = hparams.data_config.train_ann_file
val_list = hparams.data_config.val_ann_file
input_size = hparams.task_config.classifier.input_size
scale_size = hparams.task_config.classifier.scale_size
search_space = hparams.search_space
search_space["task_type"] = hparams.task_config.task_type
epochs = hparams.task_config.classifier.epochs
no_cache_img = hparams.task_config.classifier.get("no_cache_img", False)
normalize = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]}
if TrainTransform is None:
TrainTransform = PbaAugment(
input_size=input_size,
scale_size=scale_size,
normalize=normalize,
policy=search_space,
hp_policy_epochs=epochs,
)
delimiter = hparams.data_config.delimiter
kwargs = dict(conf=hparams, delimiter=delimiter)
if hparams.task_config.classifier.use_class_map:
class_to_id_dict = _read_classes(hparams.data_config.label_list)
else:
class_to_id_dict = None
train_data = PicReader(
root_path=train_data_root,
list_file=train_list,
transform=TrainTransform,
class_to_id_dict=class_to_id_dict,
cache_img=not no_cache_img,
**kwargs)
val_data = PicReader(
root_path=val_data_root,
list_file=val_list,
transform=transforms.Compose(
transforms=[
transforms.Resize((224, 224)),
transforms.Permute(),
transforms.Normalize(**normalize, channel_first=True)
],
channel_first=False),
class_to_id_dict=class_to_id_dict,
cache_img=not no_cache_img,
**kwargs)
return train_data, val_data
# -*- coding: utf-8 -*-
#*******************************************************************************
#
# Copyright (c) 2019 Baidu.com, Inc. All Rights Reserved
#
#*******************************************************************************
"""
Authors: lvhaijun01@baidu.com
Date: 2020-11-24 20:46
"""
from paddlehub.finetune.trainer import Trainer
import os
from collections import defaultdict
import paddle
from paddle.distributed import ParallelEnv
from paddlehub.utils.log import logger
from paddlehub.utils.utils import Timer
class CustomTrainer(Trainer):
def __init__(self, **kwargs) -> None:
super(CustomTrainer, self).__init__(**kwargs)
def init_train_and_eval(self,
train_dataset: paddle.io.Dataset,
epochs: int = 1,
batch_size: int = 1,
num_workers: int = 0,
eval_dataset: paddle.io.Dataset = None,
log_interval: int = 10,
save_interval: int = 10) -> None:
self.batch_sampler, self.train_loader = self.init_train(train_dataset, batch_size, num_workers)
self.eval_loader = self.init_evaluate(eval_dataset, batch_size, num_workers)
def init_train(self, train_dataset: paddle.io.Dataset, batch_size: int = 1, num_workers: int = 0) -> tuple:
use_gpu = True
place = paddle.CUDAPlace(ParallelEnv().dev_id) if use_gpu else paddle.CPUPlace()
paddle.disable_static(place)
batch_sampler = paddle.io.DistributedBatchSampler(
train_dataset, batch_size=batch_size, shuffle=True, drop_last=False)
loader = paddle.io.DataLoader(
train_dataset, batch_sampler=batch_sampler, places=place, num_workers=num_workers, return_list=True)
return batch_sampler, loader
def train_one_epoch(self, loader: paddle.io.DataLoader, timer: Timer, current_epoch: int, epochs: int,
log_interval: int, steps_per_epoch: int) -> None:
avg_loss = 0
avg_metrics = defaultdict(int)
self.model.train()
for batch_idx, batch in enumerate(loader):
loss, metrics = self.training_step(batch, batch_idx)
self.optimizer_step(current_epoch, batch_idx, self.optimizer, loss)
self.optimizer_zero_grad(current_epoch, batch_idx, self.optimizer)
# calculate metrics and loss
avg_loss += loss.numpy()[0]
for metric, value in metrics.items():
avg_metrics[metric] += value.numpy()[0]
timer.count()
if (batch_idx + 1) % log_interval == 0 and self.local_rank == 0:
lr = self.optimizer.get_lr()
avg_loss /= log_interval
if self.use_vdl:
self.log_writer.add_scalar(tag='TRAIN/loss', step=timer.current_step, value=avg_loss)
print_msg = 'Epoch={}/{}, Step={}/{}'.format(current_epoch, epochs, batch_idx + 1, steps_per_epoch)
print_msg += ' loss={:.4f}'.format(avg_loss)
for metric, value in avg_metrics.items():
value /= log_interval
if self.use_vdl:
self.log_writer.add_scalar(tag='TRAIN/{}'.format(metric), step=timer.current_step, value=value)
print_msg += ' {}={:.4f}'.format(metric, value)
print_msg += ' lr={:.6f} step/sec={:.2f} | ETA {}'.format(lr, timer.timing, timer.eta)
logger.train(print_msg)
avg_loss = 0
avg_metrics = defaultdict(int)
def train(self,
train_dataset: paddle.io.Dataset,
epochs: int = 1,
batch_size: int = 1,
num_workers: int = 0,
eval_dataset: paddle.io.Dataset = None,
log_interval: int = 10,
save_interval: int = 10):
'''
Train a model with specific config.
Args:
train_dataset(paddle.io.Dataset) : Dataset to train the model
epochs(int) : Number of training loops, default is 1.
batch_size(int) : Batch size of per step, default is 1.
num_workers(int) : Number of subprocess to load data, default is 0.
eval_dataset(paddle.io.Dataset) : The validation dataset, default is None. If set, the Trainer will
execute evaluate function every `save_interval` epochs.
log_interval(int) : Log the train information every `log_interval` steps.
save_interval(int) : Save the checkpoint every `save_interval` epochs.
'''
batch_sampler, loader = self.init_train(train_dataset, batch_size, num_workers)
steps_per_epoch = len(batch_sampler)
timer = Timer(steps_per_epoch * epochs)
timer.start()
for i in range(epochs):
# pass the index of the current epoch rather than the total epoch count
loader.dataset.set_epoch(i)
self.current_epoch += 1
self.train_one_epoch(loader, timer, self.current_epoch, epochs, log_interval, steps_per_epoch)
# todo, why paddlehub put save, eval in batch?
if self.current_epoch % save_interval == 0 and self.local_rank == 0:
if eval_dataset:
result = self.evaluate(eval_dataset, batch_size, num_workers)
eval_loss = result.get('loss', None)
eval_metrics = result.get('metrics', {})
if self.use_vdl:
if eval_loss:
self.log_writer.add_scalar(tag='EVAL/loss', step=timer.current_step, value=eval_loss)
for metric, value in eval_metrics.items():
self.log_writer.add_scalar(
tag='EVAL/{}'.format(metric), step=timer.current_step, value=value)
if not self.best_metrics or self.compare_metrics(self.best_metrics, eval_metrics):
self.best_metrics = eval_metrics
best_model_path = os.path.join(self.checkpoint_dir, 'best_model')
self.save_model(best_model_path)
self._save_metrics()
metric_msg = ['{}={:.4f}'.format(metric, value) for metric, value in self.best_metrics.items()]
metric_msg = ' '.join(metric_msg)
logger.eval('Saving best model to {} [best {}]'.format(best_model_path, metric_msg))
self._save_checkpoint()
def init_evaluate(self, eval_dataset: paddle.io.Dataset, batch_size: int, num_workers: int) -> paddle.io.DataLoader:
use_gpu = True
place = paddle.CUDAPlace(ParallelEnv().dev_id) if use_gpu else paddle.CPUPlace()
paddle.disable_static(place)
batch_sampler = paddle.io.DistributedBatchSampler(
eval_dataset, batch_size=batch_size, shuffle=False, drop_last=False)
loader = paddle.io.DataLoader(
eval_dataset, batch_sampler=batch_sampler, places=place, num_workers=num_workers, return_list=True)
return loader
def evaluate_process(self, loader: paddle.io.DataLoader) -> dict:
self.model.eval()
avg_loss = num_samples = 0
sum_metrics = defaultdict(int)
avg_metrics = defaultdict(int)
for batch_idx, batch in enumerate(loader):
result = self.validation_step(batch, batch_idx)
loss = result.get('loss', None)
metrics = result.get('metrics', {})
bs = batch[0].shape[0]
num_samples += bs
if loss:
avg_loss += loss.numpy()[0] * bs
for metric, value in metrics.items():
sum_metrics[metric] += value.numpy()[0] * bs
# print avg metrics and loss
print_msg = '[Evaluation result]'
if loss:
avg_loss /= num_samples
print_msg += ' avg_loss={:.4f}'.format(avg_loss)
for metric, value in sum_metrics.items():
avg_metrics[metric] = value / num_samples
print_msg += ' avg_{}={:.4f}'.format(metric, avg_metrics[metric])
logger.eval(print_msg)
if loss:
return {'loss': avg_loss, 'metrics': avg_metrics}
return {'metrics': avg_metrics}
def evaluate(self, eval_dataset: paddle.io.Dataset, batch_size: int = 1, num_workers: int = 0) -> dict:
'''
Run evaluation and returns metrics.
Args:
eval_dataset(paddle.io.Dataset) : The validation dataset
batch_size(int) : Batch size of per step, default is 1.
num_workers(int) : Number of subprocess to load data, default is 0.
'''
loader = self.init_evaluate(eval_dataset, batch_size, num_workers)
res = self.evaluate_process(loader)
return res
task_config:
run_mode: "ray"
workspace: "./work_dirs/pbt_hub_classifer/test_autoaug"
task_type: "classifier"
classifier:
model_name: "resnet50_vd_imagenet_ssld"
epochs: 100
input_size: 224
scale_size: 256
no_cache_img: false
use_class_map: false
data_config:
train_img_prefix: "./dataset/flower_photos"
train_ann_file: "./dataset/flower_photos/train_list.txt"
val_img_prefix: "./dataset/flower_photos"
val_ann_file: "./dataset/flower_photos/validate_list.txt"
label_list: "./dataset/flower_photos/label_list.txt"
delimiter: " "
resource_config:
gpu: 0.4
cpu: 1
algo_config:
algo_name: "PBA"
algo_param:
perturbation_interval: 3
num_samples: 8
search_space:
operator_space:
-
name: Sharpness
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Rotate
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Invert
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Brightness
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Cutout
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Equalize
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: TranslateY
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: AutoContrast
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Color
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: TranslateX
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Solarize
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: ShearX
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Contrast
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: Posterize
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: ShearY
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
-
name: FlipLR
prob:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
magtitude:
htype: choice
value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
from auto_augment.autoaug.experiment.experiment import AutoAugExperiment
from auto_augment.autoaug.utils.yaml_config import get_config
from hub_fitter import HubFitterClassifer
import os
import argparse
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
parser = argparse.ArgumentParser()
parser.add_argument(
"--config",
help="config file",
)
parser.add_argument(
"--workspace",
default=None,
help="work_space",
)
def main():
search_test()
def search_test():
args = parser.parse_args()
config = args.config
config = get_config(config, show=True)
task_config = config.task_config
data_config = config.data_config
resource_config = config.resource_config
algo_config = config.algo_config
search_space = config.get("search_space", None)
if args.workspace is not None:
task_config["workspace"] = args.workspace
workspace = task_config["workspace"]
# load the algorithm, task, resource, data, and (optional) search-space configs
exper = AutoAugExperiment.create(
algo_config=algo_config,
task_config=task_config,
resource_config=resource_config,
data_config=data_config,
search_space=search_space,
fitter=HubFitterClassifer)
result = exper.search()  # start the search task
policy = result.get_best_policy()  # get the best policy; for the policy format, see "search result application format"
print("policy is:{}".format(policy))
dump_path = os.path.join(workspace, "auto_aug_config.json")
result.dump_best_policy(path=dump_path)
if __name__ == "__main__":
main()
#!/usr/bin/env bash
export FLAGS_fast_eager_deletion_mode=1
export FLAGS_eager_delete_tensor_gb=0.0
config="./pba_classifier_example.yaml"
workspace="./work_dirs//autoaug_flower_mobilenetv2"
# the workspace directory must be initialized first
rm -rf ${workspace}
mkdir -p ${workspace}
CUDA_VISIBLE_DEVICES=0,1 python -u search.py \
--config=${config} \
--workspace=${workspace} 2>&1 | tee -a ${workspace}/log.txt
# -*- coding: utf-8 -*-
#*******************************************************************************
#
# Copyright (c) 2020 Baidu.com, Inc. All Rights Reserved
#
#*******************************************************************************
"""
Authors: lvhaijun01@baidu.com
Date: 2020-11-26 20:57
"""
from auto_augment.autoaug.utils.yaml_config import get_config
from hub_fitter import HubFitterClassifer
import os
import argparse
import logging
import paddlehub as hub
import paddle
import paddlehub.vision.transforms as transforms
from paddlehub_utils.reader import _init_loader, PbaAugment
from paddlehub_utils.reader import _read_classes
from paddlehub_utils.trainer import CustomTrainer
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
parser = argparse.ArgumentParser()
parser.add_argument(
"--config",
help="config file",
)
parser.add_argument(
"--workspace",
default=None,
help="work_space",
)
parser.add_argument(
"--policy",
default=None,
help="data aug policy",
)
if __name__ == '__main__':
args = parser.parse_args()
config = args.config
config = get_config(config, show=True)
task_config = config.task_config
data_config = config.data_config
resource_config = config.resource_config
algo_config = config.algo_config
input_size = task_config.classifier.input_size
scale_size = task_config.classifier.scale_size
normalize = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]}
epochs = task_config.classifier.epochs
policy = args.policy
if policy is None:
print("use normal train transform")
TrainTransform = transforms.Compose(
transforms=[
transforms.Resize((input_size, input_size)),
transforms.Permute(),
transforms.Normalize(**normalize, channel_first=True)
],
channel_first=False)
else:
TrainTransform = PbaAugment(
input_size=input_size,
scale_size=scale_size,
normalize=normalize,
policy=policy,
hp_policy_epochs=epochs,
stage="train")
train_dataset, eval_dataset = _init_loader(config, TrainTransform=TrainTransform)
class_to_id_dict = _read_classes(config.data_config.label_list)
model = hub.Module(
name=config.task_config.classifier.model_name, label_list=class_to_id_dict.keys(), load_checkpoint=None)
optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
trainer = CustomTrainer(model=model, optimizer=optimizer, checkpoint_dir='img_classification_ckpt')
trainer.train(train_dataset, epochs=epochs, batch_size=32, eval_dataset=eval_dataset, save_interval=10)
#!/usr/bin/env bash
export FLAGS_fast_eager_deletion_mode=1
export FLAGS_eager_delete_tensor_gb=0.0
config="./pba_classifier_example.yaml"
workspace="./work_dirs//autoaug_flower_mobilenetv2"
# the workspace directory must be initialized first
mkdir -p ${workspace}
policy=./work_dirs//autoaug_flower_mobilenetv2/auto_aug_config.json
CUDA_VISIBLE_DEVICES=0,1 python train.py \
--config=${config} \
--policy=${policy} \
--workspace=${workspace} 2>&1 | tee -a ${workspace}/log.txt
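# PaddleHub Hyperparameter Optimization: Image Classification
**Make sure the installed PaddleHub version is 1.3.0 or later. The PaddleHub AutoDL Finetuner also requires at least one available GPU.**
This example shows how to use AutoDL Finetuner, PaddleHub's hyperparameter optimization tool, to find a well-performing combination of hyperparameters.
For each AutoDL Finetuner run, you only need to define a search space and change a few lines of code, and PaddleHub will search for the best hyperparameter combination. Two steps are required:
* Define the search space: AutoDL Finetuner samples parameters and network architectures from the search space, which is defined in a YAML file.
* Modify the model code: first define the parameter group, then update the model code.
## Step1: Define the search space
AutoDL Finetuner samples parameters and network architectures from the search space, which is defined in a YAML file.
To define the search space, specify the name, type, and search range of each variable. Together these form a hyperparameter space.
PaddleHub searches within this space: it passes each sampled set of hyperparameters to train.py for evaluation and automatically adjusts the search direction based on the results until the configured number of searches is reached.
Taking the image classification fine-tuning task as an example, the YAML file hparam.yaml below lists the hyperparameters to optimize, including the name, type, and range of each. Currently only the float and int parameter types are supported.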
```
param_list:
- name : learning_rate
  init_value : 0.001
  type : float
  lower_than : 0.05
  greater_than : 0.00005
- name : batch_size
  init_value : 12
  type : int
  lower_than : 20
  greater_than : 10
```
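As a rough illustration of what such a search space encodes, the sketch below mirrors the two entries of hparam.yaml as a Python list and checks whether a candidate set of hyperparameters lies inside the declared ranges. The helper name `in_search_space` and the reading of `lower_than`/`greater_than` as exclusive bounds are assumptions made for illustration; this is not AutoDL Finetuner's own implementation.

```python
# Hypothetical in-memory mirror of hparam.yaml; AutoDL Finetuner reads the YAML file itself.
PARAM_LIST = [
    {"name": "learning_rate", "init_value": 0.001, "type": float,
     "lower_than": 0.05, "greater_than": 0.00005},
    {"name": "batch_size", "init_value": 12, "type": int,
     "lower_than": 20, "greater_than": 10},
]


def in_search_space(candidate: dict, param_list: list = PARAM_LIST) -> bool:
    """Return True if every hyperparameter has the declared type and range.

    Assumes both bounds are exclusive, as the key names suggest.
    """
    for spec in param_list:
        value = candidate.get(spec["name"])
        # A missing value (None) or a value of the wrong type is rejected.
        if not isinstance(value, spec["type"]):
            return False
        if not spec["greater_than"] < value < spec["lower_than"]:
            return False
    return True


if __name__ == "__main__":
    initial = {spec["name"]: spec["init_value"] for spec in PARAM_LIST}
    print(in_search_space(initial))
    print(in_search_space({"learning_rate": 0.1, "batch_size": 12}))
```

A check of this kind can be handy for validating `init_value` entries before starting a long search.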
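## Step2: Modify the model code
img_cls.py uses mobilenet as the pretrained model and fine-tunes it on the flowers dataset. For how PaddleHub performs fine-tuning, see the [image classification transfer learning example](../image_classification).
* import paddlehub
Add `import paddlehub as hub` to img_cls.py.
* Get parameter values from AutoDL Finetuner
1. The command-line options of img_cls.py must include the hyperparameters to optimize, declared through argparse. Each option name must match the corresponding hyperparameter name in the YAML file.
2. img_cls.py must accept the option saved_params_dir; the optimized parameters are saved to this path.
3. When the PopulationBased evaluation strategy is selected, img_cls.py must accept the option model_path, and the model is automatically restored from the path it specifies.
* Report the final result of the configuration
img_cls.py must report the model's evaluation result (preferably measured on the validation or test set) by calling the `report_final_result` interface, for example: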
```python
hub.report_final_result(eval_avg_score["acc"])
```
**NOTE:** The reported score should lie in the range `(-∞, 1]`; a higher value indicates better performance.
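The requirements in this step can be sketched as a minimal argument parser plus a score conversion. The option names follow hparam.yaml and the list above; the helper `to_reported_score` and the choice of `1 - value` for a lower-is-better metric are assumptions made for illustration and are not part of PaddleHub.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Options that a script tuned by AutoDL Finetuner is expected to accept."""
    parser = argparse.ArgumentParser()
    # Names must match the entries of param_list in hparam.yaml.
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--batch_size", type=int, default=16)
    # Directory where the tuned parameters are saved.
    parser.add_argument("--saved_params_dir", type=str, default="")
    # Used when the evaluation strategy restores a model (PopulationBased).
    parser.add_argument("--model_path", type=str, default="")
    return parser


def to_reported_score(value: float, higher_is_better: bool = True) -> float:
    """Map a metric into the expected (-inf, 1] range (hypothetical helper)."""
    score = value if higher_is_better else 1.0 - value
    if score > 1.0:
        raise ValueError("score {} is above the expected upper bound 1".format(score))
    return score


if __name__ == "__main__":
    args = build_parser().parse_args(["--learning_rate", "0.01", "--batch_size", "12"])
    print(args.learning_rate, args.batch_size)
    print(to_reported_score(0.93))
    print(to_reported_score(2.5, higher_is_better=False))
```

The converted value is what would then be passed to `hub.report_final_result`, as in the snippet above.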
## Launch AutoDL Finetuner
After installing PaddlePaddle and PaddleHub, run `sh run_autofinetune.sh` to start hyperparameter optimization.
**NOTE:** For details on PaddleHub hyperparameter optimization, see the [tutorial](../../docs/tutorial/autofinetune.md).
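For readers who prefer to launch the search from Python, the following sketch assembles the same command line that the example launch script uses. The flag names and default values are taken from that script; the function `build_autofinetune_command` is a hypothetical helper, and the sketch only builds the argument list without running it.

```python
def build_autofinetune_command(script: str, param_file: str, output_dir: str,
                               gpu: str = "0", popsize: int = 15, rounds: int = 10,
                               evaluator: str = "fulltrail",
                               tuning_strategy: str = "pshe2") -> list:
    """Assemble the `hub autofinetune` call as a list of arguments."""
    options = [
        ("param_file", param_file),
        ("gpu", gpu),
        ("popsize", popsize),
        ("round", rounds),
        ("output_dir", output_dir),
        ("evaluator", evaluator),
        ("tuning_strategy", tuning_strategy),
    ]
    return ["hub", "autofinetune", script] + ["--{}={}".format(k, v) for k, v in options]


if __name__ == "__main__":
    cmd = build_autofinetune_command("img_cls.py", "hparam.yaml", "result")
    print(" ".join(cmd))
    # To actually start the search: subprocess.run(cmd, check=True)
```

Keeping the options in one list makes it easy to vary `popsize` or `round` between experiments.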
param_list:
- name : learning_rate
init_value : 0.001
type : float
lower_than : 0.05
greater_than : 0.00005
- name : batch_size
init_value : 12
type : int
lower_than : 20
greater_than : 10
# coding:utf-8
import argparse
import os
import ast
import shutil
import paddlehub as hub
from paddlehub.common.logger import logger
parser = argparse.ArgumentParser(__doc__)
parser.add_argument(
"--epochs", type=int, default=5, help="Number of epochs for fine-tuning.")
parser.add_argument(
"--checkpoint_dir", type=str, default=None, help="Path to save log data.")
parser.add_argument(
"--module",
type=str,
default="mobilenet",
help="Module used as feature extractor.")
# the names of the hyper-parameters to be searched must match those in hparam.yaml
parser.add_argument(
"--batch_size",
type=int,
default=16,
help="Total examples' number in batch for training.")
parser.add_argument(
"--learning_rate", type=float, default=1e-4, help="learning_rate.")
# saved_params_dir and model_path are needed by auto fine-tune
parser.add_argument(
"--saved_params_dir",
type=str,
default="",
help="Directory for saving model")
parser.add_argument(
"--model_path", type=str, default="", help="load model path")
module_map = {
"resnet50": "resnet_v2_50_imagenet",
"resnet101": "resnet_v2_101_imagenet",
"resnet152": "resnet_v2_152_imagenet",
"mobilenet": "mobilenet_v2_imagenet",
"nasnet": "nasnet_imagenet",
"pnasnet": "pnasnet_imagenet"
}
def is_path_valid(path):
if path == "":
return False
path = os.path.abspath(path)
dirname = os.path.dirname(path)
if not os.path.exists(dirname):
os.mkdir(dirname)
return True
def finetune(args):
    # Load the PaddleHub pretrained model (mobilenet by default)
    module = hub.Module(name=args.module)
    input_dict, output_dict, program = module.context(trainable=True)
    # Download the dataset and read it with ImageClassificationReader
    dataset = hub.dataset.Flowers()
    data_reader = hub.reader.ImageClassificationReader(
        image_width=module.get_expected_image_width(),
        image_height=module.get_expected_image_height(),
        images_mean=module.get_pretrained_images_mean(),
        images_std=module.get_pretrained_images_std(),
        dataset=dataset)
    # Feature map output by the pretrained module, used as the classifier input
    feature_map = output_dict["feature_map"]
    img = input_dict["image"]
    feed_list = [img.name]
    # Select the fine-tune strategy and set up the run config
    strategy = hub.DefaultFinetuneStrategy(learning_rate=args.learning_rate)
    config = hub.RunConfig(
        use_cuda=True,
        num_epoch=args.epochs,
        batch_size=args.batch_size,
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)
    # Construct the transfer learning network
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
        feature=feature_map,
        num_classes=dataset.num_labels,
        config=config)
    # Load parameters from model_path if one was given
    if args.model_path != "":
        with task.phase_guard(phase="train"):
            task.init_if_necessary()
            task.load_parameters(args.model_path)
            logger.info("PaddleHub has loaded model from %s" % args.model_path)
    # Fine-tune by PaddleHub's API
    task.finetune()
    # Evaluate by PaddleHub's API
    run_states = task.eval()
    # Get the accuracy score on the dev set
    eval_avg_score, eval_avg_loss, eval_run_speed = task._calculate_metrics(
        run_states)
    # Move ckpt/best_model to the requested saved-parameters directory
    best_model_dir = os.path.join(config.checkpoint_dir, "best_model")
    if is_path_valid(args.saved_params_dir) and os.path.exists(best_model_dir):
        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)
    # The accuracy on the dev set is used by auto fine-tune
    hub.report_final_result(eval_avg_score["acc"])
if __name__ == "__main__":
    args = parser.parse_args()
    if args.module not in module_map:
        hub.logger.error("module should be one of %s" % list(module_map.keys()))
        exit(1)
    args.module = module_map[args.module]
    finetune(args)
OUTPUT=result
hub autofinetune img_cls.py \
--param_file=hparam.yaml \
--gpu=0 \
--popsize=15 \
--round=10 \
--output_dir=${OUTPUT} \
--evaluator=fulltrail \
--tuning_strategy=pshe2
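# PaddleHub Hyperparameter Optimization: Text Classification
**Make sure PaddleHub 1.3.0 or later is installed. The PaddleHub AutoDL Finetuner also requires at least one available GPU.**
This example shows how to use AutoDL Finetuner, the PaddleHub hyperparameter optimizer, to find a well-performing combination of hyperparameters.
For each AutoDL Finetuner run, you only need to define a search space and change a few lines of code, and PaddleHub searches for the best hyperparameter combination. Two steps are required:
* Define the search space: AutoDL Finetuner samples parameters and network architectures from the search space, which is defined in a YAML file.
* Modify the model code: first define the argument group, then update the model code.
## Step1: Define the search space
AutoDL Finetuner samples parameters and network architectures from the search space, which is defined in a YAML file.
To define a search space, specify each variable's name, type, and search range. Together these form a hyperparameter space.
PaddleHub searches this space, passes each candidate set of hyperparameters to train.py to obtain an evaluation result, and adjusts the search direction based on that result until the configured number of searches is reached.
Taking the text classification Fine-tune task as an example, the YAML file hparam.yaml below lists the hyperparameters to optimize, including the name, type, and range of each one. Currently only the float and int types are supported for searched parameters.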
```
param_list:
- name : learning_rate
  init_value : 0.001
  type : float
  lower_than : 0.05
  greater_than : 0.000005
- name : weight_decay
  init_value : 0.1
  type : float
  lower_than : 1
  greater_than : 0.0
- name : batch_size
  init_value : 32
  type : int
  lower_than : 40
  greater_than : 30
- name : warmup_prop
  init_value : 0.1
  type : float
  lower_than : 0.2
  greater_than : 0.0
```
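Before launching a search, it can help to sanity-check the search space. The sketch below is not part of PaddleHub: `validate_param_list` is a hypothetical helper, and the search space is written as a plain Python list that mirrors hparam.yaml so the example needs no YAML library. It assumes both bounds are exclusive, as the names `greater_than` and `lower_than` suggest.

```python
def validate_param_list(param_list):
    """Return a list of human-readable problems found in a search space."""
    problems = []
    for param in param_list:
        name = param.get("name", "<unnamed>")
        # Only float and int are supported for searched parameters.
        if param.get("type") not in ("float", "int"):
            problems.append("%s: type must be float or int" % name)
            continue
        low, high = param["greater_than"], param["lower_than"]
        init = param["init_value"]
        if not low < high:
            problems.append("%s: empty range (%s, %s)" % (name, low, high))
        elif not low < init < high:
            problems.append("%s: init_value %s outside (%s, %s)" % (name, init, low, high))
        if param["type"] == "int" and int(init) != init:
            problems.append("%s: init_value must be an integer" % name)
    return problems


# Mirrors the entries of hparam.yaml.
SEARCH_SPACE = [
    {"name": "learning_rate", "init_value": 0.001, "type": "float",
     "lower_than": 0.05, "greater_than": 0.000005},
    {"name": "weight_decay", "init_value": 0.1, "type": "float",
     "lower_than": 1, "greater_than": 0.0},
    {"name": "batch_size", "init_value": 32, "type": "int",
     "lower_than": 40, "greater_than": 30},
    {"name": "warmup_prop", "init_value": 0.1, "type": "float",
     "lower_than": 0.2, "greater_than": 0.0},
]

print(validate_param_list(SEARCH_SPACE))  # [] when every entry is consistent
```

An empty list means every `init_value` lies strictly inside its range; any other result names the offending parameter.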
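## Step2: Modify the model code
text_cls.py uses ernie as the pretrained model and fine-tunes it on the ChnSentiCorp dataset. For how PaddleHub performs Fine-tune, see the [text classification transfer learning example](../text_classification).
* import paddlehub
Add `import paddlehub as hub` to text_cls.py.
* Get parameter values from AutoDL Finetuner
1. The command-line options of text_cls.py must include the hyperparameters to be optimized. Declare them with argparse, and keep each option name identical to the hyperparameter name in the YAML file.
2. text_cls.py must include the option saved_params_dir; the optimized parameters are saved to that path.
3. When the PopulationBased evaluation strategy is selected, text_cls.py must include the option model_path, and the model is restored automatically from the path given by model_path.
* Report the final result of the configuration
text_cls.py must report the model's evaluation result (preferably measured on the validation or test set) by calling the `report_final_result` API, for example: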
```python
hub.report_final_result(eval_avg_score["acc"])
```
**NOTE:** The reported evaluation result should lie in the range `(-∞, 1]`; a higher value means a better result.
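If the metric you care about is one where lower is better, such as a loss, it has to be turned into a score where higher is better before it is reported. The following is a minimal sketch under that assumption. `loss_to_score` is a hypothetical helper, not a PaddleHub API, and `1 - loss` is only one simple mapping that keeps non-negative losses inside `(-∞, 1]`.

```python
def loss_to_score(loss):
    """Map a non-negative loss (lower is better) to a score in (-inf, 1]."""
    if loss < 0:
        raise ValueError("expected a non-negative loss, got %r" % loss)
    return 1.0 - loss


# A smaller loss yields a larger score, and the score never exceeds 1.
print(loss_to_score(0.25))  # 0.75
print(loss_to_score(3.0))   # -2.0
```

The resulting value could then be passed to `hub.report_final_result` in place of the accuracy.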
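## Start AutoDL Finetuner
After installing PaddlePaddle and PaddleHub, run the script `sh run_autofinetune.sh` to start hyperparameter optimization.
**NOTE:** For details on PaddleHub hyperparameter optimization, see the [tutorial](../../docs/tutorial/autofinetune.md).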
param_list:
- name : learning_rate
  init_value : 0.001
  type : float
  lower_than : 0.05
  greater_than : 0.000005
- name : weight_decay
  init_value : 0.1
  type : float
  lower_than : 1
  greater_than : 0.0
- name : batch_size
  init_value : 32
  type : int
  lower_than : 40
  greater_than : 30
- name : warmup_prop
  init_value : 0.1
  type : float
  lower_than : 0.2
  greater_than : 0.0
OUTPUT=result
hub autofinetune text_cls.py \
--param_file=hparam.yaml \
--gpu=0 \
--popsize=15 \
--round=10 \
--output_dir=${OUTPUT} \
--evaluator=fulltrail \
--tuning_strategy=pshe2
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import shutil
import paddlehub as hub
import os
from paddlehub.common.logger import logger
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--epochs", type=int, default=3, help="epochs.")
# the names of the hyper-parameters to be searched must match those in hparam.yaml
parser.add_argument("--batch_size", type=int, default=32, help="batch_size.")
parser.add_argument(
"--learning_rate", type=float, default=5e-5, help="learning_rate.")
parser.add_argument(
"--warmup_prop", type=float, default=0.1, help="warmup_prop.")
parser.add_argument(
"--weight_decay", type=float, default=0.01, help="weight_decay.")
parser.add_argument(
"--max_seq_len",
type=int,
default=128,
help="Number of words of the longest sequence.")
parser.add_argument(
"--checkpoint_dir",
type=str,
default=None,
help="Directory to model checkpoint")
# saved_params_dir and model_path are needed by auto fine-tune
parser.add_argument(
"--saved_params_dir",
type=str,
default="",
help="Directory for saving model")
parser.add_argument(
"--model_path", type=str, default="", help="load model path")
args = parser.parse_args()
def is_path_valid(path):
    if path == "":
        return False
    path = os.path.abspath(path)
    dirname = os.path.dirname(path)
    # makedirs also creates missing intermediate directories
    if not os.path.exists(dirname):
        os.makedirs(dirname)
    return True
if __name__ == '__main__':
    # Load the PaddleHub ERNIE pretrained model
    module = hub.Module(name="ernie")
    inputs, outputs, program = module.context(
        trainable=True, max_seq_len=args.max_seq_len)
    # Download the dataset and read it with ClassifyReader
    dataset = hub.dataset.ChnSentiCorp()
    metrics_choices = ["acc"]
    reader = hub.reader.ClassifyReader(
        dataset=dataset,
        vocab_path=module.get_vocab_path(),
        max_seq_len=args.max_seq_len)
    # Construct the transfer learning network
    # Use "pooled_output" for classification tasks on an entire sentence.
    pooled_output = outputs["pooled_output"]
    # Set up the feed list for the data feeder
    # All tensors required by the ERNIE module must be fed
    feed_list = [
        inputs["input_ids"].name,
        inputs["position_ids"].name,
        inputs["segment_ids"].name,
        inputs["input_mask"].name,
    ]
    # Select the fine-tune strategy
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_prop,
        learning_rate=args.learning_rate,
        weight_decay=args.weight_decay,
        lr_scheduler="linear_decay")
    # Set up RunConfig for the PaddleHub Fine-tune API
    config = hub.RunConfig(
        checkpoint_dir=args.checkpoint_dir,
        use_cuda=True,
        num_epoch=args.epochs,
        batch_size=args.batch_size,
        enable_memory_optim=True,
        strategy=strategy)
    # Define a classification fine-tune task with PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=pooled_output,
        feed_list=feed_list,
        num_classes=dataset.num_labels,
        config=config,
        metrics_choices=metrics_choices)
    # Load parameters from model_path if one was given
    if args.model_path != "":
        with cls_task.phase_guard(phase="train"):
            cls_task.init_if_necessary()
            cls_task.load_parameters(args.model_path)
            logger.info("PaddleHub has loaded model from %s" % args.model_path)
    cls_task.finetune()
    run_states = cls_task.eval()
    eval_avg_score, eval_avg_loss, eval_run_speed = cls_task._calculate_metrics(
        run_states)
    # Move ckpt/best_model to the requested saved-parameters directory
    best_model_dir = os.path.join(config.checkpoint_dir, "best_model")
    if is_path_valid(args.saved_params_dir) and os.path.exists(best_model_dir):
        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)
    # The accuracy on the dev set is used by auto fine-tune
    hub.report_final_result(eval_avg_score["acc"])
# PaddleHub Image Colorization
This example shows how to fine-tune a pretrained model with PaddleHub and use it for prediction.
## Command-line prediction
```
$ hub run user_guided_colorization --input_path "/PATH/TO/IMAGE"
```
## How to start Fine-tune
After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the user_guided_colorization model on datasets such as [Canvas](../../docs/reference/datasets.md#class-hubdatasetsCanvas).
## Code steps
Fine-tuning with the PaddleHub Fine-tune API takes 4 steps.
### Step1: Define the data preprocessing
```python
import paddlehub.vision.transforms as T
transform = T.Compose([T.Resize((256, 256), interpolation='NEAREST'),
T.RandomPaddingCrop(crop_size=176),
T.RGB2LAB()], to_rgb=True)
```
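The `transforms` data augmentation module provides a rich set of preprocessing operations; replace them with whichever ones your task needs.
**NOTE:** Set the `to_rgb` argument of `T.Compose` to True.
### Step2: Download and use the dataset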
```python
from paddlehub.datasets import Canvas
color_set = Canvas(transform=transform, mode='train')
```
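* `transform`: the data preprocessing to apply.
* `mode`: the dataset split, either `train` or `test`; defaults to `train`.
The dataset preparation code is in [canvas.py](../../paddlehub/datasets/canvas.py). `hub.datasets.Canvas()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
### Step3: Load the pretrained model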
```python
model = hub.Module(name='user_guided_colorization', load_checkpoint=None)
model.set_config(classification=True, prob=1)
```
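* `name`: name of the model to load.
* `load_checkpoint`: checkpoint of a model you trained yourself; if None, the provided default model parameters are loaded.
* `classification`: the colorization model is trained in two stages. In the first stage, set `classification` to True to train the shallow layers; later in training, set `classification` to False to train the output layer of the network.
* `prob`: the probability that no further prior color patch is added to an input image. The default is 1, meaning no prior color patches are added. For example, with `prob` set to 0.9, the probability that an image has two prior color patches is `(1-0.9)*(1-0.9)*0.9 = 0.009`.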
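The arithmetic in the `prob` example follows a geometric pattern: each prior color patch is added with probability `1 - prob`, and sampling stops with probability `prob`. The sketch below reproduces that calculation. `hint_count_probability` is a hypothetical helper written for illustration, and it assumes there is no upper limit on the number of patches, which the module itself may not guarantee.

```python
def hint_count_probability(prob, k):
    """Probability of exactly k prior color patches when sampling stops with probability prob."""
    return (1.0 - prob) ** k * prob


# Two patches with prob=0.9: (1-0.9) * (1-0.9) * 0.9
print(round(hint_count_probability(0.9, 2), 6))  # 0.009
# With the default prob=1, no patch is ever added.
print(hint_count_probability(1.0, 0))  # 1.0
```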
### Step4: Select the optimization strategy and run configuration
```python
optimizer = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='img_colorization_ckpt_cls_1')
trainer.train(color_set, epochs=201, batch_size=25, eval_dataset=color_set, log_interval=10, save_interval=10)
```
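#### Optimization strategy
Paddle 2.0-rc provides a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`; see [Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/optimizer/optimizer/Optimizer_cn.html) for details.
For `Adam`:
* `learning_rate`: global learning rate, set to 1e-4 in this example;
* `parameters`: the model parameters to optimize.
#### Run configuration
`Trainer` controls Fine-tune training and accepts the following parameters:
* `model`: the model to optimize;
* `optimizer`: the optimizer to use;
* `use_vdl`: whether to visualize the training process with VisualDL;
* `checkpoint_dir`: directory in which model parameters are saved;
* `compare_metrics`: the metric used to decide which model is saved as the best one;
`trainer.train` controls the training process itself and accepts the following parameters:
* `train_dataset`: the dataset used for training;
* `epochs`: number of training epochs;
* `batch_size`: training batch size; when using a GPU, adjust batch_size to fit the available memory;
* `num_workers`: number of workers; defaults to 0;
* `eval_dataset`: the dataset used for validation;
* `log_interval`: logging interval, measured in training batches;
* `save_interval`: model-saving interval, measured in training epochs.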
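Because `log_interval` counts batches while `save_interval` counts epochs, the two settings scale differently with dataset size. The sketch below estimates how often each one fires. `training_schedule` is a hypothetical helper, the dataset size of 1000 is made up, and the code assumes that the last partial batch is kept and that an event fires whenever the counter is a multiple of the interval; `Trainer` may behave differently.

```python
import math


def training_schedule(num_samples, batch_size, epochs, log_interval, save_interval):
    """Estimate step counts, log lines, and saved checkpoints for a training run."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_lines": total_steps // log_interval,  # interval counted in batches
        "checkpoints": epochs // save_interval,    # interval counted in epochs
    }


# Settings from the first training stage above, with an assumed 1000 samples.
print(training_schedule(1000, 25, 201, 10, 10))
```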
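## Model prediction
After Fine-tune completes, the model that performed best on the validation set is saved in `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune.
We use this model for prediction. The predict.py script is as follows: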
```python
import paddle
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='user_guided_colorization', load_checkpoint='/PATH/TO/CHECKPOINT')
    model.set_config(prob=0.1)
    result = model.predict(images=['house.png'])
```
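Once the parameters are configured correctly, run `python predict.py`. For details on loading models, see [load](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/framework/io/load_cn.html#load).
**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for Fine-tune. For an oil-painting colorization effect, download the parameter file [oil-painting colorization](https://paddlehub.bj.bcebos.com/dygraph/models/canvas_rc.pdparams).
**Args**
* `images`: paths of the original images, or images in BGR format;
* `visualization`: whether to visualize the result; defaults to True;
* `save_path`: path where results are saved; defaults to 'result'.
## Service deployment
PaddleHub Serving can deploy an online colorization service.
### Step1: Start PaddleHub Serving
Run the start command: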
```shell
$ hub serving start -m user_guided_colorization
```
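This deploys the colorization service API; the default port is 8866.
**NOTE:** To predict on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.
### Step2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result: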
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/user_guided_colorization"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
data = base64_to_cv2(r.json()["results"]['data'][0]['fake_reg'])
cv2.imwrite('color.png', data)
```
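If OpenCV is not available on the client, the request body can be built with the standard library alone. This is a sketch under one assumption: that the server only needs the base64 text of an encoded image file, as the `cv2_to_base64` helper above suggests. `build_payload` is a hypothetical helper, and the bytes below are placeholders rather than a real image.

```python
import base64
import json


def build_payload(image_bytes):
    """Wrap already-encoded image bytes (e.g. the contents of a .jpg file) in the request JSON."""
    encoded = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"images": [encoded]})


# Placeholder bytes standing in for the contents of an image file.
fake_image = b"\xff\xd8placeholder\xff\xd9"
payload = build_payload(fake_image)

# Decoding the payload recovers the original bytes unchanged.
decoded = base64.b64decode(json.loads(payload)["images"][0])
print(decoded == fake_image)  # True
```

In practice the bytes would come from reading the image file in binary mode, and the payload would be posted to the serving URL with the same headers as above.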
### Source code
https://github.com/richzhang/colorization-pytorch
### Dependencies
paddlepaddle >= 2.0.0rc
paddlehub >= 2.0.0
import paddle
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='user_guided_colorization', load_checkpoint='/PATH/TO/CHECKPOINT')
    model.set_config(prob=0.1)
    result = model.predict(images=['house.png'])
import paddle
import paddlehub as hub
import paddlehub.vision.transforms as T
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import Canvas
if __name__ == '__main__':
    transform = T.Compose(
        [T.Resize((256, 256), interpolation='NEAREST'),
         T.RandomPaddingCrop(crop_size=176),
         T.RGB2LAB()], to_rgb=True)
    color_set = Canvas(transform=transform, mode='train')
    model = hub.Module(name='user_guided_colorization', load_checkpoint='/PATH/TO/CHECKPOINT')
    model.set_config(classification=True, prob=1)
    optimizer = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='img_colorization_ckpt_cls_1')
    trainer.train(color_set, epochs=201, batch_size=25, eval_dataset=color_set, log_interval=10, save_interval=10)
    model.set_config(classification=False, prob=0.125)
    optimizer = paddle.optimizer.Adam(learning_rate=0.00001, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='img_colorization_ckpt_reg_1')
    trainer.train(color_set, epochs=101, batch_size=25, log_interval=10, save_interval=10)
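# PaddleHub Image Classification
This example shows how to use the PaddleHub Fine-tune API with pretrained models such as [ResNet](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_50_imagenet&en_category=ImageClassification) to complete a classification task.
This example shows how to fine-tune a pretrained model with PaddleHub and use it for prediction.
## How to start Fine-tune
After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to fine-tune ResNet on datasets such as [Flowers](../../docs/reference/dataset.md#class-hubdatasetflowers).
The script parameters are described below:
## Command-line prediction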
```shell
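--batch_size: batch size; adjust it to the available GPU memory and lower it if you run out of memory. Default: 16;
--num_epoch: number of Fine-tune epochs. Default: 1;
--module: the Module used as the Fine-tune feature extractor; the script supports {resnet50/resnet101/resnet152/mobilenet/nasnet/pnasnet}. Default: resnet50;
--checkpoint_dir: path where the model is saved; PaddleHub automatically saves the model that performs best on the validation set. Default: paddlehub_finetune_ckpt;
--dataset: the dataset to fine-tune on; the script supports {flowers/dogcat/stanforddogs/indoor67/food101}. Default: flowers;
--use_gpu: whether to train on a GPU; if the machine has a GPU and the GPU build of PaddlePaddle is installed, we recommend turning this on. Default: off;
--use_data_parallel: whether to use data parallelism; when enabled, data is distributed across cards for training (across threads on CPU). Default: on;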
$ hub run resnet50_vd_imagenet_ssld --input_path "/PATH/TO/IMAGE" --top_k 5
```
## How to start Fine-tune
After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune resnet50_vd_imagenet_ssld on datasets such as [Flowers](../../docs/reference/datasets.md#class-hubdatasetsflowers).
## Code steps
Fine-tuning with the PaddleHub Fine-tune API takes 4 steps.
### Step1: Load the pretrained model
### Step1: Define the data preprocessing
```python
module = hub.Module(name="resnet_v2_50_imagenet")
inputs, outputs, program = module.context(trainable=True)
import paddlehub.vision.transforms as T
transforms = T.Compose([T.Resize((256, 256)),
T.CenterCrop(224),
T.Normalize(mean=[0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225])],
to_rgb=True)
```
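To make the `T.Normalize` step concrete, the sketch below applies per-channel normalization to a single pixel in plain Python. It is illustrative only: `normalize_pixel` is a hypothetical helper, and it assumes the pixel has already been scaled to the range [0, 1], which may not match the scaling PaddleHub applies internally.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Subtract the channel mean and divide by the channel std for one RGB pixel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))  # [0.0, 0.0, 0.0]
```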
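PaddleHub provides many pretrained image classification models, such as xception, mobilenet, and efficientnet; see [image classification models](https://www.paddlepaddle.org.cn/hub?filter=en_category&value=ImageClassification) for details.
The `transforms` data augmentation module provides a rich set of preprocessing operations; replace them with whichever ones your task needs.
To try the efficientnet model, just change the `name` argument of Module.
### Step2: Download and use the dataset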
```python
# Changing the name argument is enough to switch to the efficientnet model, as shown below
module = hub.Module(name="efficientnetb7_imagenet")
```
from paddlehub.datasets import Flowers
### Step2: Download the dataset and read it with ImageClassificationReader
```python
dataset = hub.dataset.Flowers()
data_reader = hub.reader.ImageClassificationReader(
image_width=module.get_expected_image_width(),
image_height=module.get_expected_image_height(),
images_mean=module.get_pretrained_images_mean(),
images_std=module.get_pretrained_images_std(),
dataset=dataset)
flowers = Flowers(transforms)
flowers_validate = Flowers(transforms, mode='val')
```
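The dataset preparation code is in [flowers.py](../../paddlehub/dataset/flowers.py).
PaddleHub also provides more image classification datasets:
* `transforms`: the data preprocessing to apply.
* `mode`: the dataset split, one of `train`, `test`, or `val`; defaults to `train`.
| Dataset | API |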
| -------- | ------------------------------------------ |
| Flowers | hub.dataset.Flowers() |
| DogCat | hub.dataset.DogCat() |
| Indoor67 | hub.dataset.Indoor67() |
| Food101 | hub.dataset.Food101() |
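The dataset preparation code is in [flowers.py](../../paddlehub/datasets/flowers.py). `hub.datasets.Flowers()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
`hub.dataset.Flowers()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
`module.get_expected_image_width()` and `module.get_expected_image_height()` return the image size expected by the pretrained model.
### Step3: Load the pretrained model
`module.get_pretrained_images_mean()` and `module.get_pretrained_images_std()` return the image mean and standard deviation used by the pretrained model.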
```python
model = hub.Module(name="resnet50_vd_imagenet_ssld", label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"])
```
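* `name`: name of the pretrained model.
* `label_list`: the output classification labels; defaults to the ImageNet 2012 classes.
#### Custom dataset
PaddleHub provides many pretrained image classification models, such as xception, mobilenet, and efficientnet; see [image classification models](https://www.paddlepaddle.org.cn/hub?filter=en_category&value=ImageClassification) for details.
To load a custom dataset for transfer learning, see [custom datasets](../../docs/tutorial/how_to_load_data.md).
To try the efficientnet model, just change the `name` argument of Module.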
```python
# Changing the name argument is enough to switch to the efficientnet model, as shown below
model = hub.Module(name="efficientnetb7_imagenet")
```
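**NOTE:** Some models have not yet been fully upgraded to version 2.0; stay tuned.
### Step3: Select the optimization strategy and run configuration
### Step4: Select the optimization strategy and run configuration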
```python
strategy = hub.DefaultFinetuneStrategy(
learning_rate=1e-4,
optimizer_name="adam",
regularization_coeff=1e-3)
optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='img_classification_ckpt')
config = hub.RunConfig(use_cuda=True, use_data_parallel=True, num_epoch=3, batch_size=32, strategy=strategy)
trainer.train(flowers, epochs=100, batch_size=32, eval_dataset=flowers_validate, save_interval=1)
```
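#### Optimization strategy
PaddleHub provides many optimization strategies, such as `AdamWeightDecayStrategy`, `ULMFiTStrategy`, and `DefaultFinetuneStrategy`; see [strategy](../../docs/reference/strategy.md) for details.
Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
For `DefaultFinetuneStrategy`:
* `learning_rate`: global learning rate. Default: 1e-4;
* `optimizer_name`: optimizer name. Default: adam;
* `regularization_coeff`: the regularization coefficient λ. Default: 1e-3;
* `learning_rate`: global learning rate. Default: 1e-3;
* `parameters`: the model parameters to optimize.
#### Run configuration
`RunConfig` controls Fine-tune training and accepts the following parameters:
* `log_interval`: progress-log interval; by default a log line is printed every 10 steps;
* `eval_interval`: evaluation interval; by default the validation set is evaluated every 100 steps;
* `save_ckpt_interval`: model-saving interval; configure it according to the task size. By default only the model with the best validation result and the model at the end of training are saved;
* `use_cuda`: whether to train on a GPU; defaults to False;
* `use_pyreader`: whether to use pyreader; defaults to False;
* `use_data_parallel`: whether to use parallel computing; defaults to True. This feature depends on the nccl library;
* `checkpoint_dir`: path where model checkpoints are saved; generated automatically if not specified;
* `num_epoch`: number of Fine-tune epochs;
`Trainer` controls Fine-tune training and accepts the following parameters:
* `model`: the model to optimize;
* `optimizer`: the optimizer to use;
* `use_vdl`: whether to visualize the training process with VisualDL;
* `checkpoint_dir`: directory in which model parameters are saved;
* `compare_metrics`: the metric used to decide which model is saved as the best one;
`trainer.train` controls the training process itself and accepts the following parameters:
* `train_dataset`: the dataset used for training;
* `epochs`: number of training epochs;
* `batch_size`: training batch size; when using a GPU, adjust batch_size to fit the available memory;
* `strategy`: the Fine-tune optimization strategy;
* `num_workers`: number of workers; defaults to 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: logging interval, measured in training batches;
* `save_interval`: model-saving interval, measured in training epochs.
## Model prediction
After Fine-tune completes, the model that performed best on the validation set is saved in `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune.
### Step4: Build the network and create a classification transfer task for Fine-tune
We use this model for prediction. The predict.py script is as follows: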
```python
feature_map = output_dict["feature_map"]
feed_list = [input_dict["image"].name]
import paddle
import paddlehub as hub
task = hub.ImageClassifierTask(
data_reader=data_reader,
feed_list=feed_list,
feature=feature_map,
num_classes=dataset.num_labels,
config=config)
if __name__ == '__main__':
task.finetune_and_eval()
model = hub.Module(name='resnet50_vd_imagenet_ssld', label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"], load_checkpoint='/PATH/TO/CHECKPOINT')
result = model.predict(['flower.jpg'])
```
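**NOTE:**
1. `output_dict["feature_map"]` returns the feature_map of models such as resnet/mobilenet, which can serve as the feature representation of an image.
2. The inputs in `feed_list` specify the order of the input tensors of models such as resnet/mobilenet, consistent with the result returned by ImageClassifierTask.
3. Given the input features, the labels, and the number of target classes, `hub.ImageClassifierTask` creates a transfer task suited to image classification, `ImageClassifierTask`.
#### Custom transfer task
Once the parameters are configured correctly, run `python predict.py`. For details on loading models, see [load](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/framework/io/load_cn.html#load).
To change the network of the transfer task, see [custom transfer task](../../docs/tutorial/how_to_define_task.md).
## Visualization
**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for Fine-tune.
During training, the Fine-tune API automatically records key training metrics. After starting the program, run the following command: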
```bash
$ visualdl --logdir $CKPT_DIR/visualization --host ${HOST_IP} --port ${PORT_NUM}
```
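Here ${HOST_IP} is the IP address of the local machine and ${PORT_NUM} is an available port. For example, if the local IP address is 192.168.0.1 and the port is 8040, open 192.168.0.1:8040 in a browser to see how the metrics change during training.
## Service deployment
## Model prediction
PaddleHub Serving can deploy an online classification service.
After Fine-tune completes, the model that performed best on the validation set is saved in `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune.
### Step1: Start PaddleHub Serving
We use this model for prediction. The predict.py script supports the following parameters:
Run the start command: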
```shell
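--module: the Module used as the Fine-tune feature extractor; the script supports {resnet50/resnet101/resnet152/mobilenet/nasnet/pnasnet}. Default: resnet50;
--checkpoint_dir: path where the model is saved; PaddleHub automatically saves the model that performs best on the validation set. Default: paddlehub_finetune_ckpt;
--dataset: the dataset to fine-tune on; the script supports {flowers/dogcat}. Default: flowers;
--use_gpu: whether to train on a GPU; if the machine has a GPU and the GPU build of PaddlePaddle is installed, we recommend turning this on. Default: off;
--use_pyreader: whether to feed data with pyreader. Default: off;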
$ hub serving start -m resnet50_vd_imagenet_ssld
```
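**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for Fine-tune.
This deploys the classification service API; the default port is 8866.
Once the parameters are configured correctly, run the script `sh run_predict.sh` to see the image classification prediction results below.
For more on the prediction steps, see `predict.py`.
**NOTE:** To predict on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.
### Step2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result: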
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)], 'top_k':2}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/resnet50_vd_imagenet_ssld"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
data = r.json()["results"]['data']
```
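We provide demos as IPython notebooks on AI Studio that you can try online directly on the platform. The links are as follows:
### Source code
|Pretrained model|Task type|Dataset|AI Studio link|Notes|
|-|-|-|-|-|
|ResNet|Image classification|DogCat (cats and dogs)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147010)||
|ERNIE|Text classification|ChnSentiCorp (Chinese sentiment classification)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147006)||
|ERNIE|Text classification|THUNEWS (Chinese news classification)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/221999)|This tutorial describes how to load a custom dataset and use the Fine-tune API for text classification transfer learning.|
|ERNIE|Sequence labeling|MSRA_NER (Chinese sequence labeling)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/147009)||
|ERNIE|Sequence labeling|Express (Chinese express waybills)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/184200)|This tutorial describes how to load a custom dataset and use the Fine-tune API for sequence labeling transfer learning.|
|ERNIE Tiny|Text classification|ChnSentiCorp (Chinese sentiment classification)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/186443)||
|Senta|Text classification|ChnSentiCorp (Chinese sentiment classification)|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/216846)|This tutorial describes how to use Senta and the Fine-tune API for sentiment classification transfer learning.|
|Senta|Sentiment analysis prediction|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215814)||
|LAC|Lexical analysis|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215711)||
|Ultra-Light-Fast-Generic-Face-Detector-1MB|Face detection|N/A|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/215962)||
https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification
### Dependencies
## Hyperparameter optimization with AutoDL Finetuner
paddlepaddle >= 2.0.0rc
PaddleHub also provides hyperparameter tuning, which automatically searches for the best model hyperparameters to obtain better model results. See the [AutoDL Finetuner hyperparameter optimization tutorial](../../docs/tutorial/autofinetune.md) for details.
paddlehub >= 2.0.0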
#coding:utf-8
import argparse
import os
import ast
import paddle.fluid as fluid
import paddlehub as hub
import numpy as np
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for fine-tuning.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--module", type=str, default="resnet50", help="Module used as feature extractor.")
parser.add_argument("--dataset", type=str, default="flowers", help="Dataset to fine-tune.")
parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=True, help="Whether use data parallel.")
# yapf: enable.
module_map = {
"resnet50": "resnet_v2_50_imagenet",
"resnet101": "resnet_v2_101_imagenet",
"resnet152": "resnet_v2_152_imagenet",
"mobilenet": "mobilenet_v2_imagenet",
"nasnet": "nasnet_imagenet",
"pnasnet": "pnasnet_imagenet"
}
def finetune(args):
    # Load Paddlehub pretrained model
    module = hub.Module(name=args.module)
    input_dict, output_dict, program = module.context(trainable=True)
    # Download dataset
    if args.dataset.lower() == "flowers":
        dataset = hub.dataset.Flowers()
    elif args.dataset.lower() == "dogcat":
        dataset = hub.dataset.DogCat()
    elif args.dataset.lower() == "indoor67":
        dataset = hub.dataset.Indoor67()
    elif args.dataset.lower() == "food101":
        dataset = hub.dataset.Food101()
    elif args.dataset.lower() == "stanforddogs":
        dataset = hub.dataset.StanfordDogs()
    else:
        raise ValueError("%s dataset is not defined" % args.dataset)
    # Use ImageClassificationReader to read dataset
    data_reader = hub.reader.ImageClassificationReader(
        image_width=module.get_expected_image_width(),
        image_height=module.get_expected_image_height(),
        images_mean=module.get_pretrained_images_mean(),
        images_std=module.get_pretrained_images_std(),
        dataset=dataset)
    feature_map = output_dict["feature_map"]
    # Setup feed list for data feeder
    feed_list = [input_dict["image"].name]
    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=args.use_data_parallel,
        use_cuda=args.use_gpu,
        num_epoch=args.num_epoch,
        batch_size=args.batch_size,
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())
    # Define a image classification task by PaddleHub Fine-tune API
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
        feature=feature_map,
        num_classes=dataset.num_labels,
        config=config)
    # Fine-tune by PaddleHub's API
    task.finetune_and_eval()
if __name__ == "__main__":
    args = parser.parse_args()
    if not args.module in module_map:
        hub.logger.error("module should in %s" % module_map.keys())
        exit(1)
    args.module = module_map[args.module]
    finetune(args)
#coding:utf-8
import argparse
import os
import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epochs for fine-tuning.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
# yapf: enable.
class ResNet50(fluid.dygraph.Layer):
    def __init__(self, num_classes, backbone):
        super(ResNet50, self).__init__()
        self.fc = Linear(input_dim=2048, output_dim=num_classes)
        self.backbone = backbone
    def forward(self, imgs):
        feature_map = self.backbone(imgs)
        feature_map = fluid.layers.reshape(feature_map, shape=[-1, 2048])
        pred = self.fc(feature_map)
        return fluid.layers.softmax(pred)
def finetune(args):
    with fluid.dygraph.guard():
        resnet50_vd_10w = hub.Module(name="resnet50_vd_10w")
        dataset = hub.dataset.Flowers()
        resnet = ResNet50(
            num_classes=dataset.num_labels, backbone=resnet50_vd_10w)
        adam = AdamOptimizer(
            learning_rate=0.001, parameter_list=resnet.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir,
                                       'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            resnet.load_dict(state_dict)
        reader = hub.reader.ImageClassificationReader(
            image_width=resnet50_vd_10w.get_expected_image_width(),
            image_height=resnet50_vd_10w.get_expected_image_height(),
            images_mean=resnet50_vd_10w.get_pretrained_images_mean(),
            images_std=resnet50_vd_10w.get_pretrained_images_std(),
            dataset=dataset)
        train_reader = reader.data_generator(
            batch_size=args.batch_size, phase='train')
        loss_sum = acc_sum = cnt = 0
        # Train for num_epoch epochs
        for epoch in range(args.num_epoch):
            # Read the training data and train on it
            for batch_id, data in enumerate(train_reader()):
                imgs = np.array(data[0][0])
                labels = np.array(data[0][1])
                pred = resnet(imgs)
                acc = fluid.layers.accuracy(pred, to_variable(labels))
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters
                adam.minimize(avg_loss)
                loss_sum += avg_loss.numpy() * imgs.shape[0]
                acc_sum += acc.numpy() * imgs.shape[0]
                cnt += imgs.shape[0]
                if batch_id % args.log_interval == 0:
                    print('epoch {}: loss {}, acc {}'.format(
                        epoch, loss_sum / cnt, acc_sum / cnt))
                    loss_sum = acc_sum = cnt = 0
                if batch_id % args.save_interval == 0:
                    state_dict = resnet.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)
if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
#coding:utf-8
import argparse
import os
import ast
import paddle.fluid as fluid
import paddle
import paddlehub as hub
import numpy as np
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for predict.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--module", type=str, default="resnet50", help="Module used as a feature extractor.")
parser.add_argument("--dataset", type=str, default="flowers", help="Dataset to fine-tune.")
# yapf: enable.
module_map = {
"resnet50": "resnet_v2_50_imagenet",
"resnet101": "resnet_v2_101_imagenet",
"resnet152": "resnet_v2_152_imagenet",
"mobilenet": "mobilenet_v2_imagenet",
"nasnet": "nasnet_imagenet",
"pnasnet": "pnasnet_imagenet"
}
def predict(args):
    # Load PaddleHub pretrained model
    module = hub.Module(name=args.module)
    input_dict, output_dict, program = module.context(trainable=True)
    # Download dataset
    if args.dataset.lower() == "flowers":
        dataset = hub.dataset.Flowers()
    elif args.dataset.lower() == "dogcat":
        dataset = hub.dataset.DogCat()
    elif args.dataset.lower() == "indoor67":
        dataset = hub.dataset.Indoor67()
    elif args.dataset.lower() == "food101":
        dataset = hub.dataset.Food101()
    elif args.dataset.lower() == "stanforddogs":
        dataset = hub.dataset.StanfordDogs()
    else:
        raise ValueError("%s dataset is not defined" % args.dataset)
    # Use ImageClassificationReader to read dataset
    data_reader = hub.reader.ImageClassificationReader(
        image_width=module.get_expected_image_width(),
        image_height=module.get_expected_image_height(),
        images_mean=module.get_pretrained_images_mean(),
        images_std=module.get_pretrained_images_std(),
        dataset=dataset)
    feature_map = output_dict["feature_map"]
    # Set up the feed list for the data feeder
    feed_list = [input_dict["image"].name]
    # Set up RunConfig for the PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
        batch_size=args.batch_size,
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())
    # Define an image classification task with the PaddleHub Fine-tune API
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
        feature=feature_map,
        num_classes=dataset.num_labels,
        config=config)
    data = ["./test/test_img_daisy.jpg", "./test/test_img_roses.jpg"]
    print(task.predict(data=data, return_result=True))
if __name__ == "__main__":
    args = parser.parse_args()
    if args.module not in module_map:
        hub.logger.error("module should be one of %s" % list(module_map.keys()))
        exit(1)
    args.module = module_map[args.module]
    predict(args)
if __name__ == '__main__':
    model = hub.Module(
        name='resnet50_vd_imagenet_ssld',
        label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"],
        load_checkpoint='/PATH/TO/CHECKPOINT')
    result = model.predict(['flower.jpg'])
export FLAGS_eager_delete_tensor_gb=0.0
export CUDA_VISIBLE_DEVICES=0
python -u img_classifier.py $@
export FLAGS_eager_delete_tensor_gb=0.0
export CUDA_VISIBLE_DEVICES=0
python -u predict.py $@
IMAGE_PATH
./resources/test/test_img_bird.jpg
input_data:
  image:
    type : IMAGE
    key : IMAGE_PATH
config:
  top_only : True
import paddle
import paddlehub as hub
import paddlehub.vision.transforms as T
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import Flowers
if __name__ == '__main__':
    transforms = T.Compose(
        [T.Resize((256, 256)),
         T.CenterCrop(224),
         T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])],
        to_rgb=True)
    flowers = Flowers(transforms)
    flowers_validate = Flowers(transforms, mode='val')
    model = hub.Module(
        name='resnet50_vd_imagenet_ssld',
        label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"],
        load_checkpoint=None)
    optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='img_classification_ckpt', use_gpu=True)
    trainer.train(flowers, epochs=100, batch_size=32, eval_dataset=flowers_validate, save_interval=10)
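# LAC Lexical Analysis
This example shows how to run prediction with the LAC Module.
LAC is a Chinese lexical analysis model that performs word segmentation, part-of-speech tagging, and named entity recognition on Chinese sentences. For details about the model, see the [model introduction](https://www.paddlepaddle.org.cn/hubdetail?name=lac&en_category=LexicalAnalysis).
## Prediction from the Command Line
`cli_demo.sh` is a sample script that calls the Module through the command line interface (CLI).
Try it out with the following commands.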
```shell
$ hub run lac --input_text "今天是个好日子"
$ hub run lac --input_file test.txt --user_dict user.dict
```
`test.txt` contains the text to be segmented, for example:
```text
今天是个好日子
今天天气晴朗
```
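`user.dict` is an optional user-defined dictionary. When it is specified, it can override the default segmentation results.
The dictionary has three columns: the word, its part of speech, and its frequency, separated by a horizontal tab (`\t`). The higher a word's frequency, the more it influences the segmentation result. A sample dictionary: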
```text
天气预报 n 400000
经 v 1000
常 d 1000
```
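**NOTE:**
* The dictionary intervention feature of this PaddleHub Module depends on the third-party library pyahocorasick, which you need to install yourself.
* Do not copy and paste the sample text directly; the copied text may have formatting problems.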
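Because copied sample text can lose its tab separators, one option is to generate the dictionary file programmatically. The following is a minimal sketch using only the Python standard library; the helper name `write_user_dict` and the temporary output path are assumptions made for illustration and are not part of PaddleHub.

```python
import os
import tempfile


def write_user_dict(path, entries):
    """Write (word, part_of_speech, frequency) rows, one per line, tab-separated."""
    with open(path, "w", encoding="utf-8") as f:
        for word, pos, freq in entries:
            f.write("\t".join([word, pos, str(freq)]) + "\n")


# Entries taken from the sample dictionary in this README.
entries = [("天气预报", "n", 400000), ("经", "v", 1000), ("常", "d", 1000)]

# Hypothetical location: a temporary directory, so nothing is overwritten.
dict_path = os.path.join(tempfile.mkdtemp(), "user.dict")
write_user_dict(dict_path, entries)

# Read the file back to confirm every row has exactly three tab-separated columns.
with open(dict_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(all(len(line.split("\t")) == 3 for line in lines))
```

The resulting file can then be passed to `hub run lac` through the `--user_dict` option.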
## Prediction through the Python API
`lac_demo.py` is sample code that calls the PaddleHub LAC Module through the Python API.
Try it out with the following command.
```shell
python lac_demo.py
```
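The demo script prints the `word` and `tag` entries of each result separately. As a rough sketch of how the two might be combined, the snippet below pairs each word with its tag. It assumes each result is a dict whose `word` and `tag` values are lists of equal length; the helper `pair_words_with_tags` and the sample result are hypothetical, written by hand for illustration, and are not real model output.

```python
def pair_words_with_tags(result):
    """Return [(word, tag), ...] for one LAC-style result dict."""
    words, tags = result["word"], result["tag"]
    if len(words) != len(tags):
        raise ValueError("word and tag lists differ in length")
    return list(zip(words, tags))


# Hypothetical result, hand-written for illustration; not real model output.
sample_result = {
    "word": ["今天", "是", "个", "好日子"],
    "tag": ["TIME", "v", "q", "n"],
}

for word, tag in pair_words_with_tags(sample_result):
    print("%s/%s" % (word, tag))
```

The same helper could be applied to each element of the list returned by `lac.lexical_analysis`, provided the results follow the structure assumed here.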
#coding:utf-8
from __future__ import print_function
import json
import os
import six
import paddlehub as hub
if __name__ == "__main__":
    # Load LAC Module
    lac = hub.Module(name="lac")
    test_text = ["今天是个好日子", "天气预报说今天要下雨", "下一班地铁马上就要到了"]
    # Set input dict
    inputs = {"text": test_text}
    # execute predict and print the result
    results = lac.lexical_analysis(data=inputs, use_gpu=True, batch_size=10)
    for result in results:
        if six.PY2:
            print(
                json.dumps(result['word'], encoding="utf8", ensure_ascii=False))
            print(
                json.dumps(result['tag'], encoding="utf8", ensure_ascii=False))
        else:
            print(result['word'])
            print(result['tag'])