Commit 3eb807ea authored by: C ceci3

update

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleSlim
Chinese | [English](README_en.md)
Documentation: https://paddlepaddle.github.io/PaddleSlim
# PaddleSlim
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://paddleslim.readthedocs.io/en/latest/)
[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://paddleslim.readthedocs.io/zh_CN/latest/)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
PaddleSlim is a model-compression toolkit that bundles a range of compression strategies: model pruning, fixed-point quantization, knowledge distillation, hyperparameter search, and neural architecture search.
......@@ -16,36 +18,213 @@ PaddleSlim supports developers' innovations on model-compression strategies through low-level capabilities, technical consulting, and business scenarios
## Features
- Model pruning
  - Uniform pruning of convolution channels
  - Sensitivity-based pruning of convolution channels
  - Automatic pruning based on evolutionary algorithms
<table style="width:100%;" cellpadding="2" cellspacing="0" border="1" bordercolor="#000000">
<tbody>
<tr>
<td style="text-align:center;">
<span style="font-size:18px;">功能模块</span>
</td>
<td style="text-align:center;">
<span style="font-size:18px;">算法</span>
</td>
<td style="text-align:center;">
<span style="font-size:18px;">教程</span><span style="font-size:18px;">与文档</span>
</td>
</tr>
<tr>
<td style="text-align:center;">
<span style="font-size:12px;">剪裁</span><span style="font-size:12px;"></span><br />
</td>
<td>
<ul>
<li>
Sensitivity&nbsp;&nbsp;Pruner:&nbsp;<a href="https://arxiv.org/abs/1608.08710" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">Li H , Kadav A , Durdanovic I , et al. Pruning Filters for Efficient ConvNets[J]. 2016.</span></span></a>
</li>
<li>
AMC Pruner:&nbsp;<a href="https://arxiv.org/abs/1802.03494" target="_blank"><span style="font-family:&quot;font-size:13px;background-color:#FFFFFF;">He, Yihui , et al. "AMC: AutoML for Model Compression and Acceleration on Mobile Devices." (2018).</span></a>
</li>
<li>
FFPGM Pruner:&nbsp;<a href="https://arxiv.org/abs/1811.00250" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">He Y , Liu P , Wang Z , et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration[C]// IEEE/CVF Conference on Computer Vision &amp; Pattern Recognition. IEEE, 2019.</span></a>
</li>
<li>
Slim Pruner:<span style="background-color:#FFFDFA;">&nbsp;<a href="https://arxiv.org/pdf/1708.06519.pdf" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">Liu Z , Li J , Shen Z , et al. Learning Efficient Convolutional Networks through Network Slimming[J]. 2017.</span></a></span>
</li>
<li>
<span style="background-color:#FFFDFA;">Opt Slim Pruner:&nbsp;<a href="https://arxiv.org/pdf/1708.06519.pdf" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">Ye Y , You G , Fwu J K , et al. Channel Pruning via Optimal Thresholding[J]. 2020.</span></a><br />
</span>
</li>
</ul>
</td>
<td>
<ul>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/prune_api.rst" target="_blank">剪裁模块API文档</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/quick_start/pruning_tutorial.md" target="_blank">剪裁快速开始示例</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md" target="_blank">分类模敏感度分析教程</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_pruing_tutorial.md" target="_blank">检测模型剪裁教程</a>
</li>
<li>
<span id="__kindeditor_bookmark_start_313__"></span><a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_prune_dist_tutorial.md" target="_blank">检测模型剪裁+蒸馏教程</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_sensitivy_tutorial.md" target="_blank">检测模型敏感度分析教程</a>
</li>
</ul>
</td>
</tr>
<tr>
<td style="text-align:center;">
量化
</td>
<td>
<ul>
<li>
Quantization Aware Training:&nbsp;<a href="https://arxiv.org/abs/1806.08342" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">Krishnamoorthi R . Quantizing deep convolutional networks for efficient inference: A whitepaper[J]. 2018.</span></a>
</li>
<li>
Post Training&nbsp;<span>Quantization&nbsp;</span><a href="http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf" target="_blank">原理</a>
</li>
<li>
Embedding&nbsp;<span>Quantization:&nbsp;<a href="https://arxiv.org/pdf/1603.01025.pdf" target="_blank"><span style="font-family:&quot;font-size:14px;background-color:#FFFFFF;">Miyashita D , Lee E H , Murmann B . Convolutional Neural Networks using Logarithmic Data Representation[J]. 2016.</span></a></span>
</li>
<li>
DSQ: <a href="https://arxiv.org/abs/1908.05033" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Gong, Ruihao, et al. "Differentiable soft quantization: Bridging full-precision and low-bit neural networks."&nbsp;</span><i>Proceedings of the IEEE International Conference on Computer Vision</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">. 2019.</span></a>
</li>
<li>
PACT:&nbsp; <a href="https://arxiv.org/abs/1805.06085" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Choi, Jungwook, et al. "Pact: Parameterized clipping activation for quantized neural networks."&nbsp;</span><i>arXiv preprint arXiv:1805.06085</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">&nbsp;(2018).</span></a>
</li>
</ul>
</td>
<td>
<ul>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/quantization_api.rst" target="_blank">量化API文档</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/quick_start/quant_aware_tutorial.md" target="_blank">量化训练快速开始示例</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/quick_start/quant_post_static_tutorial.md" target="_blank">静态离线量化快速开始示例</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_quantization_tutorial.md" target="_blank">检测模型量化教程</a>
</li>
</ul>
</td>
</tr>
<tr>
<td style="text-align:center;">
蒸馏
</td>
<td>
<ul>
<li>
<span>Knowledge Distillation</span>:&nbsp;<a href="https://arxiv.org/abs/1503.02531" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network."&nbsp;</span><i>arXiv preprint arXiv:1503.02531</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">&nbsp;(2015).</span></a>
</li>
<li>
FSP <span>Knowledge Distillation</span>:&nbsp;&nbsp;<a href="http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Yim, Junho, et al. "A gift from knowledge distillation: Fast optimization, network minimization and transfer learning."&nbsp;</span><i>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">. 2017.</span></a>
</li>
<li>
YOLO Knowledge Distillation:&nbsp;&nbsp;<a href="http://openaccess.thecvf.com/content_ECCVW_2018/papers/11133/Mehta_Object_detection_at_200_Frames_Per_Second_ECCVW_2018_paper.pdf" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Mehta, Rakesh, and Cemalettin Ozturk. "Object detection at 200 frames per second."&nbsp;</span><i>Proceedings of the European Conference on Computer Vision (ECCV)</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">. 2018.</span></a>
</li>
<li>
DML:&nbsp;<a href="https://arxiv.org/abs/1706.00384" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Zhang, Ying, et al. "Deep mutual learning."&nbsp;</span><i>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">. 2018.</span></a>
</li>
</ul>
</td>
<td>
<ul>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/single_distiller_api.rst" target="_blank">蒸馏API文档</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/quick_start/distillation_tutorial.md" target="_blank">蒸馏快速开始示例</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_distillation_tutorial.md" target="_blank">检测模型蒸馏教程</a>
</li>
</ul>
</td>
</tr>
<tr>
<td style="text-align:center;">
模型结构搜索(NAS)
</td>
<td>
<ul>
<li>
Simulate Anneal NAS:&nbsp;<a href="https://arxiv.org/pdf/2005.04117.pdf" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Abdelhamed, Abdelrahman, et al. "Ntire 2020 challenge on real image denoising: Dataset, methods and results."&nbsp;</span><i>The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">. Vol. 2. 2020.</span></a>
</li>
<li>
DARTS <a href="https://arxiv.org/abs/1806.09055" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search."&nbsp;</span><i>arXiv preprint arXiv:1806.09055</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">&nbsp;(2018).</span></a>
</li>
<li>
PC-DARTS <a href="https://arxiv.org/abs/1907.05737" target="_blank"><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">Xu, Yuhui, et al. "Pc-darts: Partial channel connections for memory-efficient differentiable architecture search."&nbsp;</span><i>arXiv preprint arXiv:1907.05737</i><span style="color:#222222;font-family:Arial, sans-serif;font-size:13px;background-color:#FFFFFF;">&nbsp;(2019).</span></a>
</li>
<li>
OneShot&nbsp;
</li>
</ul>
</td>
<td>
<ul>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/nas_api.rst" target="_blank">NAS API文档</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/darts.rst" target="_blank">DARTS API文档</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/quick_start/nas_tutorial.md" target="_blank">NAS快速开始示例</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/paddledetection_slim_nas_tutorial.md" target="_blank">检测模型NAS教程</a>
</li>
<li>
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/sanas_darts_space.md" target="_blank">SANAS进阶版实验教程-压缩DARTS产出模型</a>
</li>
</ul>
</td>
</tr>
</tbody>
</table>
- Quantization
  - Quantization-aware training (training aware)
  - Post-training quantization (post training)
- Knowledge distillation
  - Single-process knowledge distillation
  - Multi-process distributed knowledge distillation
- Neural architecture search (NAS)
  - Lightweight architecture search based on evolutionary algorithms
  - One-Shot architecture search
  - FLOPS / hardware-latency constraints
  - Latency estimation on multiple platforms
  - User-defined search algorithms and search spaces

## Installation

```bash
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```

### Matching quantization to your Paddle version

For inference on ARM and GPU, any version pairing below works; for inference on CPU, use Paddle 2.0 with the matching PaddleSlim 1.1.0.

- For Paddle 1.7.x, install PaddleSlim 1.0.1:
```bash
pip install paddleslim==1.0.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- For Paddle 1.8.x, install PaddleSlim 1.1.1:
```bash
pip install paddleslim==1.1.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- For Paddle 2.0.x, install PaddleSlim 1.1.0:
```bash
pip install paddleslim==1.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
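As a quick sanity check after installing, the small helper below (hypothetical, not part of PaddleSlim) maps the installed Paddle version to the PaddleSlim pin listed above:

```python
# Hypothetical helper, not part of PaddleSlim: print the pip command for the
# PaddleSlim release matching the installed Paddle version (see list above).
import paddle

PIN_BY_PADDLE = {"1.7": "1.0.1", "1.8": "1.1.1", "2.0": "1.1.0"}

def matching_pin():
    major_minor = ".".join(paddle.__version__.split(".")[:2])
    return PIN_BY_PADDLE.get(major_minor)

pin = matching_pin()
if pin is None:
    print("No known PaddleSlim pin for Paddle " + paddle.__version__)
else:
    print("pip install paddleslim==" + pin)
```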
## Usage
- [Quick Start](docs/zh_cn/quick_start): simple examples that show how to get started with PaddleSlim.
......@@ -97,3 +276,10 @@ pip install paddleslim -i https://pypi.org/simple
| RK3288 | [-23%]() | +0.07% |
| Android cellphone | [-20%]() | +0.16% |
| iPhone 6s | [-17%]() | +0.32% |
## License
This project is released under the [Apache 2.0 license](LICENSE).
## Contributing
Contributions to PaddleSlim are very welcome, and we greatly appreciate your feedback.
Chinese | [English](README.md)
Documentation: https://paddlepaddle.github.io/PaddleSlim
# PaddleSlim
PaddleSlim is a model-compression toolkit that bundles a range of compression strategies: model pruning, fixed-point quantization, knowledge distillation, hyperparameter search, and neural architecture search.
For business users, PaddleSlim provides complete compression solutions for vision scenarios such as image classification, detection, and segmentation, and keeps exploring compression for NLP models. It also provides, and keeps improving, benchmarks of each compression strategy on classic open-source tasks for reference.
For researchers and developers of compression algorithms, PaddleSlim offers low-level helper interfaces for each strategy, making it convenient to reproduce, investigate, and apply methods from recent papers. PaddleSlim supports developers' innovations on model-compression strategies through low-level capabilities, technical consulting, and business scenarios.
## Features
- Model pruning
  - Uniform pruning of convolution channels
  - Sensitivity-based pruning of convolution channels
  - Automatic pruning based on evolutionary algorithms
- Quantization
  - Quantization-aware training (training aware)
  - Post-training quantization (post training)
- Knowledge distillation
  - Single-process knowledge distillation
  - Multi-process distributed knowledge distillation
- Neural architecture search (NAS)
  - Lightweight architecture search based on evolutionary algorithms
  - One-Shot architecture search
  - FLOPS / hardware-latency constraints
  - Latency estimation on multiple platforms
  - User-defined search algorithms and search spaces
## Installation
Dependencies:
Paddle >= 1.7.0
```bash
pip install paddleslim -i https://pypi.org/simple
```
## 使用
- [快速开始](docs/zh_cn/quick_start):通过简单示例介绍如何快速使用PaddleSlim。
- [进阶教程](docs/zh_cn/tutorials):PaddleSlim高阶教程。
- [模型库](docs/zh_cn/model_zoo.md):各个压缩策略在图像分类、目标检测和图像语义分割模型上的实验结论,包括模型精度、预测速度和可供下载的预训练模型。
- [API文档](https://paddlepaddle.github.io/PaddleSlim/api_cn/index.html)
- [算法原理](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html): 介绍量化、剪枝、蒸馏、NAS的基本知识背景。
- [Paddle检测库](https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim):介绍如何在检测库中使用PaddleSlim。
- [Paddle分割库](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/slim):介绍如何在分割库中使用PaddleSlim。
- [PaddleLite](https://paddlepaddle.github.io/Paddle-Lite/):介绍如何使用预测库PaddleLite部署PaddleSlim产出的模型。
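As a taste of what the quick-start covers, the sketch below mirrors the `Pruner.prune(...)` calls used by the demo scripts later in this commit. It assumes a network with a parameter named `conv1_weights` has already been built in the default program; the parameter name and ratio are illustrative only:

```python
import paddle.fluid as fluid
from paddleslim.analysis import flops
from paddleslim.prune import Pruner

# Prune 30% of the channels of one convolution in the default program,
# mirroring the Pruner.prune(...) usage in the demo code further below.
place = fluid.CPUPlace()
pruner = Pruner()
pruned_program = pruner.prune(
    fluid.default_main_program(),
    fluid.global_scope(),
    params=["conv1_weights"],  # illustrative parameter name
    ratios=[0.3],
    place=place)
print("FLOPs after pruning:", flops(pruned_program))
```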
## Results of Selected Compression Strategies
### Classification models
Data: ImageNet2012; model: MobileNetV1.
| Strategy | Accuracy gain (baseline: 70.91%) | Model size (baseline: 17.0M) |
|:---:|:---:|:---:|
| Knowledge distillation (ResNet50) | [+1.06%]() | - |
| Knowledge distillation (ResNet50) + int8 quantization-aware training | [+1.10%]() | [-71.76%]() |
| Pruning (FLOPs -50%) + int8 quantization-aware training | [-1.71%]() | [-86.47%]() |
### Object detection models
#### Data: Pascal VOC; model: MobileNet-V1-YOLOv3
| Method | mAP (baseline: 76.2%) | Model size (baseline: 94MB) |
| :---------------------: | :------------: | :------------:|
| Knowledge distillation (ResNet34-YOLOv3) | [+2.8%](#) | - |
| Pruning (FLOPs -52.88%) | [+1.4%]() | [-67.76%]() |
| Knowledge distillation (ResNet34-YOLOv3) + pruning (FLOPs -69.57%) | [+2.6%]() | [-67.00%]() |
#### Data: COCO; model: MobileNet-V1-YOLOv3
| Method | mAP (baseline: 29.3%) | Model size |
| :---------------------: | :------------: | :------:|
| Knowledge distillation (ResNet34-YOLOv3) | [+2.1%]() | - |
| Knowledge distillation (ResNet34-YOLOv3) + pruning (FLOPs -67.56%) | [-0.3%]() | [-66.90%]() |
### NAS
Data: ImageNet2012; model: MobileNetV2.
| Hardware | Inference latency | Top-1 accuracy (baseline: 71.90%) |
|:---------------:|:---------:|:--------------------:|
| RK3288 | [-23%]() | +0.07% |
| Android cellphone | [-20%]() | +0.16% |
| iPhone 6s | [-17%]() | +0.32% |
......@@ -56,6 +56,30 @@ Paddle >= 1.7.0
pip install paddleslim -i https://pypi.org/simple
```
### Quantization
To use quantization in PaddleSlim, install the PaddleSlim version that matches your Paddle release, as listed below.
Quantized models run on ARM and GPU with any PaddleSlim version; for CPU inference, install PaddleSlim 1.1.0.
- For Paddle 1.7, install PaddleSlim 1.0.1:
```bash
pip install paddleslim==1.0.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- For Paddle 1.8, install PaddleSlim 1.1.1:
```bash
pip install paddleslim==1.1.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- For Paddle 2.0, install PaddleSlim 1.1.0:
```bash
pip install paddleslim==1.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
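Once a matching pair is installed, quantization-aware training wraps existing programs roughly as follows. This is a minimal sketch: `train_program` and `test_program` are assumed to be fluid Programs you have already built, and `quant_aware`/`convert` are used with their default configs:

```python
import paddle.fluid as fluid
from paddleslim.quant import quant_aware, convert

# Sketch only: train_program / test_program are assumed to exist already.
place = fluid.CUDAPlace(0)
quant_train_program = quant_aware(train_program, place, for_test=False)
quant_test_program = quant_aware(test_program, place, for_test=True)
# ... run quantization-aware training on quant_train_program, then freeze
# the evaluation program for inference:
inference_program = convert(quant_test_program, place)
```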
## Usage
- [QuickStart](https://paddlepaddle.github.io/PaddleSlim/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
......
# Deep Mutual Learning (DML)
This example shows how to train models with PaddleSlim's Deep Mutual Learning (DML) method; for the algorithm itself, see the paper [Deep Mutual Learning](https://arxiv.org/abs/1706.00384). A minimal sketch of the two-model DML loss follows.
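For intuition, the sketch below writes out the two-model DML loss from the paper using plain fluid ops; PaddleSlim's `DML` class (used in `dml_train.py` below) generalizes this to any number of models:

```python
import paddle.fluid as fluid

def dml_losses(logits1, logits2, label):
    """Two-model DML loss sketch: each model minimizes its own cross-entropy
    plus a KL term that distills from its peer's predictions."""
    p1 = fluid.layers.softmax(logits1)
    p2 = fluid.layers.softmax(logits2)
    ce1 = fluid.layers.mean(fluid.layers.cross_entropy(p1, label))
    ce2 = fluid.layers.mean(fluid.layers.cross_entropy(p2, label))
    # kldiv_loss expects log-probabilities as its first argument.
    kl12 = fluid.layers.kldiv_loss(fluid.layers.log(p1), p2, reduction='mean')
    kl21 = fluid.layers.kldiv_loss(fluid.layers.log(p2), p1, reduction='mean')
    return ce1 + kl12, ce2 + kl21
```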
## Dataset
The example trains on the CIFAR-100 dataset. You can let the script download it automatically at startup,
or download the [dataset](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) yourself and place it under `./dataset/cifar100` in the current directory.
## Launch commands
Single-card training, using GPU 0 as an example:
```bash
CUDA_VISIBLE_DEVICES=0 python dml_train.py
```
Multi-card training, using GPUs 0-3 as an example:
```bash
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog dml_train.py --use_parallel=True
```
## Results
The results below were obtained with the default configuration (learning rate, optimizer, and so on); only the combination of models trained with DML was varied.
To improve the results further, try [more training tricks](https://arxiv.org/abs/1812.01187) or train more models together in a single DML run.
| Dataset | Models | Accuracy trained alone | Accuracy with DML |
| ------ | ------ | ------ | ------ |
| CIFAR100 | MobileNet X 2 | 73.65% | 76.34% (+2.69%) |
| CIFAR100 | MobileNet X 4 | 73.65% | 76.56% (+2.91%) |
| CIFAR100 | MobileNet + ResNet50 | 73.65%/76.52% | 76.00%/77.80% (+2.35%/+1.28%) |
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from PIL import Image
from PIL import ImageOps
import os
import math
import random
import tarfile
import functools
import numpy as np
from PIL import ImageEnhance
import paddle
# for python2/python3 compatibility
try:
    import cPickle
except ImportError:
    import _pickle as cPickle
IMAGE_SIZE = 32
IMAGE_DEPTH = 3
CIFAR_MEAN = [0.5070751592371323, 0.48654887331495095, 0.4409178433670343]
CIFAR_STD = [0.2673342858792401, 0.2564384629170883, 0.27615047132568404]
URL_PREFIX = 'https://www.cs.toronto.edu/~kriz/'
CIFAR100_URL = URL_PREFIX + 'cifar-100-python.tar.gz'
CIFAR100_MD5 = 'eb9058c3a382ffc7106e4002c42a8d85'
paddle.dataset.common.DATA_HOME = "dataset/"
def preprocess(sample, is_training):
image_array = sample.reshape(IMAGE_DEPTH, IMAGE_SIZE, IMAGE_SIZE)
rgb_array = np.transpose(image_array, (1, 2, 0))
img = Image.fromarray(rgb_array, 'RGB')
if is_training:
        # pad, random crop, random_flip_left_right, random_rotation
img = ImageOps.expand(img, (4, 4, 4, 4), fill=0)
left_top = np.random.randint(8, size=2)
img = img.crop((left_top[1], left_top[0], left_top[1] + IMAGE_SIZE,
left_top[0] + IMAGE_SIZE))
if np.random.randint(2):
img = img.transpose(Image.FLIP_LEFT_RIGHT)
random_angle = np.random.randint(-15, 15)
img = img.rotate(random_angle, Image.NEAREST)
img = np.array(img).astype(np.float32)
img_float = img / 255.0
img = (img_float - CIFAR_MEAN) / CIFAR_STD
img = np.transpose(img, (2, 0, 1))
return img
def reader_generator(datasets, batch_size, is_training, is_shuffle):
def read_batch(datasets):
if is_shuffle:
random.shuffle(datasets)
for im, label in datasets:
im = preprocess(im, is_training)
yield im, [int(label)]
def reader():
batch_data = []
batch_label = []
for data in read_batch(datasets):
batch_data.append(data[0])
batch_label.append(data[1])
if len(batch_data) == batch_size:
batch_data = np.array(batch_data, dtype='float32')
batch_label = np.array(batch_label, dtype='int64')
batch_out = [batch_data, batch_label]
yield batch_out
batch_data = []
batch_label = []
return reader
def cifar100_reader(file_name, data_name, is_shuffle):
with tarfile.open(file_name, mode='r') as f:
names = [
each_item.name for each_item in f if data_name in each_item.name
]
names.sort()
datasets = []
for name in names:
print("Reading file " + name)
            try:
                batch = cPickle.load(
                    f.extractfile(name), encoding='iso-8859-1')
            except TypeError:
                # Python 2's cPickle.load does not accept an `encoding` argument.
                batch = cPickle.load(f.extractfile(name))
data = batch['data']
labels = batch.get('labels', batch.get('fine_labels', None))
assert labels is not None
dataset = zip(data, labels)
datasets.extend(dataset)
if is_shuffle:
random.shuffle(datasets)
return datasets
def train_valid(batch_size, is_train, is_shuffle):
name = 'train' if is_train else 'test'
datasets = cifar100_reader(
paddle.dataset.common.download(CIFAR100_URL, 'cifar', CIFAR100_MD5),
name, is_shuffle)
reader = reader_generator(datasets, batch_size, is_train, is_shuffle)
return reader
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import argparse
import functools
import logging
import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable
from paddleslim.common import AvgrageMeter, get_logger
from paddleslim.dist import DML
from paddleslim.models.dygraph import MobileNetV1
import cifar100_reader as reader
sys.path[0] = os.path.join(os.path.dirname(__file__), os.path.pardir)
from utility import add_arguments, print_arguments
logger = get_logger(__name__, level=logging.INFO)
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('log_freq', int, 100, "Log frequency.")
add_arg('batch_size', int, 256, "Minibatch size.")
add_arg('init_lr', float, 0.1, "The start learning rate.")
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('epochs', int, 200, "Epoch number.")
add_arg('class_num', int, 100, "Class number of dataset.")
add_arg('trainset_num', int, 50000, "Images number of trainset.")
add_arg('model_save_dir', str, 'saved_models', "The path to save model.")
add_arg('use_multiprocess', bool, True, "Whether use multiprocess reader.")
add_arg('use_parallel', bool, False, "Whether to use data parallel mode to train the model.")
# yapf: enable
def create_optimizer(models, args):
device_num = fluid.dygraph.parallel.Env().nranks
step = int(args.trainset_num / (args.batch_size * device_num))
epochs = [60, 120, 180]
bd = [step * e for e in epochs]
lr = [args.init_lr * (0.1**i) for i in range(len(bd) + 1)]
optimizers = []
for cur_model in models:
learning_rate = fluid.dygraph.PiecewiseDecay(bd, lr, 0)
opt = fluid.optimizer.MomentumOptimizer(
learning_rate,
0.9,
parameter_list=cur_model.parameters(),
use_nesterov=True,
regularization=fluid.regularizer.L2DecayRegularizer(5e-4))
optimizers.append(opt)
return optimizers
def create_reader(place, args):
train_reader = reader.train_valid(
batch_size=args.batch_size, is_train=True, is_shuffle=True)
valid_reader = reader.train_valid(
batch_size=args.batch_size, is_train=False, is_shuffle=False)
if args.use_parallel:
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
train_loader = fluid.io.DataLoader.from_generator(
capacity=1024,
return_list=True,
use_multiprocess=args.use_multiprocess)
valid_loader = fluid.io.DataLoader.from_generator(
capacity=1024,
return_list=True,
use_multiprocess=args.use_multiprocess)
train_loader.set_batch_generator(train_reader, places=place)
valid_loader.set_batch_generator(valid_reader, places=place)
return train_loader, valid_loader
def train(train_loader, dml_model, dml_optimizer, args):
dml_model.train()
costs = [AvgrageMeter() for i in range(dml_model.model_num)]
accs = [AvgrageMeter() for i in range(dml_model.model_num)]
for step_id, (images, labels) in enumerate(train_loader):
images, labels = to_variable(images), to_variable(labels)
batch_size = images.shape[0]
logits = dml_model.forward(images)
precs = [
fluid.layers.accuracy(
input=l, label=labels, k=1) for l in logits
]
losses = dml_model.loss(logits, labels)
dml_optimizer.minimize(losses)
for i in range(dml_model.model_num):
accs[i].update(precs[i].numpy(), batch_size)
costs[i].update(losses[i].numpy(), batch_size)
model_names = dml_model.full_name()
if step_id % args.log_freq == 0:
log_msg = "Train Step {}".format(step_id)
for model_id, (cost, acc) in enumerate(zip(costs, accs)):
log_msg += ", {} loss: {:.6f} acc: {:.6f}".format(
model_names[model_id], cost.avg[0], acc.avg[0])
logger.info(log_msg)
return costs, accs
def valid(valid_loader, dml_model, args):
dml_model.eval()
costs = [AvgrageMeter() for i in range(dml_model.model_num)]
accs = [AvgrageMeter() for i in range(dml_model.model_num)]
for step_id, (images, labels) in enumerate(valid_loader):
images, labels = to_variable(images), to_variable(labels)
batch_size = images.shape[0]
logits = dml_model.forward(images)
precs = [
fluid.layers.accuracy(
input=l, label=labels, k=1) for l in logits
]
losses = dml_model.loss(logits, labels)
for i in range(dml_model.model_num):
accs[i].update(precs[i].numpy(), batch_size)
costs[i].update(losses[i].numpy(), batch_size)
model_names = dml_model.full_name()
if step_id % args.log_freq == 0:
log_msg = "Valid Step{} ".format(step_id)
for model_id, (cost, acc) in enumerate(zip(costs, accs)):
log_msg += ", {} loss: {:.6f} acc: {:.6f}".format(
model_names[model_id], cost.avg[0], acc.avg[0])
logger.info(log_msg)
return costs, accs
def main(args):
if not args.use_gpu:
place = fluid.CPUPlace()
elif not args.use_parallel:
place = fluid.CUDAPlace(0)
else:
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
with fluid.dygraph.guard(place):
# 1. Define data reader
train_loader, valid_loader = create_reader(place, args)
# 2. Define neural network
models = [
MobileNetV1(class_dim=args.class_num),
MobileNetV1(class_dim=args.class_num)
]
optimizers = create_optimizer(models, args)
# 3. Use PaddleSlim DML strategy
dml_model = DML(models, args.use_parallel)
dml_optimizer = dml_model.opt(optimizers)
# 4. Train your network
save_parameters = (not args.use_parallel) or (
args.use_parallel and fluid.dygraph.parallel.Env().local_rank == 0)
best_valid_acc = [0] * dml_model.model_num
for epoch_id in range(args.epochs):
current_step_lr = dml_optimizer.get_lr()
lr_msg = "Epoch {}".format(epoch_id)
for model_id, lr in enumerate(current_step_lr):
lr_msg += ", {} lr: {:.6f}".format(
dml_model.full_name()[model_id], lr)
logger.info(lr_msg)
train_losses, train_accs = train(train_loader, dml_model,
dml_optimizer, args)
valid_losses, valid_accs = valid(valid_loader, dml_model, args)
for i in range(dml_model.model_num):
if valid_accs[i].avg[0] > best_valid_acc[i]:
best_valid_acc[i] = valid_accs[i].avg[0]
if save_parameters:
fluid.save_dygraph(
models[i].state_dict(),
os.path.join(args.model_save_dir,
dml_model.full_name()[i],
"best_model"))
                summary_msg = "Epoch {} {}: valid_loss {:.6f}, valid_acc {:.6f}, best_valid_acc {:.6f}"
                logger.info(
                    summary_msg.format(epoch_id,
                                       dml_model.full_name()[i], valid_losses[
                                           i].avg[0], valid_accs[i].avg[0],
                                       best_valid_acc[i]))
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
main(args)
......@@ -116,8 +116,8 @@ def compress(args):
fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
    val_reader = paddle.fluid.io.batch(val_reader, batch_size=args.batch_size)
    train_reader = paddle.fluid.io.batch(
train_reader, batch_size=args.batch_size, drop_last=True)
train_feeder = feeder = fluid.DataFeeder([image, label], place)
......
......@@ -34,12 +34,12 @@ add_arg('config_file', str, None, "The config file for comp
model_list = [m for m in dir(models) if "__" not in m]
ratiolist = [
    # [0.06, 0.0, 0.09, 0.03, 0.09, 0.02, 0.05, 0.03, 0.0, 0.07, 0.07, 0.05, 0.08],
    # [0.08, 0.02, 0.03, 0.13, 0.1, 0.06, 0.03, 0.04, 0.14, 0.02, 0.03, 0.02, 0.01],
]


def save_model(args, exe, train_prog, eval_prog, info):
model_path = os.path.join(args.model_save_dir, args.model, str(info))
if not os.path.isdir(model_path):
os.makedirs(model_path)
......@@ -58,29 +58,31 @@ def piecewise_decay(args):
regularization=fluid.regularizer.L2Decay(args.l2_decay))
return optimizer
def cosine_decay(args):
step = int(math.ceil(float(args.total_images) / args.batch_size))
learning_rate = fluid.layers.cosine_decay(
        learning_rate=args.lr, step_each_epoch=step, epochs=args.num_epochs)
optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate,
momentum=args.momentum_rate,
regularization=fluid.regularizer.L2Decay(args.l2_decay))
return optimizer
def create_optimizer(args):
if args.lr_strategy == "piecewise_decay":
return piecewise_decay(args)
elif args.lr_strategy == "cosine_decay":
return cosine_decay(args)
def compress(args):
    class_dim = 1000
    image_shape = "3,224,224"
image_shape = [int(m) for m in image_shape.split(",")]
    assert args.model in model_list, "{} is not in lists: {}".format(
        args.model, model_list)
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
# model definition
......@@ -98,18 +100,22 @@ def compress(args):
exe.run(fluid.default_startup_program())
if args.pretrained_model:
def if_exist(var):
            exist = os.path.exists(
                os.path.join(args.pretrained_model, var.name))
            print("exist", exist)
return exist
#fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
    val_reader = paddle.fluid.io.batch(reader.val(), batch_size=args.batch_size)
    train_reader = paddle.fluid.io.batch(
reader.train(), batch_size=args.batch_size, drop_last=True)
train_feeder = feeder = fluid.DataFeeder([image, label], place)
    val_feeder = feeder = fluid.DataFeeder(
        [image, label], place, program=val_program)
def test(epoch, program):
batch_id = 0
......@@ -117,80 +123,99 @@ def compress(args):
acc_top5_ns = []
for data in val_reader():
start_time = time.time()
            acc_top1_n, acc_top5_n = exe.run(
                program,
                feed=train_feeder.feed(data),
                fetch_list=[acc_top1.name, acc_top5.name])
end_time = time.time()
print("Eval epoch[{}] batch[{}] - acc_top1: {}; acc_top5: {}; time: {}".format(epoch, batch_id, np.mean(acc_top1_n), np.mean(acc_top5_n), end_time-start_time))
print(
"Eval epoch[{}] batch[{}] - acc_top1: {}; acc_top5: {}; time: {}".
format(epoch, batch_id,
np.mean(acc_top1_n),
np.mean(acc_top5_n), end_time - start_time))
acc_top1_ns.append(np.mean(acc_top1_n))
acc_top5_ns.append(np.mean(acc_top5_n))
batch_id += 1
print("Final eval epoch[{}] - acc_top1: {}; acc_top5: {}".format(epoch, np.mean(np.array(acc_top1_ns)), np.mean(np.array(acc_top5_ns))))
print("Final eval epoch[{}] - acc_top1: {}; acc_top5: {}".format(
epoch,
np.mean(np.array(acc_top1_ns)), np.mean(np.array(acc_top5_ns))))
def train(epoch, program):
build_strategy = fluid.BuildStrategy()
exec_strategy = fluid.ExecutionStrategy()
train_program = fluid.compiler.CompiledProgram(
            program).with_data_parallel(
                loss_name=avg_cost.name,
                build_strategy=build_strategy,
                exec_strategy=exec_strategy)
batch_id = 0
for data in train_reader():
start_time = time.time()
            loss_n, acc_top1_n, acc_top5_n, lr_n = exe.run(
                train_program,
                feed=train_feeder.feed(data),
                fetch_list=[
                    avg_cost.name, acc_top1.name, acc_top5.name,
                    "learning_rate"
                ])
end_time = time.time()
loss_n = np.mean(loss_n)
acc_top1_n = np.mean(acc_top1_n)
acc_top5_n = np.mean(acc_top5_n)
lr_n = np.mean(lr_n)
print("epoch[{}]-batch[{}] - loss: {}; acc_top1: {}; acc_top5: {};lrn: {}; time: {}".format(epoch, batch_id, loss_n, acc_top1_n, acc_top5_n, lr_n,end_time-start_time))
print(
"epoch[{}]-batch[{}] - loss: {}; acc_top1: {}; acc_top5: {};lrn: {}; time: {}".
format(epoch, batch_id, loss_n, acc_top1_n, acc_top5_n, lr_n,
end_time - start_time))
batch_id += 1
params = []
for param in fluid.default_main_program().global_block().all_parameters():
#if "_weights" in param.name and "conv1_weights" not in param.name:
if "_sep_weights" in param.name:
if "_sep_weights" in param.name:
params.append(param.name)
print("fops before pruning: {}".format(flops(fluid.default_main_program())))
print("fops before pruning: {}".format(
flops(fluid.default_main_program())))
pruned_program_iter = fluid.default_main_program()
pruned_val_program_iter = val_program
for ratios in ratiolist:
pruner = Pruner()
        pruned_val_program_iter = pruner.prune(
            pruned_val_program_iter,
            fluid.global_scope(),
            params=params,
            ratios=ratios,
            place=place,
            only_graph=True)
        pruned_program_iter = pruner.prune(
            pruned_program_iter,
            fluid.global_scope(),
            params=params,
            ratios=ratios,
            place=place)
print("fops after pruning: {}".format(flops(pruned_program_iter)))
""" do not inherit learning rate """
if(os.path.exists(args.pretrained_model + "/learning_rate")):
os.remove( args.pretrained_model + "/learning_rate")
if(os.path.exists(args.pretrained_model + "/@LR_DECAY_COUNTER@")):
os.remove( args.pretrained_model + "/@LR_DECAY_COUNTER@")
fluid.io.load_vars(exe, args.pretrained_model , main_program = pruned_program_iter, predicate=if_exist)
if (os.path.exists(args.pretrained_model + "/learning_rate")):
os.remove(args.pretrained_model + "/learning_rate")
if (os.path.exists(args.pretrained_model + "/@LR_DECAY_COUNTER@")):
os.remove(args.pretrained_model + "/@LR_DECAY_COUNTER@")
fluid.io.load_vars(
exe,
args.pretrained_model,
main_program=pruned_program_iter,
predicate=if_exist)
pruned_program = pruned_program_iter
pruned_val_program = pruned_val_program_iter
for i in range(args.num_epochs):
train(i, pruned_program)
test(i, pruned_val_program)
        save_model(args, exe, pruned_program, pruned_val_program, i)
def main():
args = parser.parse_args()
......
......@@ -41,9 +41,10 @@ add_arg('test_period', int, 10, "Test period in epoches.")
model_list = [m for m in dir(models) if "__" not in m]
ratiolist = [
    # [0.06, 0.0, 0.09, 0.03, 0.09, 0.02, 0.05, 0.03, 0.0, 0.07, 0.07, 0.05, 0.08],
    # [0.08, 0.02, 0.03, 0.13, 0.1, 0.06, 0.03, 0.04, 0.14, 0.02, 0.03, 0.02, 0.01],
]
def piecewise_decay(args):
step = int(math.ceil(float(args.total_images) / args.batch_size))
......@@ -121,8 +122,8 @@ def compress(args):
# fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
    val_reader = paddle.fluid.io.batch(val_reader, batch_size=args.batch_size)
    train_reader = paddle.fluid.io.batch(
train_reader, batch_size=args.batch_size, drop_last=True)
train_feeder = feeder = fluid.DataFeeder([image, label], place)
......@@ -194,21 +195,26 @@ def compress(args):
for ratios in ratiolist:
pruner = Pruner()
        pruned_val_program_iter = pruner.prune(
            pruned_val_program_iter,
            fluid.global_scope(),
            params=params,
            ratios=ratios,
            place=place,
            only_graph=True)
        pruned_program_iter = pruner.prune(
            pruned_program_iter,
            fluid.global_scope(),
            params=params,
            ratios=ratios,
            place=place)
print("fops after pruning: {}".format(flops(pruned_program_iter)))
fluid.io.load_vars(exe, args.pretrained_model , main_program = pruned_program_iter, predicate=if_exist)
fluid.io.load_vars(
exe,
args.pretrained_model,
main_program=pruned_program_iter,
predicate=if_exist)
pruner = AutoPruner(
pruned_val_program_iter,
......@@ -238,8 +244,6 @@ def compress(args):
pruner.reward(score)
def main():
args = parser.parse_args()
print_arguments(args)
......
CUDA_VISIBLE_DEVICES=0 python2 -u train_cell_base.py
import paddle.fluid as fluid
from paddleslim.teachers.bert.reader.cls import *
from paddleslim.nas.darts.search_space import AdaBERTClassifier
from paddleslim.nas.darts import DARTSearch
def main():
place = fluid.CUDAPlace(0)
BERT_BASE_PATH = "./data/pretrained_models/uncased_L-12_H-768_A-12/"
bert_config_path = BERT_BASE_PATH + "/bert_config.json"
vocab_path = BERT_BASE_PATH + "/vocab.txt"
data_dir = "./data/glue_data/MNLI/"
max_seq_len = 512
do_lower_case = True
batch_size = 32
epoch = 30
processor = MnliProcessor(
data_dir=data_dir,
vocab_path=vocab_path,
max_seq_len=max_seq_len,
do_lower_case=do_lower_case,
in_tokens=False)
train_reader = processor.data_generator(
batch_size=batch_size,
phase='train',
epoch=epoch,
dev_count=1,
shuffle=True)
val_reader = processor.data_generator(
batch_size=batch_size,
phase='train',
epoch=epoch,
dev_count=1,
shuffle=True)
with fluid.dygraph.guard(place):
model = AdaBERTClassifier(
3,
teacher_model="/work/PaddleSlim/demo/bert_1/checkpoints/steps_23000"
)
searcher = DARTSearch(
model,
train_reader,
val_reader,
batchsize=batch_size,
num_epochs=epoch,
log_freq=10)
searcher.train()
if __name__ == '__main__':
main()
import numpy as np
# itertools.izip exists only on Python 2; fall back to built-in zip on Python 3
try:
    from itertools import izip
except ImportError:
    izip = zip
import paddle.fluid as fluid
from paddleslim.teachers.bert.reader.cls import *
from paddleslim.nas.darts.search_space import AdaBERTClassifier
from paddleslim.nas.darts.architect_for_bert import Architect
import logging
from paddleslim.common import AvgrageMeter, get_logger
logger = get_logger(__name__, level=logging.INFO)
def count_parameters_in_MB(all_params):
parameters_number = 0
for param in all_params:
if param.trainable:
parameters_number += np.prod(param.shape)
return parameters_number / 1e6
def model_loss(model, data_ids):
# src_ids = data_ids[0]
# position_ids = data_ids[1]
# sentence_ids = data_ids[2]
# input_mask = data_ids[3]
labels = data_ids[4]
labels.stop_gradient = True
enc_output = model(data_ids)
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=enc_output, label=labels, return_softmax=True)
loss = fluid.layers.mean(x=ce_loss)
num_seqs = fluid.layers.create_tensor(dtype='int64')
accuracy = fluid.layers.accuracy(input=probs, label=labels, total=num_seqs)
return loss, accuracy
def train_one_epoch(model, architect, train_loader, valid_loader, optimizer,
epoch, use_data_parallel, log_freq):
ce_losses = AvgrageMeter()
accs = AvgrageMeter()
model.train()
step_id = 0
    for train_data, valid_data in izip(train_loader(), valid_loader()):
architect.step(train_data, valid_data)
loss, acc = model_loss(model, train_data)
if use_data_parallel:
loss = model.scale_loss(loss)
loss.backward()
model.apply_collective_grads()
else:
loss.backward()
optimizer.minimize(loss)
model.clear_gradients()
batch_size = train_data[0].shape[0]
ce_losses.update(loss.numpy(), batch_size)
accs.update(acc.numpy(), batch_size)
if step_id % log_freq == 0:
logger.info(
"Train Epoch {}, Step {}, Lr {:.6f} loss {:.6f}; acc: {:.6f};".
format(epoch, step_id,
optimizer.current_step_lr(), ce_losses.avg[0], accs.avg[
0]))
step_id += 1
def valid_one_epoch(model, valid_loader, epoch, log_freq):
ce_losses = AvgrageMeter()
accs = AvgrageMeter()
model.eval()
step_id = 0
for valid_data in valid_loader():
loss, acc = model_loss(model, valid_data)
batch_size = valid_data[0].shape[0]
ce_losses.update(loss.numpy(), batch_size)
accs.update(acc.numpy(), batch_size)
if step_id % log_freq == 0:
logger.info("Valid Epoch {}, Step {}, loss {:.6f}; acc: {:.6f};".
format(epoch, step_id, ce_losses.avg[0], accs.avg[0]))
step_id += 1
def main():
use_data_parallel = False
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env(
).dev_id) if use_data_parallel else fluid.CUDAPlace(0)
BERT_BASE_PATH = "./data/pretrained_models/uncased_L-12_H-768_A-12"
bert_config_path = BERT_BASE_PATH + "/bert_config.json"
vocab_path = BERT_BASE_PATH + "/vocab.txt"
data_dir = "./data/glue_data/MNLI/"
teacher_model_dir = "./teacher_model/steps_23000"
num_samples = 392702
max_seq_len = 128
do_lower_case = True
batch_size = 128
hidden_size = 768
emb_size = 768
max_layer = 8
epoch = 80
log_freq = 10
use_fixed_gumbel = True
processor = MnliProcessor(
data_dir=data_dir,
vocab_path=vocab_path,
max_seq_len=max_seq_len,
do_lower_case=do_lower_case,
in_tokens=False)
train_reader = processor.data_generator(
batch_size=batch_size,
phase='search_train',
epoch=1,
dev_count=1,
shuffle=True)
val_reader = processor.data_generator(
batch_size=batch_size,
phase='search_valid',
epoch=1,
dev_count=1,
shuffle=True)
    if use_data_parallel:
        train_reader = fluid.contrib.reader.distributed_batch_reader(
            train_reader)
        val_reader = fluid.contrib.reader.distributed_batch_reader(
            val_reader)
with fluid.dygraph.guard(place):
model = AdaBERTClassifier(
3,
n_layer=max_layer,
hidden_size=hidden_size,
emb_size=emb_size,
teacher_model=teacher_model_dir,
data_dir=data_dir,
use_fixed_gumbel=use_fixed_gumbel)
if use_data_parallel:
strategy = fluid.dygraph.parallel.prepare_context()
model = fluid.dygraph.parallel.DataParallel(model, strategy)
device_num = fluid.dygraph.parallel.Env().nranks
step_per_epoch = int(num_samples / (batch_size * device_num))
learning_rate = fluid.dygraph.CosineDecay(2e-2, step_per_epoch, epoch)
model_parameters = [
p for p in model.parameters()
if p.name not in [a.name for a in model.arch_parameters()]
]
clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=5.0)
optimizer = fluid.optimizer.MomentumOptimizer(
learning_rate,
0.9,
regularization=fluid.regularizer.L2DecayRegularizer(3e-4),
parameter_list=model_parameters,
grad_clip=clip)
train_loader = fluid.io.DataLoader.from_generator(
capacity=1024,
use_double_buffer=True,
iterable=True,
return_list=True)
valid_loader = fluid.io.DataLoader.from_generator(
capacity=1024,
use_double_buffer=True,
iterable=True,
return_list=True)
train_loader.set_batch_generator(train_reader, places=place)
valid_loader.set_batch_generator(val_reader, places=place)
architect = Architect(model, learning_rate, 3e-4, place, False)
for epoch_id in range(epoch):
train_one_epoch(model, architect, train_loader, valid_loader,
optimizer, epoch_id, use_data_parallel, log_freq)
valid_one_epoch(model, valid_loader, epoch_id, log_freq)
print(model.student._encoder.alphas.numpy())
print("=" * 100)
if __name__ == '__main__':
main()
import paddle.fluid as fluid
from paddleslim.teachers.bert import BERTClassifier
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
with fluid.dygraph.guard(place):
bert = BERTClassifier(3)
bert.fit("./data/glue_data/MNLI/",
5,
batch_size=32,
use_data_parallel=True,
learning_rate=0.00005,
save_steps=1000)
......@@ -2,9 +2,31 @@
This example shows how to run differentiable architecture search with PaddlePaddle. The [DARTS](https://arxiv.org/abs/1806.09055) and [PC-DARTS](https://arxiv.org/abs/1907.05737) methods work out of the box, and the code can be adapted to other differentiable architecture search algorithms. A minimal sketch of one search step follows the file list below.
The directory layout of this example:
```
├── genotypes.py       Genotypes of the architectures found during search
├── model.py           Builds the searched sub-network
├── model_search.py    Builds the super-network used during search
├── operations.py      Candidate operations used in the search
├── reader.py          Data reading and augmentation
├── search.py          Entry point for architecture search
├── train.py           Entry point for evaluation training on CIFAR10
├── train_imagenet.py  Entry point for evaluation training on ImageNet
├── visualize.py       Entry point for architecture visualization
```
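For orientation, one search step alternates two gradient updates, roughly as sketched below. The method names (`model.loss`, the two optimizers) are assumptions for illustration, not the repo's exact API; the real loop lives in PaddleSlim's `DARTSearch`:

```python
def darts_search_step(model, arch_optimizer, weight_optimizer,
                      train_batch, valid_batch):
    # Sketch of one first-order DARTS step; `model.loss` and the two
    # optimizers are assumed, not the repo's exact API.
    # 1) update the architecture parameters (alpha) on validation data
    arch_loss = model.loss(valid_batch)
    arch_loss.backward()
    arch_optimizer.minimize(arch_loss)
    model.clear_gradients()
    # 2) update the network weights (w) on training data
    train_loss = model.loss(train_batch)
    train_loss.backward()
    weight_optimizer.minimize(train_loss)
    model.clear_gradients()
```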
## Dependencies
> PaddlePaddle >= 1.8.0, PaddleSlim >= 1.1.0, graphviz >= 0.11.1
## Dataset
......@@ -20,6 +42,15 @@ python search.py # DARTS first-order approximation search
python search.py --unrolled=True # DARTS second-order approximation search
python search.py --method='PC-DARTS' --batch_size=256 --learning_rate=0.1 --arch_learning_rate=6e-4 --epochs_no_archopt=15 # PC-DARTS search
```
If you run in a Docker environment, make sure shared memory is large enough for the multi-process dataloader; if you hit shared-memory problems, set `--use_multiprocess=False`.
Architecture search can also run on multiple cards. Using 4 cards (GPU ids 0-3) as an example, launch with:
```bash
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog search.py --use_data_parallel 1
```
Because multi-card training multiplies the total batch size by n (the number of cards), scale the initial learning rate by the same n to match single-card accuracy (e.g., with 4 cards: 0.025 × 4 = 0.1).
Figure 1 shows how the architecture changes over the search epochs. Note that the accuracy (Acc) in the figure is not the final accuracy of that architecture; to obtain the best accuracy for a given structure, run evaluation training on the resulting genotype.
......@@ -40,6 +71,15 @@ python train.py --arch='PC_DARTS' # evaluation training of the searched structure on CIFAR10
python train_imagenet.py --arch='PC_DARTS' # evaluation training of the searched structure on ImageNet
```
Evaluation training also supports multiple cards. Using 4 cards (GPU ids 0-3) as an example, launch with:
```bash
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py --use_data_parallel 1 --arch='DARTS_V2'
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_imagenet.py --use_data_parallel 1 --arch='DARTS_V2'
```
Likewise, multi-card training multiplies the total batch size by n (the number of cards); scale the initial learning rate by n to match single-card accuracy.
Results of evaluation training for the searched `DARTS_V1`, `DARTS_V2`, and `PC-DARTS` architectures:
| Architecture | Dataset | Accuracy |
......
......@@ -37,8 +37,7 @@ IMAGE_DEPTH = 3
CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
URL_PREFIX = 'https://www.cs.toronto.edu/~kriz/'
CIFAR10_URL = 'https://dataset.bj.bcebos.com/cifar%2Fcifar-10-python.tar.gz'
CIFAR10_MD5 = 'c58f30108f718f92721af3b95e74349a'
paddle.dataset.common.DATA_HOME = "dataset/"
......@@ -140,32 +139,10 @@ def train_search(batch_size, train_portion, is_shuffle, args):
split_point = int(np.floor(train_portion * len(datasets)))
train_datasets = datasets[:split_point]
val_datasets = datasets[split_point:]
train_readers = []
val_readers = []
n = int(math.ceil(len(train_datasets) // args.num_workers)
) if args.use_multiprocess else len(train_datasets)
train_datasets_lists = [
train_datasets[i:i + n] for i in range(0, len(train_datasets), n)
reader = [
reader_generator(train_datasets, batch_size, True, True, args),
reader_generator(val_datasets, batch_size, True, True, args)
]
return reader
......@@ -174,18 +151,8 @@ def train_valid(batch_size, is_train, is_shuffle, args):
datasets = cifar10_reader(
paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5),
name, is_shuffle, args)
reader = reader_generator(datasets, batch_size, is_train, is_shuffle, args)
return reader
......
......@@ -35,9 +35,7 @@ add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('log_freq', int, 50, "Log frequency.")
add_arg('data', str, 'dataset/cifar10',"The dir of dataset.")
add_arg('use_multiprocess', bool, True, "Whether use multiprocess reader.")
add_arg('batch_size', int, 64, "Minibatch size.")
add_arg('learning_rate', float, 0.025, "The start learning rate.")
add_arg('momentum', float, 0.9, "Momentum.")
......@@ -80,6 +78,7 @@ def main(args):
model,
train_reader,
valid_reader,
place,
learning_rate=args.learning_rate,
batchsize=args.batch_size,
num_imgs=args.trainset_num,
......@@ -87,8 +86,9 @@ def main(args):
unrolled=args.unrolled,
num_epochs=args.epochs,
epochs_no_archopt=args.epochs_no_archopt,
use_gpu=args.use_gpu,
use_multiprocess=args.use_multiprocess,
use_data_parallel=args.use_data_parallel,
save_dir=args.model_save_dir,
log_freq=args.log_freq)
searcher.train()
......
......@@ -19,13 +19,14 @@ from __future__ import print_function
import os
import sys
import ast
import logging
import argparse
import functools
import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable
from paddleslim.common import AvgrageMeter, get_logger
from paddleslim.nas.darts import count_parameters_in_MB
import genotypes
import reader
......@@ -38,8 +39,7 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('use_multiprocess', bool, True, "Whether use multiprocess reader.")
add_arg('data', str, 'dataset/cifar10',"The dir of dataset.")
add_arg('batch_size', int, 96, "Minibatch size.")
add_arg('learning_rate', float, 0.025, "The start learning rate.")
......@@ -140,9 +140,6 @@ def main(args):
if args.use_data_parallel else fluid.CUDAPlace(0)
with fluid.dygraph.guard(place):
genotype = eval("genotypes.%s" % args.arch)
model = Network(
C=args.init_channels,
......@@ -151,7 +148,12 @@ def main(args):
auxiliary=args.auxiliary,
genotype=genotype)
logger.info("param size = {:.6f}MB".format(
count_parameters_in_MB(model.parameters())))
device_num = fluid.dygraph.parallel.Env().nranks
step_per_epoch = int(args.trainset_num /
(args.batch_size * device_num))
learning_rate = fluid.dygraph.CosineDecay(args.learning_rate,
step_per_epoch, args.epochs)
clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=args.grad_clip)
......@@ -163,18 +165,21 @@ def main(args):
grad_clip=clip)
if args.use_data_parallel:
strategy = fluid.dygraph.parallel.prepare_context()
model = fluid.dygraph.parallel.DataParallel(model, strategy)
train_loader = fluid.io.DataLoader.from_generator(
capacity=64,
use_double_buffer=True,
iterable=True,
return_list=True,
use_multiprocess=args.use_multiprocess)
valid_loader = fluid.io.DataLoader.from_generator(
capacity=64,
use_double_buffer=True,
iterable=True,
return_list=True,
use_multiprocess=args.use_multiprocess)
train_reader = reader.train_valid(
batch_size=args.batch_size,
......@@ -186,13 +191,13 @@ def main(args):
is_train=False,
is_shuffle=False,
args=args)
if args.use_data_parallel:
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
train_loader.set_batch_generator(train_reader, places=place)
valid_loader.set_batch_generator(valid_reader, places=place)
save_parameters = (not args.use_data_parallel) or (
args.use_data_parallel and
fluid.dygraph.parallel.Env().local_rank == 0)
......
......@@ -19,13 +19,15 @@ from __future__ import print_function
import os
import sys
import ast
import logging
import argparse
import functools
import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable
from paddleslim.common import AvgrageMeter, get_logger
from paddleslim.nas.darts import count_parameters_in_MB
import genotypes
import reader
from model import NetworkImageNet as Network
......@@ -66,7 +68,7 @@ add_arg('use_data_parallel', ast.literal_eval, False, "The flag indicating whet
def cross_entropy_label_smooth(preds, targets, epsilon):
preds = fluid.layers.softmax(preds)
targets_one_hot = fluid.one_hot(input=targets, depth=args.class_num)
targets_smooth = fluid.layers.label_smooth(
targets_one_hot, epsilon=epsilon, dtype="float32")
loss = fluid.layers.cross_entropy(
......@@ -152,9 +154,6 @@ def main(args):
if args.use_data_parallel else fluid.CUDAPlace(0)
with fluid.dygraph.guard(place):
genotype = eval("genotypes.%s" % args.arch)
model = Network(
C=args.init_channels,
......@@ -163,7 +162,12 @@ def main(args):
auxiliary=args.auxiliary,
genotype=genotype)
logger.info("param size = {:.6f}MB".format(
count_parameters_in_MB(model.parameters())))
device_num = fluid.dygraph.parallel.Env().nranks
step_per_epoch = int(args.trainset_num /
(args.batch_size * device_num))
learning_rate = fluid.dygraph.ExponentialDecay(
args.learning_rate,
step_per_epoch,
......@@ -179,6 +183,7 @@ def main(args):
grad_clip=clip)
if args.use_data_parallel:
strategy = fluid.dygraph.parallel.prepare_context()
model = fluid.dygraph.parallel.DataParallel(model, strategy)
train_loader = fluid.io.DataLoader.from_generator(
......@@ -199,20 +204,19 @@ def main(args):
valid_reader = fluid.io.batch(
reader.imagenet_reader(args.data_dir, 'val'),
batch_size=args.batch_size)
if args.use_data_parallel:
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
train_loader.set_sample_list_generator(train_reader, places=place)
valid_loader.set_sample_list_generator(valid_reader, places=place)
save_parameters = (not args.use_data_parallel) or (
args.use_data_parallel and
fluid.dygraph.parallel.Env().local_rank == 0)
best_top1 = 0
for epoch in range(args.epochs):
logger.info('Epoch {}, lr {:.6f}'.format(
epoch, optimizer.current_step_lr()))
train_top1, train_top5 = train(model, train_loader, optimizer,
epoch, args)
......
......@@ -133,9 +133,9 @@ def compress(args):
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
train_reader = paddle.fluid.io.batch(
train_reader, batch_size=args.batch_size, drop_last=True)
val_reader = paddle.fluid.io.batch(
val_reader, batch_size=args.batch_size, drop_last=True)
val_program = student_program.clone(for_test=True)
......
......@@ -165,7 +165,7 @@
"metadata": {},
"outputs": [],
"source": [
"train_reader = paddle.batch(\n",
"train_reader = paddle.fluid.io.batch(\n",
" paddle.dataset.mnist.train(), batch_size=128, drop_last=True)\n",
"train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)"
]
......
CMAKE_MINIMUM_REQUIRED(VERSION 3.2)
project(mkldnn_quantaware_demo CXX C)
set(DEMO_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(DEMO_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
option(USE_GPU "Compile the inference code with CUDA GPU support" OFF)
option(USE_PROFILER "Whether enable Paddle's profiler." OFF)
set(USE_SHARED OFF)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
if(NOT PADDLE_ROOT)
set(PADDLE_ROOT ${DEMO_SOURCE_DIR}/fluid_inference)
endif()
find_package(Fluid)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -std=c++11")
if(USE_PROFILER)
find_package(Gperftools REQUIRED)
include_directories(${GPERFTOOLS_INCLUDE_DIR})
add_definitions(-DWITH_GPERFTOOLS)
endif()
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
if(PADDLE_FOUND)
add_executable(inference sample_tester.cc)
target_link_libraries(inference
${PADDLE_LIBRARIES}
${PADDLE_THIRD_PARTY_LIBRARIES}
rt dl pthread)
if (mklml_FOUND)
target_link_libraries(inference "-L${THIRD_PARTY_ROOT}/install/mklml/lib -liomp5 -Wl,--as-needed")
endif()
else()
message(FATAL_ERROR "Cannot find PaddlePaddle Fluid under ${PADDLE_ROOT}")
endif()
# Optimized CPU deployment and inference of INT8 image classification models
## Overview
This document describes how to convert, deploy, and run quantized models produced by PaddleSlim on CPU. On an Intel(R) Xeon(R) Gold 6271 machine, the converted INT8 model runs 3-4 times faster than the optimized FP32 model, with only a marginal loss of accuracy.
The workflow is as follows:
- Produce the quantized model: train a quantized model with PaddleSlim. Note that the weight values should fall within the INT8 range, although they are still stored as floats.
- Convert the quantized model on CPU: use DNNL on CPU to convert the quantized model into a true INT8 model.
- Deploy and predict on CPU: deploy the demo application on CPU and run inference.
## 1. Preparation
#### Install and build PaddleSlim
To install PaddleSlim, follow the [official installation guide](https://paddlepaddle.github.io/PaddleSlim/install.html):
```
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```
#### Use in your code
In your own test code, import Paddle and PaddleSlim as follows:
```
import paddle
import paddle.fluid as fluid
import paddleslim as slim
import numpy as np
```
## 2. Produce a quantized model with PaddleSlim
Use PaddleSlim to produce either a quantization-aware-trained model or a post-training (offline) quantized model.
#### 2.1 Quantization-aware training
The quantization-aware training workflow is described in [Quantization-aware training for classification models](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/).
**Note the config parameters used during quantization-aware training:**
- **quantize_op_types:** The CPU currently supports quantizing `depthwise_conv2d`, `mul`, `conv2d`, `matmul`, `transpose2`, `reshape2`, `pool2d`, and `scale`. During training, however, fake quantize/dequantize ops only need to be inserted around the first four op types: the remaining ops (`transpose2`, `reshape2`, `pool2d`, `scale`) do not change their input/output scales and obtain them from the surrounding ops. So `quantize_op_types` only needs `depthwise_conv2d`, `mul`, `conv2d`, and `matmul`; see the sketch after this list.
- **Other parameters:** see the [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware).
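As a minimal sketch (the training program below is a placeholder, not part of this demo), these notes translate into a `quant_aware` call along these lines:
```
import paddle.fluid as fluid
import paddleslim as slim

place = fluid.CPUPlace()
main_prog = fluid.default_main_program()  # placeholder: your training program

config = {
    # Only these four op types need fake quantize/dequantize ops inserted;
    # the scale-preserving ops pick their scales up from surrounding ops.
    'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d', 'matmul'],
    'weight_bits': 8,
    'activation_bits': 8,
}
quant_program = slim.quant.quant_aware(main_prog, place, config, for_test=False)
```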
#### 2.2 Static post-training quantization
Producing a statically post-training-quantized model is described in [Static post-training quantization for classification models](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1), and sketched below.
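A minimal sketch of the call (the paths and the calibration reader are placeholders; see the linked tutorial for the full flow):
```
import numpy as np
import paddle.fluid as fluid
import paddleslim as slim

def calib_reader():  # stand-in calibration reader; use real samples instead
    for _ in range(160):
        yield [np.random.random((3, 224, 224)).astype('float32')]

exe = fluid.Executor(fluid.CPUPlace())
slim.quant.quant_post(
    executor=exe,
    model_dir='./float_model',           # saved FP32 inference model
    quantize_model_path='./quant_post',  # where the quantized model is written
    sample_generator=calib_reader,
    batch_size=16,
    batch_nums=10)
```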
## 3. Convert the quantized model into a DNNL-optimized INT8 model
For CPU deployment, the saved quant model is run through a conversion script that removes the fake quantize/dequantize ops, fuses certain ops, and converts the model fully to INT8. The script must be run from your Paddle directory; upstream it lives at [save_qat_model.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_qat_model.py). Copy the script into the demo directory (`/PATH_TO_PaddleSlim/demo/mkldnn_quant/quant_aware/`) and run:
```
python save_qat_model.py --qat_model_path=/PATH/TO/SAVE/FLOAT32/QAT/MODEL --int8_model_save_path=/PATH/TO/SAVE/INT8/MODEL --ops_to_quantize="conv2d,pool2d"
```
**Parameters:**
- **qat_model_path:** input path, required. The quant model produced by quantization-aware training.
- **int8_model_save_path:** path where the final INT8 model, DNNL-optimized and quantized, is saved. Note: qat_model_path must point to a quant model that still contains the fake quantize/dequantize ops produced by quantization-aware training.
- **ops_to_quantize:** required, must not be left unset. The list of ops to quantize in the final INT8 model. For image classification models, set `--ops_to_quantize="conv2d,pool2d"`. For NLP models such as Ernie, set `--ops_to_quantize="fc,reshape2,transpose2,matmul"`. The list must be chosen manually, because quantizing every quantizable op does not necessarily give the best speed.
Notes:
- The ops currently supported for DNNL quantization are `conv2d`, `depthwise_conv2d`, `mul`, `fc`, `matmul`, `pool2d`, `reshape2`, `transpose2`, and `concat`; only ops from this list may be chosen.
- Quantizing all quantizable ops is not necessarily fastest, which is why the list is user-specified. For example, if an op becomes an isolated INT8 op that cannot fuse with the ops before or after it, quantizing it requires a quantize op before it and a dequantize op after it, which may end up slower than keeping the op in FP32. Since the user's model is unknown, no default is provided; suggested settings for image classification and NLP tasks are given above.
- An effective way to find the best configuration is to check which quantizable ops the model actually uses, then try several different `ops_to_quantize` combinations and measure, as sketched below.
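A minimal sketch of that survey (the model path is a placeholder; this assumes an inference model loadable with `fluid.io.load_inference_model`):
```
import paddle.fluid as fluid

DNNL_QUANTIZABLE = {'conv2d', 'depthwise_conv2d', 'mul', 'fc', 'matmul',
                    'pool2d', 'reshape2', 'transpose2', 'concat'}

exe = fluid.Executor(fluid.CPUPlace())
program, _, _ = fluid.io.load_inference_model('/PATH/TO/MODEL', exe)
used = {op.type for block in program.blocks for op in block.ops}
print(sorted(used & DNNL_QUANTIZABLE))  # candidate values for --ops_to_quantize
```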
## 4. Inference
### 4.1 Data preprocessing and conversion
For the accuracy and performance tests, the data must first be converted into a binary file. Running the script below converts the complete ILSVRC2012 val dataset; with `--local` it converts your own data instead. Run it from your Paddle directory; upstream it lives at [full_ILSVRC2012_val_preprocess.py](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py).
```
python Paddle/paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py --local --data_dir=/PATH/TO/USER/DATASET/ --output_file=/PATH/TO/SAVE/BINARY/FILE
```
Optional arguments:
- With no arguments set, the script downloads the ILSVRC2012_img_val dataset and converts it into a binary file.
- **local:** set it to true to indicate that you will provide your own data.
- **data_dir:** directory of your own data.
- **label_list:** a file listing image path / image class pairs, similar to `val_list.txt`.
- **output_file:** path of the generated binary file.
- **data_dim:** height and width of the preprocessed images. Default: 224.
A user-provided dataset should have the following directory layout:
```
imagenet_user
├── val
│   ├── ILSVRC2012_val_00000001.jpg
│   ├── ILSVRC2012_val_00000002.jpg
│   ├── ...
└── val_list.txt
```
where val_list.txt should look like:
```
val/ILSVRC2012_val_00000001.jpg 0
val/ILSVRC2012_val_00000002.jpg 0
```
Note:
- Why convert the dataset into a binary file? Data preprocessing in Paddle (resize, crop, etc.) is done with the Python Image (PIL) module, and the trained models assume Python-preprocessed images, but we found that the Python test overhead is large and degrades inference performance. To get good performance when benchmarking quantized models, we therefore use a C++ test; C++ supports libraries such as OpenCV, but Paddle discourages depending on external libraries, so the images are preprocessed in Python, written to a binary file, and read back in the C++ test. You can, if you prefer, modify the C++ test to read and preprocess data directly with OpenCV; accuracy will not drop much. A Python test, `sample_tester.py`, is also provided for reference; compared with the C++ test `sample_tester.cc` it shows the larger overhead of the Python path. The binary layout itself is sketched below.
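For reference, the generated binary file holds an int64 image count, then all images as float32 3x224x224 arrays, then all labels as int64, exactly as read by `sample_tester.py` and `sample_tester.cc`. A few lines of Python can probe it (the path is a placeholder):
```
import struct

BIN_FILE = '/PATH/TO/SAVE/BINARY/FILE'  # placeholder path
with open(BIN_FILE, 'rb') as f:
    num = struct.unpack('q', f.read(8))[0]  # int64 image count
    img_bytes = 3 * 224 * 224 * 4           # one float32 CHW image
    f.seek(8 + num * img_bytes)             # labels follow all the images
    first_label = struct.unpack('q', f.read(8))[0]
print('images:', num, 'first label:', first_label)
```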
### 4.2 Deployment and inference
#### Prerequisites
- The speedup is only obtained on servers with AVX512-family CPUs. Run `lscpu` on the command line to check which instruction sets your machine supports.
- On CPU servers supporting `avx512_vnni`, INT8 accuracy is highest and the speedup is largest.
#### Prepare the inference library
You can either build the Paddle inference library from source or download a prebuilt one.
- To build the Paddle inference library from source, see [Build from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12) and use release/2.0 or later.
- Alternatively, download a released [inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html) from the Paddle website. Choose the latest release or develop build of `ubuntu14.04_cpu_avx_mkl`.
Unpack the prepared inference library, rename it to fluid_inference, and place it in the current directory (`/PATH_TO_PaddleSlim/demo/mkldnn_quant/quant_aware/`); alternatively, point cmake at the Paddle inference library by setting PADDLE_ROOT.
#### Build the application
The sample lives in `demo/mkldnn_quant/quant_aware/` under PaddleSlim; the sample `sample_tester.cc` and the `cmake` folder needed for the build are both in this directory.
```
cd /PATH/TO/PaddleSlim
cd demo/mkldnn_quant/quant_aware
mkdir build
cd build
cmake ..  # or: cmake -DPADDLE_ROOT=/PATH/TO/PADDLE/INFERENCE/LIB ..
make -j
```
If you downloaded and unpacked the [inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html) into the current directory, `-DPADDLE_ROOT` can be left unset here, because `PADDLE_ROOT` defaults to `demo/mkldnn_quant/quant_aware/fluid_inference`.
#### Run the test
```
# Bind threads to cores
export KMP_AFFINITY=granularity=fine,compact,1,0
export KMP_BLOCKTIME=1
# Turbo Boost could be set to OFF using the command
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# In the file run.sh, set `MODEL_DIR` to `/PATH/TO/FLOAT32/MODEL` or `/PATH/TO/SAVE/INT8/MODEL`
# In the file run.sh, set `DATA_FILE` to `/PATH/TO/SAVE/BINARY/FILE`
# For 1 thread performance:
./run.sh
# For 20 thread performance:
./run.sh -1 20
```
The following flags need to be configured at run time:
- **infer_model:** directory of the model; note that the model parameters currently must be saved as separate files. Set it to `PATH/TO/SAVE/INT8/MODEL` or `PATH/TO/SAVE/FLOAT32/MODEL`. No default.
- **infer_data:** path to the test data file. It must be a binary file produced by `full_ILSVRC2012_val_preprocess`.
- **batch_size:** inference batch size. Default: 50.
- **iterations:** how many batches to run. Default 0, meaning all batches in infer_data (image count / batch size).
- **num_threads:** number of CPU threads used for inference. Default: a single thread on one core.
- **with_accuracy_layer:** this is a generic image classification test that can run both FP32 and INT8 models; set this flag according to whether the model does or does not contain a label/accuracy layer.
- **optimize_fp32_model:** whether to optimize the FP32 model under test. The sample can test a saved INT8 model, or optimize (fuse, etc.) and then test an FP32 model. Default false, meaning a converted INT8 model is tested and no further optimization is needed.
- **use_profile:** provided by the Paddle inference library; set it to enable profiling. Default: false.
You can simply edit MODEL_DIR and DATA_FILE in `run.sh` under `/PATH_TO_PaddleSlim/demo/mkldnn_quant/quant_aware/` and then execute `./run.sh` to run CPU inference.
### 4.3 Writing your own test
If you write your own test:
1. Testing an INT8 model
To test a converted INT8 model, paddle::NativeConfig is sufficient; in the demo, set `optimize_fp32_model` to false.
2. Testing an FP32 model
To test an FP32 model, use AnalysisConfig to optimize the original FP32 model (fuses, etc.) before testing. Configure AnalysisConfig as follows:
```
static void SetConfig(paddle::AnalysisConfig *cfg) {
  cfg->SetModel(FLAGS_infer_model);  // Required. The model to be tested.
  cfg->DisableGpu();                 // Required. Inference runs on CPU, so GPU must be disabled.
  cfg->EnableMKLDNN();               // Required. Use MKL-DNN kernels, which are faster than the native ones.
  cfg->SwitchIrOptim();              // If an original FP32 model is passed in, setting this to true optimizes and accelerates it (fuses, etc.).
  cfg->SetCpuMathLibraryNumThreads(FLAGS_num_threads);  // Defaults to 1. Enables multi-threaded execution.
  if (FLAGS_use_profile) {
    cfg->EnableProfile();            // Optional. If use_profile is set, per-op timings are reported at the end of the run.
  }
}
```
In the provided sample, setting `optimize_fp32_model` to true and passing the original FP32 model as `infer_model` applies the AnalysisConfig settings above, and the FP32 model is optimized and accelerated by DNNL (including fuses).
If infer_model points to an INT8 model, optimize_fp32_model has no effect, because the INT8 model has already been optimized and quantized.
If infer_model points to a model produced by PaddleSlim, optimize_fp32_model likewise has no effect: a quant model contains fake quantize/dequantize ops, which prevent fusing and optimization.
## 5. Accuracy and performance
For INT8 model accuracy and performance results, see [Accuracy and performance of INT8 models deployed for CPU inference](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/docs/zh_cn/tutorials/image_classification_mkldnn_quant_aware_tutorial.md).
## FAQ
- For deploying and running NLP models on CPU, see the sample [ERNIE QAT INT8 accuracy and performance reproduction](https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie/mkldnn).
- Details of the DNNL optimizations are described in [SLIM QAT for INT8 DNNL](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md).
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
set(PADDLE_FOUND OFF)
if(NOT PADDLE_ROOT)
set(PADDLE_ROOT $ENV{PADDLE_ROOT} CACHE PATH "Paddle Path")
endif()
if(NOT PADDLE_ROOT)
message(FATAL_ERROR "Set PADDLE_ROOT as your root directory installed PaddlePaddle")
endif()
set(THIRD_PARTY_ROOT ${PADDLE_ROOT}/third_party)
if(USE_GPU)
set(CUDA_ROOT $ENV{CUDA_ROOT} CACHE PATH "CUDA root Path")
set(CUDNN_ROOT $ENV{CUDNN_ROOT} CACHE PATH "CUDNN root Path")
endif()
# Supported directory organizations
find_path(PADDLE_INC_DIR NAMES paddle_inference_api.h PATHS ${PADDLE_ROOT}/paddle/include)
if(PADDLE_INC_DIR)
set(LIB_PATH "paddle/lib")
else()
find_path(PADDLE_INC_DIR NAMES paddle/fluid/inference/paddle_inference_api.h PATHS ${PADDLE_ROOT})
if(PADDLE_INC_DIR)
include_directories(${PADDLE_ROOT}/paddle/fluid/inference)
endif()
set(LIB_PATH "paddle/fluid/inference")
endif()
include_directories(${PADDLE_INC_DIR})
find_library(PADDLE_FLUID_SHARED_LIB NAMES "libpaddle_fluid.so" PATHS
${PADDLE_ROOT}/${LIB_PATH})
find_library(PADDLE_FLUID_STATIC_LIB NAMES "libpaddle_fluid.a" PATHS
${PADDLE_ROOT}/${LIB_PATH})
if(USE_SHARED AND PADDLE_INC_DIR AND PADDLE_FLUID_SHARED_LIB)
set(PADDLE_FOUND ON)
add_library(paddle_fluid_shared SHARED IMPORTED)
set_target_properties(paddle_fluid_shared PROPERTIES IMPORTED_LOCATION
${PADDLE_FLUID_SHARED_LIB})
set(PADDLE_LIBRARIES paddle_fluid_shared)
message(STATUS "Found PaddlePaddle Fluid (include: ${PADDLE_INC_DIR}; "
"library: ${PADDLE_FLUID_SHARED_LIB}")
elseif(PADDLE_INC_DIR AND PADDLE_FLUID_STATIC_LIB)
set(PADDLE_FOUND ON)
add_library(paddle_fluid_static STATIC IMPORTED)
set_target_properties(paddle_fluid_static PROPERTIES IMPORTED_LOCATION
${PADDLE_FLUID_STATIC_LIB})
set(PADDLE_LIBRARIES paddle_fluid_static)
message(STATUS "Found PaddlePaddle Fluid (include: ${PADDLE_INC_DIR}; "
"library: ${PADDLE_FLUID_STATIC_LIB}")
else()
set(PADDLE_FOUND OFF)
message(WARNING "Cannot find PaddlePaddle Fluid under ${PADDLE_ROOT}")
return()
endif()
# including directory of third_party libraries
set(PADDLE_THIRD_PARTY_INC_DIRS)
function(third_party_include TARGET_NAME HEADER_NAME TARGET_DIRNAME)
find_path(PADDLE_${TARGET_NAME}_INC_DIR NAMES ${HEADER_NAME} PATHS
${TARGET_DIRNAME}
NO_DEFAULT_PATH)
if(PADDLE_${TARGET_NAME}_INC_DIR)
message(STATUS "Found PaddlePaddle third_party including directory: " ${PADDLE_${TARGET_NAME}_INC_DIR})
set(PADDLE_THIRD_PARTY_INC_DIRS ${PADDLE_THIRD_PARTY_INC_DIRS} ${PADDLE_${TARGET_NAME}_INC_DIR} PARENT_SCOPE)
endif()
endfunction()
third_party_include(glog glog/logging.h ${THIRD_PARTY_ROOT}/install/glog/include)
third_party_include(protobuf google/protobuf/message.h ${THIRD_PARTY_ROOT}/install/protobuf/include)
third_party_include(gflags gflags/gflags.h ${THIRD_PARTY_ROOT}/install/gflags/include)
third_party_include(eigen unsupported/Eigen/CXX11/Tensor ${THIRD_PARTY_ROOT}/eigen3)
third_party_include(boost boost/config.hpp ${THIRD_PARTY_ROOT}/boost)
if(USE_GPU)
third_party_include(cuda cuda.h ${CUDA_ROOT}/include)
third_party_include(cudnn cudnn.h ${CUDNN_ROOT}/include)
endif()
message(STATUS "PaddlePaddle need to include these third party directories: ${PADDLE_THIRD_PARTY_INC_DIRS}")
include_directories(${PADDLE_THIRD_PARTY_INC_DIRS})
set(PADDLE_THIRD_PARTY_LIBRARIES)
function(third_party_library TARGET_NAME TARGET_DIRNAME)
set(library_names ${ARGN})
set(local_third_party_libraries)
foreach(lib ${library_names})
string(REGEX REPLACE "^lib" "" lib_noprefix ${lib})
if(${lib} MATCHES "${CMAKE_STATIC_LIBRARY_SUFFIX}$")
set(libtype STATIC)
string(REGEX REPLACE "${CMAKE_STATIC_LIBRARY_SUFFIX}$" "" libname ${lib_noprefix})
elseif(${lib} MATCHES "${CMAKE_SHARED_LIBRARY_SUFFIX}(\\.[0-9]+)?$")
set(libtype SHARED)
string(REGEX REPLACE "${CMAKE_SHARED_LIBRARY_SUFFIX}(\\.[0-9]+)?$" "" libname ${lib_noprefix})
else()
message(FATAL_ERROR "Unknown library type: ${lib}")
endif()
#message(STATUS "libname: ${libname}")
find_library(${libname}_LIBRARY NAMES "${lib}" PATHS
${TARGET_DIRNAME}
NO_DEFAULT_PATH)
if(${libname}_LIBRARY)
set(${TARGET_NAME}_FOUND ON PARENT_SCOPE)
add_library(${libname} ${libtype} IMPORTED)
set_target_properties(${libname} PROPERTIES IMPORTED_LOCATION ${${libname}_LIBRARY})
set(local_third_party_libraries ${local_third_party_libraries} ${libname})
message(STATUS "Found PaddlePaddle third_party library: " ${${libname}_LIBRARY})
else()
set(${TARGET_NAME}_FOUND OFF PARENT_SCOPE)
message(WARNING "Cannot find ${lib} under ${THIRD_PARTY_ROOT}")
endif()
endforeach()
set(PADDLE_THIRD_PARTY_LIBRARIES ${PADDLE_THIRD_PARTY_LIBRARIES} ${local_third_party_libraries} PARENT_SCOPE)
endfunction()
third_party_library(mklml ${THIRD_PARTY_ROOT}/install/mklml/lib libiomp5.so libmklml_intel.so)
third_party_library(mkldnn ${THIRD_PARTY_ROOT}/install/mkldnn/lib libmkldnn.so)
if(NOT mkldnn_FOUND)
third_party_library(mkldnn ${THIRD_PARTY_ROOT}/install/mkldnn/lib libmkldnn.so.0)
endif()
if(NOT USE_SHARED)
third_party_library(glog ${THIRD_PARTY_ROOT}/install/glog/lib libglog.a)
third_party_library(protobuf ${THIRD_PARTY_ROOT}/install/protobuf/lib libprotobuf.a)
third_party_library(gflags ${THIRD_PARTY_ROOT}/install/gflags/lib libgflags.a)
if(NOT mklml_FOUND)
third_party_library(openblas ${THIRD_PARTY_ROOT}/install/openblas/lib libopenblas.a)
endif()
third_party_library(zlib ${THIRD_PARTY_ROOT}/install/zlib/lib libz.a)
third_party_library(snappystream ${THIRD_PARTY_ROOT}/install/snappystream/lib libsnappystream.a)
third_party_library(snappy ${THIRD_PARTY_ROOT}/install/snappy/lib libsnappy.a)
third_party_library(xxhash ${THIRD_PARTY_ROOT}/install/xxhash/lib libxxhash.a)
if(USE_GPU)
third_party_library(cudart ${CUDA_ROOT}/lib64 libcudart.so)
endif()
endif()
\ No newline at end of file
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
# Tries to find Gperftools.
#
# Usage of this module as follows:
#
# find_package(Gperftools)
#
# Variables used by this module, they can change the default behaviour and need
# to be set before calling find_package:
#
# Gperftools_ROOT_DIR Set this variable to the root installation of
# Gperftools if the module has problems finding
# the proper installation path.
#
# Variables defined by this module:
#
# GPERFTOOLS_FOUND System has Gperftools libs/headers
# GPERFTOOLS_LIBRARIES The Gperftools libraries (tcmalloc & profiler)
# GPERFTOOLS_INCLUDE_DIR The location of Gperftools headers
find_library(GPERFTOOLS_TCMALLOC
NAMES tcmalloc
HINTS ${Gperftools_ROOT_DIR}/lib)
find_library(GPERFTOOLS_PROFILER
NAMES profiler
HINTS ${Gperftools_ROOT_DIR}/lib)
find_library(GPERFTOOLS_TCMALLOC_AND_PROFILER
NAMES tcmalloc_and_profiler
HINTS ${Gperftools_ROOT_DIR}/lib)
find_path(GPERFTOOLS_INCLUDE_DIR
NAMES gperftools/heap-profiler.h
HINTS ${Gperftools_ROOT_DIR}/include)
set(GPERFTOOLS_LIBRARIES ${GPERFTOOLS_TCMALLOC_AND_PROFILER})
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(
Gperftools
DEFAULT_MSG
GPERFTOOLS_LIBRARIES
GPERFTOOLS_INCLUDE_DIR)
mark_as_advanced(
Gperftools_ROOT_DIR
GPERFTOOLS_TCMALLOC
GPERFTOOLS_PROFILER
GPERFTOOLS_TCMALLOC_AND_PROFILER
GPERFTOOLS_LIBRARIES
GPERFTOOLS_INCLUDE_DIR)
# create IMPORTED targets
if (Gperftools_FOUND AND NOT TARGET gperftools::tcmalloc)
add_library(gperftools::tcmalloc UNKNOWN IMPORTED)
set_target_properties(gperftools::tcmalloc PROPERTIES
IMPORTED_LOCATION ${GPERFTOOLS_TCMALLOC}
INTERFACE_INCLUDE_DIRECTORIES "${GPERFTOOLS_INCLUDE_DIR}")
add_library(gperftools::profiler UNKNOWN IMPORTED)
set_target_properties(gperftools::profiler PROPERTIES
IMPORTED_LOCATION ${GPERFTOOLS_PROFILER}
INTERFACE_INCLUDE_DIRECTORIES "${GPERFTOOLS_INCLUDE_DIR}")
endif()
#!/bin/bash
MODEL_DIR=$HOME/repo/Paddle/resnet50_quant_int8
DATA_FILE=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin
num_threads=10
with_accuracy_layer=false
use_profile=true
ITERATIONS=0
./build/inference --logtostderr=1 \
--infer_model=${MODEL_DIR} \
--infer_data=${DATA_FILE} \
--batch_size=1 \
--num_threads=${num_threads} \
--iterations=${ITERATIONS} \
--with_accuracy_layer=${with_accuracy_layer} \
--use_profile=${use_profile} \
--optimize_fp32_model=false
/* Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <paddle_inference_api.h>
#include <algorithm>
#include <chrono>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>
#ifdef WITH_GPERFTOOLS
#include <gperftools/profiler.h>
#include <paddle/fluid/platform/profiler.h>
#endif
DEFINE_string(infer_model, "", "path to the model");
DEFINE_string(infer_data, "", "path to the input data");
DEFINE_int32(batch_size, 50, "inference batch size");
DEFINE_int32(iterations,
0,
"number of batches to process. 0 means testing whole dataset");
DEFINE_int32(num_threads, 1, "num of threads to run in parallel");
DEFINE_bool(with_accuracy_layer,
true,
"Set with_accuracy_layer to true if provided model has accuracy layer and requires label input");
DEFINE_bool(use_profile, false, "Set use_profile to true to get profile information");
DEFINE_bool(optimize_fp32_model, false, "If optimize_fp32_model is set to true, fp32 model will be optimized");
struct Timer {
std::chrono::high_resolution_clock::time_point start;
std::chrono::high_resolution_clock::time_point startu;
void tic() { start = std::chrono::high_resolution_clock::now(); }
double toc() {
startu = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> time_span =
std::chrono::duration_cast<std::chrono::duration<double>>(startu -
start);
double used_time_ms = static_cast<double>(time_span.count()) * 1000.0;
return used_time_ms;
}
};
template <typename T>
constexpr paddle::PaddleDType GetPaddleDType();
template <>
constexpr paddle::PaddleDType GetPaddleDType<int64_t>() {
return paddle::PaddleDType::INT64;
}
template <>
constexpr paddle::PaddleDType GetPaddleDType<float>() {
return paddle::PaddleDType::FLOAT32;
}
template <typename T>
class TensorReader {
public:
TensorReader(std::ifstream &file,
size_t beginning_offset,
std::vector<int> shape,
std::string name)
: file_(file), position_(beginning_offset), shape_(shape), name_(name) {
numel_ = std::accumulate(
shape_.begin(), shape_.end(), size_t{1}, std::multiplies<size_t>());
}
paddle::PaddleTensor NextBatch() {
paddle::PaddleTensor tensor;
tensor.name = name_;
tensor.shape = shape_;
tensor.dtype = GetPaddleDType<T>();
tensor.data.Resize(numel_ * sizeof(T));
file_.seekg(position_);
file_.read(static_cast<char *>(tensor.data.data()), numel_ * sizeof(T));
position_ = file_.tellg();
if (file_.eof()) LOG(ERROR) << name_ << ": reached end of stream";
if (file_.bad()) LOG(ERROR) << name_ << "ERROR: badbit is true";
if (file_.fail())
throw std::runtime_error(name_ + ": failed reading file.");
return tensor;
}
protected:
std::ifstream &file_;
size_t position_;
std::vector<int> shape_;
std::string name_;
size_t numel_;
};
void SetInput(std::vector<std::vector<paddle::PaddleTensor>> *inputs,
std::vector<paddle::PaddleTensor> *labels_gt,
bool with_accuracy_layer = FLAGS_with_accuracy_layer,
int32_t batch_size = FLAGS_batch_size) {
std::ifstream file(FLAGS_infer_data, std::ios::binary);
if (!file) {
throw std::runtime_error("Couldn't open file: " + FLAGS_infer_data);
}
int64_t total_images{0};
file.seekg(0, std::ios::beg);
file.read(reinterpret_cast<char *>(&total_images), sizeof(total_images));
LOG(INFO) << "Total images in file: " << total_images;
std::vector<int> image_batch_shape{batch_size, 3, 224, 224};
std::vector<int> label_batch_shape{batch_size, 1};
auto images_offset_in_file = static_cast<size_t>(file.tellg());
TensorReader<float> image_reader(
file, images_offset_in_file, image_batch_shape, "image");
auto iterations_max = total_images / batch_size;
auto iterations = iterations_max;
if (FLAGS_iterations > 0 && FLAGS_iterations < iterations_max) {
iterations = FLAGS_iterations;
}
auto labels_offset_in_file =
images_offset_in_file + sizeof(float) * total_images * 3 * 224 * 224;
TensorReader<int64_t> label_reader(
file, labels_offset_in_file, label_batch_shape, "label");
for (auto i = 0; i < iterations; i++) {
auto images = image_reader.NextBatch();
std::vector<paddle::PaddleTensor> tmp_vec;
tmp_vec.push_back(std::move(images));
auto labels = label_reader.NextBatch();
if (with_accuracy_layer) {
tmp_vec.push_back(std::move(labels));
} else {
labels_gt->push_back(std::move(labels));
}
inputs->push_back(std::move(tmp_vec));
}
}
static void PrintTime(int batch_size,
int num_threads,
double batch_latency,
int epoch = 1) {
double sample_latency = batch_latency / batch_size;
LOG(INFO) <<"Model: "<<FLAGS_infer_model;
LOG(INFO) << "====== num of threads: " << num_threads << " ======";
LOG(INFO) << "====== batch size: " << batch_size << ", iterations: " << epoch;
LOG(INFO) << "====== batch latency: " << batch_latency
<< "ms, number of samples: " << batch_size * epoch;
LOG(INFO) << ", sample latency: " << sample_latency
<< "ms, fps: " << 1000.f / sample_latency << " ======";
}
void PredictionRun(paddle::PaddlePredictor *predictor,
const std::vector<std::vector<paddle::PaddleTensor>> &inputs,
std::vector<std::vector<paddle::PaddleTensor>> *outputs,
int num_threads,
float *sample_latency = nullptr) {
int iterations = inputs.size(); // process the whole dataset ...
if (FLAGS_iterations > 0 &&
FLAGS_iterations < static_cast<int64_t>(inputs.size()))
iterations =
FLAGS_iterations; // ... unless the number of iterations is set
outputs->resize(iterations);
Timer run_timer;
double elapsed_time = 0;
#ifdef WITH_GPERFTOOLS
ResetProfiler();
ProfilerStart("paddle_inference.prof");
#endif
int predicted_num = 0;
for (int i = 0; i < iterations; i++) {
run_timer.tic();
predictor->Run(inputs[i], &(*outputs)[i], FLAGS_batch_size);
elapsed_time += run_timer.toc();
predicted_num += FLAGS_batch_size;
if (predicted_num % 100 == 0) {
LOG(INFO) << "Infer " << predicted_num << " samples";
}
}
#ifdef WITH_GPERFTOOLS
ProfilerStop();
#endif
auto batch_latency = elapsed_time / iterations;
PrintTime(FLAGS_batch_size, num_threads, batch_latency, iterations);
if (sample_latency != nullptr)
*sample_latency = batch_latency / FLAGS_batch_size;
}
std::pair<float, float> CalculateAccuracy(
const std::vector<std::vector<paddle::PaddleTensor>> &outputs,
const std::vector<paddle::PaddleTensor> &labels_gt,
bool with_accuracy = FLAGS_with_accuracy_layer) {
LOG_IF(ERROR, !with_accuracy && labels_gt.size() == 0)
<< "if with_accuracy set to false, labels_gt must be not empty";
std::vector<float> acc1_ss;
std::vector<float> acc5_ss;
if (!with_accuracy) { // model with_accuracy_layer = false
float *result_array; // for one batch 50*1000
int64_t *batch_labels; // 50*1
LOG_IF(ERROR, outputs.size() != labels_gt.size())
<< "outputs first dimension must be equal to labels_gt first dimension";
for (auto i = 0; i < outputs.size();
++i) { // same as labels first dimension
result_array = static_cast<float *>(outputs[i][0].data.data());
batch_labels = static_cast<int64_t *>(labels_gt[i].data.data());
int correct_1 = 0, correct_5 = 0, total = FLAGS_batch_size;
for (auto j = 0; j < FLAGS_batch_size; j++) { // batch_size
std::vector<float> v(result_array + j * 1000,
result_array + (j + 1) * 1000);
std::vector<std::pair<float, int>> vx;
for (int k = 0; k < 1000; k++) {
vx.push_back(std::make_pair(v[k], k));
}
std::partial_sort(vx.begin(),
vx.begin() + 5,
vx.end(),
[](std::pair<float, int> a, std::pair<float, int> b) {
return a.first > b.first;
});
if (static_cast<int>(batch_labels[j]) == vx[0].second) correct_1 += 1;
if (std::find_if(vx.begin(),
vx.begin() + 5,
[batch_labels, j](std::pair<float, int> a) {
return static_cast<int>(batch_labels[j]) == a.second;
}) != vx.begin() + 5)
correct_5 += 1;
}
acc1_ss.push_back(static_cast<float>(correct_1) /
static_cast<float>(total));
acc5_ss.push_back(static_cast<float>(correct_5) /
static_cast<float>(total));
}
} else { // model with_accuracy_layer = true
for (auto i = 0; i < outputs.size(); ++i) {
LOG_IF(ERROR, outputs[i].size() < 3UL) << "To get top1 and top5 "
"accuracy, output[i] size must "
"be bigger than or equal to 3";
acc1_ss.push_back(
*static_cast<float *>(outputs[i][1].data.data())); // 1 is top1 acc
acc5_ss.push_back(*static_cast<float *>(
outputs[i][2].data.data())); // 2 is top5 acc or mAP
}
}
auto acc1_ss_avg =
std::accumulate(acc1_ss.begin(), acc1_ss.end(), 0.0) / acc1_ss.size();
auto acc5_ss_avg =
std::accumulate(acc5_ss.begin(), acc5_ss.end(), 0.0) / acc5_ss.size();
return std::make_pair(acc1_ss_avg, acc5_ss_avg);
}
static void SetIrOptimConfig(paddle::AnalysisConfig *cfg) {
cfg->DisableGpu();
cfg->SwitchIrOptim();
cfg->EnableMKLDNN();
if(FLAGS_use_profile){
cfg->EnableProfile();
}
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor(
const paddle::PaddlePredictor::Config *config, bool use_analysis = true) {
const auto *analysis_config =
reinterpret_cast<const paddle::AnalysisConfig *>(config);
if (use_analysis) {
return paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(
*analysis_config);
}
auto native_config = analysis_config->ToNativeConfig();
return paddle::CreatePaddlePredictor<paddle::NativeConfig>(native_config);
}
int main(int argc, char *argv[]) {
// InitFLAGS(argc, argv);
google::InitGoogleLogging(*argv);
gflags::ParseCommandLineFlags(&argc, &argv, true);
paddle::AnalysisConfig cfg;
cfg.SetModel(FLAGS_infer_model);
cfg.SetCpuMathLibraryNumThreads(FLAGS_num_threads);
if (FLAGS_optimize_fp32_model){
SetIrOptimConfig(&cfg);
}
std::vector<std::vector<paddle::PaddleTensor>> input_slots_all;
std::vector<std::vector<paddle::PaddleTensor>> outputs;
std::vector<paddle::PaddleTensor> labels_gt; // optional
SetInput(&input_slots_all, &labels_gt); // iterations*batch_size
auto predictor = CreatePredictor(reinterpret_cast<paddle::PaddlePredictor::Config *>(&cfg), FLAGS_optimize_fp32_model);
PredictionRun(predictor.get(), input_slots_all, &outputs, FLAGS_num_threads);
auto acc_pair = CalculateAccuracy(outputs, labels_gt);
LOG(INFO) <<"Top1 accuracy: " << std::fixed << std::setw(6)
<<std::setprecision(4) << acc_pair.first;
LOG(INFO) <<"Top5 accuracy: " << std::fixed << std::setw(6)
<<std::setprecision(4) << acc_pair.second;
}
# copyright (c) 2020 paddlepaddle authors. all rights reserved.
#
# licensed under the apache license, version 2.0 (the "license");
# you may not use this file except in compliance with the license.
# you may obtain a copy of the license at
#
# http://www.apache.org/licenses/license-2.0
#
# unless required by applicable law or agreed to in writing, software
# distributed under the license is distributed on an "as is" basis,
# without warranties or conditions of any kind, either express or implied.
# see the license for the specific language governing permissions and
# limitations under the license.
import unittest
import os
import sys
import argparse
import logging
import struct
import six
import numpy as np
import time
import paddle
import paddle.fluid as fluid
from paddle.fluid.framework import IrGraph
from paddle.fluid import core
logging.basicConfig(format='%(asctime)s-%(levelname)s: %(message)s')
_logger = logging.getLogger(__name__)
_logger.setLevel(logging.INFO)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--batch_size', type=int, default=1, help='Batch size.')
parser.add_argument(
'--skip_batch_num',
type=int,
default=0,
help='Number of the first minibatches to skip in performance statistics.'
)
parser.add_argument(
'--infer_model',
type=str,
default='',
help='A path to an Inference model.')
parser.add_argument(
'--infer_data', type=str, default='', help='Data file.')
parser.add_argument(
'--batch_num',
type=int,
default=0,
help='Number of batches to process. 0 or less means whole dataset. Default: 0.'
)
parser.add_argument(
'--with_accuracy_layer',
type=bool,
default=False,
help='The model is with accuracy or without accuracy layer')
test_args, args = parser.parse_known_args(namespace=unittest)
return test_args, sys.argv[:1] + args
class SampleTester(unittest.TestCase):
def _reader_creator(self, data_file='data.bin'):
def reader():
with open(data_file, 'rb') as fp:
num = fp.read(8)
num = struct.unpack('q', num)[0]
imgs_offset = 8
img_ch = 3
img_w = 224
img_h = 224
img_pixel_size = 4
img_size = img_ch * img_h * img_w * img_pixel_size
label_size = 8
labels_offset = imgs_offset + num * img_size
step = 0
while step < num:
fp.seek(imgs_offset + img_size * step)
img = fp.read(img_size)
img = struct.unpack_from(
'{}f'.format(img_ch * img_w * img_h), img)
img = np.array(img)
img.shape = (img_ch, img_w, img_h)
fp.seek(labels_offset + label_size * step)
label = fp.read(label_size)
label = struct.unpack('q', label)[0]
yield img, int(label)
step += 1
return reader
def _get_batch_accuracy(self, batch_output=None, labels=None):
total = 0
correct = 0
correct_5 = 0
for n, result in enumerate(batch_output):
index = result.argsort()
top_1_index = index[-1]
top_5_index = index[-5:]
total += 1
if top_1_index == labels[n]:
correct += 1
if labels[n] in top_5_index:
correct_5 += 1
acc1 = float(correct) / float(total)
acc5 = float(correct_5) / float(total)
return acc1, acc5
def _prepare_for_fp32_mkldnn(self, graph):
ops = graph.all_op_nodes()
for op_node in ops:
name = op_node.name()
if name in ['depthwise_conv2d']:
input_var_node = graph._find_node_by_name(
op_node.inputs, op_node.input("Input")[0])
weight_var_node = graph._find_node_by_name(
op_node.inputs, op_node.input("Filter")[0])
output_var_node = graph._find_node_by_name(
graph.all_var_nodes(), op_node.output("Output")[0])
attrs = {
name: op_node.op().attr(name)
for name in op_node.op().attr_names()
}
conv_op_node = graph.create_op_node(
op_type='conv2d',
attrs=attrs,
inputs={
'Input': input_var_node,
'Filter': weight_var_node
},
outputs={'Output': output_var_node})
graph.link_to(input_var_node, conv_op_node)
graph.link_to(weight_var_node, conv_op_node)
graph.link_to(conv_op_node, output_var_node)
graph.safe_remove_nodes(op_node)
return graph
def _predict(self,
test_reader=None,
model_path=None,
with_accuracy_layer=False,
batch_size=1,
batch_num=1,
skip_batch_num=0):
place = fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.executor.global_scope()
with fluid.scope_guard(inference_scope):
if os.path.exists(os.path.join(model_path, '__model__')):
[inference_program, feed_target_names, fetch_targets
] = fluid.io.load_inference_model(model_path, exe)
else:
[inference_program, feed_target_names,
fetch_targets] = fluid.io.load_inference_model(
model_path, exe, 'model', 'params')
graph = IrGraph(core.Graph(inference_program.desc), for_test=True)
graph = self._prepare_for_fp32_mkldnn(graph)
inference_program = graph.to_program()
dshape = [3, 224, 224]
outputs = []
infer_accs1 = []
infer_accs5 = []
batch_acc1 = 0.0
batch_acc5 = 0.0
fpses = []
batch_times = []
batch_time = 0.0
total_samples = 0
iters = 0
infer_start_time = time.time()
for data in test_reader():
if batch_num > 0 and iters >= batch_num:
break
if iters == skip_batch_num:
total_samples = 0
infer_start_time = time.time()
if six.PY2:
images = map(lambda x: x[0].reshape(dshape), data)
if six.PY3:
images = list(map(lambda x: x[0].reshape(dshape), data))
images = np.array(images).astype('float32')
labels = np.array([x[1] for x in data]).astype('int64')
if not with_accuracy_layer:
# models that do not have accuracy measuring layers
start = time.time()
out = exe.run(inference_program,
feed={feed_target_names[0]: images},
fetch_list=fetch_targets)
batch_time = (time.time() - start) * 1000  # in milliseconds
outputs.append(out[0])
# Calculate accuracy result
batch_acc1, batch_acc5 = self._get_batch_accuracy(out[0],
labels)
else:
# models have accuracy measuring layers
labels = labels.reshape([-1, 1])
start = time.time()
out = exe.run(inference_program,
feed={
feed_target_names[0]: images,
feed_target_names[1]: labels
},
fetch_list=fetch_targets)
batch_time = (time.time() - start) * 1000  # in milliseconds
batch_acc1, batch_acc5 = out[1][0], out[2][0]
outputs.append(batch_acc1)
infer_accs1.append(batch_acc1)
infer_accs5.append(batch_acc5)
samples = len(data)
total_samples += samples
batch_times.append(batch_time)
fps = samples / batch_time * 1000
fpses.append(fps)
iters += 1
appx = ' (warm-up)' if iters <= skip_batch_num else ''
_logger.info('batch {0}{5}, acc1: {1:.4f}, acc5: {2:.4f}, '
'latency: {3:.4f} ms, fps: {4:.2f}'.format(
iters, batch_acc1, batch_acc5, batch_time /
batch_size, fps, appx))
# Postprocess benchmark data
batch_latencies = batch_times[skip_batch_num:]
batch_latency_avg = np.average(batch_latencies)
latency_avg = batch_latency_avg / batch_size
fpses = fpses[skip_batch_num:]
fps_avg = np.average(fpses)
infer_total_time = time.time() - infer_start_time
acc1_avg = np.mean(infer_accs1)
acc5_avg = np.mean(infer_accs5)
_logger.info('Total inference run time: {:.2f} s'.format(
infer_total_time))
return outputs, acc1_avg, acc5_avg, fps_avg, latency_avg
def test_graph_transformation(self):
if not fluid.core.is_compiled_with_mkldnn():
return
infer_model_path = test_case_args.infer_model
assert infer_model_path, 'The model path cannot be empty. Please, use the --infer_model option.'
data_path = test_case_args.infer_data
assert data_path, 'The dataset path cannot be empty. Please, use the --infer_data option.'
batch_size = test_case_args.batch_size
batch_num = test_case_args.batch_num
skip_batch_num = test_case_args.skip_batch_num
with_accuracy_layer = test_case_args.with_accuracy_layer
_logger.info('Inference model: {0}'.format(infer_model_path))
_logger.info('Dataset: {0}'.format(data_path))
_logger.info('Batch size: {0}'.format(batch_size))
_logger.info('Batch number: {0}'.format(batch_num))
_logger.info('--- Inference prediction start ---')
val_reader = paddle.batch(
self._reader_creator(data_path), batch_size=batch_size)
fp32_output, fp32_acc1, fp32_acc5, fp32_fps, fp32_lat = self._predict(
val_reader, infer_model_path, with_accuracy_layer, batch_size,
batch_num, skip_batch_num)
_logger.info(
'Inference: avg top1 accuracy: {0:.4f}, avg top5 accuracy: {1:.4f}'.
format(fp32_acc1, fp32_acc5))
_logger.info('Inference: avg fps: {0:.2f}, avg latency: {1:.4f} ms'.
format(fp32_fps, fp32_lat))
if __name__ == '__main__':
global test_case_args
test_case_args, remaining_args = parse_args()
unittest.main(argv=remaining_args)
from __future__ import absolute_import
from .mobilenet import MobileNet
from .resnet import ResNet34, ResNet50
from .resnet_vd import ResNet50_vd, ResNet101_vd
from .mobilenet_v2 import MobileNetV2_x0_25, MobileNetV2
from .pvanet import PVANet
from .slimfacenet import SlimFaceNet_A_x0_60, SlimFaceNet_B_x0_75, SlimFaceNet_C_x0_75
from .mobilenet_v3 import *
__all__ = [
"model_list", "MobileNet", "ResNet34", "ResNet50", "MobileNetV2", "PVANet",
"ResNet50_vd"
"ResNet50_vd", "ResNet101_vd", "MobileNetV2_x0_25"
]
model_list = [
'MobileNet', 'ResNet34', 'ResNet50', 'MobileNetV2', 'PVANet',
'ResNet50_vd', "ResNet101_vd", "MobileNetV2_x0_25"
]
__all__ += mobilenet_v3.__all__
model_list += mobilenet_v3.__all__
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
import math
__all__ = [
'MobileNetV3', 'MobileNetV3_small_x0_25', 'MobileNetV3_small_x0_5',
'MobileNetV3_small_x0_75', 'MobileNetV3_small_x1_0',
'MobileNetV3_small_x1_25', 'MobileNetV3_large_x0_25',
'MobileNetV3_large_x0_5', 'MobileNetV3_large_x0_75',
'MobileNetV3_large_x1_0', 'MobileNetV3_large_x1_25',
'MobileNetV3_large_x2_0'
]
class MobileNetV3():
def __init__(self, scale=1.0, model_name='small'):
self.scale = scale
self.inplanes = 16
if model_name == "large":
self.cfg = [
# k, exp, c, se, nl, s,
[3, 16, 16, False, 'relu', 1],
[3, 64, 24, False, 'relu', 2],
[3, 72, 24, False, 'relu', 1],
[5, 72, 40, True, 'relu', 2],
[5, 120, 40, True, 'relu', 1],
[5, 120, 40, True, 'relu', 1],
[3, 240, 80, False, 'hard_swish', 2],
[3, 200, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 480, 112, True, 'hard_swish', 1],
[3, 672, 112, True, 'hard_swish', 1],
[5, 672, 160, True, 'hard_swish', 2],
[5, 960, 160, True, 'hard_swish', 1],
[5, 960, 160, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 960
self.cls_ch_expand = 1280
elif model_name == "small":
self.cfg = [
# k, exp, c, se, nl, s,
[3, 16, 16, True, 'relu', 2],
[3, 72, 24, False, 'relu', 2],
[3, 88, 24, False, 'relu', 1],
[5, 96, 40, True, 'hard_swish', 2],
[5, 240, 40, True, 'hard_swish', 1],
[5, 240, 40, True, 'hard_swish', 1],
[5, 120, 48, True, 'hard_swish', 1],
[5, 144, 48, True, 'hard_swish', 1],
[5, 288, 96, True, 'hard_swish', 2],
[5, 576, 96, True, 'hard_swish', 1],
[5, 576, 96, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 576
self.cls_ch_expand = 1280
else:
raise NotImplementedError
def net(self, input, class_dim=1000):
scale = self.scale
inplanes = self.inplanes
cfg = self.cfg
cls_ch_squeeze = self.cls_ch_squeeze
cls_ch_expand = self.cls_ch_expand
#conv1
conv = self.conv_bn_layer(
input,
filter_size=3,
#num_filters=int(scale*inplanes),
num_filters=inplanes if scale <= 1.0 else int(inplanes * scale),
stride=2,
padding=1,
num_groups=1,
if_act=True,
act='hard_swish',
name='conv1')
print(conv.shape)
i = 0
for layer_cfg in cfg:
conv = self.residual_unit(
input=conv,
num_in_filter=inplanes,
num_mid_filter=int(scale * layer_cfg[1]),
num_out_filter=int(scale * layer_cfg[2]),
act=layer_cfg[4],
stride=layer_cfg[5],
filter_size=layer_cfg[0],
use_se=layer_cfg[3],
name='conv' + str(i + 2))
inplanes = int(scale * layer_cfg[2])
i += 1
conv = self.conv_bn_layer(
input=conv,
filter_size=1,
num_filters=int(scale * cls_ch_squeeze),
stride=1,
padding=0,
num_groups=1,
if_act=True,
act='hard_swish',
name='conv_last')
conv = fluid.layers.pool2d(
input=conv, pool_type='avg', global_pooling=True, use_cudnn=False)
conv = fluid.layers.conv2d(
input=conv,
num_filters=cls_ch_expand,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(name='last_1x1_conv_weights'),
bias_attr=False)
#conv = fluid.layers.hard_swish(conv)
conv = self.hard_swish(conv)
out = fluid.layers.fc(input=conv,
size=class_dim,
act='softmax',
param_attr=ParamAttr(name='fc_weights'),
bias_attr=ParamAttr(name='fc_offset'))
return out
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
num_groups=1,
if_act=True,
act=None,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(
name=bn_name + "_scale",
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0)),
bias_attr=ParamAttr(
name=bn_name + "_offset",
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0)),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
if act == 'relu':
bn = fluid.layers.relu(bn)
elif act == 'hard_swish':
#bn = fluid.layers.hard_swish(bn)
bn = self.hard_swish(bn)
return bn
def hard_swish(self, x):
return x * fluid.layers.relu6(x + 3) / 6.
def se_block(self, input, num_out_filter, ratio=4, name=None):
num_mid_filter = int(num_out_filter // ratio)
pool = fluid.layers.pool2d(
input=input, pool_type='avg', global_pooling=True, use_cudnn=False)
conv1 = fluid.layers.conv2d(
input=pool,
filter_size=1,
num_filters=num_mid_filter,
act='relu',
param_attr=ParamAttr(name=name + '_1_weights'),
bias_attr=ParamAttr(name=name + '_1_offset'))
conv2 = fluid.layers.conv2d(
input=conv1,
filter_size=1,
num_filters=num_out_filter,
act='hard_sigmoid',
param_attr=ParamAttr(name=name + '_2_weights'),
bias_attr=ParamAttr(name=name + '_2_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=conv2, axis=0)
return scale
def residual_unit(self,
input,
num_in_filter,
num_mid_filter,
num_out_filter,
stride,
filter_size,
act=None,
use_se=False,
name=None):
input_data = input
conv0 = self.conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_mid_filter,
stride=1,
padding=0,
if_act=True,
act=act,
name=name + '_expand')
conv1 = self.conv_bn_layer(
input=conv0,
filter_size=filter_size,
num_filters=num_mid_filter,
stride=stride,
padding=int((filter_size - 1) // 2),
if_act=True,
act=act,
num_groups=num_mid_filter,
use_cudnn=False,
name=name + '_depthwise')
if use_se:
with fluid.name_scope('se_block_skip'):
conv1 = self.se_block(
input=conv1,
num_out_filter=num_mid_filter,
name=name + '_se')
conv2 = self.conv_bn_layer(
input=conv1,
filter_size=1,
num_filters=num_out_filter,
stride=1,
padding=0,
if_act=False,
name=name + '_linear')
if num_in_filter != num_out_filter or stride != 1:
return conv2
else:
return fluid.layers.elementwise_add(
x=input_data, y=conv2, act=None)
def MobileNetV3_small_x0_25():
model = MobileNetV3(model_name='small', scale=0.25)
return model
def MobileNetV3_small_x0_5():
model = MobileNetV3(model_name='small', scale=0.5)
return model
def MobileNetV3_small_x0_75():
model = MobileNetV3(model_name='small', scale=0.75)
return model
def MobileNetV3_small_x1_0():
model = MobileNetV3(model_name='small', scale=1.0)
return model
def MobileNetV3_small_x1_25():
model = MobileNetV3(model_name='small', scale=1.25)
return model
def MobileNetV3_large_x0_25():
model = MobileNetV3(model_name='large', scale=0.25)
return model
def MobileNetV3_large_x0_5():
model = MobileNetV3(model_name='large', scale=0.5)
return model
def MobileNetV3_large_x0_75():
model = MobileNetV3(model_name='large', scale=0.75)
return model
def MobileNetV3_large_x1_0():
model = MobileNetV3(model_name='large', scale=1.0)
return model
def MobileNetV3_large_x1_25():
model = MobileNetV3(model_name='large', scale=1.25)
return model
def MobileNetV3_large_x2_0():
model = MobileNetV3(model_name='large', scale=2.0)
return model
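# Usage sketch (an illustrative addition, not part of the original file):
# builds a MobileNetV3-Large 1.0x classifier head on a 224x224 input with
# the Paddle 1.x static-graph API used by the layers above.
if __name__ == '__main__':
    image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
    logits = MobileNetV3_large_x1_0().net(image, class_dim=1000)
    print(logits.shape)  # [-1, 1000] softmax probabilities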
# ================================================================
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import datetime
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
class SlimFaceNet():
def __init__(self, class_dim, scale=0.6, arch=None):
assert arch is not None
self.arch = arch
self.class_dim = class_dim
kernels = [3]
expansions = [2, 4, 6]
SE = [0, 1]
self.table = []
for k in kernels:
for e in expansions:
for se in SE:
self.table.append((k, e, se))
if scale == 1.0:
# 100% - channel
self.Slimfacenet_bottleneck_setting = [
# t, c , n ,s
[2, 64, 5, 2],
[4, 128, 1, 2],
[2, 128, 6, 1],
[4, 128, 1, 2],
[2, 128, 2, 1]
]
elif scale == 0.9:
# 90% - channel
self.Slimfacenet_bottleneck_setting = [
# t, c , n ,s
[2, 56, 5, 2],
[4, 116, 1, 2],
[2, 116, 6, 1],
[4, 116, 1, 2],
[2, 116, 2, 1]
]
elif scale == 0.75:
# 75% - channel
self.Slimfacenet_bottleneck_setting = [
# t, c , n ,s
[2, 48, 5, 2],
[4, 96, 1, 2],
[2, 96, 6, 1],
[4, 96, 1, 2],
[2, 96, 2, 1]
]
elif scale == 0.6:
# 60% - channel
self.Slimfacenet_bottleneck_setting = [
# t, c , n ,s
[2, 40, 5, 2],
[4, 76, 1, 2],
[2, 76, 6, 1],
[4, 76, 1, 2],
[2, 76, 2, 1]
]
else:
raise ValueError('Unsupported scale: {}'.format(scale))
self.extract_feature = True
def set_extract_feature_flag(self, flag):
self.extract_feature = flag
def net(self, input, label=None):
x = self.conv_bn_layer(
input,
filter_size=3,
num_filters=64,
stride=2,
padding=1,
num_groups=1,
if_act=True,
name='conv3x3')
x = self.conv_bn_layer(
x,
filter_size=3,
num_filters=64,
stride=1,
padding=1,
num_groups=64,
if_act=True,
name='dw_conv3x3')
in_c = 64
cnt = 0
for _exp, out_c, times, _stride in self.Slimfacenet_bottleneck_setting:
for i in range(times):
stride = _stride if i == 0 else 1
filter_size, exp, se = self.table[self.arch[cnt]]
se = False if se == 0 else True
x = self.residual_unit(
x,
num_in_filter=in_c,
num_out_filter=out_c,
stride=stride,
filter_size=filter_size,
expansion_factor=exp,
use_se=se,
name='residual_unit' + str(cnt + 1))
cnt += 1
in_c = out_c
out_c = 512
x = self.conv_bn_layer(
x,
filter_size=1,
num_filters=out_c,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='conv1x1')
x = self.conv_bn_layer(
x,
filter_size=(7, 6),
num_filters=out_c,
stride=1,
padding=0,
num_groups=out_c,
if_act=False,
name='global_dw_conv7x7')
x = fluid.layers.conv2d(
x,
num_filters=128,
filter_size=1,
stride=1,
padding=0,
groups=1,
act=None,
use_cudnn=True,
param_attr=ParamAttr(
name='linear_conv1x1_weights',
initializer=MSRA(),
regularizer=fluid.regularizer.L2Decay(4e-4)),
bias_attr=False)
bn_name = 'linear_conv1x1_bn'
x = fluid.layers.batch_norm(
x,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
x = fluid.layers.reshape(x, shape=[x.shape[0], x.shape[1]])
if self.extract_feature:
return x
out = self.arc_margin_product(
x, label, self.class_dim, s=32.0, m=0.50, mode=2)
softmax = fluid.layers.softmax(input=out)
cost = fluid.layers.cross_entropy(input=softmax, label=label)
loss = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=out, label=label, k=1)
return loss, acc
def residual_unit(self,
input,
num_in_filter,
num_out_filter,
stride,
filter_size,
expansion_factor,
use_se=False,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
input_data = input
expand_conv = self.conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_expfilter,
stride=1,
padding=0,
if_act=True,
name=name + '_expand')
depthwise_conv = self.conv_bn_layer(
input=expand_conv,
filter_size=filter_size,
num_filters=num_expfilter,
stride=stride,
padding=int((filter_size - 1) // 2),
if_act=True,
num_groups=num_expfilter,
use_cudnn=True,
name=name + '_depthwise')
if use_se:
depthwise_conv = self.se_block(
input=depthwise_conv,
num_out_filter=num_expfilter,
name=name + '_se')
linear_conv = self.conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=num_out_filter,
stride=1,
padding=0,
if_act=False,
name=name + '_linear')
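# Use the residual shortcut only when the input and output shapes match
# (equal channel counts and stride 1); otherwise return the linear
# bottleneck output without a skip connection.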
if num_in_filter != num_out_filter or stride != 1:
return linear_conv
else:
return fluid.layers.elementwise_add(
x=input_data, y=linear_conv, act=None)
def se_block(self, input, num_out_filter, ratio=4, name=None):
num_mid_filter = int(num_out_filter // ratio)
pool = fluid.layers.pool2d(
input=input, pool_type='avg', global_pooling=True, use_cudnn=False)
conv1 = fluid.layers.conv2d(
input=pool,
filter_size=1,
num_filters=num_mid_filter,
act=None,
param_attr=ParamAttr(name=name + '_1_weights'),
bias_attr=ParamAttr(name=name + '_1_offset'))
conv1 = fluid.layers.prelu(
conv1,
mode='channel',
param_attr=ParamAttr(
name=name + '_prelu',
regularizer=fluid.regularizer.L2Decay(0.0)))
conv2 = fluid.layers.conv2d(
input=conv1,
filter_size=1,
num_filters=num_out_filter,
act='hard_sigmoid',
param_attr=ParamAttr(name=name + '_2_weights'),
bias_attr=ParamAttr(name=name + '_2_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=conv2, axis=0)
return scale
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(
name=name + '_weights', initializer=MSRA()),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.prelu(
bn,
mode='channel',
param_attr=ParamAttr(
name=name + '_prelu',
regularizer=fluid.regularizer.L2Decay(0.0)))
else:
return bn
def arc_margin_product(self, input, label, out_dim, s=32.0, m=0.50,
mode=2):
input_norm = fluid.layers.sqrt(
fluid.layers.reduce_sum(
fluid.layers.square(input), dim=1))
input = fluid.layers.elementwise_div(input, input_norm, axis=0)
weight = fluid.layers.create_parameter(
shape=[out_dim, input.shape[1]],
dtype='float32',
name='weight_norm',
attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Xavier(),
regularizer=fluid.regularizer.L2Decay(4e-4)))
weight_norm = fluid.layers.sqrt(
fluid.layers.reduce_sum(
fluid.layers.square(weight), dim=1))
weight = fluid.layers.elementwise_div(weight, weight_norm, axis=0)
weight = fluid.layers.transpose(weight, perm=[1, 0])
cosine = fluid.layers.mul(input, weight)
sine = fluid.layers.sqrt(1.0 - fluid.layers.square(cosine))
cos_m = math.cos(m)
sin_m = math.sin(m)
phi = cosine * cos_m - sine * sin_m
th = math.cos(math.pi - m)
mm = math.sin(math.pi - m) * m
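# ArcFace margin: phi = cos(theta + m) = cos(theta)*cos(m) - sin(theta)*sin(m).
# mode=1 (easy margin) applies the margin only where cos(theta) > 0; mode=2
# applies it where cos(theta) > cos(pi - m) and falls back to
# cos(theta) - m*sin(pi - m) elsewhere, keeping the target logit monotonic.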
if mode == 1:
phi = self.paddle_where_more_than(cosine, 0, phi, cosine)
elif mode == 2:
phi = self.paddle_where_more_than(cosine, th, phi, cosine - mm)
else:
pass
one_hot = fluid.one_hot(input=label, depth=out_dim)
output = fluid.layers.elementwise_mul(
one_hot, phi) + fluid.layers.elementwise_mul(
(1.0 - one_hot), cosine)
output = output * s
return output
def paddle_where_more_than(self, target, limit, x, y):
mask = fluid.layers.cast(x=(target > limit), dtype='float32')
output = fluid.layers.elementwise_mul(
mask, x) + fluid.layers.elementwise_mul((1.0 - mask), y)
return output
def SlimFaceNet_A_x0_60(class_dim=None, scale=0.6, arch=None):
scale = 0.6
arch = [0, 1, 5, 1, 0, 2, 1, 2, 0, 1, 2, 1, 1, 0, 1]
return SlimFaceNet(class_dim=class_dim, scale=scale, arch=arch)
def SlimFaceNet_B_x0_75(class_dim=None, scale=0.6, arch=None):
scale = 0.75
arch = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 3, 2, 2, 3]
return SlimFaceNet(class_dim=class_dim, scale=scale, arch=arch)
def SlimFaceNet_C_x0_75(class_dim=None, scale=0.6, arch=None):
scale = 0.75
arch = [1, 1, 2, 1, 0, 2, 1, 0, 1, 0, 1, 1, 2, 2, 3]
return SlimFaceNet(class_dim=class_dim, scale=scale, arch=arch)
if __name__ == "__main__":
x = fluid.data(name='x', shape=[-1, 3, 112, 112], dtype='float32')
print(x.shape)
model = SlimFaceNet(10000, arch=[1, 3, 3, 1, 1, 0, 0, 1, 0, 1, 1, 0, 5, 5, 3])
y = model.net(x)
# SANAS Network Architecture Search Example
This example shows how to use the network architecture search (NAS) API to find a smaller or more accurate model. It covers how to use SANAS in PaddleSlim and how to obtain a model architecture with it; for the complete example code, see sa_nas_mobilenetv2.py or block_sa_nas_mobilenetv2.py.
## Data Preparation
This example uses the cifar10 dataset by default; the paddle API downloads it automatically, so no extra preparation is needed.
## API Introduction
Please refer to the <a href='../../docs/zh_cn/api_cn/nas_api.rst'>NAS API documentation</a>.
This example uses SANAS to search the MobileNetV2 search space for a model with fewer FLOPs.
## 1 Search Space Configuration
The default search space is `MobileNetV2`; for details, see the <a href='../../docs/zh_cn/api_cn/search_space.md'>search space configuration documentation</a>.
## 2 Start the Search
```shell
CUDA_VISIBLE_DEVICES=0 python sa_nas_mobilenetv2.py
```
To run the block-level variant:
```shell
CUDA_VISIBLE_DEVICES=0 python block_sa_nas_mobilenetv2.py
```
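For orientation, the search loop inside both scripts follows the SANAS pattern sketched below; `train_and_eval` is a placeholder for building and briefly training the candidate network, and the real logic lives in sa_nas_mobilenetv2.py:
```python
from paddleslim.nas import SANAS

def train_and_eval(archs):
    """Placeholder: build programs around `archs`, train briefly,
    and return a validation score (e.g. top-1 accuracy)."""
    raise NotImplementedError

# Launch SANAS over the MobileNetV2 search space; an empty address
# starts the controller server locally on the given port.
sanas = SANAS(configs=[('MobileNetV2Space')], server_addr=("", 8881))

for step in range(300):
    archs = sanas.next_archs()[0]   # sample one candidate architecture function
    score = train_and_eval(archs)   # train the candidate briefly and evaluate it
    sanas.reward(score)             # report the candidate's reward to SANAS
```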
# RLNAS Network Architecture Search Example
This example shows how to use the RLNAS API for network architecture search. It covers how to use RLNAS in PaddleSlim; for the complete example code, see rl_nas_mobilenetv2.py or parl_nas_mobilenetv2.py.
## Data Preparation
This example uses the cifar10 dataset by default; the paddle API downloads it automatically, so no extra preparation is needed.
## API Introduction
Please refer to the <a href='../../docs/zh_cn/api_cn/nas_api.rst'>NAS API documentation</a>.
This example uses RLNAS to search the MobileNetV2 search space for a more accurate model.
## 1 Search Space Configuration
The default search space is `MobileNetV2`; for details, see the <a href='../../docs/zh_cn/api_cn/search_space.md'>search space configuration documentation</a>.
## 2 Start the Search
### 2.1 Start a search experiment with an LSTM controller over a search space built from the MobileNetV2 base architecture
```shell
CUDA_VISIBLE_DEVICES=0 python rl_nas_mobilenetv2.py
```
### 2.2 Start a search experiment with a DDPG controller over a search space built from the MobileNetV2 base architecture
```shell
CUDA_VISIBLE_DEVICES=0 python parl_nas_mobilenetv2.py
```
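For orientation, a minimal RLNAS loop with the LSTM controller looks roughly like this sketch; `train_and_eval` is again a placeholder, and the full script is rl_nas_mobilenetv2.py:
```python
import numpy as np
from paddleslim.nas import RLNAS

def train_and_eval(archs):
    """Placeholder: train the candidate briefly and return its accuracy."""
    raise NotImplementedError

# LSTM-based controller over the MobileNetV2 search space.
rl_nas = RLNAS(key='lstm', configs=[('MobileNetV2Space')],
               server_addr=("", 8881))

for step in range(100):
    archs = rl_nas.next_archs(1)[0][0]   # sample one candidate architecture
    reward = train_and_eval(archs)
    rl_nas.reward(np.float32(reward))    # feed the reward back to the controller
```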
# ... inside search_mobilenetv2_block(config, args, image_size):
exe.run(startup_program)
if args.data == 'cifar10':
train_reader = paddle.fluid.io.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(cycle=False), buf_size=1024),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
paddle.dataset.cifar.test10(cycle=False),
batch_size=args.batch_size,
drop_last=False)
elif args.data == 'imagenet':
train_reader = paddle.fluid.io.batch(
imagenet_reader.train(),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
imagenet_reader.val(),
batch_size=args.batch_size,
drop_last=False)
" if current_flops > 321208544:\n",
" continue\n",
" \n",
" train_reader = paddle.batch(paddle.reader.shuffle(paddle.dataset.cifar.train10(cycle=False), buf_size=1024),batch_size=256)\n",
" train_reader = paddle.fluid.io.batch(paddle.reader.shuffle(paddle.dataset.cifar.train10(cycle=False), buf_size=1024),batch_size=256)\n",
" train_feeder = fluid.DataFeeder(inputs, fluid.CPUPlace())\n",
" test_reader = paddle.batch(paddle.dataset.cifar.test10(cycle=False),\n",
" test_reader = paddle.fluid.io.batch(paddle.dataset.cifar.test10(cycle=False),\n",
" batch_size=256)\n",
" test_feeder = fluid.DataFeeder(inputs, fluid.CPUPlace())\n",
"\n",
},
"nbformat": 4,
"nbformat_minor": 2
}
import sys
sys.path.append('..')
import numpy as np
import argparse
import ast
import time
import logging
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddleslim.nas import RLNAS
from paddleslim.common import get_logger
from optimizer import create_optimizer
import imagenet_reader
_logger = get_logger(__name__, level=logging.INFO)
def create_data_loader(image_shape):
data_shape = [None] + image_shape
data = fluid.data(name='data', shape=data_shape, dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
data_loader = fluid.io.DataLoader.from_generator(
feed_list=[data, label],
capacity=1024,
use_double_buffer=True,
iterable=True)
return data_loader, data, label
def build_program(main_program,
startup_program,
image_shape,
archs,
args,
is_test=False):
with fluid.program_guard(main_program, startup_program):
with fluid.unique_name.guard():
data_loader, data, label = create_data_loader(image_shape)
output = archs(data)
output = fluid.layers.fc(input=output, size=args.class_dim)
softmax_out = fluid.layers.softmax(input=output, use_cudnn=False)
cost = fluid.layers.cross_entropy(input=softmax_out, label=label)
avg_cost = fluid.layers.mean(cost)
acc_top1 = fluid.layers.accuracy(
input=softmax_out, label=label, k=1)
acc_top5 = fluid.layers.accuracy(
input=softmax_out, label=label, k=5)
if not is_test:
optimizer = create_optimizer(args)
optimizer.minimize(avg_cost)
return data_loader, avg_cost, acc_top1, acc_top5
def search_mobilenetv2(config, args, image_size, is_server=True):
if is_server:
### start a server and a client
rl_nas = RLNAS(
key='ddpg',
configs=config,
is_sync=False,
obs_dim=26, ### step + length_of_token
server_addr=(args.server_address, args.port))
else:
### start a client
rl_nas = RLNAS(
key='ddpg',
configs=config,
is_sync=False,
obs_dim=26,
server_addr=(args.server_address, args.port),
is_server=False)
image_shape = [3, image_size, image_size]
for step in range(args.search_steps):
if step == 0:
action_prev = [1. for _ in rl_nas.range_tables]
else:
action_prev = rl_nas.tokens[0]
obs = [step]
obs.extend(action_prev)
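# The DDPG observation is the current step index followed by the previous
# action tokens, so obs_dim above (26) must equal 1 + len(rl_nas.range_tables).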
archs = rl_nas.next_archs(obs=obs)[0][0]
train_program = fluid.Program()
test_program = fluid.Program()
startup_program = fluid.Program()
train_loader, avg_cost, acc_top1, acc_top5 = build_program(
train_program, startup_program, image_shape, archs, args)
test_loader, test_avg_cost, test_acc_top1, test_acc_top5 = build_program(
test_program,
startup_program,
image_shape,
archs,
args,
is_test=True)
test_program = test_program.clone(for_test=True)
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_program)
if args.data == 'cifar10':
train_reader = paddle.fluid.io.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(cycle=False), buf_size=1024),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
paddle.dataset.cifar.test10(cycle=False),
batch_size=args.batch_size,
drop_last=False)
elif args.data == 'imagenet':
train_reader = paddle.fluid.io.batch(
imagenet_reader.train(),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
imagenet_reader.val(),
batch_size=args.batch_size,
drop_last=False)
train_loader.set_sample_list_generator(
train_reader,
places=fluid.cuda_places() if args.use_gpu else fluid.cpu_places())
test_loader.set_sample_list_generator(test_reader, places=place)
build_strategy = fluid.BuildStrategy()
train_compiled_program = fluid.CompiledProgram(
train_program).with_data_parallel(
loss_name=avg_cost.name, build_strategy=build_strategy)
for epoch_id in range(args.retain_epoch):
for batch_id, data in enumerate(train_loader()):
fetches = [avg_cost.name]
s_time = time.time()
outs = exe.run(train_compiled_program,
feed=data,
fetch_list=fetches)[0]
batch_time = time.time() - s_time
if batch_id % 10 == 0:
_logger.info(
'TRAIN: steps: {}, epoch: {}, batch: {}, cost: {}, batch_time: {:.4f}s'.
format(step, epoch_id, batch_id, outs[0], batch_time))
reward = []
for batch_id, data in enumerate(test_loader()):
test_fetches = [
test_avg_cost.name, test_acc_top1.name, test_acc_top5.name
]
batch_reward = exe.run(test_program,
feed=data,
fetch_list=test_fetches)
reward_avg = np.mean(np.array(batch_reward), axis=1)
reward.append(reward_avg)
_logger.info(
'TEST: step: {}, batch: {}, avg_cost: {}, acc_top1: {}, acc_top5: {}'.
format(step, batch_id, batch_reward[0], batch_reward[1],
batch_reward[2]))
finally_reward = np.mean(np.array(reward), axis=0)
_logger.info(
'FINAL TEST: avg_cost: {}, acc_top1: {}, acc_top5: {}'.format(
finally_reward[0], finally_reward[1], finally_reward[2]))
obs = np.expand_dims(obs, axis=0).astype('float32')
actions = rl_nas.tokens
obs_next = [step + 1]
obs_next.extend(actions[0])
obs_next = np.expand_dims(obs_next, axis=0).astype('float32')
if step == args.search_steps - 1:
terminal = np.expand_dims([True], axis=0).astype(np.bool)
else:
terminal = np.expand_dims([False], axis=0).astype(np.bool)
rl_nas.reward(
np.expand_dims(
np.float32(finally_reward[1]), axis=0),
obs=obs,
actions=actions.astype('float32'),
obs_next=obs_next,
terminal=terminal)
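# Each search step hands the DDPG controller one complete transition:
# (obs, actions, reward, obs_next, terminal).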
if step == 2:
sys.exit(0)
if __name__ == '__main__':
parser = argparse.ArgumentParser(
description='RL NAS MobileNetV2 cifar10 argparse')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
help='Whether to use GPU in train/test model.')
parser.add_argument(
'--batch_size', type=int, default=256, help='batch size.')
parser.add_argument(
'--class_dim', type=int, default=10, help='classify number.')
parser.add_argument(
'--data',
type=str,
default='cifar10',
choices=['cifar10', 'imagenet'],
help='dataset to use.')
parser.add_argument(
'--is_server',
type=ast.literal_eval,
default=True,
help='Whether to start a server.')
parser.add_argument(
'--search_steps',
type=int,
default=100,
help='number of search steps.')
parser.add_argument(
'--server_address', type=str, default="", help='server ip.')
parser.add_argument('--port', type=int, default=8881, help='server port')
parser.add_argument(
'--retain_epoch', type=int, default=5, help='epoch for each token.')
parser.add_argument('--lr', type=float, default=0.1, help='learning rate.')
args = parser.parse_args()
print(args)
if args.data == 'cifar10':
image_size = 32
block_num = 3
elif args.data == 'imagenet':
image_size = 224
block_num = 6
else:
raise NotImplementedError(
'data must in [cifar10, imagenet], but received: {}'.format(
args.data))
config = [('MobileNetV2Space')]
search_mobilenetv2(config, args, image_size, is_server=args.is_server)
# ... inside search_mobilenetv2(config, args, image_size, is_server=True):
is_sync=False,
server_addr=(args.server_address, args.port),
controller_batch_size=1,
controller_decay_steps=1000,
controller_decay_rate=0.8,
lstm_num_layers=1,
hidden_size=10,
temperature=1.0)
else:
### start a client
# ...
lstm_num_layers=1,
hidden_size=10,
temperature=1.0,
controller_batch_size=1,
controller_decay_steps=1000,
controller_decay_rate=0.8,
is_server=False)
image_shape = [3, image_size, image_size]
# ...
exe.run(startup_program)
if args.data == 'cifar10':
train_reader = paddle.fluid.io.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(cycle=False), buf_size=1024),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
paddle.dataset.cifar.test10(cycle=False),
batch_size=args.batch_size,
drop_last=False)
elif args.data == 'imagenet':
train_reader = paddle.fluid.io.batch(
imagenet_reader.train(),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
imagenet_reader.val(),
batch_size=args.batch_size,
drop_last=False)
# ... inside the __main__ argument parser:
parser.add_argument(
'--batch_size', type=int, default=256, help='batch size.')
parser.add_argument(
'--class_dim', type=int, default=10, help='classify number.')
parser.add_argument(
'--data',
type=str,
# ... inside search_mobilenetv2(config, args, image_size, is_server=True):
exe.run(startup_program)
if args.data == 'cifar10':
train_reader = paddle.fluid.io.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(cycle=False), buf_size=1024),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
paddle.dataset.cifar.test10(cycle=False),
batch_size=args.batch_size,
drop_last=False)
elif args.data == 'imagenet':
train_reader = paddle.fluid.io.batch(
imagenet_reader.train(),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
imagenet_reader.val(),
batch_size=args.batch_size,
drop_last=False)
# ... inside test_search_result(tokens, image_size, args, config):
image_shape = [3, image_size, image_size]
archs = sa_nas.tokens2arch(tokens)[0]
train_program = fluid.Program()
test_program = fluid.Program()
# ...
exe.run(startup_program)
if args.data == 'cifar10':
train_reader = paddle.fluid.io.batch(
paddle.reader.shuffle(
paddle.dataset.cifar.train10(cycle=False), buf_size=1024),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
paddle.dataset.cifar.test10(cycle=False),
batch_size=args.batch_size,
drop_last=False)
elif args.data == 'imagenet':
train_reader = paddle.fluid.io.batch(
imagenet_reader.train(),
batch_size=args.batch_size,
drop_last=True)
test_reader = paddle.fluid.io.batch(
imagenet_reader.val(), batch_size=args.batch_size, drop_last=False)
train_loader.set_sample_list_generator(
# ... inside the __main__ argument parser:
parser.add_argument(
'--batch_size', type=int, default=256, help='batch size.')
parser.add_argument(
'--class_dim', type=int, default=10, help='classify number.')
parser.add_argument(
'--data',
type=str,
# ... inside test_mnist(model, tokens=None):
acc_set = []
avg_loss_set = []
batch_size = 64
test_reader = paddle.fluid.io.batch(
paddle.dataset.mnist.test(), batch_size=batch_size, drop_last=True)
for batch_id, data in enumerate(test_reader()):
dy_x_data = np.array([x[0].reshape(1, 28, 28)
# ... inside train_mnist(args, model, tokens=None):
adam = AdamOptimizer(
learning_rate=0.001, parameter_list=model.parameters())
train_reader = paddle.fluid.io.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
if args.use_data_parallel:
train_reader = fluid.contrib.reader.distributed_batch_reader(
# Distillation example: Chinese lexical analysis
This example demonstrates how to use the Pantheon framework for online distillation of a Chinese lexical analysis model on a sample dataset. Results of large-scale online distillation are shown below:
| model | Precision | Recall | F1-score|
| ------ | ------ | ------ | ------ |
| BiGRU | 89.2 | 89.4 | 89.3 |
| BERT fine-tuned | 90.2 | 90.4 | 90.3 |
| ERNIE fine-tuned | 91.7 | 91.7 | 91.7 |
| DistillBiGRU | 90.20 | 90.52 | 90.36 |
BiGRU trains a BiGRU-based LAC model from scratch; BERT fine-tuned fine-tunes the LAC task on the BERT base model; ERNIE fine-tuned fine-tunes the LAC task on the ERNIE base model; DistillBiGRU is trained through large-scale online distillation with ERNIE fine-tuned as the teacher model.
## Introduction
Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We evaluate word segmentation, part-of-speech tagging, and named entity recognition jointly on a self-built dataset. We use the fine-tuned [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model as the teacher model and a GRU network as the student model, connecting them through the Pantheon framework for online distillation.
#### 1. Download the training dataset
Download the data set file, and after decompression, a `./data/` folder will be created.
```bash
python downloads.py dataset
```
#### 2. Download the Teacher model
```bash
# download ERNIE finetuned model
python downloads.py finetuned
python downloads.py conf
```
#### 3. Distill the Student model
```bash
# start teacher service
bash run_teacher.sh
# start student service
bash run_student.sh
```
> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis
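Under the hood, run_teacher.sh and run_student.sh wrap PaddleSlim's Pantheon API. A rough sketch of the two sides follows; names such as `feed_vars`, `crf_decode`, `test_program`, `batch_generator` and `exe` stand in for objects that teacher_ernie.py and train_student.py build around the models:
```python
from paddleslim.pantheon import Teacher, Student

# Teacher side (inside teacher_ernie.py, roughly): serve knowledge over TCP.
#
# teacher = Teacher(out_path=None, out_port=5002)
# teacher.start()
# teacher.start_knowledge_service(
#     feed_list=feed_vars,                    # the ERNIE model's input variables
#     schema={"crf_decode": crf_decode},      # knowledge to transfer
#     program=test_program,
#     reader_config={"batch_generator": batch_generator},
#     exe=exe)

# Student side (inside train_student.py, roughly): consume that knowledge.
student = Student()
student.register_teacher(in_address="127.0.0.1:5002")
student.start()
knowledge_gen = student.get_knowledge_generator(batch_size=32, drop_last=False)
for knowledge in knowledge_gen():
    pass  # mix the teacher's CRF outputs into the student's loss here
```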
from . import teacher
from . import student
from .teacher import Teacher
from .student import Student
__all__ = teacher.__all__ + student.__all__
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Define functions that create the lexical analysis model and its data readers
"""
import sys
import os
import math
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import NormalInitializer
from reader import Dataset
from ernie_reader import SequenceLabelReader
from models.sequence_labeling import nets
from models.representation.ernie import ernie_encoder, ernie_pyreader
def create_model(args, vocab_size, num_labels, mode='train'):
"""create lac model"""
# model's input data
words = fluid.data(name='words', shape=[-1, 1], dtype='int64', lod_level=1)
targets = fluid.data(
name='targets', shape=[-1, 1], dtype='int64', lod_level=1)
if mode == "train":
print("create model mode: ", mode)
teacher_crf_decode = fluid.data(
name='teacher_crf_decode', shape=[-1, 1], dtype='float32', lod_level=1)
else:
print("create model mode: ", mode)
teacher_crf_decode = None
feed_list = [words, targets]
if teacher_crf_decode is not None:
feed_list.append(teacher_crf_decode)
pyreader = fluid.io.DataLoader.from_generator(
feed_list=feed_list,
capacity=200,
use_double_buffer=True,
iterable=False)
# for test or train process
avg_cost, crf_avg_cost, teacher_cost, crf_decode = nets.lex_net(
words, args, vocab_size, num_labels, teacher_crf_decode,
for_infer=False, target=targets)
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=targets,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((num_labels - 1) / 2.0)))
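# Labels come in B-/I- pairs plus a single "O" label, so chunk_eval sees
# (num_labels - 1) / 2 chunk types.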
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"pyreader": pyreader,
"words": words,
"targets": targets,
"avg_cost": avg_cost,
"crf_avg_cost": crf_avg_cost,
"teacher_cost": teacher_cost,
"crf_decode": crf_decode,
"precision": precision,
"recall": recall,
"f1_score": f1_score,
"chunk_evaluator": chunk_evaluator,
"num_infer_chunks": num_infer_chunks,
"num_label_chunks": num_label_chunks,
"num_correct_chunks": num_correct_chunks
}
return ret
def create_lexnet_data_generator(args,
reader,
file_name,
place,
mode='train'):
if mode == 'train':
def wrapper():
batch_words, batch_labels, batch_emissions, seq_lens = [], [], None, []
emi_lens = []
for epoch in range(args.epoch):
print("data epoch: {}".format(epoch))
for instance in reader.file_reader(file_name, mode="train")():
words, labels, emission = instance
if len(seq_lens) < args.batch_size:
batch_words.append(words)
batch_labels.append(labels)
if batch_emissions is not None:
batch_emissions = np.concatenate((batch_emissions, emission))
else:
batch_emissions = emission
seq_lens.append(len(words))
emi_lens.append(emission.shape[0])
if len(seq_lens) == args.batch_size:
#print("batch words len", [len(seq) for seq in batch_words])
#print("batch labels len", [len(seq) for seq in batch_labels])
#print("emi lens:", emi_lens)
#print("emission first dim:", batch_emissions.shape[0])
#print("reduced seq_lens:", sum(seq_lens))
t_words = fluid.create_lod_tensor(batch_words, [seq_lens], place)
t_labels = fluid.create_lod_tensor(batch_labels, [seq_lens], place)
t_emissions = fluid.create_lod_tensor(batch_emissions, [seq_lens], place)
yield t_words, t_labels, t_emissions
batch_words, batch_labels, batch_emissions, seq_lens = [], [], None, []
emi_lens = []
if len(seq_lens) > 0:
t_words = fluid.create_lod_tensor(batch_words, [seq_lens], place)
t_labels = fluid.create_lod_tensor(batch_labels, [seq_lens], place)
t_emissions = fluid.create_lod_tensor(batch_emissions, [seq_lens], place)
yield t_words, t_labels, t_emissions
batch_words, batch_labels, batch_emissions, seq_lens = [], [], None, []
else:
def wrapper():
batch_words, batch_labels, seq_lens = [], [], []
for instance in reader.file_reader(file_name, mode="test")():
words, labels = instance
if len(seq_lens) < args.batch_size:
batch_words.append(words)
batch_labels.append(labels)
seq_lens.append(len(words))
if len(seq_lens) == args.batch_size:
t_words = fluid.create_lod_tensor(batch_words, [seq_lens], place)
t_labels = fluid.create_lod_tensor(batch_labels, [seq_lens], place)
yield t_words, t_labels
batch_words, batch_labels, seq_lens = [], [], []
if len(seq_lens) > 0:
t_words = fluid.create_lod_tensor(batch_words, [seq_lens], place)
t_labels = fluid.create_lod_tensor(batch_labels, [seq_lens], place)
yield t_words, t_labels
batch_words, batch_labels, seq_lens = [], [], []
return wrapper
def create_pyreader(args,
file_name,
feed_list,
place,
model='lac',
reader=None,
return_reader=False,
mode='train'):
reader = SequenceLabelReader(
vocab_path=args.vocab_path,
label_map_config=args.label_map_config,
max_seq_len=args.max_seq_len,
do_lower_case=args.do_lower_case,
random_seed=args.random_seed)
return reader.data_generator(
file_name, args.batch_size, args.epoch, shuffle=False, phase=mode)
def create_ernie_model(args, ernie_config):
"""
Create Model for LAC based on ERNIE encoder
"""
# ERNIE's input data
src_ids = fluid.data(
name='src_ids', shape=[-1, args.max_seq_len, 1], dtype='int64')
sent_ids = fluid.data(
name='sent_ids', shape=[-1, args.max_seq_len, 1], dtype='int64')
pos_ids = fluid.data(
name='pos_ids', shape=[-1, args.max_seq_len, 1], dtype='int64')
input_mask = fluid.data(
name='input_mask', shape=[-1, args.max_seq_len, 1], dtype='float32')
padded_labels = fluid.data(
name='padded_labels', shape=[-1, args.max_seq_len, 1], dtype='int64')
seq_lens = fluid.data(
name='seq_lens', shape=[-1], dtype='int64', lod_level=0)
squeeze_labels = fluid.layers.squeeze(padded_labels, axes=[-1])
# ernie_pyreader
ernie_inputs = {
"src_ids": src_ids,
"sent_ids": sent_ids,
"pos_ids": pos_ids,
"input_mask": input_mask,
"seq_lens": seq_lens
}
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
padded_token_embeddings = embeddings["padded_token_embeddings"]
emission = fluid.layers.fc(
size=args.num_labels,
input=padded_token_embeddings,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=-args.init_bound, high=args.init_bound),
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=1e-4)),
num_flatten_dims=2)
crf_cost = fluid.layers.linear_chain_crf(
input=emission,
label=padded_labels,
param_attr=fluid.ParamAttr(
name='crfw', learning_rate=args.crf_learning_rate),
length=seq_lens)
avg_cost = fluid.layers.mean(x=crf_cost)
crf_decode = fluid.layers.crf_decoding(
input=emission,
param_attr=fluid.ParamAttr(name='crfw'),
length=seq_lens)
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=squeeze_labels,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((args.num_labels - 1) / 2.0)),
seq_length=seq_lens)
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"feed_list":
[src_ids, sent_ids, pos_ids, input_mask, padded_labels, seq_lens],
"words": src_ids,
"pos_ids":pos_ids,
"sent_ids":sent_ids,
"input_mask":input_mask,
"labels": padded_labels,
"seq_lens": seq_lens,
"avg_cost": avg_cost,
"crf_decode": crf_decode,
"precision": precision,
"recall": recall,
"f1_score": f1_score,
"chunk_evaluator": chunk_evaluator,
"num_infer_chunks": num_infer_chunks,
"num_label_chunks": num_label_chunks,
"num_correct_chunks": num_correct_chunks,
"emission":emission,
"alpha": None
}
return ret
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Download script, download dataset and pretrain models.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import io
import os
import sys
import time
import hashlib
import tarfile
import requests
FILE_INFO = {
'BASE_URL': 'https://baidu-nlp.bj.bcebos.com/',
'DATA': {
'name': 'lexical_analysis-dataset-2.0.0.tar.gz',
'md5': '71e4a9a36d0f0177929a1bccedca7dba'
},
'FINETURN_MODEL': {
'name': 'lexical_analysis_finetuned-1.0.0.tar.gz',
'md5': "ee2c7614b06dcfd89561fbbdaac34342"
},
'CONF': {
'name': 'conf.tar.gz',
'md5': "7a0fe28db46db496fff4361eebaa6515",
'url': 'https://paddlemodels.bj.bcebos.com/PaddleSlim/pantheon/lexical_analysis/',
}
}
def usage():
desc = ("\nDownload datasets and pretrained models for LAC.\n"
"Usage:\n"
" 1. python download.py all\n"
" 2. python download.py dataset\n"
" 3. python download.py finetuned\n"
" 4. python download.py conf\n")
print(desc)
def md5file(fname):
hash_md5 = hashlib.md5()
with io.open(fname, "rb") as fin:
for chunk in iter(lambda: fin.read(4096), b""):
hash_md5.update(chunk)
return hash_md5.hexdigest()
def extract(fname, dir_path):
"""
Extract tar.gz file
"""
try:
tar = tarfile.open(fname, "r")
file_names = tar.getnames()
for file_name in file_names:
tar.extract(file_name, dir_path)
print(file_name)
tar.close()
except Exception as e:
raise e
def _download(url, filename, md5sum):
"""
Download file and check md5
"""
retry = 0
retry_limit = 3
chunk_size = 4096
while not (os.path.exists(filename) and md5file(filename) == md5sum):
if retry < retry_limit:
retry += 1
else:
raise RuntimeError(
"Cannot download dataset ({0}) with retry {1} times.".format(
url, retry_limit))
try:
start = time.time()
size = 0
res = requests.get(url, stream=True)
filesize = int(res.headers['content-length'])
if res.status_code == 200:
print("[Filesize]: %0.2f MB" % (filesize / 1024 / 1024))
# save by chunk
with io.open(filename, "wb") as fout:
for chunk in res.iter_content(chunk_size=chunk_size):
if chunk:
fout.write(chunk)
size += len(chunk)
pr = '>' * int(size * 50 / filesize)
print(
'\r[Process ]: %s%.2f%%' %
(pr, float(size / filesize * 100)),
end='')
end = time.time()
print("\n[CostTime]: %.2f s" % (end - start))
except Exception as e:
print(e)
def download(name, dir_path):
if name == 'CONF':
url = FILE_INFO[name]['url'] + FILE_INFO[name]['name']
else:
url = FILE_INFO['BASE_URL'] + FILE_INFO[name]['name']
file_path = os.path.join(dir_path, FILE_INFO[name]['name'])
if not os.path.exists(dir_path):
os.makedirs(dir_path)
# download data
print("Downloading : %s" % name)
_download(url, file_path, FILE_INFO[name]['md5'])
# extract data
print("Extracting : %s" % file_path)
extract(file_path, dir_path)
os.remove(file_path)
if __name__ == '__main__':
if len(sys.argv) != 2:
usage()
sys.exit(1)
pwd = os.path.join(os.path.dirname(__file__), './')
ernie_dir = os.path.join(os.path.dirname(__file__), './pretrained')
if sys.argv[1] == 'all':
download('DATA', pwd)
download('FINETURN_MODEL', pwd)
download('CONF', pwd)
if sys.argv[1] == "dataset":
download('DATA', pwd)
elif sys.argv[1] == "finetuned":
download('FINETURN_MODEL', pwd)
elif sys.argv[1] == "conf":
download('CONF', pwd)
else:
usage()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides reader for ernie model
"""
import sys
from collections import namedtuple
import numpy as np
sys.path.append("..")
from preprocess.ernie.task_reader import BaseReader, tokenization
def pad_batch_data(insts,
pad_idx=0,
max_len=128,
return_pos=False,
return_input_mask=False,
return_max_len=False,
return_num_token=False,
return_seq_lens=False):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and input mask.
"""
return_list = []
# Pad every instance to the fixed max_len rather than the batch maximum.
# Any token included in dict can be used to pad, since the paddings' loss
# will be masked out by weights and make no effect on parameter gradients.
inst_data = np.array(
[inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
# position data
if return_pos:
inst_pos = np.array([
list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
for inst in insts
])
return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
if return_input_mask:
# This is used to avoid attention on paddings.
input_mask_data = np.array([[1] * len(inst) + [0] *
(max_len - len(inst)) for inst in insts])
input_mask_data = np.expand_dims(input_mask_data, axis=-1)
return_list += [input_mask_data.astype("float32")]
if return_max_len:
return_list += [max_len]
if return_num_token:
num_token = 0
for inst in insts:
num_token += len(inst)
return_list += [num_token]
if return_seq_lens:
seq_lens = np.array([len(inst) for inst in insts])
return_list += [seq_lens.astype("int64").reshape([-1])]
return return_list if len(return_list) > 1 else return_list[0]
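# Illustrative example: for insts=[[5, 6, 7], [8, 9]], max_len=4 and
# return_seq_lens=True, the call returns an int64 array of shape [2, 4, 1]
# holding [[5, 6, 7, 0], [8, 9, 0, 0]] plus seq_lens [3, 2].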
class SequenceLabelReader(BaseReader):
"""SequenceLabelReader"""
def _pad_batch_records(self, batch_records):
batch_token_ids = [record.token_ids for record in batch_records]
batch_text_type_ids = [record.text_type_ids for record in batch_records]
batch_position_ids = [record.position_ids for record in batch_records]
batch_label_ids = [record.label_ids for record in batch_records]
# padding
padded_token_ids, input_mask, batch_seq_lens = pad_batch_data(
batch_token_ids,
max_len=self.max_seq_len,
pad_idx=self.pad_id,
return_input_mask=True,
return_seq_lens=True)
padded_text_type_ids = pad_batch_data(
batch_text_type_ids, max_len=self.max_seq_len, pad_idx=self.pad_id)
padded_position_ids = pad_batch_data(
batch_position_ids, max_len=self.max_seq_len, pad_idx=self.pad_id)
padded_label_ids = pad_batch_data(
batch_label_ids,
max_len=self.max_seq_len,
pad_idx=len(self.label_map) - 1)
return_list = [
padded_token_ids, padded_text_type_ids, padded_position_ids,
input_mask, padded_label_ids, batch_seq_lens
]
return return_list
def _reseg_token_label(self, tokens, labels, tokenizer):
assert len(tokens) == len(labels)
ret_tokens = []
ret_labels = []
for token, label in zip(tokens, labels):
sub_token = tokenizer.tokenize(token)
if len(sub_token) == 0:
continue
ret_tokens.extend(sub_token)
ret_labels.append(label)
if len(sub_token) < 2:
continue
sub_label = label
if label.startswith("B-"):
sub_label = "I-" + label[2:]
ret_labels.extend([sub_label] * (len(sub_token) - 1))
assert len(ret_tokens) == len(ret_labels)
return ret_tokens, ret_labels
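# Illustrative example: a token "playing" labeled "B-V" that the tokenizer
# splits into ["play", "##ing"] yields labels ["B-V", "I-V"], one label per
# sub-token.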
def _convert_example_to_record(self, example, max_seq_length, tokenizer):
# Tokens and labels are separated by the "\2" control character in the
# ERNIE sequence-labeling data format (the character is invisible in
# most renderings).
tokens = tokenization.convert_to_unicode(example.text_a).split(u"\2")
labels = tokenization.convert_to_unicode(example.label).split(u"\2")
tokens, labels = self._reseg_token_label(tokens, labels, tokenizer)
if len(tokens) > max_seq_length - 2:
tokens = tokens[0:(max_seq_length - 2)]
labels = labels[0:(max_seq_length - 2)]
tokens = ["[CLS]"] + tokens + ["[SEP]"]
token_ids = tokenizer.convert_tokens_to_ids(tokens)
position_ids = list(range(len(token_ids)))
text_type_ids = [0] * len(token_ids)
no_entity_id = len(self.label_map) - 1
labels = [
label if label in self.label_map else u"O" for label in labels
]
label_ids = [no_entity_id] + [
self.label_map[label] for label in labels
] + [no_entity_id]
Record = namedtuple(
'Record',
['token_ids', 'text_type_ids', 'position_ids', 'label_ids'])
record = Record(
token_ids=token_ids,
text_type_ids=text_type_ids,
position_ids=position_ids,
label_ids=label_ids)
return record
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import time
import sys
import paddle.fluid as fluid
import paddle
import model_utils
import reader
import creator
sys.path.append('models/')
from model_check import check_cuda
from model_check import check_version
parser = argparse.ArgumentParser(__doc__)
# 1. model parameters
model_g = model_utils.ArgumentGroup(parser, "model", "model configuration")
model_g.add_arg("word_emb_dim", int, 128,
"The dimension in which a word is embedded.")
model_g.add_arg("grnn_hidden_dim", int, 128,
"The number of hidden nodes in the GRNN layer.")
model_g.add_arg("bigru_num", int, 2,
"The number of bi_gru layers in the network.")
model_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
# 2. data parameters
data_g = model_utils.ArgumentGroup(parser, "data", "data paths")
data_g.add_arg("word_dict_path", str, "./conf/word.dic",
"The path of the word dictionary.")
data_g.add_arg("label_dict_path", str, "./conf/tag.dic",
"The path of the label dictionary.")
data_g.add_arg("word_rep_dict_path", str, "./conf/q2b.dic",
"The path of the word replacement Dictionary.")
data_g.add_arg("test_data", str, "./data/test.tsv",
"The folder where the training data is located.")
data_g.add_arg("init_checkpoint", str, "./model_baseline", "Path to init model")
data_g.add_arg(
"batch_size", int, 200,
"The number of sequences contained in a mini-batch, "
"or the maximum number of tokens (include paddings) contained in a mini-batch."
)
def do_eval(args):
print('do_eval...........')
dataset = reader.Dataset(args)
test_program = fluid.Program()
with fluid.program_guard(test_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
test_ret = creator.create_model(
args, dataset.vocab_size, dataset.num_labels, mode='test')
test_program = test_program.clone(for_test=True)
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
pyreader = creator.create_pyreader(
args,
file_name=args.test_data,
feed_list=test_ret['feed_list'],
place=place,
model='lac',
reader=dataset,
mode='test')
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load model
model_utils.init_checkpoint(exe, args.init_checkpoint, test_program)
test_process(
exe=exe, program=test_program, reader=pyreader, test_ret=test_ret)
def test_process(exe, program, reader, test_ret):
"""
Run the evaluation process and print precision/recall/F1.
:param exe: the fluid Executor
:param program: the inference program
:param reader: data reader
:param test_ret: dict of metric variables built by creator.create_model
"""
print('test_process...........')
test_ret["chunk_evaluator"].reset()
start_time = time.time()
reader.start()
while True:
try:
nums_infer, nums_label, nums_correct = exe.run(
program,
fetch_list=[
test_ret["num_infer_chunks"],
test_ret["num_label_chunks"],
test_ret["num_correct_chunks"],
])
test_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
except fluid.core.EOFException:
reader.reset()
break
precision, recall, f1 = test_ret["chunk_evaluator"].eval()
end_time = time.time()
print("[test] P: %.5f, R: %.5f, F1: %.5f, elapsed time: %.3f s" %
(precision, recall, f1, end_time - start_time))
if __name__ == '__main__':
args = parser.parse_args()
check_cuda(args.use_cuda)
check_version()
do_eval(args)
#encoding=utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle
import paddle.fluid as fluid
def check_cuda(use_cuda, err = \
"\nYou can not set use_cuda = True in the model because you are using paddlepaddle-cpu.\n \
Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_cuda = False to run models on CPU.\n"
):
"""
Log error and exit when set use_gpu=true in paddlepaddle
cpu version.
"""
try:
if use_cuda and not fluid.is_compiled_with_cuda():
print(err)
sys.exit(1)
except Exception as e:
pass
def check_version():
"""
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version 1.6 or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code." \
try:
fluid.require_version('1.6.0')
except Exception as e:
print(err)
sys.exit(1)
if __name__ == "__main__":
check_cuda(True)
check_cuda(False)
check_cuda(True, "This is only for testing.")
#!/bin/bash
export CUDA_VISIBLE_DEVICES=5,6
python -u train_student.py \
--train_data ./data/train.tsv \
--test_data ./data/test.tsv \
--model_save_dir ./teacher_ernie_init_lac_1gru_emb128 \
--validation_steps 1000 \
--save_steps 1000 \
--print_steps 100 \
--batch_size 32 \
--epoch 10 \
--traindata_shuffle_buffer 20000 \
--word_emb_dim 128 \
--grnn_hidden_dim 128 \
--bigru_num 1 \
--base_learning_rate 1e-3 \
--emb_learning_rate 2 \
--crf_learning_rate 0.2 \
--word_dict_path ./conf/word.dic \
--label_dict_path ./conf/tag.dic \
--word_rep_dict_path ./conf/q2b.dic \
--enable_ce false \
--use_cuda true \
--in_address "127.0.0.1:5002"
#!/bin/bash
export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1
export FLAGS_fraction_of_gpu_memory_to_use=0.99
export CUDA_VISIBLE_DEVICES=5,6 # which GPU to use
ERNIE_FINETUNED_MODEL_PATH=./model_finetuned
DATA_PATH=./data/
python -u teacher_ernie.py \
--ernie_config_path "conf/ernie_config.json" \
--init_checkpoint "${ERNIE_FINETUNED_MODEL_PATH}" \
--init_bound 0.1 \
--vocab_path "conf/vocab.txt" \
--batch_size 32 \
--random_seed 0 \
--num_labels 57 \
--max_seq_len 128 \
--test_data "${DATA_PATH}/train.tsv" \
--label_map_config "./conf/label_map.json" \
--do_lower_case true \
--use_cuda true \
--out_port=5002