Unverified commit e5a3b1f3, authored by whs, committed by GitHub

[cherry-pick/release1.7] Update docs of PaddleSlim (#1808)

Parent 018f6cdf
......@@ -4,10 +4,13 @@
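- `Server-side deployment <inference/index_cn.html>`_ : describes how to deploy a model online on the server side
- `Mobile deployment <mobile/index_cn.html>`_: introduces Paddle-Lite, the deep learning framework for embedded platforms under the PaddlePaddle organization
- `Mobile deployment <mobile/index_cn.html>`_ : introduces Paddle-Lite, the deep learning framework for embedded platforms under the PaddlePaddle organization
- `Model compression <paddleslim/paddle_slim.html>`_ : briefly introduces the features and usage of the PaddleSlim model compression toolkit.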
.. toctree::
:hidden:
inference/index_cn.rst
mobile/index_cn.rst
paddleslim/paddle_slim.md
......@@ -4,8 +4,10 @@ Deploy Inference Model
- `Server side Deployment <inference/index_en.html>`_ : This section describes how to deploy and release trained models on servers.
- `Model Compression <paddleslim/paddle_slim_en.html>`_ : Introduces the features and usage of PaddleSlim, a toolkit for model compression.
.. toctree::
:hidden:
inference/index_en.rst
\ No newline at end of file
inference/index_en.rst
paddleslim/paddle_slim_en.rst
......@@ -2,14 +2,11 @@
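Mobile Deployment
#################
This module introduces Paddle-Lite, PaddlePaddle's on-device inference engine, and the model compression tool PaddleSlim, including
This module introduces Paddle-Lite, PaddlePaddle's on-device inference engine:
* `Paddle Lite <mobile_index.html>`_: briefly introduces the features and usage of Paddle-Lite.
* `PaddleSlim <paddle_slim.html>`_: briefly introduces the features and usage of PaddleSlim.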
.. toctree::
:hidden:
mobile_index.md
paddle_slim.md
# Model Compression Toolkit
<div align="center">
<h3>
Model Compression Toolkit
<span> | </span>
<a href="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/tutorial.md">
Algorithm Principles
</a>
<span> | </span>
<a href="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md">
Usage Documentation
</a>
<span> | </span>
<a href="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/demo.md">
Examples
</a>
<span> | </span>
<a href="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/model_zoo.md">
Model Zoo
</a>
</h3>
</div>
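## Introduction
PaddleSlim is a submodule of the PaddlePaddle framework, first released with PaddlePaddle 1.4. It implements three mainstream compression strategies (network pruning, quantization, and distillation) and is mainly used to compress models in the image domain. Later releases will add more compression strategies and improve support for NLP models.
## Key Features
The PaddleSlim toolkit has the following features:
### Simple interface
- Configurable parameters are managed centrally in a configuration file, which makes experiments easier to manage
- Model compression can be added to an ordinary training script with very little extra code
See: [Usage examples](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/demo.md)
### Strong results
- Even for MobileNetV1, a model with little redundancy, the filter pruning strategy still reduces model size while keeping the accuracy loss as small as possible.
- The distillation strategy can noticeably improve the accuracy of the original model.
- Combining quantization training with distillation can reduce model size and improve accuracy at the same time.
See: [Results and Model Zoo](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/model_zoo.md)
### More powerful and flexible
- The pruning process is automated
- The pruning strategy supports more network structures
- Distillation supports multiple modes, and users can define their own combined loss
- Multiple compression strategies can be quickly configured for combined use
See: [Usage guide](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md)
## Architecture
This section briefly describes the overall design of the model compression tool to make the usage workflow easier to understand.
**Figure 1** shows the architecture of the model compression tool; API dependencies run from top to bottom. The distillation, quantization, and pruning modules all depend indirectly on the underlying Paddle framework. The model compression tool is currently part of the PaddlePaddle framework, so users who have already installed a regular build of Paddle need to download and install a build of Paddle that supports model compression before they can use it.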
<p align="center">
<img src="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/images/framework_0.png?raw=true" height=252 width=406 hspace='10'/> <br />
<strong>Figure 1</strong>
</p>
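As **Figure 1** shows, the purple module at the top is the user interface. To call the model compression functionality from a Python script, you only need to construct a Compressor object; the [usage documentation](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md) describes this in detail.
We call each compression algorithm a compression strategy. While the model is trained iteratively, the strategies registered by the user are invoked to compress the model, as shown in **Figure 2**. The model compression tool encapsulates the training logic, so the user only needs to provide the network structure, data, and optimization strategy (optimizer) required for training; the [usage documentation](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md) covers this in detail.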
<p align="center">
<img src="https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/images/framework_1.png?raw=true" height=255 width=646 hspace='10'/> <br />
<strong>Figure 2</strong>
</p>
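## Feature List
### Pruning
- Supports two modes: sensitivity-based and uniform
- Supports many types of networks, such as VGG, ResNet, and MobileNet
- Supports user-defined pruning ranges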
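As a rough illustration of the uniform mode listed above, the sketch below drops the same fraction of filters from every layer. It is not PaddleSlim's implementation: the `uniform_prune` helper, the toy weights, and the choice of ranking filters by L1 norm are all assumptions made for this example.

```python
from typing import Dict, List


def l1_norm(filter_weights: List[float]) -> float:
    """Sum of absolute values of one filter's flattened weights."""
    return sum(abs(w) for w in filter_weights)


def uniform_prune(layers: Dict[str, List[List[float]]], ratio: float) -> Dict[str, List[int]]:
    """Return, per layer, the sorted indices of the filters to keep.

    Every layer drops the same fraction of its filters (the "uniform" mode).
    Ranking filters by L1 norm is an assumption of this sketch.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    kept: Dict[str, List[int]] = {}
    for name, filters in layers.items():
        n_drop = int(len(filters) * ratio)
        # Rank filter indices from smallest to largest L1 norm.
        ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
        kept[name] = sorted(ranked[n_drop:])
    return kept


# Hypothetical toy weights: each inner list is one flattened filter.
layers = {
    "conv1": [[0.1, -0.1], [1.0, 2.0], [0.5, 0.5], [-3.0, 0.2]],
    "conv2": [[0.9, 0.9], [0.01, 0.02]],
}
print(uniform_prune(layers, 0.5))
```

A sensitivity-based mode would typically choose a different ratio for each layer, depending on how much accuracy drops when that layer is pruned.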
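### Quantization training
- Supports two quantization training modes, dynamic and static
  - Dynamic strategy: the quantization parameters of activations are computed dynamically during inference.
  - Static strategy: during inference, the same quantization parameters, collected from the training data, are used for every input.
- Supports global quantization and channel-wise quantization of weights
- Supports saving models in a format compatible with Paddle Mobile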
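The difference between the dynamic and static strategies can be sketched in plain Python. This is a minimal illustration, not PaddleSlim code: the helper names are hypothetical, and symmetric int8 quantization with an `abs_max` scale (taking the largest value seen in the training batches for the static case) is an assumption.

```python
from typing import Callable, List


def abs_max(values: List[float]) -> float:
    """Largest absolute value in a batch of activations."""
    return max(abs(v) for v in values)


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """Map floats to integers in [-127, 127] using a symmetric scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    out = []
    for v in values:
        q = round(v / scale * 127)
        out.append(max(-127, min(127, q)))  # clip values outside the scale
    return out


def dynamic_quantize(batch: List[float]) -> List[int]:
    """Dynamic strategy: the scale is recomputed from each inference input."""
    return quantize_int8(batch, abs_max(batch))


def make_static_quantizer(training_batches: List[List[float]]) -> Callable[[List[float]], List[int]]:
    """Static strategy: one scale, collected from training data, reused for every input."""
    scale = max(abs_max(b) for b in training_batches)
    return lambda batch: quantize_int8(batch, scale)


static_quantize = make_static_quantizer([[0.5, -2.0], [1.0, 4.0]])  # scale = 4.0
batch = [1.0, -3.0]
print(dynamic_quantize(batch))  # scale taken from this batch (3.0)
print(static_quantize(batch))   # scale fixed at 4.0
```

With the static quantizer, an input larger than the collected scale is clipped to 127, which is the trade-off for not recomputing statistics at inference time.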
### Distillation
- Supports adding combined losses at arbitrary layers of the teacher and student networks
  - Supports FSP loss
  - Supports L2 loss
  - Supports softmax with cross-entropy loss
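To make the idea of a combined loss concrete, here is a small sketch of two of the listed losses and a weighted sum of them. The function names, the weights, and the example values are hypothetical, and FSP loss is left out for brevity; PaddleSlim's own interfaces may look different.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_loss(student_feat: List[float], teacher_feat: List[float]) -> float:
    """Mean squared difference between two same-length feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def soft_label_loss(student_logits: List[float], teacher_logits: List[float]) -> float:
    """Cross-entropy of the student's softmax against the teacher's softmax."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def combined_loss(task_loss: float, distill_losses: List[float], weights: List[float]) -> float:
    """Task loss plus a weighted sum of distillation losses."""
    return task_loss + sum(w * l for w, l in zip(weights, distill_losses))


# Hypothetical values for one training step.
feature_term = l2_loss([1.0, 2.0], [1.0, 4.0])
logit_term = soft_label_loss([0.0, 0.0], [2.0, 0.0])
print(combined_loss(1.0, [feature_term, logit_term], [0.5, 1.0]))
```

The soft-label term is smallest when the student's output distribution matches the teacher's, which is what pulls the student toward the teacher during training.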
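### Other features
- Supports managing compression-task hyperparameters through configuration files
- Supports combining multiple compression strategies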
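## Summary of Experimental Results
This section lists some experimental results for the PaddleSlim model compression toolkit. For more experimental data and pretrained model downloads, see [Detailed results and Model Zoo](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/model_zoo.md)
### Quantization training
Experiments are evaluated on the ImageNet 1000-class dataset; each cell reports top-5/top-1 accuracy: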
| Model | FP32 | int8 (X: abs_max, W: abs_max) | int8 (X: moving_average_abs_max, W: abs_max) | int8 (X: abs_max, W: channel_wise_abs_max) |
|:---|:---:|:---:|:---:|:---:|
|MobileNetV1|89.54%/70.91%|89.64%/71.01%|89.58%/70.86%|89.75%/71.13%|
|ResNet50|92.80%/76.35%|93.12%/76.77%|93.07%/76.65%|93.15%/76.80%|
### Convolution filter pruning
Data: ImageNet, 1000 classes
Model: MobileNetV1
Original model size: 17M
Original accuracy (top5/top1): 89.54% / 70.91%
#### Uniform pruning
| FLOPs | Model size | Accuracy loss (top5/top1) | Accuracy (top5/top1) |
|---|---|---|---|
| -50%|-47.0%(9.0M)|-0.41% / -1.08%|89.13% / 69.83%|
| -60%|-55.9%(7.5M)|-1.34% / -2.67%|88.22% / 68.24%|
| -70%|-65.3%(5.9M)|-2.55% / -4.34%|86.99% / 66.57%|
#### Iterative sensitivity-based pruning
| FLOPs | Accuracy (top5/top1) |
|---|---|
| -0% |89.54% / 70.91% |
| -20% |90.08% / 71.48% |
| -36% |89.62% / 70.83%|
| -50% |88.77% / 69.31%|
### Distillation
Data: ImageNet, 1000 classes
Model: MobileNetV1
| - | Accuracy (top5/top1) | Gain (top5/top1) |
|---|---|---|
| Trained alone | 89.54% / 70.91% | - |
| Distilled from ResNet50 | 90.92% / 71.97% | +1.28% / +1.06% |
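### Combined experiments
Data: ImageNet, 1000 classes
Model: MobileNetV1
| Compression strategy | Accuracy (top5/top1) | Model size |
|---|---|---|
| Baseline | 89.54% / 70.91% | 17.0M |
| ResNet50 distillation | 90.92% / 71.97% | 17.0M |
| ResNet50 distillation + quantization | 90.94% / 72.08% | 4.2M |
| Pruning -50% FLOPs | 89.13% / 69.83% | 9.0M |
| Pruning -50% FLOPs + quantization | 89.11% / 69.70% | 2.3M |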
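## Model Export Formats
The compression framework can export models in the following formats:
- **Paddle Fluid model format:** can be loaded and used through the Paddle framework.
- **Paddle Mobile model format:** used only with the quantization training strategy; compatible with the [Paddle Mobile](https://github.com/PaddlePaddle/paddle-mobile) model format.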
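# Model Compression
PaddleSlim is a model compression toolkit that contains a series of compression strategies, including model pruning, fixed-point quantization, knowledge distillation, hyperparameter search, and neural architecture search.
For business users, PaddleSlim provides complete model compression solutions for a wide range of vision scenarios such as image classification, detection, and segmentation.
It also continues to explore compression for NLP models. In addition, PaddleSlim provides, and keeps improving, benchmarks of each compression strategy on classic open-source tasks for business users to reference.
For researchers and developers of model compression algorithms, PaddleSlim provides low-level auxiliary interfaces for each compression strategy, making it easier to reproduce, study, and apply methods from the latest papers.
PaddleSlim supports developers' innovation in model compression strategies through low-level capabilities, technical consulting and cooperation, and business scenarios.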
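## Features
- Model pruning
  - Uniform pruning of convolution channels
  - Sensitivity-based pruning of convolution channels
  - Automated pruning based on evolutionary algorithms
- Fixed-point quantization
  - Online quantization training (training aware)
  - Offline quantization (post training)
- Knowledge distillation
  - Single-process knowledge distillation
  - Multi-process distributed knowledge distillation
- Neural architecture search (NAS)
  - Lightweight neural architecture search based on evolutionary algorithms
  - One-Shot neural architecture search
  - FLOPs / hardware latency constraints
  - Model latency evaluation on multiple platforms
  - User-defined search algorithms and search spaces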
## Installation
Requirements:
Paddle >= 1.7.0
```bash
pip install paddleslim -i https://pypi.org/simple
```
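## Usage
- [Quick Start](https://paddlepaddle.github.io/PaddleSlim/quick_start/index.html): introduces how to get started with PaddleSlim through simple examples.
- [Advanced Tutorials](https://paddlepaddle.github.io/PaddleSlim/tutorials/index.html): advanced PaddleSlim tutorials.
- [Model Zoo](https://paddlepaddle.github.io/PaddleSlim/model_zoo.html): experimental results of each compression strategy on image classification, object detection, and semantic segmentation models, including model accuracy, inference speed, and downloadable pretrained models.
- [API Documentation](https://paddlepaddle.github.io/PaddleSlim/api_cn/index.html)
- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim): describes how to use PaddleSlim in the Paddle detection library.
- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/slim): describes how to use PaddleSlim in the Paddle segmentation library.
- [PaddleLite](https://paddlepaddle.github.io/Paddle-Lite/): describes how to deploy models produced by PaddleSlim with the PaddleLite inference library.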
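## Results of Selected Compression Strategies
### Classification models
Data: ImageNet2012; Model: MobileNetV1;
| Compression strategy | Accuracy gain (baseline: 70.91%) | Model size (baseline: 17.0M) |
|:---:|:---:|:---:|
| Knowledge distillation (ResNet50) | **+1.06%** | - |
| Knowledge distillation (ResNet50) + int8 quantization training | **+1.10%** | **-71.76%** |
| Pruning (FLOPs -50%) + int8 quantization training | **-1.71%** | **-86.47%** |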
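The relative figures in the classification table can be converted back to absolute values with a few lines of arithmetic. The sketch below assumes that accuracy gains are percentage points added to the baseline top-1 and that size changes are relative to the 17.0M baseline; both readings are assumptions, and the helper names are hypothetical.

```python
BASELINE_TOP1 = 70.91   # percent, from the table header
BASELINE_SIZE_M = 17.0  # from the table header, in the table's "M" unit


def absolute_top1(gain_points: float) -> float:
    """Baseline top-1 plus a gain expressed in percentage points."""
    return round(BASELINE_TOP1 + gain_points, 2)


def absolute_size(relative_change_percent: float) -> float:
    """Baseline size scaled by a relative change such as -71.76 (percent)."""
    return round(BASELINE_SIZE_M * (1 + relative_change_percent / 100), 2)


# Knowledge distillation (ResNet50) + int8 quantization training
print(absolute_top1(1.10), absolute_size(-71.76))   # 72.01 4.8
# Pruning (FLOPs -50%) + int8 quantization training
print(absolute_top1(-1.71), absolute_size(-86.47))  # 69.2 2.3
```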
### Object detection models
#### Data: Pascal VOC; Model: MobileNet-V1-YOLOv3
| Compression method | mAP (baseline: 76.2%) | Model size (baseline: 94MB) |
| :---------------------: | :------------: | :------------:|
| Knowledge distillation (ResNet34-YOLOv3) | **+2.8%** | - |
| Pruning (FLOPs -52.88%) | **+1.4%** | **-67.76%** |
| Knowledge distillation (ResNet34-YOLOv3) + pruning (FLOPs -69.57%) | **+2.6%** | **-67.00%** |
#### Data: COCO; Model: MobileNet-V1-YOLOv3
| Compression method | mAP (baseline: 29.3%) | Model size |
| :---------------------: | :------------: | :------:|
| Knowledge distillation (ResNet34-YOLOv3) | **+2.1%** | - |
| Knowledge distillation (ResNet34-YOLOv3) + pruning (FLOPs -67.56%) | **-0.3%** | **-66.90%** |
### Search
Data: ImageNet2012; Model: MobileNetV2
| Hardware | Inference time | Top-1 accuracy (baseline: 71.90%) |
|:---------------:|:---------:|:--------------------:|
| RK3288 | **-23%** | +0.07% |
| Android cellphone | **-20%** | +0.16% |
| iPhone 6s | **-17%** | +0.32% |
Model Compression
==================
PaddleSlim is a toolkit for model compression. It contains a collection of compression strategies, such as pruning, fixed point quantization, knowledge distillation, hyperparameter searching and neural architecture search.
PaddleSlim provides compression solutions for computer vision models, such as image classification, object detection and semantic segmentation. Meanwhile, PaddleSlim keeps exploring advanced compression strategies for language models. Furthermore, benchmarks of compression strategies on some open tasks are available for your reference.
PaddleSlim also provides auxiliary and primitive APIs for developers and researchers to survey, implement and apply the methods in the latest papers. PaddleSlim supports developers through framework capabilities and technical consulting.
Features
----------
Pruning
+++++++++
- Uniform pruning of convolution
- Sensitivity-based pruning
- Automated pruning based on an evolutionary search strategy
- Support pruning of various deep architectures such as VGG, ResNet, and MobileNet.
- Support self-defined range of pruning, i.e., layers to be pruned.
Fixed Point Quantization
++++++++++++++++++++++++
- Training aware

  - Dynamic strategy: During inference, we quantize models with hyperparameters dynamically estimated from small batches of samples.
  - Static strategy: During inference, we quantize models with the same hyperparameters estimated from training data.
  - Support layer-wise and channel-wise quantization.

- Post training
Knowledge Distillation
+++++++++++++++++++++++
- Naive knowledge distillation: transfers dark knowledge by merging the teacher and student model into the same Program
- Paddle large-scale scalable knowledge distillation framework Pantheon: a universal solution for knowledge distillation, more flexible than the naive knowledge distillation, and easier to scale to the large-scale applications.

  - Decouple the teacher and student models --- they run in different processes in the same or different nodes, and transfer knowledge via TCP/IP ports or local files;
  - Friendly to assemble multiple teacher models and each of them can work in either online or offline mode independently;
  - Merge knowledge from different teachers and make batch data for the student model automatically;
  - Support the large-scale knowledge prediction of teacher models on multiple devices.
Neural Architecture Search
+++++++++++++++++++++++++++
- Neural architecture search based on evolution strategy.
- Support distributed search.
- One-Shot neural architecture search.
- Support FLOPs and latency constrained search.
- Support the latency estimation on different hardware and platforms.
Install
--------
Requires:
Paddle >= 1.7.0
.. code-block:: bash

   pip install paddleslim -i https://pypi.org/simple
Usage
------
- `QuickStart <https://paddlepaddle.github.io/PaddleSlim/quick_start/index_en.html>`_ : Introduces how to use PaddleSlim through simple examples.
- `Advanced Tutorials <https://paddlepaddle.github.io/PaddleSlim/tutorials/index_en.html>`_ : Tutorials about advanced usage of PaddleSlim.
- `Model Zoo <https://paddlepaddle.github.io/PaddleSlim/model_zoo_en.html>`_ : Benchmark and pretrained models.
- `API Documents <https://paddlepaddle.github.io/PaddleSlim/api_en/index_en.html>`_
- `PaddleDetection <https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim>`_ : Introduces how to use PaddleSlim in the PaddleDetection library.
- `PaddleSeg <https://github.com/PaddlePaddle/PaddleSeg/tree/develop/slim>`_ : Introduces how to use PaddleSlim in the PaddleSeg library.
- `PaddleLite <https://paddlepaddle.github.io/Paddle-Lite/>`_ : How to use PaddleLite to deploy models generated by PaddleSlim.
Performance
------------
Image Classification
+++++++++++++++++++++
Dataset: ImageNet2012; Model: MobileNetV1;
=====================================================  ===========================  ============================
Method                                                 Accuracy(baseline: 70.91%)   Model Size(baseline: 17.0M)
=====================================================  ===========================  ============================
Knowledge Distillation(ResNet50)                       +1.06%                       -
Knowledge Distillation(ResNet50) + int8 quantization   +1.10%                       -71.76%
Pruning(FLOPs-50%) + int8 quantization                 -1.71%                       -86.47%
=====================================================  ===========================  ============================
Object Detection
+++++++++++++++++
Dataset: Pascal VOC; Model: MobileNet-V1-YOLOv3
==============================================================  =====================  ===========================
Method                                                          mAP(baseline: 76.2%)   Model Size(baseline: 94MB)
==============================================================  =====================  ===========================
Knowledge Distillation(ResNet34-YOLOv3)                         +2.8%                  -
Pruning(FLOPs -52.88%)                                          +1.4%                  -67.76%
Knowledge Distillation(ResNet34-YOLOv3)+Pruning(FLOPs-69.57%)   +2.6%                  -67.00%
==============================================================  =====================  ===========================
Dataset: COCO; Model: MobileNet-V1-YOLOv3
==============================================================  =====================  ===========================
Method                                                          mAP(baseline: 29.3%)   Model Size
==============================================================  =====================  ===========================
Knowledge Distillation(ResNet34-YOLOv3)                         +2.1%                  -
Knowledge Distillation(ResNet34-YOLOv3)+Pruning(FLOPs-67.56%)   -0.3%                  -66.90%
==============================================================  =====================  ===========================
NAS
++++++
Dataset: ImageNet2012; Model: MobileNetV2
===================  ================  ===============================
Device               Infer time cost   Top1 accuracy(baseline:71.90%)
===================  ================  ===============================
RK3288               -23%              +0.07%
Android cellphone    -20%              +0.16%
iPhone 6s            -17%              +0.32%
===================  ================  ===============================