PaddlePaddle / PaddleSlim
Commit 9ce81dcc (unverified)
Authored on Feb 07, 2020 by Bai Yifan; committed via GitHub on Feb 07, 2020
add distillation quick-start (#91)
Parent: d80ed89f
Showing 2 changed files with 319 additions and 0 deletions (+319 −0)
demo/distillation/image_classification_distillation_tutorial.ipynb (+206 −0)
docs/zh_cn/quick_start/distillation_tutorial.md (+113 −0)
demo/distillation/image_classification_distillation_tutorial.ipynb (new file, mode 100644)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PaddleSlim Distillation知识蒸馏简介与实验\n",
"\n",
"一般情况下,模型参数量越多,结构越复杂,其性能越好,但参数也越冗余,运算量和资源消耗也越大。**知识蒸馏**就是一种将大模型学习到的有用信息(Dark Knowledge)压缩进更小更快的模型,而获得可以匹敌大模型结果的方法。\n",
"\n",
"在本文中性能强劲的大模型被称为teacher, 性能稍逊但体积较小的模型被称为student。示例包含以下步骤:\n",
"\n",
"1. 导入依赖\n",
"2. 定义student_program和teacher_program\n",
"3. 选择特征图\n",
"4. 合并program (merge)并添加蒸馏loss\n",
"5. 模型训练\n",
"\n",
"\n",
"## 1. 导入依赖\n",
"PaddleSlim依赖Paddle1.7版本,请确认已正确安装Paddle,然后按以下方式导入Paddle、PaddleSlim以及其他依赖:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import paddle\n",
"import paddle.fluid as fluid\n",
"import paddleslim as slim\n",
"import sys\n",
"sys.path.append(\"../\")\n",
"import models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. 定义student_program和teacher_program\n",
"\n",
"本教程在MNIST数据集上进行知识蒸馏的训练和验证,输入图片尺寸为`[1, 28, 28]`,输出类别数为10。\n",
"选择`ResNet50`作为teacher对`MobileNet`结构的student进行蒸馏训练。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"model = models.__dict__['MobileNet']()\n",
"student_program = fluid.Program()\n",
"student_startup = fluid.Program()\n",
"with fluid.program_guard(student_program, student_startup):\n",
" image = fluid.data(\n",
" name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
" label = fluid.data(name='label', shape=[None, 1], dtype='int64')\n",
" out = model.net(input=image, class_dim=10)\n",
" cost = fluid.layers.cross_entropy(input=out, label=label)\n",
" avg_cost = fluid.layers.mean(x=cost)\n",
" acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)\n",
" acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"teacher_model = models.__dict__['ResNet50']()\n",
"teacher_program = fluid.Program()\n",
"teacher_startup = fluid.Program()\n",
"with fluid.program_guard(teacher_program, teacher_startup):\n",
" with fluid.unique_name.guard():\n",
" image = fluid.data(\n",
" name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
" predict = teacher_model.net(image, class_dim=10)\n",
"exe = fluid.Executor(fluid.CPUPlace())\n",
"exe.run(teacher_startup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 选择特征图\n",
"我们可以用student_的list_vars方法来观察其中全部的Variables,从中选出一个或多个变量(Variable)来拟合teacher相应的变量。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get all student variables\n",
"student_vars = []\n",
"for v in student_program.list_vars():\n",
" student_vars.append((v.name, v.shape))\n",
"#uncomment the following lines to observe student's variables for distillation\n",
"#print(\"=\"*50+\"student_model_vars\"+\"=\"*50)\n",
"#print(student_vars)\n",
"\n",
"# get all teacher variables\n",
"teacher_vars = []\n",
"for v in teacher_program.list_vars():\n",
" teacher_vars.append((v.name, v.shape))\n",
"#uncomment the following lines to observe teacher's variables for distillation\n",
"#print(\"=\"*50+\"teacher_model_vars\"+\"=\"*50)\n",
"#print(teacher_vars)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"经过筛选我们可以看到,teacher_program中的'bn5c_branch2b.output.1.tmp_3'和student_program的'depthwise_conv2d_11.tmp_0'尺寸一致,可以组成蒸馏损失函数。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. 合并program (merge)并添加蒸馏loss\n",
"merge操作将student_program和teacher_program中的所有Variables和Op都将被添加到同一个Program中,同时为了避免两个program中有同名变量会引起命名冲突,merge也会为teacher_program中的Variables添加一个同一的命名前缀name_prefix,其默认值是'teacher_'\n",
"\n",
"为了确保teacher网络和student网络输入的数据是一样的,merge操作也会对两个program的输入数据层进行合并操作,所以需要指定一个数据层名称的映射关系data_name_map,key是teacher的输入数据名称,value是student的"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_name_map = {'image': 'image'}\n",
"main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())\n",
"with fluid.program_guard(student_program, student_startup):\n",
" l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)\n",
" loss = l2_loss + avg_cost\n",
" opt = fluid.optimizer.Momentum(0.01, 0.9)\n",
" opt.minimize(loss)\n",
"exe.run(student_startup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. 模型训练\n",
"\n",
"为了快速执行该示例,我们选取简单的MNIST数据,Paddle框架的`paddle.dataset.mnist`包定义了MNIST数据的下载和读取。\n",
"代码如下:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_reader = paddle.batch(\n",
" paddle.dataset.mnist.train(), batch_size=128, drop_last=True)\n",
"train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for data in train_reader():\n",
" acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])\n",
" print(\"Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}\".format(acc1.mean(), acc5.mean(), loss_np.mean()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
docs/zh_cn/quick_start/distillation_tutorial.md (new file, mode 100755)
# Knowledge Distillation for Image Classification Models: Quick Start

This tutorial takes the image classification model MobileNetV1 as an example to show how to quickly use the [PaddleSlim knowledge distillation API](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/).
The example includes the following steps:

1. Import dependencies
2. Define student_program and teacher_program
3. Select feature maps
4. Merge the programs and add the distillation loss
5. Train the model

The following sections describe each step in turn.
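Before diving in, the objective used in this tutorial, an L2 loss between a matched pair of feature maps added to the student's own classification loss, can be sketched in plain NumPy. All names and shapes below are illustrative stand-ins, not taken from the real programs:

```python
import numpy as np

def l2_distill_loss(student_feat, teacher_feat):
    # Mean squared difference between a matched pair of feature maps.
    return np.mean((student_feat - teacher_feat) ** 2)

def cross_entropy(probs, label):
    # The student's ordinary classification loss for a single sample.
    return -np.log(probs[label])

rng = np.random.default_rng(0)
student_feat = rng.standard_normal((512, 4, 4))  # illustrative shapes
teacher_feat = rng.standard_normal((512, 4, 4))
probs = np.full(10, 0.1)  # uniform prediction over 10 classes

total = l2_distill_loss(student_feat, teacher_feat) + cross_entropy(probs, 0)
print("total loss:", float(total))
```

Minimizing the first term pulls the student's intermediate features toward the teacher's, while the second term keeps the student fitting the labels directly.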
## 1. Import dependencies

PaddleSlim requires Paddle 1.7. Make sure Paddle is installed correctly, then import Paddle and PaddleSlim:
```python
import paddle
import paddle.fluid as fluid
import paddleslim as slim
import sys
sys.path.append("../")  # the `models` package used below lives in the demo directory
import models
```
## 2. Define student_program and teacher_program

This tutorial trains and validates knowledge distillation on the MNIST dataset; the input image shape is `[1, 28, 28]` and there are 10 output classes.
`ResNet50` is chosen as the teacher to distill a student with the `MobileNet` architecture.
```python
model = models.__dict__['MobileNet']()
student_program = fluid.Program()
student_startup = fluid.Program()
with fluid.program_guard(student_program, student_startup):
    image = fluid.data(
        name='image', shape=[None] + [1, 28, 28], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')
    out = model.net(input=image, class_dim=10)
    cost = fluid.layers.cross_entropy(input=out, label=label)
    avg_cost = fluid.layers.mean(x=cost)
    acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
    acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
```
```python
teacher_model = models.__dict__['ResNet50']()
teacher_program = fluid.Program()
teacher_startup = fluid.Program()
with fluid.program_guard(teacher_program, teacher_startup):
    with fluid.unique_name.guard():
        image = fluid.data(
            name='image', shape=[None] + [1, 28, 28], dtype='float32')
        predict = teacher_model.net(image, class_dim=10)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(teacher_startup)
```
## 3. Select feature maps

We can use student_program's list_vars method to inspect all of its Variables, and pick one or more of them to mimic the corresponding teacher variables.
```python
# get all student variables
student_vars = []
for v in student_program.list_vars():
    student_vars.append((v.name, v.shape))
# uncomment the following lines to observe student's variables for distillation
# print("="*50+"student_model_vars"+"="*50)
# print(student_vars)

# get all teacher variables
teacher_vars = []
for v in teacher_program.list_vars():
    teacher_vars.append((v.name, v.shape))
# uncomment the following lines to observe teacher's variables for distillation
# print("="*50+"teacher_model_vars"+"="*50)
# print(teacher_vars)
```
After filtering we can see that 'bn5c_branch2b.output.1.tmp_3' in teacher_program and 'depthwise_conv2d_11.tmp_0' in student_program have the same shape, so they can form a distillation loss.
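How such a pair can be spotted can be illustrated by cross-matching the (name, shape) lists collected above. The helper below is a hypothetical sketch, and the variable lists are toy stand-ins rather than real program variables:

```python
def matching_pairs(student_vars, teacher_vars):
    # Return (student_name, teacher_name) pairs whose shapes are identical.
    pairs = []
    for s_name, s_shape in student_vars:
        for t_name, t_shape in teacher_vars:
            if s_shape == t_shape:
                pairs.append((s_name, t_name))
    return pairs

# Toy stand-ins for the lists built by list_vars() above.
student_vars = [('depthwise_conv2d_11.tmp_0', (-1, 512, 4, 4)),
                ('fc_0.tmp_1', (-1, 10))]
teacher_vars = [('bn5c_branch2b.output.1.tmp_3', (-1, 512, 4, 4)),
                ('fc_1.tmp_1', (-1, 1000))]
print(matching_pairs(student_vars, teacher_vars))
# -> [('depthwise_conv2d_11.tmp_0', 'bn5c_branch2b.output.1.tmp_3')]
```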
## 4. Merge the programs and add the distillation loss

The merge operation adds all Variables and Ops of student_program and teacher_program into a single Program. To avoid name clashes between identically named variables in the two programs, merge also adds a uniform name prefix, name_prefix, to the Variables of teacher_program; its default value is 'teacher_'.

To ensure the teacher and student networks receive the same input data, merge also merges the input data layers of the two programs, so a data-layer name mapping, data_name_map, must be specified, where the key is the teacher's input data name and the value is the student's.
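The renaming that merge performs can be pictured with plain dictionaries. This is a simplified illustration of the naming rules only, not the real PaddleSlim implementation:

```python
def merge_names(student_vars, teacher_vars, data_name_map, name_prefix='teacher_'):
    # Student variables keep their names; teacher variables get the prefix,
    # except input layers, which are unified via data_name_map.
    merged = dict(student_vars)
    for name, shape in teacher_vars.items():
        if name in data_name_map:
            merged[data_name_map[name]] = shape  # shared input data layer
        else:
            merged[name_prefix + name] = shape
    return merged

student = {'image': (-1, 1, 28, 28), 'conv1.tmp_0': (-1, 32, 28, 28)}
teacher = {'image': (-1, 1, 28, 28), 'conv1.tmp_0': (-1, 64, 28, 28)}
merged = merge_names(student, teacher, data_name_map={'image': 'image'})
print(sorted(merged))  # ['conv1.tmp_0', 'image', 'teacher_conv1.tmp_0']
```

This is why the distillation loss below refers to the teacher variable as 'teacher_bn5c_branch2b.output.1.tmp_3'.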
```python
data_name_map = {'image': 'image'}
main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())
with fluid.program_guard(student_program, student_startup):
    l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)
    loss = l2_loss + avg_cost
    opt = fluid.optimizer.Momentum(0.01, 0.9)
    opt.minimize(loss)
exe.run(student_startup)
```
## 5. Train the model

To run this example quickly we use the simple MNIST dataset; the `paddle.dataset.mnist` package of the Paddle framework defines the download and reading of MNIST data. The code is as follows:
```python
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=128, drop_last=True)
train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)
```
```python
for data in train_reader():
    acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])
    print("Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}".format(acc1.mean(), acc5.mean(), loss_np.mean()))
```