diff --git a/demo/distillation/image_classification_distillation_tutorial.ipynb b/demo/distillation/image_classification_distillation_tutorial.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..28c1de23050fd1944401c095304dba4674aa965b
--- /dev/null
+++ b/demo/distillation/image_classification_distillation_tutorial.ipynb
@@ -0,0 +1,206 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# PaddleSlim Distillation知识蒸馏简介与实验\n",
+    "\n",
+    "一般情况下，模型参数量越多，结构越复杂，其性能越好，但参数也越冗余，运算量和资源消耗也越大。**知识蒸馏**就是一种将大模型学习到的有用信息（Dark Knowledge）压缩进更小更快的模型，而获得可以匹敌大模型结果的方法。\n",
+    "\n",
+    "在本文中性能强劲的大模型被称为teacher, 性能稍逊但体积较小的模型被称为student。示例包含以下步骤：\n",
+    "\n",
+    "1. 导入依赖\n",
+    "2. 定义student_program和teacher_program\n",
+    "3. 选择特征图\n",
+    "4. 合并program (merge)并添加蒸馏loss\n",
+    "5. 模型训练\n",
+    "\n",
+    "\n",
+    "## 1. 导入依赖\n",
+    "PaddleSlim依赖Paddle1.7版本，请确认已正确安装Paddle，然后按以下方式导入Paddle、PaddleSlim以及其他依赖:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import paddle\n",
+    "import paddle.fluid as fluid\n",
+    "import paddleslim as slim\n",
+    "import sys\n",
+    "sys.path.append(\"../\")\n",
+    "import models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. 定义student_program和teacher_program\n",
+    "\n",
+    "本教程在MNIST数据集上进行知识蒸馏的训练和验证，输入图片尺寸为`[1, 28, 28]`，输出类别数为10。\n",
+    "选择`ResNet50`作为teacher对`MobileNet`结构的student进行蒸馏训练。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "model = models.__dict__['MobileNet']()\n",
+    "student_program = fluid.Program()\n",
+    "student_startup = fluid.Program()\n",
+    "with fluid.program_guard(student_program, student_startup):\n",
+    "    image = fluid.data(\n",
+    "        name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
+    "    label = fluid.data(name='label', shape=[None, 1], dtype='int64')\n",
+    "    out = model.net(input=image, class_dim=10)\n",
+    "    cost = fluid.layers.cross_entropy(input=out, label=label)\n",
+    "    avg_cost = fluid.layers.mean(x=cost)\n",
+    "    acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)\n",
+    "    acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "teacher_model = models.__dict__['ResNet50']()\n",
+    "teacher_program = fluid.Program()\n",
+    "teacher_startup = fluid.Program()\n",
+    "with fluid.program_guard(teacher_program, teacher_startup):\n",
+    "    with fluid.unique_name.guard():\n",
+    "        image = fluid.data(\n",
+    "            name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
+    "        predict = teacher_model.net(image, class_dim=10)\n",
+    "exe = fluid.Executor(fluid.CPUPlace())\n",
+    "exe.run(teacher_startup)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. 选择特征图\n",
+    "我们可以用student_的list_vars方法来观察其中全部的Variables，从中选出一个或多个变量（Variable）来拟合teacher相应的变量。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# get all student variables\n",
+    "student_vars = []\n",
+    "for v in student_program.list_vars():\n",
+    "    student_vars.append((v.name, v.shape))\n",
+    "#uncomment the following lines to observe student's variables for distillation\n",
+    "#print(\"=\"*50+\"student_model_vars\"+\"=\"*50)\n",
+    "#print(student_vars)\n",
+    "\n",
+    "# get all teacher variables\n",
+    "teacher_vars = []\n",
+    "for v in teacher_program.list_vars():\n",
+    "    teacher_vars.append((v.name, v.shape))\n",
+    "#uncomment the following lines to observe teacher's variables for distillation\n",
+    "#print(\"=\"*50+\"teacher_model_vars\"+\"=\"*50)\n",
+    "#print(teacher_vars)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "经过筛选我们可以看到，teacher_program中的'bn5c_branch2b.output.1.tmp_3'和student_program的'depthwise_conv2d_11.tmp_0'尺寸一致，可以组成蒸馏损失函数。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. 合并program (merge)并添加蒸馏loss\n",
+    "merge操作将student_program和teacher_program中的所有Variables和Op都将被添加到同一个Program中，同时为了避免两个program中有同名变量会引起命名冲突，merge也会为teacher_program中的Variables添加一个同一的命名前缀name_prefix，其默认值是'teacher_'\n",
+    "\n",
+    "为了确保teacher网络和student网络输入的数据是一样的，merge操作也会对两个program的输入数据层进行合并操作，所以需要指定一个数据层名称的映射关系data_name_map，key是teacher的输入数据名称，value是student的"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_name_map = {'image': 'image'}\n",
+    "main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())\n",
+    "with fluid.program_guard(student_program, student_startup):\n",
+    "    l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)\n",
+    "    loss = l2_loss + avg_cost\n",
+    "    opt = fluid.optimizer.Momentum(0.01, 0.9)\n",
+    "    opt.minimize(loss)\n",
+    "exe.run(student_startup)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. 模型训练\n",
+    "\n",
+    "为了快速执行该示例，我们选取简单的MNIST数据，Paddle框架的`paddle.dataset.mnist`包定义了MNIST数据的下载和读取。\n",
+    "代码如下："
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_reader = paddle.batch(\n",
+    "    paddle.dataset.mnist.train(), batch_size=128, drop_last=True)\n",
+    "train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for data in train_reader():\n",
+    "    acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])\n",
+    "    print(\"Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}\".format(acc1.mean(), acc5.mean(), loss_np.mean()))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/docs/zh_cn/quick_start/distillation_tutorial.md b/docs/zh_cn/quick_start/distillation_tutorial.md
new file mode 100755
index 0000000000000000000000000000000000000000..d998e338afda7a9d607e88ef75c670d273e4cf72
--- /dev/null
+++ b/docs/zh_cn/quick_start/distillation_tutorial.md
@@ -0,0 +1,113 @@
+#  图像分类模型知识蒸馏-快速开始
+
+该教程以图像分类模型MobileNetV1为例，说明如何快速使用[PaddleSlim的知识蒸馏接口](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)。
+该示例包含以下步骤：
+
+1. 导入依赖
+2. 定义student_program和teacher_program
+3. 选择特征图
+4. 合并program(merge)并添加蒸馏loss
+5. 模型训练
+
+以下章节依次介绍每个步骤的内容。
+
+## 1. 导入依赖
+
+PaddleSlim依赖Paddle1.7版本，请确认已正确安装Paddle，然后按以下方式导入Paddle和PaddleSlim:
+
+```
+import paddle
+import paddle.fluid as fluid
+import paddleslim as slim
+```
+
+## 2. 定义student_program和teacher_program
+
+本教程在MNIST数据集上进行知识蒸馏的训练和验证，输入图片尺寸为`[1, 28, 28]`，输出类别数为10。
+选择`ResNet50`作为teacher对`MobileNet`结构的student进行蒸馏训练。
+
+```python
+model = models.__dict__['MobileNet']()
+student_program = fluid.Program()
+student_startup = fluid.Program()
+with fluid.program_guard(student_program, student_startup):
+    image = fluid.data(
+        name='image', shape=[None] + [1, 28, 28], dtype='float32')
+    label = fluid.data(name='label', shape=[None, 1], dtype='int64')
+    out = model.net(input=image, class_dim=10)
+    cost = fluid.layers.cross_entropy(input=out, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
+    acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
+```
+
+
+
+```python
+teacher_model = models.__dict__['ResNet50']()
+teacher_program = fluid.Program()
+teacher_startup = fluid.Program()
+with fluid.program_guard(teacher_program, teacher_startup):
+    with fluid.unique_name.guard():
+        image = fluid.data(
+            name='image', shape=[None] + [1, 28, 28], dtype='float32')
+        predict = teacher_model.net(image, class_dim=10)
+exe = fluid.Executor(fluid.CPUPlace())
+exe.run(teacher_startup)
+```
+
+## 3. 选择特征图
+
+我们可以用student_的list_vars方法来观察其中全部的Variables，从中选出一个或多个变量（Variable）来拟合teacher相应的变量。
+
+```python
+# get all student variables
+student_vars = []
+for v in student_program.list_vars():
+    student_vars.append((v.name, v.shape))
+#uncomment the following lines to observe student's variables for distillation
+#print("="*50+"student_model_vars"+"="*50)
+#print(student_vars)
+
+# get all teacher variables
+teacher_vars = []
+for v in teacher_program.list_vars():
+    teacher_vars.append((v.name, v.shape))
+#uncomment the following lines to observe teacher's variables for distillation
+#print("="*50+"teacher_model_vars"+"="*50)
+#print(teacher_vars)
+```
+
+经过筛选我们可以看到，teacher_program中的'bn5c_branch2b.output.1.tmp_3'和student_program的'depthwise_conv2d_11.tmp_0'尺寸一致，可以组成蒸馏损失函数。
+
+## 4. 合并program (merge)并添加蒸馏loss
+merge操作将student_program和teacher_program中的所有Variables和Op都将被添加到同一个Program中，同时为了避免两个program中有同名变量会引起命名冲突，merge也会为teacher_program中的Variables添加一个同一的命名前缀name_prefix，其默认值是'teacher_'
+
+为了确保teacher网络和student网络输入的数据是一样的，merge操作也会对两个program的输入数据层进行合并操作，所以需要指定一个数据层名称的映射关系data_name_map，key是teacher的输入数据名称，value是student的
+
+```python
+data_name_map = {'image': 'image'}
+main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())
+with fluid.program_guard(student_program, student_startup):
+    l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)
+    loss = l2_loss + avg_cost
+    opt = fluid.optimizer.Momentum(0.01, 0.9)
+    opt.minimize(loss)
+exe.run(student_startup)
+```
+
+## 5. 模型训练
+
+为了快速执行该示例，我们选取简单的MNIST数据，Paddle框架的`paddle.dataset.mnist`包定义了MNIST数据的下载和读取。 代码如下：
+
+```python
+train_reader = paddle.batch(
+    paddle.dataset.mnist.train(), batch_size=128, drop_last=True)
+train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)
+```
+
+```python
+for data in train_reader():
+    acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])
+    print("Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}".format(acc1.mean(), acc5.mean(), loss_np.mean()))
+```