{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Introduction to the ERNIE 3.0 Lightweight Models\n", "\n", "The [ERNIE 3.0 lightweight models](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0) open-sourced in PaddleNLP are obtained from the Wenxin large model ERNIE 3.0 through online distillation. Their architecture is the same as that of ERNIE 2.0, and they perform better than ERNIE 2.0 on Chinese tasks.\n", "\n", "For a detailed technical explanation, see the article [《解析全球最大中文单体模型鹏城-百度·文心技术细节》](https://www.jiqizhixin.com/articles/2021-12-08-9) (in Chinese).\n", "\n", "# 2. Model Performance\n", "\n", "Five ERNIE 3.0 lightweight models are open-sourced: **ERNIE 3.0 _Base_**, **ERNIE 3.0 _Medium_**, **ERNIE 3.0 _Mini_**, **ERNIE 3.0 _Micro_**, and **ERNIE 3.0 _Nano_**:\n", "\n", "- [**ERNIE 3.0-_Base_**](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams) (_12-layer, 768-hidden, 12-heads_)\n", "- [**ERNIE 3.0-_Medium_**](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_medium_zh.pdparams) (_6-layer, 768-hidden, 12-heads_)\n", "- [**ERNIE 3.0-_Mini_**](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_mini_zh.pdparams) (_6-layer, 384-hidden, 12-heads_)\n", "- [**ERNIE 3.0-_Micro_**](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_micro_zh.pdparams) (_4-layer, 384-hidden, 12-heads_)\n", "- [**ERNIE 3.0-_Nano_**](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh.pdparams) (_4-layer, 312-hidden, 12-heads_)\n", "\n", "\n", "The figures below are **accuracy-latency plots** for the lightweight Chinese models in PaddleNLP. The x-axis is the latency (in ms) measured on the IFLYTEK dataset with the maximum sequence length set to 128. The y-axis is the average accuracy over 10 CLUE tasks (covering text classification, text matching, natural language inference, pronoun disambiguation, reading comprehension, and others). The CMRC2018 reading comprehension task is scored by Exact Match (EM); all other tasks are scored by Accuracy. Models closer to the **upper left** of a plot have higher accuracy and lower latency.\n", "\n", "The parameter count of each model is shown below its name in the plots. For the test environment, see [Performance Testing](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0#%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95).\n", "\n", "[Figure: accuracy-latency plots on CPU with batch_size=32 (1 thread and 8 threads)]\n", "
\n", "\n", "[Figure: accuracy-latency plots on CPU with batch_size=1 (1 thread and 8 threads)]\n", "
\n", "\n", "[Figure: accuracy-latency plots on GPU with FP16 inference precision (batch_size=32 and batch_size=1)]\n", "
\n", "\n", "The plots show that the ERNIE 3.0 lightweight models outperform the Chinese models from UER-py, Huawei-Noah, and HFL in the overall trade-off between accuracy and speed. In addition, with batch_size=1 and FP16 inference precision on GPU, wide and shallow models have the advantage in inference speed.\n", "\n", "The table below lists the evaluation metrics on the CLUE **dev set**:\n", "\n", "
\n", "| Arch | Model | AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUEWSC2020 | CSL | CMRC2018 | CHID | C3 |\n",
"| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
"| 24L1024H | ERNIE 1.0-Large-cw | 79.03 | 75.97 | 59.65 | 62.91 | 85.09 | 81.73 | 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54 |\n",
"| 24L1024H | ERNIE 2.0-Large-zh | 76.90 | 76.23 | 59.33 | 61.91 | 83.85 | 79.93 | 89.82 | 83.23 | 70.95/90.31 | 86.78 | 78.12 |\n",
"| 24L1024H | RoBERTa-wwm-ext-large | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26 |\n",
"| 20L1024H | ERNIE 3.0-Xbase-zh | 78.39 | 76.16 | 59.55 | 61.87 | 84.40 | 81.73 | 88.82 | 83.60 | 75.99/93.00 | 86.78 | 84.98 |\n",
"| 12L768H | **ERNIE 3.0-Base-zh** | 76.05 | 75.93 | 58.26 | 61.56 | 83.02 | 80.10 | 86.18 | 82.63 | 70.71/90.41 | 84.26 | 77.88 |\n",
"| 12L768H | ERNIE 1.0-Base-zh-cw | 76.47 | 76.07 | 57.86 | 59.91 | 83.41 | 79.58 | 89.91 | 83.42 | 72.88/90.78 | 84.68 | 76.98 |\n",
"| 12L768H | ERNIE-Gram-zh | 75.72 | 75.28 | 57.88 | 60.87 | 82.90 | 79.08 | 88.82 | 82.83 | 71.82/90.38 | 84.04 | 73.69 |\n",
"| 12L768H | Langboat/Mengzi-BERT-Base | 74.69 | 75.35 | 57.76 | 61.64 | 82.41 | 77.93 | 88.16 | 82.20 | 67.04/88.35 | 83.74 | 70.70 |\n",
"| 12L768H | ERNIE 2.0-Base-zh | 74.32 | 75.65 | 58.25 | 61.64 | 82.62 | 78.71 | 81.91 | 82.33 | 66.08/87.46 | 82.78 | 73.19 |\n",
"| 12L768H | ERNIE 1.0-Base-zh | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68 |\n",
"| 12L768H | RoBERTa-wwm-ext | 74.11 | 74.60 | 58.08 | 61.23 | 81.11 | 76.92 | 88.49 | 80.77 | 68.39/88.50 | 83.43 | 68.03 |\n",
"| 12L768H | BERT-Base-Chinese | 72.57 | 74.63 | 57.13 | 61.29 | 80.97 | 75.22 | 81.91 | 81.90 | 65.30/86.53 | 82.01 | 65.38 |\n",
"| 12L768H | UER/Chinese-RoBERTa-Base | 71.78 | 72.89 | 57.62 | 61.14 | 80.01 | 75.56 | 81.58 | 80.80 | 63.87/84.95 | 81.52 | 62.76 |\n",
"| 8L512H | UER/Chinese-RoBERTa-Medium | 67.06 | 70.64 | 56.10 | 58.29 | 77.35 | 71.90 | 68.09 | 78.63 | 57.63/78.91 | 75.13 | 56.84 |\n",
"| 6L768H | **ERNIE 3.0-Medium-zh** | 72.49 | 73.37 | 57.00 | 60.67 | 80.64 | 76.88 | 79.28 | 81.60 | 65.83/87.30 | 79.91 | 69.73 |\n",
"| 6L768H | HFL/RBT6, Chinese | 70.06 | 73.45 | 56.82 | 59.64 | 79.36 | 73.32 | 76.64 | 80.67 | 62.72/84.77 | 78.17 | 59.85 |\n",
"| 6L768H | TinyBERT6, Chinese | 69.62 | 72.22 | 55.70 | 54.48 | 79.12 | 74.07 | 77.63 | 80.17 | 63.03/83.75 | 77.64 | 62.11 |\n",
"| 6L768H | RoFormerV2 Small | 68.52 | 72.47 | 56.53 | 60.72 | 76.37 | 72.95 | 75.00 | 81.07 | 62.97/83.64 | 67.66 | 59.41 |\n",
"| 6L768H | UER/Chinese-RoBERTa-L6-H768 | 67.09 | 70.13 | 56.54 | 60.48 | 77.49 | 72.00 | 72.04 | 77.33 | 53.74/75.52 | 76.73 | 54.40 |\n",
"| 6L384H | **ERNIE 3.0-Mini-zh** | 66.90 | 71.85 | 55.24 | 54.48 | 77.19 | 73.08 | 71.05 | 79.30 | 58.53/81.97 | 69.71 | 58.60 |\n",
"| 4L768H | HFL/RBT4, Chinese | 67.42 | 72.41 | 56.50 | 58.95 | 77.34 | 70.78 | 71.05 | 78.23 | 59.30/81.93 | 73.18 | 56.45 |\n",
"| 4L512H | UER/Chinese-RoBERTa-Small | 63.25 | 69.21 | 55.41 | 57.552 | 73.64 | 69.80 | 66.78 | 74.83 | 46.75/69.69 | 67.59 | 50.92 |\n",
"| 4L384H | **ERNIE 3.0-Micro-zh** | 64.21 | 71.15 | 55.05 | 53.83 | 74.81 | 70.41 | 69.08 | 76.50 | 53.77/77.82 | 62.26 | 55.53 |\n",
"| 4L312H | **ERNIE 3.0-Nano-zh** | 62.97 | 70.51 | 54.57 | 48.36 | 74.97 | 70.61 | 68.75 | 75.93 | 52.00/76.35 | 58.91 | 55.11 |\n",
"| 4L312H | TinyBERT4, Chinese | 60.82 | 69.07 | 54.02 | 39.71 | 73.94 | 69.59 | 70.07 | 75.07 | 46.04/69.34 | 58.53 | 52.18 |\n",
"| 4L256H | UER/Chinese-RoBERTa-Mini | 53.40 | 69.32 | 54.22 | 41.63 | 69.40 | 67.36 | 65.13 | 70.07 | 5.96/17.13 | 51.19 | 39.68 |\n",
"| 3L1024H | HFL/RBTL3, Chinese | 66.63 | 71.11 | 56.14 | 59.56 | 76.41 | 71.29 | 69.74 | 76.93 | 58.50/80.90 | 71.03 | 55.56 |\n",
"| 3L768H | HFL/RBT3, Chinese | 65.72 | 70.95 | 55.53 | 59.18 | 76.20 | 70.71 | 67.11 | 76.63 | 55.73/78.63 | 70.26 | 54.93 |\n",
"| 2L128H | UER/Chinese-RoBERTa-Tiny | 44.45 | 69.02 | 51.47 | 20.28 | 59.95 | 57.73 | 63.82 | 67.43 | 3.08/14.33 | 23.57 | 28.12 |\n",
"
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. How to Use the Models\n", "\n", "Install the latest paddlenlp package:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-11T06:50:18.329409Z", "iopub.status.busy": "2022-11-11T06:50:18.328968Z", "iopub.status.idle": "2022-11-11T06:53:07.814814Z", "shell.execute_reply": "2022-11-11T06:53:07.813775Z", "shell.execute_reply.started": "2022-11-11T06:50:18.329379Z" }, "scrolled": true }, "outputs": [], "source": [ "!pip install paddlenlp --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tuning\n", "\n", "With PaddleNLP, a single line of code loads an ERNIE 3.0 pretrained model. You can then fine-tune it on your own downstream data to obtain a model that performs better on your specific task.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-11T06:53:07.817337Z", "iopub.status.busy": "2022-11-11T06:53:07.816843Z", "iopub.status.idle": "2022-11-11T06:53:56.003766Z", "shell.execute_reply": "2022-11-11T06:53:56.002942Z", "shell.execute_reply.started": "2022-11-11T06:53:07.817293Z" }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "from paddlenlp.transformers import (\n", "    AutoModelForQuestionAnswering,\n", "    AutoModelForSequenceClassification,\n", "    AutoModelForTokenClassification,\n", "    AutoTokenizer,\n", ")\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"ernie-3.0-medium-zh\")\n", "\n", "# For sequence classification tasks\n", "seq_cls_model = AutoModelForSequenceClassification.from_pretrained(\"ernie-3.0-medium-zh\")\n", "\n", "# For sequence labeling (token classification) tasks\n", "token_cls_model = AutoModelForTokenClassification.from_pretrained(\"ernie-3.0-medium-zh\")\n", "\n", "# For reading comprehension (question answering) tasks\n", "qa_model = AutoModelForQuestionAnswering.from_pretrained(\"ernie-3.0-medium-zh\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following script fine-tunes **ERNIE 3.0-Medium** on the CLUE IFLYTEK text classification dataset:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-11T06:53:56.006231Z", "iopub.status.busy": "2022-11-11T06:53:56.005136Z", "iopub.status.idle": "2022-11-11T06:55:21.095442Z", "shell.execute_reply": "2022-11-11T06:55:21.094490Z", "shell.execute_reply.started": "2022-11-11T06:53:56.006195Z" }, "scrolled": true }, "outputs": [], "source": [ "!git clone https://gitee.com/paddlepaddle/PaddleNLP.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-11T06:56:23.255848Z", "iopub.status.busy": "2022-11-11T06:56:23.255438Z", "iopub.status.idle": "2022-11-11T06:58:21.333380Z", "shell.execute_reply": "2022-11-11T06:58:21.332474Z", "shell.execute_reply.started": "2022-11-11T06:56:23.255822Z" }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# Classification task\n", "# This script supports 7 CLUE classification tasks. Their hyperparameters differ, so the hyperparameters for classification tasks are set in config.yml.\n", "!python PaddleNLP/model_zoo/ernie-3.0/run_seq_cls.py \\\n", "    --task_name iflytek \\\n", "    --model_name_or_path ernie-3.0-medium-zh \\\n", "    --do_train\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Compression\n", "\n", "If you need to deploy the model to production, you can reduce its size further by applying the model compression scheme and API to the model fine-tuned in the previous step.\n", "\n", "For how to use the model compression API, see the [documentation](../../docs/compression.md). The compression API likewise supports classification (including text classification, text matching, natural language inference, and pronoun disambiguation), sequence labeling, reading comprehension, information extraction, and other natural language processing scenarios.\n", "\n", "The model saved after compression can be used directly for deployment.\n", "\n", "## Deployment\n", "\n", "We provide [multiple deployment options](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0#%E9%83%A8%E7%BD%B2) for ERNIE 3.0 to cover the needs of different scenarios. Choose the one that fits your situation.\n", "

\n", "

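\n", "\n", "\n", "# 4. How It Works\n", "\n", "### Online Distillation\n", "\n", "Online distillation periodically passes knowledge signals from the teacher to several student models that are trained simultaneously while the teacher itself is still learning, so student models of multiple sizes are produced in a single distillation stage. Compared with conventional distillation, this greatly reduces the compute spent on extra distillation passes with the large model and on repeating the knowledge transfer for each student.\n", "\n", "This approach exploits the scale of the Wenxin large model and, once distillation is complete, yields student models that are both effective and available in a range of sizes, which suits applications with different performance requirements. However, because the Wenxin large model is so much larger than the student models, distillation is very difficult and can easily fail. To address this, a teaching-assistant model is introduced: the assistant acts as a bridge for knowledge transfer, narrowing the gap between the representation spaces of the student models and the large model and thereby improving distillation efficiency.\n", "\n", "For more technical details, see the papers:\n", "- [ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression](https://arxiv.org/abs/2106.02241)\n", "- [ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation](https://arxiv.org/abs/2112.12731)\n", "\n", "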

\n", "

\n", "\n", "\n", "\n", "# 5. Related Papers and Citation\n", "\n", "\n", "```text\n", "@article{sun2021ernie,\n", "  title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},\n", "  author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},\n", "  journal={arXiv preprint arXiv:2107.02137},\n", "  year={2021}\n", "}\n", "\n", "@article{su2021ernie,\n", "  title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},\n", "  author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},\n", "  journal={arXiv preprint arXiv:2106.02241},\n", "  year={2021}\n", "}\n", "\n", "@article{wang2021ernie,\n", "  title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},\n", "  author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},\n", "  journal={arXiv preprint arXiv:2112.12731},\n", "  year={2021}\n", "}\n", "```\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "py35-paddle1.2.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }