5. The unit of the performance data is QPS (queries per second). QPS is calculated as follows: with the batch size fixed at 32, measure the total running time of the test, total_time, and compute QPS = total_samples / total_time (see the sketch below this list).
6. Metrics: Accuracy for sequence classification, F1-score for token classification, and EM (Exact Match) for question answering.
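
A minimal sketch of the QPS calculation described in item 5. It is illustrative only; `run_inference` and `test_batches` are hypothetical placeholders for the actual prediction call and test data:

```python
import time

batch_size = 32                      # fixed batch size used for the benchmark
total_samples = len(test_batches) * batch_size

start = time.time()
for batch in test_batches:
    run_inference(batch)             # placeholder: one forward pass over a batch of 32 samples
total_time = time.time() - start     # total running time of the test, in seconds

qps = total_samples / total_time     # QPS = total_samples / total_time
```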
"- [ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression](https://arxiv.org/abs/2106.02241)\n",
"- [ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation](https://arxiv.org/abs/2112.12731)\n",
" title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},\n",
" author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},\n",
" journal={arXiv preprint arXiv:2107.02137},\n",
" year={2021}\n",
"}\n",
"\n",
"@article{su2021ernie,\n",
" title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},\n",
" author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},\n",
" journal={arXiv preprint arXiv:2106.02241},\n",
" year={2021}\n",
"}\n",
"\n",
"@article{wang2021ernie,\n",
" title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},\n",
" author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},\n",
"[ERNIE Tiny Models](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0) are lightweight models obtained from Wenxin large model ERNIE 3.0 using distillation technology. The model structure is consistent with ERNIE 2.0, and has a stronger Chinese effect than ERNIE 2.0.\n",
"\n",
"For a detailed explanation of related technologies, please refer to the article [_解析全球最大中文单体模型鹏城-百度·文心技术细节_](https://www.jiqizhixin.com/articles/2021-12-08-9)\n",
"Below is the **precision-latency graph** of the small Chinese models in PaddleNLP. The abscissa represents the latency (unit: ms) tested on CLUE IFLYTEK dataset (maximum sequence length is set to 128), and the ordinate is the average accuracy on 10 CLUE tasks (including text classification, text matching, natural language inference, Pronoun disambiguation, machine reading comprehension and other tasks), among which the metric of CMRC2018 is Exact Match (EM), and the metric of other tasks is Accuracy. The closer the model to the top left in the figure, the higher the level of accuracy and performance.The top left model in the figure has the highest level of accuracy and performance.\n",
"\n",
"The number of parameters of the model are marked under the model name in the figure. For the test environment, see [Performance Test](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0#%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95) in details.\n",
"\n",
"\n",
"precision-latency graph under CPU (number of threads: 1 and 8), batch_size = 32:\n",
"As can be seen from the figure, the comprehensive performance of the ERNIE Tiny 3.0 models has been comprehensively ahead of UER-py, Huawei-Noah and HFL in terms of accuracy and performance. And when batch_size=1 and the precision mode is FP16, the inference performance of the wide and shallow model on the GPU is more advantageous.\n",
"\n",
"在 CLUE **验证集**上评测指标如下表所示:\n",
"The precision data on the CLUE **validation set** are shown in the following table:\n",
"Below is the **precision-latency graph** of the small Chinese models in PaddleNLP. The abscissa represents the latency (unit: ms) tested on CLUE IFLYTEK dataset (maximum sequence length is set to 128), and the ordinate is the average accuracy on 10 CLUE tasks (including text classification, text matching, natural language inference, Pronoun disambiguation, machine reading comprehension and other tasks), among which the metric of CMRC2018 is Exact Match (EM), and the metric of other tasks is Accuracy. The closer the model to the top left in the figure, the higher the level of accuracy and performance.The top left model in the figure has the highest level of accuracy and performance.\n",
"\n",
"The number of parameters of the model are marked under the model name in the figure. For the test environment, see [Performance Test](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0#%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95) in details.\n",
"\n",
"\n",
"precision-latency graph under CPU (number of threads: 1 and 8), batch_size = 32:\n",
"As can be seen from the figure, the comprehensive performance of the ERNIE Tiny 3.0 models has been comprehensively ahead of UER-py, Huawei-Noah and HFL in terms of accuracy and performance. And when batch_size=1 and the precision mode is FP16, the inference performance of the wide and shallow model on the GPU is more advantageous.\n",
"\n",
"在 CLUE **验证集**上评测指标如下表所示:\n",
"The precision data on the CLUE **validation set** are shown in the following table:\n",
"The pre-trained models released by ERNIE 3.0 cannot be directly used for prediction, and the pre-trained models need to be fine-tuned using task-specific data.\n",
"\n",
"Using PaddleNLP, you only need one line code to get the ERNIE Tiny 3.0 models, and then you can fine-tune it under your own task data to obtain task-specific models.\n"
"If there is a need to deploy the model online, you can further compress the model size. You can use the model compression plans and API to compress the fine-tuned models.\n",
"\n",
"The usage of the model compression API can be found in [documentation](../../docs/compression.md). Similarly, the model compression API also supports natural language processing tasks such as classification (including text classification, text matching, natural language inference, pronoun disambiguation, etc.), token classification, machine reading comprehension and information extraction and so on.\n",
"\n",
"The model exported after compression can be used directly for deployment.\n",
"We provide [various deployment scenarios](https://github.com/paddlepaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0#%E9%83%A8%E7%BD%B2) for ERNIE 3.0, which can meet the deployment requirements in different tasks, please choose according to the actual situation:\n",
"In the process of model learning, online distillation technology periodically transmits knowledge signals to several student models for simultaneous training, thereby producing student models of multiple sizes at one time in the distillation stage. Compared with the traditional distillation technology, this technology greatly saves the computing power consumption caused by the extra distillation calculation of the large model and the repeated knowledge transfer of multiple students.\n",
"\n",
"This novel distillation method takes advantage of the scale advantage of the Wenxin large model, and ensures the effect and size richness of the student model after the distillation is completed, which is convenient for application scenarios with different performance requirements. In addition, due to the huge gap between the model size of the Wenxin model and the student model, the model distillation is extremely difficult or even easy to fail. To this end, by introducing the technology of the teaching assistant model for distillation, the teaching assistant is used as a bridge for knowledge transfer to shorten the problem that the expression space between the student model and the large model is too large, thereby promoting the improvement of distillation efficiency.\n",
"\n",
"For more technical details, please refer to the paper:\n",
"- [ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression](https://arxiv.org/abs/2106.02241)\n",
"- [ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation](https://arxiv.org/abs/2112.12731)\n",
" title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},\n",
" author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},\n",
" journal={arXiv preprint arXiv:2107.02137},\n",
" year={2021}\n",
"}\n",
"\n",
"@article{su2021ernie,\n",
" title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},\n",
" author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},\n",
" journal={arXiv preprint arXiv:2106.02241},\n",
" year={2021}\n",
"}\n",
"\n",
"@article{wang2021ernie,\n",
" title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},\n",
" author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},\n",