{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. PP-OCRv2模型简介\n", "\n", "PP-OCR是PaddleOCR针对OCR领域发布的文字检测识别系统，PP-OCRv2针对 PP-OCR 进行了一些经验性改进，构建了一个新的 OCR 系统。PP-OCRv2系统框图如下所示（粉色框中为PP-OCRv2新增策略）：\n", "\n", "

\n", "

\n", "

\n", "\n", "从算法改进思路上看，主要有五个方面的改进：\n", "1. 检测模型优化：采用 CML 协同互学习知识蒸馏策略；\n", "2. 检测模型优化：CopyPaste 数据增广策略；\n", "3. 识别模型优化：PP-LCNet 轻量级骨干网络；\n", "4. 识别模型优化：UDML 改进知识蒸馏策略；\n", "5. 识别模型优化：Enhanced CTC loss 损失函数改进。\n", "\n", "从效果上看，主要有三个方面提升：\n", "1. 在模型效果上，相对于 PP-OCR mobile 版本提升超7%；\n", "2. 在速度上，相对于 PP-OCR server 版本提升超过220%；\n", "3. 在模型大小上，11.6M 的总大小，服务器端和移动端都可以轻松部署。\n", "\n", "更详细的优化细节可参考技术报告：https://arxiv.org/abs/2109.03144 。\n", "\n", "更多关于PaddleOCR的内容，可以点击 https://github.com/PaddlePaddle/PaddleOCR 进行了解。\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 模型效果\n", "\n", "PP-OCRv2的效果如下：\n", "\n", "

\n", "

\n", "

\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 模型如何使用\n", "\n", "### 3.1 模型推理\n", "* 安装PaddleOCR whl包" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "! pip install paddleocr --user" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 快速体验" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 命令行使用\n", "! wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/imgs/11.jpg\n", "! paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "运行完成后，会在终端输出如下结果：\n", "```log\n", "[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]\n", "[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]\n", "[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['（45元/每公斤，100公斤起订）', 0.9676722]]\n", "......\n", "```\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 模型训练\n", "PP-OCR系统由文本检测模型、方向分类器和文本识别模型构成，三个模型训练教程可参考如下文档:\n", "1. 文本检测模型：[文本检测训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/detection.md)\n", "1. 方向分类器: [方向分类器训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/angle_class.md)\n", "1. 文本识别模型：[文本识别训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/recognition.md)\n", "\n", "模型训练完成后，可以通过指定模型路径的方式串联使用\n", "命令参考如下：\n", "```python\n", "paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2 --det_model_dir=/path/to/det_inference_model --cls_model_dir=/path/to/cls_inference_model --rec_model_dir=/path/to/rec_inference_model\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 原理\n", "\n", "优化思路具体如下\n", "\n", "1. 检测模型优化\n", "- 采用 CML (Collaborative Mutual Learning) 协同互学习知识蒸馏策略。\n", "

\n", "

\n", "

\n", "\n", "如上图所示，CML 的核心思想结合了①传统的 Teacher 指导 Student 的标准蒸馏与 ② Students 网络直接的 DML 互学习，可以让 Students 网络互学习的同时，Teacher 网络予以指导。对应的，精心设计关键的三个 Loss 损失函数：GT Loss、DML Loss 和 Distill Loss，在 Teacher 网络 Backbone 为 ResNet18 的条件下，对 Student 的 MobileNetV3 起到了良好的提升效果。\n", "\n", " - CopyPaste 数据增广策略\n", "

\n", "

\n", "

\n", "\n", "数据增广是提升模型泛化能力重要的手段之一，CopyPaste 是一种新颖的数据增强技巧，已经在目标检测和实例分割任务中验证了有效性。利用 CopyPaste，可以合成文本实例来平衡训练图像中的正负样本之间的比例。相比而言，传统图像旋转、随机翻转和随机裁剪是无法做到的。\n", "\n", "CopyPaste 主要步骤包括：①随机选择两幅训练图像，②随机尺度抖动缩放，③随机水平翻转，④随机选择一幅图像中的目标子集，⑤粘贴在另一幅图像中随机的位置。这样，就比较好的提升了样本丰富度，同时也增加了模型对环境鲁棒性。\n", "\n", "2. 识别模型优化\n", "- PP-LCNet 轻量级骨干网络\n", " \n", "采用速度和精度均衡的PP-LCNet，有效提升精度的同时减少网络推理时间。\n", "\n", "- UDML 知识蒸馏策略\n", "

\n", "

\n", "

\n", " 在标准的 DML 知识蒸馏的基础上，新增引入了对于 Feature Map 的监督机制，新增 Feature Loss，增加迭代次数，在 Head 部分增加额外的 FC 网络，最终加快蒸馏的速度同时提升效果。\n", "\n", "- Enhanced CTC loss 改进\n", "

\n", "

\n", "

\n", " 中文文字识别任务中，经常遇到相似字符数误识别的问题，这里借鉴 Metric Learning，引入 Center Loss，进一步增大类间距离来增强模型对相似字符的区分能力，核心思路如上图公式所示。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 注意事项\n", "\n", "PP-OCR系列模型训练过程中均使用通用数据，如在实际场景中表现不满意，可标注少量数据进行finetune。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 相关论文以及引用信息\n", "```\n", "@article{du2021pp,\n", " title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n", " author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n", " journal={arXiv preprint arXiv:2109.03144},\n", " year={2021}\n", "}\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('py38')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "58fd1890da6594cebec461cf98c6cb9764024814357f166387d10d267624ecd6" } } }, "nbformat": 4, "nbformat_minor": 4 }