{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-OCRv2模型简介\n",
    "\n",
    "PP-OCR是PaddleOCR针对OCR领域发布的文字检测识别系统,PP-OCRv2针对 PP-OCR 进行了一些经验性改进,构建了一个新的 OCR 系统。PP-OCRv2系统框图如下所示(粉色框中为PP-OCRv2新增策略):\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/12406017/200258931-771f5a1d-230c-4168-9130-0b79321558a9.png\"  width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "从算法改进思路上看,主要有五个方面的改进:\n",
    "1. 检测模型优化:采用 CML 协同互学习知识蒸馏策略;\n",
    "2. 检测模型优化:CopyPaste 数据增广策略;\n",
    "3. 识别模型优化:PP-LCNet 轻量级骨干网络;\n",
    "4. 识别模型优化:UDML 改进知识蒸馏策略;\n",
    "5. 识别模型优化:Enhanced CTC loss 损失函数改进。\n",
    "\n",
    "从效果上看,主要有三个方面提升:\n",
    "1. 在模型效果上,相对于 PP-OCR mobile 版本提升超7%;\n",
    "2. 在速度上,相对于 PP-OCR server 版本提升超过220%;\n",
    "3. 在模型大小上,11.6M 的总大小,服务器端和移动端都可以轻松部署。\n",
    "\n",
    "更详细的优化细节可参考技术报告:https://arxiv.org/abs/2109.03144 。\n",
    "\n",
    "更多关于PaddleOCR的内容,可以点击 https://github.com/PaddlePaddle/PaddleOCR 进行了解。\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 模型效果\n",
    "\n",
    "PP-OCRv2的效果如下:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/12406017/200239467-a082eef9-fee0-4587-be48-b276a95bf8d0.gif\"  width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 模型如何使用\n",
    "\n",
    "### 3.1 模型推理\n",
    "* 安装PaddleOCR whl包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "! pip install paddleocr --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 快速体验"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 命令行使用\n",
    "! wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/imgs/11.jpg\n",
    "! paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行完成后,会在终端输出如下结果:\n",
    "```log\n",
    "[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]\n",
    "[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]\n",
    "[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]\n",
    "......\n",
    "```\n",
    "\n",
    "\n"
   ]
  },
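  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Quick start with the Python API\n",
    "\n",
    "The same pipeline can also be called from Python via the `paddleocr` whl package. The snippet below is a minimal sketch: the exact structure of the returned result (a flat list of lines vs. one list per input image) varies slightly across paddleocr versions, so adapt the loop if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Minimal Python API sketch (uses the 11.jpg image downloaded above)\n",
    "from paddleocr import PaddleOCR\n",
    "\n",
    "# use_angle_cls enables the text direction classifier\n",
    "ocr = PaddleOCR(use_angle_cls=True, ocr_version='PP-OCRv2')\n",
    "result = ocr.ocr('11.jpg', cls=True)\n",
    "\n",
    "# In recent paddleocr versions the result is nested per input image;\n",
    "# older versions return a flat list of [box, (text, score)] entries.\n",
    "for line in result[0]:\n",
    "    print(line)"
   ]
  },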
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 模型训练\n",
    "PP-OCR系统由文本检测模型、方向分类器和文本识别模型构成,三个模型训练教程可参考如下文档:\n",
    "1. 文本检测模型:[文本检测训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/detection.md)\n",
    "1. 方向分类器: [方向分类器训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/angle_class.md)\n",
    "1. 文本识别模型:[文本识别训练教程](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/recognition.md)\n",
    "\n",
    "模型训练完成后,可以通过指定模型路径的方式串联使用\n",
    "命令参考如下:\n",
    "```python\n",
    "paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2 --det_model_dir=/path/to/det_inference_model --cls_model_dir=/path/to/cls_inference_model --rec_model_dir=/path/to/rec_inference_model\n",
    "```"
   ]
  },
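  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same chained usage is also available from the Python API. The sketch below mirrors the command above; the `*_model_dir` keyword arguments follow the naming of the CLI flags in the `paddleocr` whl package, and the `/path/to/...` directories are placeholders for your own exported inference models.\n",
    "```python\n",
    "from paddleocr import PaddleOCR\n",
    "\n",
    "# Point each stage at your own exported inference model directory\n",
    "ocr = PaddleOCR(use_angle_cls=True,\n",
    "                ocr_version='PP-OCRv2',\n",
    "                det_model_dir='/path/to/det_inference_model',\n",
    "                cls_model_dir='/path/to/cls_inference_model',\n",
    "                rec_model_dir='/path/to/rec_inference_model')\n",
    "result = ocr.ocr('11.jpg', cls=True)\n",
    "```"
   ]
  },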
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 原理\n",
    "\n",
    "优化思路具体如下\n",
    "\n",
    "1. 检测模型优化\n",
    "- 采用 CML (Collaborative Mutual Learning) 协同互学习知识蒸馏策略。\n",
    "     <div align=\"center\">\n",
    "   <img src=\"https://pic4.zhimg.com/80/v2-05f12bcd1784993edabdadbc89b5e9e7_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "\n",
    "如上图所示,CML 的核心思想结合了①传统的 Teacher 指导 Student 的标准蒸馏与 ② Students 网络直接的 DML 互学习,可以让 Students 网络互学习的同时,Teacher 网络予以指导。对应的,精心设计关键的三个 Loss 损失函数:GT Loss、DML Loss 和 Distill Loss,在 Teacher 网络 Backbone 为 ResNet18 的条件下,对 Student 的 MobileNetV3 起到了良好的提升效果。\n",
    "\n",
    "   - CopyPaste 数据增广策略\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic1.zhimg.com/80/v2-90239608c554972ac307be07f487f254_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "\n",
    "数据增广是提升模型泛化能力重要的手段之一,CopyPaste 是一种新颖的数据增强技巧,已经在目标检测和实例分割任务中验证了有效性。利用 CopyPaste,可以合成文本实例来平衡训练图像中的正负样本之间的比例。相比而言,传统图像旋转、随机翻转和随机裁剪是无法做到的。\n",
    "\n",
    "CopyPaste 主要步骤包括:①随机选择两幅训练图像,②随机尺度抖动缩放,③随机水平翻转,④随机选择一幅图像中的目标子集,⑤粘贴在另一幅图像中随机的位置。这样,就比较好的提升了样本丰富度,同时也增加了模型对环境鲁棒性。\n",
    "\n",
    "2. 识别模型优化\n",
    "- PP-LCNet 轻量级骨干网络\n",
    "  \n",
    "采用速度和精度均衡的PP-LCNet,有效提升精度的同时减少网络推理时间。\n",
    "\n",
    "- UDML 知识蒸馏策略\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic1.zhimg.com/80/v2-642d94e092c7d5f90bedbd7c7511636c_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "   在标准的 DML 知识蒸馏的基础上,新增引入了对于 Feature Map 的监督机制,新增 Feature Loss,增加迭代次数,在 Head 部分增加额外的 FC 网络,最终加快蒸馏的速度同时提升效果。\n",
    "\n",
    "- Enhanced CTC loss 改进\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic3.zhimg.com/80/v2-864d255454b5196e0a2a916d81ff92c6_720w.webp\"  width = \"40%\"  />\n",
    "   </div>\n",
    "   中文文字识别任务中,经常遇到相似字符数误识别的问题,这里借鉴 Metric Learning,引入 Center Loss,进一步增大类间距离来增强模型对相似字符的区分能力,核心思路如上图公式所示。"
   ]
  },
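  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In rough form (a hedged summary of the idea above, not the exact formulation in the report), the enhanced objective adds a weighted center-loss term to the standard CTC loss:\n",
    "\n",
    "$$L = L_{ctc} + \\lambda \\cdot L_{center}, \\qquad L_{center} = \\sum_{t} \\lVert x_t - c_{y_t} \\rVert_2^2$$\n",
    "\n",
    "where $x_t$ is the recognition feature at time step $t$, $c_{y_t}$ is the learned center of its corresponding character class, and $\\lambda$ balances the two terms."
   ]
  },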
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 注意事项\n",
    "\n",
    "PP-OCR系列模型训练过程中均使用通用数据,如在实际场景中表现不满意,可标注少量数据进行finetune。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 相关论文以及引用信息\n",
    "```\n",
    "@article{du2021pp,\n",
    "  title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
    "  author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
    "  journal={arXiv preprint arXiv:2109.03144},\n",
    "  year={2021}\n",
    "}\n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('py38')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "58fd1890da6594cebec461cf98c6cb9764024814357f166387d10d267624ecd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}