introduction_en.ipynb 9.5 KB
Notebook
Newer Older
Z
zhoujun 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-OCRv2 Introduction\n",
    "\n",
    "PP-OCR is a text detection and recognition system released by PaddleOCR for the OCR field. PP-OCRv2 has made some improvements for PP-OCR and constructed a new OCR system. The pipeline of PP-OCRv2 system is as follows (the pink box is the new policy of PP-OCRv2):\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/12406017/200258931-771f5a1d-230c-4168-9130-0b79321558a9.png\"  width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "There are five enhancement strategies:\n",
    "1. Text detection enhancement strategies: collaborative mutual learning;\n",
    "2. Text detection enhancement strategies: CcopyPaste data augmentation;\n",
    "3. Text recognition enhancement strategies: PP-LCNet lightweight backbone network;\n",
    "4. Text recognition enhancement strategies: UDML knowledge distillation;\n",
    "5. Text recognition enhancement strategies: enhanced CTC loss;\n",
    "\n",
    "Compared with PP-OCR, the performance improvement of PP-OCRv2 is as follows:\n",
    "1. Compared with the PP-OCR mobile version, the accuracy of the model is improved by more than 7%;\n",
    "2. Compared with the PP-OCR server version, the speed is increased by more than 220%;\n",
    "3. The total model size is 11.6M, which can be easily deployed on both the server side and the mobile side;\n",
    "\n",
    "For more details, please refer to the technical report: https://arxiv.org/abs/2109.03144 .\n",
    "\n",
    "For more information about PaddleOCR, you can click https://github.com/PaddlePaddle/PaddleOCR to learn more.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Model Effects\n",
    "\n",
    "The results of PP-OCRv2 are as follows:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/12406017/200239467-a082eef9-fee0-4587-be48-b276a95bf8d0.gif\"  width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to Use the Model\n",
    "\n",
    "### 3.1 Inference\n",
    "* Install PaddleOCR whl package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "! pip install paddleocr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Quick experience"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# command line usage\n",
92 93
    "! wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/imgs/11.jpg\n",
    "! paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2"
Z
zhoujun 已提交
94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the operation is complete, the following results will be output in the terminal:\n",
    "```log\n",
    "[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]\n",
    "[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]\n",
    "[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]\n",
    "......\n",
    "```\n",
    "\n",
    "\n"
   ]
  },
111 112 113 114
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
文幕地方's avatar
文幕地方 已提交
115
    "### 3.2 Train the model\n",
116 117 118 119 120 121 122 123 124 125 126
    "The PP-OCR system consists of a text detection model, an angle classifier and a text recognition model. For the three model training tutorials, please refer to the following documents:\n",
    "1. text detection model: [text detection training tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/detection.md)\n",
    "1. angle classifier: [angle classifier training tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/angle_class.md)\n",
    "1. text recognition model: [text recognition training tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_ch/recognition.md)\n",
    "\n",
    "After the model training is completed, it can be used in series by specifying the model path. The command reference is as follows:\n",
    "```python\n",
    "paddleocr --image_dir 11.jpg --use_angle_cls true --ocr_version PP-OCRv2 --det_model_dir=/path/to/det_inference_model --cls_model_dir=/path/to/cls_inference_model --rec_model_dir=/path/to/rec_inference_model\n",
    "```"
   ]
  },
Z
zhoujun 已提交
127 128 129 130 131 132
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Model Principles\n",
    "\n",
文幕地方's avatar
文幕地方 已提交
133
    "The enhancement strategies are as follows\n",
Z
zhoujun 已提交
134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196
    "\n",
    "1. Text detection enhancement strategies\n",
    "- Adopt CML (Collaborative Mutual Learning) collaborative mutual learning knowledge distillation strategy.\n",
    "     <div align=\"center\">\n",
    "   <img src=\"https://pic4.zhimg.com/80/v2-05f12bcd1784993edabdadbc89b5e9e7_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "\n",
    "As shown in the figure above, the core idea of CML combines ① the standard distillation of the traditional Teacher guiding Students and ② the direct DML mutual learning of the Students network, which allows the Students network to learn from each other and the Teacher network to guide. Correspondingly, the three key Loss loss functions are carefully designed: GT Loss, DML Loss and Distill Loss. Under the condition that the Teacher network Backbone is ResNet18, it has a good effect on the Student's MobileNetV3.\n",
    "\n",
    "   - CopyPaste data augmentation\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic1.zhimg.com/80/v2-90239608c554972ac307be07f487f254_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "\n",
    "Data augmentation is one of the important means to improve the generalization ability of the model. CopyPaste is a novel data augmentation technique, which has been proven effective in object detection and instance segmentation tasks. With CopyPaste, text instances can be synthesized to balance the ratio between positive and negative samples in training images. In contrast, this is impossible with traditional image rotation, random flipping, and random cropping.\n",
    "\n",
    "The main steps of CopyPaste include: ① randomly select two training images, ② randomly scale jitter and zoom, ③ randomly flip horizontally, ④ randomly select a target subset in one image, and ⑤ paste at a random position in another image. In this way, the sample richness can be greatly improved, and the robustness of the model to the environment can be increased.\n",
    "\n",
    "2. Text recognition enhancement strategies\n",
    "- PP-LCNet lightweight backbone network\n",
    "  \n",
    "The PP-LCNet with balanced speed and accuracy is adopted to effectively improve the accuracy and reduce the network inference time.\n",
    "\n",
    "- UDML Knowledge Distillation Strategy\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic1.zhimg.com/80/v2-642d94e092c7d5f90bedbd7c7511636c_720w.webp\"  width = \"60%\"  />\n",
    "   </div>\n",
    "On the basis of standard DML knowledge distillation, a supervision mechanism for Feature Map is introduced, Feature Loss is added, the number of iterations is increased, and an additional FC network is added to the Head part, which finally speeds up the distillation and improves the effect.\n",
    "\n",
    "- Enhanced CTC loss\n",
    "   <div align=\"center\">\n",
    "   <img src=\"https://pic3.zhimg.com/80/v2-864d255454b5196e0a2a916d81ff92c6_720w.webp\"  width = \"40%\"  />\n",
    "   </div>\n",
    "   In Chinese text recognition tasks, the problem of misrecognition of the number of similar characters is often encountered. Here, we draw on Metric Learning and introduce Center Loss to further increase the distance between classes to enhance the model's ability to distinguish similar characters. The core idea is shown in the formula above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Attention\n",
    "\n",
    "General data are used in the training process of PP-OCR series models. If the performance is not satisfactory in the actual scene, a small amount of data can be marked for finetune."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Related papers and citations\n",
    "```\n",
    "@article{du2021pp,\n",
    "  title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
    "  author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
    "  journal={arXiv preprint arXiv:2109.03144},\n",
    "  year={2021}\n",
    "}\n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
文幕地方's avatar
文幕地方 已提交
197
   "display_name": "Python 3.8.13 ('py38')",
Z
zhoujun 已提交
198 199 200 201 202 203 204 205 206 207 208 209 210
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
文幕地方's avatar
文幕地方 已提交
211 212 213 214 215 216
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "58fd1890da6594cebec461cf98c6cb9764024814357f166387d10d267624ecd6"
   }
Z
zhoujun 已提交
217 218 219 220 221
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}