diff --git "a/notebook/notebook_ch/4.ppocr_system_strategy/PP-OCR\347\263\273\347\273\237\345\217\212\344\274\230\345\214\226\347\255\226\347\225\245.ipynb" "b/notebook/notebook_ch/4.ppocr_system_strategy/PP-OCR\347\263\273\347\273\237\345\217\212\344\274\230\345\214\226\347\255\226\347\225\245.ipynb"
new file mode 100644
index 0000000000000000000000000000000000000000..4160b077b3ef2939ffd2ac88867bb12c93438c61
--- /dev/null
+++ "b/notebook/notebook_ch/4.ppocr_system_strategy/PP-OCR\347\263\273\347\273\237\345\217\212\344\274\230\345\214\226\347\255\226\347\225\245.ipynb"
@@ -0,0 +1,3491 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
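+ "# 1. Introduction to and Overview of the PP-OCR System\n",
+ "\n",
+ "The previous two chapters introduced the DBNet text detection algorithm and the CRNN text recognition algorithm. For an image from a real-world scenario, however, neither a detection model nor a recognition model alone can obtain both the location and the content of the text. We therefore chain the text detection and text recognition algorithms together to build the PP-OCR text detection and recognition system. In practice, a detected text region may not be in the orientation we expect, which ultimately causes recognition errors, so the PP-OCR system also includes a direction classifier.\n",
+ "\n",
+ "This chapter introduces the PP-OCR text detection and recognition system and the optimization strategies it uses. After studying this chapter, you will have learned:\n",
+ "\n",
+ "* Strategy tuning techniques for PaddleOCR\n",
+ "* Techniques and methods for optimizing the text detection, text recognition, and direction classification models\n",
+ "\n",
+ "The PP-OCR system has gone through two rounds of optimization. The PP-OCR system and these two rounds of optimization are briefly introduced below.\n",
+ "\n",
+ "## 1.1 Introduction to the PP-OCR System and Its Optimization Strategies\n",
+ "\n",
+ "In PP-OCR, extracting the text from an image requires the following steps:\n",
+ "\n",
+ "* Use text detection to obtain the polygon of each text region (PP-OCR uses DBNet for text detection, so each region is described by four points).\n",
+ "* Crop each text polygon and rectify it with a perspective transform into a rectangular image, then use the direction classifier to correct its orientation.\n",
+ "* Run text recognition on each rectangular text image to obtain the final result.\n",
+ "\n",
+ "These steps complete text detection and recognition for one image.\n",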
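+ "\n",
+ "As a rough illustration of the second step, the sketch below computes the size of the rectified rectangle for one detected quadrilateral, together with the destination corners that a perspective-transform routine (for example OpenCV's `cv2.getPerspectiveTransform`) would map the quadrilateral onto. It is a simplified sketch rather than PaddleOCR's own code: it assumes the four points are ordered top-left, top-right, bottom-right, bottom-left, and `rectified_size` is a hypothetical helper name.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def rectified_size(quad):\n",
+ "    # quad: four (x, y) points, assumed to be ordered TL, TR, BR, BL (hypothetical helper)\n",
+ "    quad = np.asarray(quad, dtype=np.float32)\n",
+ "    # Width: the longer of the top and bottom edges\n",
+ "    width = max(np.linalg.norm(quad[0] - quad[1]), np.linalg.norm(quad[3] - quad[2]))\n",
+ "    # Height: the longer of the left and right edges\n",
+ "    height = max(np.linalg.norm(quad[0] - quad[3]), np.linalg.norm(quad[1] - quad[2]))\n",
+ "    w, h = int(round(float(width))), int(round(float(height)))\n",
+ "    # Corners of the upright target rectangle, in the same TL, TR, BR, BL order\n",
+ "    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)\n",
+ "    return w, h, dst\n",
+ "\n",
+ "\n",
+ "# A slightly tilted text box\n",
+ "w, h, dst = rectified_size([[10, 20], [110, 25], [108, 55], [8, 50]])\n",
+ "print(w, h)\n",
+ "```\n",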
+ "\n",
+ "The PP-OCR system diagram is shown below.\n",
+ "\n",
+ "Figure: PP-OCR system diagram\n",
+ "\n",
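+ "Text detection is based on DBNet, whose post-processing is relatively simple. Text region rectification mainly uses a geometric transformation plus the direction classifier. Text recognition uses the CRNN model, which fuses convolutional features with sequence features, and uses CTC loss to handle the misalignment between the per-frame predictions and the label sequence.\n",
+ "\n",
+ "PP-OCR applies a total of 19 strategies covering the backbone network, the learning rate schedule, data augmentation, model pruning and quantization, and more, to optimize and slim down the models. The result is the server-oriented PP-OCR server system and the mobile-oriented PP-OCR mobile system.\n",
+ "\n",
+ "## 1.2 Introduction to the PP-OCRv2 System and Its Optimization Strategies\n",
+ "\n",
+ "Compared with PP-OCR, PP-OCRv2 further optimizes three aspects, the backbone network, data augmentation, and the loss function, to address poor inference efficiency on edge devices, complex backgrounds, and misrecognition of similar characters. It also introduces knowledge distillation training to further improve model accuracy. Specifically:\n",
+ "\n",
+ "* Detection model optimization: (1) the CML (collaborative mutual learning) knowledge distillation strategy; (2) the CopyPaste data augmentation strategy;\n",
+ "* Recognition model optimization: (1) the lightweight PP-LCNet backbone; (2) the improved U-DML knowledge distillation strategy; (3) the improved Enhanced CTC loss function.\n",
+ "\n",
+ "The improvements show up in three areas:\n",
+ "\n",
+ "* Accuracy: more than 7% higher than the PP-OCR mobile version;\n",
+ "* Speed: more than 220% faster than the PP-OCR server version;\n",
+ "* Model size: a total size of 11.6M, easy to deploy on both servers and mobile devices.\n",
+ "\n",
+ "The figure below compares the accuracy, inference time, and model size of PP-OCRv2 with those of earlier PP-OCR models.\n",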
+ "\n",
+ "Figure: speed, accuracy, and model size of PP-OCRv2 compared with PP-OCR\n",
+ "\n",
+ "The PP-OCRv2 system diagram is shown below.\n",
+ "\n",
+ "Figure: PP-OCRv2 system diagram\n",
+ "\n",
+ "This chapter explains the optimization strategies of the PP-OCR and PP-OCRv2 systems described above in detail."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
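+ "# 2. PP-OCR Optimization Strategies\n",
+ "\n",
+ "The PP-OCR system consists of a text detector, a direction classifier, and a text recognizer. This section describes the optimization strategies for each of these three models in detail.\n",
+ "\n",
+ "## 2.1 Text Detection\n",
+ "\n",
+ "Text detection in PP-OCR is based on the DBNet (Differentiable Binarization) model, a segmentation-based method with simple post-processing. The DBNet architecture is shown below.\n",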
+ "\n",
+ "Figure: DBNet architecture\n",
+ "\n",
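+ "DBNet extracts features with a backbone network, fuses the features from the different stages with a DBFPN structure (the neck), and decodes the fused features with convolutions and related operations (the head) to produce a probability map and a threshold map. The two maps are combined to compute an approximate binary map. The loss is computed on all three maps; because the binarization is also supervised during training, the model learns more accurate text boundaries.\n",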
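+ "\n",
+ "The approximate binary map is usually written as $\\hat{B} = 1 / (1 + e^{-k(P - T)})$, where $P$ is the probability map, $T$ is the threshold map, and $k$ is an amplifying factor (the DBNet paper sets $k = 50$). The NumPy sketch below uses small made-up maps to show that this differentiable step stays close to the hard threshold `P > T`; the function name `approx_binarize` is hypothetical.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def approx_binarize(prob_map, thresh_map, k=50.0):\n",
+ "    # Differentiable binarization: a steep sigmoid of (P - T)\n",
+ "    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))\n",
+ "\n",
+ "\n",
+ "P = np.array([[0.9, 0.2], [0.6, 0.4]])  # illustrative probability map\n",
+ "T = np.array([[0.3, 0.3], [0.5, 0.5]])  # illustrative threshold map\n",
+ "B_hat = approx_binarize(P, T)\n",
+ "B_hard = (P > T).astype(np.float64)  # standard, non-differentiable binarization\n",
+ "print(np.round(B_hat, 3))\n",
+ "```\n",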
+ "\n",
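+ "DBNet uses six optimization strategies to improve accuracy and speed, covering the backbone network, the feature pyramid network, the head structure, the learning rate schedule, model pruning, and more. The ablation results for the different modules on the validation set are shown below.\n",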
+ "\n",
+ "Figure: DBNet ablation study\n",
+ "\n",
+ "Each strategy is described in detail below.\n",
+ "\n",
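+ "### 2.1.1 Lightweight Backbone Network\n",
+ "\n",
+ "The size of the backbone has a major impact on the size of the text detector, so a lightweight backbone should be chosen when building an ultra-lightweight detection model. With the development of image classification, the MobileNetV1, MobileNetV2, MobileNetV3, and ShuffleNetV2 families have become common lightweight backbones, each offering several model sizes with different performance. [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) provides more than 20 lightweight backbones. Their accuracy-speed curves on ARM are shown below.\n",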
+ "\n",
+ "Figure: speed-accuracy curves of backbones in PaddleClas\n",
+ "\n",
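+ "At the same inference time, the MobileNetV3 family achieves higher accuracy. To cover as many scenarios as possible, its authors use a `scale` parameter to adjust the number of feature map channels: the standard is 1x, and 0.5x means that some feature maps in the network have 0.5 times as many channels as in the 1x network. To further balance accuracy and efficiency, we chose the MobileNetV3_large 0.5x configuration.\n",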
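+ "\n",
+ "To make the `scale` factor concrete, the sketch below halves a few channel counts and rounds them to a multiple of 8 with a `make_divisible`-style helper of the kind commonly used in MobileNetV3 implementations. Both the rounding rule and the assumption that the four stages of the 1x large model end with 24, 40, 112 and 960 channels come from typical MobileNetV3 implementations rather than from this text; under those assumptions the result matches the stage channels printed by the code below (16, 24, 56, 480).\n",
+ "\n",
+ "```python\n",
+ "def make_divisible(v, divisor=8, min_value=None):\n",
+ "    # Round v to the nearest multiple of divisor, but never below min_value\n",
+ "    if min_value is None:\n",
+ "        min_value = divisor\n",
+ "    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n",
+ "    # Do not let rounding remove more than 10% of the channels\n",
+ "    if new_v < 0.9 * v:\n",
+ "        new_v += divisor\n",
+ "    return new_v\n",
+ "\n",
+ "\n",
+ "scale = 0.5\n",
+ "full_channels = [24, 40, 112, 960]  # assumed 1x stage output channels of MobileNetV3-large\n",
+ "scaled = [make_divisible(scale * c) for c in full_channels]\n",
+ "print(scaled)  # [16, 24, 56, 480]\n",
+ "```\n",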
+ "\n",
+ "The code below prints the feature map size at each stage of the MobileNetV3 backbone used in DBNet.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ ],
+ "source": [
+ "import os\n",
+ "import sys\n",
+ "\n",
+ "# Download the code\n",
+ "os.chdir(\"/home/aistudio/\")\n",
+ "!git clone https://gitee.com/paddlepaddle/PaddleOCR.git\n",
+ "# Change the working directory\n",
+ "os.chdir(\"/home/aistudio/PaddleOCR/\")\n",
+ "!pip install -U pip\n",
+ "!pip install -r requirements.txt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "the shape of 0 stage: [1, 16, 160, 160]\n",
+ "the shape of 1 stage: [1, 24, 80, 80]\n",
+ "the shape of 2 stage: [1, 56, 40, 40]\n",
+ "the shape of 3 stage: [1, 480, 20, 20]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# The implementation is located at:\n",
+ "# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppocr/modeling/backbones/det_mobilenet_v3.py\n",
+ "import numpy as np\n",
+ "import paddle\n",
+ "\n",
+ "# Create a random input\n",
+ "inputs = np.random.rand(1, 3, 640, 640).astype(np.float32)\n",
+ "x = paddle.to_tensor(inputs)\n",
+ "\n",
+ "# Import the MobileNetV3 backbone\n",
+ "from ppocr.modeling.backbones.det_mobilenet_v3 import MobileNetV3\n",
+ "\n",
+ "# Define the model (large, 0.5x)\n",
+ "backbone_mv3 = MobileNetV3(scale=0.5, model_name='large')\n",
+ "\n",
+ "# Forward pass\n",
+ "bk_out = backbone_mv3(x)\n",
+ "\n",
+ "# Print the output shape of each stage\n",
+ "for i, stage_out in enumerate(bk_out):\n",
+ " print(\"the shape of \",i,'stage: ',stage_out.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
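+ "### 2.1.2 Lightweight Feature Pyramid Network: DBFPN\n",
+ "\n",
+ "DBFPN, the feature fusion part (neck) of the text detector, is similar to the FPN structure used in object detection: it fuses feature maps of different scales to improve detection of text regions of different sizes.\n",
+ "\n",
+ "To make it easy to merge feature maps with different numbers of channels, `1×1` convolutions are used to reduce all feature maps to the same number of channels.\n",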
+ "\n",
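+ "The probability map and the threshold map are generated from the fused feature map by convolutions, and the size of these convolutions also depends on `inner_channels` (set through the `out_channels` argument of `DBFPN` in the code below). Therefore, `inner_channels` has a large impact on model size. Reducing `inner_channels` from 256 to 96 shrinks the model from 7M to 4.1M and makes it 48% faster, with only a slight drop in accuracy.\n",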
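+ "\n",
+ "A rough parameter count shows why `inner_channels` matters. The sketch below counts only the weights of the DBFPN convolutions whose structure is printed by the following code cell: four 1×1 convolutions from the backbone channels to `inner_channels`, and four 3×3 convolutions from `inner_channels` to `inner_channels // 4`. It assumes the convolutions have no bias and ignores the head, so it illustrates the trend rather than reproducing the 7M and 4.1M figures; `dbfpn_conv_params` is a hypothetical helper.\n",
+ "\n",
+ "```python\n",
+ "def dbfpn_conv_params(in_channels, inner_channels):\n",
+ "    # 1x1 convs: each backbone stage -> inner_channels (no bias assumed)\n",
+ "    reduce_params = sum(c * inner_channels for c in in_channels)\n",
+ "    # 3x3 convs: inner_channels -> inner_channels // 4, one per stage\n",
+ "    smooth_params = len(in_channels) * inner_channels * (inner_channels // 4) * 3 * 3\n",
+ "    return reduce_params + smooth_params\n",
+ "\n",
+ "\n",
+ "backbone_channels = [16, 24, 56, 480]\n",
+ "for inner in (256, 96):\n",
+ "    print(inner, dbfpn_conv_params(backbone_channels, inner))\n",
+ "```\n",
+ "\n",
+ "Because the 3×3 term grows with the square of `inner_channels`, going from 256 to 96 shrinks these convolutions by roughly 5x (737,280 vs. 138,240 weights under the assumptions above).\n",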
+ "\n",
+ "The code below prints the DBFPN structure and the result of fusing the backbone feature maps."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "DBFPN(\n",
+ " (in2_conv): Conv2D(16, 96, kernel_size=[1, 1], data_format=NCHW)\n",
+ " (in3_conv): Conv2D(24, 96, kernel_size=[1, 1], data_format=NCHW)\n",
+ " (in4_conv): Conv2D(56, 96, kernel_size=[1, 1], data_format=NCHW)\n",
+ " (in5_conv): Conv2D(480, 96, kernel_size=[1, 1], data_format=NCHW)\n",
+ " (p5_conv): Conv2D(96, 24, kernel_size=[3, 3], padding=1, data_format=NCHW)\n",
+ " (p4_conv): Conv2D(96, 24, kernel_size=[3, 3], padding=1, data_format=NCHW)\n",
+ " (p3_conv): Conv2D(96, 24, kernel_size=[3, 3], padding=1, data_format=NCHW)\n",
+ " (p2_conv): Conv2D(96, 24, kernel_size=[3, 3], padding=1, data_format=NCHW)\n",
+ ")\n",
+ "the shape of output of DBFPN: [1, 96, 160, 160]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# The implementation is located at:\n",
+ "# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppocr/modeling/necks/db_fpn.py\n",
+ "from ppocr.modeling.necks.db_fpn import DBFPN\n",
+ "\n",
+ "neck_bdfpn = DBFPN(in_channels=[16, 24, 56, 480], out_channels=96)\n",
+ "# Print the DBFPN structure\n",
+ "print(neck_bdfpn)\n",
+ "\n",
+ "# Reduce the original channels to 96, then to 24, and finally concatenate the 4 feature maps (4 x 24 = 96 channels)\n",
+ "fpn_out = neck_bdfpn(bk_out)\n",
+ "\n",
+ "print('the shape of output of DBFPN: ', fpn_out.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "### 2.1.3 Analysis of the SE Module in the Backbone\n",
+ "\n",
+ "SE is short for `squeeze-and-excitation` (Hu, Shen, and Sun 2018), as illustrated below.\n",
+ "\n",
+ "Figure: the SE module\n",
+ "\n",
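+ "An SE block explicitly models the interdependencies between channels and adaptively recalibrates channel-wise feature responses. Using SE blocks in a network noticeably improves accuracy on vision tasks, so the MobileNetV3 search space includes the SE module, and the final MobileNetV3 architecture contains many SE modules. However, when the input resolution is large, e.g. `640×640`, it is hard for the SE module to estimate channel-wise feature responses, so the accuracy gain is limited, while its time cost is very high. In DBNet, **we removed the SE modules from the backbone**, which reduced the model size from `4.1M` to `2.6M` with no loss in accuracy.\n",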
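+ "\n",
+ "The squeeze-and-excitation computation can be sketched in NumPy: global average pooling squeezes each channel to one number, two small fully connected layers with a reduction ratio produce one weight per channel, and the input is rescaled channel by channel. The random weights, the reduction ratio of 4, and the plain sigmoid gate are illustrative assumptions (MobileNetV3 itself uses a hard-sigmoid gate), and `se_block` is a hypothetical name.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def se_block(x, w1, w2):\n",
+ "    # x: feature map of shape (C, H, W)\n",
+ "    squeezed = x.mean(axis=(1, 2))  # squeeze: one value per channel, shape (C,)\n",
+ "    hidden = np.maximum(w1 @ squeezed, 0.0)  # FC + ReLU, shape (C // r,)\n",
+ "    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid, shape (C,)\n",
+ "    return x * gate[:, None, None]  # excitation: reweight each channel\n",
+ "\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "C, r = 16, 4\n",
+ "x = rng.random((C, 32, 32))\n",
+ "w1 = rng.standard_normal((C // r, C))\n",
+ "w2 = rng.standard_normal((C, C // r))\n",
+ "y = se_block(x, w1, w2)\n",
+ "print(y.shape)\n",
+ "```\n",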
+ "\n",
+ "In PaddleOCR, the SE modules in the backbone can be removed by setting `disable_se=True`, as shown below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "the shape of 0 stage: [1, 16, 160, 160]\n",
+ "the shape of 1 stage: [1, 24, 80, 80]\n",
+ "the shape of 2 stage: [1, 56, 40, 40]\n",
+ "the shape of 3 stage: [1, 480, 20, 20]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# The implementation is located at:\n",
+ "# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppocr/modeling/backbones/det_mobilenet_v3.py\n",
+ "\n",
+ "x = paddle.rand([1, 3, 640, 640])\n",
+ "\n",
+ "from ppocr.modeling.backbones.det_mobilenet_v3 import MobileNetV3\n",
+ "\n",
+ "# Define the model with the SE modules disabled\n",
+ "backbone_mv3 = MobileNetV3(scale=0.5, model_name='large', disable_se=True)\n",
+ "\n",
+ "# Forward pass\n",
+ "bk_out = backbone_mv3(x)\n",
+ "# Print the output shape of each stage\n",
+ "for i, stage_out in enumerate(bk_out):\n",
+ " print(\"the shape of \",i,'stage: ',stage_out.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
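+ "### 2.1.4 Learning Rate Schedule Optimization\n",
+ "\n",
+ "* Cosine learning rate decay\n",
+ "\n",
+ "Gradient descent requires a value that controls the magnitude of each weight update, called the learning rate. It is the hyperparameter that controls how fast the model learns: the smaller the learning rate, the more slowly the loss changes. A low learning rate makes it less likely that training overshoots a local minimum, but it also means the model converges more slowly.\n",
+ "\n",
+ "Therefore, early in training, when the weights are still close to their random initialization, a relatively large learning rate can be used to speed up convergence. Late in training, when the weights are close to optimal, a relatively small learning rate keeps the model from oscillating as it converges.\n",
+ "\n",
+ "This motivates the cosine learning rate schedule, in which the learning rate follows a cosine curve during training. Cosine decay keeps the learning rate relatively large early in training and gradually decays it to 0 later on; it converges relatively slowly, but reaches better final accuracy. The figure below compares two learning rate decay strategies, `piecewise decay` and `cosine decay`.\n",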
+ "\n",
+ "Figure: cosine vs. piecewise learning rate decay\n",
+ "\n",
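+ "The two decay schedules in the figure can be sketched as plain functions of the training step. The cosine formula $lr_t = \\frac{1}{2} lr_0 (1 + \\cos(\\pi t / T))$ is the standard form; the base learning rate, total steps, and piecewise boundaries below are made-up values for illustration, not the settings used to train DBNet.\n",
+ "\n",
+ "```python\n",
+ "import math\n",
+ "\n",
+ "\n",
+ "def cosine_decay(step, base_lr, total_steps):\n",
+ "    # Learning rate follows half a cosine period from base_lr down to 0\n",
+ "    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))\n",
+ "\n",
+ "\n",
+ "def piecewise_decay(step, boundaries, values):\n",
+ "    # values[i] is used until step reaches boundaries[i]; the last value is used afterwards\n",
+ "    for boundary, value in zip(boundaries, values):\n",
+ "        if step < boundary:\n",
+ "            return value\n",
+ "    return values[-1]\n",
+ "\n",
+ "\n",
+ "base_lr, total_steps = 0.001, 1000\n",
+ "for step in (0, 250, 500, 750, 1000):\n",
+ "    print(step, round(cosine_decay(step, base_lr, total_steps), 6),\n",
+ "          piecewise_decay(step, [300, 600], [0.001, 0.0001, 0.00001]))\n",
+ "```\n",
+ "\n",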
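+ "* Learning rate warmup\n",
+ "\n",
+ "At the start of training the weights are randomly initialized, and a large learning rate at this point can make training unstable. **Learning rate warmup** was therefore proposed to address non-convergence in the early stage of training.\n",
+ "\n",
+ "Learning rate warmup starts the learning rate from a very small value and gradually increases it to the initial, larger learning rate, which keeps the model stable early in training. Warmup helps improve accuracy on image classification tasks, and experiments show that it is also effective for DBNet. The code below shows how the learning rate changes when warmup is combined with cosine decay.\n"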
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ "