Unverified commit 07336d14, authored by Double_V, committed by GitHub

Merge pull request #6081 from LDOUBLEV/dygraph

[doc]add east and ppocrv3 det
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/ch_db_mv3/
save_epoch_step: 1200
# evaluation is run every 2000 iterations after the 3000th iteration
eval_batch_step: [3000, 2000]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt
name: DistillationModel
algorithm: Distillation
model_type: det
return_all_feats: false
model_type: det
algorithm: DB
name: ResNet
in_channels: 3
layers: 50
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
return_all_feats: false
model_type: det
algorithm: DB
name: ResNet
in_channels: 3
layers: 50
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
name: CombinedLoss
- DistillationDMLLoss:
- ["Student", "Student2"]
maps_name: "thrink_maps"
weight: 1.0
act: "softmax"
model_name_pairs: ["Student", "Student2"]
key: maps
- DistillationDBLoss:
weight: 1.0
model_name_list: ["Student", "Student2"]
# key: maps
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
name: Adam
beta1: 0.9
beta2: 0.999
name: Cosine
learning_rate: 0.001
warmup_epoch: 2
name: 'L2'
factor: 0
name: DistillationDBPostProcess
model_name: ["Student", "Student2"]
key: head_out
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
name: DistillationMetric
base_metric_name: DetMetric
main_indicator: hmean
key: "Student"
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- CopyPaste:
- IaaAugment:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
- { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
- { 'type': Resize, 'args': { 'size': [0.5, 3] } }
- EastRandomCropData:
size: [960, 960]
max_tries: 50
keep_ratio: true
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
# image_shape: [736, 1280]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 2
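The `eval_batch_step: [3000, 2000]` entry in the configuration above is a two-element list. Assuming the usual PaddleOCR reading of `[start_step, interval]`, evaluation is skipped until the start step has passed and is then triggered at fixed intervals. The sketch below only illustrates that schedule; the helper name `should_evaluate` is hypothetical and this is not the training loop itself.

```python
def should_evaluate(global_step, eval_batch_step):
    """Return True when an evaluation would be triggered at this step.

    Assumes eval_batch_step is [start_step, interval] (an assumption about
    how the setting is interpreted, not code taken from PaddleOCR).
    """
    start_step, interval = eval_batch_step
    return global_step > start_step and (global_step - start_step) % interval == 0


# List the steps in the first 10000 iterations at which evaluation would run.
eval_steps = [step for step in range(1, 10001) if should_evaluate(step, [3000, 2000])]
print(eval_steps)
```

Under this reading, the first evaluation happens one full interval after the start step rather than at the start step itself.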
# Deploying PaddleOCR Models on Jetson
This section describes how to deploy PaddleOCR on Jetson NX, TX2, Nano, AGX, and other hardware in the series.
## 1. Environment Preparation
1. Install PaddlePaddle on Jetson
# Install paddle, taking paddlepaddle_gpu-2.3.0rc0-cp36-cp36m-linux_aarch64.whl as an example
pip3 install -U paddlepaddle_gpu-2.3.0rc0-cp36-cp36m-linux_aarch64.whl
2. Download the PaddleOCR code and install dependencies
First, clone the PaddleOCR code:
git clone https://github.com/PaddlePaddle/PaddleOCR
cd PaddleOCR
pip3 install -r requirements.txt
## 2. Run Prediction
Obtain a PP-OCR model from the model zoo in the [documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/ppocr_introduction.md#6-%E6%A8%A1%E5%9E%8B%E5%BA%93). The following uses the PP-OCRv3 model as an example to show how to use PP-OCR models on Jetson:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar
tar xf ch_PP-OCRv3_det_infer.tar
tar xf ch_PP-OCRv3_rec_infer.tar
cd PaddleOCR
python3 tools/infer/predict_det.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --image_dir=./doc/imgs/french_0.jpg --use_gpu=True
After the command runs, the prediction information is printed in the terminal and the visualization results are saved under `./inference_results/`.
python3 tools/infer/predict_rec.py --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs_words/en/word_2.png --use_gpu=True --rec_image_shape="3,48,320"
[2022/04/28 15:41:45] root INFO: Predicts of ./doc/imgs_words/en/word_2.png:('yourself', 0.98084533)
python3 tools/infer/predict_system.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs/ --use_gpu=True --rec_image_shape="3,48,320"
After the command runs, the prediction information is printed in the terminal and the visualization results are saved under `./inference_results/`.
python3 tools/infer/predict_system.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs/00057937.jpg --use_gpu=True --use_tensorrt=True --rec_image_shape="3,48,320"
# Jetson Deployment for PaddleOCR
This section describes how to deploy PaddleOCR on Jetson NX, TX2, Nano, AGX, and other hardware in the series.
## 1. Prepare Environment
You need a Jetson development board. If you plan to use TensorRT, prepare the TensorRT environment as well; TensorRT 7.1.3 is recommended.
1. Install PaddlePaddle in Jetson
PaddlePaddle can be downloaded from this [link](https://www.paddlepaddle.org.cn/inference/user_guides/download_lib.html#python).
Select the installation package that matches your JetPack, CUDA, and TensorRT versions. Here, we download paddlepaddle_gpu-2.3.0rc0-cp36-cp36m-linux_aarch64.whl.
Install PaddlePaddle:
pip3 install -U paddlepaddle_gpu-2.3.0rc0-cp36-cp36m-linux_aarch64.whl
2. Download PaddleOCR code and install dependencies
Clone the PaddleOCR code:
git clone https://github.com/PaddlePaddle/PaddleOCR
and install dependencies:
cd PaddleOCR
pip3 install -r requirements.txt
*Note: Jetson CPUs are relatively slow, so installing the dependencies takes a while; please be patient.*
## 2. Perform prediction
Obtain a PP-OCR model from the model zoo in the [document](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/ppocr_introduction_en.md#6-model-zoo). The following uses the PP-OCRv3 model as an example to show how to use PP-OCR models on Jetson:
Download and extract the PP-OCRv3 models.
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar
tar xf ch_PP-OCRv3_det_infer.tar
tar xf ch_PP-OCRv3_rec_infer.tar
Text detection inference:
cd PaddleOCR
python3 tools/infer/predict_det.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --image_dir=./doc/imgs/french_0.jpg --use_gpu=True
After the command runs, the prediction information is printed in the terminal, and the visualization results are saved in the `./inference_results/` directory.
Text recognition inference:
python3 tools/infer/predict_rec.py --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs_words/en/word_2.png --use_gpu=True --rec_image_shape="3,48,320"
After the command runs, the prediction information is printed in the terminal; the output is as follows:
[2022/04/28 15:41:45] root INFO: Predicts of ./doc/imgs_words/en/word_2.png:('yourself', 0.98084533)
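If the recognition result needs to be consumed by a script rather than read from the terminal, the log line can be parsed. The following is a minimal sketch, assuming the line keeps the `Predicts of <path>:(<text>, <score>)` shape shown above; the helper name `parse_rec_log_line` is hypothetical, and the log format may differ across PaddleOCR versions.

```python
import ast
import re

# Matches "... Predicts of <path>:(<text>, <score>)"; the path is assumed
# to contain no ":(" sequence.
LOG_PATTERN = re.compile(r"Predicts of (?P<path>.+?):(?P<result>\(.*\))\s*$")


def parse_rec_log_line(line):
    """Return (image_path, text, score), or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # The result is printed as a Python tuple literal, so literal_eval can read it.
    text, score = ast.literal_eval(match.group("result"))
    return match.group("path"), text, float(score)


line = (
    "[2022/04/28 15:41:45] root INFO: Predicts of "
    "./doc/imgs_words/en/word_2.png:('yourself', 0.98084533)"
)
print(parse_rec_log_line(line))
```

`ast.literal_eval` is used instead of `eval` so that only literal values are accepted from the log text.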
Combined text detection and recognition inference:
python3 tools/infer/predict_system.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs/00057937.jpg --use_gpu=True --rec_image_shape="3,48,320"
After the command runs, the prediction information is printed in the terminal, and the visualization results are saved in the `./inference_results/` directory.
To enable TensorRT prediction, add `--use_tensorrt=True` to the command above:
python3 tools/infer/predict_system.py --det_model_dir=./inference/ch_PP-OCRv3_det_infer/ --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer/ --image_dir=./doc/imgs/ --rec_image_shape="3,48,320" --use_gpu=True --use_tensorrt=True
For more PP-OCR model prediction examples, please refer to the [document](../../doc/doc_en/models_list_en.md).
......@@ -22,9 +22,7 @@
### 1. Install PaddleSlim
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
pip3 install paddleslim==2.2.2
### 2. Prepare the trained model
......@@ -43,7 +41,15 @@ python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
tar -xf ch_ppocr_mobile_v2.0_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_model_dir=./output/quant_model
# Download the pretrained detection model:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
python deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv3_det/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model='./ch_PP-OCRv3_det_distill_train/best_accuracy' Global.save_model_dir=./output/quant_model_distill/
......@@ -25,9 +25,7 @@ After training, if you want to further compress the model size and accelerate th
### 1. Install PaddleSlim
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
pip3 install paddleslim==2.2.2
......@@ -52,6 +50,17 @@ python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3
Model distillation and model quantization can be used at the same time, taking the PPOCRv3 detection model as an example:
# download provided model
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
python deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv3_det/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model='./ch_PP-OCRv3_det_distill_train/best_accuracy' Global.save_model_dir=./output/quant_model_distill/
If you want to quantize the text recognition model, modify the configuration file and the loaded model parameters accordingly.
### 4. Export inference model
Once we have the model obtained after quantization training and fine-tuning, we can export it as an inference model for deployment:
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
> [EAST: An Efficient and Accurate Scene Text Detector](https://arxiv.org/abs/1704.03155)
> Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
> CVPR, 2017
|Model|Backbone|Precision|Recall|Hmean|Download link|
| --- | --- | --- | --- | --- | --- |
|EAST|ResNet50_vd|88.71%| 81.36%| 84.88%| [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
|EAST| MobileNetV3| 78.2%| 79.1%| 78.65%| [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
<a name="2"></a>
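## 2. Environment
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
The EAST models in the table above were trained on the public ICDAR2015 text detection dataset. For downloading the dataset, see [ocr_datasets](./dataset/ocr_datasets.md).
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), the conversion can be done with the following command: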
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_r50_east/
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_r50_east/" --det_algorithm="EAST"
<a name="4-2"></a>
### 4.2 C++ Inference
<a name="4-3"></a>
### 4.3 Serving
<a name="4-4"></a>
### 4.4 More
<a name="5"></a>
## 5. FAQ
## Citation
title={East: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
......@@ -305,10 +305,9 @@ paddle.save(s_params, "ch_PP-OCRv2_rec_train/student.pdparams")
<a name="22"></a>
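### 2.2 Detection Configuration File Analysis
- ch_PP-OCRv2_det_cml.yml: CML distillation, in which one large model distills two small models and the two small models also learn from each other
- ch_PP-OCRv2_det_dml.yml: DML distillation, in which two Student models distill each other
- ch_PP-OCRv2_det_distill.yml: a large Teacher model distills a small Student model
- ch_PP-OCRv3_det_cml.yml: CML distillation, in which one large model distills two small models and the two small models also learn from each other
- ch_PP-OCRv3_det_dml.yml: DML distillation, in which two Student models distill each other
<a name="221"></a>
#### 2.2.1 Model Structure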
......@@ -321,44 +320,44 @@ Architecture:
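algorithm: Distillation # Algorithm name
Models: # Models, containing the configuration of each sub-network
Student: # Sub-network name; it must contain at least the `pretrained` and `freeze_params` fields, and the remaining parameters are the constructor parameters of the sub-network
pretrained: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
freeze_params: false # Whether to freeze the parameters
return_all_feats: false # Sub-network parameter indicating whether to return all features; if False, only the final output is returned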
model_type: det
algorithm: DB
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
name: ResNet
in_channels: 3
layers: 50
name: DBFPN
out_channels: 96
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
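Teacher: # Another sub-network; this is an example of ordinary distillation from a large model to a small model
pretrained: ./pretrain_models/ch_ppocr_server_v2.0_det_train/best_accuracy
freeze_params: true # The Teacher model is already trained and does not take part in training, so freeze_params is set to True
Teacher: # Another sub-network; this is a DML distillation example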
freeze_params: true
return_all_feats: false
model_type: det
algorithm: DB
name: ResNet
layers: 18
in_channels: 3
layers: 50
name: DBFPN
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50

......@@ -375,12 +374,14 @@ Architecture:
name: ResNet
layers: 18
in_channels: 3
layers: 50
name: DBFPN
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
Student: # Student model configuration for CML distillation
pretrained: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
......@@ -392,10 +393,11 @@ Architecture:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
disable_se: true
name: DBFPN
name: RSEFPN
out_channels: 96
shortcut: True
name: DBHead
k: 50
......@@ -410,10 +412,11 @@ Architecture:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
disable_se: true
name: DBFPN
name: RSEFPN
out_channels: 96
shortcut: True
name: DBHead
k: 50
......@@ -445,34 +448,7 @@ Architecture:
<a name="222"></a>
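#### 2.2.2 Loss Function
name: CombinedLoss # Loss function name; the loss class is built from this name
loss_config_list: # List of loss function configurations, required by CombinedLoss
- DistillationDilaDBLoss: # Distillation-based DB loss function, inherited from the standard DBLoss
weight: 1.0 # Weight of the loss function; every loss configuration in loss_config_list must contain this field
model_name_pairs: # From the distillation model's predictions, take the outputs of these two sub-networks and compute the loss between the Teacher and Student outputs
- ["Student", "Teacher"]
key: maps # Take the tensor stored under this key in the sub-network output dict
balance_loss: true # The following parameters are the standard DBLoss configuration parameters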
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
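- DistillationDBLoss: # Distillation-based DB loss function, inherited from the standard DBLoss, used to compute the loss between the Student and the GT
weight: 1.0
model_name_list: ["Student"] # Only Student is listed, meaning the loss between the Student and the GT is computed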
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
name: CombinedLoss
......@@ -545,26 +521,25 @@ Metric:
<a name="225"></a>
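#### 2.2.5 Fine-tuning the Detection Distillation Model
- Use ch_PP-OCRv2_det_distill.yml, with the Teacher model set to a model provided by PaddleOCR or a large model you have trained
- Use ch_PP-OCRv2_det_cml.yml for CML distillation, again with the Teacher model set to a model provided by PaddleOCR or a large model you have trained
- Use ch_PP-OCRv2_det_dml.yml for DML distillation, in which two Student models distill each other; this gives an accuracy improvement of about 1.7% on the dataset used by PaddleOCR.
- Use ch_PP-OCRv3_det_cml.yml for CML distillation, again with the Teacher model set to a model provided by PaddleOCR or a large model you have trained
- Use ch_PP-OCRv3_det_dml.yml for DML distillation, in which two Student models distill each other; on the dataset used by PaddleOCR this improves on training the Student model alone by 1%-2%.
# Download the parameters of the distillation training model
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
import paddle
# Load the pretrained model
all_params = paddle.load("ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameters
# Extract the weights of the student model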
......@@ -572,7 +547,7 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu
# View the keys of the student model's weight parameters
# Save
paddle.save(s_params, "ch_PP-OCRv2_det_distill_train/student.pdparams")
paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/student.pdparams")
......@@ -38,6 +38,19 @@ Building on PP-OCR, PP-OCRv2 is further optimized in five key aspects; the detection mod
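#### PP-OCRv3
- Network structure improvement: Two improved FPN structures, RSEFPN and LKPAN, are proposed; they refine the features extracted by the FPN from the perspectives of channel attention and a larger receptive field, respectively.
- Distillation training strategy: First, a teacher model with higher accuracy is obtained by using ResNet50 as the backbone and the improved LKPAN structure as the FPN, trained with the DML self-distillation strategy. Then the student model adopts RSEFPN as its FPN and is distilled with the CML method proposed in PP-OCRv2; during training, the proportion of the CML teacher loss is adjusted dynamically.
|No.|Strategy|Model size|Hmean|Inference time (Intel Gold 6148 CPU + mkldnn)|
|-|-|-|-|-|
|2|teacher DML|124M|86.0|-|
|3|1 + 2 + RSEFPN|3.6M|85.4|124ms|
|4|1 + 2 + LKPAN|4.6M|86.0|156ms|
<a name="2"></a>
## 2. Features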
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
> [EAST: An Efficient and Accurate Scene Text Detector](https://arxiv.org/abs/1704.03155)
> Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
> CVPR, 2017
On the ICDAR2015 dataset, the text detection results are as follows:
|Model|Backbone|Precision|Recall|Hmean|Download link|
| --- | --- | --- | --- | --- | --- |
|EAST|ResNet50_vd|88.71%| 81.36%| 84.88%| [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
|EAST| MobileNetV3| 78.2%| 79.1%| 78.65%| [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
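Assuming the three percentages in each row are precision, recall, and Hmean in that order, Hmean is the harmonic mean of the first two, which can be checked against the table with a short calculation. This is only a sketch; the function name `hmean` is illustrative and is not a PaddleOCR API.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the two EAST rows of the table.
print(round(hmean(88.71, 81.36), 2))
print(round(hmean(78.2, 79.1), 2))
```

Both rounded values agree with the third percentage reported in the corresponding row.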
<a name="2"></a>
## 2. Environment
Please prepare your environment referring to [prepare the environment](./environment_en.md) and [clone the repo](./clone_en.md).
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
The above EAST model is trained using the ICDAR2015 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), you can use the following command to convert it:
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_r50_east/
For EAST text detection model inference, set the parameter `--det_algorithm="EAST"` and run the following command:
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_r50_east/" --det_algorithm="EAST"
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'.
<a name="4-2"></a>
### 4.2 C++ Inference
Since the post-processing has not been implemented in C++, the EAST text detection model does not support C++ inference.
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
title={East: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
......@@ -319,11 +319,10 @@ After the extraction is complete, use [ch_PP-OCRv2_rec.yml](../../configs/rec/ch
<a name="22"></a>
### 2.2 Detection Model Configuration File Analysis
The configuration file of the detection model distillation is in the ```PaddleOCR/configs/det/ch_PP-OCRv2/``` directory, which contains three distillation configuration files:
The configuration file of the detection model distillation is in the ```PaddleOCR/configs/det/ch_PP-OCRv3/``` directory, which contains three distillation configuration files:
- ```ch_PP-OCRv2_det_cml.yml```, Use one large model to distill two small models, and the two small models learn from each other
- ```ch_PP-OCRv2_det_dml.yml```, Method of mutual distillation of two student models
- ```ch_PP-OCRv2_det_distill.yml```, The method of using large teacher model to distill small student model
- ```ch_PP-OCRv3_det_cml.yml```, Use one large model to distill two small models, and the two small models learn from each other
- ```ch_PP-OCRv3_det_dml.yml```, Method of mutual distillation of two student models
<a name="221"></a>
#### 2.2.1 Model Structure
......@@ -341,39 +340,40 @@ Architecture:
model_type: det
algorithm: DB
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
name: ResNet
in_channels: 3
layers: 50
name: DBFPN
out_channels: 96
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
Teacher: # Another sub-network; this is an example of a large model distilling a small model
pretrained: ./pretrain_models/ch_ppocr_server_v2.0_det_train/best_accuracy
freeze_params: true # The Teacher model is well-trained and does not need to participate in training
return_all_feats: false
model_type: det
algorithm: DB
name: ResNet
layers: 18
in_channels: 3
layers: 50
name: DBFPN
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
If DML is used, that is, the method of two small models learning from each other, the Teacher network structure in the above configuration file needs to be set to the same configuration as the Student model.
Refer to the configuration file for details. [ch_PP-OCRv2_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_dml.yml)
Refer to the configuration file for details. [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)
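To make the mutual-learning idea concrete: in DML, each of the two networks is trained to match the other's predicted distribution. The following is a minimal pure-Python sketch of a symmetric KL-divergence term of the kind used in mutual learning. The function names and the toy probabilities are illustrative assumptions; this is not PaddleOCR's `DistillationDMLLoss` implementation.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dml_loss(p, q):
    """Symmetric KL term: the average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical per-pixel class probabilities from two student networks.
student_probs = [0.7, 0.2, 0.1]
student2_probs = [0.6, 0.3, 0.1]

print(dml_loss(student_probs, student2_probs))  # positive when the outputs differ
print(dml_loss(student_probs, student_probs))   # zero when the outputs agree
```

Because the term is symmetric, neither network plays a fixed teacher role, which is why the two sub-networks share the same structure in the DML setup.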
The following describes the configuration file parameters [ch_PP-OCRv2_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml):
The following describes the configuration file parameters [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml):
......@@ -390,12 +390,14 @@ Architecture:
name: ResNet
layers: 18
in_channels: 3
layers: 50
name: DBFPN
name: LKPAN
out_channels: 256
name: DBHead
kernel_list: [7,2,2]
k: 50
Student: # Student model configuration for CML distillation
pretrained: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
......@@ -407,10 +409,11 @@ Architecture:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
disable_se: true
name: DBFPN
name: RSEFPN
out_channels: 96
shortcut: True
name: DBHead
k: 50
......@@ -425,10 +428,11 @@ Architecture:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
disable_se: true
name: DBFPN
name: RSEFPN
out_channels: 96
shortcut: True
name: DBHead
k: 50
......@@ -460,34 +464,7 @@ The key contains `backbone_out`, `neck_out`, `head_out`, and `value` is the tens
<a name="222"></a>
#### 2.2.2 Loss Function
In the detection knowledge distillation task ```ch_PP-OCRv2_det_distill.yml```, the distillation loss function configuration is as follows.
name: CombinedLoss # Loss function name
loss_config_list: # List of loss function configuration files, mandatory functions for CombinedLoss
- DistillationDilaDBLoss: # DB loss function based on distillation, inherited from standard DBloss
weight: 1.0 # The weight of the loss function. In loss_config_list, each loss function must include this field
model_name_pairs: # Extract the output of these two sub-networks and calculate the loss between them
- ["Student", "Teacher"]
key: maps # In the sub-network output dict, take the corresponding tensor
balance_loss: true # The following parameters are the configuration parameters of standard DBloss
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
- DistillationDBLoss: # Used to calculate the loss between Student and GT
weight: 1.0
model_name_list: ["Student"] # The model name only has Student, which means that the loss between Student and GT is calculated
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Similarly, distillation loss function configuration(`ch_PP-OCRv2_det_cml.yml`) is shown below. Compared with the loss function configuration of ch_PP-OCRv2_det_distill.yml, there are three changes:
The distillation loss function configuration (`ch_PP-OCRv3_det_cml.yml`) is shown below.
name: CombinedLoss
......@@ -530,7 +507,7 @@ In the task of detecting knowledge distillation, the post-processing configurati
name: DistillationDBPostProcess # The CTC decoding post-processing of the DB detection distillation task, inherited from the standard DBPostProcess class
name: DistillationDBPostProcess # The post-processing of the DB detection distillation task, inherited from the standard DBPostProcess class
model_name: ["Student", "Student2", "Teacher"] # Extract the output of multiple sub-networks and decode them. The network that does not require post-processing is not set in model_name
thresh: 0.3
box_thresh: 0.6
......@@ -561,9 +538,9 @@ Model Structure
#### 2.2.5 Fine-tuning Distillation Model
There are three ways to fine-tune the detection distillation task:
- `ch_PP-OCRv2_det_distill.yml`, The teacher model is set to the model provided by PaddleOCR or the large model you have trained.
- `ch_PP-OCRv2_det_cml.yml`, Use cml distillation. Similarly, the Teacher model is set to the model provided by PaddleOCR or the large model you have trained.
- `ch_PP-OCRv2_det_dml.yml`, Distillation using DML. The method of mutual distillation of the two Student models has an accuracy improvement of about 1.7% on the data set used by PaddleOCR.
- `ch_PP-OCRv3_det_distill.yml`, The teacher model is set to the model provided by PaddleOCR or the large model you have trained.
- `ch_PP-OCRv3_det_cml.yml`, Use cml distillation. Similarly, the Teacher model is set to the model provided by PaddleOCR or the large model you have trained.
- `ch_PP-OCRv3_det_dml.yml`, Distillation using DML. The method of mutual distillation of the two Student models has an accuracy improvement of about 1.7% on the data set used by PaddleOCR.
When fine-tuning, set the pre-trained model to be loaded in the `pretrained` parameter of the network structure.
......@@ -572,13 +549,13 @@ In terms of accuracy improvement, `cml` > `dml` > `distill`. When the amount of
In addition, since the distillation pre-training model provided by PaddleOCR contains multiple model parameters, if you want to extract the parameters of the student model, you can refer to the following code:
# Download the parameters of the distillation training model
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
import paddle
# Load the pre-trained model
all_params = paddle.load("ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
# Extract the weights of the student model
......@@ -586,7 +563,7 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu
# View the keys of the weight parameters of the student model
# Save
paddle.save(s_params, "ch_PP-OCRv2_det_distill_train/student.pdparams")
paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/student.pdparams")
Finally, the parameters of the student model will be saved in `ch_PP-OCRv2_det_distill_train/student.pdparams` for fine-tuning the model.
Finally, the parameters of the student model will be saved in `ch_PP-OCRv3_det_distill_train/student.pdparams` for fine-tuning the model.
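The extraction step above relies on selecting the entries whose names carry the `Student.` prefix and stripping that prefix. The sketch below shows the idea on a plain dict that stands in for the loaded checkpoint, so it runs without PaddlePaddle; the helper name `extract_submodel_params` and the parameter names are hypothetical.

```python
def extract_submodel_params(all_params, prefix="Student."):
    """Keep entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in all_params.items()
        if key.startswith(prefix)
    }


# Hypothetical parameter names; a real checkpoint maps names to tensors.
all_params = {
    "Student.backbone.conv1.weight": "w0",
    "Student2.backbone.conv1.weight": "w1",
    "Teacher.backbone.conv1.weight": "w2",
}

s_params = extract_submodel_params(all_params)
print(sorted(s_params))
```

Matching on the full prefix including the trailing dot keeps `Student2.` entries out of the result, so only one sub-network's weights are exported.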
......@@ -32,6 +32,21 @@ PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 hav
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
[3] PP-OCRv3 is further upgraded on the basis of PP-OCRv2.
PP-OCRv3 text detection has been further optimized in two directions, network structure and distillation training strategy:
- Network structure improvement: Two improved FPN structures, RSEFPN and LKPAN, are proposed; they refine the features extracted by the FPN from the perspectives of channel attention and a larger receptive field, respectively.
- Distillation training strategy: First, a teacher model with higher accuracy is obtained by using ResNet50 as the backbone and the improved LKPAN structure as the FPN, trained with the DML self-distillation strategy. Then the student model adopts RSEFPN as its FPN and is distilled with the CML method proposed in PP-OCRv2; during training, the proportion of the CML teacher loss is adjusted dynamically.
|Index|Method|Model Size|Hmean|CPU inference time|
|-|-|-|-|-|
|2|teacher DML|124M|86.0|-|
|3|1 + 2 + RSEFPN|3.6M|85.4|124ms|
|4|1 + 2 + LKPAN|4.6M|86.0|156ms|
*Note: CPU inference time refers to the average inference time on an Intel Gold 6148 CPU with mkldnn enabled.*
<a name="2"></a>
## 2. Features