Unverified commit b7f15dc2, authored by zhoujun, committed by GitHub

Add PP-OCRv2 to model center (#5552)

* add PP-OCRv2

* add PP-OCRv2 benchmark

* update app

* update introduction

* update PP-OCRv2 introduction

* add pipeline of PP-OCRv2

* add PP-OCRv3

* update benchmark
Parent 8065d741
import gradio as gr
import base64
from io import BytesIO
from PIL import Image
from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR(ocr_version='PP-OCRv2', use_angle_cls=True, lang="ch")


def image_to_base64(image):
    # Take a PIL image and return a base64-encoded JPEG string.
    byte_data = BytesIO()  # in-memory byte buffer
    image.save(byte_data, format="JPEG")  # write the image into the buffer
    byte_data = byte_data.getvalue()  # pull the raw bytes back out
    base64_str = base64.b64encode(byte_data).decode("ascii")  # bytes -> base64
    return base64_str


# UGC: Define the inference fn() for your models
def model_inference(image):
    result = ocr.ocr(image, cls=True)
    # Unpack the result for the single input image
    result = result[0]
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    # Draw only the detected boxes; texts and scores are returned in the JSON
    im_show = draw_ocr(image, boxes, txts=None, scores=None)
    im_show = Image.fromarray(im_show)
    res = []
    for i in range(len(boxes)):
        res.append(dict(boxes=boxes[i], txt=txts[i], score=scores[i]))
    json_out = {"base64": image_to_base64(im_show), "result": res}
    return im_show, json_out


def clear_all():
    return None, None, None


with gr.Blocks() as demo:
    gr.Markdown("PP-OCRv2")

    with gr.Column(scale=1, min_width=100):
        img_in = gr.Image(
            value="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/imgs/11.jpg",
            label="Input")

        with gr.Row():
            btn1 = gr.Button("Clear")
            btn2 = gr.Button("Submit")

        img_out = gr.Image(label="Output").style(height=400)
        json_out = gr.JSON(label="jsonOutput")

    btn2.click(fn=model_inference, inputs=img_in, outputs=[img_out, json_out])
    btn1.click(fn=clear_all, inputs=None, outputs=[img_in, img_out, json_out])

demo.launch()
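# Local smoke test (a sketch under assumptions, not part of the app): with a
# local copy of the demo image saved as "11.jpg" (hypothetical path), the
# inference fn can be exercised directly, without the UI:
#
#   import numpy as np
#   img = np.array(Image.open("11.jpg").convert("RGB"))
#   vis, payload = model_inference(img)   # PIL image + JSON-serializable dict
#   print(payload["result"][:3])          # first few detected lines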
【PP-OCRv2-App-YAML】
APP_Info:
title: PP-OCRv2-App
colorFrom: blue
colorTo: yellow
sdk: gradio
sdk_version: 3.4.1
app_file: app.py
license: apache-2.0
device: cpu
## 1. Inference Benchmark
### 1.1 Environment
* PP-OCRv2 model inference speed was tested on a 6148 CPU with MKLDNN enabled and 10 threads.
### 1.2 Dataset
An internal dataset was used for testing.
### 1.3 Metrics
| model | hardware | det preprocess time | det inference time | det post process time | rec preprocess time | rec inference time | rec post process time | total time (s) |
|---|---|---|---|---|---|---|---|---|
| ppocr_mobile | 6148CPU | 10.6291 | 103.8162 | 16.0715 | 0.246 | 62.8177 | 4.6695 | 40.4602 + 69.9684 |
| ppocr_server | 6148CPU | 10.6834 | 178.5214 | 16.2959 | 0.2741 | 237.5255 | 4.8711 | 63.7052 + 263.783 |
| ppocr_mobile_v2 | 6148CPU | 10.58 | 102.9626 | 16.5514 | 0.2418 | 53.395 | 4.4622 | 40.3293 + 62.2241 |
## 2. Reference
1. https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4#pp-ocrv2-pipeline
## 1. Inference Benchmark
### 1.1 Environment
PP-OCRv2 model inference speed was tested on a 6148 CPU with MKLDNN enabled and 10 threads.
### 1.2 Benchmark
| model | hardware | det preprocess time | det inference time | det post process time | rec preprocess time | rec inference time | rec post process time | total time (s) |
|---|---|---|---|---|---|---|---|---|
| ppocr_mobile | 6148CPU | 10.6291 | 103.8162 | 16.0715 | 0.246 | 62.8177 | 4.6695 | 40.4602 + 69.9684 |
| ppocr_server | 6148CPU | 10.6834 | 178.5214 | 16.2959 | 0.2741 | 237.5255 | 4.8711 | 63.7052 + 263.783 |
| ppocr_mobile_v2 | 6148CPU | 10.58 | 102.9626 | 16.5514 | 0.2418 | 53.395 | 4.4622 | 40.3293 + 62.2241 |
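To approximate this environment with the PaddleOCR whl package, MKL-DNN and the CPU thread count can be set at construction time. A minimal sketch (`enable_mkldnn` and `cpu_threads` are the inference arguments exposed by recent paddleocr releases; older versions may name them differently):

```python
from paddleocr import PaddleOCR

# CPU inference with MKL-DNN enabled and 10 threads, mirroring the setup above
ocr = PaddleOCR(ocr_version='PP-OCRv2', use_angle_cls=True, lang='ch',
                use_gpu=False, enable_mkldnn=True, cpu_threads=10)
result = ocr.ocr('PaddleOCR/doc/imgs/11.jpg', cls=True)
```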
## 2. Reference
1. https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4#pp-ocrv2-pipeline
# Model List
## 1. Text Detection Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|Slim quantized + distilled ultra-lightweight model, supporting Chinese, English, and multilingual text detection| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_det|Original ultra-lightweight model, supporting Chinese, English, and multilingual text detection|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
## 2. Text Recognition Model
### 2.1 Chinese Recognition Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv2_rec_slim|Slim quantized ultra-lightweight model, supporting Chinese, English, and number recognition| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|ch_PP-OCRv2_rec|Original ultra-lightweight model, supporting Chinese, English, and number recognition|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
**Note:** The `trained model` is fine-tuned from the `pre-trained model` on real data plus synthesized vertical-text data, and performs better in real application scenarios. The `pre-trained model` is trained directly on the full set of real and synthesized data, and is better suited for fine-tuning on your own dataset.
### 2.2 English Recognition Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|en_number_mobile_slim_v2.0_rec|Slim pruned + quantized ultra-lightweight model, supporting English and number recognition| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|Original ultra-lightweight model, supporting English and number recognition|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
### 2.3 Multilingual Recognition Model
| model | dict file | description | inference model size | download |
| --- | --- | --- | --- |--- |
| french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt |French recognition|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) |
| german_mobile_v2.0_rec | ppocr/utils/dict/german_dict.txt |German recognition|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_train.tar) |
| korean_mobile_v2.0_rec | ppocr/utils/dict/korean_dict.txt |Korean recognition|3.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_train.tar) |
| japan_mobile_v2.0_rec | ppocr/utils/dict/japan_dict.txt |Japanese recognition|4.23M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_train.tar) |
| chinese_cht_mobile_v2.0_rec | ppocr/utils/dict/chinese_cht_dict.txt | Traditional Chinese recognition|5.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_train.tar) |
| te_mobile_v2.0_rec | ppocr/utils/dict/te_dict.txt | Telugu recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_train.tar) |
| ka_mobile_v2.0_rec | ppocr/utils/dict/ka_dict.txt |Kannada recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_train.tar) |
| ta_mobile_v2.0_rec | ppocr/utils/dict/ta_dict.txt |Tamil recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_train.tar) |
| latin_mobile_v2.0_rec | ppocr/utils/dict/latin_dict.txt | Latin recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_train.tar) |
| arabic_mobile_v2.0_rec | ppocr/utils/dict/arabic_dict.txt | Arabic script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_train.tar) |
| cyrillic_mobile_v2.0_rec | ppocr/utils/dict/cyrillic_dict.txt | Cyrillic script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_train.tar) |
| devanagari_mobile_v2.0_rec | ppocr/utils/dict/devanagari_dict.txt |Devanagari script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_train.tar) |
## 3. Text Angle Classification Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for classifying the text angle of detected text lines| 2.1M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|ch_ppocr_mobile_v2.0_cls|Original classifier model for classifying the text angle of detected text lines|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
# Model List
## 1. Text Detection Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|Slim quantized + distilled ultra-lightweight model, supporting Chinese, English, and multilingual text detection| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_det|Original ultra-lightweight model, supporting Chinese, English, and multilingual text detection|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
## 2. Text Recognition Model
### 2.1 Chinese Recognition Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv2_rec_slim|Slim quantized ultra-lightweight model, supporting Chinese, English, and number recognition| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|ch_PP-OCRv2_rec|Original ultra-lightweight model, supporting Chinese, English, and number recognition|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
**Note**: The `trained model` is fine-tuned from the `pre-trained model` on real data plus synthesized vertical-text data, and performs better in real scenes. The `pre-trained model` is trained directly on the full set of real and synthesized data, and is better suited for fine-tuning on your own dataset.
### 2.2 English Recognition Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
### 2.3 Multilingual Recognition Model
| model | dict file| description | model_size | download |
| --- | --- | --- | --- |--- |
| french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt |Lightweight model for French recognition|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) |
| german_mobile_v2.0_rec | ppocr/utils/dict/german_dict.txt |Lightweight model for German recognition|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_train.tar) |
| korean_mobile_v2.0_rec | ppocr/utils/dict/korean_dict.txt |Lightweight model for Korean recognition|3.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_train.tar) |
| japan_mobile_v2.0_rec | ppocr/utils/dict/japan_dict.txt |Lightweight model for Japanese recognition|4.23M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_train.tar) |
| chinese_cht_mobile_v2.0_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for Traditional Chinese recognition|5.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_train.tar) |
| te_mobile_v2.0_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_train.tar) |
| ka_mobile_v2.0_rec | ppocr/utils/dict/ka_dict.txt |Lightweight model for Kannada recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_train.tar) |
| ta_mobile_v2.0_rec | ppocr/utils/dict/ta_dict.txt |Lightweight model for Tamil recognition|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_train.tar) |
| latin_mobile_v2.0_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for Latin recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_train.tar) |
| arabic_mobile_v2.0_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for Arabic script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_train.tar) |
| cyrillic_mobile_v2.0_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for Cyrillic script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_train.tar) |
| devanagari_mobile_v2.0_rec | ppocr/utils/dict/devanagari_dict.txt |Lightweight model for Devanagari script recognition |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_train.tar) |
## 3. Text Angle Classification Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for text angle classification| 2.1M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
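The downloaded inference tarballs can be extracted and passed to the whl package by path. A minimal sketch (the local directory names follow the tarball names and are assumptions; `det_model_dir`, `rec_model_dir`, and `cls_model_dir` are standard paddleocr arguments):

```python
from paddleocr import PaddleOCR

# Point each stage of the pipeline at a locally extracted inference model
ocr = PaddleOCR(
    det_model_dir='./ch_PP-OCRv2_det_infer',
    rec_model_dir='./ch_PP-OCRv2_rec_infer',
    cls_model_dir='./ch_ppocr_mobile_v2.0_cls_infer',
    use_angle_cls=True)
result = ocr.ocr('PaddleOCR/doc/imgs/11.jpg', cls=True)
```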
---
Model_Info:
name: "PP-OCRv2"
description: ""
description_en: ""
icon: "@后续UE统一设计之后,会存到bos上某个位置"
from_repo: "PaddleOCR"
Task:
- tag_en: "CV"
tag: "计算机视觉"
sub_tag_en: "Character Recognition"
sub_tag: "文字识别"
Example:
- title: "《动手学OCR》系列课程之:PP-OCRv2预测部署实战"
url: "https://aistudio.baidu.com/aistudio/projectdetail/3552922?channelType=0&channel=0"
- title: "《动手学OCR》系列课程之:OCR文本识别实战"
url: "https://aistudio.baidu.com/aistudio/projectdetail/3552051?channelType=0&channel=0"
- title: "《动手学OCR》系列课程之:OCR文本检测实践"
url: "https://aistudio.baidu.com/aistudio/projectdetail/3551779?channelType=0&channel=0"
Datasets: "ICDAR 2015, ICDAR2019-LSVT,ICDAR2017-RCTW-17,Total-Text,ICDAR2019-ArT"
Pulisher: "Baidu"
License: "apache.2.0"
Paper:
- title: "PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System"
url: "https://arxiv.org/pdf/2109.03144v2.pdf"
IfTraining: 0
IfOnlineDemo: 1
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. PP-OCRv2模型简介\n",
"\n",
"PP-OCR是PaddleOCR针对OCR领域发布的文字检测识别系统,PP-OCRv2针对 PP-OCR 进行了一些经验性改进,构建了一个新的 OCR 系统。PP-OCRv2系统框图如下所示(粉色框中为PP-OCRv2新增策略):\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200258931-771f5a1d-230c-4168-9130-0b79321558a9.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"从算法改进思路上看,主要有五个方面的改进:\n",
"1. 检测模型优化:采用 CML 协同互学习知识蒸馏策略;\n",
"2. 检测模型优化:CopyPaste 数据增广策略;\n",
"3. 识别模型优化:PP-LCNet 轻量级骨干网络;\n",
"4. 识别模型优化:UDML 改进知识蒸馏策略;\n",
"5. 识别模型优化:Enhanced CTC loss 损失函数改进。\n",
"\n",
"从效果上看,主要有三个方面提升:\n",
"1. 在模型效果上,相对于 PP-OCR mobile 版本提升超7%;\n",
"2. 在速度上,相对于 PP-OCR server 版本提升超过220%;\n",
"3. 在模型大小上,11.6M 的总大小,服务器端和移动端都可以轻松部署。\n",
"\n",
"更详细的优化细节可参考技术报告:https://arxiv.org/abs/2109.03144 。\n",
"\n",
"更多关于PaddleOCR的内容,可以点击 https://github.com/PaddlePaddle/PaddleOCR 进行了解。\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. 模型效果\n",
"\n",
"PP-OCRv2的效果如下:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200239467-a082eef9-fee0-4587-be48-b276a95bf8d0.gif\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 模型如何使用\n",
"\n",
"### 3.1 模型推理:\n",
"* 安装PaddleOCR whl包"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"! pip install paddleocr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* 快速体验"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# 命令行使用\n",
"! paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --ocr_version PP-OCRv2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"运行完成后,会在终端输出如下结果:\n",
"```log\n",
"[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]\n",
"[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]\n",
"[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]\n",
"......\n",
"```\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. 原理\n",
"\n",
"优化思路具体如下\n",
"\n",
"1. 检测模型优化\n",
"- 采用 CML (Collaborative Mutual Learning) 协同互学习知识蒸馏策略。\n",
" <div align=\"center\">\n",
" <img src=\"https://pic4.zhimg.com/80/v2-05f12bcd1784993edabdadbc89b5e9e7_720w.webp\" width = \"60%\" />\n",
" </div>\n",
"\n",
"如上图所示,CML 的核心思想结合了①传统的 Teacher 指导 Student 的标准蒸馏与 ② Students 网络直接的 DML 互学习,可以让 Students 网络互学习的同时,Teacher 网络予以指导。对应的,精心设计关键的三个 Loss 损失函数:GT Loss、DML Loss 和 Distill Loss,在 Teacher 网络 Backbone 为 ResNet18 的条件下,对 Student 的 MobileNetV3 起到了良好的提升效果。\n",
"\n",
" - CopyPaste 数据增广策略\n",
" <div align=\"center\">\n",
" <img src=\"https://pic1.zhimg.com/80/v2-90239608c554972ac307be07f487f254_720w.webp\" width = \"60%\" />\n",
" </div>\n",
"\n",
"数据增广是提升模型泛化能力重要的手段之一,CopyPaste 是一种新颖的数据增强技巧,已经在目标检测和实例分割任务中验证了有效性。利用 CopyPaste,可以合成文本实例来平衡训练图像中的正负样本之间的比例。相比而言,传统图像旋转、随机翻转和随机裁剪是无法做到的。\n",
"\n",
"CopyPaste 主要步骤包括:①随机选择两幅训练图像,②随机尺度抖动缩放,③随机水平翻转,④随机选择一幅图像中的目标子集,⑤粘贴在另一幅图像中随机的位置。这样,就比较好的提升了样本丰富度,同时也增加了模型对环境鲁棒性。\n",
"\n",
"2. 识别模型优化\n",
"- PP-LCNet 轻量级骨干网络\n",
" \n",
"采用速度和精度均衡的PP-LCNet,有效提升精度的同时减少网络推理时间。\n",
"\n",
"- UDML 知识蒸馏策略\n",
" <div align=\"center\">\n",
" <img src=\"https://pic1.zhimg.com/80/v2-642d94e092c7d5f90bedbd7c7511636c_720w.webp\" width = \"60%\" />\n",
" </div>\n",
" 在标准的 DML 知识蒸馏的基础上,新增引入了对于 Feature Map 的监督机制,新增 Feature Loss,增加迭代次数,在 Head 部分增加额外的 FC 网络,最终加快蒸馏的速度同时提升效果。\n",
"\n",
"- Enhanced CTC loss 改进\n",
" <div align=\"center\">\n",
" <img src=\"https://pic3.zhimg.com/80/v2-864d255454b5196e0a2a916d81ff92c6_720w.webp\" width = \"40%\" />\n",
" </div>\n",
" 中文文字识别任务中,经常遇到相似字符数误识别的问题,这里借鉴 Metric Learning,引入 Center Loss,进一步增大类间距离来增强模型对相似字符的区分能力,核心思路如上图公式所示。"
]
},
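{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a toy illustration of how CML combines its three supervision signals (our sketch, not the PaddleOCR implementation; the weights are hypothetical):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# CML trains the students against three signals at once:\n",
"#   GT Loss      - students vs. ground-truth labels\n",
"#   DML Loss     - student vs. student predictions\n",
"#   Distill Loss - students vs. teacher predictions\n",
"def cml_loss(gt_loss, dml_loss, distill_loss, w_gt=1.0, w_dml=1.0, w_distill=1.0):\n",
"    return w_gt * gt_loss + w_dml * dml_loss + w_distill * distill_loss\n",
"\n",
"print(cml_loss(0.80, 0.25, 0.40))\n"
]
},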
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. 注意事项\n",
"\n",
"PP-OCR系列模型训练过程中均使用通用数据,如在实际场景中表现不满意,可标注少量数据进行finetune。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. 相关论文以及引用信息\n",
"```\n",
"@article{du2021pp,\n",
" title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
" author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
" journal={arXiv preprint arXiv:2109.03144},\n",
" year={2021}\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. PP-OCRv2 Introduction\n",
"\n",
"PP-OCR is a text detection and recognition system released by PaddleOCR for the OCR field. PP-OCRv2 has made some improvements for PP-OCR and constructed a new OCR system. The pipeline of PP-OCRv2 system is as follows (the pink box is the new policy of PP-OCRv2):\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200258931-771f5a1d-230c-4168-9130-0b79321558a9.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"There are five enhancement strategies:\n",
"1. Text detection enhancement strategies: collaborative mutual learning;\n",
"2. Text detection enhancement strategies: CcopyPaste data augmentation;\n",
"3. Text recognition enhancement strategies: PP-LCNet lightweight backbone network;\n",
"4. Text recognition enhancement strategies: UDML knowledge distillation;\n",
"5. Text recognition enhancement strategies: enhanced CTC loss;\n",
"\n",
"Compared with PP-OCR, the performance improvement of PP-OCRv2 is as follows:\n",
"1. Compared with the PP-OCR mobile version, the accuracy of the model is improved by more than 7%;\n",
"2. Compared with the PP-OCR server version, the speed is increased by more than 220%;\n",
"3. The total model size is 11.6M, which can be easily deployed on both the server side and the mobile side;\n",
"\n",
"For more details, please refer to the technical report: https://arxiv.org/abs/2109.03144 .\n",
"\n",
"For more information about PaddleOCR, you can click https://github.com/PaddlePaddle/PaddleOCR to learn more.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Effects\n",
"\n",
"The results of PP-OCRv2 are as follows:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200239467-a082eef9-fee0-4587-be48-b276a95bf8d0.gif\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. How to Use the Model\n",
"\n",
"### 3.1 Inference\n",
"* Install PaddleOCR whl package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"! pip install paddleocr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Quick experience"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# command line usage\n",
"! paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --ocr_version PP-OCRv2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the operation is complete, the following results will be output in the terminal:\n",
"```log\n",
"[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]\n",
"[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]\n",
"[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]\n",
"......\n",
"```\n",
"\n",
"\n"
]
},
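{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same inference can be run through the Python API. A minimal sketch (the image path assumes the PaddleOCR repo has been cloned locally):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from paddleocr import PaddleOCR\n",
"\n",
"# PP-OCRv2 detection + angle classification + recognition;\n",
"# recent paddleocr releases return one result list per input image\n",
"ocr = PaddleOCR(ocr_version='PP-OCRv2', use_angle_cls=True, lang='ch')\n",
"result = ocr.ocr('PaddleOCR/doc/imgs/11.jpg', cls=True)\n",
"for box, (text, score) in result[0]:\n",
"    print(box, text, score)\n"
]
},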
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model Principles\n",
"\n",
"\n",
"1. Text detection enhancement strategies\n",
"- Adopt CML (Collaborative Mutual Learning) collaborative mutual learning knowledge distillation strategy.\n",
" <div align=\"center\">\n",
" <img src=\"https://pic4.zhimg.com/80/v2-05f12bcd1784993edabdadbc89b5e9e7_720w.webp\" width = \"60%\" />\n",
" </div>\n",
"\n",
"As shown in the figure above, the core idea of CML combines ① the standard distillation of the traditional Teacher guiding Students and ② the direct DML mutual learning of the Students network, which allows the Students network to learn from each other and the Teacher network to guide. Correspondingly, the three key Loss loss functions are carefully designed: GT Loss, DML Loss and Distill Loss. Under the condition that the Teacher network Backbone is ResNet18, it has a good effect on the Student's MobileNetV3.\n",
"\n",
" - CopyPaste data augmentation\n",
" <div align=\"center\">\n",
" <img src=\"https://pic1.zhimg.com/80/v2-90239608c554972ac307be07f487f254_720w.webp\" width = \"60%\" />\n",
" </div>\n",
"\n",
"Data augmentation is one of the important means to improve the generalization ability of the model. CopyPaste is a novel data augmentation technique, which has been proven effective in object detection and instance segmentation tasks. With CopyPaste, text instances can be synthesized to balance the ratio between positive and negative samples in training images. In contrast, this is impossible with traditional image rotation, random flipping, and random cropping.\n",
"\n",
"The main steps of CopyPaste include: ① randomly select two training images, ② randomly scale jitter and zoom, ③ randomly flip horizontally, ④ randomly select a target subset in one image, and ⑤ paste at a random position in another image. In this way, the sample richness can be greatly improved, and the robustness of the model to the environment can be increased.\n",
"\n",
"2. Text recognition enhancement strategies\n",
"- PP-LCNet lightweight backbone network\n",
" \n",
"The PP-LCNet with balanced speed and accuracy is adopted to effectively improve the accuracy and reduce the network inference time.\n",
"\n",
"- UDML Knowledge Distillation Strategy\n",
" <div align=\"center\">\n",
" <img src=\"https://pic1.zhimg.com/80/v2-642d94e092c7d5f90bedbd7c7511636c_720w.webp\" width = \"60%\" />\n",
" </div>\n",
"On the basis of standard DML knowledge distillation, a supervision mechanism for Feature Map is introduced, Feature Loss is added, the number of iterations is increased, and an additional FC network is added to the Head part, which finally speeds up the distillation and improves the effect.\n",
"\n",
"- Enhanced CTC loss\n",
" <div align=\"center\">\n",
" <img src=\"https://pic3.zhimg.com/80/v2-864d255454b5196e0a2a916d81ff92c6_720w.webp\" width = \"40%\" />\n",
" </div>\n",
" In Chinese text recognition tasks, the problem of misrecognition of the number of similar characters is often encountered. Here, we draw on Metric Learning and introduce Center Loss to further increase the distance between classes to enhance the model's ability to distinguish similar characters. The core idea is shown in the formula above."
]
},
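{
"cell_type": "markdown",
"metadata": {},
"source": [
"A toy sketch of the center-loss idea behind Enhanced CTC loss (our illustration, not the PaddleOCR implementation): each frame feature is pulled toward the center of its character class, enlarging inter-class distance.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def center_loss(features, labels, centers):\n",
"    # features: (N, D) frame embeddings; labels: (N,) class ids; centers: (C, D)\n",
"    diff = features - centers[labels]\n",
"    return 0.5 * float(np.mean(np.sum(diff ** 2, axis=1)))\n",
"\n",
"feats = np.random.randn(8, 4)\n",
"labels = np.random.randint(0, 3, size=8)\n",
"centers = np.random.randn(3, 4)\n",
"print(center_loss(feats, labels, centers))\n"
]
},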
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Attention\n",
"\n",
"General data are used in the training process of PP-OCR series models. If the performance is not satisfactory in the actual scene, a small amount of data can be marked for finetune."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Related papers and citations\n",
"```\n",
"@article{du2021pp,\n",
" title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
" author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
" journal={arXiv preprint arXiv:2109.03144},\n",
" year={2021}\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
import gradio as gr
import base64
from io import BytesIO
from PIL import Image
from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR(use_angle_cls=True, lang="ch")


def image_to_base64(image):
    # Take a PIL image and return a base64-encoded JPEG string.
    byte_data = BytesIO()  # in-memory byte buffer
    image.save(byte_data, format="JPEG")  # write the image into the buffer
    byte_data = byte_data.getvalue()  # pull the raw bytes back out
    base64_str = base64.b64encode(byte_data).decode("ascii")  # bytes -> base64
    return base64_str


# UGC: Define the inference fn() for your models
def model_inference(image):
    result = ocr.ocr(image, cls=True)
    # Unpack the result for the single input image
    result = result[0]
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    # Draw only the detected boxes; texts and scores are returned in the JSON
    im_show = draw_ocr(image, boxes, txts=None, scores=None)
    im_show = Image.fromarray(im_show)
    res = []
    for i in range(len(boxes)):
        res.append(dict(boxes=boxes[i], txt=txts[i], score=scores[i]))
    json_out = {"base64": image_to_base64(im_show), "result": res}
    return im_show, json_out


def clear_all():
    return None, None, None


with gr.Blocks() as demo:
    gr.Markdown("PP-OCRv3")

    with gr.Column(scale=1, min_width=100):
        img_in = gr.Image(
            value="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/imgs/11.jpg",
            label="Input")

        with gr.Row():
            btn1 = gr.Button("Clear")
            btn2 = gr.Button("Submit")

        img_out = gr.Image(label="Output").style(height=400)
        json_out = gr.JSON(label="jsonOutput")

    btn2.click(fn=model_inference, inputs=img_in, outputs=[img_out, json_out])
    btn1.click(fn=clear_all, inputs=None, outputs=[img_in, img_out, json_out])

demo.launch()
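# Querying the running demo over HTTP (a sketch under assumptions: gradio 3.x
# exposes a JSON endpoint at /api/predict, though the path may differ across
# gradio versions; "11.jpg" is a hypothetical local image):
#
#   import requests
#   with open("11.jpg", "rb") as f:
#       b64 = base64.b64encode(f.read()).decode("ascii")
#   r = requests.post("http://127.0.0.1:7860/api/predict",
#                     json={"data": ["data:image/jpeg;base64," + b64]})
#   print(r.json()["data"][1])  # the JSON output component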
【PP-OCRv3-App-YAML】
APP_Info:
title: PP-OCRv3-App
colorFrom: blue
colorTo: yellow
sdk: gradio
sdk_version: 3.4.1
app_file: app.py
license: apache-2.0
device: cpu
## 1. Inference Benchmark
### 1.1 Environment
* PP-OCRv3 model inference speed was tested on a 6148 CPU with MKLDNN enabled and 10 threads.
### 1.2 Dataset
An internal dataset was used for testing.
### 1.3 Metrics
| Model | Hmean | Model Size (M) | Time Cost (CPU, ms) |
|-----|-----|--------|----|
| PP-OCR mobile | 50.30% | 8.1 | 356.00 |
| PP-OCR server | 57.00% | 155.1 | 1056.00 |
| PP-OCRv2 | 57.60% | 11.6 | 330.00 |
| PP-OCRv3 | 62.90% | 15.6 | 331.00 |
## 2. Reference
1. https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/PP-OCRv3_introduction.md
## 1. Inference Benchmark
### 1.1 Environment
PP-OCRv3 model inference speed was tested on a 6148 CPU with MKLDNN enabled and 10 threads.
### 1.2 Benchmark
| Model | Hmean | Model Size (M) | Time Cost (CPU, ms) |
|-----|-----|--------|----|
| PP-OCR mobile | 50.30% | 8.1 | 356.00 |
| PP-OCR server | 57.00% | 155.1 | 1056.00 |
| PP-OCRv2 | 57.60% | 11.6 | 330.00 |
| PP-OCRv3 | 62.90% | 15.6 | 331.00 |
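Hmean in the table is the end-to-end harmonic mean (F-score) of precision and recall; for reference, a one-line sketch:

```python
def hmean(precision: float, recall: float) -> float:
    # harmonic mean of precision and recall (the F1 score)
    return 2 * precision * recall / (precision + recall)
```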
## 2. Reference
1. https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/PP-OCRv3_introduction.md
# Model List
## 1. Text Detection Model
### 1.1 Chinese Detection Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv3_det_slim|Slim quantized + distilled ultra-lightweight model, supporting Chinese, English, and multilingual text detection| 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)|
|ch_PP-OCRv3_det| Original ultra-lightweight model, supporting Chinese, English, and multilingual text detection | 3.80M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)|
### 1.2 English Detection Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|en_PP-OCRv3_det_slim |Slim quantized ultra-lightweight model, supporting English and digit text detection | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.nb) |
|en_PP-OCRv3_det |Original ultra-lightweight model, supporting English and digit text detection|3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) |
### 1.3 Multilingual Detection Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
| ml_PP-OCRv3_det_slim |Slim quantized ultra-lightweight model, supporting multilingual text detection | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.nb) |
| ml_PP-OCRv3_det |Original ultra-lightweight model, supporting multilingual text detection | 3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_distill_train.tar) |
## 2. Text Recognition Model
### 2.1 Chinese Recognition Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv3_rec_slim |Slim quantized ultra-lightweight model, supporting Chinese, English, and number recognition| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) |
|ch_PP-OCRv3_rec|Original ultra-lightweight model, supporting Chinese, English, and number recognition| 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
**Note:** The `trained model` is fine-tuned from the `pre-trained model` on real data plus synthesized vertical-text data, and performs better in real application scenarios. The `pre-trained model` is trained directly on the full set of real and synthesized data, and is better suited for fine-tuning on your own dataset.
### 2.2 English Recognition Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|en_PP-OCRv3_rec_slim |Slim quantized ultra-lightweight model, supporting English and number recognition | 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec |Original ultra-lightweight model, supporting English and number recognition|9.6M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
### 2.3 Multilingual Recognition Model
| model | dict file | description | inference model size | download |
| --- | --- | --- | --- |--- |
| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Korean recognition|11.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) |
| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Japanese recognition|11.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) |
| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Traditional Chinese recognition|12.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) |
| te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | Telugu recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) |
| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt |Kannada recognition|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) |
| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |Tamil recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | Latin recognition |9.7M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Arabic script recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Cyrillic script recognition |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt |Devanagari script recognition |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
<a name="文本方向分类模型"></a>
## 3. Text Angle Classification Model
| model | description | inference model size | download |
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for classifying the text angle of detected text lines| 2.1M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb) |
|ch_ppocr_mobile_v2.0_cls|Original classifier model for classifying the text angle of detected text lines|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
# Model List
## 1. Text Detection Model
### 1.1 Chinese Detection Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv3_det_slim|Slim quantized + distilled ultra-lightweight model, supporting Chinese, English, and multilingual text detection| 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)|
|ch_PP-OCRv3_det| Original ultra-lightweight model, supporting Chinese, English, and multilingual text detection| 3.80M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)|
### 1.2 English Detection Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|en_PP-OCRv3_det_slim |Slim quantized ultra-lightweight model, supporting English and digit text detection | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.nb) |
|en_PP-OCRv3_det |Original ultra-lightweight model, supporting English and digit text detection|3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) |
### 1.3 Multilingual Detection Model
| model | description | model_size | download |
| --- | --- | --- | --- |
| ml_PP-OCRv3_det_slim |Slim quantized ultra-lightweight model, supporting multilingual text detection | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.nb) |
| ml_PP-OCRv3_det |Original ultra-lightweight model, supporting multilingual text detection | 3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_distill_train.tar) |
## 2. Text Recognition Model
### 2.1 Chinese Recognition Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_PP-OCRv3_rec_slim |Slim quantized ultra-lightweight model, supporting Chinese, English, and number recognition| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) |
|ch_PP-OCRv3_rec|Original ultra-lightweight model, supporting Chinese, English, and number recognition| 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
**Note**: The `trained model` is fine-tuned from the `pre-trained model` on real data plus synthesized vertical-text data, and performs better in real scenes. The `pre-trained model` is trained directly on the full set of real and synthesized data, and is better suited for fine-tuning on your own dataset.
### 2.2 English Recognition Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|en_PP-OCRv3_rec_slim |Slim quantized ultra-lightweight model, supporting English and number recognition | 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec |Original lightweight model, supporting English and number recognition|9.6M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
### 2.3 Multilingual Recognition Model
| model | dict file| description | model_size | download |
| --- | --- | --- | --- |--- |
| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Korean recognition|11.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) |
| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Japanese recognition|11.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) |
| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Traditional Chinese recognition|12.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) |
| te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | Telugu recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) |
| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt |Kannada recognition|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) |
| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |Tamil recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | Latin recognition |9.7M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Arabic script recognition|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Cyrillic script recognition |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt |Devanagari script recognition |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
## 3. Text Angle Classification Model
| model | description | model_size | download |
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for text angle classification| 2.1M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
---
Model_Info:
name: "PP-OCRv3"
description: ""
description_en: ""
icon: "@后续UE统一设计之后,会存到bos上某个位置"
from_repo: "PaddleOCR"
Task:
- tag_en: "CV"
tag: "计算机视觉"
sub_tag_en: "Character Recognition"
sub_tag: "文字识别"
Example:
- title: "《【官方】十分钟完成 PP-OCRv3 识别全流程实战"
url: "https://aistudio.baidu.com/aistudio/projectdetail/3916206?channelType=0&channel=0"
- title: "鸟枪换炮!基于PP-OCRv3的电表检测识别"
url: "https://aistudio.baidu.com/aistudio/projectdetail/511591?channelType=0&channel=0"
- title: "基于PP-OCRv3实现PCB字符识别"
url: "https://aistudio.baidu.com/aistudio/projectdetail/4008973?channelType=0&channel=0"
Datasets: "ICDAR 2015, ICDAR2019-LSVT,ICDAR2017-RCTW-17,Total-Text,ICDAR2019-ArT"
Pulisher: "Baidu"
License: "apache.2.0"
Paper:
- title: "PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System"
url: "https://arxiv.org/abs/2206.03001"
IfTraining: 0
IfOnlineDemo: 1
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. PP-OCRv3模型简介\n",
"\n",
"PP-OCRv3在PP-OCRv2的基础上进一步升级。整体的框架图保持了与PP-OCRv2相同的pipeline,针对检测模型和识别模型进行了优化。其中,检测模块仍基于DB算法优化,而识别模块不再采用CRNN,换成了IJCAI 2022最新收录的文本识别算法SVTR,并对其进行产业适配。PP-OCRv3系统框图如下所示(粉色框中为PP-OCRv3新增策略):\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocrv3_framework.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n",
"从算法改进思路上看,分别针对检测和识别模型,进行了共9个方面的改进:\n",
"\n",
"- 检测模块:\n",
" - LK-PAN:大感受野的PAN结构;\n",
" - DML:教师模型互学习策略;\n",
" - RSE-FPN:残差注意力机制的FPN结构;\n",
"\n",
"\n",
"- 识别模块:\n",
" - SVTR_LCNet:轻量级文本识别网络;\n",
" - GTC:Attention指导CTC训练策略;\n",
" - TextConAug:挖掘文字上下文信息的数据增广策略;\n",
" - TextRotNet:自监督的预训练模型;\n",
" - UDML:联合互学习策略;\n",
" - UIM:无标注数据挖掘方案。\n",
"\n",
"从效果上看,速度可比情况下,多种场景精度均有大幅提升:\n",
"- 中文场景,相对于PP-OCRv2中文模型提升超5%;\n",
"- 英文数字场景,相比于PP-OCRv2英文模型提升11%;\n",
"- 多语言场景,优化80+语种识别效果,平均准确率提升超5%。\n",
"\n",
"\n",
"更详细的优化细节可参考技术报告:https://arxiv.org/abs/2206.03001 。\n",
"\n",
"更多关于PaddleOCR的内容,可以点击 https://github.com/PaddlePaddle/PaddleOCR 进行了解。\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. 模型效果\n",
"\n",
"PP-OCRv3的效果如下:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200261622-1b928d93-93ab-4575-8c60-214bcc03eda1.png\" width = \"80%\" />\n",
"</div>\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200261711-9f18bb04-3736-4f51-892c-de801db9ab9e.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 模型如何使用\n",
"\n",
"### 3.1 模型推理:\n",
"* 安装PaddleOCR whl包"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"! pip install paddleocr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* 快速体验"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# 命令行使用\n",
"! paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"运行完成后,会在终端输出如下结果:\n",
"```log\n",
"[[[28.0, 37.0], [302.0, 39.0], [302.0, 72.0], [27.0, 70.0]], ('纯臻营养护发素', 0.96588134765625)]\n",
"[[[26.0, 81.0], [172.0, 83.0], [172.0, 104.0], [25.0, 101.0]], ('产品信息/参数', 0.9113278985023499)]\n",
"[[[28.0, 115.0], [330.0, 115.0], [330.0, 132.0], [28.0, 132.0]], ('(45元/每公斤,100公斤起订)', 0.8843421936035156)]\n",
"......\n",
"```\n",
"\n",
"\n"
]
},
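{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same inference can be run through the Python API. A minimal sketch (the image path assumes the PaddleOCR repo has been cloned locally):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from paddleocr import PaddleOCR\n",
"\n",
"# Recent paddleocr releases default to PP-OCRv3 and return one\n",
"# result list per input image\n",
"ocr = PaddleOCR(use_angle_cls=True, lang='ch')\n",
"result = ocr.ocr('PaddleOCR/doc/imgs/11.jpg', cls=True)\n",
"for box, (text, score) in result[0]:\n",
"    print(box, text, score)\n"
]
},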
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. 原理\n",
"\n",
"优化思路具体如下\n",
"\n",
"1. 检测模型优化\n",
"- LK-PAN:大感受野的PAN结构。\n",
" \n",
"LK-PAN (Large Kernel PAN) 是一个具有更大感受野的轻量级PAN结构,核心是将PAN结构的path augmentation中卷积核从3*3改为9*9。通过增大卷积核,提升特征图每个位置覆盖的感受野,更容易检测大字体的文字以及极端长宽比的文字。使用LK-PAN结构,可以将教师模型的hmean从83.2%提升到85.0%。\n",
" <div align=\"center\">\n",
" <img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/LKPAN.png\" width = \"60%\" />\n",
" </div>\n",
"\n",
"- DML:教师模型互学习策略\n",
"\n",
"[DML](https://arxiv.org/abs/1706.00384) (Deep Mutual Learning)互学习蒸馏方法,如下图所示,通过两个结构相同的模型互相学习,可以有效提升文本检测模型的精度。教师模型采用DML策略,hmean从85%提升到86%。将PP-OCRv2中CML的教师模型更新为上述更高精度的教师模型,学生模型的hmean可以进一步从83.2%提升到84.3%。\n",
" <div align=\"center\">\n",
" <img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/teacher_dml.png\" width = \"60%\" />\n",
" </div>\n",
"\n",
"- RSE-FPN:残差注意力机制的FPN结构\n",
"\n",
"RSE-FPN(Residual Squeeze-and-Excitation FPN)如下图所示,引入残差结构和通道注意力结构,将FPN中的卷积层更换为通道注意力结构的RSEConv层,进一步提升特征图的表征能力。考虑到PP-OCRv2的检测模型中FPN通道数非常小,仅为96,如果直接用SEblock代替FPN中卷积会导致某些通道的特征被抑制,精度会下降。RSEConv引入残差结构会缓解上述问题,提升文本检测效果。进一步将PP-OCRv2中CML的学生模型的FPN结构更新为RSE-FPN,学生模型的hmean可以进一步从84.3%提升到85.4%。\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/RSEFPN.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"1. 识别模型优化\n",
"- SVTR_LCNet:轻量级文本识别网络\n",
"\n",
"SVTR_LCNet是针对文本识别任务,将基于Transformer的SVTR网络和轻量级CNN网络PP-LCNet 融合的一种轻量级文本识别网络。使用该网络,预测速度优于PP-OCRv2的识别模型20%,但是由于没有采用蒸馏策略,该识别模型效果略差。此外,进一步将输入图片规范化高度从32提升到48,预测速度稍微变慢,但是模型效果大幅提升,识别准确率达到73.98%(+2.08%),接近PP-OCRv2采用蒸馏策略的识别模型效果。\n",
"\n",
"- GTC:Attention指导CTC训练策略\n",
" \n",
"[GTC](https://arxiv.org/pdf/2002.01276.pdf)(Guided Training of CTC),利用Attention模块CTC训练,融合多种文本特征的表达,是一种有效的提升文本识别的策略。使用该策略,预测时完全去除 Attention 模块,在推理阶段不增加任何耗时,识别模型的准确率进一步提升到75.8%(+1.82%)。训练流程如下所示:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200265540-1bbb730f-35d4-4d72-8e00-70856bb932ee.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- TextConAug:挖掘文字上下文信息的数据增广策略\n",
"\n",
"TextConAug是一种挖掘文字上下文信息的数据增广策略,主要思想来源于论文[ConCLR](https://www.cse.cuhk.edu.hk/~byu/papers/C139-AAAI2022-ConCLR.pdf),作者提出ConAug数据增广,在一个batch内对2张不同的图像进行联结,组成新的图像并进行自监督对比学习。PP-OCRv3将此方法应用到有监督的学习任务中,设计了TextConAug数据增强方法,可以丰富训练数据上下文信息,提升训练数据多样性。使用该策略,识别模型的准确率进一步提升到76.3%(+0.5%)。TextConAug示意图如下所示:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200265540-1bbb730f-35d4-4d72-8e00-70856bb932ee.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- TextRotNet:自监督的预训练模型\n",
"\n",
"TextRotNet是使用大量无标注的文本行数据,通过自监督方式训练的预训练模型,参考于论文[STR-Fewer-Labels](https://github.com/ku21fan/STR-Fewer-Labels)。该模型可以初始化SVTR_LCNet的初始权重,从而帮助文本识别模型收敛到更佳位置。使用该策略,识别模型的准确率进一步提升到76.9%(+0.6%)。TextRotNet训练流程如下图所示:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/SSL.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- UDML:联合互学习策略\n",
"\n",
"UDML(Unified-Deep Mutual Learning)联合互学习是PP-OCRv2中就采用的对于文本识别非常有效的提升模型效果的策略。在PP-OCRv3中,针对两个不同的SVTR_LCNet和Attention结构,对他们之间的PP-LCNet的特征图、SVTR模块的输出和Attention模块的输出同时进行监督训练。使用该策略,识别模型的准确率进一步提升到78.4%(+1.5%)。\n",
"\n",
"- UDML:联合互学习策略\n",
"\n",
"UIM(Unlabeled Images Mining)是一种非常简单的无标注数据挖掘方案。核心思想是利用高精度的文本识别大模型对无标注数据进行预测,获取伪标签,并且选择预测置信度高的样本作为训练数据,用于训练小模型。使用该策略,识别模型的准确率进一步提升到79.4%(+1%)。\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/UIM.png\" width = \"60%\" />\n",
"</div>"
]
},
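{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the UIM idea (our sketch, not the released pipeline; `big_model` is a hypothetical callable returning a text and a confidence):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def mine_pseudo_labels(images, big_model, threshold=0.95):\n",
"    # Keep only predictions the large model is confident about;\n",
"    # the (image, text) pairs then serve as extra training data.\n",
"    labeled = []\n",
"    for img in images:\n",
"        text, conf = big_model(img)\n",
"        if conf >= threshold:\n",
"            labeled.append((img, text))\n",
"    return labeled\n"
]
},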
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. 注意事项\n",
"\n",
"PP-OCR系列模型训练过程中均使用通用数据,如在实际场景中表现不满意,可标注少量数据进行finetune。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. 相关论文以及引用信息\n",
"```\n",
"@article{du2021pp,\n",
" title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
" author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
" journal={arXiv preprint arXiv:2109.03144},\n",
" year={2021}\n",
"}\n",
"\n",
"@inproceedings{zhang2018deep,\n",
" title={Deep mutual learning},\n",
" author={Zhang, Ying and Xiang, Tao and Hospedales, Timothy M and Lu, Huchuan},\n",
" booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},\n",
" pages={4320--4328},\n",
" year={2018}\n",
"}\n",
"\n",
"@inproceedings{hu2020gtc,\n",
" title={Gtc: Guided training of ctc towards efficient and accurate scene text recognition},\n",
" author={Hu, Wenyang and Cai, Xiaocong and Hou, Jun and Yi, Shuai and Lin, Zhiping},\n",
" booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n",
" volume={34},\n",
" number={07},\n",
" pages={11005--11012},\n",
" year={2020}\n",
"}\n",
"\n",
"@inproceedings{zhang2022context,\n",
" title={Context-based Contrastive Learning for Scene Text Recognition},\n",
" author={Zhang, Xinyun and Zhu, Binwu and Yao, Xufeng and Sun, Qi and Li, Ruiyu and Yu, Bei},\n",
" year={2022},\n",
" organization={AAAI}\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. PP-OCRv3 Introduction\n",
"\n",
"PP-OCRv3 is further upgraded on the basis of PP-OCRv2. The pipeline is the same as PP-OCRv2, optimized for detection model and recognition model. Among them, the detection module is still optimized based on the DB algorithm, while the recognition module uses CVRT to replace CRNN, and makes industrial adaptation to it. The pipeline of PP-OCRv3 is as follows (the new strategy for PP-OCRv3 is in the pink box):\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocrv3_framework.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n",
"PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2. \n",
"\n",
"- Text detection:\n",
" - LK-PAN:LK-PAN: a PAN module with large receptive field;\n",
" - DML: deep mutual learning for teacher model;\n",
" - RSE-FPN: a FPN module with residual attention mechanism;\n",
"\n",
"\n",
"- Text recognition\n",
" - SVTR-LCNet: lightweight text recognition network;\n",
" - GTC: Guided rraining of CTC by attention;\n",
" - TextConAug: data augmentation for mining text context information;\n",
" - TextRotNet: self-supervised pre-trained model;\n",
" - U-DML: unified-deep mutual learning;\n",
" - UIM: unlabeled images mining;\n",
"\n",
"In the case of comparable speeds, the accuracy of various scenarios has been greatly improved:\n",
"- Compared with the PP-OCRv2 Chinese model, the Chinese scene is improved by more than 5%;\n",
"- Compared with the PP-OCRv2 English model in the English digital scene, it is improved by 11%;\n",
"- In multi-language scenarios, the recognition performance of 80+ languages is optimized, and the average accuracy rate is increased by more than 5%.\n",
"\n",
"\n",
"\n",
"For more details, please refer to the technical report: https://arxiv.org/abs/2206.03001 .\n",
"\n",
"For more information about PaddleOCR, you can click https://github.com/PaddlePaddle/PaddleOCR to learn more.\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Effects\n",
"\n",
"The results of PP-OCRv3 are as follows:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200261622-1b928d93-93ab-4575-8c60-214bcc03eda1.png\" width = \"80%\" />\n",
"</div>\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200261711-9f18bb04-3736-4f51-892c-de801db9ab9e.png\" width = \"80%\" />\n",
"</div>\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. How to Use the Model\n",
"\n",
"### 3.1 Inference\n",
"* Install PaddleOCR whl package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"! pip install paddleocr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Quick experience"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# command line usage\n",
"! paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the operation is complete, the following results will be output in the terminal:\n",
"```log\n",
"[[[28.0, 37.0], [302.0, 39.0], [302.0, 72.0], [27.0, 70.0]], ('纯臻营养护发素', 0.96588134765625)]\n",
"[[[26.0, 81.0], [172.0, 83.0], [172.0, 104.0], [25.0, 101.0]], ('产品信息/参数', 0.9113278985023499)]\n",
"[[[28.0, 115.0], [330.0, 115.0], [330.0, 132.0], [28.0, 132.0]], ('(45元/每公斤,100公斤起订)', 0.8843421936035156)]\n",
"......\n",
"```\n",
"\n",
"\n"
]
},
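{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Python API usage\n",
"\n",
"A minimal sketch using the PaddleOCR whl API; result handling assumes the whl package's list of [box, (text, score)] entries per image:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"from paddleocr import PaddleOCR\n",
"\n",
"ocr = PaddleOCR(use_angle_cls=True, lang='ch')  # recent releases default to PP-OCRv3\n",
"result = ocr.ocr('PaddleOCR/doc/imgs/11.jpg', cls=True)\n",
"for box, (text, score) in result[0]:\n",
"    print(box, text, score)"
]
},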
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model Principles\n",
"\n",
"The optimization ideas are as follows\n",
"\n",
"1. Text detection enhancement strategies\n",
"- LK-PAN: a PAN module with large receptive field\n",
" \n",
"LK-PAN (Large Kernel PAN) is a lightweight PAN structure with a larger receptive field. The core is to change the convolution kernel in the path augmentation of the PAN structure from 3*3 to 9*9. By increasing the convolution kernel, the receptive field covered by each position of the feature map is improved, and it is easier to detect text in large fonts and text with extreme aspect ratios. Using the LK-PAN structure, the hmean of the teacher model can be improved from 83.2% to 85.0%.\n",
"\n",
" <div align=\"center\">\n",
" <img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/LKPAN.png\" width = \"60%\" />\n",
" </div>\n",
"\n",
"- DML: deep mutual learning for teacher model\n",
"\n",
"[DML](https://arxiv.org/abs/1706.00384) (Deep Mutual Learning) The mutual learning distillation method, as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The teacher model adopts the DML strategy, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above higher-accuracy teacher model, the hmean of the student model can be further improved from 83.2% to 84.3%.\n",
" <div align=\"center\">\n",
" <img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/teacher_dml.png\" width = \"60%\" />\n",
" </div>\n",
"\n",
"- RSE-FPN: a FPN module with residual attention mechanism\n",
"\n",
"RSE-FPN (Residual Squeeze-and-Excitation FPN), as shown in the figure below, introduces a residual structure and a channel attention structure, and replaces the convolutional layer in the FPN with the RSEConv layer of the channel attention structure to further improve the representation of the feature map ability. Considering that the number of FPN channels in the detection model of PP-OCRv2 is very small, only 96, if SEblock is directly used to replace the convolution in FPN, the features of some channels will be suppressed, and the accuracy will be reduced. The introduction of residual structure in RSEConv will alleviate the above problems and improve the text detection effect. By further updating the FPN structure of the student model of CML in PP-OCRv2 to RSE-FPN, the hmean of the student model can be further improved from 84.3% to 85.4%.\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/RSEFPN.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"1. Text recognition enhancement strategies\n",
"- SVTR_LCNet: lightweight text recognition network\n",
"\n",
"SVTR_LCNet is a lightweight text recognition network that integrates Transformer-based SVTR network and lightweight CNN network PP-LCNet for text recognition tasks. Using this network, the prediction speed is 20% better than the recognition model of PP-OCRv2, but the effect of the recognition model is slightly worse because the distillation strategy is not adopted. In addition, the normalization height of the input image is further increased from 32 to 48, and the prediction speed is slightly slower, but the model effect is greatly improved, and the recognition accuracy rate reaches 73.98% (+2.08%), which is close to the recognition model effect of PP-OCRv2 using the distillation strategy.\n",
"\n",
"- GTC: Guided rraining of CTC by attention\n",
"\n",
"[GTC](https://arxiv.org/pdf/2002.01276.pdf) (Guided Training of CTC), which uses the Attention module CTC training and integrates the expression of multiple text features is an effective strategy to improve text recognition. Using this strategy, the Attention module is completely removed during prediction, and no time-consuming is added in the inference stage, and the accuracy of the recognition model is further improved to 75.8% (+1.82%). The training process is as follows:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200265540-1bbb730f-35d4-4d72-8e00-70856bb932ee.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- TextConAug: data augmentation for mining text context information\n",
"\n",
"TextConAug is a data augmentation strategy for mining textual context information. The main idea comes from the paper [ConCLR](https://www.cse.cuhk.edu.hk/~byu/papers/C139-AAAI2022-ConCLR.pdf) , the author proposes ConAug data augmentation to connect 2 different images in a batch to form new images and perform self-supervised comparative learning. PP-OCRv3 applies this method to supervised learning tasks, and designs the TextConAug data augmentation method, which can enrich the context information of training data and improve the diversity of training data. Using this strategy, the accuracy of the recognition model is further improved to 76.3% (+0.5%). The schematic diagram of TextConAug is as follows:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/12406017/200265540-1bbb730f-35d4-4d72-8e00-70856bb932ee.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- TextRotNet: self-supervised pre-trained model\n",
"\n",
"TextRotNet is a pre-training model that uses a large amount of unlabeled text line data and is trained in a self-supervised manner. Refer to the paper [STR-Fewer-Labels](https://github.com/ku21fan/STR-Fewer-Labels). This model can initialize the initial weights of SVTR_LCNet, which helps the text recognition model to converge to a better position. Using this strategy, the accuracy of the recognition model is further improved to 76.9% (+0.6%). The TextRotNet training process is shown in the following figure:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/SSL.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"- U-DML: unified-deep mutual learning\n",
"\n",
"UDML (Unified-Deep Mutual Learning) joint mutual learning is a strategy adopted in PP-OCRv2 that is very effective for text recognition to improve the model effect. In PP-OCRv3, for two different SVTR_LCNet and Attention structures, the feature map of PP-LCNet, the output of the SVTR module and the output of the Attention module between them are simultaneously supervised and trained. Using this strategy, the accuracy of the recognition model is further improved to 78.4% (+1.5%).\n",
"\n",
"- UIM: unlabeled images mining\n",
"\n",
"UIM (Unlabeled Images Mining) is a very simple unlabeled data mining scheme. The core idea is to use a high-precision text recognition model to predict unlabeled data, obtain pseudo-labels, and select samples with high prediction confidence as training data for training small models. Using this strategy, the accuracy of the recognition model is further improved to 79.4% (+1%).\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/doc/ppocr_v3/UIM.png\" width = \"60%\" />\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Attention\n",
"\n",
"\n",
"General data are used in the training process of PP-OCR series models. If the performance is not satisfactory in the actual scene, a small amount of data can be marked for finetune."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Related papers and citations\n",
"```\n",
"@article{du2021pp,\n",
" title={PP-OCRv2: bag of tricks for ultra lightweight OCR system},\n",
" author={Du, Yuning and Li, Chenxia and Guo, Ruoyu and Cui, Cheng and Liu, Weiwei and Zhou, Jun and Lu, Bin and Yang, Yehua and Liu, Qiwen and Hu, Xiaoguang and others},\n",
" journal={arXiv preprint arXiv:2109.03144},\n",
" year={2021}\n",
"}\n",
"\n",
"@inproceedings{zhang2018deep,\n",
" title={Deep mutual learning},\n",
" author={Zhang, Ying and Xiang, Tao and Hospedales, Timothy M and Lu, Huchuan},\n",
" booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},\n",
" pages={4320--4328},\n",
" year={2018}\n",
"}\n",
"\n",
"@inproceedings{hu2020gtc,\n",
" title={Gtc: Guided training of ctc towards efficient and accurate scene text recognition},\n",
" author={Hu, Wenyang and Cai, Xiaocong and Hou, Jun and Yi, Shuai and Lin, Zhiping},\n",
" booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n",
" volume={34},\n",
" number={07},\n",
" pages={11005--11012},\n",
" year={2020}\n",
"}\n",
"\n",
"@inproceedings{zhang2022context,\n",
" title={Context-based Contrastive Learning for Scene Text Recognition},\n",
" author={Zhang, Xinyun and Zhu, Binwu and Yao, Xufeng and Sun, Qi and Li, Ruiyu and Yu, Bei},\n",
" year={2022},\n",
" organization={AAAI}\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}