提交 4eb26893 编写于 作者: qq_25193841's avatar qq_25193841

Merge remote-tracking branch 'origin/dygraph' into dygraph

......@@ -2,7 +2,7 @@ English | [简体中文](README_ch.md)
# PPOCRLabel
PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, with built-in PPOCR model to automatically detect and re-recognize data. It is written in python3 and pyqt5, supporting rectangular box annotation and four-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models.
PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, with built-in PPOCR model to automatically detect and re-recognize data. It is written in python3 and pyqt5, supporting rectangular box, table and multi-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models.
<img src="./data/gif/steps_en.gif" width="100%"/>
......
......@@ -18,21 +18,25 @@ English | [简体中文](README_ch.md)
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/PP-OCRv3/en/en_4.png" width="800">
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
</div>
## Recent updates
- 2022.5.9 release PaddleOCR v2.5, including:
- [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
- [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
- Interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](./ppstructure/vqa)).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
- Release [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
- Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
- 2021.12.21 Release PaddleOCR [release/2.4](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4)
- Release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR).
- Release 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 [DocVQA](./ppstructure/vqa) algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 Release PaddleOCR [release/2.3](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.3)
- Release [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2). The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Release PaddleOCR [release/2.2](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.2)
- Release a new structured documents analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- [more](./doc/doc_en/update_en.md)
......@@ -145,27 +149,27 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
## Visualization [more](./doc/doc_en/visualization_en.md)
<details open>
<summary>PP-OCRv2 Chinese model</summary>
<summary>PP-OCRv3 Chinese model</summary>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 English model</summary>
<summary>PP-OCRv3 English model</summary>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="doc/imgs_results/PP-OCRv3/en/en_2.png" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Multilingual model</summary>
<summary>PP-OCRv3 Multilingual model</summary>
<div align="center">
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
</details>
......
......@@ -22,7 +22,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
</div>
## 近期更新
......@@ -61,7 +61,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<a name="开源社区"></a>
## 开源社区
- **加入社区👬:**微信扫描二维码并填写问卷之后,加入交流群领取福利
- **加入社区👬:** 微信扫描二维码并填写问卷之后,加入交流群领取福利
- **获取5月11-13日每晚20:30《OCR超强技术详解与产业应用实战》的直播课链接**
- **10G重磅OCR学习大礼包:**《动手学OCR》电子书,配套讲解视频和notebook项目;66篇OCR相关顶会前沿论文打包放送,包括CVPR、AAAI、IJCAI、ICCV等;PaddleOCR历次发版直播课视频;OCR社区优秀开发者项目分享视频。
......@@ -87,6 +87,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure 系列模型下载](./ppstructure/docs/models_list.md)
<a name="文档教程"></a>
## 文档教程
- [运行环境准备](./doc/doc_ch/environment.md)
......@@ -125,11 +126,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- [文本识别算法](./doc/doc_ch/algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [端到端算法](./doc/doc_ch/algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [使用PaddleOCR架构添加新算法](./doc/doc_ch/add_new_algorithm.md)
- [场景应用](./doc/doc_ch/application.md)
- [金融场景(表单/票据等)]()
- [工业场景(电表度数/车牌等)]()
- [教育场景(手写体/公式等)]()
- [医疗场景(化验单等)]()
- [场景应用](./applications)
- 数据标注与合成
- [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md)
- [数据合成工具Style-Text](./StyleText/README_ch.md)
......@@ -158,36 +155,34 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 效果展示 [more](./doc/doc_ch/visualization.md)
<details open>
<summary>PP-OCRv2 中文模型</summary>
<summary>PP-OCRv3 中文模型</summary>
<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 英文模型</summary>
<summary>PP-OCRv3 英文模型</summary>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="doc/imgs_results/PP-OCRv3/en/en_2.png" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 其他语言模型</summary>
<summary>PP-OCRv3 多语言模型</summary>
<div align="center">
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="doc/imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
</details>
......
......@@ -16,14 +16,14 @@
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e'></center>
<center>图1 多模态表单识别流程图</center>
注:欢迎再AIStudio领取免费算力体验线上实训,项目链接: [多模态表单识别](https://aistudio.baidu.com/aistudio/projectdetail/3815918)(配备Tesla V100、A100等高级算力资源)
注:欢迎再AIStudio领取免费算力体验线上实训,项目链接: [多模态表单识别](https://aistudio.baidu.com/aistudio/projectdetail/3884375)(配备Tesla V100、A100等高级算力资源)
# 2 安装说明
下载PaddleOCR源码,项目中已经帮大家打包好的PaddleOCR(已经修改好配置文件),无需下载解压即可,只需安装依赖环境~
下载PaddleOCR源码,上述AIStudio项目中已经帮大家打包好的PaddleOCR(已经修改好配置文件),无需下载解压即可,只需安装依赖环境~
```python
......@@ -33,7 +33,7 @@
```python
# 如仍需安装or安装更新,可以执行以下步骤
! git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
# ! git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
# ! git clone https://gitee.com/PaddlePaddle/PaddleOCR
```
......@@ -290,7 +290,7 @@ Eval.dataset.transforms.DetResizeForTest:评估尺寸,添加如下参数
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/5a75137c5f924dfeb6956b5818812298cc3dc7992ac84954b4175be9adf83c77"></center>
<center>图8 文本检测方案2-模型评估</center>
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/ch_db_mv3-student1600-finetune/best_accuracy`
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/ch_db_mv3-student1600-finetune/best_accuracy`[模型下载地址](https://paddleocr.bj.bcebos.com/fanliku/sheet_recognition/ch_db_mv3-student1600-finetune.zip)
```python
......@@ -538,7 +538,7 @@ Train.dataset.ratio_list:动态采样
<center>图16 文本识别方案3-模型评估</center>
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-readldata/best_accuracy`
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-readldata/best_accuracy`[模型下载地址](https://paddleocr.bj.bcebos.com/fanliku/sheet_recognition/rec_mobile_pp-OCRv2-student-realdata.zip)
```python
......
......@@ -30,7 +30,7 @@ client.load_client_config(sys.argv[1:])
client.connect(["127.0.0.1:9293"])
import paddle
test_img_dir = "test_img/"
test_img_dir = "../../doc/imgs/"
ocr_reader = OCRReader(char_dict_path="../../ppocr/utils/ppocr_keys_v1.txt")
......@@ -45,8 +45,7 @@ for img_file in os.listdir(test_img_dir):
image_data = file.read()
image = cv2_to_base64(image_data)
res_list = []
fetch_map = client.predict(
feed={"x": image}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True)
fetch_map = client.predict(feed={"x": image}, fetch=[], batch=True)
one_batch_res = ocr_reader.postprocess(fetch_map, with_score=True)
for res in one_batch_res:
res_list.append(res[0])
......
......@@ -44,7 +44,7 @@
在CTW1500文本检测公开数据集上,算法效果如下:
|模型|骨干网络|precision|recall|Hmean|下载链接|
| --- | --- | --- | --- | --- | --- |
| --- | --- | --- | --- | --- | --- |
|FCE|ResNet50_dcn|88.39%|82.18%|85.27%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar)|
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:
......@@ -65,6 +65,7 @@
- [x] [NRTR](./algorithm_rec_nrtr.md)
- [x] [SAR](./algorithm_rec_sar.md)
- [x] [SEED](./algorithm_rec_seed.md)
- [x] [SVTR](./algorithm_rec_svtr.md)
参考[DTRB](https://arxiv.org/abs/1904.01906)[3]文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
......@@ -82,6 +83,7 @@
|NRTR|NRTR_MTB| 84.21% | rec_mtb_nrtr | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
|SAR|Resnet31| 87.20% | rec_r31_sar | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |
|SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) |
<a name="2"></a>
......@@ -90,5 +92,3 @@
已支持的端到端OCR算法列表(戳链接获取使用教程):
- [x] [PGNet](./algorithm_e2e_pgnet.md)
# 场景应用
\ No newline at end of file
......@@ -97,7 +97,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
| --- | --- | --- | --- | --- |
|en_PP-OCRv3_rec_slim |【最新】slim量化版超轻量模型,支持英文、数字识别 | [en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec |【最新】原始超轻量模型,支持英文、数字识别|[en_PP-OCRv3_rec.yml](../../configs/rec/en_PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
|en_PP-OCRv3_rec |【最新】原始超轻量模型,支持英文、数字识别|[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
......@@ -118,7 +118,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | 斯拉夫字母 | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt |梵文字母 | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
更多支持语种请参考: [多语言模型](./multi_languages.md)
查看完整语种列表与使用教程请参考: [多语言模型](./multi_languages.md)
<a name="文本方向分类模型"></a>
......
......@@ -2,6 +2,7 @@
**近期更新**
- 2022.5.8 更新`PP-OCRv3`版 多语言检测和识别模型,平均识别准确率提升5%以上。
- 2021.4.9 支持**80种**语言的检测和识别
- 2021.4.9 支持**轻量高精度**英文模型检测识别
......@@ -254,7 +255,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
|英文|english|en| |乌克兰文|Ukranian|uk|
|法文|french|fr| |白俄罗斯文|Belarusian|be|
|德文|german|german| |泰卢固文|Telugu |te|
|日文|japan|japan| | 阿巴扎文 | Abaza | abq |
|日文|japan|japan| | 阿巴扎文 |Abaza | abq |
|韩文|korean|korean| |泰米尔文|Tamil |ta|
|中文繁体|chinese traditional |chinese_cht| |南非荷兰文 |Afrikaans |af|
|意大利文| Italian |it| |阿塞拜疆文 |Azerbaijani |az|
......
# 《动手学OCR》电子书
特点:
- 覆盖OCR全栈技术
- 理论实践相结合
- Notebook交互式学习
- 配套教学视频
《动手学OCR》是PaddleOCR团队携手复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下:
[电子书下载]()
- 覆盖从文本检测识别到文档分析的OCR全栈技术
- 紧密结合理论实践,跨越代码实现鸿沟,并配套教学视频
- Notebook交互式学习,灵活修改代码,即刻获得结果
目录:
![]()
[notebook教程](../../notebook/notebook_ch/)
## 本书结构
[教学视频](https://aistudio.baidu.com/aistudio/education/group/info/25207)
\ No newline at end of file
![](https://ai-studio-static-online.cdn.bcebos.com/5e612d9079b84958940614d9613eb928f1a50fe21ba6446cb99186bf2d76fe3d)
- 第一部分是本书的推荐序、序言与预备知识,包含本书的定位与使用书籍内容的过程中需要用到的知识索引、资源链接等
- 第二部分是本书的4-8章,介绍与OCR核心的检测、识别能力相关的概念、应用与产业实践。在“OCR技术导论”中总括性的解释OCR的应用场景和挑战、技术基本概念以及在产业应用中的痛点问题。然后在
“文本检测”与“文本识别”两章中介绍OCR的两个基本任务,并在每章中配套一个算法展开代码详解与实战练习。第6、7章是关于PP-OCR系列模型的详细介绍,PP-OCR是一套面向产业应用的OCR系统,在
基础检测和识别模型的基础之上经过一系列优化策略达到通用领域的产业级SOTA模型,同时打通多种预测部署方案,赋能企业快速落地OCR应用。
- 第三部分是本书的9-12章,介绍两阶段OCR引擎之外的应用,包括数据合成、预处理算法、端到端模型,重点展开了OCR在文档场景下的版面分析、表格识别、视觉文档问答的能力,同样通过算法与代码结
合的方式使得读者能够深入理解并应用。
## 资料地址
- 中文版电子书下载请扫描首页二维码入群后领取
- [notebook教程](../../notebook/notebook_ch/)
- [教学视频](https://aistudio.baidu.com/aistudio/education/group/info/25207)
......@@ -71,38 +71,28 @@ PP-OCRv3系统pipeline如下:
## 4. 效果展示 [more](./visualization.md)
<details open>
<summary>PP-OCRv2 中文模型</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<summary>PP-OCRv3 中文模型</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 英文模型</summary>
<summary>PP-OCRv3 英文模型</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_2.png" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 其他语言模型</summary>
<summary>PP-OCRv3 多语言模型</summary>
<div align="center">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
</details>
......
# 效果展示
<a name="超轻量PP-OCRv3效果展示"></a>
## 超轻量PP-OCRv3效果展示
### PP-OCRv3中文模型
<div align="center">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
### PP-OCRv3英文数字模型
<div align="center">
<img src="../imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_2.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_3.png" width="800">
</div>
### PP-OCRv3多语言模型
<div align="center">
<img src="../imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
<a name="超轻量PP-OCRv2效果展示"></a>
## 超轻量PP-OCRv2效果展示
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg" width="800">
<a name="超轻量ppocr_server_2.0效果展示"></a>
<a name="通用ppocr_server_2.0效果展示"></a>
## 通用PP-OCR server 效果展示
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00009282.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
</div>
......
......@@ -99,7 +99,7 @@ Considering that the features of some channels will be suppressed if the convolu
<a name="3"></a>
## 3. Optimization for Text Recognition Model
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
The recognition accuracy of SVTR_inty outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
......@@ -151,7 +151,7 @@ Due to the limited model structure supported by the MKLDNN acceleration library,
3. The experiment found that the prediction speed of the Global Mixing Block is related to the shape of the input features. Therefore, after moving the position of the Global Mixing Block to the back of pooling layer, the accuracy dropped to 71.9%, and the speed surpassed the PP-OCRv2-baseline based on the CNN structure by 22%. The network structure is as follows:
<div align="center">
<img src="../ppocr_v3/LCNet_SVTR.png" width=800>
<img src="../ppocr_v3/LCNet_SVTR_en.png" width=800>
</div>
The ablation experiments are as follows:
......@@ -172,7 +172,7 @@ Note: When testing the speed, the input image shape of 01-05 are all (3, 32, 320
[GTC](https://arxiv.org/pdf/2002.01276.pdf) (Guided Training of CTC), using the Attention module to guide the training of CTC to fuse multiple features is an effective strategy to improve text recognition accuracy. No more time-consuming is added in the inference process as the Attention module is completely removed during prediction. The accuracy of the recognition model is further improved to 75.8% (+1.82%). The training process is as follows:
<div align="center">
<img src="../ppocr_v3/GTC.png" width=800>
<img src="../ppocr_v3/GTC_en.png" width=800>
</div>
**(3)TextConAug:Data Augmentation Strategy for Mining Text Context Information**
......
......@@ -58,6 +58,7 @@ Supported text recognition algorithms (Click the link to get the tutorial):
- [x] [NRTR](./algorithm_rec_nrtr_en.md)
- [x] [SAR](./algorithm_rec_sar_en.md)
- [x] [SEED](./algorithm_rec_seed_en.md)
- [x] [SVTR](./algorithm_rec_svtr_en.md)
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
......@@ -75,6 +76,7 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
|NRTR|NRTR_MTB| 84.21% | rec_mtb_nrtr | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
|SAR|Resnet31| 87.20% | rec_r31_sar | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |
|SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) |
<a name="2"></a>
......
# OCR Model List(V2.1, updated on 2022.4.28)
# OCR Model List(V3, updated on 2022.4.28)
> **Note**
> 1. Compared with the model v2, the 3rd version of the detection model has a improvement in accuracy, and the 2.1 version of the recognition model has optimizations in accuracy and speed with CPU.
> 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 or higher are the dynamic graph trained version and achieve close performance.
......@@ -91,7 +91,7 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting english, English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting english, English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec| [New] Original lightweight model, supporting english, English, multilingual text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
......@@ -101,21 +101,18 @@ Relationship of the above models is as follows.
|model name| dict file | description|config|model size|download|
| --- | --- | --- |--- | --- | --- |
| french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt | Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) |
| german_mobile_v2.0_rec | ppocr/utils/dict/german_dict.txt | Lightweight model for German recognition|[rec_german_lite_train.yml](../../configs/rec/multi_language/rec_german_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_train.tar) |
| korean_mobile_v2.0_rec | ppocr/utils/dict/korean_dict.txt | Lightweight model for Korean recognition|[rec_korean_lite_train.yml](../../configs/rec/multi_language/rec_korean_lite_train.yml)|3.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_train.tar) |
| japan_mobile_v2.0_rec | ppocr/utils/dict/japan_dict.txt | Lightweight model for Japanese recognition|[rec_japan_lite_train.yml](../../configs/rec/multi_language/rec_japan_lite_train.yml)|4.23M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_train.tar) |
| chinese_cht_mobile_v2.0_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht recognition|rec_chinese_cht_lite_train.yml|5.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_train.tar) |
| te_mobile_v2.0_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition|rec_te_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_train.tar) |
| ka_mobile_v2.0_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition|rec_ka_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_train.tar) |
| ta_mobile_v2.0_rec | ppocr/utils/dict/ta_dict.txt | Lightweight model for Tamil recognition|rec_ta_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_train.tar) |
| latin_mobile_v2.0_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for latin recognition | [rec_latin_lite_train.yml](../../configs/rec/multi_language/rec_latin_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_train.tar) |
| arabic_mobile_v2.0_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for arabic recognition | [rec_arabic_lite_train.yml](../../configs/rec/multi_language/rec_arabic_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_train.tar) |
| cyrillic_mobile_v2.0_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for cyrillic recognition | [rec_cyrillic_lite_train.yml](../../configs/rec/multi_language/rec_cyrillic_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_train.tar) |
| devanagari_mobile_v2.0_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for devanagari recognition | [rec_devanagari_lite_train.yml](../../configs/rec/multi_language/rec_devanagari_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_train.tar) |
For more supported languages, please refer to : [Multi-language model](./multi_languages_en.md)
| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Lightweight model for Korean recognition|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) |
| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Lightweight model for Japanese recognition|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) |
| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) |
| te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition |[te_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/te_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) |
| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition |[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) |
| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |Lightweight model for Tamil recognition|[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for latin recognition | [latin_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/latin_PP-OCRv3_rec.yml) |9.7M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for arabic recognition | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/rec_arabic_lite_train.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for cyrillic recognition | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for devanagari recognition | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
For a complete list of languages ​​and tutorials, please refer to : [Multi-language model](./multi_languages_en.md)
<a name="Angle"></a>
## 3. Text Angle Classification Model
......
......@@ -2,6 +2,7 @@
**Recent Update**
- 2022.5.8 update the `PP-OCRv3` version of the multi-language detection and recognition model, and the average recognition accuracy has increased by more than 5%.
- 2021.4.9 supports the detection and recognition of 80 languages
- 2021.4.9 supports **lightweight high-precision** English model detection and recognition
......
# E-book: *Dive Into OCR*
\ No newline at end of file
# E-book: *Dive Into OCR*
"Dive Into OCR" is a textbook that combines OCR theory and practice, written by the PaddleOCR team, Chen Zhineng, a Pre-tenure Professor at Fudan University, Huang Wenhui, a senior expert in the field of vision at China Mobile Research Institute, and other industry-university-research colleagues, as well as OCR developers. The main features are as follows:
- OCR full-stack technology covering text detection, recognition and document analysis
- Closely integrate theory and practice, cross the code implementation gap, and supporting instructional videos
- Jupyter Notebook textbook, flexibly modifying code for instant results
## Structure
- The first part is the preliminary knowledge of the book, including the knowledge index and resource links needed in the process of positioning and using the book content of the book
- The second part is chapters 4-8 of the book, which introduce the concepts, applications, and industry practices related to the detection and identification capabilities of the OCR engine. In the "Introduction to OCR Technology", the application scenarios and challenges of OCR, the basic concepts of technology, and the pain points in industrial applications are comprehensively explained. Then, in the two chapters of "Text Detection" and "Text Recognition", the two basic tasks of OCR are introduced. In each chapter, an algorithm is accompanied by a detailed explanation of the code and practical exercises. Chapters 6 and 7 are a detailed introduction to the PP-OCR series model, PP-OCR is a set of OCR systems for industrial applications, on the basis of the basic detection and identification model, after a series of optimization strategies to achieve the general field of industrial SOTA model, while opening up a variety of predictive deployment solutions, enabling enterprises to quickly land OCR applications.
- The third part is chapter 9-12 of the book, which introduces applications other than the two-stage OCR engine, including data synthesis, preprocessing algorithm, and end-to-end model, focusing on OCR's layout analysis, table recognition, visual document question and answer capabilities in the document scene, and also through the combination of algorithm and code, so that readers can deeply understand and apply.
## Address
- [E-book: *Dive Into OCR* (link generating)]()
- [Jupyter notebook](../../notebook/notebook_en/)
- [videos (Chinese only)](https://aistudio.baidu.com/aistudio/education/group/info/25207)
......@@ -67,36 +67,28 @@ For the performance comparison between PP-OCR series models, please check the [b
## 4. Visualization [more](./visualization.md)
<details open>
<summary>PP-OCRv2 English model</summary>
<summary>PP-OCRv3 Chinese model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Chinese model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<summary>PP-OCRv3 English model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_2.png" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Multilingual model</summary>
<summary>PP-OCRv3 Multilingual model</summary>
<div align="center">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
</details>
......
# Visualization
<a name="PP-OCRv3"></a>
## PP-OCRv3
### PP-OCRv3 Chinese model
<div align="center">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg" width="800">
</div>
### PP-OCRv3 English model
<div align="center">
<img src="../imgs_results/PP-OCRv3/en/en_1.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_2.png" width="800">
<img src="../imgs_results/PP-OCRv3/en/en_3.png" width="800">
</div>
### PP-OCRv3 Multilingual model
<div align="center">
<img src="../imgs_results/PP-OCRv3/multi_lang/japan_2.jpg" width="800">
<img src="../imgs_results/PP-OCRv3/multi_lang/korean_1.jpg" width="800">
</div>
<a name="PP-OCRv2"></a>
## PP-OCRv2
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
......@@ -13,13 +38,6 @@
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00009282.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00059985.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00111002.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
</div>
......
doc/features.png

1.1 MB | W: | H:

doc/features.png

1.1 MB | W: | H:

doc/features.png
doc/features.png
doc/features.png
doc/features.png
  • 2-up
  • Swipe
  • Onion skin
doc/features_en.png

1.2 MB | W: | H:

doc/features_en.png

1.2 MB | W: | H:

doc/features_en.png
doc/features_en.png
doc/features_en.png
doc/features_en.png
  • 2-up
  • Swipe
  • Onion skin
......@@ -47,8 +47,8 @@ __all__ = [
]
SUPPORT_DET_MODEL = ['DB']
VERSION = '2.5.0.1'
SUPPORT_REC_MODEL = ['CRNN']
VERSION = '2.5.0.3'
SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet']
BASE_DIR = os.path.expanduser("~/.paddleocr/")
DEFAULT_OCR_MODEL_VERSION = 'PP-OCRv3'
......
......@@ -40,7 +40,7 @@ The main features of PP-Structure are as follows:
### 4.1 Layout analysis and table recognition
<img src="../doc/table/ppstructure.GIF" width="100%"/>
<img src="docs/table/ppstructure.GIF" width="100%"/>
The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of image, text, title and table by layout analysis, and then OCR detection and recognition is performed on the three areas of image, text and title, and the table is performed table recognition, where the image will also be stored for use.
......@@ -48,7 +48,7 @@ The figure shows the pipeline of layout analysis + table recognition. The image
* SER
*
![](../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../doc/vqa/result_ser/zh_val_42_ser.jpg)
![](docs/vqa/result_ser/zh_val_0_ser.jpg) | ![](docs/vqa/result_ser/zh_val_42_ser.jpg)
---|---
Different colored boxes in the figure represent different categories. For xfun dataset, there are three categories: query, answer and header:
......@@ -62,7 +62,7 @@ The corresponding category and OCR recognition results are also marked at the to
* RE
![](../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../doc/vqa/result_re/zh_val_40_re.jpg)
![](docs/vqa/result_re/zh_val_21_re.jpg) | ![](docs/vqa/result_re/zh_val_40_re.jpg)
---|---
......@@ -76,7 +76,7 @@ Start from [Quick Installation](./docs/quickstart.md)
### 6.1 Layout analysis and table recognition
![pipeline](../doc/table/pipeline.jpg)
![pipeline](docs/table/pipeline.jpg)
In PP-Structure, the image will be divided into 5 types of areas **text, title, image list and table**. For the first 4 types of areas, directly use PP-OCR system to complete the text detection and recognition. For the table area, after the table structuring process, the table in image is converted into an Excel file with the same table style.
......
......@@ -23,6 +23,7 @@ sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
os.environ["FLAGS_allocator_strategy"] = 'auto_growth'
import cv2
import json
import numpy as np
import time
import logging
from copy import deepcopy
......@@ -33,6 +34,7 @@ from ppocr.utils.logging import get_logger
from tools.infer.predict_system import TextSystem
from ppstructure.table.predict_table import TableSystem, to_excel
from ppstructure.utility import parse_args, draw_structure_result
from ppstructure.recovery.docx import convert_info_docx
logger = get_logger()
......@@ -104,7 +106,12 @@ class StructureSystem(object):
return_ocr_result_in_table)
else:
if self.text_system is not None:
filter_boxes, filter_rec_res = self.text_system(roi_img)
if args.recovery:
wht_im = np.ones(ori_im.shape, dtype=ori_im.dtype)
wht_im[y1:y2, x1:x2, :] = roi_img
filter_boxes, filter_rec_res = self.text_system(wht_im)
else:
filter_boxes, filter_rec_res = self.text_system(roi_img)
# remove style char
style_token = [
'<strike>', '<strike>', '<sup>', '</sub>', '<b>',
......@@ -118,7 +125,8 @@ class StructureSystem(object):
for token in style_token:
if token in rec_str:
rec_str = rec_str.replace(token, '')
box += [x1, y1]
if not args.recovery:
box += [x1, y1]
res.append({
'text': rec_str,
'confidence': float(rec_conf),
......@@ -192,6 +200,8 @@ def main(args):
# img_save_path = os.path.join(save_folder, img_name + '.jpg')
cv2.imwrite(img_save_path, draw_img)
logger.info('result save to {}'.format(img_save_path))
if args.recovery:
convert_info_docx(img, res, save_folder, img_name)
elapse = time.time() - starttime
logger.info("Predict time : {:.3f}s".format(elapse))
......
English | [简体中文](README_ch.md)
- [Getting Started](#getting-started)
- [1. Introduction](#1)
- [2. Install](#2)
- [2.1 Installation dependencies](#2.1)
- [2.2 Install PaddleOCR](#2.2)
- [3. Quick Start](#3)
<a name="1"></a>
## 1. Introduction
Layout recovery means that after OCR recognition, the content is still arranged like the original document pictures, and the paragraphs are output to word document in the same order.
Layout recovery combines [layout analysis](../layout/README.md)[table recognition](../table/README.md) to better recover images, tables, titles, etc.
The following figure shows the result:
<div align="center">
<img src="../docs/table/recovery.jpg" width = "700" />
</div>
<a name="2"></a>
## 2. Install
<a name="2.1"></a>
### 2.1 Install dependencies
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU installation
python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
# CPU installation
python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
````
For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick).
<a name="2.2"></a>
### 2.2 Install PaddleOCR
- **(1) Download source code**
```bash
[Recommended] git clone https://github.com/PaddlePaddle/PaddleOCR
# If the pull cannot be successful due to network problems, you can also choose to use the hosting on the code cloud:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: Code cloud hosting code may not be able to synchronize the update of this github project in real time, there is a delay of 3 to 5 days, please use the recommended method first.
````
- **(2) Install recovery's `requirements`**
```bash
python3 -m pip install -r ppstructure/recovery/requirements.txt
````
<a name="3"></a>
## 3. Quick Start
```python
cd PaddleOCR/ppstructure
# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table inch model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
# run
python3 predict_system.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_PP-OCRv3_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --rec_char_dict_path=../ppocr/utils/en_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output ./output/table --rec_image_shape=3,48,320 --vis_font_path=../doc/fonts/simfang.ttf --recovery=True --image_dir=./docs/table/1.png
```
After running, the docx of each picture will be saved in the directory specified by the output field
\ No newline at end of file
[English](README.md) | 简体中文
# 版面恢复使用说明
- [1. 简介](#1)
- [2. 安装](#2)
- [2.1 安装依赖](#2.1)
- [2.2 安装PaddleOCR](#2.2)
- [3. 使用](#3)
<a name="1"></a>
## 1. 简介
版面恢复就是在OCR识别后,内容仍然像原文档图片那样排列着,段落不变、顺序不变的输出到word文档中等。
版面恢复结合了[版面分析](../layout/README_ch.md)[表格识别](../table/README_ch.md)技术,从而更好地恢复图片、表格、标题等内容,下图展示了版面恢复的结果:
<div align="center">
<img src="../docs/table/recovery.jpg" width = "700" />
</div>
<a name="2"></a>
## 2. 安装
<a name="2.1"></a>
### 2.1 安装依赖
- **(1) 安装PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU安装
python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
# CPU安装
python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
```
更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
<a name="2.2"></a>
### 2.2 安装PaddleOCR
- **(1)下载版面恢复源码**
```bash
【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
```
- **(2)安装recovery的`requirements`**
```bash
python3 -m pip install -r ppstructure/recovery/requirements.txt
```
<a name="3"></a>
## 3. 使用
恢复给定文档的版面:
```python
cd PaddleOCR/ppstructure
# 下载模型
mkdir inference && cd inference
# 下载超英文轻量级PP-OCRv3模型的检测模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# 下载英文轻量级PP-OCRv3模型的识别模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# 下载超轻量级英文表格英寸模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
# 执行预测
python3 predict_system.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_PP-OCRv3_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --rec_char_dict_path=../ppocr/utils/en_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output ./output/table --rec_image_shape=3,48,320 --vis_font_path=../doc/fonts/simfang.ttf --recovery=True --image_dir=./docs/table/1.png
```
运行完成后,每张图片的docx文档会保存到output字段指定的目录下
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import os
import pypandoc
from copy import deepcopy
from docx import Document
from docx import shared
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.section import WD_SECTION
from docx.oxml.ns import qn
from ppocr.utils.logging import get_logger
logger = get_logger()
def convert_info_docx(img, res, save_folder, img_name):
doc = Document()
doc.styles['Normal'].font.name = 'Times New Roman'
doc.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), u'宋体')
doc.styles['Normal'].font.size = shared.Pt(6.5)
h, w, _ = img.shape
res = sorted_layout_boxes(res, w)
flag = 1
for i, region in enumerate(res):
if flag == 2 and region['layout'] == 'single':
section = doc.add_section(WD_SECTION.CONTINUOUS)
section._sectPr.xpath('./w:cols')[0].set(qn('w:num'), '1')
flag = 1
elif flag == 1 and region['layout'] == 'double':
section = doc.add_section(WD_SECTION.CONTINUOUS)
section._sectPr.xpath('./w:cols')[0].set(qn('w:num'), '2')
flag = 2
if region['type'] == 'Figure':
excel_save_folder = os.path.join(save_folder, img_name)
img_path = os.path.join(excel_save_folder,
'{}.jpg'.format(region['bbox']))
paragraph_pic = doc.add_paragraph()
paragraph_pic.alignment = WD_ALIGN_PARAGRAPH.CENTER
run = paragraph_pic.add_run("")
if flag == 1:
run.add_picture(img_path, width=shared.Inches(5))
elif flag == 2:
run.add_picture(img_path, width=shared.Inches(2))
elif region['type'] == 'Title':
doc.add_heading(region['res'][0]['text'])
elif region['type'] == 'Text':
paragraph = doc.add_paragraph()
paragraph_format = paragraph.paragraph_format
for i, line in enumerate(region['res']):
if i == 0:
paragraph_format.first_line_indent = shared.Inches(0.25)
text_run = paragraph.add_run(line['text'] + ' ')
text_run.font.size = shared.Pt(9)
elif region['type'] == 'Table':
pypandoc.convert(
source=region['res']['html'],
format='html',
to='docx',
outputfile='tmp.docx')
tmp_doc = Document('tmp.docx')
paragraph = doc.add_paragraph()
table = tmp_doc.tables[0]
new_table = deepcopy(table)
new_table.style = doc.styles['Table Grid']
from docx.enum.table import WD_TABLE_ALIGNMENT
new_table.alignment = WD_TABLE_ALIGNMENT.CENTER
paragraph.add_run().element.addnext(new_table._tbl)
os.remove('tmp.docx')
else:
continue
# save to docx
docx_path = os.path.join(save_folder, '{}.docx'.format(img_name))
doc.save(docx_path)
logger.info('docx save to {}'.format(docx_path))
def sorted_layout_boxes(res, w):
"""
Sort text boxes in order from top to bottom, left to right
args:
res(list):ppstructure results
return:
sorted results(list)
"""
num_boxes = len(res)
if num_boxes == 1:
res[0]['layout'] = 'single'
return res
sorted_boxes = sorted(res, key=lambda x: (x['bbox'][1], x['bbox'][0]))
_boxes = list(sorted_boxes)
new_res = []
res_left = []
res_right = []
i = 0
while True:
if i >= num_boxes:
break
if i == num_boxes - 1:
if _boxes[i]['bbox'][1] > _boxes[i - 1]['bbox'][3] and _boxes[i][
'bbox'][0] < w / 2 and _boxes[i]['bbox'][2] > w / 2:
new_res += res_left
new_res += res_right
_boxes[i]['layout'] = 'single'
new_res.append(_boxes[i])
else:
if _boxes[i]['bbox'][2] > w / 2:
_boxes[i]['layout'] = 'double'
res_right.append(_boxes[i])
new_res += res_left
new_res += res_right
elif _boxes[i]['bbox'][0] < w / 2:
_boxes[i]['layout'] = 'double'
res_left.append(_boxes[i])
new_res += res_left
new_res += res_right
res_left = []
res_right = []
break
elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3*w / 4:
_boxes[i]['layout'] = 'double'
res_left.append(_boxes[i])
i += 1
elif _boxes[i]['bbox'][0] > w / 4 and _boxes[i]['bbox'][2] > w / 2:
_boxes[i]['layout'] = 'double'
res_right.append(_boxes[i])
i += 1
else:
new_res += res_left
new_res += res_right
_boxes[i]['layout'] = 'single'
new_res.append(_boxes[i])
res_left = []
res_right = []
i += 1
if res_left:
new_res += res_left
if res_right:
new_res += res_right
return new_res
\ No newline at end of file
opencv-contrib-python==4.4.0.46
pypandoc
python-docx
\ No newline at end of file
......@@ -61,6 +61,11 @@ def init_args():
type=str2bool,
default=True,
help='In the forward, whether the non-table area is recognition by ocr')
parser.add_argument(
"--recovery",
type=bool,
default=False,
help='Whether to enable layout of recovery')
return parser
......
......@@ -131,7 +131,7 @@ class TextRecognizer(object):
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
def resize_norm_img_svtr(self, img, image_shape):
imgC, imgH, imgW = image_shape
......@@ -274,7 +274,7 @@ class TextRecognizer(object):
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
if self.rec_algorithm == "SAR":
norm_img, _, _, valid_ratio = self.resize_norm_img_sar(
img_list[indices[ino]], self.rec_image_shape)
......@@ -296,8 +296,8 @@ class TextRecognizer(object):
gsrm_slf_attn_bias2_list.append(norm_img[4])
norm_img_batch.append(norm_img[0])
elif self.rec_algorithm == "SVTR":
norm_img = self.resize_norm_img_svtr(
img_list[indices[ino]], self.rec_image_shape)
norm_img = self.resize_norm_img_svtr(img_list[indices[ino]],
self.rec_image_shape)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
else:
......@@ -405,9 +405,13 @@ def main(args):
valid_image_file_list = []
img_list = []
logger.info(
"In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
"if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320"
)
# warmup 2 times
if args.warmup:
img = np.random.uniform(0, 255, [32, 320, 3]).astype(np.uint8)
img = np.random.uniform(0, 255, [48, 320, 3]).astype(np.uint8)
for i in range(2):
res = text_recognizer([img] * int(args.rec_batch_num))
......
......@@ -133,6 +133,9 @@ def main(args):
os.makedirs(draw_img_save_dir, exist_ok=True)
save_results = []
logger.info("In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
"if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320")
# warm up 10 times
if args.warmup:
img = np.random.uniform(0, 255, [640, 640, 3]).astype(np.uint8)
......
......@@ -79,9 +79,9 @@ def init_args():
parser.add_argument("--det_fce_box_type", type=str, default='poly')
# params for text recognizer
parser.add_argument("--rec_algorithm", type=str, default='CRNN')
parser.add_argument("--rec_algorithm", type=str, default='SVTR_LCNet')
parser.add_argument("--rec_model_dir", type=str)
parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
parser.add_argument("--rec_image_shape", type=str, default="3, 48, 320")
parser.add_argument("--rec_batch_num", type=int, default=6)
parser.add_argument("--max_text_length", type=int, default=25)
parser.add_argument(
......@@ -269,11 +269,11 @@ def create_predictor(args, mode, logger):
max_input_shape.update(max_pact_shape)
opt_input_shape.update(opt_pact_shape)
elif mode == "rec":
if args.rec_algorithm != "CRNN":
if args.rec_algorithm not in ["CRNN", "SVTR_LCNet"]:
use_dynamic_shape = False
imgH = int(args.rec_image_shape.split(',')[-2])
min_input_shape = {"x": [1, 3, imgH, 10]}
max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 1536]}
max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 2304]}
opt_input_shape = {"x": [args.rec_batch_num, 3, imgH, 320]}
elif mode == "cls":
min_input_shape = {"x": [1, 3, 48, 10]}
......@@ -320,7 +320,7 @@ def create_predictor(args, mode, logger):
def get_output_tensors(args, mode, predictor):
output_names = predictor.get_output_names()
output_tensors = []
if mode == "rec" and args.rec_algorithm == "CRNN":
if mode == "rec" and args.rec_algorithm in ["CRNN", "SVTR_LCNet"]:
output_name = 'softmax_0.tmp_0'
if output_name in output_names:
return [predictor.get_output_handle(output_name)]
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册