diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index 46c2ed85124fd8d992ac17f10ff257bda0b39cd8..368a835c203f62d529bb874b3cbbf7593b96a8ba 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -2,7 +2,7 @@ English | [简体中文](README_ch.md)

 # PPOCRLabel

-PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, with built-in PPOCR model to automatically detect and re-recognize data. It is written in python3 and pyqt5, supporting rectangular box annotation and four-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models.
+PPOCRLabel is a semi-automatic graphic annotation tool suitable for the OCR field, with a built-in PPOCR model that automatically detects and re-recognizes data. It is written in Python3 and PyQt5, supporting rectangular box, table, and multi-point annotation modes. Annotations can be used directly for the training of PPOCR detection and recognition models.
diff --git a/README.md b/README.md
index 34421fff56ac33de940a0d2489adf21ebafded28..33898c7f84b9c46fa9b361e835c2f6a472169bfc 100644
--- a/README.md
+++ b/README.md
@@ -18,21 +18,25 @@ English | [简体中文](README_ch.md)

 PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
- +
- +
## Recent updates
-- 2022.5.9 release PaddleOCR v2.5, including:
-    - [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
-    - [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
-    - Interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
-- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](./ppstructure/vqa)).
-- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
-- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
+- **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
+    - Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, accuracy on Chinese scenes is further improved by 5% compared with PP-OCRv2, accuracy on English scenes is improved by 11%, and the average recognition accuracy of the 80-language multilingual models is improved by more than 5%.
+    - Release [PPOCRLabelv2](./PPOCRLabel): Add annotation functions for the table recognition task, the key information extraction task, and irregular text images.
+    - Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covering the cutting-edge theory and code practice of OCR full-stack technology.
+- 2021.12.21 Release PaddleOCR [release/2.4](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4)
+    - Release 1 text detection algorithm (PSENet) and 3 text recognition algorithms (NRTR, SEED, SAR).
+    - Release 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 [DocVQA](./ppstructure/vqa) algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
+- 2021.9.7 Release PaddleOCR [release/2.3](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.3)
+    - Release [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2). The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server on CPU devices, and its F-score is 7% higher than that of PP-OCR mobile.
+- 2021.8.3 Release PaddleOCR [release/2.2](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.2)
+    - Release a new structured document analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), supporting layout analysis and table recognition (one-click export of table images to Excel files).

- [more](./doc/doc_en/update_en.md)
@@ -145,27 +149,27 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel

## Visualization [more](./doc/doc_en/visualization_en.md)
-PP-OCRv2 Chinese model +PP-OCRv3 Chinese model
- - - - + + +
-PP-OCRv2 English model +PP-OCRv3 English model
- + +
-PP-OCRv2 Multilingual model +PP-OCRv3 Multilingual model
- - + +
diff --git a/README_ch.md b/README_ch.md index eb5cf8bce4ddb1e4be9d8f7e44c4e9a82d7e286f..9de3110531ae10004e3b29497a9baeb5fb6fc449 100755 --- a/README_ch.md +++ b/README_ch.md @@ -22,7 +22,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- +
## 近期更新 @@ -61,7 +61,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 开源社区 -- **加入社区👬:**微信扫描二维码并填写问卷之后,加入交流群领取福利 +- **加入社区👬:** 微信扫描二维码并填写问卷之后,加入交流群领取福利 - **获取5月11-13日每晚20:30《OCR超强技术详解与产业应用实战》的直播课链接** - **10G重磅OCR学习大礼包:**《动手学OCR》电子书,配套讲解视频和notebook项目;66篇OCR相关顶会前沿论文打包放送,包括CVPR、AAAI、IJCAI、ICCV等;PaddleOCR历次发版直播课视频;OCR社区优秀开发者项目分享视频。 @@ -87,6 +87,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure 系列模型下载](./ppstructure/docs/models_list.md) + ## 文档教程 - [运行环境准备](./doc/doc_ch/environment.md) @@ -125,11 +126,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - [文本识别算法](./doc/doc_ch/algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95) - [端到端算法](./doc/doc_ch/algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95) - [使用PaddleOCR架构添加新算法](./doc/doc_ch/add_new_algorithm.md) -- [场景应用](./doc/doc_ch/application.md) - - [金融场景(表单/票据等)]() - - [工业场景(电表度数/车牌等)]() - - [教育场景(手写体/公式等)]() - - [医疗场景(化验单等)]() +- [场景应用](./applications) - 数据标注与合成 - [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md) - [数据合成工具Style-Text](./StyleText/README_ch.md) @@ -158,36 +155,34 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 效果展示 [more](./doc/doc_ch/visualization.md)
-PP-OCRv2 中文模型 +PP-OCRv3 中文模型
- - -
-
- - + + +
-PP-OCRv2 英文模型 +PP-OCRv3 英文模型
- + +
-PP-OCRv2 其他语言模型 +PP-OCRv3 多语言模型
- - + +
diff --git "a/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.md" "b/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.md" index e64a22e169482ae51cadf8b25d75c5d98651e80b..d47bbe77045d502d82a5a8d8b8eca685963e6380 100644 --- "a/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.md" +++ "b/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.md" @@ -16,14 +16,14 @@
图1 多模态表单识别流程图
-注:欢迎再AIStudio领取免费算力体验线上实训,项目链接: [多模态表单识别](https://aistudio.baidu.com/aistudio/projectdetail/3815918)(配备Tesla V100、A100等高级算力资源)
+注:欢迎在AIStudio领取免费算力体验线上实训,项目链接: [多模态表单识别](https://aistudio.baidu.com/aistudio/projectdetail/3884375)(配备Tesla V100、A100等高级算力资源)

# 2 安装说明

-下载PaddleOCR源码,本项目中已经帮大家打包好的PaddleOCR(已经修改好配置文件),无需下载解压即可,只需安装依赖环境~
+下载PaddleOCR源码,上述AIStudio项目中已经帮大家打包好PaddleOCR(并已修改好配置文件),无需下载解压,只需安装依赖环境即可~

```python
@@ -33,7 +33,7 @@

```python
# 如仍需安装or安装更新,可以执行以下步骤
-! git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
+# ! git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
# ! git clone https://gitee.com/PaddlePaddle/PaddleOCR
```
@@ -290,7 +290,7 @@ Eval.dataset.transforms.DetResizeForTest:评估尺寸,添加如下参数
图8 文本检测方案2-模型评估
-使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/ch_db_mv3-student1600-finetune/best_accuracy` +使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/ch_db_mv3-student1600-finetune/best_accuracy`,[模型下载地址](https://paddleocr.bj.bcebos.com/fanliku/sheet_recognition/ch_db_mv3-student1600-finetune.zip) ```python @@ -538,7 +538,7 @@ Train.dataset.ratio_list:动态采样
图16 文本识别方案3-模型评估
-使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-readldata/best_accuracy` +使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-readldata/best_accuracy`,[模型下载地址](https://paddleocr.bj.bcebos.com/fanliku/sheet_recognition/rec_mobile_pp-OCRv2-student-realdata.zip) ```python diff --git a/deploy/pdserving/ocr_cpp_client.py b/deploy/pdserving/ocr_cpp_client.py index cb42943923879d1138e065881a15da893a505083..7f9333dd858aad5440ff256d501cf1e5d2f5fb1f 100755 --- a/deploy/pdserving/ocr_cpp_client.py +++ b/deploy/pdserving/ocr_cpp_client.py @@ -30,7 +30,7 @@ client.load_client_config(sys.argv[1:]) client.connect(["127.0.0.1:9293"]) import paddle -test_img_dir = "test_img/" +test_img_dir = "../../doc/imgs/" ocr_reader = OCRReader(char_dict_path="../../ppocr/utils/ppocr_keys_v1.txt") @@ -45,8 +45,7 @@ for img_file in os.listdir(test_img_dir): image_data = file.read() image = cv2_to_base64(image_data) res_list = [] - fetch_map = client.predict( - feed={"x": image}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True) + fetch_map = client.predict(feed={"x": image}, fetch=[], batch=True) one_batch_res = ocr_reader.postprocess(fetch_map, with_score=True) for res in one_batch_res: res_list.append(res[0]) diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md index 313ef9b15e7e3a2d8e7aa3ea31add75f18bb27e3..6227a21498eda7d8527e21e7f2567995251d9e47 100755 --- a/doc/doc_ch/algorithm_overview.md +++ b/doc/doc_ch/algorithm_overview.md @@ -44,7 +44,7 @@ 在CTW1500文本检测公开数据集上,算法效果如下: |模型|骨干网络|precision|recall|Hmean|下载链接| -| --- | --- | --- | --- | --- | --- | +| --- | --- | --- | --- | --- | --- | |FCE|ResNet50_dcn|88.39%|82.18%|85.27%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar)| **说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载: @@ -65,6 +65,7 @@ - [x] [NRTR](./algorithm_rec_nrtr.md) - [x] [SAR](./algorithm_rec_sar.md) - [x] [SEED](./algorithm_rec_seed.md) +- [x] [SVTR](./algorithm_rec_svtr.md) 参考[DTRB](https://arxiv.org/abs/1904.01906)[3]文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: @@ -82,6 +83,7 @@ |NRTR|NRTR_MTB| 84.21% | rec_mtb_nrtr | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) | |SAR|Resnet31| 87.20% | rec_r31_sar | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) | |SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) | +|SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) | @@ -90,5 +92,3 @@ 已支持的端到端OCR算法列表(戳链接获取使用教程): - [x] [PGNet](./algorithm_e2e_pgnet.md) - - diff --git a/doc/doc_ch/application.md b/doc/doc_ch/application.md deleted file mode 100644 index 6dd465f9e71951bfbc1f749b0ca93d66cbfeb220..0000000000000000000000000000000000000000 --- a/doc/doc_ch/application.md +++ /dev/null @@ -1 +0,0 @@ -# 场景应用 \ No newline at end of file diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 2012381af5a1cfe53771903e0ab99bab0b7cbc08..318d5874f5e01390976723ccdb98012b95a6eb7f 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -97,7 +97,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | |en_PP-OCRv3_rec_slim 
|【最新】slim量化版超轻量模型,支持英文、数字识别 | [en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) | -|en_PP-OCRv3_rec |【最新】原始超轻量模型,支持英文、数字识别|[en_PP-OCRv3_rec.yml](../../configs/rec/en_PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) | +|en_PP-OCRv3_rec |【最新】原始超轻量模型,支持英文、数字识别|[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) | |en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) | |en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | @@ -118,7 +118,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 | cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | 斯拉夫字母 | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) | | devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt |梵文字母 | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) | -更多支持语种请参考: [多语言模型](./multi_languages.md) +查看完整语种列表与使用教程请参考: [多语言模型](./multi_languages.md) diff --git a/doc/doc_ch/multi_languages.md b/doc/doc_ch/multi_languages.md index 6838b350403a7044e629e5fcc5893bced98af9d3..499fdd9881563b3a784b5f4ba4feace54f1a3a6a 100644 --- a/doc/doc_ch/multi_languages.md +++ b/doc/doc_ch/multi_languages.md @@ -2,6 +2,7 @@ **近期更新** +- 2022.5.8 更新`PP-OCRv3`版 多语言检测和识别模型,平均识别准确率提升5%以上。 - 2021.4.9 支持**80种**语言的检测和识别 - 2021.4.9 支持**轻量高精度**英文模型检测识别 @@ -254,7 +255,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs |英文|english|en| |乌克兰文|Ukranian|uk| |法文|french|fr| |白俄罗斯文|Belarusian|be| |德文|german|german| |泰卢固文|Telugu |te| -|日文|japan|japan| | 阿巴扎文 | Abaza | abq | +|日文|japan|japan| | 阿巴扎文 |Abaza | abq | |韩文|korean|korean| |泰米尔文|Tamil |ta| |中文繁体|chinese traditional |chinese_cht| |南非荷兰文 |Afrikaans |af| |意大利文| Italian |it| |阿塞拜疆文 |Azerbaijani |az| diff --git a/doc/doc_ch/ocr_book.md b/doc/doc_ch/ocr_book.md index fb2369e414ec454f0e3c51f4f2e83c1f5d155c6c..03a6011b6b921eff82ab41863058341fc599e41b 100644 --- 
a/doc/doc_ch/ocr_book.md +++ b/doc/doc_ch/ocr_book.md @@ -1,16 +1,25 @@ # 《动手学OCR》电子书 -特点: -- 覆盖OCR全栈技术 -- 理论实践相结合 -- Notebook交互式学习 -- 配套教学视频 +《动手学OCR》是PaddleOCR团队携手复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下: -[电子书下载]() +- 覆盖从文本检测识别到文档分析的OCR全栈技术 +- 紧密结合理论实践,跨越代码实现鸿沟,并配套教学视频 +- Notebook交互式学习,灵活修改代码,即刻获得结果 -目录: -![]() -[notebook教程](../../notebook/notebook_ch/) +## 本书结构 -[教学视频](https://aistudio.baidu.com/aistudio/education/group/info/25207) \ No newline at end of file +![](https://ai-studio-static-online.cdn.bcebos.com/5e612d9079b84958940614d9613eb928f1a50fe21ba6446cb99186bf2d76fe3d) + +- 第一部分是本书的推荐序、序言与预备知识,包含本书的定位与使用书籍内容的过程中需要用到的知识索引、资源链接等 +- 第二部分是本书的4-8章,介绍与OCR核心的检测、识别能力相关的概念、应用与产业实践。在“OCR技术导论”中总括性的解释OCR的应用场景和挑战、技术基本概念以及在产业应用中的痛点问题。然后在 +“文本检测”与“文本识别”两章中介绍OCR的两个基本任务,并在每章中配套一个算法展开代码详解与实战练习。第6、7章是关于PP-OCR系列模型的详细介绍,PP-OCR是一套面向产业应用的OCR系统,在 +基础检测和识别模型的基础之上经过一系列优化策略达到通用领域的产业级SOTA模型,同时打通多种预测部署方案,赋能企业快速落地OCR应用。 +- 第三部分是本书的9-12章,介绍两阶段OCR引擎之外的应用,包括数据合成、预处理算法、端到端模型,重点展开了OCR在文档场景下的版面分析、表格识别、视觉文档问答的能力,同样通过算法与代码结 +合的方式使得读者能够深入理解并应用。 + + +## 资料地址 +- 中文版电子书下载请扫描首页二维码入群后领取 +- [notebook教程](../../notebook/notebook_ch/) +- [教学视频](https://aistudio.baidu.com/aistudio/education/group/info/25207) diff --git a/doc/doc_ch/ppocr_introduction.md b/doc/doc_ch/ppocr_introduction.md index 14f95f1cd65da249d58da39c5228cb6d4bcb045e..59de124e2ab855d0b4abb90d0a356aefd6db586d 100644 --- a/doc/doc_ch/ppocr_introduction.md +++ b/doc/doc_ch/ppocr_introduction.md @@ -71,38 +71,28 @@ PP-OCRv3系统pipeline如下: ## 4. 效果展示 [more](./visualization.md)
-PP-OCRv2 中文模型 - -
- - -
+PP-OCRv3 中文模型
- - + + +
-
-
-PP-OCRv2 英文模型 - +PP-OCRv3 英文模型
- + +
-
-
-PP-OCRv2 其他语言模型 - +PP-OCRv3 多语言模型
- - + +
-
diff --git a/doc/doc_ch/visualization.md b/doc/doc_ch/visualization.md index 99d071ec22daccaa295b5087760c5fc0d45f9802..254634753282f53f367e44f4859e5b748f32bffd 100644 --- a/doc/doc_ch/visualization.md +++ b/doc/doc_ch/visualization.md @@ -1,23 +1,46 @@ # 效果展示 + + +## 超轻量PP-OCRv3效果展示 + +### PP-OCRv3中文模型 +
+ + + +
+ +### PP-OCRv3英文数字模型 + +
+ + + +
+ +### PP-OCRv3多语言模型 + +
+ + +
+ + ## 超轻量PP-OCRv2效果展示 - + + ## 通用PP-OCR server 效果展示
- - - - -
diff --git a/doc/doc_en/PP-OCRv3_introduction_en.md b/doc/doc_en/PP-OCRv3_introduction_en.md
index 74b6086837148260742417b9471ac2dc4efeab9e..9ab25653e219c18e1acaaf7c99b050f790bcb1b9 100644
--- a/doc/doc_en/PP-OCRv3_introduction_en.md
+++ b/doc/doc_en/PP-OCRv3_introduction_en.md
@@ -99,7 +99,7 @@ Considering that the features of some channels will be suppressed if the convolu

 ## 3. Optimization for Text Recognition Model

-The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
+The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability. SVTR_Tiny outperforms the PP-OCRv2 recognition model by 5.3% in accuracy, but its prediction speed is nearly 11 times slower: it takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
@@ -151,7 +151,7 @@ Due to the limited model structure supported by the MKLDNN acceleration library,

 3. The experiment found that the prediction speed of the Global Mixing Block is related to the shape of the input features. Therefore, after moving the Global Mixing Block behind the pooling layer, the accuracy dropped to 71.9%, and the speed surpassed the PP-OCRv2-baseline based on the CNN structure by 22%. The network structure is as follows:
- +
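The "global mixing" operation discussed above is essentially multi-head self-attention applied to the flattened feature map, so its cost grows with the number of spatial positions; placing it behind a pooling layer shrinks that sequence. A minimal PaddlePaddle sketch of the idea (the class name and dimensions are illustrative, not the exact PP-OCRv3/SVTR layer):

```python
import paddle
import paddle.nn as nn

class GlobalMixingBlock(nn.Layer):
    """Self-attention over the flattened H*W positions of a feature map.
    A sketch of the 'global mixing' idea, not PaddleOCR's actual layer."""

    def __init__(self, dim=64, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiHeadAttention(embed_dim=dim, num_heads=num_heads)

    def forward(self, x):                        # x: [N, C, H, W]
        n, c, h, w = x.shape
        seq = x.flatten(2).transpose([0, 2, 1])  # [N, H*W, C]
        seq = seq + self.attn(self.norm(seq))    # mix all positions globally
        return seq.transpose([0, 2, 1]).reshape([n, c, h, w])

# Attention cost scales with (H*W)^2, which is why moving the block behind a
# pooling layer (smaller H*W) trades some accuracy for a large speedup.
feat = paddle.randn([1, 64, 8, 80])
print(GlobalMixingBlock()(feat).shape)           # [1, 64, 8, 80]
```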
The ablation experiments are as follows: @@ -172,7 +172,7 @@ Note: When testing the speed, the input image shape of 01-05 are all (3, 32, 320 [GTC](https://arxiv.org/pdf/2002.01276.pdf) (Guided Training of CTC), using the Attention module to guide the training of CTC to fuse multiple features is an effective strategy to improve text recognition accuracy. No more time-consuming is added in the inference process as the Attention module is completely removed during prediction. The accuracy of the recognition model is further improved to 75.8% (+1.82%). The training process is as follows:
- +
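The point of GTC is that the guidance branch contributes only to the training loss and is removed before deployment. A rough sketch of the mechanism (the class and the plain linear layer standing in for the attention head are illustrative, not PaddleOCR's actual implementation):

```python
import paddle
import paddle.nn as nn

class GTCHead(nn.Layer):
    """Sketch of Guided Training of CTC: a guidance branch (an attention head
    in the paper; a plain linear layer here) shares features with the CTC
    branch during training and is dropped entirely at inference time."""

    def __init__(self, in_dim=64, num_classes=6625):
        super().__init__()
        self.ctc_fc = nn.Linear(in_dim, num_classes + 1)  # +1 for the CTC blank
        self.guide_fc = nn.Linear(in_dim, num_classes)    # training-only branch

    def forward(self, feats):                             # feats: [N, T, C]
        ctc_logits = self.ctc_fc(feats)
        if self.training:
            # total loss would combine ctc_loss(ctc_logits) and a guidance
            # loss on the second branch, e.g. loss = ctc + lambda * guide
            return ctc_logits, self.guide_fc(feats)
        return ctc_logits                                 # no extra inference cost

head = GTCHead()
head.eval()
print(head(paddle.randn([2, 40, 64])).shape)              # [2, 40, 6626]
```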
**(3)TextConAug: Data Augmentation Strategy for Mining Text Context Information**

diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md
index 0cee8f4a41088a8a4d4a8df86c8ebdbe41a2c814..18c9cd7d51bdf0129245afca8a759afab5d9d589 100755
--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@@ -58,6 +58,7 @@ Supported text recognition algorithms (Click the link to get the tutorial):
 - [x] [NRTR](./algorithm_rec_nrtr_en.md)
 - [x] [SAR](./algorithm_rec_sar_en.md)
 - [x] [SEED](./algorithm_rec_seed_en.md)
+- [x] [SVTR](./algorithm_rec_svtr_en.md)

Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (using MJSynth and SynthText for training, evaluating on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follows:
@@ -75,6 +76,7 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |NRTR|NRTR_MTB| 84.21% | rec_mtb_nrtr | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
 |SAR|Resnet31| 87.20% | rec_r31_sar | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
 |SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |
+|SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) |

diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md
index 15a7fdb94e303297f7be681f297a5e52613a268a..8e8c1f2fe11bcd0748d556d34fd184fed4b3a86f 100644
--- a/doc/doc_en/models_list_en.md
+++ b/doc/doc_en/models_list_en.md
@@ -1,4 +1,4 @@
-# OCR Model List(V2.1, updated on 2022.4.28)
+# OCR Model List(V3, updated on 2022.4.28)
 > **Note**
 > 1. Compared with model v2, the 3rd version of the detection model has an improvement in accuracy, and the 2.1 version of the recognition model has optimizations in accuracy and speed on CPU.
 > 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 or higher are dynamic-graph trained versions and achieve close performance.
@@ -91,7 +91,7 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|en_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting english, English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
+|en_PP-OCRv3_rec_slim | [New] Slim quantization with distillation lightweight model, supporting English and number recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
 |en_PP-OCRv3_rec| [New] Original lightweight model, supporting English and number recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
 |en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
 |en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
@@ -101,21 +101,18 @@
|model name| dict file | description|config|model size|download| | --- | --- | --- |--- | --- | --- | -| french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt | Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) | -| german_mobile_v2.0_rec | ppocr/utils/dict/german_dict.txt | Lightweight model for German recognition|[rec_german_lite_train.yml](../../configs/rec/multi_language/rec_german_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_train.tar) | -| korean_mobile_v2.0_rec | ppocr/utils/dict/korean_dict.txt | Lightweight model for Korean recognition|[rec_korean_lite_train.yml](../../configs/rec/multi_language/rec_korean_lite_train.yml)|3.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_train.tar) | -| japan_mobile_v2.0_rec | ppocr/utils/dict/japan_dict.txt | Lightweight model for Japanese recognition|[rec_japan_lite_train.yml](../../configs/rec/multi_language/rec_japan_lite_train.yml)|4.23M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_train.tar) | -| chinese_cht_mobile_v2.0_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht recognition|rec_chinese_cht_lite_train.yml|5.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_train.tar) | -| te_mobile_v2.0_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition|rec_te_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_train.tar) | -| ka_mobile_v2.0_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition|rec_ka_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_train.tar) | -| ta_mobile_v2.0_rec | ppocr/utils/dict/ta_dict.txt | Lightweight model for Tamil recognition|rec_ta_lite_train.yml|2.63M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_train.tar) | -| latin_mobile_v2.0_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for latin recognition | [rec_latin_lite_train.yml](../../configs/rec/multi_language/rec_latin_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar) / [trained 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_train.tar) | -| arabic_mobile_v2.0_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for arabic recognition | [rec_arabic_lite_train.yml](../../configs/rec/multi_language/rec_arabic_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_train.tar) | -| cyrillic_mobile_v2.0_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for cyrillic recognition | [rec_cyrillic_lite_train.yml](../../configs/rec/multi_language/rec_cyrillic_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_train.tar) | -| devanagari_mobile_v2.0_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for devanagari recognition | [rec_devanagari_lite_train.yml](../../configs/rec/multi_language/rec_devanagari_lite_train.yml) |2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_train.tar) | - -For more supported languages, please refer to : [Multi-language model](./multi_languages_en.md) - +| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Lightweight model for Korean recognition|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) | +| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Lightweight model for Japanese recognition|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) | +| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) | +| te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition |[te_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/te_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) | +| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition |[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) | 
+| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt | Lightweight model for Tamil recognition |[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
+| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for Latin recognition | [latin_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/latin_PP-OCRv3_rec.yml) |9.7M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
+| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for Arabic recognition | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
+| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for Cyrillic recognition | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
+| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for Devanagari recognition | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
+
+For a complete list of languages and tutorials, please refer to: [Multi-language model](./multi_languages_en.md)

## 3. Text Angle Classification Model
diff --git a/doc/doc_en/multi_languages_en.md b/doc/doc_en/multi_languages_en.md
index 9f09b531d9f6f9912b69804e57cf4e78f0c15531..4696a3e842242517d19bcac7d7bdef3b4c233b12 100644
--- a/doc/doc_en/multi_languages_en.md
+++ b/doc/doc_en/multi_languages_en.md
@@ -2,6 +2,7 @@

 **Recent Update**

+- 2022.5.8 update the `PP-OCRv3` versions of the multi-language detection and recognition models; the average recognition accuracy has increased by more than 5%.
 - 2021.4.9 supports the detection and recognition of 80 languages
 - 2021.4.9 supports **lightweight high-precision** English model detection and recognition
diff --git a/doc/doc_en/ocr_book_en.md b/doc/doc_en/ocr_book_en.md
index bbf202cbde31c25ef7da771fa03ad0819f2b7c4e..b0455fe61afe8ae456f224e57d346b1fed553eb4 100644
--- a/doc/doc_en/ocr_book_en.md
+++ b/doc/doc_en/ocr_book_en.md
@@ -1 +1,21 @@
-# E-book: *Dive Into OCR*
\ No newline at end of file
+# E-book: *Dive Into OCR*
+
+"Dive Into OCR" is a textbook that combines OCR theory and practice, written by the PaddleOCR team together with Chen Zhineng, a pre-tenure professor at Fudan University, Huang Wenhui, a senior expert in computer vision at China Mobile Research Institute, other colleagues from industry, academia, and research, as well as OCR developers.
The main features are as follows:
+
+- OCR full-stack technology covering text detection, recognition, and document analysis
+- Closely integrates theory and practice, bridging the gap to code implementation, with supporting instructional videos
+- Interactive Jupyter Notebook format, so code can be modified and results obtained instantly
+
+## Structure
+
+- The first part is the preliminary knowledge, including the positioning of the book and the knowledge index and resource links needed while working through its content
+
+- The second part is chapters 4-8, which introduce the concepts, applications, and industry practices related to the detection and recognition capabilities at the core of an OCR engine. "Introduction to OCR Technology" gives an overview of OCR application scenarios and challenges, the basic technical concepts, and the pain points in industrial applications. The chapters "Text Detection" and "Text Recognition" then introduce the two basic tasks of OCR, each with one algorithm worked through in a detailed code explanation and practical exercise. Chapters 6 and 7 introduce the PP-OCR series models in detail: PP-OCR is an OCR system for industrial applications that applies a series of optimization strategies on top of the basic detection and recognition models to reach industrial SOTA performance in the general domain, and it provides a variety of prediction and deployment solutions so that enterprises can quickly put OCR applications into production.
+
+- The third part is chapters 9-12, which introduce applications beyond the two-stage OCR engine, including data synthesis, preprocessing algorithms, and end-to-end models, with a focus on the layout analysis, table recognition, and visual document question answering capabilities needed in document scenes; algorithms are again combined with code so that readers can understand them deeply and apply them.
+
+
+## Address
+- [E-book: *Dive Into OCR* (link generating)]()
+- [Jupyter notebook](../../notebook/notebook_en/)
+- [videos (Chinese only)](https://aistudio.baidu.com/aistudio/education/group/info/25207)
diff --git a/doc/doc_en/ppocr_introduction_en.md b/doc/doc_en/ppocr_introduction_en.md
index b2895cc27b98564a99c73a9abf7ee0d7451176e1..8fe6bc683ac69bdff0e3b4297f2eaa95b934fa17 100644
--- a/doc/doc_en/ppocr_introduction_en.md
+++ b/doc/doc_en/ppocr_introduction_en.md
@@ -67,36 +67,28 @@ For the performance comparison between PP-OCR series models, please check the [b

## 4. Visualization [more](./visualization.md)
-PP-OCRv2 English model - +PP-OCRv3 Chinese model
- + + +
-
-PP-OCRv2 Chinese model - -
- - -
+PP-OCRv3 English model
- - + +
-
-PP-OCRv2 Multilingual model - +PP-OCRv3 Multilingual model
- - + +
-
diff --git a/doc/doc_en/visualization_en.md b/doc/doc_en/visualization_en.md index 71cfb043462f34f2b3bef594364d33f15e98d81e..8ea64925eabb55a68e00a1ee13b465cb260db29b 100644 --- a/doc/doc_en/visualization_en.md +++ b/doc/doc_en/visualization_en.md @@ -1,5 +1,30 @@ # Visualization + +## PP-OCRv3 + +### PP-OCRv3 Chinese model +
+ + + +
+ +### PP-OCRv3 English model + +
+ + + +
+ +### PP-OCRv3 Multilingual model + +
+ + +
+ ## PP-OCRv2 @@ -13,13 +38,6 @@ - - - - - - - diff --git a/doc/features.png b/doc/features.png index ea7214565869f95d5ecf35bc85be1a8c48318265..273e4beb74771b723ab732f703863fa2a3a4c21c 100644 Binary files a/doc/features.png and b/doc/features.png differ diff --git a/doc/features_en.png b/doc/features_en.png index 4e7baec4d563d50b1ace9e997e72b05ae1e803f0..310a1b7e50920304521a5fa68c5c2e2a881d3917 100644 Binary files a/doc/features_en.png and b/doc/features_en.png differ diff --git a/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c35936cc1a9509d4c2aec66bbd9c22345f10694d Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic001.jpg differ diff --git a/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e5ad6a4b2a3ab735ec15fad6bae428f4008226e0 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic002.jpg differ diff --git a/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg new file mode 100644 index 0000000000000000000000000000000000000000..dc024296bdae41a32cf1aa0c5f396caa57383496 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/ch/PP-OCRv3-pic003.jpg differ diff --git a/doc/imgs_results/PP-OCRv3/en/en_1.png b/doc/imgs_results/PP-OCRv3/en/en_1.png new file mode 100644 index 0000000000000000000000000000000000000000..36245613e304fce0e376fe78e795f9a76d3b6015 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/en/en_1.png differ diff --git a/doc/imgs_results/PP-OCRv3/en/en_2.png b/doc/imgs_results/PP-OCRv3/en/en_2.png new file mode 100644 index 0000000000000000000000000000000000000000..d2df8556ad30a9f429d943cb940842a95056d604 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/en/en_2.png differ diff --git a/doc/imgs_results/PP-OCRv3/en/en_3.png b/doc/imgs_results/PP-OCRv3/en/en_3.png new file mode 100644 index 0000000000000000000000000000000000000000..baf146c0102505a308656d92fdd89d5b1333ccb1 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/en/en_3.png differ diff --git a/doc/imgs_results/PP-OCRv3/en/en_4.png b/doc/imgs_results/PP-OCRv3/en/en_4.png new file mode 100644 index 0000000000000000000000000000000000000000..f0f19db95b7917dc884bdb7d2c2f98b9e74c22e1 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/en/en_4.png differ diff --git a/doc/imgs_results/PP-OCRv3/multi_lang/japan_2.jpg b/doc/imgs_results/PP-OCRv3/multi_lang/japan_2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..076ced92ad62b7e30b62a389a1849e1709dba87e Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/multi_lang/japan_2.jpg differ diff --git a/doc/imgs_results/PP-OCRv3/multi_lang/korean_1.jpg b/doc/imgs_results/PP-OCRv3/multi_lang/korean_1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f93de40e18fb3bff9d2379c5c61464a85ac3f344 Binary files /dev/null and b/doc/imgs_results/PP-OCRv3/multi_lang/korean_1.jpg differ diff --git a/doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg b/doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg deleted file mode 100644 index b3d645779428bcce8c120976ef66bef10deee0c5..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg and /dev/null differ diff --git a/doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg b/doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg 
deleted file mode 100644 index 7dba7708be61d912d574b610fc0b04cfa4e5feea..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg and /dev/null differ diff --git a/doc/imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg b/doc/imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg deleted file mode 100644 index 2168ecd1f0acb75d7ecc9c15202f342d18111495..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg and /dev/null differ diff --git a/doc/imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg b/doc/imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg deleted file mode 100644 index f1acbf0f94a1febbbf0d780ed019723b3dd78fa9..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg and /dev/null differ diff --git a/doc/imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg b/doc/imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg deleted file mode 100644 index 59d9a5632d3054dbf8cc6bdb021ebef224c890a8..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg and /dev/null differ diff --git a/doc/ppocr_v3/GTC_en.png b/doc/ppocr_v3/GTC_en.png new file mode 100644 index 0000000000000000000000000000000000000000..a1a7fc52505f3f7f84f484fb1ee07d462e9e0648 Binary files /dev/null and b/doc/ppocr_v3/GTC_en.png differ diff --git a/doc/ppocr_v3/LCNet_SVTR_en.png b/doc/ppocr_v3/LCNet_SVTR_en.png new file mode 100644 index 0000000000000000000000000000000000000000..7890448470957cc7866a0b4e2cd09c36a788e213 Binary files /dev/null and b/doc/ppocr_v3/LCNet_SVTR_en.png differ diff --git a/paddleocr.py b/paddleocr.py index f7871db6470c75db82e8251dff5361c099c4adda..a1265f79def7018a5586be954127e5b7fdba011e 100644 --- a/paddleocr.py +++ b/paddleocr.py @@ -47,8 +47,8 @@ __all__ = [ ] SUPPORT_DET_MODEL = ['DB'] -VERSION = '2.5.0.1' -SUPPORT_REC_MODEL = ['CRNN'] +VERSION = '2.5.0.3' +SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet'] BASE_DIR = os.path.expanduser("~/.paddleocr/") DEFAULT_OCR_MODEL_VERSION = 'PP-OCRv3' diff --git a/ppstructure/README.md b/ppstructure/README.md index 0febf233d883e59e4377777e5b96e354853e2f33..72670e33575ebe444c78b15fbab4e330389a7498 100644 --- a/ppstructure/README.md +++ b/ppstructure/README.md @@ -40,7 +40,7 @@ The main features of PP-Structure are as follows: ### 4.1 Layout analysis and table recognition - + The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of image, text, title and table by layout analysis, and then OCR detection and recognition is performed on the three areas of image, text and title, and the table is performed table recognition, where the image will also be stored for use. @@ -48,7 +48,7 @@ The figure shows the pipeline of layout analysis + table recognition. The image * SER * -![](../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../doc/vqa/result_ser/zh_val_42_ser.jpg) +![](docs/vqa/result_ser/zh_val_0_ser.jpg) | ![](docs/vqa/result_ser/zh_val_42_ser.jpg) ---|--- Different colored boxes in the figure represent different categories. 
For xfun dataset, there are three categories: query, answer and header: @@ -62,7 +62,7 @@ The corresponding category and OCR recognition results are also marked at the to * RE -![](../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../doc/vqa/result_re/zh_val_40_re.jpg) +![](docs/vqa/result_re/zh_val_21_re.jpg) | ![](docs/vqa/result_re/zh_val_40_re.jpg) ---|--- @@ -76,7 +76,7 @@ Start from [Quick Installation](./docs/quickstart.md) ### 6.1 Layout analysis and table recognition -![pipeline](../doc/table/pipeline.jpg) +![pipeline](docs/table/pipeline.jpg) In PP-Structure, the image will be divided into 5 types of areas **text, title, image list and table**. For the first 4 types of areas, directly use PP-OCR system to complete the text detection and recognition. For the table area, after the table structuring process, the table in image is converted into an Excel file with the same table style. diff --git a/ppstructure/docs/table/recovery.jpg b/ppstructure/docs/table/recovery.jpg new file mode 100644 index 0000000000000000000000000000000000000000..bee2e2fb3499ec4b348e2b2f1475a87c9c562190 Binary files /dev/null and b/ppstructure/docs/table/recovery.jpg differ diff --git a/ppstructure/predict_system.py b/ppstructure/predict_system.py index 7f18fcdf8e6b57be6e129f3271f5bb583f4da616..b0ede5f3a1b88df6efed53d7ca33a696bc7a7fff 100644 --- a/ppstructure/predict_system.py +++ b/ppstructure/predict_system.py @@ -23,6 +23,7 @@ sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) os.environ["FLAGS_allocator_strategy"] = 'auto_growth' import cv2 import json +import numpy as np import time import logging from copy import deepcopy @@ -33,6 +34,7 @@ from ppocr.utils.logging import get_logger from tools.infer.predict_system import TextSystem from ppstructure.table.predict_table import TableSystem, to_excel from ppstructure.utility import parse_args, draw_structure_result +from ppstructure.recovery.docx import convert_info_docx logger = get_logger() @@ -104,7 +106,12 @@ class StructureSystem(object): return_ocr_result_in_table) else: if self.text_system is not None: - filter_boxes, filter_rec_res = self.text_system(roi_img) + if args.recovery: + wht_im = np.ones(ori_im.shape, dtype=ori_im.dtype) + wht_im[y1:y2, x1:x2, :] = roi_img + filter_boxes, filter_rec_res = self.text_system(wht_im) + else: + filter_boxes, filter_rec_res = self.text_system(roi_img) # remove style char style_token = [ '', '', '', '', '', @@ -118,7 +125,8 @@ class StructureSystem(object): for token in style_token: if token in rec_str: rec_str = rec_str.replace(token, '') - box += [x1, y1] + if not args.recovery: + box += [x1, y1] res.append({ 'text': rec_str, 'confidence': float(rec_conf), @@ -192,6 +200,8 @@ def main(args): # img_save_path = os.path.join(save_folder, img_name + '.jpg') cv2.imwrite(img_save_path, draw_img) logger.info('result save to {}'.format(img_save_path)) + if args.recovery: + convert_info_docx(img, res, save_folder, img_name) elapse = time.time() - starttime logger.info("Predict time : {:.3f}s".format(elapse)) diff --git a/ppstructure/recovery/README.md b/ppstructure/recovery/README.md new file mode 100644 index 0000000000000000000000000000000000000000..883dbef3e829dfa213644b610af1ca279dac8641 --- /dev/null +++ b/ppstructure/recovery/README.md @@ -0,0 +1,86 @@ +English | [简体中文](README_ch.md) + +- [Getting Started](#getting-started) + - [1. Introduction](#1) + - [2. Install](#2) + - [2.1 Installation dependencies](#2.1) + - [2.2 Install PaddleOCR](#2.2) + - [3. Quick Start](#3) + + + +## 1. 
Introduction
+
+Layout recovery means that after OCR, the content is still arranged as in the original document image: paragraphs are output to a Word document in the same order.
+
+Layout recovery combines [layout analysis](../layout/README.md) and [table recognition](../table/README.md) to better recover images, tables, titles, etc.
+The following figure shows the result:
+
+ +
+
+
+## 2. Install
+
+
+
+### 2.1 Install dependencies
+
+- **(1) Install PaddlePaddle**
+
+```bash
+python3 -m pip install --upgrade pip
+
+# GPU installation
+python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+# CPU installation
+python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+```
+
+For more requirements, please refer to the instructions in the [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick).
+
+
+
+### 2.2 Install PaddleOCR
+
+- **(1) Download source code**
+
+```bash
+# [Recommended]
+git clone https://github.com/PaddlePaddle/PaddleOCR
+
+# If the pull fails because of network problems, you can also use the Gitee mirror:
+git clone https://gitee.com/paddlepaddle/PaddleOCR
+
+# Note: The Gitee mirror may lag behind this GitHub project by 3 to 5 days; prefer the recommended method.
+```
+
+- **(2) Install recovery's `requirements`**
+
+```bash
+python3 -m pip install -r ppstructure/recovery/requirements.txt
+```
+
+
+
+## 3. Quick Start
+
+```bash
+cd PaddleOCR/ppstructure
+
+# download models
+mkdir inference && cd inference
+# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
+# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
+# Download the ultra-lightweight English table structure model and unzip it
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
+cd ..
+# run
+python3 predict_system.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_PP-OCRv3_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --rec_char_dict_path=../ppocr/utils/en_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output ./output/table --rec_image_shape=3,48,320 --vis_font_path=../doc/fonts/simfang.ttf --recovery=True --image_dir=./docs/table/1.png
+```
+
+After running, the docx file for each image will be saved in the directory specified by `--output`.
\ No newline at end of file
diff --git a/ppstructure/recovery/README_ch.md b/ppstructure/recovery/README_ch.md
new file mode 100644
index 0000000000000000000000000000000000000000..1f72f8de8a5e2eb51c8c4f58df30465f5361a301
--- /dev/null
+++ b/ppstructure/recovery/README_ch.md
@@ -0,0 +1,91 @@
+[English](README.md) | 简体中文
+
+# 版面恢复使用说明
+
+- [1. 简介](#1)
+- [2. 安装](#2)
+  - [2.1 安装依赖](#2.1)
+  - [2.2 安装PaddleOCR](#2.2)
+
+- [3. 使用](#3)
+
+
+
+## 1. 简介
+
+版面恢复就是在OCR识别后,内容仍然像原文档图片那样排列:段落不变、顺序不变地输出到Word文档中。
+
+版面恢复结合了[版面分析](../layout/README_ch.md)、[表格识别](../table/README_ch.md)技术,从而更好地恢复图片、表格、标题等内容,下图展示了版面恢复的结果:
+
+
+ +
+
+
+## 2. 安装
+
+
+
+### 2.1 安装依赖
+
+- **(1) 安装PaddlePaddle**
+
+```bash
+python3 -m pip install --upgrade pip
+
+# GPU安装
+python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+# CPU安装
+python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+```
+
+更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+
+### 2.2 安装PaddleOCR
+
+- **(1)下载版面恢复源码**
+
+```bash
+# 【推荐】
+git clone https://github.com/PaddlePaddle/PaddleOCR
+
+# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
+git clone https://gitee.com/paddlepaddle/PaddleOCR
+
+# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
+```
+
+- **(2)安装recovery的`requirements`**
+
+```bash
+python3 -m pip install -r ppstructure/recovery/requirements.txt
+```
+
+
+
+## 3. 使用
+
+恢复给定文档的版面:
+
+```bash
+cd PaddleOCR/ppstructure
+
+# 下载模型
+mkdir inference && cd inference
+# 下载超轻量级英文PP-OCRv3检测模型并解压
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
+# 下载超轻量级英文PP-OCRv3识别模型并解压
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
+# 下载超轻量级英文表格识别模型并解压
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
+cd ..
+# 执行预测
+python3 predict_system.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_PP-OCRv3_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --rec_char_dict_path=../ppocr/utils/en_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output ./output/table --rec_image_shape=3,48,320 --vis_font_path=../doc/fonts/simfang.ttf --recovery=True --image_dir=./docs/table/1.png
+```
+
+运行完成后,每张图片对应的docx文档会保存到`--output`指定的目录下。
+
diff --git a/ppstructure/recovery/docx.py b/ppstructure/recovery/docx.py
new file mode 100644
index 0000000000000000000000000000000000000000..5278217d5b983008d357b6b1be3ab1b883a4939d
--- /dev/null
+++ b/ppstructure/recovery/docx.py
@@ -0,0 +1,160 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+ +import cv2 +import os +import pypandoc +from copy import deepcopy + +from docx import Document +from docx import shared +from docx.enum.text import WD_ALIGN_PARAGRAPH +from docx.enum.section import WD_SECTION +from docx.oxml.ns import qn + +from ppocr.utils.logging import get_logger +logger = get_logger() + + +def convert_info_docx(img, res, save_folder, img_name): + doc = Document() + doc.styles['Normal'].font.name = 'Times New Roman' + doc.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), u'宋体') + doc.styles['Normal'].font.size = shared.Pt(6.5) + h, w, _ = img.shape + + res = sorted_layout_boxes(res, w) + flag = 1 + for i, region in enumerate(res): + if flag == 2 and region['layout'] == 'single': + section = doc.add_section(WD_SECTION.CONTINUOUS) + section._sectPr.xpath('./w:cols')[0].set(qn('w:num'), '1') + flag = 1 + elif flag == 1 and region['layout'] == 'double': + section = doc.add_section(WD_SECTION.CONTINUOUS) + section._sectPr.xpath('./w:cols')[0].set(qn('w:num'), '2') + flag = 2 + + if region['type'] == 'Figure': + excel_save_folder = os.path.join(save_folder, img_name) + img_path = os.path.join(excel_save_folder, + '{}.jpg'.format(region['bbox'])) + paragraph_pic = doc.add_paragraph() + paragraph_pic.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = paragraph_pic.add_run("") + if flag == 1: + run.add_picture(img_path, width=shared.Inches(5)) + elif flag == 2: + run.add_picture(img_path, width=shared.Inches(2)) + elif region['type'] == 'Title': + doc.add_heading(region['res'][0]['text']) + elif region['type'] == 'Text': + paragraph = doc.add_paragraph() + paragraph_format = paragraph.paragraph_format + for i, line in enumerate(region['res']): + if i == 0: + paragraph_format.first_line_indent = shared.Inches(0.25) + text_run = paragraph.add_run(line['text'] + ' ') + text_run.font.size = shared.Pt(9) + elif region['type'] == 'Table': + pypandoc.convert( + source=region['res']['html'], + format='html', + to='docx', + outputfile='tmp.docx') + tmp_doc = Document('tmp.docx') + paragraph = doc.add_paragraph() + + table = tmp_doc.tables[0] + new_table = deepcopy(table) + new_table.style = doc.styles['Table Grid'] + from docx.enum.table import WD_TABLE_ALIGNMENT + new_table.alignment = WD_TABLE_ALIGNMENT.CENTER + paragraph.add_run().element.addnext(new_table._tbl) + os.remove('tmp.docx') + else: + continue + + # save to docx + docx_path = os.path.join(save_folder, '{}.docx'.format(img_name)) + doc.save(docx_path) + logger.info('docx save to {}'.format(docx_path)) + + +def sorted_layout_boxes(res, w): + """ + Sort text boxes in order from top to bottom, left to right + args: + res(list):ppstructure results + return: + sorted results(list) + """ + num_boxes = len(res) + if num_boxes == 1: + res[0]['layout'] = 'single' + return res + + sorted_boxes = sorted(res, key=lambda x: (x['bbox'][1], x['bbox'][0])) + _boxes = list(sorted_boxes) + + new_res = [] + res_left = [] + res_right = [] + i = 0 + + while True: + if i >= num_boxes: + break + if i == num_boxes - 1: + if _boxes[i]['bbox'][1] > _boxes[i - 1]['bbox'][3] and _boxes[i][ + 'bbox'][0] < w / 2 and _boxes[i]['bbox'][2] > w / 2: + new_res += res_left + new_res += res_right + _boxes[i]['layout'] = 'single' + new_res.append(_boxes[i]) + else: + if _boxes[i]['bbox'][2] > w / 2: + _boxes[i]['layout'] = 'double' + res_right.append(_boxes[i]) + new_res += res_left + new_res += res_right + elif _boxes[i]['bbox'][0] < w / 2: + _boxes[i]['layout'] = 'double' + res_left.append(_boxes[i]) + new_res += res_left + new_res += res_right + 
diff --git a/ppstructure/recovery/requirements.txt b/ppstructure/recovery/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..04187baa2a72d2ac60f0a4e5ce643f882b7255fb
--- /dev/null
+++ b/ppstructure/recovery/requirements.txt
@@ -0,0 +1,3 @@
+opencv-contrib-python==4.4.0.46
+pypandoc
+python-docx
diff --git a/ppstructure/utility.py b/ppstructure/utility.py
index 938c12f951730ed1b81186608dd10efb383e8cfc..1ad902e7e6be95a6901e3774420fad337f594861 100644
--- a/ppstructure/utility.py
+++ b/ppstructure/utility.py
@@ -61,6 +61,11 @@ def init_args():
         type=str2bool,
         default=True,
         help='In the forward, whether the non-table area is recognition by ocr')
+    parser.add_argument(
+        "--recovery",
+        type=str2bool,
+        default=False,
+        help='Whether to enable layout recovery')
     return parser
diff --git a/tools/infer/predict_rec.py b/tools/infer/predict_rec.py
index 2abc0220937175f95ee4c1e4b0b949d24d5fa3e8..3664ef2caf4b888d6a3918202256c99cc54c5eb1 100755
--- a/tools/infer/predict_rec.py
+++ b/tools/infer/predict_rec.py
@@ -131,7 +131,7 @@ class TextRecognizer(object):
         padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
         padding_im[:, :, 0:resized_w] = resized_image
         return padding_im
-        
+
     def resize_norm_img_svtr(self, img, image_shape):
 
         imgC, imgH, imgW = image_shape
@@ -274,7 +274,7 @@ class TextRecognizer(object):
                 wh_ratio = w * 1.0 / h
                 max_wh_ratio = max(max_wh_ratio, wh_ratio)
             for ino in range(beg_img_no, end_img_no):
-                
+
                 if self.rec_algorithm == "SAR":
                     norm_img, _, _, valid_ratio = self.resize_norm_img_sar(
                         img_list[indices[ino]], self.rec_image_shape)
@@ -296,8 +296,8 @@ class TextRecognizer(object):
                     gsrm_slf_attn_bias2_list.append(norm_img[4])
                     norm_img_batch.append(norm_img[0])
                 elif self.rec_algorithm == "SVTR":
-                    norm_img = self.resize_norm_img_svtr(
-                        img_list[indices[ino]], self.rec_image_shape)
+                    norm_img = self.resize_norm_img_svtr(img_list[indices[ino]],
+                                                         self.rec_image_shape)
                     norm_img = norm_img[np.newaxis, :]
                     norm_img_batch.append(norm_img)
                 else:
@@ -405,9 +405,13 @@ def main(args):
     valid_image_file_list = []
     img_list = []
 
+    logger.info(
+        "In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
+        "if you are using a recognition model trained with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320'"
+    )
     # warmup 2 times
     if args.warmup:
-        img = np.random.uniform(0, 255, [32, 320, 3]).astype(np.uint8)
+        img = np.random.uniform(0, 255, [48, 320, 3]).astype(np.uint8)
         for i in range(2):
             res = text_recognizer([img] * int(args.rec_batch_num))
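One detail on the new `--recovery` flag in `ppstructure/utility.py` above: it is declared with `str2bool`, matching the file's other switches, rather than plain `type=bool`, because argparse hands the raw string to `bool()` and any non-empty string, including `"False"`, is truthy. A self-contained sketch of the difference, with `str2bool` reimplemented along the lines of the helper in `tools/infer/utility.py`:

```python
import argparse

def str2bool(v):
    # Same spirit as the helper defined in tools/infer/utility.py.
    return str(v).lower() in ("true", "t", "1")

parser = argparse.ArgumentParser()
parser.add_argument("--plain", type=bool, default=False)
parser.add_argument("--robust", type=str2bool, default=False)

args = parser.parse_args(["--plain=False", "--robust=False"])
print(args.plain)   # True  -- bool("False") is truthy
print(args.robust)  # False -- parsed as intended
```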
diff --git a/tools/infer/predict_system.py b/tools/infer/predict_system.py
index 534f08fbcb3b90eea6a371d9f0ad128e276874c4..625d365f45c578d051974d7174e26246e9bc2442 100755
--- a/tools/infer/predict_system.py
+++ b/tools/infer/predict_system.py
@@ -133,6 +133,9 @@ def main(args):
     os.makedirs(draw_img_save_dir, exist_ok=True)
     save_results = []
 
+    logger.info("In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
+                "if you are using a recognition model trained with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320'")
+
     # warm up 10 times
     if args.warmup:
         img = np.random.uniform(0, 255, [640, 640, 3]).astype(np.uint8)
diff --git a/tools/infer/utility.py b/tools/infer/utility.py
index ce4e2d92c2851fc7c7ae2d2a371c755c82dc97e5..74ec42ec842abe0f214f13eea6b30a613cfc517b 100644
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -79,9 +79,9 @@ def init_args():
     parser.add_argument("--det_fce_box_type", type=str, default='poly')
 
     # params for text recognizer
-    parser.add_argument("--rec_algorithm", type=str, default='CRNN')
+    parser.add_argument("--rec_algorithm", type=str, default='SVTR_LCNet')
     parser.add_argument("--rec_model_dir", type=str)
-    parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
+    parser.add_argument("--rec_image_shape", type=str, default="3, 48, 320")
     parser.add_argument("--rec_batch_num", type=int, default=6)
     parser.add_argument("--max_text_length", type=int, default=25)
     parser.add_argument(
@@ -269,11 +269,11 @@ def create_predictor(args, mode, logger):
                     max_input_shape.update(max_pact_shape)
                     opt_input_shape.update(opt_pact_shape)
             elif mode == "rec":
-                if args.rec_algorithm != "CRNN":
+                if args.rec_algorithm not in ["CRNN", "SVTR_LCNet"]:
                     use_dynamic_shape = False
                 imgH = int(args.rec_image_shape.split(',')[-2])
                 min_input_shape = {"x": [1, 3, imgH, 10]}
-                max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 1536]}
+                max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 2304]}
                 opt_input_shape = {"x": [args.rec_batch_num, 3, imgH, 320]}
             elif mode == "cls":
                 min_input_shape = {"x": [1, 3, 48, 10]}
@@ -320,7 +320,7 @@ def create_predictor(args, mode, logger):
 def get_output_tensors(args, mode, predictor):
     output_names = predictor.get_output_names()
     output_tensors = []
-    if mode == "rec" and args.rec_algorithm == "CRNN":
+    if mode == "rec" and args.rec_algorithm in ["CRNN", "SVTR_LCNet"]:
         output_name = 'softmax_0.tmp_0'
         if output_name in output_names:
             return [predictor.get_output_handle(output_name)]
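Finally, the widened TensorRT dynamic-shape bound in `tools/infer/utility.py` tracks the taller default input: with the old `3, 32, 320` shape, the width ceiling of 1536 allowed aspect ratios up to 48, and 2304 preserves that same ceiling at the new height of 48. This "same maximum aspect ratio" framing is my reading of the change rather than something the patch states; the arithmetic itself is easy to check:

```python
old_h, old_max_w = 32, 1536
new_h = 48

max_wh_ratio = old_max_w // old_h  # 48: widest supported line relative to height
print(new_h * max_wh_ratio)        # 2304, matching the new max_input_shape width
```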