Commit aa59fca5 authored by qq_25193841

Merge remote-tracking branch 'origin/dygraph' into dygraph

@@ -3,10 +3,6 @@ English | [简体中文](README_ch.md)
<p align="center">
 <img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
<p align="center">
------------------------------------------------------------------------------------------
<p align="left">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
@@ -32,60 +28,42 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
- [more](./doc/doc_en/update_en.md)
## Features
- PP-OCR - A series of high-quality pre-trained models, comparable to commercial products
- Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-lingual recognition: about 80 languages like Korean, Japanese, German, French, etc
- PP-Structure: a document structure analysis system
- Support layout analysis and table recognition (support export to Excel)
- Support key information extraction
- Support DocVQA
- Rich OCR toolkit
- Semi-automatic data annotation tool, i.e., PPOCRLabel: supports fast and efficient data annotation
- Data synthesis tool, i.e., Style-Text: easily synthesize a large number of images which are similar to the target scene image
- Supports user-defined training and provides rich predictive inference deployment solutions
- Support PIP installation, easy to use
- Support Linux, Windows, MacOS and other systems
## Visualization
<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/multi_lang/img_01.jpg" width="800">
<img src="doc/imgs_results/multi_lang/img_02.jpg" width="800">
</div>
The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).

PaddleOCR supports a variety of cutting-edge OCR-related algorithms, and on this basis develops the industrial featured models/solutions [PP-OCR](./doc/doc_en/ppocr_introduction_en.md) and [PP-Structure](./ppstructure/README.md), covering the whole pipeline of data production, model training, compression, inference and deployment.
![](./doc/features_en.png)

<a name="Community"></a>
## Community
- Scan the QR code below with your Wechat, you can join the official technical discussion group. Looking forward to your participation.
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div>

> It is recommended to start with the “quick experience” in the documentation tutorials.

## Quick Experience
- Web online experience for the ultra-lightweight OCR: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
- Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
- One line of code quick use: [Quick Start](./doc/doc_en/quickstart_en.md)
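As a minimal sketch of the "one line of code" usage (assuming `paddleocr` has been installed with `pip install paddleocr` and that the sample image `doc/imgs_en/img_10.jpg` from this repository is available; the exact output nesting may vary slightly across paddleocr versions):

```python
from paddleocr import PaddleOCR

# The first call downloads the ultra-lightweight PP-OCR models automatically
ocr = PaddleOCR(use_angle_cls=True, lang="en")

# Detection + direction classification + recognition on a single image
result = ocr.ocr("doc/imgs_en/img_10.jpg", cls=True)
for line in result:
    print(line)  # [bounding box points, (recognized text, confidence)]
```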
<a name="book"></a>
## E-book: *Dive Into OCR*
- [Dive Into OCR 📚](./doc/doc_en/ocr_book_en.md)
<a name="Community"></a>
## Community
- **Join us**👬: Scan the QR code below with Wechat to join the official technical discussion group. We look forward to your participation.
- **Contribution**🏅️: [Contribution page](./doc/doc_en/thirdparty.md) contains various tools and applications developed by community developers using PaddleOCR, as well as the functions, optimized documents and codes contributed to PaddleOCR. It is an official honor wall for community developers and a broadcasting station to help publicize high-quality projects.
- **Regular Season**🎁: The community regular season is a point competition for OCR developers, covering four types: documents, codes, models and applications. Awards are selected and awarded on a quarterly basis. Please refer to the [link](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for more details.
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="Supported-Chinese-model-list"></a>

## PP-OCR Series Model List (Update on September 8th)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
@@ -95,72 +73,67 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |
- For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./doc/doc_en/models_list_en.md).
- For a new language request, please refer to [Guideline for new language_requests](#language_requests).
- For structural document analysis models, please refer to [PP-Structure models](./ppstructure/docs/models_list_en.md).
## Tutorials

- [Environment Preparation](./doc/doc_en/environment_en.md)
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [PP-OCR 🔥](./doc/doc_en/ppocr_introduction_en.md)
    - [Quick Start](./doc/doc_en/quickstart_en.md)
    - [Model Zoo](./doc/doc_en/models_en.md)
    - [Model training](./doc/doc_en/training_en.md)
        - [Text Detection](./doc/doc_en/detection_en.md)
        - [Text Recognition](./doc/doc_en/recognition_en.md)
        - [Text Direction Classification](./doc/doc_en/angle_class_en.md)
        - [Yml Configuration](./doc/doc_en/config_en.md)
    - Model Compression
        - [Model Quantization](./deploy/slim/quantization/README_en.md)
        - [Model Pruning](./deploy/slim/prune/README_en.md)
        - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md)
    - [Inference and Deployment](./deploy/README.md)
        - [Python Inference](./doc/doc_en/inference_ppocr_en.md)
        - [C++ Inference](./deploy/cpp_infer/readme.md)
        - [Serving](./deploy/pdserving/README.md)
        - [Mobile](./deploy/lite/readme.md)
        - [Paddle2ONNX](./deploy/paddle2onnx/readme.md)
        - [Benchmark](./doc/doc_en/benchmark_en.md)
- [PP-Structure 🔥](./ppstructure/README.md)
    - [Quick Start](./ppstructure/docs/quickstart_en.md)
    - [Model Zoo](./ppstructure/docs/models_list_en.md)
    - [Model training](./doc/doc_en/training_en.md)
        - [Layout Parser](./ppstructure/layout/README.md)
        - [Table Recognition](./ppstructure/table/README.md)
        - [Key Information Extraction](./ppstructure/docs/kie_en.md)
        - [DocVQA](./ppstructure/vqa/README.md)
    - [Inference and Deployment](./deploy/README.md)
        - [Python Inference](./ppstructure/docs/inference_en.md)
        - [C++ Inference]()
        - [Serving](./deploy/pdserving/README.md)
- [Academic algorithms](./doc/doc_en/algorithms_en.md)
    - [Text detection](./doc/doc_en/algorithm_overview_en.md)
    - [Text recognition](./doc/doc_en/algorithm_overview_en.md)
    - [End-to-end](./doc/doc_en/algorithm_overview_en.md)
    - [PGNet Algorithm](./doc/doc_en/pgnet_en.md)
    - [Python Inference](./doc/doc_en/inference_en.md)
    - [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
- Data Annotation and Synthesis
    - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
    - [Data Synthesis Tool: Style-Text](./StyleText/README.md)
    - [Other Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
    - [Other Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
- Datasets
    - [General OCR Datasets (Chinese/English)](doc/doc_en/dataset/datasets_en.md)
    - [Handwritten OCR Datasets (Chinese)](doc/doc_en/dataset/handwritten_datasets_en.md)
    - [Various OCR Datasets (multilingual)](doc/doc_en/dataset/vertical_and_multilingual_datasets_en.md)
- [Code Structure](./doc/doc_en/tree_en.md)
- [Visualization](#Visualization)
- [Community](#Community)
- [New language requests](#language_requests)
- [FAQ](./doc/doc_en/FAQ_en.md)
- [References](./doc/doc_en/reference_en.md)
- [License](#LICENSE)
- [Contribution](#CONTRIBUTION)
<a name="PP-OCRv2"></a>
## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>
[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
<a name="Visualization"></a>
## Visualization [more](./doc/doc_en/visualization_en.md)

- Chinese OCR model
<div align="center">
@@ -198,19 +171,3 @@ More details, please refer to [Multilingual OCR Development Plan](https://github
<a name="LICENSE"></a>

## License

This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
<a name="CONTRIBUTION"></a>
## Contribution
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualization function, adding .gitignore, and discarding the manual setting of PYTHONPATH.
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets.
- Thanks to [authorfu](https://github.com/authorfu) and [xiadeye](https://github.com/xiadeye) for contributing the Android and iOS demos, respectively.
- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services.
- Thanks to [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., Java SpringBoot, to serve requests for the Hubserving deployment.
- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set.
- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deploying package with the Golang program language.
- Thanks to [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel.
@@ -17,11 +17,18 @@
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力开发者训练出更好的模型,并应用落地。
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
## 近期更新

- 2021.12.21《动手学OCR · 十讲》课程开讲,12月21日起每晚八点半线上授课![免费报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR,[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM,[文档](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa))。
- PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[课程回放](https://aistudio.baidu.com/aistudio/education/group/info/6758)
- 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
@@ -29,41 +36,35 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 特性

支持多种OCR相关前沿算法,在此基础上打造产业级特色模型[PP-OCR](./doc/doc_ch/ppocr_introduction.md)和[PP-Structure](./ppstructure/README_ch.md),并打通数据生产、模型训练、压缩、预测部署全流程。

![](./doc/features.png)

- PP-OCR系列高质量预训练模型,准确的识别效果
- 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
- 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用PP-OCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 支持中英文数字组合识别、竖排文本识别、长文本识别
- 支持多语言识别:韩语、日语、德语、法语等约80种语言
- PP-Structure文档结构化系统
- 支持版面分析与表格识别(含Excel导出)
- 支持关键信息提取任务
- 支持DocVQA任务
- 丰富易用的OCR相关工具组件
- 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注
- 数据合成工具Style-Text:批量合成大量与目标场景类似的图像
- 支持用户自定义训练,提供丰富的预测推理部署方案
- 支持PIP快速安装使用
- 可运行于Linux、Windows、MacOS等多种系统
> 上述内容的使用方法建议从文档教程中的快速开始体验
## 快速开始

- 在线网站体验:超轻量PP-OCR mobile模型体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
- 移动端demo体验:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统)
- 一行命令快速使用:[快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)

<a name="电子书"></a>

## 《动手学OCR》电子书

- [《动手学OCR》电子书📚](./doc/doc_ch/ocr_book.md)

<a name="开源社区"></a>

## 开源社区

- **加入社区**👬:微信扫描下方二维码加入官方交流群,与各行各业开发者充分交流,期待您的加入。
- **社区贡献**🏅️:[社区贡献](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙,也是帮助优质项目宣传的广播站。如果您的OCR项目未被收集在文档中,可根据文档说明与我们联系。
- **社区常规赛**🎁:社区常规赛是面向OCR开发者的积分赛事,覆盖文档、代码、模型和应用四大类型,以季度为单位评选并发放奖励,赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)。
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="模型下载"></a>

## PP-OCR系列模型列表(更新中)

@@ -74,73 +75,82 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
| 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用PP-OCR server模型(143.4M) | ch_ppocr_server_v2.0_xx | 服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure 系列模型下载](./ppstructure/docs/models_list.md)。
## 文档教程

- [运行环境准备](./doc/doc_ch/environment.md)
- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
- [PP-OCR文本检测识别🔥](./doc/doc_ch/ppocr_introduction.md)
    - [快速开始](./doc/doc_ch/quickstart.md)
    - [模型库](./doc/doc_ch/models_list.md)
    - [模型训练](./doc/doc_ch/training.md)
        - [文本检测](./doc/doc_ch/detection.md)
        - [文本识别](./doc/doc_ch/recognition.md)
        - [文本方向分类器](./doc/doc_ch/angle_class.md)
        - [配置文件内容与生成](./doc/doc_ch/config.md)
    - 模型压缩
        - [模型量化](./deploy/slim/quantization/README.md)
        - [模型裁剪](./deploy/slim/prune/README.md)
        - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
    - [推理部署](./deploy/README_ch.md)
        - [基于Python预测引擎推理](./doc/doc_ch/inference_ppocr.md)
        - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
        - [服务化部署](./deploy/pdserving/README_CN.md)
        - [端侧部署](./deploy/lite/readme.md)
        - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md)
        - [Benchmark](./doc/doc_ch/benchmark.md)
- [PP-Structure文档分析🔥](./ppstructure/README_ch.md)
    - [快速开始](./ppstructure/docs/quickstart.md)
    - [模型库](./ppstructure/docs/models_list.md)
    - [模型训练](./doc/doc_ch/training.md)
        - [版面分析](./ppstructure/layout/README_ch.md)
        - [表格识别](./ppstructure/table/README_ch.md)
        - [关键信息提取](./ppstructure/docs/kie.md)
        - [DocVQA](./ppstructure/vqa/README_ch.md)
    - [推理部署](./deploy/README_ch.md)
        - [基于Python预测引擎推理](./ppstructure/docs/inference.md)
        - [基于C++预测引擎推理]()
        - [服务化部署](./deploy/pdserving/README_CN.md)
- [前沿算法与模型🚀](./doc/doc_ch/algorithm.md)
    - [文本检测算法](./doc/doc_ch/algorithm_overview.md#11-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E7%AE%97%E6%B3%95)
    - [文本识别算法](./doc/doc_ch/algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
    - [端到端算法](./doc/doc_ch/algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
    - [端到端PGNet算法](./doc/doc_ch/pgnet.md)
    - [基于Python预测引擎推理](./doc/doc_ch/inference.md)
    - [使用PaddleOCR架构添加新算法](./doc/doc_ch/add_new_algorithm.md)
- [场景应用](./doc/doc_ch/application.md)
    - [金融场景(表单/票据等)]()
    - [工业场景(电表度数/车牌等)]()
    - [教育场景(手写体/公式等)]()
    - [医疗场景(化验单等)]()
- 数据标注与合成
    - [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md)
    - [数据合成工具Style-Text](./StyleText/README_ch.md)
    - [其它数据标注工具](./doc/doc_ch/data_annotation.md)
    - [其它数据合成工具](./doc/doc_ch/data_synthesis.md)
- 数据集
    - [通用中英文OCR数据集](doc/doc_ch/dataset/datasets.md)
    - [手写中文OCR数据集](doc/doc_ch/dataset/handwritten_datasets.md)
    - [垂类多语言OCR数据集](doc/doc_ch/dataset/vertical_and_multilingual_datasets.md)
    - [版面分析数据集](doc/doc_ch/dataset/layout_datasets.md)
    - [表格识别数据集](doc/doc_ch/dataset/table_datasets.md)
    - [DocVQA数据集](doc/doc_ch/dataset/docvqa_datasets.md)
- [代码组织结构](./doc/doc_ch/tree.md)
- [效果展示](#效果展示)
- [《动手学OCR》电子书📚](./doc/doc_ch/ocr_book.md)
- [开源社区](#开源社区)
- FAQ
    - [通用问题](./doc/doc_ch/FAQ.md)
    - [PaddleOCR实战问题](./doc/doc_ch/FAQ.md)
- [参考文献](./doc/doc_ch/reference.md)
- [许可证书](#许可证书)
<a name="PP-OCRv2"></a>
## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>
[1] PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
[2] PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./doc/doc_ch/enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)
<a name="效果展示"></a>

## 效果展示 [more](./doc/doc_ch/visualization.md)

<details open>
<summary>PP-OCRv2 中文模型</summary>

<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
@@ -151,16 +161,49 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>

</details>
<details open>
<summary>PP-OCRv2 英文模型</summary>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>

</details>
<details open>
<summary>PP-OCRv2 其他语言模型</summary>
<div align="center">
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-Structure 文档分析</summary>
- 版面分析+表格识别
<div align="center">
<img src="./ppstructure/docs/table/ppstructure.GIF" width="800">
</div>
- SER(语义实体识别)
<div align="center">
<img src="./ppstructure/docs/vqa/result_ser/zh_val_0_ser.jpg" width="800">
</div>
- RE(关系提取)
<div align="center">
<img src="./ppstructure/docs/vqa/result_re/zh_val_21_re.jpg" width="800">
</div>
</details>
<a name="许可证书"></a>

## 许可证书
......
# 1 项目说明
计算机视觉在金融领域的应用覆盖文字识别、图像识别、视频识别等,其中文字识别(OCR)是金融领域中的核心AI能力,其应用覆盖客户服务、风险防控、运营管理等各项业务,针对的对象包括通用卡证票据识别(银行卡、身份证、营业执照等)、通用文本表格识别(印刷体、多语言、手写体等)以及一些金融特色票据凭证。因此,如果能够在结构化信息提取时同时利用文字、页面布局等信息,便可增强不同版式下的泛化性。
表单识别旨在识别证件、房产证、营业执照、个人信息表、发票等各种具有表格性质的文档中的关键键值对(如:姓名-张三),其广泛应用于银行、证券、公司财务等领域,具有很高的商业价值。本次范例项目开源了全流程表单识别方案,能够在多个场景快速实现迁移。表单识别通常存在以下难点:
- 人工摘录工作效率低;
- 国内常见表单版式多;
- 传统技术方案泛化能力不足,效果难以满足需求。
表单识别包含两大阶段:OCR阶段和文档视觉问答阶段。
其中,OCR阶段选取了PaddleOCR的PP-OCRv2模型,主要由文本检测和文本识别两个模块组成。DOC-VQA文档视觉问答阶段基于PaddleNLP自然语言处理算法库实现的LayoutXLM模型,支持基于多模态方法的语义实体识别(Semantic Entity Recognition, SER)以及关系抽取(Relation Extraction, RE)任务。本案例流程如 **图1** 所示:
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e'></center>
<center>图1 多模态表单识别流程图</center>
注:欢迎到AIStudio领取免费算力体验线上实训,项目链接: [多模态表单识别](https://aistudio.baidu.com/aistudio/projectdetail/3815918)(配备Tesla V100、A100等高级算力资源)
# 2 安装说明
本项目已为大家打包好PaddleOCR源码(并修改好配置文件),无需再次下载,解压后安装依赖环境即可~
```python
! unzip -q PaddleOCR.zip
```
```python
# 如仍需安装或更新PaddleOCR,可以执行以下步骤
! git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
# ! git clone https://gitee.com/PaddlePaddle/PaddleOCR
```
```python
# 安装依赖包
! pip install -U pip
! pip install -r /home/aistudio/PaddleOCR/requirements.txt
! pip install paddleocr
! pip install yacs gnureadline paddlenlp==2.2.1
! pip install xlsxwriter
```
# 3 数据准备
这里使用[XFUND数据集](https://github.com/doc-analysis/XFUND)作为实验数据集。 XFUND数据集是微软提出的一个用于KIE任务的多语言数据集,共包含七个语言的子数据集,每个子数据集包含149张训练图片和50张验证图片,
分别为:ZH(中文)、JA(日语)、ES(西班牙)、FR(法语)、IT(意大利)、DE(德语)、PT(葡萄牙)
本次实验选取中文数据集作为我们的演示数据集。法语数据集作为实践课程的数据集,数据集样例图如 **图2** 所示。
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/0f84137778cd4ab6899c64109d452290e9c678ccf01744978bc9c0647adbba45" width="1000" ></center>
<center>图2 数据集样例,左中文,右法语</center>
## 3.1 下载处理好的数据集
处理好的XFUND中文数据集下载地址:[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar) ,可以运行如下指令完成中文数据集下载和解压。
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/31e3dbee31d441d2a36d45b5af660e832dfa2f437f4d49a1914312a15b6a29a7"></center>
<center>图3 下载数据集</center>
```python
! wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
! tar -xf XFUND.tar
# XFUN其他数据集使用下面的代码进行转换
# 代码链接:https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/helper/trans_xfun_data.py
# %cd PaddleOCR
# !python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
# %cd ../
```
运行上述指令后在 /home/aistudio/PaddleOCR/ppstructure/vqa/XFUND 目录下有2个文件夹,目录结构如下所示:
```bash
/home/aistudio/PaddleOCR/ppstructure/vqa/XFUND
└─ zh_train/ 训练集
├── image/ 图片存放文件夹
├── xfun_normalize_train.json 标注信息
└─ zh_val/ 验证集
├── image/ 图片存放文件夹
├── xfun_normalize_val.json 标注信息
```
该数据集的标注格式为
```bash
{
"height": 3508, # 图像高度
"width": 2480, # 图像宽度
"ocr_info": [
{
"text": "邮政地址:", # 单个文本内容
"label": "question", # 文本所属类别
"bbox": [261, 802, 483, 859], # 单个文本框
"id": 54, # 文本索引
"linking": [[54, 60]], # 当前文本和其他文本的关系 [question, answer]
"words": []
},
{
"text": "湖南省怀化市市辖区",
"label": "answer",
"bbox": [487, 810, 862, 859],
"id": 60,
"linking": [[54, 60]],
"words": []
}
]
}
```
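下面给出一个从单条标注中依据`linking`字段抽取问答对(question-answer pair)的最小示例(仅为示意,假设`ann`是上述格式的单条标注字典,标注文件的具体组织方式请以实际数据为准):

```python
import json

def extract_qa_pairs(ann):
    """从单条XFUND标注(dict)中依据linking字段抽取question-answer文本对。"""
    id2item = {item["id"]: item for item in ann["ocr_info"]}
    pairs = []
    for item in ann["ocr_info"]:
        if item["label"] != "question":
            continue
        for q_id, a_id in item["linking"]:
            if q_id == item["id"] and a_id in id2item:
                pairs.append((item["text"], id2item[a_id]["text"]))
    return pairs

# 假设标注文件每行形如 "图片名\t标注JSON"(仅为示意,请以实际文件为准):
# ann = json.loads(line.split("\t", 1)[1])
ann = {
    "height": 3508, "width": 2480,
    "ocr_info": [
        {"text": "邮政地址:", "label": "question", "bbox": [261, 802, 483, 859],
         "id": 54, "linking": [[54, 60]], "words": []},
        {"text": "湖南省怀化市市辖区", "label": "answer", "bbox": [487, 810, 862, 859],
         "id": 60, "linking": [[54, 60]], "words": []},
    ],
}
print(extract_qa_pairs(ann))  # [('邮政地址:', '湖南省怀化市市辖区')]
```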
## 3.2 转换为PaddleOCR检测和识别格式
使用XFUND训练PaddleOCR检测和识别模型,需要将数据集格式改为训练需求的格式。
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/9a709f19e7174725a8cfb09fd922ade74f8e9eb73ae1438596cbb2facef9c24a"></center>
<center>图4 转换为OCR格式</center>
- **文本检测** 标注文件格式如下,中间用'\t'分隔:
" 图像文件名 json.dumps编码的图像标注信息"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 `transcription` 表示当前文本框的文字,***当其内容为“###”时,表示该文本框无效,在训练时会跳过。***
- **文本识别** 标注文件的格式如下, txt文件中默认请将图片路径和图片标签用'\t'分割,如用其他方式分割将造成训练报错。
```
" 图像文件名 图像标注信息 "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
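作为示意,下面给出一个把上述XFUND标注转换为PaddleOCR检测标注格式的简化示例(实际转换请以项目提供的转换脚本为准,函数名仅为本示例自拟):

```python
import json

def xfund_to_det_label(image_name, ann):
    """把单条XFUND标注转换为一行PaddleOCR检测标注:图片名\tjson.dumps(框列表)。"""
    boxes = []
    for item in ann["ocr_info"]:
        x1, y1, x2, y2 = item["bbox"]
        boxes.append({
            "transcription": item["text"],
            # 左上角起、顺时针排列的四个点
            "points": [[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
        })
    return image_name + "\t" + json.dumps(boxes, ensure_ascii=False)

# 识别标注则需要先按bbox裁剪出文字小图,再写成 "图片路径\t文本" 的形式(此处从略)
```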
```python
! unzip -q /home/aistudio/data/data140302/XFUND_ori.zip -d /home/aistudio/data/data140302/
```
已经提供转换脚本,执行如下代码即可转换成功:
```python
%cd /home/aistudio/
! python trans_xfund_data.py
```
# 4 OCR
选用飞桨OCR开发套件[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README_ch.md)中的PP-OCRv2模型进行文本检测和识别。PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/enhanced_ctc_loss.md)损失函数改进,进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)
## 4.1 文本检测
我们使用2种方案进行训练、评估:
- **PP-OCRv2中英文超轻量检测预训练模型**
- **XFUND数据集+fine-tune**
### **4.1.1 方案1:预训练模型**
**1)下载预训练模型**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/2aff41ee8fce4e9bac8295cc00720217bde2aeee7ee7473689848bed0b6fde05"></center>
<center>图5 文本检测方案1-下载预训练模型</center>
PaddleOCR已经提供了PP-OCR系列模型,部分模型展示如下表所示:
| 模型简介 | 模型名称 | 推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| ------------------------------------- | ----------------------- | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| 中英文超轻量PP-OCRv2模型(13.0M) | ch_PP-OCRv2_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
| 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用PP-OCR server模型(143.4M) | ch_ppocr_server_v2.0_xx | 服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md)
这里我们使用PP-OCRv2中英文超轻量检测模型,下载并解压预训练模型:
```python
%cd /home/aistudio/PaddleOCR/pretrain/
! wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar
! tar -xf ch_PP-OCRv2_det_distill_train.tar && rm -rf ch_PP-OCRv2_det_distill_train.tar
%cd ..
```
**2)模型评估**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/75b0e977dfb74a83851f8828460759f337b1b7a0c33c47a08a30f3570e1e2e74"></center>
<center>图6 文本检测方案1-模型评估</center>
接着使用下载的超轻量检测模型在XFUND验证集上进行评估。由于蒸馏模型包含多个子网络(甚至多个Student网络),计算指标时只需计算其中一个Student网络的指标即可,将`key`字段设置为`Student`即表示只计算Student网络的精度。
```
Metric:
name: DistillationMetric
base_metric_name: DetMetric
main_indicator: hmean
key: "Student"
```
首先修改配置文件`configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml`中的以下字段:
```
Eval.dataset.data_dir:指向验证集图片存放目录
Eval.dataset.label_file_list:指向验证集标注文件
```
然后在XFUND验证集上进行评估,具体代码如下:
```python
%cd /home/aistudio/PaddleOCR
! python tools/eval.py \
-c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml \
-o Global.checkpoints="./pretrain/ch_PP-OCRv2_det_distill_train/best_accuracy"
```
使用预训练模型进行评估,指标如下所示:
| 方案 | hmeans |
| -------- | -------- |
| PP-OCRv2中英文超轻量检测预训练模型 | 77.26% |
使用文本检测预训练模型在XFUND验证集上评估,达到77%左右,充分说明ppocr提供的预训练模型有一定的泛化能力。
### **4.1.2 方案2:XFUND数据集+fine-tune**
PaddleOCR提供的蒸馏预训练模型包含了多个模型的参数,我们提取Student模型的参数,在XFUND数据集上进行finetune,可以参考如下代码:
```python
import paddle
# 加载预训练模型
all_params = paddle.load("pretrain/ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
# 查看权重参数的keys
# print(all_params.keys())
# 学生模型的权重提取
s_params = {key[len("student_model."):]: all_params[key] for key in all_params if "student_model." in key}
# 查看学生模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "pretrain/ch_PP-OCRv2_det_distill_train/student.pdparams")
```
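保存得到的`student.pdparams`即为下一步finetune使用的预训练权重:在下文修改配置文件时,将`Global.pretrained_model`指向`pretrain/ch_PP-OCRv2_det_distill_train/student`即可(路径一般不带`.pdparams`后缀)。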
**1)模型训练**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/560c44b8dd604da7987bd25da0a882156ffcfb7f6bcb44108fe9bde77512e572"></center>
<center>图7 文本检测方案2-模型训练</center>
修改配置文件`configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml`中的以下字段:
```
Global.pretrained_model:指向预训练模型路径
Train.dataset.data_dir:指向训练集图片存放目录
Train.dataset.label_file_list:指向训练集标注文件
Eval.dataset.data_dir:指向验证集图片存放目录
Eval.dataset.label_file_list:指向验证集标注文件
Optimizer.lr.learning_rate:调整学习率,本实验设置为0.005
Train.dataset.transforms.EastRandomCropData.size:训练尺寸改为[1600, 1600]
Eval.dataset.transforms.DetResizeForTest:评估尺寸,添加如下参数
limit_side_len: 1600
limit_type: 'min'
```
执行下面命令启动训练:
```python
! CUDA_VISIBLE_DEVICES=0 python tools/train.py \
-c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml
```
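补充:学习率、预训练模型等字段也可以不修改yml文件,而是在启动训练时用 `-o` 参数覆盖(以下仅为示意,数据集路径仍按上文在yml中配置):

```python
# 示意:通过 -o 在命令行覆盖配置项,等价于直接修改yml中的对应字段
! CUDA_VISIBLE_DEVICES=0 python tools/train.py \
    -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
    -o Global.pretrained_model="./pretrain/ch_PP-OCRv2_det_distill_train/student" \
       Optimizer.lr.learning_rate=0.005
```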
**2)模型评估**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/5a75137c5f924dfeb6956b5818812298cc3dc7992ac84954b4175be9adf83c77"></center>
<center>图8 文本检测方案2-模型评估</center>
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/ch_db_mv3-student1600-finetune/best_accuracy`
```python
%cd /home/aistudio/PaddleOCR/
! python tools/eval.py \
-c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
-o Global.checkpoints="pretrain/ch_db_mv3-student1600-finetune/best_accuracy"
```
同时我们提供了未finetune的模型作为对比(配置文件中`pretrained_model`设置为空,`learning_rate`设置为0.001):
```python
%cd /home/aistudio/PaddleOCR/
! python tools/eval.py \
-c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
-o Global.checkpoints="pretrain/ch_db_mv3-student1600/best_accuracy"
```
使用训练好的模型进行评估,指标如下所示:
| 方案 | hmeans |
| -------- | -------- |
| XFUND数据集 | 79.27% |
| XFUND数据集+fine-tune | 85.24% |
对比仅使用XFUND数据集训练的模型,使用XFUND数据集+finetune训练,在验证集上评估达到85%左右,说明 finetune会提升垂类场景效果。
**3)导出模型**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/07c3b060c54e4b00be7de8d41a8a4696ff53835343cc4981aab0555183306e79"></center>
<center>图9 文本检测方案2-模型导出</center>
模型训练过程中保存的模型文件包含了前向预测和反向传播所需的参数,而实际的工业部署不需要反向传播,因此需要将模型导出为部署所需的inference模型格式。执行下面命令,即可导出模型。
```python
# 加载配置文件`ch_PP-OCRv2_det_student.yml`,从`pretrain/ch_db_mv3-student1600-finetune`目录下加载`best_accuracy`模型
# inference模型保存在`./output/det_db_inference`目录下
%cd /home/aistudio/PaddleOCR/
! python tools/export_model.py \
-c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
-o Global.pretrained_model="pretrain/ch_db_mv3-student1600-finetune/best_accuracy" \
Global.save_inference_dir="./output/det_db_inference/"
```
转换成功后,在`./output/det_db_inference/`目录下有三个文件:
```
output/det_db_inference/
    ├── inference.pdiparams         # 检测inference模型的参数文件
    ├── inference.pdiparams.info    # 检测inference模型的参数信息,可忽略
    └── inference.pdmodel           # 检测inference模型的program文件
```
**4)模型预测**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/0d582de9aa46474791e08654f84a614a6510e98bfe5f4ad3a26501cbf49ec151"></center>
<center>图10 文本检测方案2-模型预测</center>
加载上面导出的模型,执行如下命令对验证集或测试集图片进行预测:
```
det_model_dir:预测模型
image_dir:测试图片路径
use_gpu:是否使用GPU
```
检测可视化结果保存在`/home/aistudio/inference_results/`目录下,查看检测效果。
```python
%pwd
!python tools/infer/predict_det.py \
--det_algorithm="DB" \
--det_model_dir="./output/det_db_inference/" \
--image_dir="./doc/vqa/input/zh_val_21.jpg" \
--use_gpu=True
```
总结,我们分别使用PP-OCRv2中英文超轻量检测预训练模型、仅XFUND数据集训练、XFUND数据集+finetune三种方案进行评估与训练,指标对比如下:
| 方案 | hmeans | 结果分析 |
| -------- | -------- | -------- |
| PP-OCRv2中英文超轻量检测预训练模型 | 77.26% | ppocr提供的预训练模型有一定的泛化能力 |
| XFUND数据集 | 79.27% | |
| XFUND数据集+finetune | 85.24% | finetune会提升垂类场景效果 |
## 4.2 文本识别
我们分别使用如下3种方案进行训练、评估:
- PP-OCRv2中英文超轻量识别预训练模型
- XFUND数据集+fine-tune
- XFUND数据集+fine-tune+真实通用识别数据
### **4.2.1 方案1:预训练模型**
**1)下载预训练模型**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/b7230e9964074181837e1132029f9da8178bf564ac5c43a9a93a30e975c0d8b4"></center>
<center>图11 文本识别方案1-下载预训练模型</center>
我们使用PP-OCRv2中英文超轻量文本识别模型,下载并解压预训练模型:
```python
%cd /home/aistudio/PaddleOCR/pretrain/
! wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar
! tar -xf ch_PP-OCRv2_rec_train.tar && rm -rf ch_PP-OCRv2_rec_train.tar
%cd ..
```
**2)模型评估**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/166ce56d634c4c7589fe68fbc6e7ae663305dcc82ba144c781507341ffae7fe8"></center>
<center>图12 文本识别方案1-模型评估</center>
首先修改配置文件`configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml`中的以下字段:
```
Eval.dataset.data_dir:指向验证集图片存放目录
Eval.dataset.label_file_list:指向验证集标注文件
```
我们使用下载的预训练模型进行评估:
```python
%cd /home/aistudio/PaddleOCR
! CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml \
-o Global.checkpoints=./pretrain/ch_PP-OCRv2_rec_train/best_accuracy
```
使用预训练模型进行评估,指标如下所示:
| 方案 | acc |
| -------- | -------- |
| PP-OCRv2中英文超轻量识别预训练模型 | 67.48% |
使用文本识别预训练模型在XFUND验证集上评估,acc达到67%左右,充分说明ppocr提供的预训练模型有一定的泛化能力。
### **4.2.2 方案2:XFUND数据集+finetune**
同检测模型,我们提取Student模型的参数,在XFUND数据集上进行finetune,可以参考如下代码:
```python
import paddle
# 加载预训练模型
all_params = paddle.load("pretrain/ch_PP-OCRv2_rec_train/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 学生模型的权重提取
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# 查看学生模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "pretrain/ch_PP-OCRv2_rec_train/student.pdparams")
```
**1)模型训练**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/06dad690219b42a59a27c84a060af1436bcd05de10b843209c6270e04e4dda10"></center>
<center>图13 文本识别方案2-模型训练</center>
修改配置文件`configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml`中的以下字段:
```
Global.pretrained_model:指向预训练模型路径
Global.character_dict_path: 字典路径
Optimizer.lr.values:学习率
Train.dataset.data_dir:指向训练集图片存放目录
Train.dataset.label_file_list:指向训练集标注文件
Eval.dataset.data_dir:指向验证集图片存放目录
Eval.dataset.label_file_list:指向验证集标注文件
```
执行如下命令启动训练:
```python
%cd /home/aistudio/PaddleOCR/
! CUDA_VISIBLE_DEVICES=0 python tools/train.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml
```
**2)模型评估**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/c07c88f708ad43cc8cd615861626d0e8333c0e3d4dda49ac8cba1f8939fa8a94"></center>
<center>图14 文本识别方案2-模型评估</center>
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-finetune/best_accuracy`
```python
%cd /home/aistudio/PaddleOCR/
! CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
-o Global.checkpoints=./pretrain/rec_mobile_pp-OCRv2-student-finetune/best_accuracy
```
使用预训练模型进行评估,指标如下所示:
| 方案 | acc |
| -------- | -------- |
| XFUND数据集+finetune | 72.33% |
使用XFUND数据集+finetune训练,在验证集上评估达到72%左右,说明 finetune会提升垂类场景效果。
### **4.2.3 方案3:XFUND数据集+finetune+真实通用识别数据**
接着我们在上述`XFUND数据集+finetune`实验的基础上,添加真实通用识别数据,进一步提升识别效果。首先准备真实通用识别数据,并上传到AIStudio:
**1)模型训练**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/45f288ce8b2c45d8aa5407785b4b40f4876fc3da23744bd7a78060797fba0190"></center>
<center>图15 文本识别方案3-模型训练</center>
在上述`XFUND数据集+finetune`实验中修改配置文件`configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml`的基础上,继续修改以下字段:
```
Train.dataset.label_file_list:指向真实识别训练集图片存放目录
Train.dataset.ratio_list:动态采样
```
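其中`ratio_list`用于控制每个epoch从各个标注文件中采样数据的比例。作为示意(路径为假设的占位路径,请按实际情况替换),同时使用XFUND识别数据和真实通用识别数据时可以按如下方式配置:

```
Train:
  dataset:
    label_file_list:
      - ./train_data/XFUND/rec_gt_train.txt      # XFUND识别标注(占位路径)
      - ./train_data/general_rec/train_list.txt  # 真实通用识别数据标注(占位路径)
    ratio_list: [1.0, 0.2]   # 每个epoch采样XFUND全部数据、通用数据的20%
```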
执行如下命令启动训练:
```python
%cd /home/aistudio/PaddleOCR/
! CUDA_VISIBLE_DEVICES=0 python tools/train.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml
```
**2)模型评估**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/965db9f758614c6f9be301286cd5918f21110603c8aa4a1dbf5371e3afeec782"></center>
<center>图16 文本识别方案3-模型评估</center>
使用训练好的模型进行评估,更新模型路径`Global.checkpoints`,这里为大家提供训练好的模型`./pretrain/rec_mobile_pp-OCRv2-student-readldata/best_accuracy`
```python
! CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
-o Global.checkpoints=./pretrain/rec_mobile_pp-OCRv2-student-realdata/best_accuracy
```
使用预训练模型进行评估,指标如下所示:
| 方案 | acc |
| -------- | -------- |
| XFUND数据集+fine-tune+真实通用识别数据 | 85.29% |
使用XFUND数据集+finetune+真实通用识别数据训练,在验证集上评估达到85%左右,说明真实通用识别数据对于性能提升很有帮助。
**3)导出模型**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/3dc7f69fac174cde96b9d08b5e2353a1d88dc63e7be9410894c0783660b35b76"></center>
<center>图17 文本识别方案3-导出模型</center>
导出模型只保留前向预测的过程:
```python
!python tools/export_model.py \
-c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
-o Global.pretrained_model=pretrain/rec_mobile_pp-OCRv2-student-realdata/best_accuracy \
Global.save_inference_dir=./output/rec_crnn_inference/
```
**4)模型预测**
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/60b95b4945954f81a080a8f308cee66f83146479cd1142b9b6b1290938fd1df8"></center>
<center>图18 文本识别方案3-模型预测</center>
加载上面导出的模型,执行如下命令对验证集或测试集图片进行预测,检测可视化结果保存在`/home/aistudio/inference_results/`目录下,查看检测、识别效果。需要通过`--rec_char_dict_path`指定使用的字典路径
```python
! python tools/infer/predict_system.py \
--image_dir="./doc/vqa/input/zh_val_21.jpg" \
--det_model_dir="./output/det_db_inference/" \
--rec_model_dir="./output/rec_crnn_inference/" \
--rec_image_shape="3, 32, 320" \
--rec_char_dict_path="/home/aistudio/XFUND/word_dict.txt"
```
总结,我们分别使用PP-OCRv2中英文超轻量识别预训练模型、XFUND数据集+finetune、XFUND数据集+finetune+真实通用识别数据三种方案进行评估与训练,指标对比如下:
| 方案 | acc | 结果分析 |
| -------- | -------- | -------- |
| PP-OCRv2中英文超轻量识别预训练模型 | 67.48% | ppocr提供的预训练模型有一定的泛化能力 |
| XFUND数据集+fine-tune |72.33% | finetune会提升垂类场景效果 |
| XFUND数据集+fine-tune+真实通用识别数据 | 85.29% | 真实通用识别数据对于性能提升很有帮助 |
# 5 文档视觉问答(DOC-VQA)
VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。
PaddleOCR中DOC-VQA系列算法基于PaddleNLP自然语言处理算法库实现LayoutXLM论文,支持基于多模态方法的 **语义实体识别 (Semantic Entity Recognition, SER)** 以及 **关系抽取 (Relation Extraction, RE)** 任务。
如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。
```python
%cd pretrain
#下载SER模型
! wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
#下载RE模型
! wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
%cd ../
```
## 5.1 SER
SER: 语义实体识别 (Semantic Entity Recognition), 可以完成对图像中的文本识别与分类。
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b' width='700'></center>
<center>图19 SER测试效果图</center>
**图19** 中不同颜色的框表示不同的类别,对于XFUND数据集,有QUESTION, ANSWER, HEADER 3种类别
- 深紫色:HEADER
- 浅紫色:QUESTION
- 军绿色:ANSWER
在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
#### 5.1.1 模型训练
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/2e45f297c9d44ca5b8718ae100a365f7348eaeed4cb8495b904f28a9c8075d8a"></center>
<center>图20 SER-模型训练</center>
启动训练之前,需要修改配置文件 `configs/vqa/ser/layoutxlm.yml` 以下四个字段:
1. Train.dataset.data_dir:指向训练集图片存放目录
2. Train.dataset.label_file_list:指向训练集标注文件
3. Eval.dataset.data_dir:指向验证集图片存放目录
4. Eval.dataset.label_file_list:指向验证集标注文件
```python
%cd /home/aistudio/PaddleOCR/
! CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/vqa/ser/layoutxlm.yml
```
最终会打印出`precision`, `recall`, `hmean`等指标。 在`./output/ser_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型。
#### 5.1.2 模型评估
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/5df160ac39ee4d9e92a937094bc53a737272f9f2abeb4ddfaebb48e8eccf1be2"></center>
<center>图21 SER-模型评估</center>
我们使用下载的预训练模型进行评估,如果使用自己训练好的模型进行评估,将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段即可。
```python
! CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
-c configs/vqa/ser/layoutxlm.yml \
-o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
```
最终会打印出`precision`, `recall`, `hmean`等指标,预训练模型评估指标如下:
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/2854aee557a74079a82dd5cd57e48bc2ce97974d5637477fb4deea137d0e312c' width='700'></center>
<center>图 SER预训练模型评估指标</center>
#### 5.1.3 模型预测
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/0f7d50a0fb924b408b93e1fbd6ca64148eed34a2e6724280acd3e113fef7dc48"></center>
<center>图22 SER-模型预测</center>
使用如下命令即可完成`OCR引擎 + SER`的串联预测, 以SER预训练模型为例:
```python
! CUDA_VISIBLE_DEVICES=0 python tools/infer_vqa_token_ser.py \
-c configs/vqa/ser/layoutxlm.yml \
-o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ \
Global.infer_img=doc/vqa/input/zh_val_42.jpg
```
最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为`infer_results.txt`。通过如下命令查看预测图片:
```python
import cv2
from matplotlib import pyplot as plt
# 在notebook中使用matplotlib.pyplot绘图时,需要添加该命令进行显示
%matplotlib inline
img = cv2.imread('output/ser/zh_val_42_ser.jpg')
plt.figure(figsize=(48,24))
plt.imshow(img)
```
## 5.2 RE
基于 RE 任务,可以完成对图像中的文本内容的关系提取,如判断问题-答案对(pair)。
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/4de19ca3e54343e88961e816cad28bbacdc807f40b9440be914d871b0a914570' width='700'></center>
<center>图23 RE预测效果图</center>
图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
#### 5.2.1 模型训练
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/268c707a62c54e93958d2b2ab29e0932953aad41819e44aaaaa05c8ad85c6491"></center>
<center>图24 RE-模型训练</center>
启动训练之前,需要修改配置文件`configs/vqa/re/layoutxlm.yml`中的以下四个字段:

1. Train.dataset.data_dir:指向训练集图片存放目录
2. Train.dataset.label_file_list:指向训练集标注文件
3. Eval.dataset.data_dir:指向验证集图片存放目录
4. Eval.dataset.label_file_list:指向验证集标注文件
```python
! CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
```
最终会打印出`precision`, `recall`, `hmean`等指标。 在`./output/re_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型
#### 5.2.2 模型评估
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/93c66a43a69e472899c1c6732408b7a42e99a43721e94e9ca3c0a64e080306e4"></center>
<center>图25 RE-模型评估</center>
我们使用下载的预训练模型进行评估,如果使用自己训练好的模型进行评估,将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段即可。
```python
! CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py \
-c configs/vqa/re/layoutxlm.yml \
-o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/
```
最终会打印出`precision`, `recall`, `hmean`等指标,预训练模型评估指标如下:
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/f99af54fb2d14691a73b1a748e0ca22618aeddfded0c4da58bbbb03edb8c2340' width='700'></center>
<center>图 RE预训练模型评估指标</center>
#### 5.2.3 模型预测
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/bab32d32bdec4339b9a3e5f911e4b41f77996f3faabc40bd8309b5b20cad31e4"></center>
<center>图26 RE-模型预测</center>
使用如下命令即可完成`OCR引擎 + SER + RE`的串联预测, 以预训练SER和RE模型为例:
```python
%cd /home/aistudio/PaddleOCR
! CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser_re.py \
-c configs/vqa/re/layoutxlm.yml \
-o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ \
Global.infer_img=test_imgs/ \
-c_ser configs/vqa/ser/layoutxlm.yml \
-o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
```
最终会在config.Global.save_res_path字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为infer_results.txt, 每一行表示一张图片的结果,每张图片的结果如下所示,前面表示测试图片路径,后面为测试结果:key字段及对应的value字段。
```
test_imgs/t131.jpg {"政治面税": "群众", "性别": "男", "籍贯": "河北省邯郸市", "婚姻状况": "亏末婚口已婚口已娇", "通讯地址": "邯郸市阳光苑7号楼003", "民族": "汉族", "毕业院校": "河南工业大学", "户口性质": "口农村城镇", "户口地址": "河北省邯郸市", "联系电话": "13288888888", "健康状况": "健康", "姓名": "小六", "好高cm": "180", "出生年月": "1996年8月9日", "文化程度": "本科", "身份证号码": "458933777777777777"}
```
```python
# 展示预测结果
import cv2
from matplotlib import pyplot as plt
%matplotlib inline
img = cv2.imread('./output/re/t131_ser.jpg')
plt.figure(figsize=(48,24))
plt.imshow(img)
```
# 6 导出Excel
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/ab93d3d90d77437a81c9534b2dd1d3e39ef81e8473054fd3aeff6e837ebfb827"></center>
<center>图27 导出Excel</center>
为了输出信息匹配对,我们修改`tools/infer_vqa_token_ser_re.py`文件中的`line 194-197`。
```
fout.write(img_path + "\t" + json.dumps(
{
"ser_resule": result,
}, ensure_ascii=False) + "\n")
```
更改为
```
result_key = {}
for ocr_info_head, ocr_info_tail in result:
result_key[ocr_info_head['text']] = ocr_info_tail['text']
fout.write(img_path + "\t" + json.dumps(
result_key, ensure_ascii=False) + "\n")
```
同时将输出结果导出到Excel中,效果如 图28 所示:
<center><img src='https://ai-studio-static-online.cdn.bcebos.com/9f45d3eef75e4842a0828bb9e518c2438300264aec0646cc9addfce860a04196' width='700'></center>
<center>图28 Excel效果图</center>
```python
import json
import xlsxwriter as xw

workbook = xw.Workbook('output/re/infer_results.xlsx')
format1 = workbook.add_format({
    'align': 'center',
    'valign': 'vcenter',
    'text_wrap': True,
})
worksheet1 = workbook.add_worksheet('sheet1')
worksheet1.activate()
# Column headers (the key fields predicted by the RE model)
title = ['姓名', '性别', '民族', '文化程度', '身份证号码', '联系电话', '通讯地址']
worksheet1.write_row('A1', title)
i = 2
with open('output/re/infer_results.txt', 'r', encoding='utf-8') as fin:
    lines = fin.readlines()
    for line in lines:
        img_path, result = line.strip().split('\t')
        result_key = json.loads(result)
        # Write one row per image into the Excel sheet
        row_data = [result_key['姓名'], result_key['性别'], result_key['民族'], result_key['文化程度'],
                    result_key['身份证号码'], result_key['联系电话'], result_key['通讯地址']]
        row = 'A' + str(i)
        worksheet1.write_row(row, row_data, format1)
        i += 1
workbook.close()
```
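Note that the loop above indexes `result_key` directly, so a prediction missing any of the expected fields raises a `KeyError`. A slightly more defensive variant (a sketch, not part of the original tutorial) fills missing fields with empty cells:

```python
def build_row(result_key, columns):
    """Build one Excel row, using an empty string for any missing field."""
    return [result_key.get(col, '') for col in columns]

# Drop-in replacement for the row_data construction above:
# row_data = build_row(result_key, title)
```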
# More resources
- For more deep learning knowledge, industrial cases, interview guides, and more, see: [awesome-DeepLearning](https://github.com/paddlepaddle/awesome-DeepLearning)
- For more PaddleOCR tutorials, see: [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph)
- For more PaddleNLP tutorials, see: [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)
- For materials on the PaddlePaddle framework, see: [PaddlePaddle deep learning platform](https://www.paddlepaddle.org.cn/?fr=paddleEdu_aistudio)
# References
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
- XFUND dataset, https://github.com/doc-analysis/XFUND
Global:
debug: false
use_gpu: true
epoch_num: 100
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3_rotnet
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model: null
checkpoints: null
save_inference_dir: null
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
max_text_length: 25
infer_mode: false
use_space_char: true
save_res_path: ./output/rec/predicts_chinese_lite_v2.0.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
regularizer:
name: L2
factor: 1.0e-05
Architecture:
model_type: cls
algorithm: CLS
Transform: null
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Neck:
Head:
name: ClsHead
class_dim: 4
Loss:
name: ClsLoss
main_indicator: acc
PostProcess:
name: ClsPostProcess
Metric:
name: ClsMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecAug:
use_tia: False
- RandAugment:
- SSLRotateResize:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: ["image", "label"]
loader:
collate_fn: "SSLRotateCollate"
shuffle: true
batch_size_per_card: 32
drop_last: true
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- SSLRotateResize:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: ["image", "label"]
loader:
collate_fn: "SSLRotateCollate"
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 8
profiler_options: null
Global:
debug: false
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/ch_PP-OCR_v3_det/
save_epoch_step: 100
eval_batch_step:
- 0
- 400
cal_metric_during_train: false
pretrained_model: null
checkpoints: null
save_inference_dir: null
use_visualdl: false
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./checkpoints/det_db/predicts_db.txt
distributed: true
Architecture:
name: DistillationModel
algorithm: Distillation
model_type: det
Models:
Student:
model_type: det
algorithm: DB
Transform: null
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: true
Neck:
name: RSEFPN
out_channels: 96
shortcut: True
Head:
name: DBHead
k: 50
Student2:
model_type: det
algorithm: DB
Transform: null
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: true
Neck:
name: RSEFPN
out_channels: 96
shortcut: True
Head:
name: DBHead
k: 50
Teacher:
freeze_params: true
return_all_feats: false
model_type: det
algorithm: DB
Backbone:
name: ResNet
in_channels: 3
layers: 50
Neck:
name: LKPAN
out_channels: 256
Head:
name: DBHead
kernel_list: [7,2,2]
k: 50
Loss:
name: CombinedLoss
loss_config_list:
- DistillationDilaDBLoss:
weight: 1.0
model_name_pairs:
- ["Student", "Teacher"]
- ["Student2", "Teacher"]
key: maps
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
- DistillationDMLLoss:
model_name_pairs:
- ["Student", "Student2"]
maps_name: "thrink_maps"
weight: 1.0
# act: None
model_name_pairs: ["Student", "Student2"]
key: maps
- DistillationDBLoss:
weight: 1.0
model_name_list: ["Student", "Student2"]
# key: maps
# name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
warmup_epoch: 2
regularizer:
name: L2
factor: 5.0e-05
PostProcess:
name: DistillationDBPostProcess
model_name: ["Student"]
key: head_out
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Metric:
name: DistillationMetric
base_metric_name: DetMetric
main_indicator: hmean
key: "Student"
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- DetLabelEncode: null
- CopyPaste:
- IaaAugment:
augmenter_args:
- type: Fliplr
args:
p: 0.5
- type: Affine
args:
rotate:
- -10
- 10
- type: Resize
args:
size:
- 0.5
- 3
- EastRandomCropData:
size:
- 960
- 960
max_tries: 50
keep_ratio: true
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
order: hwc
- ToCHWImage: null
- KeepKeys:
keep_keys:
- image
- threshold_map
- threshold_mask
- shrink_map
- shrink_mask
loader:
shuffle: true
drop_last: false
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- DetLabelEncode: null
- DetResizeForTest: null
- NormalizeImage:
scale: 1./255.
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
order: hwc
- ToCHWImage: null
- KeepKeys:
keep_keys:
- image
- shape
- polys
- ignore_tags
loader:
shuffle: false
drop_last: false
batch_size_per_card: 1
num_workers: 2
Global: Global:
use_gpu: false debug: false
epoch_num: 5 use_gpu: true
epoch_num: 500
log_smooth_window: 20 log_smooth_window: 20
print_batch_step: 1 print_batch_step: 10
save_model_dir: ./output/db_mv3/ save_model_dir: ./output/ch_PP-OCR_V3_det/
save_epoch_step: 1200 save_epoch_step: 100
# evaluation is run every 2000 iterations eval_batch_step:
eval_batch_step: [0, 400] - 0
cal_metric_during_train: False - 400
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained cal_metric_during_train: false
checkpoints: pretrained_model: null
save_inference_dir: checkpoints: null
use_visualdl: False save_inference_dir: null
use_visualdl: false
infer_img: doc/imgs_en/img_10.jpg infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt save_res_path: ./checkpoints/det_db/predicts_db.txt
distributed: true
Architecture: Architecture:
model_type: det model_type: det
...@@ -23,10 +26,11 @@ Architecture: ...@@ -23,10 +26,11 @@ Architecture:
name: MobileNetV3 name: MobileNetV3
scale: 0.5 scale: 0.5
model_name: large model_name: large
disable_se: False disable_se: True
Neck: Neck:
name: DBFPN name: RSEFPN
out_channels: 256 out_channels: 96
shortcut: True
Head: Head:
name: DBHead name: DBHead
k: 50 k: 50
...@@ -38,29 +42,26 @@ Loss: ...@@ -38,29 +42,26 @@ Loss:
alpha: 5 alpha: 5
beta: 10 beta: 10
ohem_ratio: 3 ohem_ratio: 3
Optimizer: Optimizer:
name: Adam #Momentum name: Adam
#momentum: 0.9
beta1: 0.9 beta1: 0.9
beta2: 0.999 beta2: 0.999
lr: lr:
name: Cosine
learning_rate: 0.001 learning_rate: 0.001
warmup_epoch: 2
regularizer: regularizer:
name: 'L2' name: L2
factor: 0 factor: 5.0e-05
PostProcess: PostProcess:
name: DBPostProcess name: DBPostProcess
thresh: 0.3 thresh: 0.3
box_thresh: 0.6 box_thresh: 0.6
max_candidates: 1000 max_candidates: 1000
unclip_ratio: 1.5 unclip_ratio: 1.5
Metric: Metric:
name: DetMetric name: DetMetric
main_indicator: hmean main_indicator: hmean
Train: Train:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
...@@ -69,12 +70,31 @@ Train: ...@@ -69,12 +70,31 @@ Train:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0] ratio_list: [1.0]
transforms: transforms:
- DecodeImage: # load image - DecodeImage:
img_mode: BGR img_mode: BGR
channel_first: False channel_first: false
- DetLabelEncode: # Class handling label - DetLabelEncode: null
- Resize: - IaaAugment:
size: [640, 640] augmenter_args:
- type: Fliplr
args:
p: 0.5
- type: Affine
args:
rotate:
- -10
- 10
- type: Resize
args:
size:
- 0.5
- 3
- EastRandomCropData:
size:
- 960
- 960
max_tries: 50
keep_ratio: true
- MakeBorderMap: - MakeBorderMap:
shrink_ratio: 0.4 shrink_ratio: 0.4
thresh_min: 0.3 thresh_min: 0.3
...@@ -84,19 +104,28 @@ Train: ...@@ -84,19 +104,28 @@ Train:
min_text_size: 8 min_text_size: 8
- NormalizeImage: - NormalizeImage:
scale: 1./255. scale: 1./255.
mean: [0.485, 0.456, 0.406] mean:
std: [0.229, 0.224, 0.225] - 0.485
order: 'hwc' - 0.456
- ToCHWImage: - 0.406
std:
- 0.229
- 0.224
- 0.225
order: hwc
- ToCHWImage: null
- KeepKeys: - KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list keep_keys:
- image
- threshold_map
- threshold_mask
- shrink_map
- shrink_mask
loader: loader:
shuffle: False shuffle: true
drop_last: False drop_last: false
batch_size_per_card: 1 batch_size_per_card: 8
num_workers: 0 num_workers: 4
use_shared_memory: False
Eval: Eval:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
...@@ -104,23 +133,31 @@ Eval: ...@@ -104,23 +133,31 @@ Eval:
label_file_list: label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt - ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms: transforms:
- DecodeImage: # load image - DecodeImage:
img_mode: BGR img_mode: BGR
channel_first: False channel_first: false
- DetLabelEncode: # Class handling label - DetLabelEncode: null
- DetResizeForTest: - DetResizeForTest: null
image_shape: [736, 1280]
- NormalizeImage: - NormalizeImage:
scale: 1./255. scale: 1./255.
mean: [0.485, 0.456, 0.406] mean:
std: [0.229, 0.224, 0.225] - 0.485
order: 'hwc' - 0.456
- ToCHWImage: - 0.406
std:
- 0.229
- 0.224
- 0.225
order: hwc
- ToCHWImage: null
- KeepKeys: - KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags'] keep_keys:
- image
- shape
- polys
- ignore_tags
loader: loader:
shuffle: False shuffle: false
drop_last: False drop_last: false
batch_size_per_card: 1 # must be 1 batch_size_per_card: 1
num_workers: 0 num_workers: 2
use_shared_memory: False
Global:
debug: false
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: rec
algorithm: SVTR
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Loss:
name: MultiLoss
loss_config_list:
- CTCLoss:
- SARLoss:
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
ignore_space: True
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/
ext_op_transform_idx: 1
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
- RecAug:
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: true
batch_size_per_card: 128
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: false
drop_last: false
batch_size_per_card: 128
num_workers: 4
Global:
debug: false
use_gpu: true
epoch_num: 800
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v3_distillation
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700, 800]
values : [0.0005, 0.00005]
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: &model_type "rec"
name: DistillationModel
algorithm: Distillation
Models:
Teacher:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: SVTR
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Loss:
name: CombinedLoss
loss_config_list:
- DistillationDMLLoss:
weight: 1.0
act: "softmax"
use_log: true
model_name_pairs:
- ["Student", "Teacher"]
key: head_out
multi_head: True
dis_head: ctc
name: dml_ctc
- DistillationDMLLoss:
weight: 0.5
act: "softmax"
use_log: true
model_name_pairs:
- ["Student", "Teacher"]
key: head_out
multi_head: True
dis_head: sar
name: dml_sar
- DistillationDistanceLoss:
weight: 1.0
mode: "l2"
model_name_pairs:
- ["Student", "Teacher"]
key: backbone_out
- DistillationCTCLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: head_out
multi_head: True
- DistillationSARLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: head_out
multi_head: True
PostProcess:
name: DistillationCTCLabelDecode
model_name: ["Student", "Teacher"]
key: head_out
multi_head: True
Metric:
name: DistillationMetric
base_metric_name: RecMetric
main_indicator: acc
key: "Student"
ignore_space: True
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/
ext_op_transform_idx: 1
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
- RecAug:
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: true
batch_size_per_card: 128
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- MultiLabelEncode:
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_sar
- length
- valid_ratio
loader:
shuffle: false
drop_last: false
batch_size_per_card: 128
num_workers: 4
Global:
use_gpu: True
epoch_num: 20
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/svtr/
save_epoch_step: 1
# evaluation is run every 2000 iterations after the 0th iteration
eval_batch_step: [0, 2000]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
max_text_length: 25
infer_mode: False
use_space_char: False
save_res_path: ./output/rec/predicts_svtr_tiny.txt
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.99
epsilon: 0.00000008
weight_decay: 0.05
no_weight_decay_name: norm pos_embed
one_dim_param_no_weight_decay: true
lr:
name: Cosine
learning_rate: 0.0005
warmup_epoch: 2
Architecture:
model_type: rec
algorithm: SVTR
Transform:
name: STN_ON
tps_inputsize: [32, 64]
tps_outputsize: [32, 100]
num_control_points: 20
tps_margins: [0.05,0.05]
stn_activation: none
Backbone:
name: SVTRNet
img_size: [32, 100]
out_char_num: 25
out_channels: 192
patch_merging: 'Conv'
embed_dim: [64, 128, 256]
depth: [3, 6, 3]
num_heads: [2, 4, 8]
mixer: ['Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global']
local_mixer: [[7, 11], [7, 11], [7, 11]]
last_stage: True
prenorm: false
Neck:
name: SequenceEncoder
encoder_type: reshape
Head:
name: CTCHead
Loss:
name: CTCLoss
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/training/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- CTCLabelEncode: # Class handling label
- RecResizeImg:
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 512
drop_last: True
num_workers: 4
Eval:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/validation/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- CTCLabelEncode: # Class handling label
- RecResizeImg:
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: False
batch_size_per_card: 256
num_workers: 2
English | [简体中文](README_ch.md)
# PP-OCR Deployment
- [Paddle Deployment Introduction](#1)
- [PP-OCR Deployment](#2)
<a name="1"></a>
## Paddle Deployment Introduction
Paddle provides a variety of deployment schemes to meet the requirements of different scenarios. Please choose according to your actual situation:
<div align="center">
<img src="../doc/deployment_en.png" width="800">
</div>
<a name="2"></a>
## PP-OCR Deployment
PP-OCR supports multiple deployment schemes. Click a link below for the corresponding tutorial; a minimal Python example is shown at the end of this page.
- [Python Inference](../doc/doc_en/inference_ppocr_en.md)
- [C++ Inference](./cpp_infer/readme.md)
- [Serving (Python/C++)](./pdserving/README.md)
- [Paddle-Lite (ARM CPU/OpenCL ARM GPU/Metal ARM GPU)](./lite/readme.md)
- [Paddle.js](./paddlejs/README.md)
- [Jetson Inference]()
- [XPU Inference]()
- [Paddle2ONNX](./paddle2onnx/readme.md)
For deployment tutorials of academic algorithm models other than PP-OCR, please go directly to the main page of the corresponding algorithm: [entrance](../doc/doc_en/algorithm_overview_en.md)
\ No newline at end of file
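As a minimal illustration of the Python inference route listed above, the `paddleocr` pip package can be used roughly as follows (a sketch of the quick-start usage; see the linked tutorial for the full set of options and parameters):

```python
from paddleocr import PaddleOCR

# Downloads the default PP-OCR detection/classification/recognition models on first run.
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('doc/imgs_en/img_10.jpg', cls=True)
for line in result:
    print(line)  # roughly: [bounding box, (text, confidence)]
```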
[English](README.md) | 简体中文
# PP-OCR 模型推理部署
- [Paddle 推理部署方式简介](#1)
- [PP-OCR 推理部署](#2)
<a name="1"></a>
## Paddle 推理部署方式简介
飞桨提供多种部署方案,以满足不同场景的部署需求,请根据实际情况进行选择:
<div align="center">
<img src="../doc/deployment.png" width="800">
</div>
<a name="2"></a>
## PP-OCR 推理部署
PP-OCR模型已打通多种场景部署方案,点击链接获取具体的使用教程。
- [Python 推理](../doc/doc_ch/inference_ppocr.md)
- [C++ 推理](./cpp_infer/readme_ch.md)
- [Serving 服务化部署(Python/C++)](./pdserving/README_CN.md)
- [Paddle-Lite 端侧部署(ARM CPU/OpenCL ARM GPU/Metal ARM GPU)](./lite/readme_ch.md)
- [Paddle.js 部署](./paddlejs/README_ch.md)
- [Jetson 推理]()
- [XPU 推理]()
- [Paddle2ONNX 推理](./paddle2onnx/readme_ch.md)
需要PP-OCR以外的学术算法模型的推理部署,请直接进入相应算法主页面,[入口](../doc/doc_ch/algorithm_overview.md)
\ No newline at end of file
...@@ -70,7 +70,7 @@ cmake安装完后后系统里会有一个cmake-gui程序,打开cmake-gui,在 ...@@ -70,7 +70,7 @@ cmake安装完后后系统里会有一个cmake-gui程序,打开cmake-gui,在
* cpu版本,仅需考虑OPENCV_DIR、OpenCV_DIR、PADDLE_LIB三个参数 * cpu版本,仅需考虑OPENCV_DIR、OpenCV_DIR、PADDLE_LIB三个参数
- OPENCV_DIR:填写opencv lib文件夹所在位置 - OPENCV_DIR:填写opencv lib文件夹所在位置
- OpenCV_DIR:同填写opencv lib文件夹所在位 - OpenCV_DIR:同填写opencv lib文件夹所在位
- PADDLE_LIB:paddle_inference文件夹所在位置 - PADDLE_LIB:paddle_inference文件夹所在位置
* GPU版本,在cpu版本的基础上,还需填写以下变量 * GPU版本,在cpu版本的基础上,还需填写以下变量
...@@ -78,7 +78,7 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT ...@@ -78,7 +78,7 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT
- CUDA_LIB: CUDA地址,如 `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64` - CUDA_LIB: CUDA地址,如 `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64`
- CUDNN_LIB: 和CUDA_LIB一致 - CUDNN_LIB: 和CUDA_LIB一致
- TENSORRT_DIR:TRT下载后解压缩的位置 - TENSORRT_DIR:TRT下载后解压缩的位置,如 `D:\TensorRT-8.0.1.6`
- WITH_GPU: 打钩 - WITH_GPU: 打钩
- WITH_TENSORRT:打勾 - WITH_TENSORRT:打勾
...@@ -110,10 +110,11 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT ...@@ -110,10 +110,11 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT
运行之前,将下面文件拷贝到`build/Release/`文件夹下 运行之前,将下面文件拷贝到`build/Release/`文件夹下
1. `paddle_inference/paddle/lib/paddle_inference.dll` 1. `paddle_inference/paddle/lib/paddle_inference.dll`
2. `opencv/build/x64/vc15/bin/opencv_world455.dll` 2. `opencv/build/x64/vc15/bin/opencv_world455.dll`
3. 如果使用openblas版本的预测库还需要拷贝 `paddle_inference/third_party/install/openblas/lib/openblas.dll`
### Step4: 预测 ### Step4: 预测
上述`Visual Studio 2019`编译产出的可执行文件在`out\build\x64-Release\Release`目录下,打开`cmd`,并切换到`D:\projects\cpp\PaddleOCR\deploy\cpp_infer\`: 上述`Visual Studio 2019`编译产出的可执行文件在`build/Release/`目录下,打开`cmd`,并切换到`D:\projects\cpp\PaddleOCR\deploy\cpp_infer\`:
``` ```
cd /d D:\projects\cpp\PaddleOCR\deploy\cpp_infer cd /d D:\projects\cpp\PaddleOCR\deploy\cpp_infer
...@@ -128,7 +129,7 @@ CHCP 65001 ...@@ -128,7 +129,7 @@ CHCP 65001
``` ```
识别结果如下 识别结果如下
![result](imgs/result.png) ![result](imgs/result.jpg)
## FAQ ## FAQ
......
...@@ -46,6 +46,8 @@ DECLARE_int32(cls_batch_num); ...@@ -46,6 +46,8 @@ DECLARE_int32(cls_batch_num);
DECLARE_string(rec_model_dir); DECLARE_string(rec_model_dir);
DECLARE_int32(rec_batch_num); DECLARE_int32(rec_batch_num);
DECLARE_string(rec_char_dict_path); DECLARE_string(rec_char_dict_path);
DECLARE_int32(rec_img_h);
DECLARE_int32(rec_img_w);
// forward related // forward related
DECLARE_bool(det); DECLARE_bool(det);
DECLARE_bool(rec); DECLARE_bool(rec);
......
...@@ -45,7 +45,8 @@ public: ...@@ -45,7 +45,8 @@ public:
const bool &use_mkldnn, const string &label_path, const bool &use_mkldnn, const string &label_path,
const bool &use_tensorrt, const bool &use_tensorrt,
const std::string &precision, const std::string &precision,
const int &rec_batch_num) { const int &rec_batch_num, const int &rec_img_h,
const int &rec_img_w) {
this->use_gpu_ = use_gpu; this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id; this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
...@@ -54,6 +55,10 @@ public: ...@@ -54,6 +55,10 @@ public:
this->use_tensorrt_ = use_tensorrt; this->use_tensorrt_ = use_tensorrt;
this->precision_ = precision; this->precision_ = precision;
this->rec_batch_num_ = rec_batch_num; this->rec_batch_num_ = rec_batch_num;
this->rec_img_h_ = rec_img_h;
this->rec_img_w_ = rec_img_w;
std::vector<int> rec_image_shape = {3, rec_img_h, rec_img_w};
this->rec_image_shape_ = rec_image_shape;
this->label_list_ = Utility::ReadDict(label_path); this->label_list_ = Utility::ReadDict(label_path);
this->label_list_.insert(this->label_list_.begin(), this->label_list_.insert(this->label_list_.begin(),
...@@ -86,7 +91,9 @@ private: ...@@ -86,7 +91,9 @@ private:
bool use_tensorrt_ = false; bool use_tensorrt_ = false;
std::string precision_ = "fp32"; std::string precision_ = "fp32";
int rec_batch_num_ = 6; int rec_batch_num_ = 6;
int rec_img_h_ = 32;
int rec_img_w_ = 320;
std::vector<int> rec_image_shape_ = {3, rec_img_h_, rec_img_w_};
// pre-process // pre-process
CrnnResizeImg resize_op_; CrnnResizeImg resize_op_;
Normalize normalize_op_; Normalize normalize_op_;
......
...@@ -39,10 +39,10 @@ using namespace paddle_infer; ...@@ -39,10 +39,10 @@ using namespace paddle_infer;
namespace PaddleOCR { namespace PaddleOCR {
class PaddleOCR { class PPOCR {
public: public:
explicit PaddleOCR(); explicit PPOCR();
~PaddleOCR(); ~PPOCR();
std::vector<std::vector<OCRPredictResult>> std::vector<std::vector<OCRPredictResult>>
ocr(std::vector<cv::String> cv_all_img_names, bool det = true, ocr(std::vector<cv::String> cv_all_img_names, bool det = true,
bool rec = true, bool cls = true); bool rec = true, bool cls = true);
......
...@@ -65,6 +65,8 @@ public: ...@@ -65,6 +65,8 @@ public:
static bool PathExists(const std::string &path); static bool PathExists(const std::string &path);
static void CreateDir(const std::string &path);
static void print_result(const std::vector<OCRPredictResult> &ocr_result); static void print_result(const std::vector<OCRPredictResult> &ocr_result);
}; };
......
- [服务器端C++预测](#服务器端c预测) English | [简体中文](readme_ch.md)
- [1. 准备环境](#1-准备环境)
- [1.0 运行准备](#10-运行准备)
- [1.1 编译opencv库](#11-编译opencv库)
- [1.2 下载或者编译Paddle预测库](#12-下载或者编译paddle预测库)
- [1.2.1 直接下载安装](#121-直接下载安装)
- [1.2.2 预测库源码编译](#122-预测库源码编译)
- [2 开始运行](#2-开始运行)
- [2.1 将模型导出为inference model](#21-将模型导出为inference-model)
- [2.2 编译PaddleOCR C++预测demo](#22-编译paddleocr-c预测demo)
- [2.3 运行demo](#23-运行demo)
- [1. 检测+分类+识别:](#1-检测分类识别)
- [2. 检测+识别:](#2-检测识别)
- [3. 检测:](#3-检测)
- [4. 分类+识别:](#4-分类识别)
- [5. 识别:](#5-识别)
- [6. 分类:](#6-分类)
- [3. FAQ](#3-faq)
# 服务器端C++预测
本章节介绍PaddleOCR 模型的的C++部署方法,与之对应的python预测部署方式参考[文档](../../doc/doc_ch/inference.md)
C++在性能计算上优于python,因此,在大多数CPU、GPU部署场景,多采用C++的部署方式,本节将介绍如何在Linux\Windows (CPU\GPU)环境下配置C++环境并完成
PaddleOCR模型部署。
# Server-side C++ Inference
<a name="1"></a> - [1. Prepare the Environment](#1)
- [1.1 Environment](#11)
## 1. 准备环境 - [1.2 Compile OpenCV](#12)
- [1.3 Compile or Download or the Paddle Inference Library](#13)
- [2. Compile and Run the Demo](#2)
- [2.1 Export the inference model](#21)
- [2.2 Compile PaddleOCR C++ inference demo](#22)
- [2.3 Run the demo](#23)
- [3. FAQ](#3)
<a name="10"></a>
### 1.0 运行准备 This chapter introduces the C++ deployment steps of the PaddleOCR model. C++ is better than Python in terms of performance. Therefore, in CPU and GPU deployment scenarios, C++ deployment is mostly used.
This section will introduce how to configure the C++ environment and deploy PaddleOCR in Linux (CPU\GPU) environment. For Windows deployment please refer to [Windows](./docs/windows_vs2019_build.md) compilation guidelines.
- Linux环境,推荐使用docker。
- Windows环境。
* 该文档主要介绍基于Linux环境的PaddleOCR C++预测流程,如果需要在Windows下基于预测库进行C++预测,具体编译方法请参考[Windows下编译教程](./docs/windows_vs2019_build.md) <a name="1"></a>
## 1. Prepare the Environment
<a name="11"></a> <a name="11"></a>
### 1.1 Environment
- Linux, docker is recommended.
- Windows.
### 1.1 编译opencv库
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以opencv3.4.7为例,下载命令如下。 <a name="12"></a>
### 1.2 Compile OpenCV
* First of all, you need to download the source code compiled package in the Linux environment from the OpenCV official website. Taking OpenCV 3.4.7 as an example, the download command is as follows.
```bash ```bash
cd deploy/cpp_infer cd deploy/cpp_infer
...@@ -49,18 +38,18 @@ wget https://paddleocr.bj.bcebos.com/libs/opencv/opencv-3.4.7.tar.gz ...@@ -49,18 +38,18 @@ wget https://paddleocr.bj.bcebos.com/libs/opencv/opencv-3.4.7.tar.gz
tar -xf opencv-3.4.7.tar.gz tar -xf opencv-3.4.7.tar.gz
``` ```
最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。 Finally, you will see the folder of `opencv-3.4.7/` in the current directory.
* Compile OpenCV, the OpenCV source path (`root_path`) and installation path (`install_path`) should be set by yourself. Enter the OpenCV source code path and compile it in the following way.
* 编译opencv,设置opencv源码路径(`root_path`)以及安装路径(`install_path`)。进入opencv源码路径下,按照下面的方式进行编译。
```shell ```shell
root_path="your_opencv_root_path" root_path=your_opencv_root_path
install_path=${root_path}/opencv3 install_path=${root_path}/opencv3
build_dir=${root_path}/build
rm -rf ${build_dir} rm -rf build
mkdir ${build_dir} mkdir build
cd ${build_dir} cd build
cmake .. \ cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \ -DCMAKE_INSTALL_PREFIX=${install_path} \
...@@ -84,15 +73,11 @@ make -j ...@@ -84,15 +73,11 @@ make -j
make install make install
``` ```
也可以直接修改`tools/build_opencv.sh`的内容,然后直接运行下面的命令进行编译。 In the above commands, `root_path` is the downloaded OpenCV source code path, and `install_path` is the installation path of OpenCV. After `make install` is completed, the OpenCV header file and library file will be generated in this folder for later OCR source code compilation.
```shell
sh tools/build_opencv.sh
```
其中`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径,`make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的OCR代码编译。
最终在安装路径下的文件结构如下所示。 The final file structure under the OpenCV installation path is as follows.
``` ```
opencv3/ opencv3/
...@@ -103,35 +88,35 @@ opencv3/ ...@@ -103,35 +88,35 @@ opencv3/
|-- share |-- share
``` ```
<a name="12"></a> <a name="13"></a>
### 1.3 Compile or Download or the Paddle Inference Library
### 1.2 下载或者编译Paddle预测库
* 有2种方式获取Paddle预测库,下面进行详细介绍。 * There are 2 ways to obtain the Paddle inference library, described in detail below.
#### 1.3.1 Direct download and installation
#### 1.2.1 直接下载安装 [Paddle inference library official website](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#linux). You can review and select the appropriate version of the inference library on the official website.
* [Paddle预测库官网](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#linux) 上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本(*建议选择paddle版本>=2.0.1版本的预测库* )。
* 下载之后使用下面的方法解压。 * After downloading, use the following command to extract files.
``` ```
tar -xf paddle_inference.tgz tar -xf paddle_inference.tgz
``` ```
最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。 Finally you will see the the folder of `paddle_inference/` in the current path.
#### 1.3.2 Compile the inference source code
* If you want to get the latest Paddle inference library features, you can download the latest code from Paddle GitHub repository and compile the inference library from the source code. It is recommended to download the inference library with paddle version greater than or equal to 2.0.1.
* You can refer to [Paddle inference library] (https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) to get the Paddle source code from GitHub, and then compile To generate the latest inference library. The method of using git to access the code is as follows.
#### 1.2.2 预测库源码编译
* 如果希望获取最新预测库特性,可以从Paddle github上克隆最新代码,源码编译预测库。
* 可以参考[Paddle预测库安装编译说明](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#congyuanmabianyi) 的说明,从github上获取Paddle代码,然后进行编译,生成最新的预测库。使用git获取代码方法如下。
```shell ```shell
git clone https://github.com/PaddlePaddle/Paddle.git git clone https://github.com/PaddlePaddle/Paddle.git
git checkout develop git checkout develop
``` ```
* 进入Paddle目录后,编译方法如下。 * Enter the Paddle directory and run the following commands to compile the paddle inference library.
```shell ```shell
rm -rf build rm -rf build
...@@ -151,10 +136,10 @@ make -j ...@@ -151,10 +136,10 @@ make -j
make inference_lib_dist make inference_lib_dist
``` ```
更多编译参数选项介绍可以参考[文档说明](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#congyuanmabianyi) For more compilation parameter options, please refer to the [document](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#congyuanmabianyi).
* 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。 * After the compilation process, you can see the following files in the folder of `build/paddle_inference_install_dir/`.
``` ```
build/paddle_inference_install_dir/ build/paddle_inference_install_dir/
...@@ -164,17 +149,16 @@ build/paddle_inference_install_dir/ ...@@ -164,17 +149,16 @@ build/paddle_inference_install_dir/
|-- version.txt |-- version.txt
``` ```
其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
<a name="2"></a>
## 2 开始运行 <a name="2"></a>
## 2. Compile and Run the Demo
<a name="21"></a> <a name="21"></a>
### 2.1 Export the inference model
### 2.1 将模型导出为inference model * You can refer to [Model inference](../../doc/doc_ch/inference.md) and export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
* 可以参考[模型预测章节](../../doc/doc_ch/inference.md),导出inference model,用于模型预测。模型导出之后,假设放在`inference`目录下,则目录结构如下。
``` ```
inference/ inference/
...@@ -189,41 +173,44 @@ inference/ ...@@ -189,41 +173,44 @@ inference/
| |--inference.pdmodel | |--inference.pdmodel
``` ```
<a name="22"></a>
### 2.2 编译PaddleOCR C++预测demo <a name="22"></a>
### 2.2 Compile PaddleOCR C++ inference demo
* 编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。 * The compilation commands are as follows. The addresses of Paddle C++ inference library, opencv and other Dependencies need to be replaced with the actual addresses on your own machines.
```shell ```shell
sh tools/build.sh sh tools/build.sh
``` ```
* 具体的,需要修改`tools/build.sh`中环境路径,相关内容如下: Specifically, you should modify the paths in `tools/build.sh`. The related content is as follows.
```shell ```shell
OPENCV_DIR=your_opencv_dir OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=/your_cudnn_lib_dir CUDNN_LIB_DIR=your_cudnn_lib_dir
``` ```
其中,`OPENCV_DIR`为opencv编译安装的地址;`LIB_DIR`为下载(`paddle_inference`文件夹)或者编译生成的Paddle预测库地址(`build/paddle_inference_install_dir`文件夹);`CUDA_LIB_DIR`为cuda库文件地址,在docker中为`/usr/local/cuda/lib64``CUDNN_LIB_DIR`为cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`**注意:以上路径都写绝对路径,不要写相对路径。** `OPENCV_DIR` is the OpenCV installation path; `LIB_DIR` is the download (`paddle_inference` folder)
or the generated Paddle inference library path (`build/paddle_inference_install_dir` folder);
`CUDA_LIB_DIR` is the CUDA library file path, in docker; it is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library file path, in docker it is `/usr/lib/x86_64-linux-gnu/`.
* 编译完成之后,会在`build`文件夹下生成一个名为`ppocr`的可执行文件。 * After the compilation is completed, an executable file named `ppocr` will be generated in the `build` folder.
<a name="23"></a>
### 2.3 运行demo <a name="23"></a>
### 2.3 Run the demo
运行方式: Execute the built executable file:
```shell ```shell
./build/ppocr [--param1] [--param2] [...] ./build/ppocr [--param1] [--param2] [...]
``` ```
具体命令如下:
##### 1. 检测+分类+识别: Specifically,
##### 1. det+cls+rec:
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--rec_model_dir=inference/rec_rcnn \ --rec_model_dir=inference/rec_rcnn \
...@@ -235,7 +222,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -235,7 +222,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--cls=true \ --cls=true \
``` ```
##### 2. 检测+识别 ##### 2. det+rec
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--rec_model_dir=inference/rec_rcnn \ --rec_model_dir=inference/rec_rcnn \
...@@ -246,7 +233,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -246,7 +233,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--cls=false \ --cls=false \
``` ```
##### 3. 检测: ##### 3. det
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--image_dir=../../doc/imgs/12.jpg \ --image_dir=../../doc/imgs/12.jpg \
...@@ -254,7 +241,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -254,7 +241,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--rec=false --rec=false
``` ```
##### 4. 分类+识别 ##### 4. cls+rec
```shell ```shell
./build/ppocr --rec_model_dir=inference/rec_rcnn \ ./build/ppocr --rec_model_dir=inference/rec_rcnn \
--cls_model_dir=inference/cls \ --cls_model_dir=inference/cls \
...@@ -265,7 +252,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -265,7 +252,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--cls=true \ --cls=true \
``` ```
##### 5. 识别: ##### 5. rec
```shell ```shell
./build/ppocr --rec_model_dir=inference/rec_rcnn \ ./build/ppocr --rec_model_dir=inference/rec_rcnn \
--image_dir=../../doc/imgs_words/ch/word_1.jpg \ --image_dir=../../doc/imgs_words/ch/word_1.jpg \
...@@ -275,7 +262,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -275,7 +262,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--cls=false \ --cls=false \
``` ```
##### 6. 分类: ##### 6. cls
```shell ```shell
./build/ppocr --cls_model_dir=inference/cls \ ./build/ppocr --cls_model_dir=inference/cls \
--cls_model_dir=inference/cls \ --cls_model_dir=inference/cls \
...@@ -286,61 +273,64 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir ...@@ -286,61 +273,64 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--cls=true \ --cls=true \
``` ```
更多支持的可调节参数解释如下: More parameters are as follows,
- 通用参数 - Common parameters
|参数名称|类型|默认参数|意义| |parameter|data type|default|meaning|
| :---: | :---: | :---: | :---: | | --- | --- | --- | --- |
|use_gpu|bool|false|是否使用GPU| |use_gpu|bool|false|Whether to use GPU|
|gpu_id|int|0|GPU id,使用GPU时有效| |gpu_id|int|0|GPU id when use_gpu is true|
|gpu_mem|int|4000|申请的GPU内存| |gpu_mem|int|4000|GPU memory requested|
|cpu_math_library_num_threads|int|10|CPU预测时的线程数,在机器核数充足的情况下,该值越大,预测速度越快| |cpu_math_library_num_threads|int|10|Number of threads when using CPU inference. When machine cores is enough, the large the value, the faster the inference speed|
|enable_mkldnn|bool|true|是否使用mkldnn库| |enable_mkldnn|bool|true|Whether to use mkdlnn library|
|output|str|./output|可视化结果保存的路径| |output|str|./output|Path where visualization results are saved|
- 前向相关
|参数名称|类型|默认参数|意义| - forward
|parameter|data type|default|meaning|
| :---: | :---: | :---: | :---: | | :---: | :---: | :---: | :---: |
|det|bool|true|前向是否执行文字检测| |det|bool|true|前向是否执行文字检测|
|rec|bool|true|前向是否执行文字识别| |rec|bool|true|前向是否执行文字识别|
|cls|bool|false|前向是否执行文字方向分类| |cls|bool|false|前向是否执行文字方向分类|
- 检测模型相关 - Detection related parameters
|参数名称|类型|默认参数|意义| |parameter|data type|default|meaning|
| :---: | :---: | :---: | :---: | | --- | --- | --- | --- |
|det_model_dir|string|-|检测模型inference model地址| |det_model_dir|string|-|Address of detection inference model|
|max_side_len|int|960|输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960| |max_side_len|int|960|Limit the maximum image height and width to 960|
|det_db_thresh|float|0.3|用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显| |det_db_thresh|float|0.3|Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result|
|det_db_box_thresh|float|0.5|DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小| |det_db_box_thresh|float|0.5|DB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate|
|det_db_unclip_ratio|float|1.6|表示文本框的紧致程度,越小则文本框更靠近文本| |det_db_unclip_ratio|float|1.6|Indicates the compactness of the text box, the smaller the value, the closer the text box to the text|
|det_db_score_mode|string|slow|slow:使用多边形框计算bbox score,fast:使用矩形框计算。矩形框计算速度更快,多边形框对弯曲文本区域计算更准确。| |det_db_score_mode|string|slow| slow: use polygon box to calculate bbox score, fast: use rectangle box to calculate. Use rectangular box to calculate faster, and polygonal box more accurate for curved text area.|
|visualize|bool|true|是否对结果进行可视化,为1时,预测结果会保存在`output`字段指定的文件夹下和输入图像同名的图像上。| |visualize|bool|true|Whether to visualize the results,when it is set as true, the prediction results will be saved in the folder specified by the `output` field on an image with the same name as the input image.|
- 方向分类器相关 - Classifier related parameters
|参数名称|类型|默认参数|意义| |parameter|data type|default|meaning|
| :---: | :---: | :---: | :---: | | --- | --- | --- | --- |
|use_angle_cls|bool|false|是否使用方向分类器| |use_angle_cls|bool|false|Whether to use the direction classifier|
|cls_model_dir|string|-|方向分类器inference model地址| |cls_model_dir|string|-|Address of direction classifier inference model|
|cls_thresh|float|0.9|方向分类器的得分阈值| |cls_thresh|float|0.9|Score threshold of the direction classifier|
|cls_batch_num|int|1|方向分类器batchsize| |cls_batch_num|int|1|batch size of classifier|
- 识别模型相关 - Recognition related parameters
|参数名称|类型|默认参数|意义| |parameter|data type|default|meaning|
| :---: | :---: | :---: | :---: | | --- | --- | --- | --- |
|rec_model_dir|string|-|识别模型inference model地址| |rec_model_dir|string|-|Address of recognition inference model|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|字典文件| |rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
|rec_batch_num|int|6|识别模型batchsize| |rec_batch_num|int|6|batch size of recognition|
|rec_img_h|int|32|image height of recognition|
|rec_img_w|int|320|image width of recognition|
* Multi-language inference is also supported in PaddleOCR, you can refer to [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models in PaddleOCR. Specifically, if you want to infer using multi-language models, you just need to modify values of `rec_char_dict_path` and `rec_model_dir`.
* PaddleOCR也支持多语言的预测,更多支持的语言和模型可以参考[识别文档](../../doc/doc_ch/recognition.md)中的多语言字典与模型部分,如果希望进行多语言预测,只需将修改`rec_char_dict_path`(字典文件路径)以及`rec_model_dir`(inference模型路径)字段即可。
最终屏幕上会输出检测结果如下。 The detection results will be shown on the screen, which is as follows.
```bash ```bash
predict img: ../../doc/imgs/12.jpg predict img: ../../doc/imgs/12.jpg
...@@ -352,6 +342,8 @@ predict img: ../../doc/imgs/12.jpg ...@@ -352,6 +342,8 @@ predict img: ../../doc/imgs/12.jpg
The detection visualized image saved in ./output//12.jpg The detection visualized image saved in ./output//12.jpg
``` ```
<a name="3"></a>
## 3. FAQ ## 3. FAQ
1. 遇到报错 `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, 将 `deploy/cpp_infer/external-cmake/auto-log.cmake` 中的github地址改为 https://gitee.com/Double_V/AutoLog 地址即可。 1. Encountered the error `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, change the github address in `deploy/cpp_infer/external-cmake/auto-log.cmake` to the https://gitee.com/Double_V/AutoLog address.
- [Server-side C++ Inference](#server-side-c-inference) [English](readme.md) | 简体中文
- [1. Prepare the Environment](#1-prepare-the-environment)
- [Environment](#environment)
- [1.1 Compile OpenCV](#11-compile-opencv)
- [1.2 Compile or Download or the Paddle Inference Library](#12-compile-or-download-or-the-paddle-inference-library)
- [1.2.1 Direct download and installation](#121-direct-download-and-installation)
- [1.2.2 Compile the inference source code](#122-compile-the-inference-source-code)
- [2. Compile and Run the Demo](#2-compile-and-run-the-demo)
- [2.1 Export the inference model](#21-export-the-inference-model)
- [2.2 Compile PaddleOCR C++ inference demo](#22-compile-paddleocr-c-inference-demo)
- [Run the demo](#run-the-demo)
- [1. det+cls+rec:](#1-detclsrec)
- [2. det+rec:](#2-detrec)
- [3. det](#3-det)
- [4. cls+rec:](#4-clsrec)
- [5. rec](#5-rec)
- [6. cls](#6-cls)
- [3. FAQ](#3-faq)
# Server-side C++ Inference # 服务器端C++预测
This chapter introduces the C++ deployment steps of the PaddleOCR model. The corresponding Python predictive deployment method refers to [document](../../doc/doc_ch/inference.md). - [1. 准备环境](#1)
C++ is better than python in terms of performance. Therefore, in CPU and GPU deployment scenarios, C++ deployment is mostly used. - [1.1 运行准备](#11)
This section will introduce how to configure the C++ environment and deploy PaddleOCR in Linux (CPU\GPU) environment. For Windows deployment please refer to [Windows](./docs/windows_vs2019_build.md) compilation guidelines. - [1.2 编译opencv库](#12)
- [1.3 下载或者编译Paddle预测库](#13)
- [2 开始运行](#2)
- [2.1 准备模型](#21)
- [2.2 编译PaddleOCR C++预测demo](#22)
- [2.3 运行demo](#23)
- [3. FAQ](#3)
本章节介绍PaddleOCR 模型的的C++部署方法。C++在性能计算上优于Python,因此,在大多数CPU、GPU部署场景,多采用C++的部署方式,本节将介绍如何在Linux\Windows (CPU\GPU)环境下配置C++环境并完成PaddleOCR模型部署。
## 1. Prepare the Environment
### Environment <a name="1"></a>
- Linux, docker is recommended. ## 1. 准备环境
- Windows.
<a name="11"></a>
### 1.1 Compile OpenCV ### 1.1 运行准备
* First of all, you need to download the source code compiled package in the Linux environment from the OpenCV official website. Taking OpenCV 3.4.7 as an example, the download command is as follows. - Linux环境,推荐使用docker。
- Windows环境。
* 该文档主要介绍基于Linux环境的PaddleOCR C++预测流程,如果需要在Windows下基于预测库进行C++预测,具体编译方法请参考[Windows下编译教程](./docs/windows_vs2019_build.md)
<a name="12"></a>
### 1.2 编译opencv库
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以opencv3.4.7为例,下载命令如下。
```bash ```bash
cd deploy/cpp_infer cd deploy/cpp_infer
...@@ -42,18 +40,18 @@ wget https://paddleocr.bj.bcebos.com/libs/opencv/opencv-3.4.7.tar.gz ...@@ -42,18 +40,18 @@ wget https://paddleocr.bj.bcebos.com/libs/opencv/opencv-3.4.7.tar.gz
tar -xf opencv-3.4.7.tar.gz tar -xf opencv-3.4.7.tar.gz
``` ```
Finally, you will see the folder of `opencv-3.4.7/` in the current directory. 最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。
* Compile OpenCV, the OpenCV source path (`root_path`) and installation path (`install_path`) should be set by yourself. Enter the OpenCV source code path and compile it in the following way.
* 编译opencv,设置opencv源码路径(`root_path`)以及安装路径(`install_path`)。进入opencv源码路径下,按照下面的方式进行编译。
```shell ```shell
root_path=your_opencv_root_path root_path="your_opencv_root_path"
install_path=${root_path}/opencv3 install_path=${root_path}/opencv3
build_dir=${root_path}/build
rm -rf build rm -rf ${build_dir}
mkdir build mkdir ${build_dir}
cd build cd ${build_dir}
cmake .. \ cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \ -DCMAKE_INSTALL_PREFIX=${install_path} \
...@@ -77,11 +75,15 @@ make -j ...@@ -77,11 +75,15 @@ make -j
make install make install
``` ```
In the above commands, `root_path` is the downloaded OpenCV source code path, and `install_path` is the installation path of OpenCV. After `make install` is completed, the OpenCV header file and library file will be generated in this folder for later OCR source code compilation. 也可以直接修改`tools/build_opencv.sh`的内容,然后直接运行下面的命令进行编译。
```shell
sh tools/build_opencv.sh
```
其中`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径,`make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的OCR代码编译。
The final file structure under the OpenCV installation path is as follows. 最终在安装路径下的文件结构如下所示。
``` ```
opencv3/ opencv3/
...@@ -92,34 +94,38 @@ opencv3/ ...@@ -92,34 +94,38 @@ opencv3/
|-- share |-- share
``` ```
### 1.2 Compile or Download or the Paddle Inference Library <a name="13"></a>
* There are 2 ways to obtain the Paddle inference library, described in detail below. ### 1.3 下载或者编译Paddle预测库
#### 1.2.1 Direct download and installation 可以选择直接下载安装或者从源码编译,下文分别进行具体说明。
[Paddle inference library official website](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#linux). You can review and select the appropriate version of the inference library on the official website. <a name="131"></a>
#### 1.3.1 直接下载安装
[Paddle预测库官网](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#linux) 上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本(*建议选择paddle版本>=2.0.1版本的预测库* )。
* After downloading, use the following command to extract files. 下载之后解压:
``` ```shell
tar -xf paddle_inference.tgz tar -xf paddle_inference.tgz
``` ```
Finally you will see the the folder of `paddle_inference/` in the current path. 最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。
#### 1.2.2 Compile the inference source code <a name="132"></a>
* If you want to get the latest Paddle inference library features, you can download the latest code from Paddle GitHub repository and compile the inference library from the source code. It is recommended to download the inference library with paddle version greater than or equal to 2.0.1. #### 1.3.2 预测库源码编译
* You can refer to [Paddle inference library] (https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) to get the Paddle source code from GitHub, and then compile To generate the latest inference library. The method of using git to access the code is as follows.
如果希望获取最新预测库特性,可以从github上克隆最新Paddle代码进行编译,生成最新的预测库。
* 使用git获取代码:
```shell ```shell
git clone https://github.com/PaddlePaddle/Paddle.git git clone https://github.com/PaddlePaddle/Paddle.git
git checkout develop git checkout develop
``` ```
* Enter the Paddle directory and run the following commands to compile the paddle inference library. * 进入Paddle目录,进行编译:
```shell ```shell
rm -rf build rm -rf build
...@@ -139,10 +145,10 @@ make -j ...@@ -139,10 +145,10 @@ make -j
make inference_lib_dist make inference_lib_dist
``` ```
For more compilation parameter options, please refer to the [document](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#congyuanmabianyi). 更多编译参数选项介绍可以参考[Paddle预测库编译文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#congyuanmabianyi)
* After the compilation process, you can see the following files in the folder of `build/paddle_inference_install_dir/`. * 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。
``` ```
build/paddle_inference_install_dir/ build/paddle_inference_install_dir/
...@@ -152,14 +158,17 @@ build/paddle_inference_install_dir/ ...@@ -152,14 +158,17 @@ build/paddle_inference_install_dir/
|-- version.txt |-- version.txt
``` ```
`paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library. 其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
<a name="2"></a>
## 2. 开始运行
## 2. Compile and Run the Demo <a name="21"></a>
### 2.1 Export the inference model ### 2.1 准备模型
* You can refer to [Model inference](../../doc/doc_ch/inference.md) and export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows. 直接下载PaddleOCR提供的推理模型,或者参考[模型预测章节](../../doc/doc_ch/inference_ppocr.md),将训练好的模型导出为推理模型。模型导出之后,假设放在`inference`目录下,则目录结构如下。
``` ```
inference/ inference/
...@@ -174,42 +183,43 @@ inference/ ...@@ -174,42 +183,43 @@ inference/
| |--inference.pdmodel | |--inference.pdmodel
``` ```
<a name="22"></a>
### 2.2 Compile PaddleOCR C++ inference demo ### 2.2 编译PaddleOCR C++预测demo
* The compilation commands are as follows. The addresses of Paddle C++ inference library, opencv and other Dependencies need to be replaced with the actual addresses on your own machines. 编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。
```shell ```shell
sh tools/build.sh sh tools/build.sh
``` ```
Specifically, you should modify the paths in `tools/build.sh`. The related content is as follows. 具体的,需要修改`tools/build.sh`中环境路径,相关内容如下:
```shell ```shell
OPENCV_DIR=your_opencv_dir OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir CUDNN_LIB_DIR=/your_cudnn_lib_dir
``` ```
`OPENCV_DIR` is the OpenCV installation path; `LIB_DIR` is the download (`paddle_inference` folder) 其中,`OPENCV_DIR`为opencv编译安装的地址;`LIB_DIR`为下载(`paddle_inference`文件夹)或者编译生成的Paddle预测库地址(`build/paddle_inference_install_dir`文件夹);`CUDA_LIB_DIR`为cuda库文件地址,在docker中为`/usr/local/cuda/lib64``CUDNN_LIB_DIR`为cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`**注意:以上路径都写绝对路径,不要写相对路径。**
or the generated Paddle inference library path (`build/paddle_inference_install_dir` folder);
`CUDA_LIB_DIR` is the CUDA library file path, in docker; it is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library file path, in docker it is `/usr/lib/x86_64-linux-gnu/`.
* After the compilation is completed, an executable file named `ppocr` will be generated in the `build` folder. 编译完成之后,会在`build`文件夹下生成一个名为`ppocr`的可执行文件。
<a name="23"></a>
### Run the demo ### 2.3 运行demo
Execute the built executable file:
本demo支持系统串联调用,也支持单个功能的调用,如,只使用检测或识别功能。
运行方式:
```shell ```shell
./build/ppocr [--param1] [--param2] [...] ./build/ppocr [--param1] [--param2] [...]
``` ```
具体命令如下:
Specifically, ##### 1. 检测+分类+识别:
##### 1. det+cls+rec:
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--rec_model_dir=inference/rec_rcnn \ --rec_model_dir=inference/rec_rcnn \
...@@ -221,7 +231,7 @@ Specifically, ...@@ -221,7 +231,7 @@ Specifically,
--cls=true \ --cls=true \
``` ```
##### 2. det+rec ##### 2. 检测+识别
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--rec_model_dir=inference/rec_rcnn \ --rec_model_dir=inference/rec_rcnn \
...@@ -232,7 +242,7 @@ Specifically, ...@@ -232,7 +242,7 @@ Specifically,
--cls=false \ --cls=false \
``` ```
##### 3. det ##### 3. 检测:
```shell ```shell
./build/ppocr --det_model_dir=inference/det_db \ ./build/ppocr --det_model_dir=inference/det_db \
--image_dir=../../doc/imgs/12.jpg \ --image_dir=../../doc/imgs/12.jpg \
...@@ -240,7 +250,7 @@ Specifically, ...@@ -240,7 +250,7 @@ Specifically,
--rec=false --rec=false
``` ```
##### 4. cls+rec ##### 4. 分类+识别
```shell ```shell
./build/ppocr --rec_model_dir=inference/rec_rcnn \ ./build/ppocr --rec_model_dir=inference/rec_rcnn \
--cls_model_dir=inference/cls \ --cls_model_dir=inference/cls \
...@@ -251,7 +261,7 @@ Specifically, ...@@ -251,7 +261,7 @@ Specifically,
--cls=true \ --cls=true \
``` ```
##### 5. rec ##### 5. 识别:
```shell ```shell
./build/ppocr --rec_model_dir=inference/rec_rcnn \ ./build/ppocr --rec_model_dir=inference/rec_rcnn \
--image_dir=../../doc/imgs_words/ch/word_1.jpg \ --image_dir=../../doc/imgs_words/ch/word_1.jpg \
...@@ -261,7 +271,7 @@ Specifically, ...@@ -261,7 +271,7 @@ Specifically,
--cls=false \ --cls=false \
``` ```
##### 6. cls ##### 6. 分类:
```shell ```shell
./build/ppocr --cls_model_dir=inference/cls \ ./build/ppocr --cls_model_dir=inference/cls \
--cls_model_dir=inference/cls \ --cls_model_dir=inference/cls \
...@@ -272,62 +282,63 @@ Specifically, ...@@ -272,62 +282,63 @@ Specifically,
--cls=true \ --cls=true \
``` ```
More parameters are as follows, 更多支持的可调节参数解释如下:
- Common parameters - 通用参数
|parameter|data type|default|meaning|
| --- | --- | --- | --- |
|use_gpu|bool|false|Whether to use GPU|
|gpu_id|int|0|GPU id when use_gpu is true|
|gpu_mem|int|4000|GPU memory requested|
|cpu_math_library_num_threads|int|10|Number of threads when using CPU inference. When machine cores is enough, the large the value, the faster the inference speed|
|enable_mkldnn|bool|true|Whether to use mkdlnn library|
|output|str|./output|Path where visualization results are saved|
|参数名称|类型|默认参数|意义|
| :---: | :---: | :---: | :---: |
|use_gpu|bool|false|是否使用GPU|
|gpu_id|int|0|GPU id,使用GPU时有效|
|gpu_mem|int|4000|申请的GPU内存|
|cpu_math_library_num_threads|int|10|CPU预测时的线程数,在机器核数充足的情况下,该值越大,预测速度越快|
|enable_mkldnn|bool|true|是否使用mkldnn库|
|output|str|./output|可视化结果保存的路径|
- forward - 前向相关
|parameter|data type|default|meaning| |参数名称|类型|默认参数|意义|
| :---: | :---: | :---: | :---: | | :---: | :---: | :---: | :---: |
|det|bool|true|前向是否执行文字检测| |det|bool|true|前向是否执行文字检测|
|rec|bool|true|前向是否执行文字识别| |rec|bool|true|前向是否执行文字识别|
|cls|bool|false|前向是否执行文字方向分类| |cls|bool|false|前向是否执行文字方向分类|
- Detection related parameters - 检测模型相关
|parameter|data type|default|meaning| |参数名称|类型|默认参数|意义|
| --- | --- | --- | --- | | :---: | :---: | :---: | :---: |
|det_model_dir|string|-|Address of detection inference model| |det_model_dir|string|-|检测模型inference model地址|
|max_side_len|int|960|Limit the maximum image height and width to 960| |max_side_len|int|960|输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960|
|det_db_thresh|float|0.3|Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result| |det_db_thresh|float|0.3|用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显|
|det_db_box_thresh|float|0.5|DB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate| |det_db_box_thresh|float|0.5|DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小|
|det_db_unclip_ratio|float|1.6|Indicates the compactness of the text box, the smaller the value, the closer the text box to the text| |det_db_unclip_ratio|float|1.6|表示文本框的紧致程度,越小则文本框更靠近文本|
|det_db_score_mode|string|slow|slow: use the polygon box to calculate the bbox score; fast: use the rectangle box. The rectangular box is faster to compute, while the polygonal box is more accurate for curved text areas.| |det_db_score_mode|string|slow|slow:使用多边形框计算bbox score,fast:使用矩形框计算。矩形框计算速度更快,多边形框对弯曲文本区域计算更准确。|
|visualize|bool|true|Whether to visualize the results. When set to true, the prediction results are saved, in the folder specified by the `output` field, on an image with the same name as the input image.| |visualize|bool|true|是否对结果进行可视化,为1时,预测结果会保存在`output`字段指定的文件夹下和输入图像同名的图像上。|
- Classifier related parameters - 方向分类器相关
|parameter|data type|default|meaning| |参数名称|类型|默认参数|意义|
| --- | --- | --- | --- | | :---: | :---: | :---: | :---: |
|use_angle_cls|bool|false|Whether to use the direction classifier| |use_angle_cls|bool|false|是否使用方向分类器|
|cls_model_dir|string|-|Address of direction classifier inference model| |cls_model_dir|string|-|方向分类器inference model地址|
|cls_thresh|float|0.9|Score threshold of the direction classifier| |cls_thresh|float|0.9|方向分类器的得分阈值|
|cls_batch_num|int|1|batch size of classifier| |cls_batch_num|int|1|方向分类器batchsize|
- Recognition related parameters - 识别模型相关
|parameter|data type|default|meaning| |参数名称|类型|默认参数|意义|
| --- | --- | --- | --- | | :---: | :---: | :---: | :---: |
|rec_model_dir|string|-|Address of recognition inference model| |rec_model_dir|string|-|识别模型inference model地址|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file| |rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|字典文件|
|rec_batch_num|int|6|batch size of recognition| |rec_batch_num|int|6|识别模型batchsize|
|rec_img_h|int|32|识别模型输入图像高度|
|rec_img_w|int|320|识别模型输入图像宽度|
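Combining several of the flags from the tables above, a hedged end-to-end detection call might look like this (paths reuse the earlier commands; the values are illustrative, not recommendations):

```shell
./build/ppocr --det_model_dir=inference/det_db \
    --image_dir=../../doc/imgs/12.jpg \
    --use_gpu=false \
    --enable_mkldnn=true \
    --output=./output/ \
    --det_db_box_thresh=0.4 \
    --rec=false
```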
* Multi-language inference is also supported in PaddleOCR. You can refer to the [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models. To run inference with a multi-language model, you only need to modify the values of `rec_char_dict_path` and `rec_model_dir`.
* PaddleOCR也支持多语言的预测,更多支持的语言和模型可以参考[识别文档](../../doc/doc_ch/recognition.md)中的多语言字典与模型部分,如果希望进行多语言预测,只需将修改`rec_char_dict_path`(字典文件路径)以及`rec_model_dir`(inference模型路径)字段即可。
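For example, switching to a multi-language model only changes those two flags. A hedged sketch (the model directory `inference/german_rec` and the image directory are placeholders; the dictionary path follows the repository layout):

```shell
# Hypothetical recognition-only run with a German model (download/convert the model first)
./build/ppocr --rec_model_dir=inference/german_rec \
    --rec_char_dict_path=../../ppocr/utils/dict/german_dict.txt \
    --image_dir=../../doc/imgs_words/ \
    --det=false \
    --rec=true \
    --cls=false
```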
The detection results will be shown on the screen, which is as follows. 最终屏幕上会输出检测结果如下。
```bash ```bash
predict img: ../../doc/imgs/12.jpg predict img: ../../doc/imgs/12.jpg
...@@ -339,7 +350,7 @@ predict img: ../../doc/imgs/12.jpg ...@@ -339,7 +350,7 @@ predict img: ../../doc/imgs/12.jpg
The detection visualized image saved in ./output//12.jpg The detection visualized image saved in ./output//12.jpg
``` ```
<a name="3"></a>
## 3. FAQ ## 3. FAQ
1. Encountered the error `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, change the github address in `deploy/cpp_infer/external-cmake/auto-log.cmake` to the https://gitee.com/Double_V/AutoLog address. 1. 遇到报错 `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, 将 `deploy/cpp_infer/external-cmake/auto-log.cmake` 中的github地址改为 https://gitee.com/Double_V/AutoLog 地址即可。
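A hedged one-liner for that substitution, run from the PaddleOCR repository root (the exact line in the cmake file may differ between versions, so check the file afterwards):

```shell
sed -i 's|github.com/LDOUBLEV/AutoLog|gitee.com/Double_V/AutoLog|g' deploy/cpp_infer/external-cmake/auto-log.cmake
```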
...@@ -47,6 +47,8 @@ DEFINE_string(rec_model_dir, "", "Path of rec inference model."); ...@@ -47,6 +47,8 @@ DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 6, "rec_batch_num."); DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt", DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt",
"Path of dictionary."); "Path of dictionary.");
DEFINE_int32(rec_img_h, 32, "rec image height");
DEFINE_int32(rec_img_w, 320, "rec image width");
// ocr forward related // ocr forward related
DEFINE_bool(det, true, "Whether use det in forward."); DEFINE_bool(det, true, "Whether use det in forward.");
......
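The two new flags expose the recognizer input shape on the command line. A hedged usage sketch for a recognition model whose input height is not the default 32 (the model path and the value 48 are illustrative):

```shell
./build/ppocr --rec_model_dir=inference/rec_rcnn \
    --image_dir=../../doc/imgs_words/ch/word_1.jpg \
    --rec_img_h=48 \
    --rec_img_w=320 \
    --det=false \
    --rec=true \
    --cls=false
```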
...@@ -69,7 +69,7 @@ int main(int argc, char **argv) { ...@@ -69,7 +69,7 @@ int main(int argc, char **argv) {
cv::glob(FLAGS_image_dir, cv_all_img_names); cv::glob(FLAGS_image_dir, cv_all_img_names);
std::cout << "total images num: " << cv_all_img_names.size() << endl; std::cout << "total images num: " << cv_all_img_names.size() << endl;
PaddleOCR::PaddleOCR ocr = PaddleOCR::PaddleOCR(); PPOCR ocr = PPOCR();
std::vector<std::vector<OCRPredictResult>> ocr_results = std::vector<std::vector<OCRPredictResult>> ocr_results =
ocr.ocr(cv_all_img_names, FLAGS_det, FLAGS_rec, FLAGS_cls); ocr.ocr(cv_all_img_names, FLAGS_det, FLAGS_rec, FLAGS_cls);
......
...@@ -39,7 +39,9 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list, ...@@ -39,7 +39,9 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list,
auto preprocess_start = std::chrono::steady_clock::now(); auto preprocess_start = std::chrono::steady_clock::now();
int end_img_no = min(img_num, beg_img_no + this->rec_batch_num_); int end_img_no = min(img_num, beg_img_no + this->rec_batch_num_);
int batch_num = end_img_no - beg_img_no; int batch_num = end_img_no - beg_img_no;
float max_wh_ratio = 0; int imgH = this->rec_image_shape_[1];
int imgW = this->rec_image_shape_[2];
float max_wh_ratio = imgW * 1.0 / imgH;
for (int ino = beg_img_no; ino < end_img_no; ino++) { for (int ino = beg_img_no; ino < end_img_no; ino++) {
int h = img_list[indices[ino]].rows; int h = img_list[indices[ino]].rows;
int w = img_list[indices[ino]].cols; int w = img_list[indices[ino]].cols;
...@@ -47,28 +49,28 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list, ...@@ -47,28 +49,28 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list,
max_wh_ratio = max(max_wh_ratio, wh_ratio); max_wh_ratio = max(max_wh_ratio, wh_ratio);
} }
int batch_width = 0; int batch_width = imgW;
std::vector<cv::Mat> norm_img_batch; std::vector<cv::Mat> norm_img_batch;
for (int ino = beg_img_no; ino < end_img_no; ino++) { for (int ino = beg_img_no; ino < end_img_no; ino++) {
cv::Mat srcimg; cv::Mat srcimg;
img_list[indices[ino]].copyTo(srcimg); img_list[indices[ino]].copyTo(srcimg);
cv::Mat resize_img; cv::Mat resize_img;
this->resize_op_.Run(srcimg, resize_img, max_wh_ratio, this->resize_op_.Run(srcimg, resize_img, max_wh_ratio,
this->use_tensorrt_); this->use_tensorrt_, this->rec_image_shape_);
this->normalize_op_.Run(&resize_img, this->mean_, this->scale_, this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
this->is_scale_); this->is_scale_);
norm_img_batch.push_back(resize_img); norm_img_batch.push_back(resize_img);
batch_width = max(resize_img.cols, batch_width); batch_width = max(resize_img.cols, batch_width);
} }
std::vector<float> input(batch_num * 3 * 32 * batch_width, 0.0f); std::vector<float> input(batch_num * 3 * imgH * batch_width, 0.0f);
this->permute_op_.Run(norm_img_batch, input.data()); this->permute_op_.Run(norm_img_batch, input.data());
auto preprocess_end = std::chrono::steady_clock::now(); auto preprocess_end = std::chrono::steady_clock::now();
preprocess_diff += preprocess_end - preprocess_start; preprocess_diff += preprocess_end - preprocess_start;
// Inference. // Inference.
auto input_names = this->predictor_->GetInputNames(); auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputHandle(input_names[0]); auto input_t = this->predictor_->GetInputHandle(input_names[0]);
input_t->Reshape({batch_num, 3, 32, batch_width}); input_t->Reshape({batch_num, 3, imgH, batch_width});
auto inference_start = std::chrono::steady_clock::now(); auto inference_start = std::chrono::steady_clock::now();
input_t->CopyFromCpu(input.data()); input_t->CopyFromCpu(input.data());
this->predictor_->Run(); this->predictor_->Run();
...@@ -142,13 +144,14 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) { ...@@ -142,13 +144,14 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
precision = paddle_infer::Config::Precision::kInt8; precision = paddle_infer::Config::Precision::kInt8;
} }
config.EnableTensorRtEngine(1 << 20, 10, 3, precision, false, false); config.EnableTensorRtEngine(1 << 20, 10, 3, precision, false, false);
int imgH = this->rec_image_shape_[1];
int imgW = this->rec_image_shape_[2];
std::map<std::string, std::vector<int>> min_input_shape = { std::map<std::string, std::vector<int>> min_input_shape = {
{"x", {1, 3, 32, 10}}, {"lstm_0.tmp_0", {10, 1, 96}}}; {"x", {1, 3, imgH, 10}}, {"lstm_0.tmp_0", {10, 1, 96}}};
std::map<std::string, std::vector<int>> max_input_shape = { std::map<std::string, std::vector<int>> max_input_shape = {
{"x", {1, 3, 32, 2000}}, {"lstm_0.tmp_0", {1000, 1, 96}}}; {"x", {1, 3, imgH, 2000}}, {"lstm_0.tmp_0", {1000, 1, 96}}};
std::map<std::string, std::vector<int>> opt_input_shape = { std::map<std::string, std::vector<int>> opt_input_shape = {
{"x", {1, 3, 32, 320}}, {"lstm_0.tmp_0", {25, 1, 96}}}; {"x", {1, 3, imgH, imgW}}, {"lstm_0.tmp_0", {25, 1, 96}}};
config.SetTRTDynamicShapeInfo(min_input_shape, max_input_shape, config.SetTRTDynamicShapeInfo(min_input_shape, max_input_shape,
opt_input_shape); opt_input_shape);
......
...@@ -17,11 +17,9 @@ ...@@ -17,11 +17,9 @@
#include "auto_log/autolog.h" #include "auto_log/autolog.h"
#include <numeric> #include <numeric>
#include <sys/stat.h>
namespace PaddleOCR { namespace PaddleOCR {
PaddleOCR::PaddleOCR() { PPOCR::PPOCR() {
if (FLAGS_det) { if (FLAGS_det) {
this->detector_ = new DBDetector( this->detector_ = new DBDetector(
FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id, FLAGS_gpu_mem, FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id, FLAGS_gpu_mem,
...@@ -41,11 +39,12 @@ PaddleOCR::PaddleOCR() { ...@@ -41,11 +39,12 @@ PaddleOCR::PaddleOCR() {
this->recognizer_ = new CRNNRecognizer( this->recognizer_ = new CRNNRecognizer(
FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id, FLAGS_gpu_mem, FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id, FLAGS_gpu_mem,
FLAGS_cpu_threads, FLAGS_enable_mkldnn, FLAGS_rec_char_dict_path, FLAGS_cpu_threads, FLAGS_enable_mkldnn, FLAGS_rec_char_dict_path,
FLAGS_use_tensorrt, FLAGS_precision, FLAGS_rec_batch_num); FLAGS_use_tensorrt, FLAGS_precision, FLAGS_rec_batch_num,
FLAGS_rec_img_h, FLAGS_rec_img_w);
} }
}; };
void PaddleOCR::det(cv::Mat img, std::vector<OCRPredictResult> &ocr_results, void PPOCR::det(cv::Mat img, std::vector<OCRPredictResult> &ocr_results,
std::vector<double> &times) { std::vector<double> &times) {
std::vector<std::vector<std::vector<int>>> boxes; std::vector<std::vector<std::vector<int>>> boxes;
std::vector<double> det_times; std::vector<double> det_times;
...@@ -63,7 +62,7 @@ void PaddleOCR::det(cv::Mat img, std::vector<OCRPredictResult> &ocr_results, ...@@ -63,7 +62,7 @@ void PaddleOCR::det(cv::Mat img, std::vector<OCRPredictResult> &ocr_results,
times[2] += det_times[2]; times[2] += det_times[2];
} }
void PaddleOCR::rec(std::vector<cv::Mat> img_list, void PPOCR::rec(std::vector<cv::Mat> img_list,
std::vector<OCRPredictResult> &ocr_results, std::vector<OCRPredictResult> &ocr_results,
std::vector<double> &times) { std::vector<double> &times) {
std::vector<std::string> rec_texts(img_list.size(), ""); std::vector<std::string> rec_texts(img_list.size(), "");
...@@ -80,7 +79,7 @@ void PaddleOCR::rec(std::vector<cv::Mat> img_list, ...@@ -80,7 +79,7 @@ void PaddleOCR::rec(std::vector<cv::Mat> img_list,
times[2] += rec_times[2]; times[2] += rec_times[2];
} }
void PaddleOCR::cls(std::vector<cv::Mat> img_list, void PPOCR::cls(std::vector<cv::Mat> img_list,
std::vector<OCRPredictResult> &ocr_results, std::vector<OCRPredictResult> &ocr_results,
std::vector<double> &times) { std::vector<double> &times) {
std::vector<int> cls_labels(img_list.size(), 0); std::vector<int> cls_labels(img_list.size(), 0);
...@@ -98,7 +97,7 @@ void PaddleOCR::cls(std::vector<cv::Mat> img_list, ...@@ -98,7 +97,7 @@ void PaddleOCR::cls(std::vector<cv::Mat> img_list,
} }
std::vector<std::vector<OCRPredictResult>> std::vector<std::vector<OCRPredictResult>>
PaddleOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec, PPOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec,
bool cls) { bool cls) {
std::vector<double> time_info_det = {0, 0, 0}; std::vector<double> time_info_det = {0, 0, 0};
std::vector<double> time_info_rec = {0, 0, 0}; std::vector<double> time_info_rec = {0, 0, 0};
...@@ -139,7 +138,7 @@ PaddleOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec, ...@@ -139,7 +138,7 @@ PaddleOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec,
} }
} else { } else {
if (!Utility::PathExists(FLAGS_output) && FLAGS_det) { if (!Utility::PathExists(FLAGS_output) && FLAGS_det) {
mkdir(FLAGS_output.c_str(), 0777); Utility::CreateDir(FLAGS_output);
} }
for (int i = 0; i < cv_all_img_names.size(); ++i) { for (int i = 0; i < cv_all_img_names.size(); ++i) {
...@@ -188,8 +187,7 @@ PaddleOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec, ...@@ -188,8 +187,7 @@ PaddleOCR::ocr(std::vector<cv::String> cv_all_img_names, bool det, bool rec,
return ocr_results; return ocr_results;
} // namespace PaddleOCR } // namespace PaddleOCR
void PaddleOCR::log(std::vector<double> &det_times, void PPOCR::log(std::vector<double> &det_times, std::vector<double> &rec_times,
std::vector<double> &rec_times,
std::vector<double> &cls_times, int img_num) { std::vector<double> &cls_times, int img_num) {
if (det_times[0] + det_times[1] + det_times[2] > 0) { if (det_times[0] + det_times[1] + det_times[2] > 0) {
AutoLogger autolog_det("ocr_det", FLAGS_use_gpu, FLAGS_use_tensorrt, AutoLogger autolog_det("ocr_det", FLAGS_use_gpu, FLAGS_use_tensorrt,
...@@ -212,7 +210,7 @@ void PaddleOCR::log(std::vector<double> &det_times, ...@@ -212,7 +210,7 @@ void PaddleOCR::log(std::vector<double> &det_times,
autolog_cls.report(); autolog_cls.report();
} }
} }
PaddleOCR::~PaddleOCR() { PPOCR::~PPOCR() {
if (this->detector_ != nullptr) { if (this->detector_ != nullptr) {
delete this->detector_; delete this->detector_;
} }
......
...@@ -41,12 +41,13 @@ void Permute::Run(const cv::Mat *im, float *data) { ...@@ -41,12 +41,13 @@ void Permute::Run(const cv::Mat *im, float *data) {
} }
void PermuteBatch::Run(const std::vector<cv::Mat> imgs, float *data) { void PermuteBatch::Run(const std::vector<cv::Mat> imgs, float *data) {
for (int j = 0; j < imgs.size(); j ++){ for (int j = 0; j < imgs.size(); j++) {
int rh = imgs[j].rows; int rh = imgs[j].rows;
int rw = imgs[j].cols; int rw = imgs[j].cols;
int rc = imgs[j].channels(); int rc = imgs[j].channels();
for (int i = 0; i < rc; ++i) { for (int i = 0; i < rc; ++i) {
cv::extractChannel(imgs[j], cv::Mat(rh, rw, CV_32FC1, data + (j * rc + i) * rh * rw), i); cv::extractChannel(
imgs[j], cv::Mat(rh, rw, CV_32FC1, data + (j * rc + i) * rh * rw), i);
} }
} }
} }
...@@ -102,7 +103,7 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio, ...@@ -102,7 +103,7 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio,
imgH = rec_image_shape[1]; imgH = rec_image_shape[1];
imgW = rec_image_shape[2]; imgW = rec_image_shape[2];
imgW = int(32 * wh_ratio); imgW = int(imgH * wh_ratio);
float ratio = float(img.cols) / float(img.rows); float ratio = float(img.cols) / float(img.rows);
int resize_w, resize_h; int resize_w, resize_h;
......
...@@ -16,10 +16,15 @@ ...@@ -16,10 +16,15 @@
#include <include/utility.h> #include <include/utility.h>
#include <iostream> #include <iostream>
#include <ostream> #include <ostream>
#include <sys/stat.h>
#include <sys/types.h>
#include <vector> #include <vector>
#ifdef _WIN32
#include <direct.h>
#else
#include <sys/stat.h>
#endif
namespace PaddleOCR { namespace PaddleOCR {
std::vector<std::string> Utility::ReadDict(const std::string &path) { std::vector<std::string> Utility::ReadDict(const std::string &path) {
...@@ -206,6 +211,14 @@ bool Utility::PathExists(const std::string &path) { ...@@ -206,6 +211,14 @@ bool Utility::PathExists(const std::string &path) {
#endif // !_WIN32 #endif // !_WIN32
} }
void Utility::CreateDir(const std::string &path) {
#ifdef _WIN32
_mkdir(path.c_str());
#else
mkdir(path.c_str(), 0777);
#endif // !_WIN32
}
void Utility::print_result(const std::vector<OCRPredictResult> &ocr_result) { void Utility::print_result(const std::vector<OCRPredictResult> &ocr_result) {
for (int i = 0; i < ocr_result.size(); i++) { for (int i = 0; i < ocr_result.size(); i++) {
std::cout << i << "\t"; std::cout << i << "\t";
......
- [端侧部署](#端侧部署) - [Tutorial of PaddleOCR Mobile deployment](#tutorial-of-paddleocr-mobile-deployment)
- [1. 准备环境](#1-准备环境) - [1. Preparation](#1-preparation)
- [运行准备](#运行准备) - [Preparation environment](#preparation-environment)
- [1.1 准备交叉编译环境](#11-准备交叉编译环境) - [1.1 Prepare the cross-compilation environment](#11-prepare-the-cross-compilation-environment)
- [1.2 准备预测库](#12-准备预测库) - [1.2 Prepare Paddle-Lite library](#12-prepare-paddle-lite-library)
- [2 开始运行](#2-开始运行) - [2 Run](#2-run)
- [2.1 模型优化](#21-模型优化) - [2.1 Inference Model Optimization](#21-inference-model-optimization)
- [2.2 与手机联调](#22-与手机联调) - [2.2 Run optimized model on Phone](#22-run-optimized-model-on-phone)
- [注意:](#注意) - [注意:](#注意)
- [FAQ](#faq) - [FAQ](#faq)
# 端侧部署 # Tutorial of PaddleOCR Mobile deployment
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleOCR超轻量中文检测、识别模型的详细步骤。 This tutorial will introduce how to use [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy PaddleOCR ultra-lightweight Chinese and English detection models on mobile phones.
Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 paddle-lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference capabilities for mobile phones and IoT, and extensively integrates cross-platform hardware to provide lightweight deployment solutions for end-side deployment issues.
## 1. Preparation
## 1. 准备环境 ### Preparation environment
### 运行准备 - Computer (for Compiling Paddle Lite)
- 电脑(编译Paddle Lite) - Mobile phone (arm7 or arm8)
- 安卓手机(armv7或armv8)
### 1.1 准备交叉编译环境 ### 1.1 Prepare the cross-compilation environment
交叉编译环境用于编译 Paddle Lite 和 PaddleOCR 的C++ demo。 The cross-compilation environment is used to compile C++ demos of Paddle Lite and PaddleOCR.
支持多种开发环境,不同开发环境的编译流程请参考对应文档。 Supports multiple development environments.
For the compilation process of different development environments, please refer to the corresponding documents.
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) 1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) 2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) 3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
### 1.2 准备预测库 ### 1.2 Prepare Paddle-Lite library
预测库有两种获取方式 There are two ways to obtain the Paddle-Lite library
- 1. 直接下载,预测库下载链接如下 - 1. Download directly, the download link of the Paddle-Lite library is as follows
| 平台 | 预测库下载链接 | | Platform | Paddle-Lite library download link |
|---|---| |---|---|
|Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)| |Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
|IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)| |IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
注:1. 上述预测库为PaddleLite 2.10分支编译得到,有关PaddleLite 2.10 详细信息可参考 [链接](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10) 。 Note: 1. The above Paddle-Lite library is compiled from the Paddle-Lite 2.10 branch. For more information about Paddle-Lite 2.10, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10).
- 2. [推荐]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下 - 2. [Recommended] Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows
``` ```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite cd Paddle-Lite
# 切换到Paddle-Lite release/v2.10 稳定分支 # Switch to Paddle-Lite release/v2.10 stable branch
git checkout release/v2.10 git checkout release/v2.10
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON ./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
``` ```
注意:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8。 Note: When compiling Paddle-Lite to obtain the Paddle-Lite library, you need to enable the two options `--with_cv=ON --with_extra=ON`; `--arch` specifies the `arm` architecture version, which is set to armv8 here.
更多编译命令介绍请参考 [链接](https://paddle-lite.readthedocs.io/zh/release-v2.10_a/source_compile/linux_x86_compile_android.html) For more compilation commands, please refer to this [link](https://paddle-lite.readthedocs.io/zh/release-v2.10_a/source_compile/linux_x86_compile_android.html)
If you download the Paddle-Lite library directly and decompress it, you get the `inference_lite_lib.android.armv8/` folder; the library obtained by compiling Paddle-Lite is located in the `Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/` folder.
直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8/`文件夹,通过编译Paddle-Lite得到的预测库位于 The structure of the prediction library is as follows:
`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。
预测库的文件目录如下:
``` ```
inference_lite_lib.android.armv8/ inference_lite_lib.android.armv8/
|-- cxx C++ 预测库和头文件 |-- cxx C++ prediction library and header files
| |-- include C++ 头文件 | |-- include C++ header files
| | |-- paddle_api.h | | |-- paddle_api.h
| | |-- paddle_image_preprocess.h | | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h | | |-- paddle_lite_factory_helper.h
...@@ -69,219 +72,215 @@ inference_lite_lib.android.armv8/ ...@@ -69,219 +72,215 @@ inference_lite_lib.android.armv8/
| | |-- paddle_use_kernels.h | | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h | | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h | | `-- paddle_use_passes.h
| `-- lib C++预测库 | `-- lib C++ library
| |-- libpaddle_api_light_bundled.a C++静态库 | |-- libpaddle_api_light_bundled.a C++ static library
| `-- libpaddle_light_api_shared.so C++动态库 | `-- libpaddle_light_api_shared.so C++ dynamic library
|-- java Java预测库 |-- java Java library
| |-- jar | |-- jar
| | `-- PaddlePredictor.jar | | `-- PaddlePredictor.jar
| |-- so | |-- so
| | `-- libpaddle_lite_jni.so | | `-- libpaddle_lite_jni.so
| `-- src | `-- src
|-- demo C++和Java示例代码 |-- demo C++ and Java demo
| |-- cxx C++ 预测库demo | |-- cxx C++ demo
| `-- java Java 预测库demo | `-- java Java demo
``` ```
## 2 开始运行 ## 2 Run
### 2.1 模型优化 ### 2.1 Inference Model Optimization
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-lite的opt工具可以自动 Paddle Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. To make the optimization process more convenient and easy to use, Paddle Lite provides the opt tool to automatically complete the optimization steps and output a lightweight, optimal executable model.
对inference模型进行优化,优化后的模型更轻量,模型运行速度更快。
如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。 If you have prepared the model file ending in .nb, you can skip this step.
下述表格中也提供了一系列中文移动端模型: The following table also provides a series of models that can be deployed on mobile phones to recognize Chinese. You can directly download the optimized model.
|模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
|---|---|---|---|---|---|---| |---|---|---|---|---|---|---|
|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10| |PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10|
|PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.6M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10| |PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.6M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10|
如果直接使用上述表格中的模型进行部署,可略过下述步骤,直接阅读 [2.2节](#2.2与手机联调) If you directly use the model in the above table for deployment, you can skip the following steps and directly read [Section 2.2](#2.2-Run-optimized-model-on-Phone).
如果要部署的模型不在上述表格中,则需要按照如下步骤获得优化后的模型。 If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
模型优化需要Paddle-Lite的opt可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下: The `opt` tool can be obtained by compiling Paddle Lite.
``` ```
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite cd Paddle-Lite
git checkout release/v2.10 git checkout release/v2.10
# 启动编译
./lite/tools/build.sh build_optimize_tool ./lite/tools/build.sh build_optimize_tool
``` ```
编译完成后,opt文件位于`build.opt/lite/api/`下,可通过如下方式查看opt的运行选项和使用方式; After the compilation is complete, the opt file is located under `build.opt/lite/api/`. You can view the operating options and usage of opt as follows:
``` ```
cd build.opt/lite/api/ cd build.opt/lite/api/
./opt ./opt
``` ```
|选项|说明| |Options|Description|
|---|---| |---|---|
|--model_dir|待优化的PaddlePaddle模型(非combined形式)的路径| |--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)|
|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径| |--model_file|The network structure file path of the PaddlePaddle model (combined form) to be optimized|
|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径| |--param_file|The weight file path of the PaddlePaddle model (combined form) to be optimized|
|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现。若您需要在mobile端执行模型预测,请将此选项设置为naive_buffer。默认为protobuf| |--optimize_out_type|Output model type, currently supports two types: protobuf and naive_buffer, among which naive_buffer is a more lightweight serialization/deserialization implementation. If you need to perform model prediction on the mobile side, please set this option to naive_buffer. The default is protobuf|
|--optimize_out|优化模型的输出路径| |--optimize_out|The output path of the optimized model|
|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm| |--valid_targets|The executable backend of the model, the default is arm. Currently it supports x86, arm, opencl, npu, xpu, multiple backends can be specified at the same time (separated by spaces), and Model Optimize Tool will automatically select the best method. If you need to support Huawei NPU (DaVinci architecture NPU equipped with Kirin 810/990 Soc), it should be set to npu, arm|
|--record_tailoring_info|当使用 根据模型裁剪库文件 功能时,则设置该选项为true,以记录优化后模型含有的kernel和OP信息,默认为false| |--record_tailoring_info|When using the function of cutting library files according to the model, set this option to true to record the kernel and OP information contained in the optimized model. The default is false|
`--model_dir`适用于待优化的模型是非combined方式,PaddleOCR的inference模型是combined方式,即模型结构和模型参数使用单独一个文件存储。 `--model_dir` is suitable for the non-combined mode of the model to be optimized, and the inference model of PaddleOCR is the combined mode, that is, the model structure and model parameters are stored in a single file.
下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。 The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to introduce the use of the compiled opt file to complete the conversion of the inference model to the Paddle-Lite optimized model
``` ```
# 【推荐】 下载 PP-OCRv2版本的中英文 inference模型 # [Recommended] Download the Chinese and English inference models of PP-OCRv2
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_slim_infer.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_slim_infer.tar
# 转换检测模型 # Convert detection model
./opt --model_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_det_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer ./opt --model_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_det_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# 转换识别模型 # Convert recognition model
./opt --model_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_rec_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer ./opt --model_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_rec_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# 转换方向分类器模型 # Convert angle classifier model
./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer ./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
``` ```
转换成功后,inference模型目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。 After the conversion is successful, there will be more files ending with `.nb` in the inference model directory, which is the successfully converted model file.
注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 工具的输入模型是paddle保存的inference模型
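As a quick sanity check, the optimized models should now sit in the working directory; the file names follow the `--optimize_out` values used above, with the `.nb` suffix appended by opt:

```shell
ls -lh ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb
```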
<a name="2.2与手机联调"></a> <a name="2.2-Run-optimized-model-on-Phone"></a>
### 2.2 与手机联调 ### 2.2 Run optimized model on Phone
首先需要进行一些准备工作。 Some preparatory work is required first.
1. 准备一台arm8的安卓手机,如果编译的预测库和opt文件是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI = arm7` 1. Prepare an Android phone with arm8. If the compiled prediction library and opt file are armv7, you need an arm7 phone and modify ARM_ABI = arm7 in the Makefile.
2. 打开手机的USB调试选项,选择文件传输模式,连接电脑。 2. Make sure the phone is connected to the computer, open the USB debugging option of the phone, and select the file transfer mode.
3. 电脑上安装adb工具,用于调试。 adb安装方式如下: 3. Install the adb tool on the computer.
3.1. MAC电脑安装ADB: 3.1. Install ADB for MAC:
``` ```
brew cask install android-platform-tools brew cask install android-platform-tools
``` ```
3.2. Linux安装ADB 3.2. Install ADB for Linux
``` ```
sudo apt update sudo apt update
sudo apt install -y wget adb sudo apt install -y wget adb
``` ```
3.3. Window安装ADB 3.3. Install ADB for windows
win上安装需要去谷歌的安卓平台下载adb软件包进行安装:[链接](https://developer.android.com/studio) To install on win, you need to go to Google's Android platform to download the adb package for installation:[link](https://developer.android.com/studio)
打开终端,手机连接电脑,在终端中输入 Verify whether adb is installed successfully
``` ```
adb devices adb devices
``` ```
如果有device输出,则表示安装成功 If there is device output, it means the installation is successful
``` ```
List of devices attached List of devices attached
744be294 device 744be294 device
``` ```
4. 准备优化后的模型、预测库文件、测试图像和使用的字典文件。 4. Prepare optimized models, prediction library files, test images and dictionary files used.
``` ```
git clone https://github.com/PaddlePaddle/PaddleOCR.git git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR/deploy/lite/ cd PaddleOCR/deploy/lite/
# 运行prepare.sh,准备预测库文件、测试图像和使用的字典文件,并放置在预测库中的demo/cxx/ocr文件夹下 # run prepare.sh
sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8 sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
# 进入OCR demo的工作目录 #
cd /{lite prediction library path}/inference_lite_lib.android.armv8/ cd /{lite prediction library path}/inference_lite_lib.android.armv8/
cd demo/cxx/ocr/ cd demo/cxx/ocr/
# 将C++预测动态库so文件复制到debug文件夹中 # copy paddle-lite C++ .so file to debug/ directory
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
cd inference_lite_lib.android.armv8/demo/cxx/ocr/
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/ cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
``` ```
准备测试图像,以`PaddleOCR/doc/imgs/11.jpg`为例,将测试的图像复制到`demo/cxx/ocr/debug/`文件夹下。 Prepare the test image, taking `PaddleOCR/doc/imgs/11.jpg` as an example, and copy it to the `demo/cxx/ocr/debug/` folder. Prepare the model files optimized by the lite opt tool, e.g. `ch_PP-OCRv2_det_slim_opt.nb`, `ch_PP-OCRv2_rec_slim_opt.nb` and `ch_ppocr_mobile_v2.0_cls_slim_opt.nb`, and place them under the `demo/cxx/ocr/debug/` folder.
准备lite opt工具优化后的模型文件,比如使用`ch_PP-OCRv2_det_slim_opt.nb, ch_PP-OCRv2_rec_slim_opt.nb, ch_ppocr_mobile_v2.0_cls_slim_opt.nb`,模型文件放置在`demo/cxx/ocr/debug/`文件夹下。
执行完成后,ocr文件夹下将有如下文件格式: The structure of the OCR demo is as follows after the above command is executed:
``` ```
demo/cxx/ocr/ demo/cxx/ocr/
|-- debug/ |-- debug/
| |--ch_PP-OCRv2_det_slim_opt.nb 优化后的检测模型文件 | |--ch_PP-OCRv2_det_slim_opt.nb Detection model
| |--ch_PP-OCRv2_rec_slim_opt.nb 优化后的识别模型文件 | |--ch_PP-OCRv2_rec_slim_opt.nb Recognition model
| |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb 优化后的文字方向分类器模型文件 | |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb Text direction classification model
| |--11.jpg 待测试图像 | |--11.jpg Image for OCR
| |--ppocr_keys_v1.txt 中文字典文件 | |--ppocr_keys_v1.txt Dictionary file
| |--libpaddle_light_api_shared.so C++预测库文件 | |--libpaddle_light_api_shared.so C++ .so file
| |--config.txt 超参数配置 | |--config.txt Config file
|-- config.txt 超参数配置 |-- config.txt Config file
|-- cls_process.cc 方向分类器的预处理和后处理文件 |-- cls_process.cc Pre-processing and post-processing files for the angle classifier
|-- cls_process.h |-- cls_process.h
|-- crnn_process.cc 识别模型CRNN的预处理和后处理文件 |-- crnn_process.cc Pre-processing and post-processing files for the CRNN model
|-- crnn_process.h |-- crnn_process.h
|-- db_post_process.cc 检测模型DB的后处理文件 |-- db_post_process.cc Pre-processing and post-processing files for the DB model
|-- db_post_process.h |-- db_post_process.h
|-- Makefile 编译文件 |-- Makefile
|-- ocr_db_crnn.cc C++预测源文件 |-- ocr_db_crnn.cc C++ main code
``` ```
#### 注意: #### 注意:
1. ppocr_keys_v1.txt是中文字典文件,如果使用的 nb 模型是英文数字或其他语言的模型,需要更换为对应语言的字典。 1. `ppocr_keys_v1.txt` is a Chinese dictionary file. If the nb model is used for English recognition or other language recognition, dictionary file should be replaced with a dictionary of the corresponding language. PaddleOCR provides a variety of dictionaries under ppocr/utils/, including:
PaddleOCR 在ppocr/utils/下存放了多种字典,包括:
``` ```
dict/french_dict.txt # 法语字典 dict/french_dict.txt # french
dict/german_dict.txt # 德语字典 dict/german_dict.txt # german
ic15_dict.txt # 英文字典 ic15_dict.txt # english
dict/japan_dict.txt # 日语字典 dict/japan_dict.txt # japan
dict/korean_dict.txt # 韩语字典 dict/korean_dict.txt # korean
ppocr_keys_v1.txt # 中文字典 ppocr_keys_v1.txt # chinese
...
``` ```
2. `config.txt` 包含了检测器、分类器的超参数,如下: 2. `config.txt` of the detector and classifier, as shown below:
``` ```
max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960 max_side_len 960 # Limit the maximum image height and width to 960
det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显 det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小 det_db_box_thresh 0.5 # DB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate
det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本 det_db_unclip_ratio 1.6 # Indicates the compactness of the text box, the smaller the value, the closer the text box to the text
use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1表示使用 use_direction_classify 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use
``` ```
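For instance, to enable the direction classifier before pushing the demo to the phone, the copy of `config.txt` under `debug/` can be edited in place; a hedged sketch using GNU sed (the key name comes from the table above):

```shell
sed -i 's/^use_direction_classify 0/use_direction_classify 1/' ./debug/config.txt
```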
5. 启动调试 5. Run Model on phone
上述步骤完成后就可以使用adb将文件push到手机上运行,步骤如下: After the above steps are completed, you can use adb to push the file to the phone to run, the steps are as follows:
``` ```
# 执行编译,得到可执行文件ocr_db_crnn, 第一次执行此命令会下载opencv等依赖库,下载完成后,需要再执行一次 # Execute the compilation and get the executable file ocr_db_crnn
# The first execution of this command will download dependent libraries such as opencv. After the download is complete, you need to execute it again
make -j make -j
# Move the compiled executable file to the debug folder
# 将编译的可执行文件移动到debug文件夹中
mv ocr_db_crnn ./debug/ mv ocr_db_crnn ./debug/
# 将debug文件夹push到手机上 # Push the debug folder to the phone
adb push debug /data/local/tmp/ adb push debug /data/local/tmp/
adb shell adb shell
cd /data/local/tmp/debug cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# 开始使用,ocr_db_crnn可执行文件的使用方式为: # The use of ocr_db_crnn is:
# ./ocr_db_crnn 检测模型文件 方向分类器模型文件 识别模型文件 测试图像路径 字典文件路径 # ./ocr_db_crnn Detection model file Orientation classifier model file Recognition model file Test image path Dictionary file path
./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt ./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt
``` ```
如果对代码做了修改,则需要重新编译并push到手机上。 If you modify the code, you need to recompile and push to the phone.
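A hedged sketch of the incremental loop after editing the demo sources (paths follow the steps above):

```shell
# rebuild, move the binary into the debug folder, and push only the changed file
make -j
mv ocr_db_crnn ./debug/
adb push ./debug/ocr_db_crnn /data/local/tmp/debug/
```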
运行效果如下: The outputs are as follows:
<div align="center"> <div align="center">
<img src="imgs/lite_demo.png" width="600"> <img src="imgs/lite_demo.png" width="600">
</div> </div>
## FAQ ## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 .nb 模型文件即可,同时要注意更新字典 Q1: What if I want to change the model, do I need to run it again according to the process?
A1: If you have performed the above steps, you only need to replace the .nb model files to complete the model replacement; remember to update the dictionary file as well if the language changes.
Q2:换一个图测试怎么做? Q2: How to test with another picture?
A2:替换debug下的.jpg测试图像为你想要测试的图像,adb push 到手机上即可 A2: Replace the .jpg test image under ./debug with the image you want to test, and run adb push to push new image to the phone.
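For example (the image name `my_test.jpg` is a placeholder for your own picture):

```shell
adb push ./debug/my_test.jpg /data/local/tmp/debug/
adb shell "cd /data/local/tmp/debug && export LD_LIBRARY_PATH=\$PWD:\$LD_LIBRARY_PATH && ./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./my_test.jpg ppocr_keys_v1.txt"
```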
Q3:如何封装到手机APP中? Q3: How to package it into the mobile APP?
A3:此demo旨在提供能在手机上运行OCR的核心算法部分,PaddleOCR/deploy/android_demo是将这个demo封装到手机app的示例,供参考 A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further, PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app for reference.
- [端侧部署](#端侧部署)
- [1. 准备环境](#1-准备环境)
- [运行准备](#运行准备)
- [1.1 准备交叉编译环境](#11-准备交叉编译环境)
- [1.2 准备预测库](#12-准备预测库)
- [2 开始运行](#2-开始运行)
- [2.1 模型优化](#21-模型优化)
- [2.2 与手机联调](#22-与手机联调)
- [注意:](#注意)
- [FAQ](#faq)
# 端侧部署
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleOCR超轻量中文检测、识别模型的详细步骤。
Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。
## 1. 准备环境
### 运行准备
- 电脑(编译Paddle Lite)
- 安卓手机(armv7或armv8)
### 1.1 准备交叉编译环境
交叉编译环境用于编译 Paddle Lite 和 PaddleOCR 的C++ demo。
支持多种开发环境,不同开发环境的编译流程请参考对应文档。
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
### 1.2 准备预测库
预测库有两种获取方式:
- 1. 直接下载,预测库下载链接如下:
| 平台 | 预测库下载链接 |
|---|---|
|Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
|IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
注:1. 上述预测库为PaddleLite 2.10分支编译得到,有关PaddleLite 2.10 详细信息可参考 [链接](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10) 。
- 2. [推荐]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
# 切换到Paddle-Lite release/v2.10 稳定分支
git checkout release/v2.10
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
```
注意:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,
更多编译命令
介绍请参考 [链接](https://paddle-lite.readthedocs.io/zh/release-v2.10_a/source_compile/linux_x86_compile_android.html)
直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8/`文件夹,通过编译Paddle-Lite得到的预测库位于
`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。
预测库的文件目录如下:
```
inference_lite_lib.android.armv8/
|-- cxx C++ 预测库和头文件
| |-- include C++ 头文件
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++预测库
| |-- libpaddle_api_light_bundled.a C++静态库
| `-- libpaddle_light_api_shared.so C++动态库
|-- java Java预测库
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++和Java示例代码
| |-- cxx C++ 预测库demo
| `-- java Java 预测库demo
```
## 2 开始运行
### 2.1 模型优化
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-lite的opt工具可以自动
对inference模型进行优化,优化后的模型更轻量,模型运行速度更快。
如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。
下述表格中也提供了一系列中文移动端模型:
|模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本|
|---|---|---|---|---|---|---|
|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10|
|PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.6M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10|
如果直接使用上述表格中的模型进行部署,可略过下述步骤,直接阅读 [2.2节](#2.2与手机联调)
如果要部署的模型不在上述表格中,则需要按照如下步骤获得优化后的模型。
模型优化需要Paddle-Lite的opt可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下:
```
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout release/v2.10
# 启动编译
./lite/tools/build.sh build_optimize_tool
```
编译完成后,opt文件位于`build.opt/lite/api/`下,可通过如下方式查看opt的运行选项和使用方式;
```
cd build.opt/lite/api/
./opt
```
|选项|说明|
|---|---|
|--model_dir|待优化的PaddlePaddle模型(非combined形式)的路径|
|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径|
|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径|
|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现。若您需要在mobile端执行模型预测,请将此选项设置为naive_buffer。默认为protobuf|
|--optimize_out|优化模型的输出路径|
|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm|
|--record_tailoring_info|当使用 根据模型裁剪库文件 功能时,则设置该选项为true,以记录优化后模型含有的kernel和OP信息,默认为false|
`--model_dir`适用于待优化的模型是非combined方式,PaddleOCR的inference模型是combined方式,即模型结构和模型参数使用单独一个文件存储。
下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。
```
# 【推荐】 下载 PP-OCRv2版本的中英文 inference模型
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_slim_infer.tar
# 转换检测模型
./opt --model_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_det_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# 转换识别模型
./opt --model_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_rec_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# 转换方向分类器模型
./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
```
转换成功后,inference模型目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。
注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 工具的输入模型是paddle保存的inference模型
<a name="2.2与手机联调"></a>
### 2.2 与手机联调
首先需要进行一些准备工作。
1. 准备一台arm8的安卓手机,如果编译的预测库和opt文件是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI = arm7`
2. 打开手机的USB调试选项,选择文件传输模式,连接电脑。
3. 电脑上安装adb工具,用于调试。 adb安装方式如下:
3.1. MAC电脑安装ADB:
```
brew cask install android-platform-tools
```
3.2. Linux安装ADB
```
sudo apt update
sudo apt install -y wget adb
```
3.3. Window安装ADB
win上安装需要去谷歌的安卓平台下载adb软件包进行安装:[链接](https://developer.android.com/studio)
打开终端,手机连接电脑,在终端中输入
```
adb devices
```
如果有device输出,则表示安装成功。
```
List of devices attached
744be294 device
```
4. 准备优化后的模型、预测库文件、测试图像和使用的字典文件。
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR/deploy/lite/
# 运行prepare.sh,准备预测库文件、测试图像和使用的字典文件,并放置在预测库中的demo/cxx/ocr文件夹下
sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
# 进入OCR demo的工作目录
cd /{lite prediction library path}/inference_lite_lib.android.armv8/
cd demo/cxx/ocr/
# 将C++预测动态库so文件复制到debug文件夹中
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
```
准备测试图像,以`PaddleOCR/doc/imgs/11.jpg`为例,将测试的图像复制到`demo/cxx/ocr/debug/`文件夹下。
准备lite opt工具优化后的模型文件,比如使用`ch_PP-OCRv2_det_slim_opt.nb, ch_PP-OCRv2_rec_slim_opt.nb, ch_ppocr_mobile_v2.0_cls_slim_opt.nb`,模型文件放置在`demo/cxx/ocr/debug/`文件夹下。
执行完成后,ocr文件夹下将有如下文件格式:
```
demo/cxx/ocr/
|-- debug/
| |--ch_PP-OCRv2_det_slim_opt.nb 优化后的检测模型文件
| |--ch_PP-OCRv2_rec_slim_opt.nb 优化后的识别模型文件
| |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb 优化后的文字方向分类器模型文件
| |--11.jpg 待测试图像
| |--ppocr_keys_v1.txt 中文字典文件
| |--libpaddle_light_api_shared.so C++预测库文件
| |--config.txt 超参数配置
|-- config.txt 超参数配置
|-- cls_process.cc 方向分类器的预处理和后处理文件
|-- cls_process.h
|-- crnn_process.cc 识别模型CRNN的预处理和后处理文件
|-- crnn_process.h
|-- db_post_process.cc 检测模型DB的后处理文件
|-- db_post_process.h
|-- Makefile 编译文件
|-- ocr_db_crnn.cc C++预测源文件
```
#### 注意:
1. ppocr_keys_v1.txt是中文字典文件,如果使用的 nb 模型是英文数字或其他语言的模型,需要更换为对应语言的字典。
PaddleOCR 在ppocr/utils/下存放了多种字典,包括:
```
dict/french_dict.txt # 法语字典
dict/german_dict.txt # 德语字典
ic15_dict.txt # 英文字典
dict/japan_dict.txt # 日语字典
dict/korean_dict.txt # 韩语字典
ppocr_keys_v1.txt # 中文字典
...
```
2. `config.txt` 包含了检测器、分类器的超参数,如下:
```
max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960
det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显
det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小
det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本
use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1表示使用
```
5. 启动调试
上述步骤完成后就可以使用adb将文件push到手机上运行,步骤如下:
```
# 执行编译,得到可执行文件ocr_db_crnn, 第一次执行此命令会下载opencv等依赖库,下载完成后,需要再执行一次
make -j
# 将编译的可执行文件移动到debug文件夹中
mv ocr_db_crnn ./debug/
# 将debug文件夹push到手机上
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# 开始使用,ocr_db_crnn可执行文件的使用方式为:
# ./ocr_db_crnn 检测模型文件 方向分类器模型文件 识别模型文件 测试图像路径 字典文件路径
./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt
```
如果对代码做了修改,则需要重新编译并push到手机上。
运行效果如下:
<div align="center">
<img src="imgs/lite_demo.png" width="600">
</div>
## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 .nb 模型文件即可,同时要注意更新字典
Q2:换一个图测试怎么做?
A2:替换debug下的.jpg测试图像为你想要测试的图像,adb push 到手机上即可
Q3:如何封装到手机APP中?
A3:此demo旨在提供能在手机上运行OCR的核心算法部分,PaddleOCR/deploy/android_demo是将这个demo封装到手机app的示例,供参考
- [Tutorial of PaddleOCR Mobile deployment](#tutorial-of-paddleocr-mobile-deployment)
- [1. Preparation](#1-preparation)
- [Preparation environment](#preparation-environment)
- [1.1 Prepare the cross-compilation environment](#11-prepare-the-cross-compilation-environment)
- [1.2 Prepare Paddle-Lite library](#12-prepare-paddle-lite-library)
- [2 Run](#2-run)
- [2.1 Inference Model Optimization](#21-inference-model-optimization)
- [2.2 Run optimized model on Phone](#22-run-optimized-model-on-phone)
- [注意:](#注意)
- [FAQ](#faq)
# Tutorial of PaddleOCR Mobile deployment
This tutorial will introduce how to use [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy PaddleOCR ultra-lightweight Chinese and English detection models on mobile phones.
Paddle Lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference capabilities for mobile phones and IoT devices, and extensively integrates cross-platform hardware to provide lightweight deployment solutions for on-device deployment.
## 1. Preparation
### Preparation environment
- Computer (for Compiling Paddle Lite)
- Mobile phone (arm7 or arm8)
### 1.1 Prepare the cross-compilation environment
The cross-compilation environment is used to compile C++ demos of Paddle Lite and PaddleOCR.
Supports multiple development environments.
For the compilation process of different development environments, please refer to the corresponding documents.
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
### 1.2 Prepare Paddle-Lite library
There are two ways to obtain the Paddle-Lite library:
- 1. Download directly, the download link of the Paddle-Lite library is as follows:
| Platform | Paddle-Lite library download link |
|---|---|
|Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
|IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
Note: 1. The above Paddle-Lite library is compiled from the Paddle-Lite 2.10 branch. For more information about Paddle-Lite 2.10, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10).
- 2. [Recommended] Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows:
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
# Switch to Paddle-Lite release/v2.10 stable branch
git checkout release/v2.10
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
```
Note: When compiling Paddle-Lite to obtain the Paddle-Lite library, you need to enable the two options `--with_cv=ON --with_extra=ON`; `--arch` specifies the `arm` architecture version, which is set to armv8 here.
For more compilation commands, please refer to this [link](https://paddle-lite.readthedocs.io/zh/release-v2.10_a/source_compile/linux_x86_compile_android.html)
If you download the Paddle-Lite library directly and decompress it, you get the `inference_lite_lib.android.armv8/` folder; the library obtained by compiling Paddle-Lite is located in the `Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/` folder.
The structure of the prediction library is as follows:
```
inference_lite_lib.android.armv8/
|-- cxx C++ prediction library and header files
| |-- include C++ header files
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++ library
| |-- libpaddle_api_light_bundled.a C++ static library
| `-- libpaddle_light_api_shared.so C++ dynamic library
|-- java Java library
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++ and Java demo
| |-- cxx C++ demo
| `-- java Java demo
```
## 2 Run
### 2.1 Inference Model Optimization
Paddle Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. To make the optimization process more convenient and easy to use, Paddle Lite provides the opt tool to automatically complete the optimization steps and output a lightweight, optimal executable model.
If you have prepared the model file ending in .nb, you can skip this step.
The following table also provides a series of models that can be deployed on mobile phones to recognize Chinese. You can directly download the optimized model.
|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
|---|---|---|---|---|---|---|
|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10|
|PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.6M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10|
If you directly use the model in the above table for deployment, you can skip the following steps and directly read [Section 2.2](#2.2-Run-optimized-model-on-Phone).
If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
The `opt` tool can be obtained by compiling Paddle Lite.
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout release/v2.10
./lite/tools/build.sh build_optimize_tool
```
After the compilation is complete, the `opt` binary is located under `build.opt/lite/api/`. You can view its options and usage as follows:
```
cd build.opt/lite/api/
./opt
```
|Options|Description|
|---|---|
|--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)|
|--model_file|The network structure file path of the PaddlePaddle model (combined form) to be optimized|
|--param_file|The weight file path of the PaddlePaddle model (combined form) to be optimized|
|--optimize_out_type|Output model type, currently supports two types: protobuf and naive_buffer, among which naive_buffer is a more lightweight serialization/deserialization implementation. If you need to perform model prediction on the mobile side, please set this option to naive_buffer. The default is protobuf|
|--optimize_out|The output path of the optimized model|
|--valid_targets|The executable backend of the model, the default is arm. Currently it supports x86, arm, opencl, npu, xpu, multiple backends can be specified at the same time (separated by spaces), and Model Optimize Tool will automatically select the best method. If you need to support Huawei NPU (DaVinci architecture NPU equipped with Kirin 810/990 Soc), it should be set to npu, arm|
|--record_tailoring_info|When using the function of cutting library files according to the model, set this option to true to record the kernel and OP information contained in the optimized model. The default is false|
`--model_dir` applies to models saved in the non-combined format. The inference models of PaddleOCR use the combined format, that is, the model structure and the model parameters are stored in two separate files (`inference.pdmodel` and `inference.pdiparams`), so `--model_file` and `--param_file` should be specified instead.
The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to introduce the use of the compiled opt file to complete the conversion of the inference model to the Paddle-Lite optimized model
```
# [Recommended] Download the Chinese and English inference models of PP-OCRv2
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_slim_infer.tar
# Convert detection model
./opt --model_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_det_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_det_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# Convert recognition model
./opt --model_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdmodel --param_file=./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv2_rec_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
# Convert angle classifier model
./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm --optimize_out_type=naive_buffer
```
After the conversion succeeds, new files ending with `.nb` will appear in the current directory, alongside the inference model directories; these are the converted Paddle-Lite model files.
<a name="2.2-Run-optimized-model-on-Phone"></a>
### 2.2 Run optimized model on Phone
Some preparatory work is required first.
1. Prepare an Android phone with an armv8 (arm64) CPU. If the compiled prediction library and opt file are built for armv7, you need an armv7 phone instead and must change `ARM_ABI = arm7` in the Makefile.
2. Make sure the phone is connected to the computer, open the USB debugging option of the phone, and select the file transfer mode.
3. Install the adb tool on the computer.
3.1. Install ADB for MAC:
```
brew cask install android-platform-tools
```
3.2. Install ADB for Linux
```
sudo apt update
sudo apt install -y wget adb
```
3.3. Install ADB for Windows
To install adb on Windows, download the Android SDK Platform Tools from Google's Android developer site: [link](https://developer.android.com/studio)
Verify whether adb is installed successfully
```
adb devices
```
If a device is listed in the output, the installation is successful.
```
List of devices attached
744be294 device
```
4. Prepare optimized models, prediction library files, test images and dictionary files used.
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR/deploy/lite/
# run prepare.sh
sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
# Enter the OCR demo directory in the prediction library
cd /{lite prediction library path}/inference_lite_lib.android.armv8/
cd demo/cxx/ocr/
# copy paddle-lite C++ .so file to debug/ directory
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
```
Prepare the test image, taking `PaddleOCR/doc/imgs/11.jpg` as an example, and copy it into the `demo/cxx/ocr/debug/` folder. Also prepare the model files optimized by the `opt` tool in section 2.1 (e.g. `ch_PP-OCRv2_det_slim_opt.nb`, `ch_PP-OCRv2_rec_slim_opt.nb`, `ch_ppocr_mobile_v2.0_cls_slim_opt.nb`) and place them under the same `debug/` folder.
The structure of the OCR demo is as follows after the above command is executed:
```
demo/cxx/ocr/
|-- debug/
| |--ch_PP-OCRv2_det_slim_opt.nb Detection model
| |--ch_PP-OCRv2_rec_slim_opt.nb Recognition model
| |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb Text direction classification model
| |--11.jpg Image for OCR
| |--ppocr_keys_v1.txt Dictionary file
| |--libpaddle_light_api_shared.so C++ .so file
| |--config.txt Config file
|-- config.txt Config file
|-- cls_process.cc Pre-processing and post-processing files for the angle classifier
|-- cls_process.h
|-- crnn_process.cc Pre-processing and post-processing files for the CRNN model
|-- crnn_process.h
|-- db_post_process.cc Pre-processing and post-processing files for the DB model
|-- db_post_process.h
|-- Makefile
|-- ocr_db_crnn.cc C++ main code
```
#### Note:
1. `ppocr_keys_v1.txt` is a Chinese dictionary file. If the nb model is used for English recognition or recognition of other languages, the dictionary file should be replaced with one for the corresponding language. PaddleOCR provides a variety of dictionaries under `ppocr/utils/`, including:
```
dict/french_dict.txt # french
dict/german_dict.txt # german
ic15_dict.txt # english
dict/japan_dict.txt # japan
dict/korean_dict.txt # korean
ppocr_keys_v1.txt # chinese
```
2. `config.txt` contains the hyper-parameters of the detector and classifier, as shown below (a small parsing sketch follows the block):
```
max_side_len 960         # Limit the longer side of the input image to 960
det_db_thresh 0.3        # Threshold used to binarize the probability map predicted by DB; values around 0.3 have no obvious effect on the result
det_db_box_thresh 0.5    # DB post-processing box-filtering threshold; lower it if some boxes are missed
det_db_unclip_ratio 1.6  # Controls how tightly the text box wraps the text; the smaller the value, the tighter the box
use_direction_classify 0 # Whether to use the direction classifier: 0 means no, 1 means yes
```
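The demo's C++ code (`ocr_db_crnn.cc`) parses this file itself; purely for illustration, the equivalent parsing logic looks roughly like the following Python sketch (the file path is an assumption, pointing at the `config.txt` in the `debug/` folder):
```python
# Illustrative sketch only: config.txt is a list of space-separated key/value pairs,
# optionally followed by a "#" comment, which the demo reads into a key -> value map.
def load_config(path="debug/config.txt"):
    config = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
            if not line:
                continue
            key, value = line.split(None, 1)
            config[key] = value.strip()
    return config

print(load_config())  # e.g. {'max_side_len': '960', 'det_db_thresh': '0.3', ...}
```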
5. Run the model on the phone
After the above steps are completed, you can use adb to push the debug folder to the phone and run the demo. The steps are as follows:
```
# Execute the compilation and get the executable file ocr_db_crnn
# The first execution of this command will download dependent libraries such as opencv. After the download is complete, you need to execute it again
make -j
# Move the compiled executable file to the debug folder
mv ocr_db_crnn ./debug/
# Push the debug folder to the phone
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# The usage of ocr_db_crnn is:
# ./ocr_db_crnn  detection_model_file  recognition_model_file  direction_classifier_model_file  test_image_path  dictionary_file_path
./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt
```
If you modify the code, you need to recompile and push to the phone.
The outputs are as follows:
<div align="center">
<img src="imgs/lite_demo.png" width="600">
</div>
## FAQ
Q1: What if I want to change the model, do I need to run it again according to the process?
A1: If you have performed the above steps, you only need to replace the .nb model file to complete the model replacement.
Q2: How to test with another picture?
A2: Replace the .jpg test image under `./debug` with the image you want to test, and run `adb push` again to push the new image to the phone.
Q3: How to package it into the mobile APP?
A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further, PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app for reference.
# Paddle2ONNX模型转化与预测
本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型,并基于 ONNXRuntime 引擎预测。
## 1. 环境准备
需要准备 PaddleOCR、Paddle2ONNX 模型转化环境,和 ONNXRuntime 预测环境
### PaddleOCR
克隆 PaddleOCR 仓库,使用 release/2.4 分支并进行安装。由于 PaddleOCR 仓库比较大,git clone 的速度可能比较慢。
```
git clone -b release/2.4 https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR && python3.7 setup.py install
```
### Paddle2ONNX
Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式,算子目前稳定支持导出 ONNX Opset 9~11,部分Paddle算子支持更低的ONNX Opset转换。
更多细节可参考 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- 安装 Paddle2ONNX
```
python3.7 -m pip install paddle2onnx
```
- 安装 ONNXRuntime
```
# 建议安装 1.9.0 版本,可根据环境更换版本号
python3.7 -m pip install onnxruntime==1.9.0
```
## 2. 模型转换
- Paddle 模型下载
有两种方式获取 Paddle 静态图模型:
- 在 [model_list](../../doc/doc_ch/models_list.md) 中下载 PaddleOCR 提供的预测模型;
- 参考[模型导出说明](../../doc/doc_ch/inference.md#训练模型转inference模型)把训练好的权重转为 inference model。
以 ppocr 中文检测、识别、分类模型为例:
```
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar
cd ./inference && tar xf ch_PP-OCRv2_det_infer.tar && cd ..
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
cd ./inference && tar xf ch_PP-OCRv2_rec_infer.tar && cd ..
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_cls_infer.tar && cd ..
```
- 模型转换
使用 Paddle2ONNX 将Paddle静态图模型转换为ONNX模型格式:
```
paddle2onnx --model_dir ./inference/ch_PP-OCRv2_det_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./inference/det_onnx/model.onnx \
--opset_version 10 \
--input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_PP-OCRv2_rec_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./inference/rec_onnx/model.onnx \
--opset_version 10 \
--input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./inference/cls_onnx/model.onnx \
--opset_version 10 \
--input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
```
执行完毕后,ONNX 模型会被分别保存在 `./inference/det_onnx/`、`./inference/rec_onnx/`、`./inference/cls_onnx/` 路径下。
* 注意:对于 OCR 模型,转化过程中必须采用动态 shape 的形式,即加入选项 `--input_shape_dict="{'x': [-1, 3, -1, -1]}"`,否则预测结果可能与直接使用 Paddle 预测有细微不同。
另外,以下几个模型暂不支持转换为 ONNX 模型:
NRTR、SAR、RARE、SRN
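转换完成后,可以用下面这个最小的 Python 片段验证导出的检测模型能否以动态 shape 正常加载和推理(仅作示意:输入为随机数据,输出形状以实际模型为准):
```python
import numpy as np
import onnxruntime as ort

# 加载转换后的 DB 检测模型,并用一个任意分辨率的随机输入跑一次前向,验证动态 shape 可用
sess = ort.InferenceSession("./inference/det_onnx/model.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name            # 对应转换时指定的输入名 'x'
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print(input_name, [o.shape for o in outputs])     # 检测模型输出为与输入同分辨率的概率图
```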
## 3. 推理预测
以中文OCR模型为例,使用 ONNXRuntime 预测可执行如下命令:
```
python3.7 tools/infer/predict_system.py --use_gpu=False --use_onnx=True \
--det_model_dir=./inference/det_onnx/model.onnx \
--rec_model_dir=./inference/rec_onnx/model.onnx \
--cls_model_dir=./inference/cls_onnx/model.onnx \
--image_dir=./deploy/lite/imgs/lite_demo.png
```
以中文OCR模型为例,使用 Paddle Inference 预测可执行如下命令:
```
python3.7 tools/infer/predict_system.py --use_gpu=False \
--cls_model_dir=./inference/ch_ppocr_mobile_v2.0_cls_infer \
--rec_model_dir=./inference/ch_PP-OCRv2_rec_infer \
--det_model_dir=./inference/ch_PP-OCRv2_det_infer \
--image_dir=./deploy/lite/imgs/lite_demo.png
```
执行命令后在终端会打印出预测的识别信息,并在 `./inference_results/` 下保存可视化结果。
ONNXRuntime 执行效果:
<div align="center">
<img src="./images/lite_demo_onnx.png" width=800">
</div>
Paddle Inference 执行效果:
<div align="center">
<img src="./images/lite_demo_paddle.png" width=800">
</div>
使用 ONNXRuntime 预测,终端输出:
```
[2022/02/22 17:48:27] root DEBUG: dt_boxes num : 38, elapse : 0.043187856674194336
[2022/02/22 17:48:27] root DEBUG: rec_res num : 38, elapse : 0.592170000076294
[2022/02/22 17:48:27] root DEBUG: 0 Predict time of ./deploy/lite/imgs/lite_demo.png: 0.642s
[2022/02/22 17:48:27] root DEBUG: The, 0.984
[2022/02/22 17:48:27] root DEBUG: visualized, 0.882
[2022/02/22 17:48:27] root DEBUG: etect18片, 0.720
[2022/02/22 17:48:27] root DEBUG: image saved in./vis.jpg, 0.947
[2022/02/22 17:48:27] root DEBUG: 纯臻营养护发素0.993604, 0.996
[2022/02/22 17:48:27] root DEBUG: 产品信息/参数, 0.922
[2022/02/22 17:48:27] root DEBUG: 0.992728, 0.914
[2022/02/22 17:48:27] root DEBUG: (45元/每公斤,100公斤起订), 0.926
[2022/02/22 17:48:27] root DEBUG: 0.97417, 0.977
[2022/02/22 17:48:27] root DEBUG: 每瓶22元,1000瓶起订)0.993976, 0.962
[2022/02/22 17:48:27] root DEBUG: 【品牌】:代加工方式/0EMODM, 0.945
[2022/02/22 17:48:27] root DEBUG: 0.985133, 0.980
[2022/02/22 17:48:27] root DEBUG: 【品名】:纯臻营养护发素, 0.921
[2022/02/22 17:48:27] root DEBUG: 0.995007, 0.883
[2022/02/22 17:48:27] root DEBUG: 【产品编号】:YM-X-30110.96899, 0.955
[2022/02/22 17:48:27] root DEBUG: 【净含量】:220ml, 0.943
[2022/02/22 17:48:27] root DEBUG: Q.996577, 0.932
[2022/02/22 17:48:27] root DEBUG: 【适用人群】:适合所有肤质, 0.913
[2022/02/22 17:48:27] root DEBUG: 0.995842, 0.969
[2022/02/22 17:48:27] root DEBUG: 【主要成分】:鲸蜡硬脂醇、燕麦B-葡聚, 0.883
[2022/02/22 17:48:27] root DEBUG: 0.961928, 0.964
[2022/02/22 17:48:27] root DEBUG: 10, 0.812
[2022/02/22 17:48:27] root DEBUG: 糖、椰油酰胺丙基甜菜碱、泛醒, 0.866
[2022/02/22 17:48:27] root DEBUG: 0.925898, 0.943
[2022/02/22 17:48:27] root DEBUG: (成品包材), 0.974
[2022/02/22 17:48:27] root DEBUG: 0.972573, 0.961
[2022/02/22 17:48:27] root DEBUG: 【主要功能】:可紧致头发磷层,从而达到, 0.936
[2022/02/22 17:48:27] root DEBUG: 0.994448, 0.952
[2022/02/22 17:48:27] root DEBUG: 13, 0.998
[2022/02/22 17:48:27] root DEBUG: 即时持久改善头发光泽的效果,给干燥的头, 0.994
[2022/02/22 17:48:27] root DEBUG: 0.990198, 0.975
[2022/02/22 17:48:27] root DEBUG: 14, 0.977
[2022/02/22 17:48:27] root DEBUG: 发足够的滋养, 0.991
[2022/02/22 17:48:27] root DEBUG: 0.997668, 0.918
[2022/02/22 17:48:27] root DEBUG: 花费了0.457335秒, 0.901
[2022/02/22 17:48:27] root DEBUG: The visualized image saved in ./inference_results/lite_demo.png
[2022/02/22 17:48:27] root INFO: The predict total time is 0.7003889083862305
```
使用 Paddle Inference 预测,终端输出:
```
[2022/02/22 17:47:25] root DEBUG: dt_boxes num : 38, elapse : 0.11791276931762695
[2022/02/22 17:47:27] root DEBUG: rec_res num : 38, elapse : 2.6206860542297363
[2022/02/22 17:47:27] root DEBUG: 0 Predict time of ./deploy/lite/imgs/lite_demo.png: 2.746s
[2022/02/22 17:47:27] root DEBUG: The, 0.984
[2022/02/22 17:47:27] root DEBUG: visualized, 0.882
[2022/02/22 17:47:27] root DEBUG: etect18片, 0.720
[2022/02/22 17:47:27] root DEBUG: image saved in./vis.jpg, 0.947
[2022/02/22 17:47:27] root DEBUG: 纯臻营养护发素0.993604, 0.996
[2022/02/22 17:47:27] root DEBUG: 产品信息/参数, 0.922
[2022/02/22 17:47:27] root DEBUG: 0.992728, 0.914
[2022/02/22 17:47:27] root DEBUG: (45元/每公斤,100公斤起订), 0.926
[2022/02/22 17:47:27] root DEBUG: 0.97417, 0.977
[2022/02/22 17:47:27] root DEBUG: 每瓶22元,1000瓶起订)0.993976, 0.962
[2022/02/22 17:47:27] root DEBUG: 【品牌】:代加工方式/0EMODM, 0.945
[2022/02/22 17:47:27] root DEBUG: 0.985133, 0.980
[2022/02/22 17:47:27] root DEBUG: 【品名】:纯臻营养护发素, 0.921
[2022/02/22 17:47:27] root DEBUG: 0.995007, 0.883
[2022/02/22 17:47:27] root DEBUG: 【产品编号】:YM-X-30110.96899, 0.955
[2022/02/22 17:47:27] root DEBUG: 【净含量】:220ml, 0.943
[2022/02/22 17:47:27] root DEBUG: Q.996577, 0.932
[2022/02/22 17:47:27] root DEBUG: 【适用人群】:适合所有肤质, 0.913
[2022/02/22 17:47:27] root DEBUG: 0.995842, 0.969
[2022/02/22 17:47:27] root DEBUG: 【主要成分】:鲸蜡硬脂醇、燕麦B-葡聚, 0.883
[2022/02/22 17:47:27] root DEBUG: 0.961928, 0.964
[2022/02/22 17:47:27] root DEBUG: 10, 0.812
[2022/02/22 17:47:27] root DEBUG: 糖、椰油酰胺丙基甜菜碱、泛醒, 0.866
[2022/02/22 17:47:27] root DEBUG: 0.925898, 0.943
[2022/02/22 17:47:27] root DEBUG: (成品包材), 0.974
[2022/02/22 17:47:27] root DEBUG: 0.972573, 0.961
[2022/02/22 17:47:27] root DEBUG: 【主要功能】:可紧致头发磷层,从而达到, 0.936
[2022/02/22 17:47:27] root DEBUG: 0.994448, 0.952
[2022/02/22 17:47:27] root DEBUG: 13, 0.998
[2022/02/22 17:47:27] root DEBUG: 即时持久改善头发光泽的效果,给干燥的头, 0.994
[2022/02/22 17:47:27] root DEBUG: 0.990198, 0.975
[2022/02/22 17:47:27] root DEBUG: 14, 0.977
[2022/02/22 17:47:27] root DEBUG: 发足够的滋养, 0.991
[2022/02/22 17:47:27] root DEBUG: 0.997668, 0.918
[2022/02/22 17:47:27] root DEBUG: 花费了0.457335秒, 0.901
[2022/02/22 17:47:27] root DEBUG: The visualized image saved in ./inference_results/lite_demo.png
[2022/02/22 17:47:27] root INFO: The predict total time is 2.8338775634765625
```
English| [简体中文](README_ch.md)
# Paddle.js
[Paddle.js](https://github.com/PaddlePaddle/Paddle.js) is a web project for Baidu PaddlePaddle, an open source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model, or transform a model from paddle-hub with the model transforming tools provided by Paddle.js. It can run in every browser that supports WebGL/WebGPU/WebAssembly, and it also runs in Baidu smart mini programs and WeChat mini programs.
- [Online experience](https://paddlejs.baidu.com/ocr)
- [Tutorial](https://github.com/PaddlePaddle/Paddle.js/blob/release/v2.2.3/packages/paddlejs-models/ocr/README_cn.md)
- Visualization:
<div align="center">
<img src="./paddlejs_demo.gif" width="800">
</div>
[English](README.md) | 简体中文
# Paddle.js 网页前端部署
[Paddle.js](https://github.com/PaddlePaddle/Paddle.js) 是百度 PaddlePaddle 的 web 方向子项目,是一个运行在浏览器中的开源深度学习框架。Paddle.js 可以加载提前训练好的 paddle 模型,通过 Paddle.js 的模型转换工具 paddlejs-converter 变成浏览器友好的模型进行在线推理预测使用。目前,Paddle.js 可以在支持 WebGL/WebGPU/WebAssembly 的浏览器中运行,也可以在百度小程序和微信小程序环境下运行。
- [在线体验](https://paddlejs.baidu.com/ocr)
- [直达教程](https://github.com/PaddlePaddle/Paddle.js/blob/release/v2.2.3/packages/paddlejs-models/ocr/README_cn.md)
- 效果:
<div align="center">
<img src="./paddlejs_demo.gif" width="800">
</div>
...@@ -36,7 +36,6 @@ PaddleOCR operating environment and Paddle Serving operating environment are nee ...@@ -36,7 +36,6 @@ PaddleOCR operating environment and Paddle Serving operating environment are nee
1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md). 1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md).
Download the corresponding paddlepaddle whl package according to the environment, it is recommended to install version 2.2.2. Download the corresponding paddlepaddle whl package according to the environment, it is recommended to install version 2.2.2.
2. The steps of PaddleServing operating environment prepare are as follows: 2. The steps of PaddleServing operating environment prepare are as follows:
...@@ -194,6 +193,52 @@ The recognition model is the same. ...@@ -194,6 +193,52 @@ The recognition model is the same.
2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0] 2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0]
``` ```
## C++ Serving
Python-based service deployment has the obvious advantage of convenient secondary development, but real applications often need to pursue better performance. PaddleServing therefore also provides a higher-performance C++ deployment version.
The C++ service deployment is the same as the Python one in the environment setup and data preparation stages; the differences lie in how the service is started and how the client sends requests.
| Language | Speed | Secondary development | Compilation required |
|-----|-----|---------|------------|
| C++ | fast | slightly difficult | single-model prediction: no; multi-model concatenation: yes |
| Python | average | easy | no compilation required for either single-model or multi-model |
1. Compile Serving
To improve prediction performance, the C++ service also provides a multi-model concatenation service. Unlike the Python pipeline service, multi-model concatenation requires the pre- and post-processing code of the models to be written on the server side, so Serving has to be recompiled locally. Before compiling, copy the custom text-detection op into the Serving source tree (`git clone https://github.com/PaddlePaddle/Serving` followed by `cp -rf general_detection_op.cpp Serving/core/general-server/op`) and enable the `WITH_OPENCV` option; after compiling, install the generated whl packages and set the `SERVING_BIN` environment variable. For details, refer to the official document: [how to compile Serving](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Compile_EN.md)
2. Run the following command to start the service.
```
# Start the service and save the running log in log.txt
python3 -m paddle_serving_server.serve --model ppocrv2_det_serving ppocrv2_rec_serving --op GeneralDetectionOp GeneralInferOp --port 9293 &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt
![](./imgs/start_server.png)
3. Send service request
Since pre- and post-processing are performed inside the C++ server, and, to speed things up, the client only sends the base64-encoded string of the image, you need to manually modify
the `feed_type` and `shape` fields in `ppocrv2_det_client/serving_client_conf.prototxt` to the following:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 20
shape: 1
}
```
Start the client:
```
python3 ocr_cpp_client.py ppocrv2_det_client ppocrv2_rec_client
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](./imgs/results.png)
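For reference, the only thing the client has to prepare is the base64-encoded bytes of the image; the repository's `ocr_cpp_client.py` implements the full request flow, and the helper below is just an illustrative sketch of that encoding step (the image path is an assumption):
```python
import base64

def cv2_to_base64(image_bytes):
    # The C++ server expects the raw image as a base64-encoded string
    return base64.b64encode(image_bytes).decode("utf8")

with open("../../doc/imgs/11.jpg", "rb") as f:  # any test image works here
    image_b64 = cv2_to_base64(f.read())
print(len(image_b64), image_b64[:48], "...")
```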
## WINDOWS Users ## WINDOWS Users
Windows does not support Pipeline Serving, if we want to launch paddle serving on Windows, we should use Web Service, for more information please refer to [Paddle Serving for Windows Users](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Windows_Tutorial_EN.md) Windows does not support Pipeline Serving, if we want to launch paddle serving on Windows, we should use Web Service, for more information please refer to [Paddle Serving for Windows Users](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Windows_Tutorial_EN.md)
......
...@@ -6,6 +6,7 @@ PaddleOCR提供2种服务部署方式: ...@@ -6,6 +6,7 @@ PaddleOCR提供2种服务部署方式:
- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../../deploy/hubserving/readme.md) - 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../../deploy/hubserving/readme.md)
- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。 - 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。
# 基于PaddleServing的服务部署 # 基于PaddleServing的服务部署
本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。 本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。
...@@ -17,6 +18,8 @@ PaddleOCR提供2种服务部署方式: ...@@ -17,6 +18,8 @@ PaddleOCR提供2种服务部署方式:
更多有关PaddleServing服务化部署框架介绍和使用教程参考[文档](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) 更多有关PaddleServing服务化部署框架介绍和使用教程参考[文档](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)
AIStudio演示案例可参考 [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
## 目录 ## 目录
- [环境准备](#环境准备) - [环境准备](#环境准备)
- [模型转换](#模型转换) - [模型转换](#模型转换)
...@@ -30,7 +33,6 @@ PaddleOCR提供2种服务部署方式: ...@@ -30,7 +33,6 @@ PaddleOCR提供2种服务部署方式:
需要准备PaddleOCR的运行环境和Paddle Serving的运行环境。 需要准备PaddleOCR的运行环境和Paddle Serving的运行环境。
- 准备PaddleOCR的运行环境[链接](../../doc/doc_ch/installation.md) - 准备PaddleOCR的运行环境[链接](../../doc/doc_ch/installation.md)
根据环境下载对应的paddlepaddle whl包,推荐安装2.2.2版本
- 准备PaddleServing的运行环境,步骤如下 - 准备PaddleServing的运行环境,步骤如下
...@@ -132,7 +134,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \ ...@@ -132,7 +134,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
python3 pipeline_http_client.py python3 pipeline_http_client.py
``` ```
成功运行后,模型预测的结果会打印在cmd窗口中,结果示例为: 成功运行后,模型预测的结果会打印在cmd窗口中,结果示例为:
![](./imgs/results.png) ![](./imgs/pipeline_result.png)
调整 config.yml 中的并发个数获得最大的QPS, 一般检测和识别的并发数为2:1 调整 config.yml 中的并发个数获得最大的QPS, 一般检测和识别的并发数为2:1
``` ```
...@@ -187,6 +189,73 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \ ...@@ -187,6 +189,73 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0] 2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0]
``` ```
<a name="C++"></a>
## Paddle Serving C++ 部署
基于python的服务部署,显然具有二次开发便捷的优势,然而真正落地应用,往往需要追求更优的性能。PaddleServing 也提供了性能更优的C++部署版本。
C++ 服务部署在环境搭建和数据准备阶段与 python 相同,区别在于启动服务和客户端发送请求时不同。
| 语言 | 速度 | 二次开发 | 是否需要编译 |
|-----|-----|---------|------------|
| C++ | 很快 | 略有难度 | 单模型预测无需编译,多模型串联需要编译 |
| python | 一般 | 容易 | 单模型/多模型 均无需编译|
1. 准备 Serving 环境
为了提高预测性能,C++ 服务同样提供了多模型串联服务。与python pipeline服务不同,多模型串联的过程中需要将模型前后处理代码写在服务端,因此需要在本地重新编译生成serving。
首先需要下载Serving代码库, 把OCR文本检测预处理相关代码替换到Serving库中
```
git clone https://github.com/PaddlePaddle/Serving
cp -rf general_detection_op.cpp Serving/core/general-server/op
```
具体可参考官方文档:[如何编译Serving](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Compile_CN.md),注意需要开启 WITH_OPENCV 选项。
完成编译后,注意要安装编译出的三个whl包,并设置SERVING_BIN环境变量。
2. 启动服务可运行如下命令:
一个服务启动两个模型串联,只需要在--model后依次按顺序传入模型文件夹的相对路径,且需要在--op后依次传入自定义C++OP类名称:
```
# 启动服务,运行日志保存在log.txt
python3 -m paddle_serving_server.serve --model ppocrv2_det_serving ppocrv2_rec_serving --op GeneralDetectionOp GeneralInferOp --port 9293 &>log.txt &
```
成功启动服务后,log.txt中会打印类似如下日志
![](./imgs/start_server.png)
3. 发送服务请求:
由于需要在 C++ Server 端进行前后处理,为了加速,传入 C++ Server 的仅仅是图片的 base64 编码字符串,因此需要手动修改
ppocrv2_det_client/serving_client_conf.prototxt 中 feed_type 字段 和 shape 字段,修改成如下内容:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 20
shape: 1
}
```
启动客户端
```
python3 ocr_cpp_client.py ppocrv2_det_client ppocrv2_rec_client
```
成功运行后,模型预测的结果会打印在cmd窗口中,结果示例为:
![](./imgs/results.png)
在浏览器中输入服务器 ip:端口号,可以看到当前服务的实时QPS。(端口号范围需要是8000-9000)
在200张真实图片上测试,把检测长边限制为960。T4 GPU 上 QPS 峰值可达到51左右,约为pipeline的 2.12 倍。
![](./imgs/c++_qps.png)
<a name="Windows用户"></a> <a name="Windows用户"></a>
## Windows用户 ## Windows用户
......
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "core/general-server/op/general_detection_op.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
#include "core/predictor/framework/resource.h"
#include "core/util/include/timer.h"
#include <algorithm>
#include <iostream>
#include <memory>
#include <sstream>
/*
#include "opencv2/imgcodecs/legacy/constants_c.h"
#include "opencv2/imgproc/types_c.h"
*/
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::Timer;
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Response;
using baidu::paddle_serving::predictor::general_model::Request;
using baidu::paddle_serving::predictor::InferManager;
using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralDetectionOp::inference() {
VLOG(2) << "Going to run inference";
const std::vector<std::string> pre_node_names = pre_names();
if (pre_node_names.size() != 1) {
LOG(ERROR) << "This op(" << op_name()
<< ") can only have one predecessor op, but received "
<< pre_node_names.size();
return -1;
}
const std::string pre_name = pre_node_names[0];
const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
if (!input_blob) {
LOG(ERROR) << "input_blob is nullptr,error";
return -1;
}
uint64_t log_id = input_blob->GetLogId();
VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name;
GeneralBlob *output_blob = mutable_data<GeneralBlob>();
if (!output_blob) {
LOG(ERROR) << "output_blob is nullptr,error";
return -1;
}
output_blob->SetLogId(log_id);
if (!input_blob) {
LOG(ERROR) << "(logid=" << log_id
<< ") Failed mutable depended argument, op:" << pre_name;
return -1;
}
const TensorVector *in = &input_blob->tensor_vector;
TensorVector *out = &output_blob->tensor_vector;
int batch_size = input_blob->_batch_size;
VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
output_blob->_batch_size = batch_size;
std::vector<int> input_shape;
int in_num = 0;
void *databuf_data = NULL;
char *databuf_char = NULL;
size_t databuf_size = 0;
// now only support single string
char *total_input_ptr = static_cast<char *>(in->at(0).data.data());
std::string base64str = total_input_ptr;
float ratio_h{};
float ratio_w{};
cv::Mat img = Base2Mat(base64str);
cv::Mat srcimg;
cv::Mat resize_img;
cv::Mat resize_img_rec;
cv::Mat crop_img;
img.copyTo(srcimg);
this->resize_op_.Run(img, resize_img, this->max_side_len_, ratio_h, ratio_w,
this->use_tensorrt_);
this->normalize_op_.Run(&resize_img, this->mean_det, this->scale_det,
this->is_scale_);
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data());
TensorVector *real_in = new TensorVector();
if (!real_in) {
LOG(ERROR) << "real_in is nullptr,error";
return -1;
}
for (int i = 0; i < in->size(); ++i) {
input_shape = {1, 3, resize_img.rows, resize_img.cols};
in_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
databuf_size = in_num * sizeof(float);
databuf_data = MempoolWrapper::instance().malloc(databuf_size);
if (!databuf_data) {
LOG(ERROR) << "Malloc failed, size: " << databuf_size;
return -1;
}
memcpy(databuf_data, input.data(), databuf_size);
databuf_char = reinterpret_cast<char *>(databuf_data);
paddle::PaddleBuf paddleBuf(databuf_char, databuf_size);
paddle::PaddleTensor tensor_in;
tensor_in.name = in->at(i).name;
tensor_in.dtype = paddle::PaddleDType::FLOAT32;
tensor_in.shape = {1, 3, resize_img.rows, resize_img.cols};
tensor_in.lod = in->at(i).lod;
tensor_in.data = paddleBuf;
real_in->push_back(tensor_in);
}
Timer timeline;
int64_t start = timeline.TimeStampUS();
timeline.Start();
if (InferManager::instance().infer(engine_name().c_str(), real_in, out,
batch_size)) {
LOG(ERROR) << "(logid=" << log_id
<< ") Failed do infer in fluid model: " << engine_name().c_str();
return -1;
}
delete real_in;
std::vector<int> output_shape;
int out_num = 0;
void *databuf_data_out = NULL;
char *databuf_char_out = NULL;
size_t databuf_size_out = 0;
// Special handling for PaddleOCR post-processing: run DB post-processing on the detection
// output and pack the cropped, resized text-line images as the input batch for the rec model
int infer_outnum = out->size();
for (int k = 0; k < infer_outnum; ++k) {
int n2 = out->at(k).shape[2];
int n3 = out->at(k).shape[3];
int n = n2 * n3;
float *out_data = static_cast<float *>(out->at(k).data.data());
std::vector<float> pred(n, 0.0);
std::vector<unsigned char> cbuf(n, ' ');
for (int i = 0; i < n; i++) {
pred[i] = float(out_data[i]);
cbuf[i] = (unsigned char)((out_data[i]) * 255);
}
cv::Mat cbuf_map(n2, n3, CV_8UC1, (unsigned char *)cbuf.data());
cv::Mat pred_map(n2, n3, CV_32F, (float *)pred.data());
const double threshold = this->det_db_thresh_ * 255;
const double maxvalue = 255;
cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
cv::Mat dilation_map;
cv::Mat dila_ele =
cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2, 2));
cv::dilate(bit_map, dilation_map, dila_ele);
boxes = post_processor_.BoxesFromBitmap(pred_map, dilation_map,
this->det_db_box_thresh_,
this->det_db_unclip_ratio_);
boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg);
float max_wh_ratio = 0.0f;
std::vector<cv::Mat> crop_imgs;
std::vector<cv::Mat> resize_imgs;
int max_resize_w = 0;
int max_resize_h = 0;
int box_num = boxes.size();
std::vector<std::vector<float>> output_rec;
for (int i = 0; i < box_num; ++i) {
cv::Mat line_img = GetRotateCropImage(img, boxes[i]);
float wh_ratio = float(line_img.cols) / float(line_img.rows);
max_wh_ratio = max_wh_ratio > wh_ratio ? max_wh_ratio : wh_ratio;
crop_imgs.push_back(line_img);
}
for (int i = 0; i < box_num; ++i) {
cv::Mat resize_img;
crop_img = crop_imgs[i];
this->resize_op_rec.Run(crop_img, resize_img, max_wh_ratio,
this->use_tensorrt_);
this->normalize_op_.Run(&resize_img, this->mean_rec, this->scale_rec,
this->is_scale_);
max_resize_w = std::max(max_resize_w, resize_img.cols);
max_resize_h = std::max(max_resize_h, resize_img.rows);
resize_imgs.push_back(resize_img);
}
int buf_size = 3 * max_resize_h * max_resize_w;
output_rec = std::vector<std::vector<float>>(
box_num, std::vector<float>(buf_size, 0.0f));
for (int i = 0; i < box_num; ++i) {
resize_img_rec = resize_imgs[i];
this->permute_op_.Run(&resize_img_rec, output_rec[i].data());
}
// Inference.
output_shape = {box_num, 3, max_resize_h, max_resize_w};
out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1,
std::multiplies<int>());
databuf_size_out = out_num * sizeof(float);
databuf_data_out = MempoolWrapper::instance().malloc(databuf_size_out);
if (!databuf_data_out) {
LOG(ERROR) << "Malloc failed, size: " << databuf_size_out;
return -1;
}
int offset = buf_size * sizeof(float);
for (int i = 0; i < box_num; ++i) {
memcpy(databuf_data_out + i * offset, output_rec[i].data(), offset);
}
databuf_char_out = reinterpret_cast<char *>(databuf_data_out);
paddle::PaddleBuf paddleBuf(databuf_char_out, databuf_size_out);
paddle::PaddleTensor tensor_out;
tensor_out.name = "x";
tensor_out.dtype = paddle::PaddleDType::FLOAT32;
tensor_out.shape = output_shape;
tensor_out.data = paddleBuf;
out->push_back(tensor_out);
}
out->erase(out->begin(), out->begin() + infer_outnum);
int64_t end = timeline.TimeStampUS();
CopyBlobInfo(input_blob, output_blob);
AddBlobInfo(output_blob, start);
AddBlobInfo(output_blob, end);
return 0;
}
cv::Mat GeneralDetectionOp::Base2Mat(std::string &base64_data) {
cv::Mat img;
std::string s_mat;
s_mat = base64Decode(base64_data.data(), base64_data.size());
std::vector<char> base64_img(s_mat.begin(), s_mat.end());
img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR
return img;
}
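// Minimal base64 decoder used by Base2Mat: each group of four base64 characters is
// mapped back to three raw bytes, skipping CR/LF and stopping early at '=' padding.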
std::string GeneralDetectionOp::base64Decode(const char *Data, int DataByte) {
const char DecodeTable[] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
62, // '+'
0, 0, 0,
63, // '/'
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
};
std::string strDecode;
int nValue;
int i = 0;
while (i < DataByte) {
if (*Data != '\r' && *Data != '\n') {
nValue = DecodeTable[*Data++] << 18;
nValue += DecodeTable[*Data++] << 12;
strDecode += (nValue & 0x00FF0000) >> 16;
if (*Data != '=') {
nValue += DecodeTable[*Data++] << 6;
strDecode += (nValue & 0x0000FF00) >> 8;
if (*Data != '=') {
nValue += DecodeTable[*Data++];
strDecode += nValue & 0x000000FF;
}
}
i += 4;
} else // skip carriage return / line feed characters
{
Data++;
i++;
}
}
return strDecode;
}
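// Crop the quadrilateral text region from the source image, perspective-warp it to an
// axis-aligned rectangle, and rotate it upright if the crop is much taller than it is wide.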
cv::Mat
GeneralDetectionOp::GetRotateCropImage(const cv::Mat &srcimage,
std::vector<std::vector<int>> box) {
cv::Mat image;
srcimage.copyTo(image);
std::vector<std::vector<int>> points = box;
int x_collect[4] = {box[0][0], box[1][0], box[2][0], box[3][0]};
int y_collect[4] = {box[0][1], box[1][1], box[2][1], box[3][1]};
int left = int(*std::min_element(x_collect, x_collect + 4));
int right = int(*std::max_element(x_collect, x_collect + 4));
int top = int(*std::min_element(y_collect, y_collect + 4));
int bottom = int(*std::max_element(y_collect, y_collect + 4));
cv::Mat img_crop;
image(cv::Rect(left, top, right - left, bottom - top)).copyTo(img_crop);
for (int i = 0; i < points.size(); i++) {
points[i][0] -= left;
points[i][1] -= top;
}
int img_crop_width = int(sqrt(pow(points[0][0] - points[1][0], 2) +
pow(points[0][1] - points[1][1], 2)));
int img_crop_height = int(sqrt(pow(points[0][0] - points[3][0], 2) +
pow(points[0][1] - points[3][1], 2)));
cv::Point2f pts_std[4];
pts_std[0] = cv::Point2f(0., 0.);
pts_std[1] = cv::Point2f(img_crop_width, 0.);
pts_std[2] = cv::Point2f(img_crop_width, img_crop_height);
pts_std[3] = cv::Point2f(0.f, img_crop_height);
cv::Point2f pointsf[4];
pointsf[0] = cv::Point2f(points[0][0], points[0][1]);
pointsf[1] = cv::Point2f(points[1][0], points[1][1]);
pointsf[2] = cv::Point2f(points[2][0], points[2][1]);
pointsf[3] = cv::Point2f(points[3][0], points[3][1]);
cv::Mat M = cv::getPerspectiveTransform(pointsf, pts_std);
cv::Mat dst_img;
cv::warpPerspective(img_crop, dst_img, M,
cv::Size(img_crop_width, img_crop_height),
cv::BORDER_REPLICATE);
if (float(dst_img.rows) >= float(dst_img.cols) * 1.5) {
cv::Mat srcCopy = cv::Mat(dst_img.rows, dst_img.cols, dst_img.depth());
cv::transpose(dst_img, srcCopy);
cv::flip(srcCopy, srcCopy, 0);
return srcCopy;
} else {
return dst_img;
}
}
DEFINE_OP(GeneralDetectionOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
...@@ -45,10 +45,8 @@ for img_file in os.listdir(test_img_dir): ...@@ -45,10 +45,8 @@ for img_file in os.listdir(test_img_dir):
image_data = file.read() image_data = file.read()
image = cv2_to_base64(image_data) image = cv2_to_base64(image_data)
res_list = [] res_list = []
#print(image)
fetch_map = client.predict( fetch_map = client.predict(
feed={"x": image}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True) feed={"x": image}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True)
print("fetrch map:", fetch_map)
one_batch_res = ocr_reader.postprocess(fetch_map, with_score=True) one_batch_res = ocr_reader.postprocess(fetch_map, with_score=True)
for res in one_batch_res: for res in one_batch_res:
res_list.append(res[0]) res_list.append(res[0])
......
...@@ -34,12 +34,28 @@ test_img_dir = args.image_dir ...@@ -34,12 +34,28 @@ test_img_dir = args.image_dir
for idx, img_file in enumerate(os.listdir(test_img_dir)): for idx, img_file in enumerate(os.listdir(test_img_dir)):
with open(os.path.join(test_img_dir, img_file), 'rb') as file: with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read() image_data1 = file.read()
# print file name
print('{}{}{}'.format('*' * 10, img_file, '*' * 10))
image = cv2_to_base64(image_data1) image = cv2_to_base64(image_data1)
for i in range(1):
data = {"key": ["image"], "value": [image]} data = {"key": ["image"], "value": [image]}
r = requests.post(url=url, data=json.dumps(data)) r = requests.post(url=url, data=json.dumps(data))
print(r.json()) result = r.json()
print("erro_no:{}, err_msg:{}".format(result["err_no"], result["err_msg"]))
# check success
if result["err_no"] == 0:
ocr_result = result["value"][0]
try:
for item in eval(ocr_result):
# return transcription and points
print("{}, {}".format(item[0], item[1]))
except Exception as e:
print("No results")
continue
else:
print(
"For details about error message, see PipelineServingLogs/pipeline.log"
)
print("==> total number of test imgs: ", len(os.listdir(test_img_dir))) print("==> total number of test imgs: ", len(os.listdir(test_img_dir)))
...@@ -15,6 +15,7 @@ from paddle_serving_server.web_service import WebService, Op ...@@ -15,6 +15,7 @@ from paddle_serving_server.web_service import WebService, Op
import logging import logging
import numpy as np import numpy as np
import copy
import cv2 import cv2
import base64 import base64
# from paddle_serving_app.reader import OCRReader # from paddle_serving_app.reader import OCRReader
...@@ -36,7 +37,7 @@ class DetOp(Op): ...@@ -36,7 +37,7 @@ class DetOp(Op):
self.filter_func = FilterBoxes(10, 10) self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({ self.post_func = DBPostProcess({
"thresh": 0.3, "thresh": 0.3,
"box_thresh": 0.5, "box_thresh": 0.6,
"max_candidates": 1000, "max_candidates": 1000,
"unclip_ratio": 1.5, "unclip_ratio": 1.5,
"min_size": 3 "min_size": 3
...@@ -79,8 +80,10 @@ class RecOp(Op): ...@@ -79,8 +80,10 @@ class RecOp(Op):
raw_im = input_dict["image"] raw_im = input_dict["image"]
data = np.frombuffer(raw_im, np.uint8) data = np.frombuffer(raw_im, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
dt_boxes = input_dict["dt_boxes"] self.dt_list = input_dict["dt_boxes"]
dt_boxes = self.sorted_boxes(dt_boxes) self.dt_list = self.sorted_boxes(self.dt_list)
# deepcopy to save origin dt_boxes
dt_boxes = copy.deepcopy(self.dt_list)
feed_list = [] feed_list = []
img_list = [] img_list = []
max_wh_ratio = 0 max_wh_ratio = 0
...@@ -126,25 +129,29 @@ class RecOp(Op): ...@@ -126,25 +129,29 @@ class RecOp(Op):
imgs[id] = norm_img imgs[id] = norm_img
feed = {"x": imgs.copy()} feed = {"x": imgs.copy()}
feed_list.append(feed) feed_list.append(feed)
return feed_list, False, None, "" return feed_list, False, None, ""
def postprocess(self, input_dicts, fetch_data, data_id, log_id): def postprocess(self, input_dicts, fetch_data, data_id, log_id):
res_list = [] rec_list = []
dt_num = len(self.dt_list)
if isinstance(fetch_data, dict): if isinstance(fetch_data, dict):
if len(fetch_data) > 0: if len(fetch_data) > 0:
rec_batch_res = self.ocr_reader.postprocess( rec_batch_res = self.ocr_reader.postprocess(
fetch_data, with_score=True) fetch_data, with_score=True)
for res in rec_batch_res: for res in rec_batch_res:
res_list.append(res[0]) rec_list.append(res)
elif isinstance(fetch_data, list): elif isinstance(fetch_data, list):
for one_batch in fetch_data: for one_batch in fetch_data:
one_batch_res = self.ocr_reader.postprocess( one_batch_res = self.ocr_reader.postprocess(
one_batch, with_score=True) one_batch, with_score=True)
for res in one_batch_res: for res in one_batch_res:
res_list.append(res[0]) rec_list.append(res)
result_list = []
res = {"res": str(res_list)} for i in range(dt_num):
text = rec_list[i]
dt_box = self.dt_list[i]
result_list.append([text, dt_box.tolist()])
res = {"result": str(result_list)}
return res, None, "" return res, None, ""
......
...@@ -42,7 +42,7 @@ python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3 ...@@ -42,7 +42,7 @@ python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3
# 比如下载提供的训练模型 # 比如下载提供的训练模型
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
tar -xf ch_ppocr_mobile_v2.0_det_train.tar tar -xf ch_ppocr_mobile_v2.0_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_model_dir=./output/quant_inference_model python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_model_dir=./output/quant_model
``` ```
如果要训练识别模型的量化,修改配置文件和加载的模型参数即可。 如果要训练识别模型的量化,修改配置文件和加载的模型参数即可。
......
...@@ -35,17 +35,7 @@ from ppocr.metrics import build_metric ...@@ -35,17 +35,7 @@ from ppocr.metrics import build_metric
import tools.program as program import tools.program as program
from paddleslim.dygraph.quant import QAT from paddleslim.dygraph.quant import QAT
from ppocr.data import build_dataloader from ppocr.data import build_dataloader
from tools.export_model import export_single_model
def export_single_model(quanter, model, infer_shape, save_path, logger):
quanter.save_quantized_model(
model,
save_path,
input_spec=[
paddle.static.InputSpec(
shape=[None] + infer_shape, dtype='float32')
])
logger.info('inference QAT model is saved to {}'.format(save_path))
def main(): def main():
...@@ -84,17 +74,54 @@ def main(): ...@@ -84,17 +74,54 @@ def main():
config['Global']) config['Global'])
# build model # build model
# for rec algorithm
if hasattr(post_process_class, 'character'): if hasattr(post_process_class, 'character'):
char_num = len(getattr(post_process_class, 'character')) char_num = len(getattr(post_process_class, 'character'))
if config['Architecture']["algorithm"] in ["Distillation", if config['Architecture']["algorithm"] in ["Distillation",
]: # distillation model ]: # distillation model
for key in config['Architecture']["Models"]: for key in config['Architecture']["Models"]:
if config['Architecture']['Models'][key]['Head'][
'name'] == 'MultiHead': # for multi head
if config['PostProcess'][
'name'] == 'DistillationSARLabelDecode':
char_num = char_num - 2
# update SARLoss params
assert list(config['Loss']['loss_config_list'][-1].keys())[
0] == 'DistillationSARLoss'
config['Loss']['loss_config_list'][-1][
'DistillationSARLoss']['ignore_index'] = char_num + 1
out_channels_list = {}
out_channels_list['CTCLabelDecode'] = char_num
out_channels_list['SARLabelDecode'] = char_num + 2
config['Architecture']['Models'][key]['Head'][
'out_channels_list'] = out_channels_list
else:
config['Architecture']["Models"][key]["Head"][ config['Architecture']["Models"][key]["Head"][
'out_channels'] = char_num 'out_channels'] = char_num
elif config['Architecture']['Head'][
'name'] == 'MultiHead': # for multi head
if config['PostProcess']['name'] == 'SARLabelDecode':
char_num = char_num - 2
# update SARLoss params
assert list(config['Loss']['loss_config_list'][1].keys())[
0] == 'SARLoss'
if config['Loss']['loss_config_list'][1]['SARLoss'] is None:
config['Loss']['loss_config_list'][1]['SARLoss'] = {
'ignore_index': char_num + 1
}
else:
config['Loss']['loss_config_list'][1]['SARLoss'][
'ignore_index'] = char_num + 1
out_channels_list = {}
out_channels_list['CTCLabelDecode'] = char_num
out_channels_list['SARLabelDecode'] = char_num + 2
config['Architecture']['Head'][
'out_channels_list'] = out_channels_list
else: # base rec model else: # base rec model
config['Architecture']["Head"]['out_channels'] = char_num config['Architecture']["Head"]['out_channels'] = char_num
if config['PostProcess']['name'] == 'SARLabelDecode': # for SAR model
config['Loss']['ignore_index'] = char_num - 1
model = build_model(config['Architecture']) model = build_model(config['Architecture'])
# get QAT model # get QAT model
...@@ -120,21 +147,22 @@ def main(): ...@@ -120,21 +147,22 @@ def main():
for k, v in metric.items(): for k, v in metric.items():
logger.info('{}:{}'.format(k, v)) logger.info('{}:{}'.format(k, v))
infer_shape = [3, 32, 100] if model_type == "rec" else [3, 640, 640]
save_path = config["Global"]["save_inference_dir"] save_path = config["Global"]["save_inference_dir"]
arch_config = config["Architecture"] arch_config = config["Architecture"]
arch_config = config["Architecture"]
if arch_config["algorithm"] in ["Distillation", ]: # distillation model if arch_config["algorithm"] in ["Distillation", ]: # distillation model
archs = list(arch_config["Models"].values())
for idx, name in enumerate(model.model_name_list): for idx, name in enumerate(model.model_name_list):
model.model_list[idx].eval() model.model_list[idx].eval()
sub_model_save_path = os.path.join(save_path, name, "inference") sub_model_save_path = os.path.join(save_path, name, "inference")
export_single_model(quanter, model.model_list[idx], infer_shape, export_single_model(model.model_list[idx], archs[idx],
sub_model_save_path, logger) sub_model_save_path, logger, quanter)
else: else:
save_path = os.path.join(save_path, "inference") save_path = os.path.join(save_path, "inference")
model.eval() export_single_model(model, arch_config, save_path, logger, quanter)
export_single_model(quanter, model, infer_shape, save_path, logger)
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -112,10 +112,48 @@ def main(config, device, logger, vdl_writer): ...@@ -112,10 +112,48 @@ def main(config, device, logger, vdl_writer):
if config['Architecture']["algorithm"] in ["Distillation", if config['Architecture']["algorithm"] in ["Distillation",
]: # distillation model ]: # distillation model
for key in config['Architecture']["Models"]: for key in config['Architecture']["Models"]:
if config['Architecture']['Models'][key]['Head'][
'name'] == 'MultiHead': # for multi head
if config['PostProcess'][
'name'] == 'DistillationSARLabelDecode':
char_num = char_num - 2
# update SARLoss params
assert list(config['Loss']['loss_config_list'][-1].keys())[
0] == 'DistillationSARLoss'
config['Loss']['loss_config_list'][-1][
'DistillationSARLoss']['ignore_index'] = char_num + 1
out_channels_list = {}
out_channels_list['CTCLabelDecode'] = char_num
out_channels_list['SARLabelDecode'] = char_num + 2
config['Architecture']['Models'][key]['Head'][
'out_channels_list'] = out_channels_list
else:
config['Architecture']["Models"][key]["Head"][ config['Architecture']["Models"][key]["Head"][
'out_channels'] = char_num 'out_channels'] = char_num
elif config['Architecture']['Head'][
'name'] == 'MultiHead': # for multi head
if config['PostProcess']['name'] == 'SARLabelDecode':
char_num = char_num - 2
# update SARLoss params
assert list(config['Loss']['loss_config_list'][1].keys())[
0] == 'SARLoss'
if config['Loss']['loss_config_list'][1]['SARLoss'] is None:
config['Loss']['loss_config_list'][1]['SARLoss'] = {
'ignore_index': char_num + 1
}
else:
config['Loss']['loss_config_list'][1]['SARLoss'][
'ignore_index'] = char_num + 1
out_channels_list = {}
out_channels_list['CTCLabelDecode'] = char_num
out_channels_list['SARLabelDecode'] = char_num + 2
config['Architecture']['Head'][
'out_channels_list'] = out_channels_list
else: # base rec model else: # base rec model
config['Architecture']["Head"]['out_channels'] = char_num config['Architecture']["Head"]['out_channels'] = char_num
if config['PostProcess']['name'] == 'SARLabelDecode': # for SAR model
config['Loss']['ignore_index'] = char_num - 1
model = build_model(config['Architecture']) model = build_model(config['Architecture'])
pre_best_model_dict = dict() pre_best_model_dict = dict()
...@@ -137,7 +175,7 @@ def main(config, device, logger, vdl_writer): ...@@ -137,7 +175,7 @@ def main(config, device, logger, vdl_writer):
config['Optimizer'], config['Optimizer'],
epochs=config['Global']['epoch_num'], epochs=config['Global']['epoch_num'],
step_each_epoch=len(train_dataloader), step_each_epoch=len(train_dataloader),
parameters=model.parameters()) model=model)
# resume PACT training process # resume PACT training process
if config["Global"]["checkpoints"] is not None: if config["Global"]["checkpoints"] is not None:
......
# 前沿算法与模型
PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,已支持的模型与使用教程可点击下方列表查看:
- [文本检测算法](./algorithm_overview.md#11-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E7%AE%97%E6%B3%95)
- [文本识别算法](./algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [端到端算法](./algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
**欢迎广大开发者合作共建,贡献更多算法,合入有奖🎁!具体可查看[社区常规赛](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)。**
新增算法可参考如下教程:
- [使用PaddleOCR架构添加新算法](./add_new_algorithm.md)
# DB
- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 训练](#3-1)
- [3.2 评估](#3-2)
- [3.3 预测](#3-3)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
> Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
> AAAI, 2020
在ICDAR2015文本检测公开数据集上,算法复现效果如下:
|模型|骨干网络|配置文件|precision|recall|Hmean|下载链接|
| --- | --- | --- | --- | --- | --- | --- |
|DB|ResNet50_vd|[configs/det/det_r50_vd_db.yml](../../configs/det/det_r50_vd_db.yml)|86.41%|78.72%|82.38%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|DB|MobileNetV3|[configs/det/det_mv3_db.yml](../../configs/det/det_mv3_db.yml)|77.29%|73.08%|75.12%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
请参考[文本检测训练教程](./detection.md)。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要**更换配置文件**即可。
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将DB文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar) ),可以使用如下命令进行转换:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_db
```
DB文本检测模型推理,可以执行如下命令:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](../imgs_results/det_res_img_10_db.jpg)
**注意**:由于ICDAR2015数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文文本图像检测效果会比较差。
<a name="4-2"></a>
### 4.2 C++推理
准备好推理模型后,参考[cpp infer](../../deploy/cpp_infer/)教程进行操作即可。
<a name="4-3"></a>
### 4.3 Serving服务化部署
准备好推理模型后,参考[pdserving](../../deploy/pdserving/)教程进行Serving服务化部署,包括Python Serving和C++ Serving两种模式。
<a name="4-4"></a>
### 4.4 更多推理部署
DB模型还支持以下推理部署方式:
- Paddle2ONNX推理:准备好推理模型后,参考[paddle2onnx](../../deploy/paddle2onnx/)教程操作。
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@inproceedings{liao2020real,
title={Real-time scene text detection with differentiable binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={07},
pages={11474--11481},
year={2020}
}
```
# FCENet
- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 训练](#3-1)
- [3.2 评估](#3-2)
- [3.3 预测](#3-3)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://arxiv.org/abs/2104.10442)
> Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang
> CVPR, 2021
在CTW1500文本检测公开数据集上,算法复现效果如下:
| 模型 |骨干网络|配置文件|precision|recall|Hmean|下载链接|
|-----| --- | --- | --- | --- | --- | --- |
| FCE | ResNet50_dcn | [configs/det/det_r50_vd_dcn_fce_ctw.yml](../../configs/det/det_r50_vd_dcn_fce_ctw.yml)| 88.39%|82.18%|85.27%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar)|
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
上述FCE模型使用CTW1500文本检测公开数据集训练得到,数据集下载可参考 [ocr_datasets](./dataset/ocr_datasets.md)
数据下载完成后,请参考[文本检测训练教程](./detection.md)进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要**更换配置文件**即可。
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将FCE文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd_dcn骨干网络,在CTW1500英文数据集训练的模型为例( [模型下载地址](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar) ),可以使用如下命令进行转换:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_dcn_fce_ctw.yml -o Global.pretrained_model=./det_r50_dcn_fce_ctw_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_fce
```
FCE文本检测模型推理,执行非弯曲文本检测,可以执行如下命令:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=quad
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](../imgs_results/det_res_img_10_fce.jpg)
如果想执行弯曲文本检测,可以执行如下命令:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=poly
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](../imgs_results/det_res_img623_fce.jpg)
**注意**:由于CTW1500数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文文本图像检测效果会比较差。
<a name="4-2"></a>
### 4.2 C++推理
由于后处理暂未使用CPP编写,FCE文本检测模型暂不支持CPP推理。
<a name="4-3"></a>
### 4.3 Serving服务化部署
暂未支持
<a name="4-4"></a>
### 4.4 更多推理部署
暂未支持
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@InProceedings{zhu2021fourier,
title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang},
year={2021},
booktitle = {CVPR}
}
```
# PSENet
- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 训练](#3-1)
- [3.2 评估](#3-2)
- [3.3 预测](#3-3)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473)
> Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai
> CVPR, 2019
在ICDAR2015文本检测公开数据集上,算法复现效果如下:
|模型|骨干网络|配置文件|precision|recall|Hmean|下载链接|
| --- | --- | --- | --- | --- | --- | --- |
|PSE| ResNet50_vd | [configs/det/det_r50_vd_pse.yml](../../configs/det/det_r50_vd_pse.yml)| 85.81% |79.53%|82.55%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE| MobileNetV3| [configs/det/det_mv3_pse.yml](../../configs/det/det_mv3_pse.yml) | 82.20% |70.48%|75.89%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
上述PSE模型使用ICDAR2015文本检测公开数据集训练得到,数据集下载可参考 [ocr_datasets](./dataset/ocr_datasets.md)
数据下载完成后,请参考[文本检测训练教程](./detection.md)进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要**更换配置文件**即可。
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将PSE文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar) ),可以使用如下命令进行转换:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_pse.yml -o Global.pretrained_model=./det_r50_vd_pse_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_pse
```
PSE文本检测模型推理,执行非弯曲文本检测,可以执行如下命令:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=quad
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](../imgs_results/det_res_img_10_pse.jpg)
如果想执行弯曲文本检测,可以执行如下命令:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=poly
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](../imgs_results/det_res_img_10_pse_poly.jpg)
**注意**:由于ICDAR2015数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文或弯曲文本图像检测效果会比较差。
<a name="4-2"></a>
### 4.2 C++推理
由于后处理暂未使用CPP编写,PSE文本检测模型暂不支持CPP推理。
<a name="4-3"></a>
### 4.3 Serving服务化部署
暂未支持
<a name="4-4"></a>
### 4.4 更多推理部署
暂未支持
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@inproceedings{wang2019shape,
title={Shape robust text detection with progressive scale expansion network},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9336--9345},
year={2019}
}
```
<a name="环境配置"></a>
## 二、环境配置

请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。

<a name="快速使用"></a>
## 三、快速使用
- [识别模型转inference模型](#识别模型转inference模型)
- [方向分类模型转inference模型](#方向分类模型转inference模型)
- [二、文本检测模型推理](#文本检测模型推理)
  - [1. 超轻量中文检测模型推理](#超轻量中文检测模型推理)
  - [2. DB文本检测模型推理](#DB文本检测模型推理)
  - [3. EAST文本检测模型推理](#EAST文本检测模型推理)
  - [4. SAST文本检测模型推理](#SAST文本检测模型推理)
- [三、文本识别模型推理](#文本识别模型推理)
  - [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
  - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
  - [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
  - [5. 多语言模型的推理](#多语言模型的推理)
- [四、方向分类模型推理](#方向识别模型推理)
  - [1. 方向分类模型推理](#方向分类模型推理)
- [五、文本检测、方向分类和文字识别串联推理](#文本检测、方向分类和文字识别串联推理)
  - [1. 超轻量中文OCR模型推理](#超轻量中文OCR模型推理)
  - [2. 其他模型推理](#其他模型推理)
- [六、参数解释](#参数解释)
- [七、FAQ](#FAQ)
# OCR算法

- [1. 两阶段算法](#1-两阶段算法)
  - [1.1 文本检测算法](#11-文本检测算法)
  - [1.2 文本识别算法](#12-文本识别算法)
- [2. 端到端算法](#2-端到端算法)

本文给出了PaddleOCR已支持的OCR算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)。

<a name="1"></a>
## 1. 两阶段算法

<a name="11"></a>
### 1.1 文本检测算法

已支持的文本检测算法列表(戳链接获取使用教程):
- [x] [DB](./algorithm_det_db.md)
- [x] [EAST](./algorithm_det_east.md)
- [x] [SAST](./algorithm_det_sast.md)
- [x] [PSENet](./algorithm_det_psenet.md)
- [x] [FCENet](./algorithm_det_fcenet.md)

在ICDAR2015文本检测公开数据集上,算法效果如下:

|模型|骨干网络|precision|recall|Hmean|下载链接|
| --- | --- | --- | --- | --- | --- |
|EAST|ResNet50_vd|88.71%|81.36%|84.88%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|

* [百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
* [Google Drive下载地址](https://drive.google.com/drive/folders/1ll2-XEVyCQLpJjawLDiRlvo_i4BqHCJe?usp=sharing)

<a name="12"></a>
### 1.2 文本识别算法

已支持的文本识别算法列表(戳链接获取使用教程):
- [x] [CRNN](./algorithm_rec_crnn.md)
- [x] [Rosetta](./algorithm_rec_rosetta.md)
- [x] [STAR-Net](./algorithm_rec_starnet.md)
- [x] [RARE](./algorithm_rec_rare.md)
- [x] [SRN](./algorithm_rec_srn.md)
- [x] [NRTR](./algorithm_rec_nrtr.md)
- [x] [SAR](./algorithm_rec_sar.md)
- [x] [SEED](./algorithm_rec_seed.md)

参考[DTRB](https://arxiv.org/abs/1904.01906)[3]文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

|SAR|Resnet31| 87.20% | rec_r31_sar | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |

<a name="2"></a>
## 2. 端到端算法

已支持的端到端OCR算法列表(戳链接获取使用教程):
- [x] [PGNet](./algorithm_e2e_pgnet.md)
# SAR
- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 训练](#3-1)
- [3.2 评估](#3-2)
- [3.3 预测](#3-3)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)
> Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang
> AAAI, 2019
使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:
|模型|骨干网络|配置文件|Acc|下载链接|
| --- | --- | --- | --- | --- |
|SAR|ResNet31|[rec_r31_sar.yml](../../configs/rec/rec_r31_sar.yml)|87.20%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)|
注:除了使用MJSynth和SynthText两个文字识别数据集外,还加入了[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg)数据(提取码:627x),和部分真实数据,具体数据细节可以参考论文。
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
请参考[文本识别教程](./recognition.md)。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要**更换配置文件**即可。
训练
具体地,在完成数据准备后,便可以启动训练,训练命令如下:
```
#单卡训练(训练周期长,不建议)
python3 tools/train.py -c configs/rec/rec_r31_sar.yml
#多卡训练,通过--gpus参数指定卡号
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r31_sar.yml
```
评估
```
# GPU 评估, Global.pretrained_model 为待测权重
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```
预测:
```
# 预测使用的配置文件必须与训练一致
python3 tools/infer_rec.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将SAR文本识别训练过程中保存的模型,转换成inference model。( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) ),可以使用如下命令进行转换:
```
python3 tools/export_model.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model=./rec_r31_sar_train/best_accuracy Global.save_inference_dir=./inference/rec_sar
```
SAR文本识别模型推理,可以执行如下命令:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_sar/" --rec_image_shape="3, 48, 48, 160" --rec_char_type="ch" --rec_algorithm="SAR" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
```
<a name="4-2"></a>
### 4.2 C++推理
由于C++预处理后处理还未支持SAR,所以暂未支持
<a name="4-3"></a>
### 4.3 Serving服务化部署
暂不支持
<a name="4-4"></a>
### 4.4 更多推理部署
暂不支持
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@article{Li2019ShowAA,
title={Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
author={Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
journal={ArXiv},
year={2019},
volume={abs/1811.00751}
}
```
# SRN
- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
- [3.1 训练](#3-1)
- [3.2 评估](#3-2)
- [3.3 预测](#3-3)
- [4. 推理部署](#4)
- [4.1 Python推理](#4-1)
- [4.2 C++推理](#4-2)
- [4.3 Serving服务化部署](#4-3)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294#)
> Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding
> CVPR,2020
使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:
|模型|骨干网络|配置文件|Acc|下载链接|
| --- | --- | --- | --- | --- |
|SRN|Resnet50_vd_fpn|[rec_r50_fpn_srn.yml](../../configs/rec/rec_r50_fpn_srn.yml)|86.31%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
请参考[文本识别教程](./recognition.md)。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要**更换配置文件**即可。
训练
具体地,在完成数据准备后,便可以启动训练,训练命令如下:
```
#单卡训练(训练周期长,不建议)
python3 tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
#多卡训练,通过--gpus参数指定卡号
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
```
评估
```
# GPU 评估, Global.pretrained_model 为待测权重
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```
预测:
```
# 预测使用的配置文件必须与训练一致
python3 tools/infer_rec.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将SRN文本识别训练过程中保存的模型,转换成inference model。( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar) ),可以使用如下命令进行转换:
```
python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./rec_r50_vd_srn_train/best_accuracy Global.save_inference_dir=./inference/rec_srn
```
SRN文本识别模型推理,可以执行如下命令:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_char_type="ch" --rec_algorithm="SRN" --rec_char_dict_path=./ppocr/utils/ic15_dict.txt --use_space_char=False
```
<a name="4-2"></a>
### 4.2 C++推理
由于C++预处理后处理还未支持SRN,所以暂未支持
<a name="4-3"></a>
### 4.3 Serving服务化部署
暂不支持
<a name="4-4"></a>
### 4.4 更多推理部署
暂不支持
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@article{Yu2020TowardsAS,
title={Towards Accurate Scene Text Recognition With Semantic Reasoning Networks},
author={Deli Yu and Xuan Li and Chengquan Zhang and Junyu Han and Jingtuo Liu and Errui Ding},
journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020},
pages={12110-12119}
}
```
# Android Demo 快速测试
### 1. 安装最新版本的Android Studio
可以从 https://developer.android.com/studio 下载。本Demo是使用4.0版本的Android Studio编写的。
### 2. 创建新项目
Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编译成功。
如果您是初学者,可以用以下方式安装和测试NDK编译环境。
点击 File -> New ->New Project, 新建 "Native C++" project
1. Start a new Android Studio project
在项目模板中选择 Native C++,路径选择 PaddleOCR/deploy/android_demo
进入项目后会自动编译,第一次编译会花费较长的时间,建议添加代理加速下载。
**代理添加:**
选择 Android Studio -> Preferences -> Appearance & Behavior -> System Settings -> HTTP Proxy -> Manual proxy configuration
![](../demo/proxy.png)
2. 开始编译
点击编译按钮,连接手机,跟着Android Studio的引导完成操作。
在 Android Studio 里看到下图,表示编译完成:
![](../demo/build.png)
**提示:** 此时如果出现下列找不到OpenCV的报错信息,请重新点击编译,编译完成后退出项目,再次进入。
![](../demo/error.png)
### 3. 发送到手机端
完成编译,点击运行,在手机端查看效果。
### 4. 如何自定义demo图片
1. 图片存放路径:android_demo/app/src/main/assets/images
将自定义图片放置在该路径下
2. 配置文件: android_demo/app/src/main/res/values/strings.xml
修改 IMAGE_PATH_DEFAULT 为自定义图片名即可
# 获得更多支持
前往[端计算模型生成平台EasyEdge](https://ai.baidu.com/easyedge/app/open_source_demo?referrerUrl=paddlelite),获得更多开发支持:
- Demo APP:可使用手机扫码安装,方便手机端快速体验文字识别
- SDK:模型被封装为适配不同芯片硬件和操作系统SDK,包括完善的接口,方便进行二次开发
# 场景应用
# 项目克隆

## 1. 克隆PaddleOCR repo代码

```
【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR

git clone https://gitee.com/paddlepaddle/PaddleOCR
```

注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。

## 2. 安装第三方库

```
cd PaddleOCR
```
### Optimizer ([ppocr/optimizer](../../ppocr/optimizer))

| 字段 | 用途 | 默认值 | 备注 |
| :---------------------: |:-------------:|:-------------:| :--------------------: |
| name | 优化器类名 | Adam | 目前支持`Momentum`,`Adam`,`RMSProp`, 见[ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) |
| beta1 | 设置一阶矩估计的指数衰减率 | 0.9 | \ |
| beta2 | 设置二阶矩估计的指数衰减率 | 0.999 | \ |
| learning_rate | 基础学习率 | 0.001 | \ |
| **regularizer** | 设置网络正则化方式 | - | \ |
| name | 正则化类名 | L2 | 目前支持`L1`,`L2`, 见[ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py) |
| factor | 正则化系数 | 0.00001 | \ |
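为便于理解上表字段的含义,下面给出一段示意性的Python代码(仅为草图:示例网络`model`、学习率策略及`T_max`等取值均为假设),展示这些配置字段大致对应的Paddle优化器写法:

```python
import paddle

# 示例网络,仅用于演示(假设)
model = paddle.nn.Linear(10, 10)

# 基础学习率 learning_rate=0.001,此处以余弦退火策略为例(假设)
lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=0.001, T_max=100)

# name/beta1/beta2 对应 Adam 优化器参数;regularizer 的 L2 与 factor 对应 L2Decay 的系数
optimizer = paddle.optimizer.Adam(
    learning_rate=lr_scheduler,
    beta1=0.9,
    beta2=0.999,
    weight_decay=paddle.regularizer.L2Decay(coeff=1e-5),
    parameters=model.parameters())
```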
### Architecture ([ppocr/modeling](../../ppocr/modeling))
- [中文文档文字识别](#中文文档文字识别)
- [ICDAR2019-ArT](#ICDAR2019-ArT)

除了开源数据,用户还可使用合成工具自行合成,可参考[数据合成工具](../data_synthesis.md);
如果需要标注自己的数据,可参考[数据标注工具](../data_annotation.md)。

<a name="ICDAR2019-LSVT"></a>
#### 1、ICDAR2019-LSVT
- **数据来源**:https://ai.baidu.com/broad/introduction?dataset=lsvt
- **数据简介**: 共45w中文街景图像,包含5w(2w测试+3w训练)全标注数据(文本坐标+文本内容),40w弱标注数据(仅文本内容),如下图所示:
![](../../datasets/LSVT_1.jpg)
(a) 全标注数据
![](../../datasets/LSVT_2.jpg)
(b) 弱标注数据
- **下载地址**:https://ai.baidu.com/broad/download?dataset=lsvt
- **说明**:其中,test数据集的label目前没有开源,如要评估结果,可以去官网提交:https://rrc.cvc.uab.es/?ch=16

<a name="ICDAR2017-RCTW-17"></a>
#### 2、ICDAR2017-RCTW-17
- **数据来源**:https://rctw.vlrlab.net/
- **数据简介**:共包含12,000+图像,大部分图片是通过手机摄像头在野外采集的。有些是截图。这些图片展示了各种各样的场景,包括街景、海报、菜单、室内场景和手机应用程序的截图。
![](../../datasets/rctw.jpg)
- **下载地址**:https://rctw.vlrlab.net/dataset/

<a name="中文街景文字识别"></a>
#### 3、中文街景文字识别
- **数据来源**:https://aistudio.baidu.com/aistudio/competition/detail/8
- **数据简介**:ICDAR2019-LSVT行识别任务,共包括29万张图片,其中21万张图片作为训练集(带标注),8万张作为测试集(无标注)。数据集采自中国街景,并由街景图片中的文字行区域(例如店铺标牌、地标等等)截取出来而形成。所有图像都经过一些预处理,将文字区域利用仿射变化,等比映射为一张高为48像素的图片,如图所示:
![](../../datasets/ch_street_rec_1.png)
(a) 标注:魅派集成吊顶
![](../../datasets/ch_street_rec_2.png)
(b) 标注:母婴用品连锁
- **下载地址**
https://aistudio.baidu.com/aistudio/datasetdetail/8429

<a name="中文文档文字识别"></a>
#### 4、中文文档文字识别
- 包含汉字、英文字母、数字和标点共5990个字符(字符集合:https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt )
- 每个样本固定10个字符,字符随机截取自语料库中的句子
- 图片分辨率统一为280x32
![](../../datasets/ch_doc1.jpg)
![](../../datasets/ch_doc3.jpg)
- **下载地址**:https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (密码:lu7m)

<a name="ICDAR2019-ArT"></a>
#### 5、ICDAR2019-ArT
- **数据来源**:https://ai.baidu.com/broad/introduction?dataset=art
- **数据简介**:共包含10,166张图像,训练集5603图,测试集4563图。由Total-Text、SCUT-CTW1500、Baidu Curved Scene Text (ICDAR2019-LSVT部分弯曲数据) 三部分组成,包含水平、多方向和弯曲等多种形状的文本。
![](../../datasets/ArT.jpg)
- **下载地址**:https://ai.baidu.com/broad/download?dataset=art

## 参考文献
- **数据简介**:
    * 包含在线和离线两类手写数据,`HWDB1.0~1.2`总共有3895135个手写单字样本,分属7356类(7185个汉字和171个英文字母、数字、符号);`HWDB2.0~2.2`总共有5091页图像,分割为52230个文本行和1349414个文字。所有文字和文本样本均存为灰度图像。部分单字样本图片如下所示。
    ![](../../datasets/CASIA_0.jpg)
- **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
- **使用建议**:数据为单字,白色背景,可以大量合成文字行进行训练。白色背景可以处理成透明状态,方便添加各种背景。对于需要语义的情况,建议从真实语料出发,抽取单字组成文字行

- **数据简介**: NIST19数据集适用于手写文档和字符识别的模型训练,从3600位作者的手写样本表格中提取得到,总共包含81万张字符图片。其中9张图片示例如下。
    ![](../../datasets/nist_demo.png)
- **下载地址**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19)
## 版面分析数据集
这里整理了常用版面分析数据集,持续更新中,欢迎各位小伙伴贡献数据集~
- [publaynet数据集](#publaynet)
- [CDLA数据集](#CDLA)
- [TableBank数据集](#TableBank)
版面分析数据集多为目标检测数据集,除了开源数据,用户还可使用合成工具自行合成,如[labelme](https://github.com/wkentaro/labelme)等。
<a name="publaynet"></a>
#### 1、publaynet数据集
- **数据来源**:https://github.com/ibm-aur-nlp/PubLayNet
- **数据简介**:publaynet数据集的训练集合中包含35万张图像,验证集合中包含1.1万张图像。总共包含5个类别,分别是: `text, title, list, table, figure`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/publaynet_demo/gt_PMC3724501_00006.jpg" width="500">
<img src="../datasets/publaynet_demo/gt_PMC5086060_00002.jpg" width="500">
</div>
- **下载地址**:https://developer.ibm.com/exchanges/data/all/publaynet/
- **说明**:使用该数据集时,需要遵守[CDLA-Permissive](https://cdla.io/permissive-1-0/)协议。
<a name="CDLA"></a>
#### 2、CDLA数据集
- **数据来源**:https://github.com/buptlihang/CDLA
- **数据简介**:CDLA数据集的训练集合中包含5000张图像,验证集合中包含1000张图像。总共包含10个类别,分别是: `Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/CDLA_demo/val_0633.jpg" width="500">
<img src="../datasets/CDLA_demo/val_0941.jpg" width="500">
</div>
- **下载地址**:https://github.com/buptlihang/CDLA
- **说明**:基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop)套件,在该数据集上训练目标检测模型时,在转换label时,需要将`label.txt`中的`__ignore__`和`_background_`去除。
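下面给出一段示意性的Python片段(假设`label.txt`每行为一个类别名称,输出文件名为示例),演示转换时如何过滤掉这两个类别:

```python
# 读取类别文件,过滤掉 __ignore__ 和 _background_ 后写回新文件
with open("label.txt", "r", encoding="utf-8") as f:
    labels = [line.strip() for line in f if line.strip()]

labels = [c for c in labels if c not in ("__ignore__", "_background_")]

with open("label_filtered.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(labels) + "\n")
```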
<a name="TableBank"></a>
#### 3、TableBank数据集
- **数据来源**:https://doc-analysis.github.io/tablebank-page/index.html
- **数据简介**:TableBank数据集包含Latex(训练集187199张,验证集7265张,测试集5719张)与Word(训练集73383张,验证集2735张,测试集2281张)两种类别的文档。仅包含`Table` 1个类别。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/tablebank_demo/004.png" height="700">
<img src="../datasets/tablebank_demo/005.png" height="700">
</div>
- **下载地址**:https://doc-analysis.github.io/tablebank-page/index.html
- **说明**:使用该数据集时,需要遵守[Apache-2.0](https://github.com/doc-analysis/TableBank/blob/master/LICENSE)协议。
# OCR数据集
- [1. 文本检测](#1-文本检测)
- [1.1 PaddleOCR 文字检测数据格式](#11-paddleocr-文字检测数据格式)
- [1.2 公开数据集](#12-公开数据集)
- [1.2.1 ICDAR 2015](#121-icdar-2015)
- [2. 文本识别](#2-文本识别)
- [2.1 PaddleOCR 文字识别数据格式](#21-paddleocr-文字识别数据格式)
- [2.2 公开数据集](#22-公开数据集)
- [2.1 ICDAR 2015](#21-icdar-2015)
- [3. 数据存放路径](#3-数据存放路径)
这里整理了OCR中常用的公开数据集,持续更新中,欢迎各位小伙伴贡献数据集~
## 1. 文本检测
### 1.1 PaddleOCR 文字检测数据格式
PaddleOCR 中的文本检测算法支持的标注文件格式如下,中间用"\t"分隔:
```
" 图像文件名 json.dumps编码的图像标注信息"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。
`transcription` 表示当前文本框的文字,**当其内容为“###”时,表示该文本框无效,在训练时会跳过。**
如果您想在我们未提供的数据集上训练,可以按照上述形式构建标注文件。
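下面给出一个最小的Python示例(文件名与坐标均为示意),演示如何按上述格式生成一行检测标注并写入标注文件:

```python
import json

# 单张图片的标注:每个文本框由四个点坐标(左上角起顺时针)和文字内容组成,
# transcription 为 "###" 的框视为无效框,训练时会被跳过
annos = [
    {"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]},
    {"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]},
]

# 图像文件名与 json.dumps 编码的标注信息之间用 "\t" 分隔
line = "ch4_test_images/img_61.jpg" + "\t" + json.dumps(annos, ensure_ascii=False)

with open("train_label.txt", "a", encoding="utf-8") as f:
    f.write(line + "\n")
```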
### 1.2 公开数据集
| 数据集名称 |图片下载地址| PaddleOCR 标注下载地址 |
|---|---|---|
| ICDAR 2015 |https://rrc.cvc.uab.es/?ch=4&com=downloads| [train](https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt) / [test](https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt) |
| ctw1500 |https://paddleocr.bj.bcebos.com/dataset/ctw1500.zip| 图片下载地址中已包含 |
| total text |https://paddleocr.bj.bcebos.com/dataset/total_text.tar| 图片下载地址中已包含 |
#### 1.2.1 ICDAR 2015
ICDAR 2015 数据集包含1000张训练图像和500张测试图像。ICDAR 2015 数据集可以从上表中链接下载,首次下载需注册。
注册完成登录后,下载下图中红色框标出的部分,其中,`Training Set Images`下载的内容保存在`icdar_c4_train_imgs`文件夹下,`Test Set Images` 下载的内容保存在`ch4_test_images`文件夹下。
<p align="center">
<img src="../../datasets/ic15_location_download.png" align="middle" width = "700"/>
<p align="center">
将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/下。然后从上表中下载转换好的标注文件。
PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换为 PaddleOCR 支持的数据格式。数据转换工具在 `ppocr/utils/gen_label.py`,这里以训练集为例:
```
# 将官网下载的标签文件转换为 train_icdar2015_label.txt
python gen_label.py --mode="det" --root_path="/path/to/icdar_c4_train_imgs/" \
--input_path="/path/to/ch4_training_localization_transcription_gt" \
--output_label="/path/to/train_icdar2015_label.txt"
```
解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,按照如下方式组织icdar2015数据集:
```
/PaddleOCR/train_data/icdar2015/text_localization/
└─ icdar_c4_train_imgs/ icdar 2015 数据集的训练数据
└─ ch4_test_images/ icdar 2015 数据集的测试数据
└─ train_icdar2015_label.txt icdar 2015 数据集的训练标注
└─ test_icdar2015_label.txt icdar 2015 数据集的测试标注
```
## 2. 文本识别
### 2.1 PaddleOCR 文字识别数据格式
PaddleOCR 中的文字识别算法支持两种数据格式:
- `lmdb` 用于训练以lmdb格式存储的数据集,使用 [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) 进行读取;
- `通用数据` 用于训练以文本文件存储的数据集,使用 [simple_dataset.py](../../../ppocr/data/simple_dataset.py)进行读取。
下面以通用数据集为例, 介绍如何准备数据集:
* 训练集
建议将训练图片放入同一个文件夹,并用一个txt文件(rec_gt_train.txt)记录图片路径和标签,txt文件里的内容如下:
**注意:** txt文件中默认请将图片路径和图片标签用 \t 分割,如用其他方式分割将造成训练报错。
```
" 图像文件名 图像标注信息 "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
最终训练集应有如下文件结构:
```
|-train_data
|-rec
|- rec_gt_train.txt
|- train
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
```
除上述单张图像为一行格式之外,PaddleOCR也支持对离线增广后的数据进行训练,为了防止相同样本在同一个batch中被多次采样,我们可以将相同标签对应的图片路径写在一行中,以列表的形式给出,在训练中,PaddleOCR会随机选择列表中的一张图片进行训练。对应地,标注文件的格式如下。
```
["11.jpg", "12.jpg"] 简单可依赖
["21.jpg", "22.jpg", "23.jpg"] 用科技让复杂的世界更简单
3.jpg ocr
```
上述示例标注文件中,"11.jpg"和"12.jpg"的标签相同,都是`简单可依赖`,在训练的时候,对于该行标注,会随机选择其中的一张图片进行训练。
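上述两种写法可以用同一段逻辑解析,下面给出一段示意性的Python代码(函数名与示例路径均为假设),演示如何同时兼容单张图片与图片列表两种标注形式:

```python
import ast
import random

def parse_rec_label_line(line):
    """解析一行识别标注:路径部分可能是单个路径,也可能是一个路径列表。"""
    path_part, label = line.rstrip("\n").split("\t")
    if path_part.startswith("["):
        # 列表形式:训练时从列表中随机选择一张图片
        img_path = random.choice(ast.literal_eval(path_part))
    else:
        img_path = path_part
    return img_path, label

# 示例
print(parse_rec_label_line('["11.jpg", "12.jpg"]\t简单可依赖'))
print(parse_rec_label_line('3.jpg\tocr'))
```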
- 验证集
同训练集类似,验证集也需要提供一个包含所有图片的文件夹(test)和一个rec_gt_test.txt,验证集的结构如下所示:
```
|-train_data
|-rec
|- rec_gt_test.txt
|- test
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
### 2.2 公开数据集
| 数据集名称 | 图片下载地址 | PaddleOCR 标注下载地址 |
|---|---|---------------------------------------------------------------------|
| en benchmark(MJ, SJ, IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE.) | [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) | LMDB格式,可直接用[lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py)加载 |
|ICDAR 2015| http://rrc.cvc.uab.es/?ch=4&com=downloads | [train](https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt)/ [test](https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt) |
| 多语言数据集 |[百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) 提取码:frgi <br> [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) | 图片下载地址中已包含 |
#### 2.1 ICDAR 2015
ICDAR 2015 数据集可以在上表中链接下载,用于快速验证。也可以从上表中下载 en benchmark 所需的lmdb格式数据集。
下载完图片后从上表中下载转换好的标注文件。
PaddleOCR 也提供了数据格式转换脚本,可以将ICDAR官网 label 转换为PaddleOCR支持的数据格式。 数据转换工具在 `ppocr/utils/gen_label.py`, 这里以训练集为例:
```
# 将官网下载的标签文件转换为 rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```
数据样式格式如下,(a)为原始图片,(b)为每张图片对应的 Ground Truth 文本文件:
![](../../datasets/icdar_rec.png)
## 3. 数据存放路径
PaddleOCR训练数据的默认存储路径是 `PaddleOCR/train_data`,如果您的磁盘上已有数据集,只需创建软链接至数据集目录:
```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
# 表格识别数据集
- [数据集汇总](#数据集汇总)
- [1. PubTabNet数据集](#1-pubtabnet数据集)
- [2. 好未来表格识别竞赛数据集](#2-好未来表格识别竞赛数据集)
这里整理了常用表格识别数据集,持续更新中,欢迎各位小伙伴贡献数据集~
## 数据集汇总
| 数据集名称 |图片下载地址| PPOCR标注下载地址 |
|---|---|---|
| PubTabNet |https://github.com/ibm-aur-nlp/PubTabNet| jsonl格式,可直接用[pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py)加载 |
| 好未来表格识别竞赛数据集 |https://ai.100tal.com/dataset| jsonl格式,可直接用[pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py)加载 |
## 1. PubTabNet数据集
- **数据简介**:PubTabNet数据集的训练集合中包含50万张图像,验证集合中包含0.9万张图像。部分图像可视化如下所示。
<div align="center">
<img src="../../datasets/table_PubTabNet_demo/PMC524509_007_00.png" width="500">
<img src="../../datasets/table_PubTabNet_demo/PMC535543_007_01.png" width="500">
</div>
- **说明**:使用该数据集时,需要遵守[CDLA-Permissive](https://cdla.io/permissive-1-0/)协议。
## 2. 好未来表格识别竞赛数据集
- **数据简介**:好未来表格识别竞赛数据集的训练集合中包含1.6万张图像。验证集未给出可训练的标注。
<div align="center">
<img src="../../datasets/table_tal_demo/1.jpg" width="500">
<img src="../../datasets/table_tal_demo/2.jpg" width="500">
</div>
* CCPD-Challenge: 至今在车牌检测识别任务中最有挑战性的一些图片
* CCPD-NP: 没有安装车牌的新车图片。

![](../../datasets/ccpd_demo.png)

- **下载地址**

* 有效期结束:07/41
* 卡用户拼音:MICHAEL

![](../../datasets/cmb_demo.jpg)

- **下载地址**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip)

- **数据简介**: 这是一个数据合成的工具包,可以根据输入的文本,输出验证码图片,使用该工具包生成几张demo图片如下。

![](../../datasets/captcha_demo.png)

- **下载地址**: 该数据集是生成得到,无下载地址。
# 文字检测

本节以icdar2015数据集为例,介绍PaddleOCR中检测模型训练、评估、测试的使用方式。

- [1. 准备数据和模型](#1-准备数据和模型)
  - [1.1 准备数据集](#11-准备数据集)
  - [1.2 下载预训练模型](#12-下载预训练模型)
- [2. 开始训练](#2-开始训练)
  - [2.1 启动训练](#21-启动训练)
  - [2.2 断点训练](#22-断点训练)
  - [2.3 更换Backbone 训练](#23-更换backbone-训练)
  - [2.4 混合精度训练](#24-混合精度训练)
  - [2.5 分布式训练](#25-分布式训练)
  - [2.6 知识蒸馏训练](#26-知识蒸馏训练)
  - [2.7 其他训练环境](#27-其他训练环境)
- [3. 模型评估与预测](#3-模型评估与预测)
  - [3.1 指标评估](#31-指标评估)
  - [3.2 测试检测效果](#32-测试检测效果)
- [4. 模型导出与预测](#4-模型导出与预测)
- [5. FAQ](#5-faq)

<a name="1--------"></a>
# 1. 准备数据和模型

<a name="11-----"></a>
## 1.1 准备数据集

准备数据集可参考 [ocr_datasets](./dataset/ocr_datasets.md)。

icdar2015 TextLocalization数据集是文本检测的数据集,包含1000张训练图像和500张测试图像。
icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。
注册完成登录后,下载下图中红色框标出的部分,其中,`Training Set Images`下载的内容保存在`icdar_c4_train_imgs`文件夹下,`Test Set Images` 下载的内容保存在`ch4_test_images`文件夹下。
<p align="center">
<img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
<p align="center">
将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件
,您可以通过wget的方式进行下载。
```shell
# 在PaddleOCR路径下
cd PaddleOCR/
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```
PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `ppocr/utils/gen_label.py`, 这里以训练集为例:
```
# 将官网下载的标签文件转换为 train_icdar2015_label.txt
python gen_label.py --mode="det" --root_path="/path/to/icdar_c4_train_imgs/" \
--input_path="/path/to/ch4_training_localization_transcription_gt" \
--output_label="/path/to/train_icdar2015_label.txt"
```
解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,按照如下方式组织icdar2015数据集:
```
/PaddleOCR/train_data/icdar2015/text_localization/
└─ icdar_c4_train_imgs/ icdar数据集的训练数据
└─ ch4_test_images/ icdar数据集的测试数据
└─ train_icdar2015_label.txt icdar数据集的训练标注
└─ test_icdar2015_label.txt icdar数据集的测试标注
```
提供的标注文件格式如下,中间用"\t"分隔:
```
" 图像文件名 json.dumps编码的图像标注信息"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。
`transcription` 表示当前文本框的文字,**当其内容为“###”时,表示该文本框无效,在训练时会跳过。**
如果您想在其他数据集上训练,可以按照上述形式构建标注文件。
<a name="12--------"></a> <a name="12--------"></a>
## 1.2 下载预训练模型 ## 1.2 下载预训练模型
...@@ -103,9 +62,6 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \ ...@@ -103,9 +62,6 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
# 多机多卡训练,通过 --ips 参数设置使用的机器IP地址,通过 --gpus 参数设置使用的GPU ID
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
``` ```
上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。 上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。
...@@ -116,15 +72,6 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1 ...@@ -116,15 +72,6 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
``` ```
<a name="22-----"></a> <a name="22-----"></a>
## 2.2 断点训练 ## 2.2 断点训练
**注意**:如果要更换网络的其他模块,可以参考[文档](./add_new_algorithm.md)
<a name="24---amp---"></a>
## 2.4 混合精度训练
如果您想进一步加快训练速度,可以使用[自动混合精度训练](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html), 以单机单卡为例,命令如下:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
<a name="26---fleet---"></a>
## 2.5 分布式训练
多机多卡训练时,通过 `--ips` 参数设置使用的机器IP地址,通过 `--gpus` 参数设置使用的GPU ID:
<a name="24---distill---"></a> ```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**注意:** 采用多机多卡训练时,需要替换上面命令中的ips值为您机器的地址,机器之间需要能够相互ping通。另外,训练时需要在多个机器上分别启动命令。查看机器ip地址的命令为`ifconfig`。
<a name="26---distill---"></a>
## 2.6 知识蒸馏训练

PaddleOCR支持了基于知识蒸馏的检测模型训练过程,更多内容可以参考[知识蒸馏说明文档](./knowledge_distillation.md)。
**注意:** 知识蒸馏训练目前只支持PP-OCR使用的`DB`和`CRNN`算法。
<a name="27---other---"></a>
## 2.7 其他训练环境
- Windows GPU/CPU
在Windows平台上与Linux平台略有不同:
Windows平台只支持`单卡`的训练与预测,指定GPU进行训练`set CUDA_VISIBLE_DEVICES=0`
在Windows平台,DataLoader只支持单进程模式,因此需要设置 `num_workers` 为0;
- macOS
不支持GPU模式,需要在配置文件中设置`use_gpu`为False,其余训练评估预测命令与Linux GPU完全相同。
- Linux DCU
DCU设备上运行需要设置环境变量 `export HIP_VISIBLE_DEVICES=0,1,2,3`,其余训练评估预测命令与Linux GPU完全相同。
<a name="3--------"></a> <a name="3--------"></a>
# 3. 模型评估与预测 # 3. 模型评估与预测
...@@ -206,22 +191,22 @@ PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall ...@@ -206,22 +191,22 @@ PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy"
``` ```
* 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置
<a name="32-------"></a> <a name="32-------"></a>
## 3.2 测试检测效果 ## 3.2 测试检测效果
测试单张图像的检测效果 测试单张图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
``` ```
测试DB模型时,调整后处理阈值 测试DB模型时,调整后处理阈值
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
``` ```
* 注:`box_thresh`、`unclip_ratio`是DB后处理参数,其他检测模型不支持。
测试文件夹下所有图像的检测效果 测试文件夹下所有图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
``` ```
......
# 《动手学OCR》电子书
特点:
- 覆盖OCR全栈技术
- 理论实践相结合
- Notebook交互式学习
- 配套教学视频
[电子书下载]()
目录:
![]()
[notebook教程](../../notebook/notebook_ch/)
[教学视频](https://aistudio.baidu.com/aistudio/education/group/info/25207)
[English](../doc_en/ppocr_introduction_en.md) | 简体中文
# PP-OCR
- [1. 简介](#1)
- [2. 特性](#2)
- [3. benchmark](#3)
- [4. 效果展示](#4)
- [5. 使用教程](#5)
- [5.1 快速体验](#51)
- [5.2 模型训练、压缩、推理部署](#52)
- [6. 模型库](#6)
<a name="1"></a>
## 1. 简介
PP-OCR是PaddleOCR自研的实用的超轻量OCR系统。在实现[前沿算法](algorithm.md)的基础上,考虑精度与速度的平衡,进行**模型瘦身****深度优化**,使其尽可能满足产业落地需求。
#### PP-OCR
PP-OCR是一个两阶段的OCR系统,其中文本检测算法选用[DB](algorithm_det_db.md),文本识别算法选用[CRNN](algorithm_rec_crnn.md),并在检测和识别模块之间添加[文本方向分类器](angle_class.md),以应对不同方向的文本识别。
PP-OCR系统pipeline如下:
<div align="center">
<img src="../ppocrv2_framework.jpg" width="800">
</div>
PP-OCR系统在持续迭代优化,目前已发布PP-OCR和PP-OCRv2两个版本:
PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
#### PP-OCRv2
PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./doc/doc_ch/enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)
#### PP-OCRv3
<a name="2"></a>
## 2. 特性
- 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
- 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用PP-OCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 支持中英文数字组合识别、竖排文本识别、长文本识别
- 支持多语言识别:韩语、日语、德语、法语等约80种语言
<a name="3"></a>
## 3. benchmark
关于PP-OCR系列模型之间的性能对比,请查看[benchmark](./benchmark.md)文档。
<a name="4"></a>
## 4. 效果展示 [more](./visualization.md)
<details open>
<summary>PP-OCRv2 中文模型</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 英文模型</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 其他语言模型</summary>
<div align="center">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
</div>
</details>
<a name="5"></a>
## 5. 使用教程
<a name="51"></a>
### 5.1 快速体验
- 在线网站体验:超轻量PP-OCR mobile模型体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
- 移动端demo体验:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统)
- 一行命令快速使用:[快速开始(中英文/多语言)](./doc/doc_ch/quickstart.md)
<a name="52"></a>
### 5.2 模型训练、压缩、推理部署
更多教程,包括模型训练、模型压缩、推理部署等,请参考[文档教程](../../README_ch.md#文档教程)
<a name="6"></a>
## 6. 模型库
PP-OCR中英文模型列表如下:
| 模型简介 | 模型名称 | 推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| ------------------------------------- | ----------------------- | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| 中英文超轻量PP-OCRv2模型(13.0M) | ch_PP-OCRv2_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
| 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用PP-OCR server模型(143.4M) | ch_ppocr_server_v2.0_xx | 服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
更多模型下载(包括英文数字模型、多语言模型、Paddle-Lite模型等),可以参考[PP-OCR 系列模型下载](./models_list.md)
# PaddleOCR 快速开始

**说明:** 本文主要介绍PaddleOCR wheel包对PP-OCR系列模型的快速使用,如要体验文档分析相关功能,请参考[PP-Structure快速使用教程](../../ppstructure/docs/quickstart.md)。

- [1. 安装](#1)
  - [1.1 安装PaddlePaddle](#11)
  - [1.2 安装PaddleOCR whl包](#12)
- [2. 便捷使用](#2)
  - [2.1 命令行使用](#21)
    - [2.1.1 中英文模型](#211)
    - [2.1.2 多语言模型](#212)
  - [2.2 Python脚本使用](#22)
    - [2.2.1 中英文与多语言使用](#221)
- [3. 小结](#3)

<a name="1"></a>
## 1. 安装

<a name="11"></a>
### 1.1 安装PaddlePaddle

> 如果您没有基础的Python运行环境,请参考[运行环境准备](./environment.md)。
更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。

<a name="12"></a>
### 1.2 安装PaddleOCR whl包

```bash
pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
```

- 对于Windows环境用户:直接通过pip安装的shapely库可能出现`[WinError 126] 找不到指定模块`的问题。建议从[这里](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)下载shapely安装包完成安装。

<a name="2"></a>
```bash
cd /path/to/ppocr_img
```

如果不使用提供的测试图片,可以将下方`--image_dir`参数替换为相应的测试图片路径。
<a name="211"></a> <a name="211"></a>
#### 2.1.1 中英文模型 #### 2.1.1 中英文模型
...@@ -154,60 +144,6 @@ paddleocr --image_dir ./imgs_en/254.jpg --lang=en ...@@ -154,60 +144,6 @@ paddleocr --image_dir ./imgs_en/254.jpg --lang=en
| 繁体中文 | chinese_cht | | 意大利文 | it | | 俄罗斯文 | ru | | 繁体中文 | chinese_cht | | 意大利文 | it | | 俄罗斯文 | ru |
全部语种及其对应的缩写列表可查看[多语言模型教程](./multi_languages.md) 全部语种及其对应的缩写列表可查看[多语言模型教程](./multi_languages.md)
<a name="213"></a>
#### 2.1.3 版面分析
版面分析是指对文档图片中的文字、标题、列表、图片和表格5类区域进行划分。对于前三类区域,直接使用OCR模型完成对应区域文字检测与识别,并将结果保存在txt中。对于表格类区域,经过表格结构化处理后,表格图片转换为相同表格样式的Excel文件。图片区域会被单独裁剪成图像。
使用PaddleOCR的版面分析功能,需要指定`--type=structure`
```bash
paddleocr --image_dir=./table/1.png --type=structure
```
- **返回结果说明**
PP-Structure的返回结果为一个dict组成的list,示例如下
```shell
[{ 'type': 'Text',
'bbox': [34, 432, 345, 462],
'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
[('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
}
]
```
其中各个字段说明如下
| 字段 | 说明 |
| ---- | ------------------------------------------------------------ |
| type | 图片区域的类型 |
| bbox | 图片区域的在原图的坐标,分别[左上角x,左上角y,右下角x,右下角y] |
| res | 图片区域的OCR或表格识别结果。<br>表格: 表格的HTML字符串; <br>OCR: 一个包含各个单行文字的检测坐标和识别结果的元组 |
运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。
```
/output/table/1/
└─ res.txt
└─ [454, 360, 824, 658].xlsx 表格识别结果
└─ [16, 2, 828, 305].jpg 被裁剪出的图片区域
└─ [17, 361, 404, 711].xlsx 表格识别结果
```
- **参数说明**
| 字段 | 说明 | 默认值 |
| --------------- | ---------------------------------------- | -------------------------------------------- |
| output | excel和识别结果保存的地址 | ./output/table |
| table_max_len | 表格结构模型预测时,图像的长边resize尺度 | 488 |
| table_model_dir | 表格结构模型 inference 模型地址 | None |
| table_char_dict_path | 表格结构模型所用字典地址 | ../ppocr/utils/dict/table_structure_dict.txt |
大部分参数和paddleocr whl包保持一致,见 [whl包文档](./whl.md)
<a name="22"></a> <a name="22"></a>
...@@ -256,35 +192,7 @@ im_show.save('result.jpg') ...@@ -256,35 +192,7 @@ im_show.save('result.jpg')
<div align="center"> <div align="center">
<img src="../imgs_results/whl/11_det_rec.jpg" width="800"> <img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div> </div>
<a name="222"></a>
#### 2.2.2 版面分析
```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True)
save_folder = './output/table'
img_path = './table/paper-image.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
from PIL import Image
font_path = './fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
<a name="3"></a> <a name="3"></a>
...@@ -292,4 +200,4 @@ im_show.save('result.jpg') ...@@ -292,4 +200,4 @@ im_show.save('result.jpg')
通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。 通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。
PaddleOCR是一套丰富领先实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图,然后克隆PaddleOCR项目,正式开启PaddleOCR的应用之旅。 PaddleOCR是一套丰富领先实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,您可以参考[文档教程](../../README_ch.md#文档教程),正式开启PaddleOCR的应用之旅。
# 文字识别

本文提供了PaddleOCR文本识别任务的全流程指南,包括数据准备、模型训练、调优、评估、预测,各个阶段的详细说明:

- [1. 数据准备](#1-数据准备)
  * [1.1 自定义数据集](#11-自定义数据集)
  * [1.2 数据下载](#12-数据下载)
  * [1.3 字典](#13-字典)
  * [1.4 添加空格类别](#14-添加空格类别)
  * [1.5 数据增强](#15-数据增强)
- [2. 开始训练](#2-开始训练)
  * [2.1 启动训练](#21-----)
  * [2.2 断点训练](#22-----)
  * [2.3 更换Backbone 训练](#23---backbone---)
  * [2.4 混合精度训练](#24---amp---)
  * [2.5 分布式训练](#25---fleet---)
  * [2.6 知识蒸馏训练](#26---distill---)
  * [2.7 多语言模型训练](#27-多语言模型训练)
  * [2.8 其他训练环境(Windows/macOS/Linux DCU)](#28---other---)
- [3. 模型评估与预测](#3--------)
  * [3.1 指标评估](#31-----)
  * [3.2 测试识别效果](#32-------)
- [4. 模型导出与预测](#4--------)
- [5. FAQ](#5-faq)

<a name="1-数据准备"></a>
# 1. 数据准备

### 1.1 准备数据集

PaddleOCR 支持两种数据格式:
- `lmdb` 用于训练以lmdb格式存储的数据集(LMDBDataSet);
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
<a name="准备数据集"></a> <a name="11-自定义数据集"></a>
### 1.1 自定义数据集 ## 1.1 自定义数据集
下面以通用数据集为例, 介绍如何准备数据集: 下面以通用数据集为例, 介绍如何准备数据集:
* 训练集 * 训练集
| ...
```
<a name="数据下载"></a> <a name="12-数据下载"></a>
## 1.2 数据下载
### 1.2 数据下载
- ICDAR2015 - ICDAR2015
...@@ -127,8 +133,8 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_ ...@@ -127,8 +133,8 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_
* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) * [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
<a name="字典"></a> <a name="13-字典"></a>
### 1.3 字典 ## 1.3 字典
最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。 最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。
...@@ -163,9 +169,6 @@ PaddleOCR内置了一部分字典,可以按需使用。 ...@@ -163,9 +169,6 @@ PaddleOCR内置了一部分字典,可以按需使用。
`ppocr/utils/en_dict.txt` 是一个包含96个字符的英文字典 `ppocr/utils/en_dict.txt` 是一个包含96个字符的英文字典
目前的多语言模型仍处在demo阶段,会持续优化模型并补充语种,**非常欢迎您为我们提供其他语言的字典和字体** 目前的多语言模型仍处在demo阶段,会持续优化模型并补充语种,**非常欢迎您为我们提供其他语言的字典和字体**
如您愿意可将字典文件提交至 [dict](../../ppocr/utils/dict),我们会在Repo中感谢您。 如您愿意可将字典文件提交至 [dict](../../ppocr/utils/dict),我们会在Repo中感谢您。
...@@ -174,16 +177,12 @@ PaddleOCR内置了一部分字典,可以按需使用。 ...@@ -174,16 +177,12 @@ PaddleOCR内置了一部分字典,可以按需使用。
如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。 如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。
<a name="支持空格"></a> <a name="支持空格"></a>
### 1.4 添加空格类别 ## 1.4 添加空格类别
如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True` 如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True`
<a name="启动训练"></a>
## 2. 启动训练
<a name="数据增强"></a> <a name="数据增强"></a>
### 2.1 数据增强 ## 1.5 数据增强
PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加了数据增广。 PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加了数据增广。
...@@ -193,11 +192,14 @@ PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加 ...@@ -193,11 +192,14 @@ PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加
*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux* *由于OpenCV的兼容性问题,扰动操作暂时只支持Linux*
<a name="通用模型训练"></a> <a name="开始训练"></a>
### 2.2 通用模型训练 # 2. 开始训练
PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例: PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例:
<a name="启动训练"></a>
## 2.1 启动训练
首先下载pretrain model,您可以下载训练好的模型在 icdar2015 数据上进行finetune 首先下载pretrain model,您可以下载训练好的模型在 icdar2015 数据上进行finetune
``` ```
...@@ -317,8 +319,96 @@ Eval: ...@@ -317,8 +319,96 @@ Eval:
``` ```
**注意,预测/评估时的配置文件请务必与训练一致。** **注意,预测/评估时的配置文件请务必与训练一致。**
<a name="多语言模型训练"></a>
### 2.3 多语言模型训练 <a name="断点训练"></a>
## 2.2 断点训练
如果训练程序中断,希望恢复训练并加载中断时保存的模型,可以通过`Global.checkpoints`指定要加载的模型路径:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model
```
**注意**`Global.checkpoints`的优先级高于`Global.pretrained_model`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrained_model`指定的模型。
<a name="23---backbone---"></a>
## 2.3 更换Backbone 训练
PaddleOCR将网络划分为四部分,分别在[ppocr/modeling](../../ppocr/modeling)下。 进入网络的数据将按照顺序(transforms->backbones->necks->heads)依次通过这四个部分。
```bash
├── architectures # 网络的组网代码
├── transforms # 网络的图像变换模块
├── backbones # 网络的特征提取模块
├── necks # 网络的特征增强模块
└── heads # 网络的输出模块
```
如果要更换的Backbone 在PaddleOCR中有对应实现,直接修改配置yml文件中`Backbone`部分的参数即可。
To use a new backbone, the steps for replacing backbones are as follows:
1. Create a new file, such as my_backbone.py, under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder.
2. Add the relevant code in the my_backbone.py file. Sample code is as follows:
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```
3. Import the added `MyBackbone` module in [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py), then configure the Backbone section of the configuration file to use it. The format is as follows:
```yaml
Backbone:
name: MyBackbone
args1: args1
```
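The exact registration mechanism may vary slightly between versions; a minimal sketch of the change to `__init__.py`, assuming the file collects supported backbone class names in a list used by its build function, is:
```python
# ppocr/modeling/backbones/__init__.py (sketch; adapt to the actual file contents)
from .my_backbone import MyBackbone

# If the file keeps a list of supported backbone names, append "MyBackbone" to it so
# that the name used in the yml configuration can be resolved to this class.
```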
**Note**: If you want to replace other modules of the network, please refer to the [documentation](./add_new_algorithm.md).
<a name="24---amp---"></a>
## 2.4 Mixed Precision Training
If you want to further speed up training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine with a single GPU as an example, the command is as follows:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
<a name="26---fleet---"></a>
## 2.5 Distributed Training
For multi-machine multi-GPU training, set the IP addresses of the machines with the `--ips` parameter and the GPU IDs with the `--gpus` parameter:
```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
```
**Note:** For multi-machine multi-GPU training, replace the ips value in the above command with the addresses of your machines, and the machines must be able to ping each other. In addition, the command needs to be launched separately on each machine. The command to check a machine's IP address is `ifconfig`.
<a name="26---distill---"></a>
## 2.6 Knowledge Distillation Training
PaddleOCR supports training text recognition models with knowledge distillation. For more information, please refer to the [knowledge distillation documentation](./knowledge_distillation.md).
<a name="27-多语言模型训练"></a>
## 2.7 Multi-language Model Training
PaddleOCR currently supports recognition for 80 languages (besides Chinese). A multi-language configuration file template is provided under the `configs/rec/multi_languages` path: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
...@@ -374,24 +464,36 @@ Eval:
...
```
<a name="28---other---"></a>
## 2.8 Other Training Environments
- Windows GPU/CPU
Training on the Windows platform differs slightly from Linux:
The Windows platform only supports `single GPU` training and prediction; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
On Windows, the DataLoader only supports single-process mode, so `num_workers` needs to be set to 0.
- macOS
GPU mode is not supported, so `use_gpu` needs to be set to False in the configuration file; the remaining training, evaluation, and prediction commands are exactly the same as for Linux GPU.
- Linux DCU
Running on DCU devices requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; the remaining training, evaluation, and prediction commands are exactly the same as for Linux GPU (see the sketch after this list).
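As a quick reference, a hedged sketch of the settings mentioned above (the config file and values are illustrative):
```shell
# Windows: single GPU only
set CUDA_VISIBLE_DEVICES=0

# macOS: disable the GPU in the config file or on the command line
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.use_gpu=False

# Linux DCU
export HIP_VISIBLE_DEVICES=0,1,2,3
```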
<a name="3--------"></a>
# 3. Model Evaluation and Prediction
<a name="31-----"></a>
## 3.1 Metric Evaluation
During training, model parameters are saved under the `Global.save_model_dir` directory by default. To evaluate metrics, set `Global.checkpoints` to point to the saved parameter file. The evaluation dataset can be set by modifying the `label_file_path` field under Eval in `configs/rec/rec_icdar15_train.yml`.
```
# GPU evaluation, Global.checkpoints is the weight to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
<a name="32-------"></a>
## 3.2 Test Recognition Results
Using a model trained with PaddleOCR, you can quickly run prediction with the following script.
...@@ -450,9 +552,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg
result: ('韩国小馆', 0.997218)
```
<a name="4--------"></a>
# 4. Model Export and Prediction
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model that stores both the model structure and the model parameters in files, and it is mostly used in prediction and deployment scenarios.
The model saved during training is the checkpoints model, which only stores the model parameters and is mostly used for resuming training.
Compared with the checkpoints model, the inference model additionally stores the structural information of the model. It performs better in prediction deployment and inference acceleration, and it is flexible and convenient, making it suitable for integration into real systems.
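A hedged sketch of the difference when loading the two kinds of models in Python (the paths are illustrative):
```python
import paddle

# checkpoints model: parameters only (*.pdparams), typically loaded back into a
# network built from the training config, e.g. to resume training
state_dict = paddle.load("./output/rec_crnn/best_accuracy.pdparams")

# inference model: structure + parameters saved by paddle.jit.save; it can be loaded
# directly without rebuilding the network, which is what deployment tools rely on
layer = paddle.jit.load("./inference/rec_crnn/inference")
```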
Converting a recognition model to an inference model is done in the same way as for detection, as follows:
...@@ -483,3 +590,11 @@ python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_trai
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
```
<a name="5-faq"></a>
# 5. FAQ
Q1: Why are the prediction results inconsistent after converting a trained model to an inference model?
**A**: This is a common issue. It is usually caused by inconsistent preprocessing or postprocessing parameters between prediction with the trained model and prediction with the inference model. Compare the preprocessing, postprocessing, and prediction settings in the configuration file used for training with those used at inference time.
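For example, a hedged sketch of keeping the inference-side flags consistent with the training configuration (the values are illustrative and must match what you trained with):
```shell
python3 tools/infer/predict_rec.py \
    --image_dir="./doc/imgs_words/ch/word_1.jpg" \
    --rec_model_dir="./your inference model" \
    --rec_image_shape="3, 32, 320" \
    --rec_char_dict_path="your text dict path" \
    --use_space_char=True
```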
...@@ -87,7 +87,7 @@ Optimizer:
- Chinese dataset: images are cropped from the LSVT street view dataset according to the ground truth and position-calibrated, for a total of 300k images. In addition, 5 million images are synthesized based on the LSVT corpus.
- Minority-language datasets: 1 million synthetic images are generated for each language with different corpora and fonts, and ICDAR-MLT is used as the validation set.
Among them, the public datasets are all open source; users can search and download them by themselves, or refer to the [Chinese datasets](dataset/datasets.md). The synthetic data is not open source for now; users can synthesize it themselves with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
<a name="垂类场景"></a>
### 3.2 Vertical Scenarios
...@@ -149,4 +149,3 @@ PaddleOCR mainly focuses on general OCR. For vertical-domain needs, you can use PaddleOCR+
- [Text direction classifier training](./angle_class.md)
- [Knowledge distillation](./knowledge_distillation.md)
...@@ -22,7 +22,7 @@
- 2020.7.15 Organized OCR-related datasets, common data annotation tools, and synthesis tools
- 2020.7.9 Added a recognition model that supports spaces; for recognition results, prediction, and training, please refer to the quick start and text recognition training documents
- 2020.7.9 Added data augmentation and learning rate decay strategies; for details, refer to the [configuration file](./config.md)
- 2020.6.8 Added [datasets](dataset/datasets.md), which are continuously updated
- 2020.6.5 Supported exporting the `attention` model as an `inference_model`
- 2020.6.5 Supported outputting the result score when running recognition prediction alone
- 2020.5.30 Provided an online demo of ultra-lightweight Chinese OCR
......
...@@ -42,7 +42,7 @@ At present, the open source model, dataset and magnitude are as follows:
English dataset: MJSynth and SynthText synthetic datasets; the amount of data is in the tens of millions.
Chinese dataset: LSVT street view dataset with cropped text areas, a total of 300k images. In addition, 5 million images are synthesized based on the LSVT corpus.
Among them, the public datasets are open source; users can search and download them by themselves, or refer to the [Chinese datasets](dataset/datasets_en.md). The synthetic data is not open source; users can synthesize data themselves with open-source synthesis tools. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
10. **Error in using the model with TPS module for prediction**
Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3]\(108) != Grid dimension[2]\(100)
......
# DB
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Paper:
> [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
> Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
> AAAI, 2020
On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
| --- | --- | --- | --- | --- | --- | --- |
|DB|ResNet50_vd|[configs/det/det_r50_vd_db.yml](../../configs/det/det_r50_vd_db.yml)|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|DB|MobileNetV3|[configs/det/det_mv3_db.yml](../../configs/det/det_mv3_db.yml)|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
<a name="2"></a>
## 2. Environment
Please prepare your environment referring to [prepare the environment](./environment_en.md) and [clone the repo](./clone_en.md).
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
Please refer to [text detection training tutorial](./detection_en.md). PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during DB text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)), you can use the following command to convert it:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_db
```
For DB text detection model inference, you can execute the following command:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_db.jpg)
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model gives very poor detection results on Chinese text images.
<a name="4-2"></a>
### 4.2 C++ Inference
With the inference model prepared, refer to the [cpp infer](../../deploy/cpp_infer/) tutorial for C++ inference.
<a name="4-3"></a>
### 4.3 Serving
With the inference model prepared, refer to the [pdserving](../../deploy/pdserving/) tutorial for service deployment by Paddle Serving.
<a name="4-4"></a>
### 4.4 More
More deployment schemes supported for DB:
- Paddle2ONNX: with the inference model prepared, please refer to the [paddle2onnx](../../deploy/paddle2onnx/) tutorial.
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@inproceedings{liao2020real,
title={Real-time scene text detection with differentiable binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={07},
pages={11474--11481},
year={2020}
}
```
# FCENet
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Paper:
> [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://arxiv.org/abs/2104.10442)
> Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang
> CVPR, 2021
On the CTW1500 dataset, the text detection result is as follows:
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
| --- | --- | --- | --- | --- | --- | --- |
| FCE | ResNet50_dcn | [configs/det/det_r50_vd_dcn_fce_ctw.yml](../../configs/det/det_r50_vd_dcn_fce_ctw.yml)| 88.39%|82.18%|85.27%|[trained model](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar)|
<a name="2"></a>
## 2. Environment
Please prepare your environment referring to [prepare the environment](./environment_en.md) and [clone the repo](./clone_en.md).
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
The above FCE model is trained using the CTW1500 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during FCE text detection training into an inference model. Taking the model based on the ResNet50_vd_dcn backbone and trained on the CTW1500 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar)), you can use the following command to convert it:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_dcn_fce_ctw.yml -o Global.pretrained_model=./det_r50_dcn_fce_ctw_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_fce
```
For FCE text detection model inference and non-curved text detection, you can run the following command:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=quad
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_fce.jpg)
If you want to perform curved text detection, you can execute the following command:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=poly
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img623_fce.jpg)
**Note**: Since the CTW1500 dataset has only 1,000 training images, mainly for English scenes, the above model gives very poor detection results on Chinese or curved text images.
<a name="4-2"></a>
### 4.2 C++ Inference
Since the post-processing is not written in CPP, the FCE text detection model does not support CPP inference.
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@InProceedings{zhu2021fourier,
title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang},
year={2021},
booktitle = {CVPR}
}
```
# PSENet
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Paper:
> [Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473)
> Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai
> CVPR, 2019
On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
| --- | --- | --- | --- | --- | --- | --- |
|PSE| ResNet50_vd | [configs/det/det_r50_vd_pse.yml](../../configs/det/det_r50_vd_pse.yml)| 85.81% |79.53%|82.55%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE| MobileNetV3| [configs/det/det_mv3_pse.yml](../../configs/det/det_mv3_pse.yml) | 82.20% |70.48%|75.89%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
<a name="2"></a>
## 2. Environment
Please prepare your environment referring to [prepare the environment](./environment_en.md) and [clone the repo](./clone_en.md).
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
The above PSE model is trained using the ICDAR2015 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during PSE text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)), you can use the following command to convert it:
```shell
python3 tools/export_model.py -c configs/det/det_r50_vd_pse.yml -o Global.pretrained_model=./det_r50_vd_pse_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_pse
```
For PSE text detection model inference and non-curved text detection, you can run the following command:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=quad
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_pse.jpg)
If you want to perform curved text detection, you can execute the following command:
```shell
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=poly
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_pse_poly.jpg)
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model gives very poor detection results on Chinese or curved text images.
<a name="4-2"></a>
### 4.2 C++ Inference
Since the post-processing is not written in CPP, the PSE text detection model does not support CPP inference.
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@inproceedings{wang2019shape,
title={Shape robust text detection with progressive scale expansion network},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9336--9345},
year={2019}
}
```
# Academic Algorithms and Models
PaddleOCR will add cutting-edge OCR algorithms and models continuously. Check out the supported models and tutorials by clicking the following list:
- [text detection algorithms](./algorithm_overview_en.md#11)
- [text recognition algorithms](./algorithm_overview_en.md#12)
- [end-to-end algorithms](./algorithm_overview_en.md#2)
Developers are welcome to contribute more algorithms! Please refer to the [add new algorithm](./add_new_algorithm_en.md) guideline.
# OCR Algorithms
- [1. Two-stage Algorithms](#1)
* [1.1 Text Detection Algorithms](#11)
* [1.2 Text Recognition Algorithms](#12)
- [2. End-to-end Algorithms](#2)
This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md).
<a name="1"></a>
## 1. Two-stage Algorithms
<a name="11"></a>
### 1.1 Text Detection Algorithms
Supported text detection algorithms (click the link for the tutorial):
- [x] [DB](./algorithm_det_db_en.md)
- [x] [EAST](./algorithm_det_east_en.md)
- [x] [SAST](./algorithm_det_sast_en.md)
- [x] [PSENet](./algorithm_det_psenet_en.md)
- [x] [FCENet](./algorithm_det_fcenet_en.md)
On the ICDAR2015 dataset, the text detection result is as follows:
...@@ -48,20 +45,19 @@ On Total-Text dataset, the text detection result is as follows:
* [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
* [Google Drive](https://drive.google.com/drive/folders/1ll2-XEVyCQLpJjawLDiRlvo_i4BqHCJe?usp=sharing)
<a name="12"></a>
### 1.2 Text Recognition Algorithms
Supported text recognition algorithms (click the link for the tutorial):
- [x] [CRNN](./algorithm_rec_crnn_en.md)
- [x] [Rosetta](./algorithm_rec_rosetta_en.md)
- [x] [STAR-Net](./algorithm_rec_starnet_en.md)
- [x] [RARE](./algorithm_rec_rare_en.md)
- [x] [SRN](./algorithm_rec_srn_en.md)
- [x] [NRTR](./algorithm_rec_nrtr_en.md)
- [x] [SAR](./algorithm_rec_sar_en.md)
- [x] [SEED](./algorithm_rec_seed_en.md)
Referring to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (trained on MJSynth and SynthText, evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) are as follows:
...@@ -80,12 +76,10 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
|SAR|Resnet31| 87.20% | rec_r31_sar | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|SEED|Aster_Resnet| 85.35% | rec_resnet_stn_bilstm_att | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |
<a name="2"></a>
## 2. End-to-end Algorithms
Supported end-to-end algorithms (click the link for the tutorial):
- [x] [PGNet](./algorithm_e2e_pgnet_en.md)
# SAR
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Paper:
> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)
> Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang
> AAAI, 2019
The model was trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets; the reproduction results are as follows:
|Model|Backbone|config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SAR|ResNet31|[rec_r31_sar.yml](../../configs/rec/rec_r31_sar.yml)|87.20%|[train model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)|
Note: In addition to the two text recognition datasets MJSynth and SynthText, [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) data (extraction code: 627x) and some real data are used in training. For details of the data, please refer to the paper.
<a name="2"></a>
## 2. Environment
Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
Specifically, after the data preparation is completed, the training can be started. The training command is as follows:
```
#Single GPU training (long training period, not recommended)
python3 tools/train.py -c configs/rec/rec_r31_sar.yml
#Multi GPU training, specify the gpu number through the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r31_sar.yml
```
Evaluation:
```
# GPU evaluation
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```
Prediction:
```
# The configuration file used for prediction must match the training
python3 tools/infer_rec.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during SAR text recognition training into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)). You can use the following command to convert it:
```
python3 tools/export_model.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model=./rec_r31_sar_train/best_accuracy Global.save_inference_dir=./inference/rec_sar
```
For SAR text recognition model inference, the following commands can be executed:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_sar/" --rec_image_shape="3, 48, 48, 160" --rec_char_type="ch" --rec_algorithm="SAR" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
```
<a name="4-2"></a>
### 4.2 C++ Inference
Not supported
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@article{Li2019ShowAA,
title={Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
author={Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
journal={ArXiv},
year={2019},
volume={abs/1811.00751}
}
```
# SRN
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Training](#3-1)
- [3.2 Evaluation](#3-2)
- [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Paper:
> [Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294#)
> Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding
> CVPR, 2020
The model was trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets; the reproduction results are as follows:
|Model|Backbone|config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SRN|Resnet50_vd_fpn|[rec_r50_fpn_srn.yml](../../configs/rec/rec_r50_fpn_srn.yml)|86.31%|[train model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
<a name="2"></a>
## 2. Environment
Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
Specifically, after the data preparation is completed, the training can be started. The training command is as follows:
```
#Single GPU training (long training period, not recommended)
python3 tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
#Multi GPU training, specify the gpu number through the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
```
Evaluation:
```
# GPU evaluation
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```
Prediction:
```
# The configuration file used for prediction must match the training
python3 tools/infer_rec.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during SRN text recognition training into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)). You can use the following command to convert it:
```
python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./rec_r50_vd_srn_train/best_accuracy Global.save_inference_dir=./inference/rec_srn
```
For SRN text recognition model inference, the following commands can be executed:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_char_type="ch" --rec_algorithm="SRN" --rec_char_dict_path="ppocr/utils/ic15_dict.txt" --use_space_char=False
```
<a name="4-2"></a>
### 4.2 C++ Inference
Not supported
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@article{Yu2020TowardsAS,
title={Towards Accurate Scene Text Recognition With Semantic Reasoning Networks},
author={Deli Yu and Xuan Li and Chengquan Zhang and Junyu Han and Jingtuo Liu and Errui Ding},
journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020},
pages={12110-12119}
}
```
# Android Demo quick start
### 1. Install the latest version of Android Studio
It can be downloaded from https://developer.android.com/studio . This demo was written with Android Studio version 4.0.
### 2. Create a new project
NDK version 20b is used in the demo test; versions 20 and above are supported for compilation.
If you are a beginner, you can install and test the NDK compilation environment in the following way.
File -> New -> New Project to create a "Native C++" project
1. Start a new Android Studio project
Select Native C++ in the project template and select the PaddleOCR/deploy/android_demo path.
After entering the project, it will be compiled automatically. The first compilation
will take a long time; it is recommended to add a proxy to speed up the download.
**Adding a proxy:**
Android Studio -> Preferences -> Appearance & Behavior -> System Settings -> HTTP Proxy -> Manual proxy configuration
![](../demo/proxy.png)
2. Start compilation
Click the compile button, connect the phone, and follow the instructions of Android Studio to complete the operation.
When you see the following picture in Android Studio, the compilation is complete:
![](../demo/build.png)
**Tip:** If an error message appears at this point saying that OpenCV cannot be found, click compile again,
then exit the project after compiling and open it again.
![](../demo/error.png)
### 3. Run on a mobile phone
After the compilation completes, click Run and check the result on the mobile phone.
### 4. How to customize the demo picture
1. Image storage path: android_demo/app/src/main/assets/images
Place the custom picture under this path
2. Configuration file: android_demo/app/src/main/res/values/strings.xml
Modify IMAGE_PATH_DEFAULT to a custom picture name
# Get more support
Go to [EasyEdge](https://ai.baidu.com/easyedge/app/open_source_demo?referrerUrl=paddlelite) to get more development support:
- Demo APP: Install it by scanning the QR code with your mobile phone, for a quick on-device text recognition experience
- SDK: The model is packaged into SDKs adapted to different chips and operating systems, with a complete interface to facilitate secondary development
# Project Clone
## 1. Clone PaddleOCR
```bash
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR
...@@ -25,9 +13,9 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: The mirror on Gitee may not keep in synchronization with the latest project on GitHub. There might be a delay of 3-5 days. Please try GitHub at first.
```
## 2. Install third-party libraries
```bash
cd PaddleOCR
pip3 install -r requirements.txt
```
......
...@@ -56,7 +56,7 @@ Take rec_chinese_lite_train_v2.0.yml as an example
| learning_rate | Set the base learning rate | 0.001 | \ |
| **regularizer** | Set network regularization method | - | \ |
| name | Regularizer class name | L2 | Currently supports `L1`, `L2`; see [ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py) |
| factor | Regularizer coefficient | 0.00001 | \ |
### Architecture ([ppocr/modeling](../../ppocr/modeling))
......
...@@ -12,11 +12,11 @@ In addition to opensource data, users can also use synthesis tools to synthesize
#### 1. ICDAR2019-LSVT
- **Data sources**:https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: A total of 45w Chinese street view images, including 5w (2w test + 3w training) fully labeled data (text coordinates + text content), 40w weakly labeled data (text content only), as shown in the following figure:
![](../../datasets/LSVT_1.jpg)
(a) Fully labeled data
![](../../datasets/LSVT_2.jpg)
(b) Weakly labeled data
- **Download link**:https://ai.baidu.com/broad/download?dataset=lsvt
...@@ -25,7 +25,7 @@ In addition to opensource data, users can also use synthesis tools to synthesize
#### 2. ICDAR2017-RCTW-17
- **Data sources**:https://rctw.vlrlab.net/
- **Introduction**: It contains 12,000+ images, most of which were collected in the wild with mobile phone cameras and some of which are screenshots. The images show a variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile applications.
![](../../datasets/rctw.jpg)
- **Download link**:https://rctw.vlrlab.net/dataset/
<a name="中文街景文字识别"></a>
...@@ -33,9 +33,9 @@ In addition to opensource data, users can also use synthesis tools to synthesize
- **Data sources**:https://aistudio.baidu.com/aistudio/competition/detail/8
- **Introduction**: A total of 290,000 images are included, of which 210,000 are used as the training set (with labels) and 80,000 as the test set (without labels). The dataset is collected from Chinese street views and is formed by cropping the text line regions (such as shop signs and landmarks) from the street view pictures. All images are preprocessed: using an affine transform, the text region is proportionally mapped to an image with a height of 48 pixels, as shown in the figure:
![](../../datasets/ch_street_rec_1.png)
(a) Label: 魅派集成吊顶
![](../../datasets/ch_street_rec_2.png)
(b) Label: 母婴用品连锁
- **Download link**
https://aistudio.baidu.com/aistudio/datasetdetail/8429
...@@ -49,13 +49,13 @@ https://aistudio.baidu.com/aistudio/datasetdetail/8429
- 5990 characters including Chinese characters, English letters, numbers and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt )
- Each sample is fixed at 10 characters, randomly cut from sentences in the corpus
- Image resolution is 280x32
![](../../datasets/ch_doc1.jpg)
![](../../datasets/ch_doc3.jpg)
- **Download link**:https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m)
<a name="ICDAR2019-ArT"></a>
#### 5. ICDAR2019-ArT
- **Data source**:https://ai.baidu.com/broad/introduction?dataset=art
- **Introduction**: It includes 10,166 images, 5,603 in the training set and 4,563 in the test set. It is composed of three parts: Total-Text, SCUT-CTW1500, and Baidu curved scene text, and includes text with various shapes such as horizontal, multi-directional, and curved.
![](../../datasets/ArT.jpg)
- **Download link**:https://ai.baidu.com/broad/download?dataset=art
...@@ -9,7 +9,7 @@ Here we have sorted out the commonly used handwritten OCR dataset datasets, whic
- **Data introduction**:
* It includes online and offline handwritten data. `HWDB1.0~1.2` has a total of 3,895,135 handwritten single-character samples belonging to 7,356 categories (7,185 Chinese characters and 171 English letters, numbers, and symbols); `HWDB2.0~2.2` has a total of 5,091 pages of images, divided into 52,230 text lines and 1,349,414 words. All text and character samples are stored as grayscale images. Some sample characters are shown below.
![](../../datasets/CASIA_0.jpg)
- **Download address**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
- **Suggestion**: The data consists of single characters on a white background, from which a large number of text lines can be composed for training. The white background can be made transparent, which makes it convenient to add various backgrounds. If semantics are needed, it is recommended to extract single characters from a real corpus to form text lines.
...@@ -22,7 +22,7 @@ Here we have sorted out the commonly used handwritten OCR dataset datasets, whic
- **Data introduction**: The NIST19 dataset is suitable for training handwritten document and character recognition models. It is extracted from the handwritten sample forms of 3,600 authors and contains 810,000 character images in total. Nine of them are shown below.
![](../../datasets/nist_demo.png)
- **Download address**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19)
# OCR datasets
- [1. Text detection](#1-text-detection)
- [1.1 PaddleOCR text detection format annotation](#11-paddleocr-text-detection-format-annotation)
- [1.2 Public dataset](#12-public-dataset)
- [1.2.1 ICDAR 2015](#121-icdar-2015)
- [2. Text recognition](#2-text-recognition)
- [2.1 PaddleOCR text recognition format annotation](#21-paddleocr-text-recognition-format-annotation)
- [2.2 Public dataset](#22-public-dataset)
- [2.1 ICDAR 2015](#21-icdar-2015)
- [3. Data storage path](#3-data-storage-path)
Here is a list of public datasets commonly used in OCR, which are being continuously updated. Welcome to contribute datasets~
## 1. Text detection
### 1.1 PaddleOCR text detection format annotation
The annotation file formats supported by the PaddleOCR text detection algorithm are as follows, separated by "\t":
```
" Image file name Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.
The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
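A hedged sketch of how such an annotation line can be parsed (the file name and values are taken from the example above):
```python
import json

line = 'ch4_test_images/img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'

# image path and annotation are separated by a tab
img_path, annotation = line.strip().split("\t")
for box in json.loads(annotation):
    if box["transcription"] == "###":
        continue  # invalid box, skipped during training
    print(img_path, box["points"], box["transcription"])
```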
### 1.2 Public dataset
| dataset | Image download link | PaddleOCR format annotation download link |
|---|---|---|
| ICDAR 2015 | https://rrc.cvc.uab.es/?ch=4&com=downloads | [train](https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt) / [test](https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt) |
| ctw1500 | https://paddleocr.bj.bcebos.com/dataset/ctw1500.zip | Included in the downloaded image zip |
| total text | https://paddleocr.bj.bcebos.com/dataset/total_text.tar | Included in the downloaded image zip |
#### 1.2.1 ICDAR 2015
The icdar2015 dataset contains a training set of 1,000 images and a test set of 500 images, both obtained with wearable cameras. The icdar2015 dataset can be downloaded from the link in the table above; registration is required for downloading.
After registering and logging in, download the parts marked in the red boxes in the figure below. The content downloaded via `Training Set Images` should be saved in the folder `icdar_c4_train_imgs`, and the content downloaded via `Test Set Images` should be saved in the folder `ch4_test_images`.
<p align="center">
<img src="../../datasets/ic15_location_download.png" align="middle" width = "700"/>
<p align="center">
Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. Then download the PaddleOCR format annotation file from the table above.
PaddleOCR also provides a data format conversion script, which can convert the official website label to the PaddleOCR format. The data conversion tool is in `ppocr/utils/gen_label.py`, here is the training set as an example:
```
# Convert the label file downloaded from the official website to train_icdar2015_label.txt
python gen_label.py --mode="det" --root_path="/path/to/icdar_c4_train_imgs/" \
--input_path="/path/to/ch4_training_localization_transcription_gt" \
--output_label="/path/to/train_icdar2015_label.txt"
```
After decompressing the data set and downloading the annotation file, PaddleOCR/train_data/ has two folders and two files, which are:
```
/PaddleOCR/train_data/icdar2015/text_localization/
└─ icdar_c4_train_imgs/ Training data of icdar dataset
└─ ch4_test_images/ Testing data of icdar dataset
└─ train_icdar2015_label.txt Training annotation of icdar dataset
└─ test_icdar2015_label.txt Test annotation of icdar dataset
```
## 2. Text recognition
### 2.1 PaddleOCR text recognition format annotation
The text recognition algorithm in PaddleOCR supports two data formats:
- `lmdb` is used to train data sets stored in lmdb format, use [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) to load;
- `common dataset` is used to train data sets stored in text files, use [simple_dataset.py](../../../ppocr/data/simple_dataset.py) to load.
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
It is recommended to put the training images in the same folder and use a txt file (rec_gt_train.txt) to record the image paths and labels. The contents of the txt file are as follows:
* Note: by default, the image path and the image label are separated by \t; using any other separator will cause training errors (see the parsing sketch after the example below).
```
" Image file name Image annotation "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
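A hedged sketch of reading such a label file (the path is illustrative):
```python
samples = []
with open("train_data/rec/rec_gt_train.txt", "r", encoding="utf-8") as f:
    for line in f:
        # image path and label are separated by a single tab character
        img_path, label = line.rstrip("\n").split("\t", 1)
        samples.append((img_path, label))
```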
The final training set should have the following file structure:
```
|-train_data
|-rec
|- rec_gt_train.txt
|- train
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
```
- Test set
Similar to the training set, the test set also needs a folder containing all the images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:
```
|-train_data
|-rec
|-ic15_data
|- rec_gt_test.txt
|- test
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
### 2.2 Public dataset
| dataset | Image download link | PaddleOCR format annotation download link |
|---|---|---|
| en benchmark(MJ, SJ, IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE.) | [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) | LMDB format, which can be loaded directly with [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) |
|ICDAR 2015| http://rrc.cvc.uab.es/?ch=4&com=downloads | [train](https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt)/ [test](https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt) |
| Multilingual datasets |[Baidu network disk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) Extraction code: frgi <br> [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) | Included in the downloaded image zip |
#### 2.1 ICDAR 2015
The ICDAR 2015 dataset can be downloaded from the link in the table above for quick validation. The lmdb format dataset required by en benchmark can also be downloaded from the table above.
Then download the PaddleOCR format annotation file from the table above.
PaddleOCR also provides a data format conversion script, which can convert the ICDAR official website label to the data format supported by PaddleOCR. The data conversion tool is in `ppocr/utils/gen_label.py`, here is the training set as an example:
```
# Convert the label file downloaded from the official website to rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```
The data format is as follows, (a) is the original picture, (b) is the Ground Truth text file corresponding to each picture:
![](../../datasets/icdar_rec.png)
## 3. Data storage path
The default storage path for PaddleOCR training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
# Table Recognition Datasets
- [Dataset Summary](#dataset-summary)
- [1. PubTabNet](#1-pubtabnet)
- [2. TAL Table Recognition Competition Dataset](#2-tal-table-recognition-competition-dataset)
Here are the commonly used table recognition datasets, which are being updated continuously. Welcome to contribute datasets~
## Dataset Summary
| dataset | Image download link | PPOCR format annotation download link |
|---|---|---|
| PubTabNet |https://github.com/ibm-aur-nlp/PubTabNet| jsonl format, which can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
| TAL Table Recognition Competition Dataset |https://ai.100tal.com/dataset| jsonl format, which can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
## 1. PubTabNet
- **Data Introduction**: The training set of the PubTabNet dataset contains 500,000 images, and the validation set contains 9,000 images. Some of the images are visualized below.
<div align="center">
<img src="../../datasets/table_PubTabNet_demo/PMC524509_007_00.png" width="500">
<img src="../../datasets/table_PubTabNet_demo/PMC535543_007_01.png" width="500">
</div>
- **Note**: Use of this dataset requires compliance with the [CDLA-Permissive](https://cdla.io/permissive-1-0/) license.
## 2. TAL Table Recognition Competition Dataset
- **Data Introduction**: The training set of the TAL table recognition competition dataset contains 16,000 images. The validation set does not provide trainable annotations.
<div align="center">
<img src="../../datasets/table_tal_demo/1.jpg" width="500">
<img src="../../datasets/table_tal_demo/2.jpg" width="500">
</div>
...@@ -22,7 +22,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da ...@@ -22,7 +22,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da
* CCPD-Challenge: So far, some of the most challenging images in license plate detection and recognition tasks * CCPD-Challenge: So far, some of the most challenging images in license plate detection and recognition tasks
* CCPD-NP: Pictures of new cars without license plates. * CCPD-NP: Pictures of new cars without license plates.
![](../datasets/ccpd_demo.png) ![](../../datasets/ccpd_demo.png)
- **Download address** - **Download address**
...@@ -46,7 +46,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da ...@@ -46,7 +46,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da
* End of validity: 07/41 * End of validity: 07/41
* Chinese phonetic alphabet of card users: MICHAEL * Chinese phonetic alphabet of card users: MICHAEL
![](../datasets/cmb_demo.jpg) ![](../../datasets/cmb_demo.jpg)
- **Download address**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip) - **Download address**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip)
...@@ -59,7 +59,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da ...@@ -59,7 +59,7 @@ Here we have sorted out the commonly used vertical multi-language OCR dataset da
- **Data introduction**: This is a toolkit for data synthesis. You can output captcha images according to the input text. Use the toolkit to generate several demo images as follows. - **Data introduction**: This is a toolkit for data synthesis. You can output captcha images according to the input text. Use the toolkit to generate several demo images as follows.
![](../datasets/captcha_demo.png) ![](../../datasets/captcha_demo.png)
- **Download address**: The dataset is generated and has no download address. - **Download address**: The dataset is generated and has no download address.
......
...@@ -2,63 +2,28 @@ ...@@ -2,63 +2,28 @@
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
- [1. Data and Weights Preparation](#1-data-and-weights-preparatio) - [1. Data and Weights Preparation](#1-data-and-weights-preparation)
* [1.1 Data Preparation](#11-data-preparation) - [1.1 Data Preparation](#11-data-preparation)
* [1.2 Download Pre-trained Model](#12-download-pretrained-model) - [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
- [2. Training](#2-training) - [2. Training](#2-training)
* [2.1 Start Training](#21-start-training) * [2.1 Start Training](#21-start-training)
* [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training) * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
* [2.3 Training with New Backbone](#23-training-with-new-backbone) * [2.3 Training with New Backbone](#23-training-with-new-backbone)
* [2.4 Training with knowledge distillation](#24) * [2.4 Mixed Precision Training](#24-amp-training)
* [2.5 Distributed Training](#25-distributed-training)
* [2.6 Training with knowledge distillation](#26)
* [2.7 Training on other platform(Windows/macOS/Linux DCU)](#27)
- [3. Evaluation and Test](#3-evaluation-and-test) - [3. Evaluation and Test](#3-evaluation-and-test)
* [3.1 Evaluation](#31-evaluation) - [3.1 Evaluation](#31-evaluation)
* [3.2 Test](#32-test) - [3.2 Test](#32-test)
- [4. Inference](#4-inference) - [4. Inference](#4-inference)
- [5. FAQ](#2-faq) - [5. FAQ](#5-faq)
## 1. Data and Weights Preparation ## 1. Data and Weights Preparation
### 1.1 Data Preparation ### 1.1 Data Preparation
The icdar2015 dataset contains train set which has 1000 images obtained with wearable cameras and test set which has 500 images obtained with wearable cameras. The icdar2015 can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets_en.md) .
After registering and logging in, download the parts marked in the red box in the figure below. Save the content downloaded via `Training Set Images` as the folder `icdar_c4_train_imgs`, and the content downloaded via `Test Set Images` as the folder `ch4_test_images`.
<p align="center">
<img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
<p align="center">
Decompress the downloaded dataset into the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test, respectively, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```
After decompressing the data set and downloading the annotation file, PaddleOCR/train_data/ has two folders and two files, which are:
```
/PaddleOCR/train_data/icdar2015/text_localization/
└─ icdar_c4_train_imgs/ Training data of icdar dataset
└─ ch4_test_images/ Testing data of icdar dataset
└─ train_icdar2015_label.txt Training annotation of icdar dataset
└─ test_icdar2015_label.txt Test annotation of icdar dataset
```
The provided annotation file format is as follows, separated by "\t":
```
" Image file name Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.
The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
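For illustration, here is a minimal sketch (not part of PaddleOCR) of how such a label line can be produced with `json.dumps`; the first box below is taken from the example above, while the second box, its coordinates, and the output file name are made up:
```python
import json

def make_det_label_line(image_name, boxes):
    """boxes: list of dicts with 'transcription' and 'points' (four clockwise (x, y) corners, starting top-left)."""
    return image_name + "\t" + json.dumps(boxes, ensure_ascii=False)

# A transcription of "###" marks a box that should be ignored during training.
line = make_det_label_line(
    "ch4_test_images/img_61.jpg",
    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]},
     {"transcription": "###", "points": [[10, 10], [60, 10], [60, 40], [10, 40]]}])

with open("train_data/my_det_label.txt", "a", encoding="utf-8") as f:
    f.write(line + "\n")
```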
### 1.2 Download Pre-trained Model ### 1.2 Download Pre-trained Model
...@@ -175,11 +140,44 @@ After adding the four-part modules of the network, you only need to configure th ...@@ -175,11 +140,44 @@ After adding the four-part modules of the network, you only need to configure th
**NOTE**: More details about replace Backbone and other mudule can be found in [doc](add_new_algorithm_en.md). **NOTE**: More details about replace Backbone and other mudule can be found in [doc](add_new_algorithm_en.md).
### 2.4 Mixed Precision Training
### 2.4 Training with knowledge distillation If you want to speed up your training further, you can use [Auto Mixed Precision Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html), taking a single machine and a single gpu as an example, the commands are as follows:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
### 2.5 Distributed Training
During multi-machine multi-gpu training, use the `--ips` parameter to set the used machine IP address, and the `--gpus` parameter to set the used GPU ID:
```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** When using multi-machine and multi-gpu training, you need to replace the ips value in the above command with the address of your machine, and the machines need to be able to ping each other. In addition, training needs to be launched separately on multiple machines. The command to view the ip address of the machine is `ifconfig`.
### 2.6 Training with knowledge distillation
Knowledge distillation is supported in PaddleOCR for text detection training process. For more details, please refer to [doc](./knowledge_distillation_en.md). Knowledge distillation is supported in PaddleOCR for text detection training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
### 2.7 Training on other platforms (Windows/macOS/Linux DCU)
- Windows GPU/CPU
The Windows platform differs slightly from the Linux platform:
Windows supports only single-GPU training and inference; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
On Windows, DataLoader only supports single-process mode, so `num_workers` needs to be set to 0;
- macOS
GPU mode is not supported; you need to set `use_gpu` to False in the configuration file. The remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
- Linux DCU
Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
## 3. Evaluation and Test ## 3. Evaluation and Test
### 3.1 Evaluation ### 3.1 Evaluation
......
# E-book: *Dive Into OCR*
\ No newline at end of file
English | [简体中文](../doc_ch/ppocr_introduction.md)
# PP-OCR
- [1. Introduction](#1)
- [2. Features](#2)
- [3. Benchmark](#3)
- [4. Visualization](#4)
- [5. Tutorial](#5)
- [5.1 Quick start](#51)
- [5.2 Model training / compression / deployment](#52)
- [6. Model zoo](#6)
<a name="1"></a>
## 1. Introduction
PP-OCR is a self-developed practical ultra-lightweight OCR system, which is slimmed and optimized based on the reimplemented [academic algorithms](algorithm_en.md), considering the balance between **accuracy** and **speed**.
PP-OCR is a two-stage OCR system, in which the text detection algorithm is [DB](algorithm_det_db_en.md), and the text recognition algorithm is [CRNN](algorithm_rec_crnn_en.md). Besides, a [text direction classifier](angle_class_en.md) is added between the detection and recognition modules to deal with text in different directions.
PP-OCR pipeline is as follows:
<div align="center">
<img src="../ppocrv2_framework.jpg" width="800">
</div>
PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 have been released:
[1] PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
<a name="2"></a>
## 2. Features
- Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-lingual recognition: about 80 languages like Korean, Japanese, German, French, etc.
<a name="3"></a>
## 3. Benchmark
For the performance comparison between PP-OCR series models, please check the [benchmark](./benchmark_en.md) documentation.
<a name="4"></a>
## 4. Visualization [more](./visualization.md)
<details open>
<summary>PP-OCRv2 English model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Chinese model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Multilingual model</summary>
<div align="center">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
</div>
</details>
<a name="5"></a>
## 5. Tutorial
<a name="51"></a>
### 5.1 Quick start
- You can also quickly experience the ultra-lightweight OCR: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
- Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
- One line of code quick use: [Quick Start](./quickstart_en.md)
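For example, a minimal sketch of the one-line-style usage from Python (assuming the `paddleocr` pip package is installed and the script is run from the repository root; the exact structure of `result` may vary slightly between versions):
```python
from paddleocr import PaddleOCR

# use_angle_cls enables the text direction classifier; lang='en' selects the English model
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('doc/imgs_en/img_12.jpg', cls=True)
for box, (text, score) in result:
    print(box, text, score)
```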
<a name="52"></a>
### 5.2 Model training / compression / deployment
For more tutorials, including model training, model compression, deployment, etc., please refer to [tutorials](../../README.md#Tutorials)
<a name="6"></a>
## 6. Model zoo
## PP-OCR Series Model List (Updated on September 8th)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_PP-OCRv2_xx |Mobile & Server|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
| Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) |
| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |
For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./models_list_en.md).
For a new language request, please refer to [Guideline for new language_requests](../../README.md#language_requests).
# Text Recognition # Text Recognition
- [1. Data Preparation](#DATA_PREPARATION) - [1. Data Preparation](#DATA_PREPARATION)
- [1.1 Costom Dataset](#Costom_Dataset) * [1.1 Costom Dataset](#Costom_Dataset)
- [1.2 Dataset Download](#Dataset_download) * [1.2 Dataset Download](#Dataset_download)
- [1.3 Dictionary](#Dictionary) * [1.3 Dictionary](#Dictionary)
- [1.4 Add Space Category](#Add_space_category) * [1.4 Add Space Category](#Add_space_category)
* [1.5 Data Augmentation](#Data_Augmentation)
- [2. Training](#TRAINING) - [2. Training](#TRAINING)
- [2.1 Data Augmentation](#Data_Augmentation) * [2.1 Start Training](#21-start-training)
- [2.2 General Training](#Training) * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
- [2.3 Multi-language Training](#Multi_language) * [2.3 Training with New Backbone](#23-training-with-new-backbone)
- [2.4 Training with Knowledge Distillation](#kd) * [2.4 Mixed Precision Training](#24-amp-training)
* [2.5 Distributed Training](#25-distributed-training)
- [3. Evaluation](#EVALUATION) * [2.6 Training with knowledge distillation](#kd)
* [2.7 Multi-language Training](#Multi_language)
- [4. Prediction](#PREDICTION) * [2.8 Training on other platform(Windows/macOS/Linux DCU)](#28)
- [5. Convert to Inference Model](#Inference) - [3. Evaluation and Test](#3-evaluation-and-test)
* [3.1 Evaluation](#31-evaluation)
* [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)
<a name="DATA_PREPARATION"></a> <a name="DATA_PREPARATION"></a>
## 1. Data Preparation ## 1. Data Preparation
### 1.1 DataSet Preparation
PaddleOCR supports two data formats: To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets.md) .
- `LMDB` is used to train with datasets stored in LMDB format (LMDBDataSet);
- `general data` is used to train with datasets stored in text files (SimpleDataSet):
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:
```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
<a name="Costom_Dataset"></a>
### 1.1 Custom Dataset
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: by default, the image path and the label are separated by \t; using any other separator will cause a training error.
```
" Image file name Image annotation "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
The final training set should have the following file structure:
```
|-train_data
|-rec
|- rec_gt_train.txt
|- train
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
```
- Test set
Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:
```
|-train_data
|-rec
|-ic15_data
|- rec_gt_test.txt
|- test
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
<a name="Dataset_download"></a>
### 1.2 Dataset Download
- ICDAR2015
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads).
Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format dataset required for the benchmark.
If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR. If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
PaddleOCR also provides a data format conversion script, which can convert the ICDAR official website labels to a data format
supported by PaddleOCR. The conversion tool is in `ppocr/utils/gen_label.py`; taking the training set as an example:
```
# convert the official gt to rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```
The data format is as follows: (a) is the original picture, and (b) is the ground-truth text file corresponding to each picture:
![](../datasets/icdar_rec.png)
- Multilingual dataset
The multi-language model training method is the same as the Chinese model. The training dataset consists of 1,000,000 synthetic images. A small number of fonts and test data can be downloaded via either of the following two links.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) ,Extraction code:frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
<a name="Dictionary"></a> <a name="Dictionary"></a>
### 1.3 Dictionary ### 1.2 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...@@ -173,11 +80,8 @@ If you need to customize dic file, please add character_dict_path field in confi ...@@ -173,11 +80,8 @@ If you need to customize dic file, please add character_dict_path field in confi
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`. If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
<a name="TRAINING"></a>
## 2.Training
<a name="Data_Augmentation"></a> <a name="Data_Augmentation"></a>
### 2.1 Data Augmentation ### 1.5 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default. PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
...@@ -185,11 +89,14 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand ...@@ -185,11 +89,14 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
<a name="Training"></a> <a name="TRAINING"></a>
### 2.2 General Training ## 2.Training
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
<a name="21-start-training"></a>
### 2.1 Start Training
First download the pretrain model, you can download the trained model to finetune on the icdar2015 data: First download the pretrain model, you can download the trained model to finetune on the icdar2015 data:
``` ```
...@@ -305,8 +212,99 @@ Eval: ...@@ -305,8 +212,99 @@ Eval:
``` ```
**Note that the configuration file for prediction/evaluation must be consistent with the training.** **Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="22-load-trained-model-and-continue-training"></a>
### 2.2 Load Trained Model and Continue Training
If you want to load a trained model and continue training, you can specify the parameter `Global.checkpoints` as the path of the model to be loaded.
For example:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model
```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrained_model`, that is, when both parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrained_model` will be loaded.
<a name="23-training-with-new-backbone"></a>
### 2.3 Training with New Backbone
PaddleOCR builds networks from four parts, which live under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).
```bash
├── architectures # Code for building network
├── transforms # Image Transformation Module
├── backbones # Feature extraction module
├── necks # Feature enhancement module
└── heads # Output module
```
If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.
However, if you want to use a new Backbone, follow the example below to replace it:
1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code in the my_backbone.py file, the sample code is as follows:
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class MyBackbone(nn.Layer):
def __init__(self, *args, **kwargs):
super(MyBackbone, self).__init__()
# your init code
self.conv = nn.xxxx
def forward(self, inputs):
# your network forward
y = self.conv(inputs)
return y
```
3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
After adding the four-part modules of the network, you only need to configure them in the configuration file to use, such as:
```yaml
Backbone:
name: MyBackbone
args1: args1
```
**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).
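For illustration only, below is a concrete, hypothetical version of the backbone skeleton above. The layer choices and channel sizes are arbitrary; the `out_channels` attribute is set because the neck built after the backbone reads it to size its input:
```python
import paddle.nn as nn

class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, **kwargs):
        super(MyBackbone, self).__init__()
        # a tiny convolutional stem, purely for illustration
        self.conv = nn.Conv2D(in_channels, 64, kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm2D(64)
        self.relu = nn.ReLU()
        # the following neck reads this attribute to know the backbone output width
        self.out_channels = 64

    def forward(self, inputs):
        return self.relu(self.bn(self.conv(inputs)))
```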
<a name="24-amp-training"></a>
### 2.4 Mixed Precision Training
If you want to speed up your training further, you can use [Auto Mixed Precision Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine with a single GPU as an example, the command is as follows:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
<a name="25-distributed-training"></a>
### 2.5 Distributed Training
During multi-machine multi-gpu training, use the `--ips` parameter to set the used machine IP address, and the `--gpus` parameter to set the used GPU ID:
```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
```
**Note:** When using multi-machine and multi-gpu training, you need to replace the ips value in the above command with the address of your machine, and the machines need to be able to ping each other. In addition, training needs to be launched separately on multiple machines. The command to view the ip address of the machine is `ifconfig`.
<a name="kd"></a>
### 2.6 Training with Knowledge Distillation
Knowledge distillation is supported in PaddleOCR for text recognition training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
<a name="Multi_language"></a> <a name="Multi_language"></a>
### 2.3 Multi-language Training ### 2.7 Multi-language Training
Currently, the multi-language algorithms supported by PaddleOCR are: Currently, the multi-language algorithms supported by PaddleOCR are:
...@@ -362,25 +360,35 @@ Eval: ...@@ -362,25 +360,35 @@ Eval:
... ...
``` ```
<a name="kd"></a> <a name="28"></a>
### 2.8 Training on other platforms (Windows/macOS/Linux DCU)
### 2.4 Training with Knowledge Distillation - Windows GPU/CPU
The Windows platform differs slightly from the Linux platform:
Windows supports only single-GPU training and inference; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
On Windows, DataLoader only supports single-process mode, so `num_workers` needs to be set to 0;
Knowledge distillation is supported in PaddleOCR for text recognition training process. For more details, please refer to [doc](./knowledge_distillation_en.md). - macOS
GPU mode is not supported; you need to set `use_gpu` to False in the configuration file. The remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
- Linux DCU
Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
<a name="EVALUATION"></a> <a name="3-evaluation-and-test"></a>
## 3. Evaluation and Test
## 3. Evalution <a name="31-evaluation"></a>
### 3.1 Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file. The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
``` ```
# GPU evaluation, Global.checkpoints is the weight to be tested # GPU evaluation, Global.checkpoints is the weight to be tested
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
``` ```
<a name="PREDICTION"></a> <a name="32-test"></a>
## 4. Prediction ### 3.2 Test
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
...@@ -442,9 +450,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg ...@@ -442,9 +450,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg
result: ('韩国小馆', 0.997218) result: ('韩国小馆', 0.997218)
``` ```
<a name="Inference"></a> <a name="4-inference"></a>
## 4. Inference
## 5. Convert to Inference Model The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model additionally saves the structural information of the model. It is therefore easier to deploy, because the model structure and parameters are already solidified in the inference model file, and it is suitable for integration with actual systems.
The recognition model is converted to the inference model in the same way as the detection, as follows: The recognition model is converted to the inference model in the same way as the detection, as follows:
...@@ -462,7 +475,7 @@ If you have a model trained on your own dataset with a different dictionary file ...@@ -462,7 +475,7 @@ If you have a model trained on your own dataset with a different dictionary file
After the conversion is successful, there are three files in the model save directory: After the conversion is successful, there are three files in the model save directory:
``` ```
inference/det_db/ inference/rec_crnn/
├── inference.pdiparams # The parameter file of recognition inference model ├── inference.pdiparams # The parameter file of recognition inference model
├── inference.pdiparams.info # The parameter information of recognition inference model, which can be ignored ├── inference.pdiparams.info # The parameter information of recognition inference model, which can be ignored
└── inference.pdmodel # The program file of recognition model └── inference.pdmodel # The program file of recognition model
...@@ -475,3 +488,10 @@ inference/det_db/ ...@@ -475,3 +488,10 @@ inference/det_db/
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
``` ```
<a name="5-faq"></a>
## 5. FAQ
Q1: After the trained model is converted to an inference model, why are the prediction results inconsistent?
**A**: This is a common issue. It is mostly caused by differences between the preprocessing and postprocessing parameters used when predicting with the trained model and those used when predicting with the inference model. You can compare the preprocessing, postprocessing, and prediction settings in the configuration file used for training with those used for inference.
...@@ -94,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows: ...@@ -94,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows:
- Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data. - Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data.
- Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set. - Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set.
Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc. Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](dataset/datasets_en.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc.
<a name="22-vertical-scene"></a> <a name="22-vertical-scene"></a>
......
...@@ -19,7 +19,7 @@ ...@@ -19,7 +19,7 @@
- 2020.7.15, Add several related datasets, data annotation and synthesis tools. - 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model to support recognize the character "space". - 2020.7.9 Add a new model to support recognize the character "space".
- 2020.7.9 Add the data augument and learning rate decay strategies during training. - 2020.7.9 Add the data augument and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](./datasets_en.md) and keep updating - 2020.6.8 Add [datasets](dataset/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score - 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide Lightweight Chinese OCR online experience - 2020.5.30 Provide Lightweight Chinese OCR online experience
......
...@@ -47,7 +47,7 @@ __all__ = [ ...@@ -47,7 +47,7 @@ __all__ = [
] ]
SUPPORT_DET_MODEL = ['DB'] SUPPORT_DET_MODEL = ['DB']
VERSION = '2.4.0.4' VERSION = '2.5'
SUPPORT_REC_MODEL = ['CRNN'] SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/") BASE_DIR = os.path.expanduser("~/.paddleocr/")
...@@ -442,7 +442,7 @@ class PPStructure(StructureSystem): ...@@ -442,7 +442,7 @@ class PPStructure(StructureSystem):
logger.debug(params) logger.debug(params)
super().__init__(params) super().__init__(params)
def __call__(self, img): def __call__(self, img, return_ocr_result_in_table=False):
if isinstance(img, str): if isinstance(img, str):
# download net image # download net image
if img.startswith('http'): if img.startswith('http'):
...@@ -460,7 +460,7 @@ class PPStructure(StructureSystem): ...@@ -460,7 +460,7 @@ class PPStructure(StructureSystem):
if isinstance(img, np.ndarray) and len(img.shape) == 2: if isinstance(img, np.ndarray) and len(img.shape) == 2:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
res = super().__call__(img) res = super().__call__(img, return_ocr_result_in_table)
return res return res
......
...@@ -72,6 +72,7 @@ def build_dataloader(config, mode, device, logger, seed=None): ...@@ -72,6 +72,7 @@ def build_dataloader(config, mode, device, logger, seed=None):
use_shared_memory = loader_config['use_shared_memory'] use_shared_memory = loader_config['use_shared_memory']
else: else:
use_shared_memory = True use_shared_memory = True
if mode == "Train": if mode == "Train":
# Distribute data to multiple cards # Distribute data to multiple cards
batch_sampler = DistributedBatchSampler( batch_sampler = DistributedBatchSampler(
......
...@@ -56,3 +56,17 @@ class ListCollator(object): ...@@ -56,3 +56,17 @@ class ListCollator(object):
for idx in to_tensor_idxs: for idx in to_tensor_idxs:
data_dict[idx] = paddle.to_tensor(data_dict[idx]) data_dict[idx] = paddle.to_tensor(data_dict[idx])
return list(data_dict.values()) return list(data_dict.values())
class SSLRotateCollate(object):
"""
batch: [
[(4*3xH*W), (4,)]
[(4*3xH*W), (4,)]
...
]
"""
def __call__(self, batch):
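        # each sample contributes 4 rotated views; concatenating along axis 0 turns the batch into (N*4, 3, H, W) images and (N*4,) labels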
output = [np.concatenate(d, axis=0) for d in zip(*batch)]
return output
...@@ -22,8 +22,9 @@ from .make_shrink_map import MakeShrinkMap ...@@ -22,8 +22,9 @@ from .make_shrink_map import MakeShrinkMap
from .random_crop_data import EastRandomCropData, RandomCropImgMask from .random_crop_data import EastRandomCropData, RandomCropImgMask
from .make_pse_gt import MakePseGt from .make_pse_gt import MakePseGt
from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, \ from .rec_img_aug import RecAug, RecConAug, RecResizeImg, ClsResizeImg, \
SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg, PRENResizeImg SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg, PRENResizeImg, SVTRRecResizeImg
from .ssl_img_aug import SSLRotateResize
from .randaugment import RandAugment from .randaugment import RandAugment
from .copy_paste import CopyPaste from .copy_paste import CopyPaste
from .ColorJitter import ColorJitter from .ColorJitter import ColorJitter
......
...@@ -22,6 +22,7 @@ import numpy as np ...@@ -22,6 +22,7 @@ import numpy as np
import string import string
from shapely.geometry import LineString, Point, Polygon from shapely.geometry import LineString, Point, Polygon
import json import json
import copy
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
...@@ -112,14 +113,14 @@ class BaseRecLabelEncode(object): ...@@ -112,14 +113,14 @@ class BaseRecLabelEncode(object):
dict_character = list(self.character_str) dict_character = list(self.character_str)
self.lower = True self.lower = True
else: else:
self.character_str = "" self.character_str = []
with open(character_dict_path, "rb") as fin: with open(character_dict_path, "rb") as fin:
lines = fin.readlines() lines = fin.readlines()
for line in lines: for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n") line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line self.character_str.append(line)
if use_space_char: if use_space_char:
self.character_str += " " self.character_str.append(" ")
dict_character = list(self.character_str) dict_character = list(self.character_str)
dict_character = self.add_special_char(dict_character) dict_character = self.add_special_char(dict_character)
self.dict = {} self.dict = {}
...@@ -1007,3 +1008,34 @@ class VQATokenLabelEncode(object): ...@@ -1007,3 +1008,34 @@ class VQATokenLabelEncode(object):
gt_label.extend([self.label2id_map[("i-" + label).upper()]] * gt_label.extend([self.label2id_map[("i-" + label).upper()]] *
(len(encode_res["input_ids"]) - 1)) (len(encode_res["input_ids"]) - 1))
return gt_label return gt_label
class MultiLabelEncode(BaseRecLabelEncode):
def __init__(self,
max_text_length,
character_dict_path=None,
use_space_char=False,
**kwargs):
super(MultiLabelEncode, self).__init__(
max_text_length, character_dict_path, use_space_char)
self.ctc_encode = CTCLabelEncode(max_text_length, character_dict_path,
use_space_char, **kwargs)
self.sar_encode = SARLabelEncode(max_text_length, character_dict_path,
use_space_char, **kwargs)
def __call__(self, data):
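        # encode the same sample separately for the CTC and SAR heads; drop the sample if either encoding fails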
data_ctc = copy.deepcopy(data)
data_sar = copy.deepcopy(data)
data_out = dict()
data_out['img_path'] = data.get('img_path', None)
data_out['image'] = data['image']
ctc = self.ctc_encode.__call__(data_ctc)
sar = self.sar_encode.__call__(data_sar)
if ctc is None or sar is None:
return None
data_out['label_ctc'] = ctc['label']
data_out['label_sar'] = sar['label']
data_out['length'] = ctc['length']
return data_out
...@@ -16,6 +16,7 @@ import math ...@@ -16,6 +16,7 @@ import math
import cv2 import cv2
import numpy as np import numpy as np
import random import random
import copy
from PIL import Image from PIL import Image
from .text_image_aug import tia_perspective, tia_stretch, tia_distort from .text_image_aug import tia_perspective, tia_stretch, tia_distort
...@@ -32,13 +33,56 @@ class RecAug(object): ...@@ -32,13 +33,56 @@ class RecAug(object):
return data return data
class RecConAug(object):
def __init__(self,
prob=0.5,
image_shape=(32, 320, 3),
max_text_length=25,
ext_data_num=1,
**kwargs):
self.ext_data_num = ext_data_num
self.prob = prob
self.max_text_length = max_text_length
self.image_shape = image_shape
self.max_wh_ratio = self.image_shape[1] / self.image_shape[0]
def merge_ext_data(self, data, ext_data):
ori_w = round(data['image'].shape[1] / data['image'].shape[0] *
self.image_shape[0])
ext_w = round(ext_data['image'].shape[1] / ext_data['image'].shape[0] *
self.image_shape[0])
data['image'] = cv2.resize(data['image'], (ori_w, self.image_shape[0]))
ext_data['image'] = cv2.resize(ext_data['image'],
(ext_w, self.image_shape[0]))
data['image'] = np.concatenate(
[data['image'], ext_data['image']], axis=1)
data["label"] += ext_data["label"]
return data
def __call__(self, data):
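        # with probability self.prob, concatenate extra samples side by side, as long as the combined label length and aspect ratio stay within limits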
rnd_num = random.random()
if rnd_num > self.prob:
return data
for idx, ext_data in enumerate(data["ext_data"]):
if len(data["label"]) + len(ext_data[
"label"]) > self.max_text_length:
break
concat_ratio = data['image'].shape[1] / data['image'].shape[
0] + ext_data['image'].shape[1] / ext_data['image'].shape[0]
if concat_ratio > self.max_wh_ratio:
break
data = self.merge_ext_data(data, ext_data)
data.pop("ext_data")
return data
class ClsResizeImg(object): class ClsResizeImg(object):
def __init__(self, image_shape, **kwargs): def __init__(self, image_shape, **kwargs):
self.image_shape = image_shape self.image_shape = image_shape
def __call__(self, data): def __call__(self, data):
img = data['image'] img = data['image']
norm_img = resize_norm_img(img, self.image_shape) norm_img, _ = resize_norm_img(img, self.image_shape)
data['image'] = norm_img data['image'] = norm_img
return data return data
...@@ -98,10 +142,13 @@ class RecResizeImg(object): ...@@ -98,10 +142,13 @@ class RecResizeImg(object):
def __call__(self, data): def __call__(self, data):
img = data['image'] img = data['image']
if self.infer_mode and self.character_dict_path is not None: if self.infer_mode and self.character_dict_path is not None:
norm_img = resize_norm_img_chinese(img, self.image_shape) norm_img, valid_ratio = resize_norm_img_chinese(img,
self.image_shape)
else: else:
norm_img = resize_norm_img(img, self.image_shape, self.padding) norm_img, valid_ratio = resize_norm_img(img, self.image_shape,
self.padding)
data['image'] = norm_img data['image'] = norm_img
data['valid_ratio'] = valid_ratio
return data return data
...@@ -160,6 +207,25 @@ class PRENResizeImg(object): ...@@ -160,6 +207,25 @@ class PRENResizeImg(object):
return data return data
class SVTRRecResizeImg(object):
def __init__(self,
image_shape,
infer_mode=False,
character_dict_path='./ppocr/utils/ppocr_keys_v1.txt',
padding=True,
**kwargs):
self.image_shape = image_shape
self.infer_mode = infer_mode
self.character_dict_path = character_dict_path
self.padding = padding
def __call__(self, data):
img = data['image']
norm_img = resize_norm_img_svtr(img, self.image_shape, self.padding)
data['image'] = norm_img
return data
def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25): def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25):
imgC, imgH, imgW_min, imgW_max = image_shape imgC, imgH, imgW_min, imgW_max = image_shape
h = img.shape[0] h = img.shape[0]
...@@ -220,7 +286,8 @@ def resize_norm_img(img, image_shape, padding=True): ...@@ -220,7 +286,8 @@ def resize_norm_img(img, image_shape, padding=True):
resized_image /= 0.5 resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image padding_im[:, :, 0:resized_w] = resized_image
return padding_im valid_ratio = min(1.0, float(resized_w / imgW))
return padding_im, valid_ratio
def resize_norm_img_chinese(img, image_shape): def resize_norm_img_chinese(img, image_shape):
...@@ -230,7 +297,7 @@ def resize_norm_img_chinese(img, image_shape): ...@@ -230,7 +297,7 @@ def resize_norm_img_chinese(img, image_shape):
h, w = img.shape[0], img.shape[1] h, w = img.shape[0], img.shape[1]
ratio = w * 1.0 / h ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, ratio) max_wh_ratio = max(max_wh_ratio, ratio)
imgW = int(32 * max_wh_ratio) imgW = int(imgH * max_wh_ratio)
if math.ceil(imgH * ratio) > imgW: if math.ceil(imgH * ratio) > imgW:
resized_w = imgW resized_w = imgW
else: else:
...@@ -246,7 +313,8 @@ def resize_norm_img_chinese(img, image_shape): ...@@ -246,7 +313,8 @@ def resize_norm_img_chinese(img, image_shape):
resized_image /= 0.5 resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image padding_im[:, :, 0:resized_w] = resized_image
return padding_im valid_ratio = min(1.0, float(resized_w / imgW))
return padding_im, valid_ratio
def resize_norm_img_srn(img, image_shape): def resize_norm_img_srn(img, image_shape):
...@@ -276,6 +344,58 @@ def resize_norm_img_srn(img, image_shape): ...@@ -276,6 +344,58 @@ def resize_norm_img_srn(img, image_shape):
return np.reshape(img_black, (c, row, col)).astype(np.float32) return np.reshape(img_black, (c, row, col)).astype(np.float32)
def resize_norm_img_svtr(img, image_shape, padding=False):
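    # when padding is False: build three views of the crop (the original plus +/-90 degree rotations for tall crops with h > 2w, otherwise plain copies), resize and normalize each, and stack them into a (3, C, H, W) tensor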
imgC, imgH, imgW = image_shape
h = img.shape[0]
w = img.shape[1]
if not padding:
if h > 2.0 * w:
image = Image.fromarray(img)
image1 = image.rotate(90, expand=True)
image2 = image.rotate(-90, expand=True)
img1 = np.array(image1)
img2 = np.array(image2)
else:
img1 = copy.deepcopy(img)
img2 = copy.deepcopy(img)
resized_image = cv2.resize(
img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
resized_image1 = cv2.resize(
img1, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
resized_image2 = cv2.resize(
img2, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
resized_w = imgW
else:
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
resized_image1 = resized_image1.astype('float32')
resized_image2 = resized_image2.astype('float32')
if image_shape[0] == 1:
resized_image = resized_image / 255
resized_image = resized_image[np.newaxis, :]
else:
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image1 = resized_image1.transpose((2, 0, 1)) / 255
resized_image2 = resized_image2.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
resized_image1 -= 0.5
resized_image1 /= 0.5
resized_image2 -= 0.5
resized_image2 /= 0.5
padding_im = np.zeros((3, imgC, imgH, imgW), dtype=np.float32)
padding_im[0, :, :, 0:resized_w] = resized_image
padding_im[1, :, :, 0:resized_w] = resized_image1
padding_im[2, :, :, 0:resized_w] = resized_image2
return padding_im
def srn_other_inputs(image_shape, num_heads, max_text_length): def srn_other_inputs(image_shape, num_heads, max_text_length):
imgC, imgH, imgW = image_shape imgC, imgH, imgW = image_shape
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import cv2
import numpy as np
import random
from PIL import Image
from .rec_img_aug import resize_norm_img
class SSLRotateResize(object):
def __init__(self,
image_shape,
padding=False,
select_all=True,
mode="train",
**kwargs):
self.image_shape = image_shape
self.padding = padding
self.select_all = select_all
self.mode = mode
def __call__(self, data):
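        # rotation-prediction pretext task: build 0/90/180/270 degree copies of the crop, resize and normalize each, and stack them with labels 0-3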
img = data["image"]
data["image_r90"] = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
data["image_r180"] = cv2.rotate(data["image_r90"],
cv2.ROTATE_90_CLOCKWISE)
data["image_r270"] = cv2.rotate(data["image_r180"],
cv2.ROTATE_90_CLOCKWISE)
images = []
for key in ["image", "image_r90", "image_r180", "image_r270"]:
images.append(
resize_norm_img(
data.pop(key),
image_shape=self.image_shape,
padding=self.padding)[0])
data["image"] = np.stack(images, axis=0)
data["label"] = np.array(list(range(4)))
if not self.select_all:
data["image"] = data["image"][0::2] # just choose 0 and 180
data["label"] = data["label"][0:2] # label needs to be continuous
if self.mode == "test":
data["image"] = data["image"][0]
data["label"] = data["label"][0]
return data
...@@ -49,7 +49,8 @@ class SimpleDataSet(Dataset): ...@@ -49,7 +49,8 @@ class SimpleDataSet(Dataset):
if self.mode == "train" and self.do_shuffle: if self.mode == "train" and self.do_shuffle:
self.shuffle_data_random() self.shuffle_data_random()
self.ops = create_operators(dataset_config['transforms'], global_config) self.ops = create_operators(dataset_config['transforms'], global_config)
self.ext_op_transform_idx = dataset_config.get("ext_op_transform_idx",
2)
self.need_reset = True in [x < 1 for x in ratio_list] self.need_reset = True in [x < 1 for x in ratio_list]
def get_image_info_list(self, file_list, ratio_list): def get_image_info_list(self, file_list, ratio_list):
...@@ -87,7 +88,7 @@ class SimpleDataSet(Dataset): ...@@ -87,7 +88,7 @@ class SimpleDataSet(Dataset):
if hasattr(op, 'ext_data_num'): if hasattr(op, 'ext_data_num'):
ext_data_num = getattr(op, 'ext_data_num') ext_data_num = getattr(op, 'ext_data_num')
break break
load_data_ops = self.ops[:2] load_data_ops = self.ops[:self.ext_op_transform_idx]
ext_data = [] ext_data = []
while len(ext_data) < ext_data_num: while len(ext_data) < ext_data_num:
...@@ -108,7 +109,10 @@ class SimpleDataSet(Dataset): ...@@ -108,7 +109,10 @@ class SimpleDataSet(Dataset):
data['image'] = img data['image'] = img
data = transform(data, load_data_ops) data = transform(data, load_data_ops)
if data is None or data['polys'].shape[1] != 4: if data is None:
continue
if 'polys' in data.keys():
if data['polys'].shape[1] != 4:
continue continue
ext_data.append(data) ext_data.append(data)
return ext_data return ext_data
......
...@@ -34,6 +34,7 @@ from .rec_nrtr_loss import NRTRLoss ...@@ -34,6 +34,7 @@ from .rec_nrtr_loss import NRTRLoss
from .rec_sar_loss import SARLoss from .rec_sar_loss import SARLoss
from .rec_aster_loss import AsterLoss from .rec_aster_loss import AsterLoss
from .rec_pren_loss import PRENLoss from .rec_pren_loss import PRENLoss
from .rec_multi_loss import MultiLoss
# cls loss # cls loss
from .cls_loss import ClsLoss from .cls_loss import ClsLoss
...@@ -60,7 +61,7 @@ def build_loss(config): ...@@ -60,7 +61,7 @@ def build_loss(config):
'DBLoss', 'PSELoss', 'EASTLoss', 'SASTLoss', 'FCELoss', 'CTCLoss', 'DBLoss', 'PSELoss', 'EASTLoss', 'SASTLoss', 'FCELoss', 'CTCLoss',
'ClsLoss', 'AttentionLoss', 'SRNLoss', 'PGLoss', 'CombinedLoss', 'ClsLoss', 'AttentionLoss', 'SRNLoss', 'PGLoss', 'CombinedLoss',
'NRTRLoss', 'TableAttentionLoss', 'SARLoss', 'AsterLoss', 'SDMGRLoss', 'NRTRLoss', 'TableAttentionLoss', 'SARLoss', 'AsterLoss', 'SDMGRLoss',
'VQASerTokenLayoutLMLoss', 'LossFromOutput', 'PRENLoss' 'VQASerTokenLayoutLMLoss', 'LossFromOutput', 'PRENLoss', 'MultiLoss'
] ]
config = copy.deepcopy(config) config = copy.deepcopy(config)
module_name = config.pop('name') module_name = config.pop('name')
......
...@@ -106,8 +106,8 @@ class DMLLoss(nn.Layer): ...@@ -106,8 +106,8 @@ class DMLLoss(nn.Layer):
def forward(self, out1, out2): def forward(self, out1, out2):
if self.act is not None: if self.act is not None:
out1 = self.act(out1) out1 = self.act(out1) + 1e-10
out2 = self.act(out2) out2 = self.act(out2) + 1e-10
if self.use_log: if self.use_log:
# for recognition distillation, log is needed for feature map # for recognition distillation, log is needed for feature map
log_out1 = paddle.log(out1) log_out1 = paddle.log(out1)
......
...@@ -18,8 +18,10 @@ import paddle.nn as nn ...@@ -18,8 +18,10 @@ import paddle.nn as nn
from .rec_ctc_loss import CTCLoss from .rec_ctc_loss import CTCLoss
from .center_loss import CenterLoss from .center_loss import CenterLoss
from .ace_loss import ACELoss from .ace_loss import ACELoss
from .rec_sar_loss import SARLoss
from .distillation_loss import DistillationCTCLoss from .distillation_loss import DistillationCTCLoss
from .distillation_loss import DistillationSARLoss
from .distillation_loss import DistillationDMLLoss from .distillation_loss import DistillationDMLLoss
from .distillation_loss import DistillationDistanceLoss, DistillationDBLoss, DistillationDilaDBLoss from .distillation_loss import DistillationDistanceLoss, DistillationDBLoss, DistillationDilaDBLoss
......
...@@ -18,6 +18,7 @@ import numpy as np ...@@ -18,6 +18,7 @@ import numpy as np
import cv2 import cv2
from .rec_ctc_loss import CTCLoss from .rec_ctc_loss import CTCLoss
from .rec_sar_loss import SARLoss
from .basic_loss import DMLLoss from .basic_loss import DMLLoss
from .basic_loss import DistanceLoss from .basic_loss import DistanceLoss
from .det_db_loss import DBLoss from .det_db_loss import DBLoss
...@@ -46,11 +47,15 @@ class DistillationDMLLoss(DMLLoss): ...@@ -46,11 +47,15 @@ class DistillationDMLLoss(DMLLoss):
act=None, act=None,
use_log=False, use_log=False,
key=None, key=None,
multi_head=False,
dis_head='ctc',
maps_name=None, maps_name=None,
name="dml"): name="dml"):
super().__init__(act=act, use_log=use_log) super().__init__(act=act, use_log=use_log)
assert isinstance(model_name_pairs, list) assert isinstance(model_name_pairs, list)
self.key = key self.key = key
self.multi_head = multi_head
self.dis_head = dis_head
self.model_name_pairs = self._check_model_name_pairs(model_name_pairs) self.model_name_pairs = self._check_model_name_pairs(model_name_pairs)
self.name = name self.name = name
self.maps_name = self._check_maps_name(maps_name) self.maps_name = self._check_maps_name(maps_name)
...@@ -97,6 +102,10 @@ class DistillationDMLLoss(DMLLoss): ...@@ -97,6 +102,10 @@ class DistillationDMLLoss(DMLLoss):
out2 = out2[self.key] out2 = out2[self.key]
if self.maps_name is None: if self.maps_name is None:
if self.multi_head:
loss = super().forward(out1[self.dis_head],
out2[self.dis_head])
else:
loss = super().forward(out1, out2) loss = super().forward(out1, out2)
if isinstance(loss, dict): if isinstance(loss, dict):
for key in loss: for key in loss:
...@@ -123,11 +132,50 @@ class DistillationDMLLoss(DMLLoss): ...@@ -123,11 +132,50 @@ class DistillationDMLLoss(DMLLoss):
class DistillationCTCLoss(CTCLoss): class DistillationCTCLoss(CTCLoss):
def __init__(self, model_name_list=[], key=None, name="loss_ctc"): def __init__(self,
model_name_list=[],
key=None,
multi_head=False,
name="loss_ctc"):
super().__init__() super().__init__()
self.model_name_list = model_name_list self.model_name_list = model_name_list
self.key = key self.key = key
self.name = name self.name = name
self.multi_head = multi_head
def forward(self, predicts, batch):
loss_dict = dict()
for idx, model_name in enumerate(self.model_name_list):
out = predicts[model_name]
if self.key is not None:
out = out[self.key]
if self.multi_head:
assert 'ctc' in out, 'multi head has multi out'
loss = super().forward(out['ctc'], batch[:2] + batch[3:])
else:
loss = super().forward(out, batch)
if isinstance(loss, dict):
for key in loss:
loss_dict["{}_{}_{}".format(self.name, model_name,
idx)] = loss[key]
else:
loss_dict["{}_{}".format(self.name, model_name)] = loss
return loss_dict
class DistillationSARLoss(SARLoss):
def __init__(self,
model_name_list=[],
key=None,
multi_head=False,
name="loss_sar",
**kwargs):
ignore_index = kwargs.get('ignore_index', 92)
super().__init__(ignore_index=ignore_index)
self.model_name_list = model_name_list
self.key = key
self.name = name
self.multi_head = multi_head
def forward(self, predicts, batch): def forward(self, predicts, batch):
loss_dict = dict() loss_dict = dict()
...@@ -135,6 +183,10 @@ class DistillationCTCLoss(CTCLoss): ...@@ -135,6 +183,10 @@ class DistillationCTCLoss(CTCLoss):
out = predicts[model_name] out = predicts[model_name]
if self.key is not None: if self.key is not None:
out = out[self.key] out = out[self.key]
if self.multi_head:
assert 'sar' in out, 'multi head has multi out'
loss = super().forward(out['sar'], batch[:1] + batch[2:])
else:
loss = super().forward(out, batch) loss = super().forward(out, batch)
if isinstance(loss, dict): if isinstance(loss, dict):
for key in loss: for key in loss:
......
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
from paddle import nn
from .rec_ctc_loss import CTCLoss
from .rec_sar_loss import SARLoss
class MultiLoss(nn.Layer):
def __init__(self, **kwargs):
super().__init__()
self.loss_funcs = {}
self.loss_list = kwargs.pop('loss_config_list')
self.weight_1 = kwargs.get('weight_1', 1.0)
self.weight_2 = kwargs.get('weight_2', 1.0)
self.gtc_loss = kwargs.get('gtc_loss', 'sar')
for loss_info in self.loss_list:
for name, param in loss_info.items():
if param is not None:
kwargs.update(param)
loss = eval(name)(**kwargs)
self.loss_funcs[name] = loss
def forward(self, predicts, batch):
self.total_loss = {}
total_loss = 0.0
# batch [image, label_ctc, label_sar, length, valid_ratio]
for name, loss_func in self.loss_funcs.items():
if name == 'CTCLoss':
loss = loss_func(predicts['ctc'],
batch[:2] + batch[3:])['loss'] * self.weight_1
elif name == 'SARLoss':
loss = loss_func(predicts['sar'],
batch[:1] + batch[2:])['loss'] * self.weight_2
else:
raise NotImplementedError(
'{} is not supported in MultiLoss yet'.format(name))
self.total_loss[name] = loss
total_loss += loss
self.total_loss['loss'] = total_loss
return self.total_loss
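# Illustrative configuration sketch (hypothetical values): MultiLoss is driven
# by `loss_config_list`, for example
#   loss_config_list: [{CTCLoss: null}, {SARLoss: null}]
#   weight_1: 1.0  # scale applied to the CTC term
#   weight_2: 1.0  # scale applied to the SAR term
# Given batch = [image, label_ctc, label_sar, length, valid_ratio], CTCLoss
# receives the batch without label_sar, SARLoss receives it without label_ctc,
# and the two weighted terms are summed into total_loss['loss'].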
...@@ -9,8 +9,9 @@ from paddle import nn ...@@ -9,8 +9,9 @@ from paddle import nn
class SARLoss(nn.Layer): class SARLoss(nn.Layer):
def __init__(self, **kwargs): def __init__(self, **kwargs):
super(SARLoss, self).__init__() super(SARLoss, self).__init__()
ignore_index = kwargs.get('ignore_index', 92) # 6626
self.loss_func = paddle.nn.loss.CrossEntropyLoss( self.loss_func = paddle.nn.loss.CrossEntropyLoss(
reduction="mean", ignore_index=92) reduction="mean", ignore_index=ignore_index)
def forward(self, predicts, batch): def forward(self, predicts, batch):
predict = predicts[:, : predict = predicts[:, :
......
...@@ -17,9 +17,14 @@ import string ...@@ -17,9 +17,14 @@ import string
class RecMetric(object): class RecMetric(object):
def __init__(self, main_indicator='acc', is_filter=False, **kwargs): def __init__(self,
main_indicator='acc',
is_filter=False,
ignore_space=True,
**kwargs):
self.main_indicator = main_indicator self.main_indicator = main_indicator
self.is_filter = is_filter self.is_filter = is_filter
self.ignore_space = ignore_space
self.eps = 1e-5 self.eps = 1e-5
self.reset() self.reset()
...@@ -34,6 +39,7 @@ class RecMetric(object): ...@@ -34,6 +39,7 @@ class RecMetric(object):
all_num = 0 all_num = 0
norm_edit_dis = 0.0 norm_edit_dis = 0.0
for (pred, pred_conf), (target, _) in zip(preds, labels): for (pred, pred_conf), (target, _) in zip(preds, labels):
if self.ignore_space:
pred = pred.replace(" ", "") pred = pred.replace(" ", "")
target = target.replace(" ", "") target = target.replace(" ", "")
if self.is_filter: if self.is_filter:
......
...@@ -83,7 +83,11 @@ class BaseModel(nn.Layer): ...@@ -83,7 +83,11 @@ class BaseModel(nn.Layer):
y["neck_out"] = x y["neck_out"] = x
if self.use_head: if self.use_head:
x = self.head(x, targets=data) x = self.head(x, targets=data)
if isinstance(x, dict): # for multi head, save ctc neck out for udml
if isinstance(x, dict) and 'ctc_neck' in x.keys():
y["neck_out"] = x["ctc_neck"]
y["head_out"] = x
elif isinstance(x, dict):
y.update(x) y.update(x)
else: else:
y["head_out"] = x y["head_out"] = x
......
...@@ -53,8 +53,8 @@ class DistillationModel(nn.Layer): ...@@ -53,8 +53,8 @@ class DistillationModel(nn.Layer):
self.model_list.append(self.add_sublayer(key, model)) self.model_list.append(self.add_sublayer(key, model))
self.model_name_list.append(key) self.model_name_list.append(key)
def forward(self, x): def forward(self, x, data=None):
result_dict = dict() result_dict = dict()
for idx, model_name in enumerate(self.model_name_list): for idx, model_name in enumerate(self.model_name_list):
result_dict[model_name] = self.model_list[idx](x) result_dict[model_name] = self.model_list[idx](x, data)
return result_dict return result_dict
...@@ -31,9 +31,11 @@ def build_backbone(config, model_type): ...@@ -31,9 +31,11 @@ def build_backbone(config, model_type):
from .rec_resnet_aster import ResNet_ASTER from .rec_resnet_aster import ResNet_ASTER
from .rec_micronet import MicroNet from .rec_micronet import MicroNet
from .rec_efficientb3_pren import EfficientNetb3_PREN from .rec_efficientb3_pren import EfficientNetb3_PREN
from .rec_svtrnet import SVTRNet
support_dict = [ support_dict = [
'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB', 'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB',
"ResNet31", "ResNet_ASTER", 'MicroNet', 'EfficientNetb3_PREN' "ResNet31", "ResNet_ASTER", 'MicroNet', 'EfficientNetb3_PREN',
'SVTRNet'
] ]
elif model_type == "e2e": elif model_type == "e2e":
from .e2e_resnet_vd_pg import ResNet from .e2e_resnet_vd_pg import ResNet
......
...@@ -103,7 +103,12 @@ class DepthwiseSeparable(nn.Layer): ...@@ -103,7 +103,12 @@ class DepthwiseSeparable(nn.Layer):
class MobileNetV1Enhance(nn.Layer): class MobileNetV1Enhance(nn.Layer):
def __init__(self, in_channels=3, scale=0.5, **kwargs): def __init__(self,
in_channels=3,
scale=0.5,
last_conv_stride=1,
last_pool_type='max',
**kwargs):
super().__init__() super().__init__()
self.scale = scale self.scale = scale
self.block_list = [] self.block_list = []
...@@ -200,7 +205,7 @@ class MobileNetV1Enhance(nn.Layer): ...@@ -200,7 +205,7 @@ class MobileNetV1Enhance(nn.Layer):
num_filters1=1024, num_filters1=1024,
num_filters2=1024, num_filters2=1024,
num_groups=1024, num_groups=1024,
stride=1, stride=last_conv_stride,
dw_size=5, dw_size=5,
padding=2, padding=2,
use_se=True, use_se=True,
...@@ -208,7 +213,9 @@ class MobileNetV1Enhance(nn.Layer): ...@@ -208,7 +213,9 @@ class MobileNetV1Enhance(nn.Layer):
self.block_list.append(conv6) self.block_list.append(conv6)
self.block_list = nn.Sequential(*self.block_list) self.block_list = nn.Sequential(*self.block_list)
if last_pool_type == 'avg':
self.pool = nn.AvgPool2D(kernel_size=2, stride=2, padding=0)
else:
self.pool = nn.MaxPool2D(kernel_size=2, stride=2, padding=0) self.pool = nn.MaxPool2D(kernel_size=2, stride=2, padding=0)
self.out_channels = int(1024 * scale) self.out_channels = int(1024 * scale)
......
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections.abc import Callable
from paddle import ParamAttr
from paddle.nn.initializer import KaimingNormal
import numpy as np
import paddle
import paddle.nn as nn
from paddle.nn.initializer import TruncatedNormal, Constant, Normal
trunc_normal_ = TruncatedNormal(std=.02)
normal_ = Normal
zeros_ = Constant(value=0.)
ones_ = Constant(value=1.)
def drop_path(x, drop_prob=0., training=False):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ...
"""
if drop_prob == 0. or not training:
return x
keep_prob = paddle.to_tensor(1 - drop_prob)
shape = (paddle.shape(x)[0], ) + (1, ) * (x.ndim - 1)
random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
random_tensor = paddle.floor(random_tensor) # binarize
output = x.divide(keep_prob) * random_tensor
return output
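# Expectation check (illustrative): with drop_prob=0.2 we get keep_prob=0.8,
# so each sample's residual branch survives with probability 0.8 and, when
# kept, is scaled by 1 / 0.8 = 1.25; the expected output therefore equals x,
# which is why drop_path reduces to a no-op when training is False.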
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels,
out_channels,
kernel_size=3,
stride=1,
padding=0,
bias_attr=False,
groups=1,
act=nn.GELU):
super().__init__()
self.conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
weight_attr=paddle.ParamAttr(
initializer=nn.initializer.KaimingUniform()),
bias_attr=bias_attr)
self.norm = nn.BatchNorm2D(out_channels)
self.act = act()
def forward(self, inputs):
out = self.conv(inputs)
out = self.norm(out)
out = self.act(out)
return out
class DropPath(nn.Layer):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
"""
def __init__(self, drop_prob=None):
super(DropPath, self).__init__()
self.drop_prob = drop_prob
def forward(self, x):
return drop_path(x, self.drop_prob, self.training)
class Identity(nn.Layer):
def __init__(self):
super(Identity, self).__init__()
def forward(self, input):
return input
class Mlp(nn.Layer):
def __init__(self,
in_features,
hidden_features=None,
out_features=None,
act_layer=nn.GELU,
drop=0.):
super().__init__()
out_features = out_features or in_features
hidden_features = hidden_features or in_features
self.fc1 = nn.Linear(in_features, hidden_features)
self.act = act_layer()
self.fc2 = nn.Linear(hidden_features, out_features)
self.drop = nn.Dropout(drop)
def forward(self, x):
x = self.fc1(x)
x = self.act(x)
x = self.drop(x)
x = self.fc2(x)
x = self.drop(x)
return x
class ConvMixer(nn.Layer):
def __init__(
self,
dim,
num_heads=8,
HW=[8, 25],
local_k=[3, 3], ):
super().__init__()
self.HW = HW
self.dim = dim
self.local_mixer = nn.Conv2D(
dim,
dim,
local_k,
1, [local_k[0] // 2, local_k[1] // 2],
groups=num_heads,
weight_attr=ParamAttr(initializer=KaimingNormal()))
def forward(self, x):
h = self.HW[0]
w = self.HW[1]
x = x.transpose([0, 2, 1]).reshape([0, self.dim, h, w])
x = self.local_mixer(x)
x = x.flatten(2).transpose([0, 2, 1])
return x
class Attention(nn.Layer):
def __init__(self,
dim,
num_heads=8,
mixer='Global',
HW=[8, 25],
local_k=[7, 11],
qkv_bias=False,
qk_scale=None,
attn_drop=0.,
proj_drop=0.):
super().__init__()
self.num_heads = num_heads
head_dim = dim // num_heads
self.scale = qk_scale or head_dim**-0.5
self.qkv = nn.Linear(dim, dim * 3, bias_attr=qkv_bias)
self.attn_drop = nn.Dropout(attn_drop)
self.proj = nn.Linear(dim, dim)
self.proj_drop = nn.Dropout(proj_drop)
self.HW = HW
if HW is not None:
H = HW[0]
W = HW[1]
self.N = H * W
self.C = dim
if mixer == 'Local' and HW is not None:
hk = local_k[0]
wk = local_k[1]
mask = np.ones([H * W, H * W])
for h in range(H):
for w in range(W):
for kh in range(-(hk // 2), (hk // 2) + 1):
for kw in range(-(wk // 2), (wk // 2) + 1):
if H > (h + kh) >= 0 and W > (w + kw) >= 0:
mask[h * W + w][(h + kh) * W + (w + kw)] = 0
mask_paddle = paddle.to_tensor(mask, dtype='float32')
            mask_inf = paddle.full(
                [H * W, H * W], float('-inf'), dtype='float32')
mask = paddle.where(mask_paddle < 1, mask_paddle, mask_inf)
self.mask = mask.unsqueeze([0, 1])
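            # Worked example (illustrative): with H=2, W=3 and local_k=[1, 3],
            # token (0, 0) may only attend to flat positions 0 and 1, so row 0
            # of `mask` keeps 0.0 there and -inf elsewhere; adding the mask to
            # the attention logits before softmax removes all non-local weights.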
self.mixer = mixer
def forward(self, x):
if self.HW is not None:
N = self.N
C = self.C
else:
_, N, C = x.shape
qkv = self.qkv(x).reshape((0, N, 3, self.num_heads, C //
self.num_heads)).transpose((2, 0, 3, 1, 4))
q, k, v = qkv[0] * self.scale, qkv[1], qkv[2]
attn = (q.matmul(k.transpose((0, 1, 3, 2))))
if self.mixer == 'Local':
attn += self.mask
attn = nn.functional.softmax(attn, axis=-1)
attn = self.attn_drop(attn)
x = (attn.matmul(v)).transpose((0, 2, 1, 3)).reshape((0, N, C))
x = self.proj(x)
x = self.proj_drop(x)
return x
class Block(nn.Layer):
def __init__(self,
dim,
num_heads,
mixer='Global',
local_mixer=[7, 11],
HW=[8, 25],
mlp_ratio=4.,
qkv_bias=False,
qk_scale=None,
drop=0.,
attn_drop=0.,
drop_path=0.,
act_layer=nn.GELU,
norm_layer='nn.LayerNorm',
epsilon=1e-6,
prenorm=True):
super().__init__()
if isinstance(norm_layer, str):
self.norm1 = eval(norm_layer)(dim, epsilon=epsilon)
elif isinstance(norm_layer, Callable):
self.norm1 = norm_layer(dim)
else:
raise TypeError(
"The norm_layer must be str or paddle.nn.layer.Layer class")
if mixer == 'Global' or mixer == 'Local':
self.mixer = Attention(
dim,
num_heads=num_heads,
mixer=mixer,
HW=HW,
local_k=local_mixer,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
attn_drop=attn_drop,
proj_drop=drop)
elif mixer == 'Conv':
self.mixer = ConvMixer(
dim, num_heads=num_heads, HW=HW, local_k=local_mixer)
else:
raise TypeError("The mixer must be one of [Global, Local, Conv]")
# NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity()
if isinstance(norm_layer, str):
self.norm2 = eval(norm_layer)(dim, epsilon=epsilon)
elif isinstance(norm_layer, Callable):
self.norm2 = norm_layer(dim)
else:
raise TypeError(
"The norm_layer must be str or paddle.nn.layer.Layer class")
mlp_hidden_dim = int(dim * mlp_ratio)
self.mlp_ratio = mlp_ratio
self.mlp = Mlp(in_features=dim,
hidden_features=mlp_hidden_dim,
act_layer=act_layer,
drop=drop)
self.prenorm = prenorm
def forward(self, x):
if self.prenorm:
x = self.norm1(x + self.drop_path(self.mixer(x)))
x = self.norm2(x + self.drop_path(self.mlp(x)))
else:
x = x + self.drop_path(self.mixer(self.norm1(x)))
x = x + self.drop_path(self.mlp(self.norm2(x)))
return x
class PatchEmbed(nn.Layer):
""" Image to Patch Embedding
"""
def __init__(self,
img_size=[32, 100],
in_channels=3,
embed_dim=768,
sub_num=2):
super().__init__()
num_patches = (img_size[1] // (2 ** sub_num)) * \
(img_size[0] // (2 ** sub_num))
self.img_size = img_size
self.num_patches = num_patches
self.embed_dim = embed_dim
self.norm = None
if sub_num == 2:
self.proj = nn.Sequential(
ConvBNLayer(
in_channels=in_channels,
out_channels=embed_dim // 2,
kernel_size=3,
stride=2,
padding=1,
act=nn.GELU,
bias_attr=None),
ConvBNLayer(
in_channels=embed_dim // 2,
out_channels=embed_dim,
kernel_size=3,
stride=2,
padding=1,
act=nn.GELU,
bias_attr=None))
if sub_num == 3:
self.proj = nn.Sequential(
ConvBNLayer(
in_channels=in_channels,
out_channels=embed_dim // 4,
kernel_size=3,
stride=2,
padding=1,
act=nn.GELU,
bias_attr=None),
ConvBNLayer(
in_channels=embed_dim // 4,
out_channels=embed_dim // 2,
kernel_size=3,
stride=2,
padding=1,
act=nn.GELU,
bias_attr=None),
ConvBNLayer(
                    in_channels=embed_dim // 2,
                    out_channels=embed_dim,
kernel_size=3,
stride=2,
padding=1,
act=nn.GELU,
bias_attr=None))
def forward(self, x):
B, C, H, W = x.shape
assert H == self.img_size[0] and W == self.img_size[1], \
f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
x = self.proj(x).flatten(2).transpose((0, 2, 1))
return x
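# Shape check (illustrative): with the default img_size=[32, 100] and
# sub_num=2, two stride-2 convolutions shrink the map to 8 x 25, so
# num_patches = 8 * 25 = 200 and a [B, 3, 32, 100] image becomes a
# [B, 200, embed_dim] sequence.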
class SubSample(nn.Layer):
def __init__(self,
in_channels,
out_channels,
types='Pool',
stride=[2, 1],
sub_norm='nn.LayerNorm',
act=None):
super().__init__()
self.types = types
if types == 'Pool':
self.avgpool = nn.AvgPool2D(
kernel_size=[3, 5], stride=stride, padding=[1, 2])
self.maxpool = nn.MaxPool2D(
kernel_size=[3, 5], stride=stride, padding=[1, 2])
self.proj = nn.Linear(in_channels, out_channels)
else:
self.conv = nn.Conv2D(
in_channels,
out_channels,
kernel_size=3,
stride=stride,
padding=1,
weight_attr=ParamAttr(initializer=KaimingNormal()))
self.norm = eval(sub_norm)(out_channels)
if act is not None:
self.act = act()
else:
self.act = None
def forward(self, x):
if self.types == 'Pool':
x1 = self.avgpool(x)
x2 = self.maxpool(x)
x = (x1 + x2) * 0.5
out = self.proj(x.flatten(2).transpose((0, 2, 1)))
else:
x = self.conv(x)
out = x.flatten(2).transpose((0, 2, 1))
out = self.norm(out)
if self.act is not None:
out = self.act(out)
return out
class SVTRNet(nn.Layer):
def __init__(
self,
img_size=[32, 100],
in_channels=3,
embed_dim=[64, 128, 256],
depth=[3, 6, 3],
num_heads=[2, 4, 8],
mixer=['Local'] * 6 + ['Global'] *
6, # Local atten, Global atten, Conv
local_mixer=[[7, 11], [7, 11], [7, 11]],
patch_merging='Conv', # Conv, Pool, None
mlp_ratio=4,
qkv_bias=True,
qk_scale=None,
drop_rate=0.,
last_drop=0.1,
attn_drop_rate=0.,
drop_path_rate=0.1,
norm_layer='nn.LayerNorm',
sub_norm='nn.LayerNorm',
epsilon=1e-6,
out_channels=192,
out_char_num=25,
block_unit='Block',
act='nn.GELU',
last_stage=True,
sub_num=2,
prenorm=True,
use_lenhead=False,
**kwargs):
super().__init__()
self.img_size = img_size
self.embed_dim = embed_dim
self.out_channels = out_channels
self.prenorm = prenorm
patch_merging = None if patch_merging != 'Conv' and patch_merging != 'Pool' else patch_merging
self.patch_embed = PatchEmbed(
img_size=img_size,
in_channels=in_channels,
embed_dim=embed_dim[0],
sub_num=sub_num)
num_patches = self.patch_embed.num_patches
self.HW = [img_size[0] // (2**sub_num), img_size[1] // (2**sub_num)]
self.pos_embed = self.create_parameter(
shape=[1, num_patches, embed_dim[0]], default_initializer=zeros_)
self.add_parameter("pos_embed", self.pos_embed)
self.pos_drop = nn.Dropout(p=drop_rate)
Block_unit = eval(block_unit)
dpr = np.linspace(0, drop_path_rate, sum(depth))
self.blocks1 = nn.LayerList([
Block_unit(
dim=embed_dim[0],
num_heads=num_heads[0],
mixer=mixer[0:depth[0]][i],
HW=self.HW,
local_mixer=local_mixer[0],
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
act_layer=eval(act),
attn_drop=attn_drop_rate,
drop_path=dpr[0:depth[0]][i],
norm_layer=norm_layer,
epsilon=epsilon,
prenorm=prenorm) for i in range(depth[0])
])
if patch_merging is not None:
self.sub_sample1 = SubSample(
embed_dim[0],
embed_dim[1],
sub_norm=sub_norm,
stride=[2, 1],
types=patch_merging)
HW = [self.HW[0] // 2, self.HW[1]]
else:
HW = self.HW
self.patch_merging = patch_merging
self.blocks2 = nn.LayerList([
Block_unit(
dim=embed_dim[1],
num_heads=num_heads[1],
mixer=mixer[depth[0]:depth[0] + depth[1]][i],
HW=HW,
local_mixer=local_mixer[1],
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
act_layer=eval(act),
attn_drop=attn_drop_rate,
drop_path=dpr[depth[0]:depth[0] + depth[1]][i],
norm_layer=norm_layer,
epsilon=epsilon,
prenorm=prenorm) for i in range(depth[1])
])
if patch_merging is not None:
self.sub_sample2 = SubSample(
embed_dim[1],
embed_dim[2],
sub_norm=sub_norm,
stride=[2, 1],
types=patch_merging)
HW = [self.HW[0] // 4, self.HW[1]]
else:
HW = self.HW
self.blocks3 = nn.LayerList([
Block_unit(
dim=embed_dim[2],
num_heads=num_heads[2],
mixer=mixer[depth[0] + depth[1]:][i],
HW=HW,
local_mixer=local_mixer[2],
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
act_layer=eval(act),
attn_drop=attn_drop_rate,
drop_path=dpr[depth[0] + depth[1]:][i],
norm_layer=norm_layer,
epsilon=epsilon,
prenorm=prenorm) for i in range(depth[2])
])
self.last_stage = last_stage
if last_stage:
self.avg_pool = nn.AdaptiveAvgPool2D([1, out_char_num])
self.last_conv = nn.Conv2D(
in_channels=embed_dim[2],
out_channels=self.out_channels,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = nn.Dropout(p=last_drop, mode="downscale_in_infer")
if not prenorm:
self.norm = eval(norm_layer)(embed_dim[-1], epsilon=epsilon)
self.use_lenhead = use_lenhead
if use_lenhead:
self.len_conv = nn.Linear(embed_dim[2], self.out_channels)
self.hardswish_len = nn.Hardswish()
self.dropout_len = nn.Dropout(
p=last_drop, mode="downscale_in_infer")
trunc_normal_(self.pos_embed)
self.apply(self._init_weights)
def _init_weights(self, m):
if isinstance(m, nn.Linear):
trunc_normal_(m.weight)
if isinstance(m, nn.Linear) and m.bias is not None:
zeros_(m.bias)
elif isinstance(m, nn.LayerNorm):
zeros_(m.bias)
ones_(m.weight)
def forward_features(self, x):
x = self.patch_embed(x)
x = x + self.pos_embed
x = self.pos_drop(x)
for blk in self.blocks1:
x = blk(x)
if self.patch_merging is not None:
x = self.sub_sample1(
x.transpose([0, 2, 1]).reshape(
[0, self.embed_dim[0], self.HW[0], self.HW[1]]))
for blk in self.blocks2:
x = blk(x)
if self.patch_merging is not None:
x = self.sub_sample2(
x.transpose([0, 2, 1]).reshape(
[0, self.embed_dim[1], self.HW[0] // 2, self.HW[1]]))
for blk in self.blocks3:
x = blk(x)
if not self.prenorm:
x = self.norm(x)
return x
def forward(self, x):
x = self.forward_features(x)
if self.use_lenhead:
len_x = self.len_conv(x.mean(1))
len_x = self.dropout_len(self.hardswish_len(len_x))
if self.last_stage:
if self.patch_merging is not None:
h = self.HW[0] // 4
else:
h = self.HW[0]
x = self.avg_pool(
x.transpose([0, 2, 1]).reshape(
[0, self.embed_dim[2], h, self.HW[1]]))
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
if self.use_lenhead:
return x, len_x
return x
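if __name__ == "__main__":
    # Minimal sanity check (illustrative, not part of the original file): run a
    # dummy batch through SVTRNet with its default configuration. With
    # img_size=[32, 100], patch_merging='Conv' and out_char_num=25, the final
    # feature map should be [1, 192, 1, 25].
    model = SVTRNet()
    dummy = paddle.randn([1, 3, 32, 100])
    print(model(dummy).shape)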
...@@ -32,6 +32,7 @@ def build_head(config): ...@@ -32,6 +32,7 @@ def build_head(config):
from .rec_sar_head import SARHead from .rec_sar_head import SARHead
from .rec_aster_head import AsterHead from .rec_aster_head import AsterHead
from .rec_pren_head import PRENHead from .rec_pren_head import PRENHead
from .rec_multi_head import MultiHead
# cls head # cls head
from .cls_head import ClsHead from .cls_head import ClsHead
...@@ -44,7 +45,8 @@ def build_head(config): ...@@ -44,7 +45,8 @@ def build_head(config):
support_dict = [ support_dict = [
'DBHead', 'PSEHead', 'FCEHead', 'EASTHead', 'SASTHead', 'CTCHead', 'DBHead', 'PSEHead', 'FCEHead', 'EASTHead', 'SASTHead', 'CTCHead',
'ClsHead', 'AttentionHead', 'SRNHead', 'PGHead', 'Transformer', 'ClsHead', 'AttentionHead', 'SRNHead', 'PGHead', 'Transformer',
'TableAttentionHead', 'SARHead', 'AsterHead', 'SDMGRHead', 'PRENHead' 'TableAttentionHead', 'SARHead', 'AsterHead', 'SDMGRHead', 'PRENHead',
'MultiHead'
] ]
#table head #table head
......
...@@ -31,13 +31,14 @@ def get_bias_attr(k): ...@@ -31,13 +31,14 @@ def get_bias_attr(k):
class Head(nn.Layer): class Head(nn.Layer):
def __init__(self, in_channels, name_list): def __init__(self, in_channels, name_list, kernel_list=[3, 2, 2], **kwargs):
super(Head, self).__init__() super(Head, self).__init__()
self.conv1 = nn.Conv2D( self.conv1 = nn.Conv2D(
in_channels=in_channels, in_channels=in_channels,
out_channels=in_channels // 4, out_channels=in_channels // 4,
kernel_size=3, kernel_size=kernel_list[0],
padding=1, padding=int(kernel_list[0] // 2),
weight_attr=ParamAttr(), weight_attr=ParamAttr(),
bias_attr=False) bias_attr=False)
self.conv_bn1 = nn.BatchNorm( self.conv_bn1 = nn.BatchNorm(
...@@ -50,7 +51,7 @@ class Head(nn.Layer): ...@@ -50,7 +51,7 @@ class Head(nn.Layer):
self.conv2 = nn.Conv2DTranspose( self.conv2 = nn.Conv2DTranspose(
in_channels=in_channels // 4, in_channels=in_channels // 4,
out_channels=in_channels // 4, out_channels=in_channels // 4,
kernel_size=2, kernel_size=kernel_list[1],
stride=2, stride=2,
weight_attr=ParamAttr( weight_attr=ParamAttr(
initializer=paddle.nn.initializer.KaimingUniform()), initializer=paddle.nn.initializer.KaimingUniform()),
...@@ -65,7 +66,7 @@ class Head(nn.Layer): ...@@ -65,7 +66,7 @@ class Head(nn.Layer):
self.conv3 = nn.Conv2DTranspose( self.conv3 = nn.Conv2DTranspose(
in_channels=in_channels // 4, in_channels=in_channels // 4,
out_channels=1, out_channels=1,
kernel_size=2, kernel_size=kernel_list[2],
stride=2, stride=2,
weight_attr=ParamAttr( weight_attr=ParamAttr(
initializer=paddle.nn.initializer.KaimingUniform()), initializer=paddle.nn.initializer.KaimingUniform()),
...@@ -100,8 +101,8 @@ class DBHead(nn.Layer): ...@@ -100,8 +101,8 @@ class DBHead(nn.Layer):
'conv2d_57', 'batch_norm_49', 'conv2d_transpose_2', 'batch_norm_50', 'conv2d_57', 'batch_norm_49', 'conv2d_transpose_2', 'batch_norm_50',
'conv2d_transpose_3', 'thresh' 'conv2d_transpose_3', 'thresh'
] ]
self.binarize = Head(in_channels, binarize_name_list) self.binarize = Head(in_channels, binarize_name_list, **kwargs)
self.thresh = Head(in_channels, thresh_name_list) self.thresh = Head(in_channels, thresh_name_list, **kwargs)
def step_function(self, x, y): def step_function(self, x, y):
return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y))) return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))
......
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from ppocr.modeling.necks.rnn import Im2Seq, EncoderWithRNN, EncoderWithFC, SequenceEncoder, EncoderWithSVTR
from .rec_ctc_head import CTCHead
from .rec_sar_head import SARHead
class MultiHead(nn.Layer):
def __init__(self, in_channels, out_channels_list, **kwargs):
super().__init__()
self.head_list = kwargs.pop('head_list')
self.gtc_head = 'sar'
assert len(self.head_list) >= 2
for idx, head_name in enumerate(self.head_list):
name = list(head_name)[0]
if name == 'SARHead':
# sar head
sar_args = self.head_list[idx][name]
self.sar_head = eval(name)(in_channels=in_channels, \
out_channels=out_channels_list['SARLabelDecode'], **sar_args)
elif name == 'CTCHead':
# ctc neck
self.encoder_reshape = Im2Seq(in_channels)
neck_args = self.head_list[idx][name]['Neck']
encoder_type = neck_args.pop('name')
self.encoder = encoder_type
self.ctc_encoder = SequenceEncoder(in_channels=in_channels, \
encoder_type=encoder_type, **neck_args)
# ctc head
head_args = self.head_list[idx][name]['Head']
self.ctc_head = eval(name)(in_channels=self.ctc_encoder.out_channels, \
out_channels=out_channels_list['CTCLabelDecode'], **head_args)
else:
raise NotImplementedError(
'{} is not supported in MultiHead yet'.format(name))
def forward(self, x, targets=None):
ctc_encoder = self.ctc_encoder(x)
ctc_out = self.ctc_head(ctc_encoder, targets)
head_out = dict()
head_out['ctc'] = ctc_out
head_out['ctc_neck'] = ctc_encoder
# eval mode
if not self.training:
return ctc_out
if self.gtc_head == 'sar':
sar_out = self.sar_head(x, targets[1:])
head_out['sar'] = sar_out
return head_out
else:
return head_out
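# Illustrative head_list sketch (hypothetical values, not taken from this
# commit): MultiHead expects a configuration along the lines of
#   head_list:
#     - CTCHead:
#         Neck: {name: svtr, dims: 64, depth: 2, hidden_dims: 120, use_guide: True}
#         Head: {fc_decay: 0.00001}
#     - SARHead:
#         enc_dim: 512
#         max_text_length: 25
# together with out_channels_list keyed by 'CTCLabelDecode' and 'SARLabelDecode'.
# In training, forward() returns {'ctc', 'ctc_neck', 'sar'}; in eval mode only
# the CTC branch output is returned.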
...@@ -349,7 +349,10 @@ class ParallelSARDecoder(BaseDecoder): ...@@ -349,7 +349,10 @@ class ParallelSARDecoder(BaseDecoder):
class SARHead(nn.Layer): class SARHead(nn.Layer):
def __init__(self, def __init__(self,
in_channels,
out_channels, out_channels,
enc_dim=512,
max_text_length=30,
enc_bi_rnn=False, enc_bi_rnn=False,
enc_drop_rnn=0.1, enc_drop_rnn=0.1,
enc_gru=False, enc_gru=False,
...@@ -358,14 +361,17 @@ class SARHead(nn.Layer): ...@@ -358,14 +361,17 @@ class SARHead(nn.Layer):
dec_gru=False, dec_gru=False,
d_k=512, d_k=512,
pred_dropout=0.1, pred_dropout=0.1,
max_text_length=30,
pred_concat=True, pred_concat=True,
**kwargs): **kwargs):
super(SARHead, self).__init__() super(SARHead, self).__init__()
# encoder module # encoder module
self.encoder = SAREncoder( self.encoder = SAREncoder(
enc_bi_rnn=enc_bi_rnn, enc_drop_rnn=enc_drop_rnn, enc_gru=enc_gru) enc_bi_rnn=enc_bi_rnn,
enc_drop_rnn=enc_drop_rnn,
enc_gru=enc_gru,
d_model=in_channels,
d_enc=enc_dim)
# decoder module # decoder module
self.decoder = ParallelSARDecoder( self.decoder = ParallelSARDecoder(
...@@ -374,6 +380,8 @@ class SARHead(nn.Layer): ...@@ -374,6 +380,8 @@ class SARHead(nn.Layer):
dec_bi_rnn=dec_bi_rnn, dec_bi_rnn=dec_bi_rnn,
dec_drop_rnn=dec_drop_rnn, dec_drop_rnn=dec_drop_rnn,
dec_gru=dec_gru, dec_gru=dec_gru,
d_model=in_channels,
d_enc=enc_dim,
d_k=d_k, d_k=d_k,
pred_dropout=pred_dropout, pred_dropout=pred_dropout,
max_text_length=max_text_length, max_text_length=max_text_length,
...@@ -390,7 +398,7 @@ class SARHead(nn.Layer): ...@@ -390,7 +398,7 @@ class SARHead(nn.Layer):
label = paddle.to_tensor(label, dtype='int64') label = paddle.to_tensor(label, dtype='int64')
final_out = self.decoder( final_out = self.decoder(
feat, holistic_feat, label, img_metas=targets) feat, holistic_feat, label, img_metas=targets)
if not self.training: else:
final_out = self.decoder( final_out = self.decoder(
feat, feat,
holistic_feat, holistic_feat,
......
...@@ -16,7 +16,7 @@ __all__ = ['build_neck'] ...@@ -16,7 +16,7 @@ __all__ = ['build_neck']
def build_neck(config): def build_neck(config):
from .db_fpn import DBFPN from .db_fpn import DBFPN, RSEFPN, LKPAN
from .east_fpn import EASTFPN from .east_fpn import EASTFPN
from .sast_fpn import SASTFPN from .sast_fpn import SASTFPN
from .rnn import SequenceEncoder from .rnn import SequenceEncoder
...@@ -26,8 +26,8 @@ def build_neck(config): ...@@ -26,8 +26,8 @@ def build_neck(config):
from .fce_fpn import FCEFPN from .fce_fpn import FCEFPN
from .pren_fpn import PRENFPN from .pren_fpn import PRENFPN
support_dict = [ support_dict = [
'FPN', 'FCEFPN', 'DBFPN', 'EASTFPN', 'SASTFPN', 'SequenceEncoder', 'FPN', 'FCEFPN', 'LKPAN', 'DBFPN', 'RSEFPN', 'EASTFPN', 'SASTFPN',
'PGFPN', 'TableFPN', 'PRENFPN' 'SequenceEncoder', 'PGFPN', 'TableFPN', 'PRENFPN'
] ]
module_name = config.pop('name') module_name = config.pop('name')
......
...@@ -20,6 +20,88 @@ import paddle ...@@ -20,6 +20,88 @@ import paddle
from paddle import nn from paddle import nn
import paddle.nn.functional as F import paddle.nn.functional as F
from paddle import ParamAttr from paddle import ParamAttr
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.insert(0, os.path.abspath(os.path.join(__dir__, '../../..')))
from ppocr.modeling.backbones.det_mobilenet_v3 import SEModule
class DSConv(nn.Layer):
def __init__(self,
in_channels,
out_channels,
kernel_size,
padding,
stride=1,
groups=None,
if_act=True,
act="relu",
**kwargs):
super(DSConv, self).__init__()
        if groups is None:
groups = in_channels
self.if_act = if_act
self.act = act
self.conv1 = nn.Conv2D(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias_attr=False)
self.bn1 = nn.BatchNorm(num_channels=in_channels, act=None)
self.conv2 = nn.Conv2D(
in_channels=in_channels,
out_channels=int(in_channels * 4),
kernel_size=1,
stride=1,
bias_attr=False)
self.bn2 = nn.BatchNorm(num_channels=int(in_channels * 4), act=None)
self.conv3 = nn.Conv2D(
in_channels=int(in_channels * 4),
out_channels=out_channels,
kernel_size=1,
stride=1,
bias_attr=False)
self._c = [in_channels, out_channels]
if in_channels != out_channels:
self.conv_end = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
bias_attr=False)
def forward(self, inputs):
x = self.conv1(inputs)
x = self.bn1(x)
x = self.conv2(x)
x = self.bn2(x)
if self.if_act:
if self.act == "relu":
x = F.relu(x)
elif self.act == "hardswish":
x = F.hardswish(x)
else:
print("The activation function({}) is selected incorrectly.".
format(self.act))
exit()
x = self.conv3(x)
if self._c[0] != self._c[1]:
x = x + self.conv_end(inputs)
return x
class DBFPN(nn.Layer): class DBFPN(nn.Layer):
...@@ -106,3 +188,171 @@ class DBFPN(nn.Layer): ...@@ -106,3 +188,171 @@ class DBFPN(nn.Layer):
fuse = paddle.concat([p5, p4, p3, p2], axis=1) fuse = paddle.concat([p5, p4, p3, p2], axis=1)
return fuse return fuse
class RSELayer(nn.Layer):
def __init__(self, in_channels, out_channels, kernel_size, shortcut=True):
super(RSELayer, self).__init__()
weight_attr = paddle.nn.initializer.KaimingUniform()
self.out_channels = out_channels
self.in_conv = nn.Conv2D(
in_channels=in_channels,
out_channels=self.out_channels,
kernel_size=kernel_size,
padding=int(kernel_size // 2),
weight_attr=ParamAttr(initializer=weight_attr),
bias_attr=False)
self.se_block = SEModule(self.out_channels)
self.shortcut = shortcut
def forward(self, ins):
x = self.in_conv(ins)
if self.shortcut:
out = x + self.se_block(x)
else:
out = self.se_block(x)
return out
class RSEFPN(nn.Layer):
def __init__(self, in_channels, out_channels, shortcut=True, **kwargs):
super(RSEFPN, self).__init__()
self.out_channels = out_channels
self.ins_conv = nn.LayerList()
self.inp_conv = nn.LayerList()
for i in range(len(in_channels)):
self.ins_conv.append(
RSELayer(
in_channels[i],
out_channels,
kernel_size=1,
shortcut=shortcut))
self.inp_conv.append(
RSELayer(
out_channels,
out_channels // 4,
kernel_size=3,
shortcut=shortcut))
def forward(self, x):
c2, c3, c4, c5 = x
in5 = self.ins_conv[3](c5)
in4 = self.ins_conv[2](c4)
in3 = self.ins_conv[1](c3)
in2 = self.ins_conv[0](c2)
out4 = in4 + F.upsample(
in5, scale_factor=2, mode="nearest", align_mode=1) # 1/16
out3 = in3 + F.upsample(
out4, scale_factor=2, mode="nearest", align_mode=1) # 1/8
out2 = in2 + F.upsample(
out3, scale_factor=2, mode="nearest", align_mode=1) # 1/4
p5 = self.inp_conv[3](in5)
p4 = self.inp_conv[2](out4)
p3 = self.inp_conv[1](out3)
p2 = self.inp_conv[0](out2)
p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)
fuse = paddle.concat([p5, p4, p3, p2], axis=1)
return fuse
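# Illustrative shape sketch (backbone channels are an assumption): with
# in_channels=[16, 24, 56, 480] from a MobileNetV3 detection backbone and
# out_channels=96, each level is squeezed to 96 channels by an RSELayer, fused
# top-down, reduced to 96 // 4 = 24 channels, upsampled to the 1/4 scale and
# concatenated, so `fuse` carries 96 channels at 1/4 of the input resolution.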
class LKPAN(nn.Layer):
def __init__(self, in_channels, out_channels, mode='large', **kwargs):
super(LKPAN, self).__init__()
self.out_channels = out_channels
weight_attr = paddle.nn.initializer.KaimingUniform()
self.ins_conv = nn.LayerList()
self.inp_conv = nn.LayerList()
# pan head
self.pan_head_conv = nn.LayerList()
self.pan_lat_conv = nn.LayerList()
if mode.lower() == 'lite':
p_layer = DSConv
elif mode.lower() == 'large':
p_layer = nn.Conv2D
else:
raise ValueError(
"mode can only be one of ['lite', 'large'], but received {}".
format(mode))
for i in range(len(in_channels)):
self.ins_conv.append(
nn.Conv2D(
in_channels=in_channels[i],
out_channels=self.out_channels,
kernel_size=1,
weight_attr=ParamAttr(initializer=weight_attr),
bias_attr=False))
self.inp_conv.append(
p_layer(
in_channels=self.out_channels,
out_channels=self.out_channels // 4,
kernel_size=9,
padding=4,
weight_attr=ParamAttr(initializer=weight_attr),
bias_attr=False))
if i > 0:
self.pan_head_conv.append(
nn.Conv2D(
in_channels=self.out_channels // 4,
out_channels=self.out_channels // 4,
kernel_size=3,
padding=1,
stride=2,
weight_attr=ParamAttr(initializer=weight_attr),
bias_attr=False))
self.pan_lat_conv.append(
p_layer(
in_channels=self.out_channels // 4,
out_channels=self.out_channels // 4,
kernel_size=9,
padding=4,
weight_attr=ParamAttr(initializer=weight_attr),
bias_attr=False))
def forward(self, x):
c2, c3, c4, c5 = x
in5 = self.ins_conv[3](c5)
in4 = self.ins_conv[2](c4)
in3 = self.ins_conv[1](c3)
in2 = self.ins_conv[0](c2)
out4 = in4 + F.upsample(
in5, scale_factor=2, mode="nearest", align_mode=1) # 1/16
out3 = in3 + F.upsample(
out4, scale_factor=2, mode="nearest", align_mode=1) # 1/8
out2 = in2 + F.upsample(
out3, scale_factor=2, mode="nearest", align_mode=1) # 1/4
f5 = self.inp_conv[3](in5)
f4 = self.inp_conv[2](out4)
f3 = self.inp_conv[1](out3)
f2 = self.inp_conv[0](out2)
pan3 = f3 + self.pan_head_conv[0](f2)
pan4 = f4 + self.pan_head_conv[1](pan3)
pan5 = f5 + self.pan_head_conv[2](pan4)
p2 = self.pan_lat_conv[0](f2)
p3 = self.pan_lat_conv[1](pan3)
p4 = self.pan_lat_conv[2](pan4)
p5 = self.pan_lat_conv[3](pan5)
p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)
fuse = paddle.concat([p5, p4, p3, p2], axis=1)
return fuse
...@@ -16,9 +16,11 @@ from __future__ import absolute_import ...@@ -16,9 +16,11 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import paddle
from paddle import nn from paddle import nn
from ppocr.modeling.heads.rec_ctc_head import get_para_bias_attr from ppocr.modeling.heads.rec_ctc_head import get_para_bias_attr
from ppocr.modeling.backbones.rec_svtrnet import Block, ConvBNLayer, trunc_normal_, zeros_, ones_
class Im2Seq(nn.Layer): class Im2Seq(nn.Layer):
...@@ -64,29 +66,126 @@ class EncoderWithFC(nn.Layer): ...@@ -64,29 +66,126 @@ class EncoderWithFC(nn.Layer):
return x return x
class EncoderWithSVTR(nn.Layer):
def __init__(
self,
in_channels,
dims=64, # XS
depth=2,
hidden_dims=120,
use_guide=False,
num_heads=8,
qkv_bias=True,
mlp_ratio=2.0,
drop_rate=0.1,
attn_drop_rate=0.1,
drop_path=0.,
qk_scale=None):
super(EncoderWithSVTR, self).__init__()
self.depth = depth
self.use_guide = use_guide
self.conv1 = ConvBNLayer(
in_channels, in_channels // 8, padding=1, act=nn.Swish)
self.conv2 = ConvBNLayer(
in_channels // 8, hidden_dims, kernel_size=1, act=nn.Swish)
self.svtr_block = nn.LayerList([
Block(
dim=hidden_dims,
num_heads=num_heads,
mixer='Global',
HW=None,
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
act_layer=nn.Swish,
attn_drop=attn_drop_rate,
drop_path=drop_path,
norm_layer='nn.LayerNorm',
epsilon=1e-05,
prenorm=False) for i in range(depth)
])
self.norm = nn.LayerNorm(hidden_dims, epsilon=1e-6)
self.conv3 = ConvBNLayer(
hidden_dims, in_channels, kernel_size=1, act=nn.Swish)
# last conv-nxn, the input is concat of input tensor and conv3 output tensor
self.conv4 = ConvBNLayer(
2 * in_channels, in_channels // 8, padding=1, act=nn.Swish)
self.conv1x1 = ConvBNLayer(
in_channels // 8, dims, kernel_size=1, act=nn.Swish)
self.out_channels = dims
self.apply(self._init_weights)
def _init_weights(self, m):
if isinstance(m, nn.Linear):
trunc_normal_(m.weight)
if isinstance(m, nn.Linear) and m.bias is not None:
zeros_(m.bias)
elif isinstance(m, nn.LayerNorm):
zeros_(m.bias)
ones_(m.weight)
def forward(self, x):
# for use guide
if self.use_guide:
z = x.clone()
z.stop_gradient = True
else:
z = x
# for short cut
h = z
# reduce dim
z = self.conv1(z)
z = self.conv2(z)
# SVTR global block
B, C, H, W = z.shape
z = z.flatten(2).transpose([0, 2, 1])
for blk in self.svtr_block:
z = blk(z)
z = self.norm(z)
# last stage
z = z.reshape([0, H, W, C]).transpose([0, 3, 1, 2])
z = self.conv3(z)
z = paddle.concat((h, z), axis=1)
z = self.conv1x1(self.conv4(z))
return z
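# Shape note (illustrative): every conv in EncoderWithSVTR preserves H and W,
# so a CTC-branch feature map [B, in_channels, H, W] is reduced to hidden_dims
# channels, mixed by `depth` global SVTR blocks, fused with the shortcut `h`,
# and returned as [B, dims, H, W] (dims defaults to 64, the 'XS' preset).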
class SequenceEncoder(nn.Layer): class SequenceEncoder(nn.Layer):
def __init__(self, in_channels, encoder_type, hidden_size=48, **kwargs): def __init__(self, in_channels, encoder_type, hidden_size=48, **kwargs):
super(SequenceEncoder, self).__init__() super(SequenceEncoder, self).__init__()
self.encoder_reshape = Im2Seq(in_channels) self.encoder_reshape = Im2Seq(in_channels)
self.out_channels = self.encoder_reshape.out_channels self.out_channels = self.encoder_reshape.out_channels
self.encoder_type = encoder_type
if encoder_type == 'reshape': if encoder_type == 'reshape':
self.only_reshape = True self.only_reshape = True
else: else:
support_encoder_dict = { support_encoder_dict = {
'reshape': Im2Seq, 'reshape': Im2Seq,
'fc': EncoderWithFC, 'fc': EncoderWithFC,
'rnn': EncoderWithRNN 'rnn': EncoderWithRNN,
'svtr': EncoderWithSVTR
} }
assert encoder_type in support_encoder_dict, '{} must in {}'.format( assert encoder_type in support_encoder_dict, '{} must in {}'.format(
encoder_type, support_encoder_dict.keys()) encoder_type, support_encoder_dict.keys())
if encoder_type == "svtr":
self.encoder = support_encoder_dict[encoder_type](
self.encoder_reshape.out_channels, **kwargs)
else:
self.encoder = support_encoder_dict[encoder_type]( self.encoder = support_encoder_dict[encoder_type](
self.encoder_reshape.out_channels, hidden_size) self.encoder_reshape.out_channels, hidden_size)
self.out_channels = self.encoder.out_channels self.out_channels = self.encoder.out_channels
self.only_reshape = False self.only_reshape = False
def forward(self, x): def forward(self, x):
if self.encoder_type != 'svtr':
x = self.encoder_reshape(x) x = self.encoder_reshape(x)
if not self.only_reshape: if not self.only_reshape:
x = self.encoder(x) x = self.encoder(x)
return x return x
else:
x = self.encoder(x)
x = self.encoder_reshape(x)
return x
...@@ -128,6 +128,8 @@ class STN_ON(nn.Layer): ...@@ -128,6 +128,8 @@ class STN_ON(nn.Layer):
self.out_channels = in_channels self.out_channels = in_channels
def forward(self, image): def forward(self, image):
if len(image.shape)==5:
image = image.reshape([0, image.shape[-3], image.shape[-2], image.shape[-1]])
stn_input = paddle.nn.functional.interpolate( stn_input = paddle.nn.functional.interpolate(
image, self.tps_inputsize, mode="bilinear", align_corners=True) image, self.tps_inputsize, mode="bilinear", align_corners=True)
stn_img_feat, ctrl_points = self.stn_head(stn_input) stn_img_feat, ctrl_points = self.stn_head(stn_input)
......
...@@ -138,9 +138,9 @@ class TPSSpatialTransformer(nn.Layer): ...@@ -138,9 +138,9 @@ class TPSSpatialTransformer(nn.Layer):
assert source_control_points.shape[2] == 2 assert source_control_points.shape[2] == 2
batch_size = paddle.shape(source_control_points)[0] batch_size = paddle.shape(source_control_points)[0]
self.padding_matrix = paddle.expand( padding_matrix = paddle.expand(
self.padding_matrix, shape=[batch_size, 3, 2]) self.padding_matrix, shape=[batch_size, 3, 2])
Y = paddle.concat([source_control_points, self.padding_matrix], 1) Y = paddle.concat([source_control_points, padding_matrix], 1)
mapping_matrix = paddle.matmul(self.inverse_kernel, Y) mapping_matrix = paddle.matmul(self.inverse_kernel, Y)
source_coordinate = paddle.matmul(self.target_coordinate_repr, source_coordinate = paddle.matmul(self.target_coordinate_repr,
mapping_matrix) mapping_matrix)
......
...@@ -30,7 +30,7 @@ def build_lr_scheduler(lr_config, epochs, step_each_epoch): ...@@ -30,7 +30,7 @@ def build_lr_scheduler(lr_config, epochs, step_each_epoch):
return lr return lr
def build_optimizer(config, epochs, step_each_epoch, parameters): def build_optimizer(config, epochs, step_each_epoch, model):
from . import regularizer, optimizer from . import regularizer, optimizer
config = copy.deepcopy(config) config = copy.deepcopy(config)
# step1 build lr # step1 build lr
...@@ -43,6 +43,8 @@ def build_optimizer(config, epochs, step_each_epoch, parameters): ...@@ -43,6 +43,8 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
if not hasattr(regularizer, reg_name): if not hasattr(regularizer, reg_name):
reg_name += 'Decay' reg_name += 'Decay'
reg = getattr(regularizer, reg_name)(**reg_config)() reg = getattr(regularizer, reg_name)(**reg_config)()
elif 'weight_decay' in config:
reg = config.pop('weight_decay')
else: else:
reg = None reg = None
...@@ -57,4 +59,4 @@ def build_optimizer(config, epochs, step_each_epoch, parameters): ...@@ -57,4 +59,4 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
weight_decay=reg, weight_decay=reg,
grad_clip=grad_clip, grad_clip=grad_clip,
**config) **config)
return optim(parameters), lr return optim(model), lr
...@@ -42,13 +42,13 @@ class Momentum(object): ...@@ -42,13 +42,13 @@ class Momentum(object):
self.weight_decay = weight_decay self.weight_decay = weight_decay
self.grad_clip = grad_clip self.grad_clip = grad_clip
def __call__(self, parameters): def __call__(self, model):
opt = optim.Momentum( opt = optim.Momentum(
learning_rate=self.learning_rate, learning_rate=self.learning_rate,
momentum=self.momentum, momentum=self.momentum,
weight_decay=self.weight_decay, weight_decay=self.weight_decay,
grad_clip=self.grad_clip, grad_clip=self.grad_clip,
parameters=parameters) parameters=model.parameters())
return opt return opt
...@@ -75,7 +75,7 @@ class Adam(object): ...@@ -75,7 +75,7 @@ class Adam(object):
self.name = name self.name = name
self.lazy_mode = lazy_mode self.lazy_mode = lazy_mode
def __call__(self, parameters): def __call__(self, model):
opt = optim.Adam( opt = optim.Adam(
learning_rate=self.learning_rate, learning_rate=self.learning_rate,
beta1=self.beta1, beta1=self.beta1,
...@@ -85,7 +85,7 @@ class Adam(object): ...@@ -85,7 +85,7 @@ class Adam(object):
grad_clip=self.grad_clip, grad_clip=self.grad_clip,
name=self.name, name=self.name,
lazy_mode=self.lazy_mode, lazy_mode=self.lazy_mode,
parameters=parameters) parameters=model.parameters())
return opt return opt
...@@ -117,7 +117,7 @@ class RMSProp(object): ...@@ -117,7 +117,7 @@ class RMSProp(object):
self.weight_decay = weight_decay self.weight_decay = weight_decay
self.grad_clip = grad_clip self.grad_clip = grad_clip
def __call__(self, parameters): def __call__(self, model):
opt = optim.RMSProp( opt = optim.RMSProp(
learning_rate=self.learning_rate, learning_rate=self.learning_rate,
momentum=self.momentum, momentum=self.momentum,
...@@ -125,7 +125,7 @@ class RMSProp(object): ...@@ -125,7 +125,7 @@ class RMSProp(object):
epsilon=self.epsilon, epsilon=self.epsilon,
weight_decay=self.weight_decay, weight_decay=self.weight_decay,
grad_clip=self.grad_clip, grad_clip=self.grad_clip,
parameters=parameters) parameters=model.parameters())
return opt return opt
...@@ -148,7 +148,7 @@ class Adadelta(object): ...@@ -148,7 +148,7 @@ class Adadelta(object):
self.grad_clip = grad_clip self.grad_clip = grad_clip
self.name = name self.name = name
def __call__(self, parameters): def __call__(self, model):
opt = optim.Adadelta( opt = optim.Adadelta(
learning_rate=self.learning_rate, learning_rate=self.learning_rate,
epsilon=self.epsilon, epsilon=self.epsilon,
...@@ -156,7 +156,7 @@ class Adadelta(object): ...@@ -156,7 +156,7 @@ class Adadelta(object):
weight_decay=self.weight_decay, weight_decay=self.weight_decay,
grad_clip=self.grad_clip, grad_clip=self.grad_clip,
name=self.name, name=self.name,
parameters=parameters) parameters=model.parameters())
return opt return opt
...@@ -165,31 +165,55 @@ class AdamW(object): ...@@ -165,31 +165,55 @@ class AdamW(object):
learning_rate=0.001, learning_rate=0.001,
beta1=0.9, beta1=0.9,
beta2=0.999, beta2=0.999,
epsilon=1e-08, epsilon=1e-8,
weight_decay=0.01, weight_decay=0.01,
multi_precision=False,
grad_clip=None, grad_clip=None,
no_weight_decay_name=None,
one_dim_param_no_weight_decay=False,
name=None, name=None,
lazy_mode=False, lazy_mode=False,
**kwargs): **args):
super().__init__()
self.learning_rate = learning_rate self.learning_rate = learning_rate
self.beta1 = beta1 self.beta1 = beta1
self.beta2 = beta2 self.beta2 = beta2
self.epsilon = epsilon self.epsilon = epsilon
self.learning_rate = learning_rate self.grad_clip = grad_clip
self.weight_decay = 0.01 if weight_decay is None else weight_decay self.weight_decay = 0.01 if weight_decay is None else weight_decay
self.grad_clip = grad_clip self.grad_clip = grad_clip
self.name = name self.name = name
self.lazy_mode = lazy_mode self.lazy_mode = lazy_mode
self.multi_precision = multi_precision
self.no_weight_decay_name_list = no_weight_decay_name.split(
) if no_weight_decay_name else []
self.one_dim_param_no_weight_decay = one_dim_param_no_weight_decay
def __call__(self, model):
parameters = model.parameters()
self.no_weight_decay_param_name_list = [
p.name for n, p in model.named_parameters() if any(nd in n for nd in self.no_weight_decay_name_list)
]
if self.one_dim_param_no_weight_decay:
self.no_weight_decay_param_name_list += [
p.name for n, p in model.named_parameters() if len(p.shape) == 1
]
def __call__(self, parameters):
opt = optim.AdamW( opt = optim.AdamW(
learning_rate=self.learning_rate, learning_rate=self.learning_rate,
beta1=self.beta1, beta1=self.beta1,
beta2=self.beta2, beta2=self.beta2,
epsilon=self.epsilon, epsilon=self.epsilon,
parameters=parameters,
weight_decay=self.weight_decay, weight_decay=self.weight_decay,
multi_precision=self.multi_precision,
grad_clip=self.grad_clip, grad_clip=self.grad_clip,
name=self.name, name=self.name,
lazy_mode=self.lazy_mode, lazy_mode=self.lazy_mode,
parameters=parameters) apply_decay_param_fun=self._apply_decay_param_fun)
return opt return opt
def _apply_decay_param_fun(self, name):
return name not in self.no_weight_decay_param_name_list
\ No newline at end of file
...@@ -27,7 +27,7 @@ from .sast_postprocess import SASTPostProcess ...@@ -27,7 +27,7 @@ from .sast_postprocess import SASTPostProcess
from .fce_postprocess import FCEPostProcess from .fce_postprocess import FCEPostProcess
from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \ from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \
DistillationCTCLabelDecode, TableLabelDecode, NRTRLabelDecode, SARLabelDecode, \ DistillationCTCLabelDecode, TableLabelDecode, NRTRLabelDecode, SARLabelDecode, \
SEEDLabelDecode, PRENLabelDecode SEEDLabelDecode, PRENLabelDecode, SVTRLabelDecode
from .cls_postprocess import ClsPostProcess from .cls_postprocess import ClsPostProcess
from .pg_postprocess import PGPostProcess from .pg_postprocess import PGPostProcess
from .vqa_token_ser_layoutlm_postprocess import VQASerTokenLayoutLMPostProcess from .vqa_token_ser_layoutlm_postprocess import VQASerTokenLayoutLMPostProcess
...@@ -41,7 +41,8 @@ def build_post_process(config, global_config=None): ...@@ -41,7 +41,8 @@ def build_post_process(config, global_config=None):
'PGPostProcess', 'DistillationCTCLabelDecode', 'TableLabelDecode', 'PGPostProcess', 'DistillationCTCLabelDecode', 'TableLabelDecode',
'DistillationDBPostProcess', 'NRTRLabelDecode', 'SARLabelDecode', 'DistillationDBPostProcess', 'NRTRLabelDecode', 'SARLabelDecode',
'SEEDLabelDecode', 'VQASerTokenLayoutLMPostProcess', 'SEEDLabelDecode', 'VQASerTokenLayoutLMPostProcess',
'VQAReTokenLayoutLMPostProcess', 'PRENLabelDecode' 'VQAReTokenLayoutLMPostProcess', 'PRENLabelDecode',
'DistillationSARLabelDecode', 'SVTRLabelDecode'
] ]
if config['name'] == 'PSEPostProcess': if config['name'] == 'PSEPostProcess':
......
...@@ -17,17 +17,26 @@ import paddle ...@@ -17,17 +17,26 @@ import paddle
class ClsPostProcess(object): class ClsPostProcess(object):
""" Convert between text-label and text-index """ """ Convert between text-label and text-index """
def __init__(self, label_list, **kwargs): def __init__(self, label_list=None, key=None, **kwargs):
super(ClsPostProcess, self).__init__() super(ClsPostProcess, self).__init__()
self.label_list = label_list self.label_list = label_list
self.key = key
def __call__(self, preds, label=None, *args, **kwargs): def __call__(self, preds, label=None, *args, **kwargs):
if self.key is not None:
preds = preds[self.key]
label_list = self.label_list
if label_list is None:
label_list = {idx: idx for idx in range(preds.shape[-1])}
if isinstance(preds, paddle.Tensor): if isinstance(preds, paddle.Tensor):
preds = preds.numpy() preds = preds.numpy()
pred_idxs = preds.argmax(axis=1) pred_idxs = preds.argmax(axis=1)
decode_out = [(self.label_list[idx], preds[i, idx]) decode_out = [(label_list[idx], preds[i, idx])
for i, idx in enumerate(pred_idxs)] for i, idx in enumerate(pred_idxs)]
if label is None: if label is None:
return decode_out return decode_out
label = [(self.label_list[idx], 1.0) for idx in label] label = [(label_list[idx], 1.0) for idx in label]
return decode_out, label return decode_out, label
...@@ -73,7 +73,7 @@ class BaseRecLabelDecode(object): ...@@ -73,7 +73,7 @@ class BaseRecLabelDecode(object):
conf_list = [0] conf_list = [0]
text = ''.join(char_list) text = ''.join(char_list)
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
return result_list return result_list
def get_ignored_tokens(self): def get_ignored_tokens(self):
...@@ -117,6 +117,7 @@ class DistillationCTCLabelDecode(CTCLabelDecode): ...@@ -117,6 +117,7 @@ class DistillationCTCLabelDecode(CTCLabelDecode):
use_space_char=False, use_space_char=False,
model_name=["student"], model_name=["student"],
key=None, key=None,
multi_head=False,
**kwargs): **kwargs):
super(DistillationCTCLabelDecode, self).__init__(character_dict_path, super(DistillationCTCLabelDecode, self).__init__(character_dict_path,
use_space_char) use_space_char)
...@@ -125,6 +126,7 @@ class DistillationCTCLabelDecode(CTCLabelDecode): ...@@ -125,6 +126,7 @@ class DistillationCTCLabelDecode(CTCLabelDecode):
self.model_name = model_name self.model_name = model_name
self.key = key self.key = key
self.multi_head = multi_head
def __call__(self, preds, label=None, *args, **kwargs): def __call__(self, preds, label=None, *args, **kwargs):
output = dict() output = dict()
...@@ -132,6 +134,8 @@ class DistillationCTCLabelDecode(CTCLabelDecode): ...@@ -132,6 +134,8 @@ class DistillationCTCLabelDecode(CTCLabelDecode):
pred = preds[name] pred = preds[name]
if self.key is not None: if self.key is not None:
pred = pred[self.key] pred = pred[self.key]
if self.multi_head and isinstance(pred, dict):
pred = pred['ctc']
output[name] = super().__call__(pred, label=label, *args, **kwargs) output[name] = super().__call__(pred, label=label, *args, **kwargs)
return output return output
...@@ -196,7 +200,7 @@ class NRTRLabelDecode(BaseRecLabelDecode): ...@@ -196,7 +200,7 @@ class NRTRLabelDecode(BaseRecLabelDecode):
else: else:
conf_list.append(1) conf_list.append(1)
text = ''.join(char_list) text = ''.join(char_list)
result_list.append((text.lower(), np.mean(conf_list))) result_list.append((text.lower(), np.mean(conf_list).tolist()))
return result_list return result_list
...@@ -241,7 +245,7 @@ class AttnLabelDecode(BaseRecLabelDecode): ...@@ -241,7 +245,7 @@ class AttnLabelDecode(BaseRecLabelDecode):
else: else:
conf_list.append(1) conf_list.append(1)
text = ''.join(char_list) text = ''.join(char_list)
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
return result_list return result_list
def __call__(self, preds, label=None, *args, **kwargs): def __call__(self, preds, label=None, *args, **kwargs):
...@@ -333,7 +337,7 @@ class SEEDLabelDecode(BaseRecLabelDecode): ...@@ -333,7 +337,7 @@ class SEEDLabelDecode(BaseRecLabelDecode):
else: else:
conf_list.append(1) conf_list.append(1)
text = ''.join(char_list) text = ''.join(char_list)
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
return result_list return result_list
def __call__(self, preds, label=None, *args, **kwargs): def __call__(self, preds, label=None, *args, **kwargs):
...@@ -417,7 +421,7 @@ class SRNLabelDecode(BaseRecLabelDecode): ...@@ -417,7 +421,7 @@ class SRNLabelDecode(BaseRecLabelDecode):
conf_list.append(1) conf_list.append(1)
text = ''.join(char_list) text = ''.join(char_list)
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
return result_list return result_list
def add_special_char(self, dict_character): def add_special_char(self, dict_character):
...@@ -636,7 +640,7 @@ class SARLabelDecode(BaseRecLabelDecode): ...@@ -636,7 +640,7 @@ class SARLabelDecode(BaseRecLabelDecode):
comp = re.compile('[^A-Z^a-z^0-9^\u4e00-\u9fa5]') comp = re.compile('[^A-Z^a-z^0-9^\u4e00-\u9fa5]')
text = text.lower() text = text.lower()
text = comp.sub('', text) text = comp.sub('', text)
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
return result_list return result_list
def __call__(self, preds, label=None, *args, **kwargs): def __call__(self, preds, label=None, *args, **kwargs):
...@@ -656,6 +660,40 @@ class SARLabelDecode(BaseRecLabelDecode): ...@@ -656,6 +660,40 @@ class SARLabelDecode(BaseRecLabelDecode):
return [self.padding_idx] return [self.padding_idx]
class DistillationSARLabelDecode(SARLabelDecode):
"""
Convert between text-label and text-index
"""
def __init__(self,
character_dict_path=None,
use_space_char=False,
model_name=["student"],
key=None,
multi_head=False,
**kwargs):
super(DistillationSARLabelDecode, self).__init__(character_dict_path,
use_space_char)
if not isinstance(model_name, list):
model_name = [model_name]
self.model_name = model_name
self.key = key
self.multi_head = multi_head
def __call__(self, preds, label=None, *args, **kwargs):
output = dict()
for name in self.model_name:
pred = preds[name]
if self.key is not None:
pred = pred[self.key]
if self.multi_head and isinstance(pred, dict):
pred = pred['sar']
output[name] = super().__call__(pred, label=label, *args, **kwargs)
return output
class PRENLabelDecode(BaseRecLabelDecode): class PRENLabelDecode(BaseRecLabelDecode):
""" Convert between text-label and text-index """ """ Convert between text-label and text-index """
...@@ -699,7 +737,7 @@ class PRENLabelDecode(BaseRecLabelDecode): ...@@ -699,7 +737,7 @@ class PRENLabelDecode(BaseRecLabelDecode):
text = ''.join(char_list) text = ''.join(char_list)
if len(text) > 0: if len(text) > 0:
result_list.append((text, np.mean(conf_list))) result_list.append((text, np.mean(conf_list).tolist()))
else: else:
# here confidence of empty recog result is 1 # here confidence of empty recog result is 1
result_list.append(('', 1)) result_list.append(('', 1))
...@@ -714,3 +752,40 @@ class PRENLabelDecode(BaseRecLabelDecode): ...@@ -714,3 +752,40 @@ class PRENLabelDecode(BaseRecLabelDecode):
return text return text
label = self.decode(label) label = self.decode(label)
return text, label return text, label
class SVTRLabelDecode(BaseRecLabelDecode):
""" Convert between text-label and text-index """
def __init__(self, character_dict_path=None, use_space_char=False,
**kwargs):
super(SVTRLabelDecode, self).__init__(character_dict_path,
use_space_char)
def __call__(self, preds, label=None, *args, **kwargs):
if isinstance(preds, tuple):
preds = preds[-1]
if isinstance(preds, paddle.Tensor):
preds = preds.numpy()
preds_idx = preds.argmax(axis=-1)
preds_prob = preds.max(axis=-1)
text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
return_text = []
for i in range(0, len(text), 3):
text0 = text[i]
text1 = text[i + 1]
text2 = text[i + 2]
text_pred = [text0[0], text1[0], text2[0]]
text_prob = [text0[1], text1[1], text2[1]]
id_max = text_prob.index(max(text_prob))
return_text.append((text_pred[id_max], text_prob[id_max]))
if label is None:
return return_text
label = self.decode(label)
return return_text, label
def add_special_char(self, dict_character):
dict_character = ['blank'] + dict_character
return dict_character
\ No newline at end of file
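The loop in `SVTRLabelDecode.__call__` walks the decoded results in groups of three and keeps the candidate with the highest confidence, which appears to correspond to one prediction per resized view of the same image. A standalone sketch of just that voting step (texts and scores are invented):

```python
# Decoded (text, confidence) candidates; every three consecutive entries
# are assumed to belong to the same input image.
text = [('hello', 0.91), ('he1lo', 0.85), ('hello', 0.88),
        ('world', 0.70), ('world', 0.93), ('vvorld', 0.40)]

return_text = []
for i in range(0, len(text), 3):
    group = text[i:i + 3]
    # Keep the candidate with the highest confidence in each group.
    return_text.append(max(group, key=lambda t: t[1]))

print(return_text)  # [('hello', 0.91), ('world', 0.93)]
```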
[English](README.md) | 简体中文 [English](README.md) | 简体中文
- [1. 简介](#1-简介) # PP-Structure 文档分析
- [2. 近期更新](#2-近期更新)
- [3. 特性](#3-特性) - [1. 简介](#1)
- [4. 效果展示](#4-效果展示) - [2. 近期更新](#2)
- [4.1 版面分析和表格识别](#41-版面分析和表格识别) - [3. 特性](#3)
- [4.2 DOC-VQA](#42-doc-vqa) - [4. 效果展示](#4)
- [5. 快速体验](#5-快速体验) - [4.1 版面分析和表格识别](#41)
- [6. PP-Structure 介绍](#6-pp-structure-介绍) - [4.2 DocVQA](#42)
- [6.1 版面分析+表格识别](#61-版面分析表格识别) - [5. 快速体验](#5)
- [6.1.1 版面分析](#611-版面分析) - [6. PP-Structure 介绍](#6)
- [6.1.2 表格识别](#612-表格识别) - [6.1 版面分析+表格识别](#61)
- [6.2 DOC-VQA](#62-doc-vqa) - [6.1.1 版面分析](#611)
- [7. 模型库](#7-模型库) - [6.1.2 表格识别](#612)
- [7.1 版面分析模型](#71-版面分析模型) - [6.2 DocVQA](#62)
- [7.2 OCR和表格识别模型](#72-ocr和表格识别模型) - [7. 模型库](#7)
- [7.2 DOC-VQA 模型](#72-doc-vqa-模型) - [7.1 版面分析模型](#71)
- [7.2 OCR和表格识别模型](#72)
- [7.3 DocVQA 模型](#73)
<a name="1"></a>
## 1. 简介 ## 1. 简介
PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包,旨在帮助开发者更好的完成文档理解相关任务。 PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包,旨在帮助开发者更好的完成文档理解相关任务。
<a name="2"></a>
## 2. 近期更新 ## 2. 近期更新
* 2022.02.12 DOC-VQA增加LayoutLMv2模型。 * 2022.02.12 DocVQA增加LayoutLMv2模型。
* 2021.12.07 新增[DOC-VQA任务SER和RE](vqa/README.md) * 2021.12.07 新增[DOC-VQA任务SER和RE](vqa/README.md)
<a name="3"></a>
## 3. 特性 ## 3. 特性
PP-Structure的主要特性如下: PP-Structure的主要特性如下:
...@@ -33,21 +37,24 @@ PP-Structure的主要特性如下: ...@@ -33,21 +37,24 @@ PP-Structure的主要特性如下:
- 支持表格区域进行结构化分析,最终结果输出Excel文件 - 支持表格区域进行结构化分析,最终结果输出Excel文件
- 支持python whl包和命令行两种方式,简单易用 - 支持python whl包和命令行两种方式,简单易用
- 支持版面分析和表格结构化两类任务自定义训练 - 支持版面分析和表格结构化两类任务自定义训练
- 支持文档视觉问答(Document Visual Question Answering,DOC-VQA)任务-语义实体识别(Semantic Entity Recognition,SER)和关系抽取(Relation Extraction,RE) - 支持文档视觉问答(Document Visual Question Answering,DocVQA)任务-语义实体识别(Semantic Entity Recognition,SER)和关系抽取(Relation Extraction,RE)
<a name="4"></a>
## 4. 效果展示 ## 4. 效果展示
<a name="41"></a>
### 4.1 版面分析和表格识别 ### 4.1 版面分析和表格识别
<img src="../doc/table/ppstructure.GIF" width="100%"/> <img src="./docs/table/ppstructure.GIF" width="100%"/>
图中展示了版面分析+表格识别的整体流程,图片先有版面分析划分为图像、文本、标题和表格四种区域,然后对图像、文本和标题三种区域进行OCR的检测识别,对表格进行表格识别,其中图像还会被存储下来以便使用。 图中展示了版面分析+表格识别的整体流程,图片先有版面分析划分为图像、文本、标题和表格四种区域,然后对图像、文本和标题三种区域进行OCR的检测识别,对表格进行表格识别,其中图像还会被存储下来以便使用。
<a name="42"></a>
### 4.2 DOC-VQA ### 4.2 DOC-VQA
* SER * SER
![](../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../doc/vqa/result_ser/zh_val_42_ser.jpg) ![](./docs/vqa/result_ser/zh_val_0_ser.jpg) | ![](./docs/vqa/result_ser/zh_val_42_ser.jpg)
---|--- ---|---
图中不同颜色的框表示不同的类别,对于XFUN数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别 图中不同颜色的框表示不同的类别,对于XFUN数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
...@@ -60,46 +67,55 @@ PP-Structure的主要特性如下: ...@@ -60,46 +67,55 @@ PP-Structure的主要特性如下:
* RE * RE
![](../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../doc/vqa/result_re/zh_val_40_re.jpg) ![](./docs/vqa/result_re/zh_val_21_re.jpg) | ![](./docs/vqa/result_re/zh_val_40_re.jpg)
---|--- ---|---
图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。 图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
<a name="5"></a>
## 5. 快速体验 ## 5. 快速体验
请参考[快速安装](./docs/quickstart.md)教程。 请参考[快速使用](./docs/quickstart.md)教程。
<a name="6"></a>
## 6. PP-Structure 介绍 ## 6. PP-Structure 介绍
<a name="61"></a>
### 6.1 版面分析+表格识别 ### 6.1 版面分析+表格识别
![pipeline](../doc/table/pipeline.jpg) ![pipeline](./docs/table/pipeline.jpg)
在PP-Structure中,图片会先经由Layout-Parser进行版面分析,在版面分析中,会对图片里的区域进行分类,包括**文字、标题、图片、列表和表格**5类。对于前4类区域,直接使用PP-OCR完成对应区域文字检测与识别。对于表格类区域,经过表格结构化处理后,表格图片转换为相同表格样式的Excel文件。 在PP-Structure中,图片会先经由Layout-Parser进行版面分析,在版面分析中,会对图片里的区域进行分类,包括**文字、标题、图片、列表和表格**5类。对于前4类区域,直接使用PP-OCR完成对应区域文字检测与识别。对于表格类区域,经过表格结构化处理后,表格图片转换为相同表格样式的Excel文件。
<a name="611"></a>
#### 6.1.1 版面分析 #### 6.1.1 版面分析
版面分析对文档数据进行区域分类,其中包括版面分析工具的Python脚本使用、提取指定类别检测框、性能指标以及自定义训练版面分析模型,详细内容可以参考[文档](layout/README_ch.md) 版面分析对文档数据进行区域分类,其中包括版面分析工具的Python脚本使用、提取指定类别检测框、性能指标以及自定义训练版面分析模型,详细内容可以参考[文档](layout/README_ch.md)
<a name="612"></a>
#### 6.1.2 表格识别 #### 6.1.2 表格识别
表格识别将表格图片转换为excel文档,其中包含对于表格文本的检测和识别以及对于表格结构和单元格坐标的预测,详细说明参考[文档](table/README_ch.md) 表格识别将表格图片转换为excel文档,其中包含对于表格文本的检测和识别以及对于表格结构和单元格坐标的预测,详细说明参考[文档](table/README_ch.md)
### 6.2 DOC-VQA <a name="62"></a>
### 6.2 DocVQA
DOC-VQA指文档视觉问答,其中包括语义实体识别 (Semantic Entity Recognition, SER) 和关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair),详细说明参考[文档](vqa/README.md) DocVQA指文档视觉问答,其中包括语义实体识别 (Semantic Entity Recognition, SER) 和关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair),详细说明参考[文档](vqa/README.md)
<a name="7"></a>
## 7. 模型库 ## 7. 模型库
PP-Structure系列模型列表(更新中) PP-Structure系列模型列表(更新中)
<a name="71"></a>
### 7.1 版面分析模型 ### 7.1 版面分析模型
|模型名称|模型简介|下载地址| label_map| |模型名称|模型简介|下载地址| label_map|
| --- | --- | --- | --- | | --- | --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | PubLayNet 数据集训练的版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| | ppyolov2_r50vd_dcn_365e_publaynet | PubLayNet 数据集训练的版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
<a name="72"></a>
### 7.2 OCR和表格识别模型 ### 7.2 OCR和表格识别模型
|模型名称|模型简介|模型大小|下载地址| |模型名称|模型简介|模型大小|下载地址|
...@@ -108,7 +124,8 @@ PP-Structure系列模型列表(更新中) ...@@ -108,7 +124,8 @@ PP-Structure系列模型列表(更新中)
|ch_PP-OCRv2_rec_slim|【最新】slim量化版超轻量模型,支持中英文、数字识别| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | |ch_PP-OCRv2_rec_slim|【最新】slim量化版超轻量模型,支持中英文、数字识别| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | |en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
### 7.2 DOC-VQA 模型 <a name="73"></a>
### 7.3 DocVQA 模型
|模型名称|模型简介|模型大小|下载地址| |模型名称|模型简介|模型大小|下载地址|
| --- | --- | --- | --- | | --- | --- | --- | --- |
......
# 基于Python预测引擎推理
- [1. Structure](#1)
- [1.1 版面分析+表格识别](#1.1)
- [1.2 版面分析](#1.2)
- [1.3 表格识别](#1.3)
- [2. DocVQA](#2)
<a name="1"></a>
## 1. Structure
进入`ppstructure`目录
```bash
cd ppstructure
```
下载模型
```bash
mkdir inference && cd inference
# 下载PP-OCRv2文本检测模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# 下载PP-OCRv2文本识别模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# 下载超轻量级英文表格预测模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
```
<a name="1.1"></a>
### 1.1 版面分析+表格识别
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=./docs/table/1.png \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--output=../output \
--vis_font_path=../doc/fonts/simfang.ttf
```
运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。详细的结果会存储在`res.txt`文件中。
<a name="1.2"></a>
### 1.2 版面分析
```bash
python3 predict_system.py --image_dir=./docs/table/1.png --table=false --ocr=false --output=../output/
```
运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,图片区域会被裁剪之后保存下来,图片名为表格在图片里的坐标。版面分析结果会存储在`res.txt`文件中。
<a name="1.3"></a>
### 1.3 表格识别
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=./docs/table/table.jpg \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--output=../output \
--vis_font_path=../doc/fonts/simfang.ttf \
--layout=false
```
运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,表格会存储为一个excel,excel文件名为`[0,0,img_h,img_w]`。
<a name="2"></a>
## 2. DocVQA
```bash
cd ppstructure
# 下载模型
mkdir inference && cd inference
# 下载SER xfun 模型并解压
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
cd ..
python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \
--mode=vqa \
--image_dir=vqa/images/input/zh_val_0.jpg \
--vis_font_path=../doc/fonts/simfang.ttf
```
运行完成后,每张图片会在`output`字段指定的目录下的`vqa`目录下存放可视化之后的图片,图片名和输入图片名一致。
# Python Inference
- [1. Structure](#1)
- [1.1 layout analysis + table recognition](#1.1)
- [1.2 layout analysis](#1.2)
- [1.3 table recognition](#1.3)
- [2. DocVQA](#2)
<a name="1"></a>
## 1. Structure
Go to the `ppstructure` directory
```bash
cd ppstructure
```
Download the models:
```bash
mkdir inference && cd inference
# Download the PP-OCRv2 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# Download the PP-OCRv2 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# Download the ultra-lightweight English table structure model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
```
<a name="1.1"></a>
### 1.1 layout analysis + table recognition
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=./docs/table/1.png \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--output=../output \
--vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name inside the `structure` directory under the path specified by the `output` field. Each table in the image is stored as an Excel file and each picture region is cropped and saved; the Excel and image file names are the region's coordinates in the original image. Detailed results are stored in the `res.txt` file.
<a name="1.2"></a>
### 1.2 layout analysis
```bash
python3 predict_system.py --image_dir=./docs/table/1.png --table=false --ocr=false --output=../output/
```
After running, each image gets a directory with the same name inside the `structure` directory under the path specified by the `output` field. Each picture region in the image is cropped and saved; its file name is the region's coordinates in the original image. Layout analysis results are stored in the `res.txt` file.
<a name="1.3"></a>
### 1.3 table recognition
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=./docs/table/table.jpg \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--output=../output \
--vis_font_path=../doc/fonts/simfang.ttf \
--layout=false
```
After running, each image gets a directory with the same name inside the `structure` directory under the path specified by the `output` field. Each table in the image is stored as an Excel file whose file name is the table's coordinates in the original image.
<a name="2"></a>
## 2. DocVQA
```bash
cd ppstructure
# download model
mkdir inference && cd inference
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
cd ..
python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \
--mode=vqa \
--image_dir=vqa/images/input/zh_val_0.jpg \
--vis_font_path=../doc/fonts/simfang.ttf
```
After the operation is completed, each image will store the visualized image in the `vqa` directory under the directory specified by the `output` field, and the image name is the same as the input image name.
- [PP-Structure 系列模型列表](#pp-structure-系列模型列表)
- [1. LayoutParser 模型](#1-layoutparser-模型)
- [2. OCR和表格识别模型](#2-ocr和表格识别模型)
- [2.1 OCR](#21-ocr)
- [2.2 表格识别模型](#22-表格识别模型)
- [3. VQA模型](#3-vqa模型)
- [4. KIE模型](#4-kie模型)
# PP-Structure 系列模型列表 # PP-Structure 系列模型列表
- [1. 版面分析模型](#1)
- [2. OCR和表格识别模型](#2)
- [2.1 OCR](#21)
- [2.2 表格识别模型](#22)
- [3. VQA模型](#3)
- [4. KIE模型](#4)
## 1. LayoutParser 模型 <a name="1"></a>
## 1. 版面分析模型
|模型名称|模型简介|下载地址|label_map| |模型名称|模型简介|下载地址|label_map|
| --- | --- | --- | --- | | --- | --- | --- | --- |
...@@ -17,8 +17,10 @@ ...@@ -17,8 +17,10 @@
| ppyolov2_r50vd_dcn_365e_tableBank_word | TableBank Word 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| | ppyolov2_r50vd_dcn_365e_tableBank_word | TableBank Word 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
| ppyolov2_r50vd_dcn_365e_tableBank_latex | TableBank Latex 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| | ppyolov2_r50vd_dcn_365e_tableBank_latex | TableBank Latex 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
<a name="2"></a>
## 2. OCR和表格识别模型 ## 2. OCR和表格识别模型
<a name="21"></a>
### 2.1 OCR ### 2.1 OCR
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|推理模型大小|下载地址|
...@@ -28,12 +30,14 @@ ...@@ -28,12 +30,14 @@
如需要使用其他OCR模型,可以在 [PP-OCR model_list](../../doc/doc_ch/models_list.md) 下载模型或者使用自己训练好的模型配置到 `det_model_dir`, `rec_model_dir`两个字段即可。 如需要使用其他OCR模型,可以在 [PP-OCR model_list](../../doc/doc_ch/models_list.md) 下载模型或者使用自己训练好的模型配置到 `det_model_dir`, `rec_model_dir`两个字段即可。
<a name="22"></a>
### 2.2 表格识别模型 ### 2.2 表格识别模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|推理模型大小|下载地址|
| --- | --- | --- | --- | | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | |en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
<a name="3"></a>
## 3. VQA模型 ## 3. VQA模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|推理模型大小|下载地址|
...@@ -44,6 +48,7 @@ ...@@ -44,6 +48,7 @@
|re_LayoutLMv2_xfun_zh|基于LayoutLMv2在xfun中文数据集上训练的RE模型|765M|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) | |re_LayoutLMv2_xfun_zh|基于LayoutLMv2在xfun中文数据集上训练的RE模型|765M|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
|ser_LayoutLM_xfun_zh|基于LayoutLM在xfun中文数据集上训练的SER模型|430M|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) | |ser_LayoutLM_xfun_zh|基于LayoutLM在xfun中文数据集上训练的SER模型|430M|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
<a name="4"></a>
## 4. KIE模型 ## 4. KIE模型
|模型名称|模型简介|模型大小|下载地址| |模型名称|模型简介|模型大小|下载地址|
......
# PP-Structure Model list
- [1. Layout Analysis](#1)
- [2. OCR and Table Recognition](#2)
- [2.1 OCR](#21)
- [2.2 Table Recognition](#22)
- [3. VQA](#3)
- [4. KIE](#4)
<a name="1"></a>
## 1. Layout Analysis
|model name| description |download|label_map|
| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; it can recognize 5 types of regions: **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; it can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; it can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
<a name="2"></a>
## 2. OCR and Table Recognition
<a name="21"></a>
### 2.1 OCR
|model name| description | inference model size |download|
| --- |---|---| --- |
|en_ppocr_mobile_v2.0_table_det| Text detection model of English table scenes trained on PubTabNet dataset | 4.7M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_det_train.tar) |
|en_ppocr_mobile_v2.0_table_rec| Text recognition model of English table scenes trained on PubTabNet dataset | 6.9M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) |
If you need other OCR models, you can download them from the [PP-OCR model_list](../../doc/doc_ch/models_list.md) or point the `det_model_dir` and `rec_model_dir` fields to a model you trained yourself.
<a name="22"></a>
### 2.2 Table Recognition
|model| description |inference model size|download|
| --- |-----------------------------------------------------------------------------| --- | --- |
|en_ppocr_mobile_v2.0_table_structure| Table structure model for English table scenes trained on PubTabNet dataset |18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
<a name="3"></a>
## 3. VQA
|model| description |inference model size|download|
| --- |----------------------------------------------------------------| --- | --- |
|ser_LayoutXLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
|re_LayoutXLM_xfun_zh| RE model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
|ser_LayoutLMv2_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLMv2 |778M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
|re_LayoutLMv2_xfun_zh| RE model trained on xfun Chinese dataset based on LayoutLMv2 |765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
|ser_LayoutLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLM |430M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
<a name="4"></a>
## 4. KIE
|model|description|model size|download|
| --- | --- | --- | --- |
|SDMGR|Key Information Extraction Model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
# PP-Structure 快速开始 # PP-Structure 快速开始
- [PP-Structure 快速开始](#pp-structure-快速开始) - [1. 安装依赖包](#1)
- [1. 安装依赖包](#1-安装依赖包) - [2. 便捷使用](#2)
- [2. 便捷使用](#2-便捷使用) - [2.1 命令行使用](#21)
- [2.1 命令行使用](#21-命令行使用) - [2.1.1 版面分析+表格识别](#211)
- [2.2 Python脚本使用](#22-python脚本使用) - [2.1.2 版面分析](#212)
- [2.3 返回结果说明](#23-返回结果说明) - [2.1.3 表格识别](#213)
- [2.4 参数说明](#24-参数说明) - [2.1.4 DocVQA](#214)
- [3. Python脚本使用](#3-python脚本使用) - [2.2 代码使用](#22)
- [2.2.1 版面分析+表格识别](#221)
- [2.2.2 版面分析](#222)
- [2.2.3 表格识别](#223)
- [2.2.4 DocVQA](#224)
- [2.3 返回结果说明](#23)
- [2.3.1 版面分析+表格识别](#231)
- [2.3.2 DocVQA](#232)
- [2.4 参数说明](#24)
<a name="1"></a>
## 1. 安装依赖包 ## 1. 安装依赖包
```bash ```bash
pip install "paddleocr>=2.3.0.2" # 推荐使用2.3.0.2+版本 # 安装 paddleocr,推荐使用2.5+版本
pip3 install "paddleocr>=2.5"
# 安装 版面分析依赖包layoutparser(如不需要版面分析功能,可跳过)
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
# 安装 DocVQA依赖包paddlenlp(如不需要DocVQA功能,可跳过)
# 安装 PaddleNLP pip install paddlenlp
git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
cd PaddleNLP
pip3 install -e .
``` ```
<a name="2"></a>
## 2. 便捷使用 ## 2. 便捷使用
<a name="21"></a>
### 2.1 命令行使用 ### 2.1 命令行使用
* 版面分析+表格识别 <a name="211"></a>
#### 2.1.1 版面分析+表格识别
```bash ```bash
paddleocr --image_dir=../doc/table/1.png --type=structure paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
``` ```
* VQA <a name="212"></a>
#### 2.1.2 版面分析
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```
<a name="213"></a>
#### 2.1.3 表格识别
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```
<a name="214"></a>
#### 2.1.4 DocVQA
请参考:[文档视觉问答](../vqa/README.md) 请参考:[文档视觉问答](../vqa/README.md)
### 2.2 Python脚本使用 <a name="22"></a>
### 2.2 代码使用
<a name="221"></a>
#### 2.2.1 版面分析+表格识别
* 版面分析+表格识别
```python ```python
import os import os
import cv2 import cv2
...@@ -45,8 +73,8 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res ...@@ -45,8 +73,8 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True) table_engine = PPStructure(show_log=True)
save_folder = './output/table' save_folder = './output'
img_path = '../doc/table/1.png' img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
...@@ -57,21 +85,66 @@ for line in result: ...@@ -57,21 +85,66 @@ for line in result:
from PIL import Image from PIL import Image
font_path = '../doc/fonts/simfang.ttf' # PaddleOCR下提供字体包 font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB') image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path) im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
im_show.save('result.jpg') im_show.save('result.jpg')
``` ```
* VQA <a name="222"></a>
#### 2.2.2 版面分析
```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(table=False, ocr=False, show_log=True)
save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
```
<a name="223"></a>
#### 2.2.3 表格识别
```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(layout=False, show_log=True)
save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
```
<a name="224"></a>
#### 2.2.4 DocVQA
请参考:[文档视觉问答](../vqa/README.md) 请参考:[文档视觉问答](../vqa/README.md)
<a name="23"></a>
### 2.3 返回结果说明 ### 2.3 返回结果说明
PP-Structure的返回结果为一个dict组成的list,示例如下 PP-Structure的返回结果为一个dict组成的list,示例如下
* 版面分析+表格识别 <a name="231"></a>
#### 2.3.1 版面分析+表格识别
```shell ```shell
[ [
{ 'type': 'Text', { 'type': 'Text',
...@@ -84,19 +157,31 @@ PP-Structure的返回结果为一个dict组成的list,示例如下 ...@@ -84,19 +157,31 @@ PP-Structure的返回结果为一个dict组成的list,示例如下
dict 里各个字段说明如下 dict 里各个字段说明如下
| 字段 | 说明 | | 字段 | 说明 |
| --------------- | -------------| | --------------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|type|图片区域的类型| |type| 图片区域的类型 |
|bbox|图片区域的在原图的坐标,分别[左上角x,左上角y,右下角x,右下角y]| |bbox| 图片区域的在原图的坐标,分别[左上角x,左上角y,右下角x,右下角y] |
|res|图片区域的OCR或表格识别结果。<br> 表格: 表格的HTML字符串; <br> OCR: 一个包含各个单行文字的检测坐标和识别结果的元组| |res| 图片区域的OCR或表格识别结果。<br> 表格: 一个dict,字段说明如下<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: 表格的HTML字符串<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; 在代码使用模式下,前向传入return_ocr_result_in_table=True可以拿到表格中每个文本的检测识别结果,对应为如下字段: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: 文本检测坐标<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: 文本识别结果。<br> OCR: 一个包含各个单行文字的检测坐标和识别结果的元组 |
* VQA 运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。
```
/output/table/1/
└─ res.txt
└─ [454, 360, 824, 658].xlsx 表格识别结果
└─ [16, 2, 828, 305].jpg 被裁剪出的图片区域
└─ [17, 361, 404, 711].xlsx 表格识别结果
```
<a name="232"></a>
#### 2.3.2 DocVQA
请参考:[文档视觉问答](../vqa/README.md) 请参考:[文档视觉问答](../vqa/README.md)
<a name="24"></a>
### 2.4 参数说明 ### 2.4 参数说明
| 字段 | 说明 | 默认值 | | 字段 | 说明 | 默认值 |
| --------------- | ---------------------------------------- | ------------------------------------------- | |----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| output | excel和识别结果保存的地址 | ./output/table | | output | excel和识别结果保存的地址 | ./output/table |
| table_max_len | 表格结构模型预测时,图像的长边resize尺度 | 488 | | table_max_len | 表格结构模型预测时,图像的长边resize尺度 | 488 |
| table_model_dir | 表格结构模型 inference 模型地址 | None | | table_model_dir | 表格结构模型 inference 模型地址 | None |
...@@ -106,54 +191,8 @@ dict 里各个字段说明如下 ...@@ -106,54 +191,8 @@ dict 里各个字段说明如下
| model_name_or_path | VQA SER模型地址 | None | | model_name_or_path | VQA SER模型地址 | None |
| max_seq_length | VQA SER模型最大支持token长度 | 512 | | max_seq_length | VQA SER模型最大支持token长度 | 512 |
| label_map_path | VQA SER 标签文件地址 | ./vqa/labels/labels_ser.txt | | label_map_path | VQA SER 标签文件地址 | ./vqa/labels/labels_ser.txt |
| mode | pipeline预测模式,structure: 版面分析+表格识别; VQA: SER文档信息抽取 | structure | | layout | 前向中是否执行版面分析 | True |
| table | 前向中是否执行表格识别 | True |
| ocr | 对于版面分析中的非表格区域,是否执行ocr。当layout为False时会被自动设置为False | True |
大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md) 大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md)
运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。
## 3. Python脚本使用
* 版面分析+表格识别
```bash
cd ppstructure
# 下载模型
mkdir inference && cd inference
# 下载PP-OCRv2文本检测模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# 下载PP-OCRv2文本识别模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# 下载超轻量级英文表格预测模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=../doc/table/1.png \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--output=../output/table \
--vis_font_path=../doc/fonts/simfang.ttf
```
运行完成后,每张图片会在`output`字段指定的目录下的`table`目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。
* VQA
```bash
cd ppstructure
# 下载模型
mkdir inference && cd inference
# 下载SER xfun 模型并解压
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
cd ..
python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \
--mode=vqa \
--image_dir=vqa/images/input/zh_val_0.jpg \
--vis_font_path=../doc/fonts/simfang.ttf
```
运行完成后,每张图片会在`output`字段指定的目录下的`vqa`目录下存放可视化之后的图片,图片名和输入图片名一致。
# PP-Structure Quick Start
- [1. Install package](#1)
- [2. Use](#2)
- [2.1 Use by command line](#21)
- [2.1.1 layout analysis + table recognition](#211)
- [2.1.2 layout analysis](#212)
- [2.1.3 table recognition](#213)
- [2.1.4 DocVQA](#214)
- [2.2 Use by code](#22)
- [2.2.1 layout analysis + table recognition](#221)
- [2.2.2 layout analysis](#222)
- [2.2.3 table recognition](#223)
- [2.2.4 DocVQA](#224)
- [2.3 Result description](#23)
- [2.3.1 layout analysis + table recognition](#231)
- [2.3.2 DocVQA](#232)
- [2.4 Parameter Description](#24)
<a name="1"></a>
## 1. Install package
```bash
# Install paddleocr, version 2.5+ is recommended
pip3 install "paddleocr>=2.5"
# Install layoutparser (if you do not use the layout analysis, you can skip it)
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
# Install the DocVQA dependency package paddlenlp (if you do not use the DocVQA, you can skip it)
pip install paddlenlp
```
<a name="2"></a>
## 2. Use
<a name="21"></a>
### 2.1 Use by command line
<a name="211"></a>
#### 2.1.1 layout analysis + table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
```
<a name="212"></a>
#### 2.1.2 layout analysis
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```
<a name="213"></a>
#### 2.1.3 table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```
<a name="214"></a>
#### 2.1.4 DocVQA
Please refer to [Document Visual Q&A](../vqa/README.md).
<a name="22"></a>
### 2.2 Use by code
<a name="221"></a>
#### 2.2.1 layout analysis + table recognition
```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True)
save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
<a name="222"></a>
#### 2.2.2 layout analysis
```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(table=False, ocr=False, show_log=True)
save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
```
<a name="223"></a>
#### 2.2.3 table recognition
```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(layout=False, show_log=True)
save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
```
<a name="224"></a>
#### 2.2.4 DocVQA
Please refer to [Document Visual Q&A](../vqa/README.md).
<a name="23"></a>
### 2.3 Result description
The return of PP-Structure is a list of dicts; an example is shown below:
<a name="231"></a>
#### 2.3.1 layout analysis + table recognition
```shell
[
{ 'type': 'Text',
'bbox': [34, 432, 345, 462],
'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
[('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
}
]
```
Each field in dict is described as follows:
| field | description |
| --- | --- |
|type| Type of the image region. |
|bbox| Coordinates of the image region in the original image, in the form [upper-left x, upper-left y, lower-right x, lower-right y]. |
|res| OCR or table recognition result of the image region. <br> Table: a dict with the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: HTML string of the table.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; When calling from code, pass return_ocr_result_in_table=True to also get the detection and recognition result of each text inside the table, returned in the following fields (see the code sketch after the directory listing below): <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: text detection boxes.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: text recognition results.<br> OCR: a tuple containing the detection boxes and recognition results of each single text line. |
After recognition finishes, each image gets a directory with the same name under the directory specified by the `output` field. Each table in the image is stored as an Excel file and each picture region is cropped and saved; the Excel and image file names are the region's coordinates in the image.
```
/output/table/1/
└─ res.txt
└─ [454, 360, 824, 658].xlsx table recognition result
└─ [16, 2, 828, 305].jpg cropped picture region
└─ [17, 361, 404, 711].xlsx table recognition result
```
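If you need the per-text results inside a table (the `boxes` / `rec_res` fields described above), a sketch of the call in code mode is shown below; the sample image path is the one used earlier in this document, and the printed keys are what the field description implies rather than output from an actual run:

```python
import cv2
from paddleocr import PPStructure

table_engine = PPStructure(show_log=True)
img = cv2.imread('PaddleOCR/ppstructure/docs/table/1.png')
# return_ocr_result_in_table=True adds boxes/rec_res to each table's res dict
result = table_engine(img, return_ocr_result_in_table=True)
for region in result:
    if region['type'] == 'Table':
        print(region['res'].keys())   # expected: html, boxes, rec_res
```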
<a name="232"></a>
#### 2.3.2 DocVQA
Please refer to [Document Visual Q&A](../vqa/README.md).
<a name="24"></a>
### 2.4 Parameter Description
| field | description | default |
|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| output | The save path of result | ./output/table |
| table_max_len | Resize limit of the image's long side for table structure model prediction | 488 |
| table_model_dir | the path of table structure model | None |
| table_char_dict_path | the dict path of table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
| layout_path_model | The model path of the layout analysis model, which can be an online address or a local path. When it is a local path, layout_label_map needs to be set. In command line mode, use --layout_label_map='{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}' | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config |
| layout_label_map | Label mapping dictionary path for the layout analysis model | None |
| model_name_or_path | the model path of VQA SER model | None |
| max_seq_length | the max token length of VQA SER model | 512 |
| label_map_path | the label path of VQA SER model | ./vqa/labels/labels_ser.txt |
| layout | Whether to perform layout analysis in forward | True |
| table | Whether to perform table recognition in forward | True |
| ocr | Whether to perform ocr for non-table areas in layout analysis. When layout is False, it will be automatically set to False | True |
Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md)
[English](README.md) | 简体中文 [English](README.md) | 简体中文
- [版面分析使用说明](#版面分析使用说明)
- [1. 安装whl包](#1--安装whl包)
- [2. 使用](#2-使用)
- [3. 后处理](#3-后处理)
- [4. 指标](#4-指标)
- [5. 训练版面分析模型](#5-训练版面分析模型)
# 版面分析使用说明 # 版面分析使用说明
- [1. 安装whl包](#1)
- [2. 使用](#2)
- [3. 后处理](#3)
- [4. 指标](#4)
- [5. 训练版面分析模型](#5)
<a name="1"></a>
## 1. 安装whl包 ## 1. 安装whl包
```bash ```bash
pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
``` ```
<a name="2"></a>
## 2. 使用 ## 2. 使用
使用layoutparser识别给定文档的布局: 使用layoutparser识别给定文档的布局:
...@@ -20,7 +23,7 @@ pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-a ...@@ -20,7 +23,7 @@ pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-a
```python ```python
import cv2 import cv2
import layoutparser as lp import layoutparser as lp
image = cv2.imread("doc/table/layout.jpg") image = cv2.imread("ppstructure/docs/table/layout.jpg")
image = image[..., ::-1] image = image[..., ::-1]
# 加载模型 # 加载模型
...@@ -40,7 +43,7 @@ show_img.show() ...@@ -40,7 +43,7 @@ show_img.show()
下图展示了结果,不同颜色的检测框表示不同的类别,并通过`show_element_type`在框的左上角显示具体类别: 下图展示了结果,不同颜色的检测框表示不同的类别,并通过`show_element_type`在框的左上角显示具体类别:
<div align="center"> <div align="center">
<img src="../../doc/table/result_all.jpg" width = "600" /> <img src="../docs/table/result_all.jpg" width = "600" />
</div> </div>
`PaddleDetectionLayoutModel`函数参数说明如下: `PaddleDetectionLayoutModel`函数参数说明如下:
...@@ -68,6 +71,7 @@ show_img.show() ...@@ -68,6 +71,7 @@ show_img.show()
* TableBank word和TableBank latex分别在word文档、latex文档数据集训练; * TableBank word和TableBank latex分别在word文档、latex文档数据集训练;
* 下载的TableBank数据集里同时包含word和latex。 * 下载的TableBank数据集里同时包含word和latex。
<a name="3"></a>
## 3. 后处理 ## 3. 后处理
版面分析检测包含多个类别,如果只想获取指定类别(如"Text"类别)的检测框、可以使用下述代码: 版面分析检测包含多个类别,如果只想获取指定类别(如"Text"类别)的检测框、可以使用下述代码:
...@@ -106,9 +110,10 @@ show_img.show() ...@@ -106,9 +110,10 @@ show_img.show()
显示只有"Text"类别的结果: 显示只有"Text"类别的结果:
<div align="center"> <div align="center">
<img src="../../doc/table/result_text.jpg" width = "600" /> <img src="../docs/table/result_text.jpg" width = "600" />
</div> </div>
<a name="4"></a>
## 4. 指标 ## 4. 指标
| Dataset | mAP | CPU time cost | GPU time cost | | Dataset | mAP | CPU time cost | GPU time cost |
...@@ -122,6 +127,7 @@ show_img.show() ...@@ -122,6 +127,7 @@ show_img.show()
**GPU:** a single NVIDIA Tesla P40 **GPU:** a single NVIDIA Tesla P40
<a name="5"></a>
## 5. 训练版面分析模型 ## 5. 训练版面分析模型
上述模型基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 训练,如果您想训练自己的版面分析模型,请参考:[train_layoutparser_model](train_layoutparser_model_ch.md) 上述模型基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 训练,如果您想训练自己的版面分析模型,请参考:[train_layoutparser_model](train_layoutparser_model_ch.md)
...@@ -23,9 +23,10 @@ sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) ...@@ -23,9 +23,10 @@ sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
os.environ["FLAGS_allocator_strategy"] = 'auto_growth' os.environ["FLAGS_allocator_strategy"] = 'auto_growth'
import cv2 import cv2
import json import json
import numpy as np
import time import time
import logging import logging
from copy import deepcopy
from attrdict import AttrDict
from ppocr.utils.utility import get_image_file_list, check_and_read_gif from ppocr.utils.utility import get_image_file_list, check_and_read_gif
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
...@@ -40,16 +41,18 @@ class StructureSystem(object): ...@@ -40,16 +41,18 @@ class StructureSystem(object):
def __init__(self, args): def __init__(self, args):
self.mode = args.mode self.mode = args.mode
if self.mode == 'structure': if self.mode == 'structure':
import layoutparser as lp
# args.det_limit_type = 'resize_long'
args.drop_score = 0
if not args.show_log: if not args.show_log:
logger.setLevel(logging.INFO) logger.setLevel(logging.INFO)
self.text_system = TextSystem(args) if args.layout == False and args.ocr == True:
self.table_system = TableSystem(args, args.ocr = False
self.text_system.text_detector, logger.warning(
self.text_system.text_recognizer) "When args.layout is false, args.ocr is automatically set to false"
)
args.drop_score = 0
# init layout and ocr model
self.text_system = None
if args.layout:
import layoutparser as lp
config_path = None config_path = None
model_path = None model_path = None
if os.path.isdir(args.layout_path_model): if os.path.isdir(args.layout_path_model):
...@@ -64,29 +67,50 @@ class StructureSystem(object): ...@@ -64,29 +67,50 @@ class StructureSystem(object):
enable_mkldnn=args.enable_mkldnn, enable_mkldnn=args.enable_mkldnn,
enforce_cpu=not args.use_gpu, enforce_cpu=not args.use_gpu,
thread_num=args.cpu_threads) thread_num=args.cpu_threads)
self.use_angle_cls = args.use_angle_cls if args.ocr:
self.drop_score = args.drop_score self.text_system = TextSystem(args)
else:
self.table_layout = None
if args.table:
if self.text_system is not None:
self.table_system = TableSystem(
args, self.text_system.text_detector,
self.text_system.text_recognizer)
else:
self.table_system = TableSystem(args)
else:
self.table_system = None
elif self.mode == 'vqa': elif self.mode == 'vqa':
raise NotImplementedError raise NotImplementedError
def __call__(self, img): def __call__(self, img, return_ocr_result_in_table=False):
if self.mode == 'structure': if self.mode == 'structure':
ori_im = img.copy() ori_im = img.copy()
if self.table_layout is not None:
layout_res = self.table_layout.detect(img[..., ::-1]) layout_res = self.table_layout.detect(img[..., ::-1])
else:
h, w = ori_im.shape[:2]
layout_res = [AttrDict(coordinates=[0, 0, w, h], type='Table')]
res_list = [] res_list = []
for region in layout_res: for region in layout_res:
res = ''
x1, y1, x2, y2 = region.coordinates x1, y1, x2, y2 = region.coordinates
x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2) x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
roi_img = ori_im[y1:y2, x1:x2, :] roi_img = ori_im[y1:y2, x1:x2, :]
if region.type == 'Table': if region.type == 'Table':
res = self.table_system(roi_img) if self.table_system is not None:
res = self.table_system(roi_img,
return_ocr_result_in_table)
else: else:
if self.text_system is not None:
filter_boxes, filter_rec_res = self.text_system(roi_img) filter_boxes, filter_rec_res = self.text_system(roi_img)
# remove style char # remove style char
style_token = [ style_token = [
'<strike>', '<strike>', '<sup>', '</sub>', '<b>', '<strike>', '<strike>', '<sup>', '</sub>', '<b>',
'</b>', '<sub>', '</sup>', '<overline>', '</overline>', '</b>', '<sub>', '</sup>', '<overline>',
'<underline>', '</underline>', '<i>', '</i>' '</overline>', '<underline>', '</underline>', '<i>',
'</i>'
] ]
res = [] res = []
for box, rec_res in zip(filter_boxes, filter_rec_res): for box, rec_res in zip(filter_boxes, filter_rec_res):
...@@ -106,31 +130,33 @@ class StructureSystem(object): ...@@ -106,31 +130,33 @@ class StructureSystem(object):
'img': roi_img, 'img': roi_img,
'res': res 'res': res
}) })
return res_list
elif self.mode == 'vqa': elif self.mode == 'vqa':
raise NotImplementedError raise NotImplementedError
return res_list return None
def save_structure_res(res, save_folder, img_name): def save_structure_res(res, save_folder, img_name):
excel_save_folder = os.path.join(save_folder, img_name) excel_save_folder = os.path.join(save_folder, img_name)
os.makedirs(excel_save_folder, exist_ok=True) os.makedirs(excel_save_folder, exist_ok=True)
res_cp = deepcopy(res)
# save res # save res
with open( with open(
os.path.join(excel_save_folder, 'res.txt'), 'w', os.path.join(excel_save_folder, 'res.txt'), 'w',
encoding='utf8') as f: encoding='utf8') as f:
for region in res: for region in res_cp:
if region['type'] == 'Table': roi_img = region.pop('img')
f.write('{}\n'.format(json.dumps(region)))
if region['type'] == 'Table' and len(region[
'res']) > 0 and 'html' in region['res']:
excel_path = os.path.join(excel_save_folder, excel_path = os.path.join(excel_save_folder,
'{}.xlsx'.format(region['bbox'])) '{}.xlsx'.format(region['bbox']))
to_excel(region['res'], excel_path) to_excel(region['res']['html'], excel_path)
elif region['type'] == 'Figure': elif region['type'] == 'Figure':
roi_img = region['img']
img_path = os.path.join(excel_save_folder, img_path = os.path.join(excel_save_folder,
'{}.jpg'.format(region['bbox'])) '{}.jpg'.format(region['bbox']))
cv2.imwrite(img_path, roi_img) cv2.imwrite(img_path, roi_img)
else:
for text_result in region['res']:
f.write('{}\n'.format(json.dumps(text_result)))
def main(args): def main(args):
......
...@@ -51,7 +51,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab ...@@ -51,7 +51,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd .. cd ..
# run # run
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=./docs/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table
``` ```
Note: The above model is trained on the PubLayNet dataset and only supports English scanning scenarios. If you need to identify other scenarios, you need to train the model yourself and replace the three fields `det_model_dir`, `rec_model_dir`, `table_model_dir`. Note: The above model is trained on the PubLayNet dataset and only supports English scanning scenarios. If you need to identify other scenarios, you need to train the model yourself and replace the three fields `det_model_dir`, `rec_model_dir`, `table_model_dir`.
......
[English](README.md) | 简体中文

# Table Recognition

- [1. Table recognition pipeline](#1)
- [2. Performance](#2)
- [3. Usage](#3)
- [3.1 Quick start](#31)
- [3.2 Training](#32)
- [3.3 Evaluation](#33)
- [3.4 Prediction](#34)

<a name="1"></a>
## 1. Table recognition pipeline

Table recognition mainly involves three models:
...@@ -18,7 +21,7 @@
The overall pipeline is shown below:

![tableocr_pipeline](../docs/table/tableocr_pipeline.jpg)

Pipeline description:
...@@ -28,7 +31,9 @@
4. The recognition results of the cells are combined with the table structure to build the HTML string of the table.

<a name="2"></a>
## 2. Performance

We evaluated the algorithm on the PubTabNet<sup>[1]</sup> evaluation dataset; the performance is as follows:
...@@ -37,8 +42,10 @@
| EDD<sup>[2]</sup> | 88.3 |
| Ours | 93.32 |

<a name="3"></a>
## 3. Usage

<a name="31"></a>
### 3.1 Quick start

```python
...@@ -54,12 +61,13 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
# run prediction
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=./docs/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table
```
After the run finishes, the Excel sheet for each image is saved to the directory specified by the output field.

Note: the model above is a table recognition model trained on the PubLayNet dataset and only supports English scanned scenarios. To recognize other scenarios, train a model yourself and replace the three fields `det_model_dir`, `rec_model_dir` and `table_model_dir`.

<a name="32"></a>
### 3.2 Training

This section only covers training of the table structure model; for training the [text detection](../../doc/doc_ch/detection.md) and [text recognition](../../doc/doc_ch/recognition.md) models, please refer to the corresponding documents.
...@@ -89,6 +97,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./yo
**Note**: `Global.checkpoints` has higher priority than `Global.pretrain_weights`; when both are specified, the model specified by `Global.checkpoints` is loaded first, and if that path is invalid, the model specified by `Global.pretrain_weights` is loaded instead.

<a name="33"></a>
### 3.3 Evaluation

Table recognition uses [TEDS (Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluating, the three models in the pipeline need to be exported as inference models (already provided), and the evaluation ground truth needs to be prepared; a gt example is as follows:
...@@ -113,6 +122,8 @@ python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_di
```bash
teds: 93.32
```

<a name="34"></a>
### 3.4 Prediction

```python
...@@ -120,6 +131,6 @@ cd PaddleOCR/ppstructure
python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
```

# Reference
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683
...@@ -54,16 +54,20 @@ def expand(pix, det_box, shape):

class TableSystem(object):
    def __init__(self, args, text_detector=None, text_recognizer=None):
        self.text_detector = predict_det.TextDetector(
            args) if text_detector is None else text_detector
        self.text_recognizer = predict_rec.TextRecognizer(
            args) if text_recognizer is None else text_recognizer
        self.table_structurer = predict_strture.TableStructurer(args)

    def __call__(self, img, return_ocr_result_in_table=False):
        result = dict()
        ori_im = img.copy()
        structure_res, elapse = self.table_structurer(copy.deepcopy(img))
        dt_boxes, elapse = self.text_detector(copy.deepcopy(img))
        dt_boxes = sorted_boxes(dt_boxes)
        if return_ocr_result_in_table:
            result['boxes'] = [x.tolist() for x in dt_boxes]
        r_boxes = []
        for box in dt_boxes:
            x_min = box[:, 0].min() - 1

...@@ -88,14 +92,17 @@ class TableSystem(object):

        rec_res, elapse = self.text_recognizer(img_crop_list)
        logger.debug("rec_res num : {}, elapse : {}".format(
            len(rec_res), elapse))
        if return_ocr_result_in_table:
            result['rec_res'] = rec_res
        pred_html, pred = self.rebuild_table(structure_res, dt_boxes, rec_res)
        result['html'] = pred_html
        return result

    def rebuild_table(self, structure_res, dt_boxes, rec_res):
        pred_structures, pred_bboxes = structure_res
        matched_index = self.match_result(dt_boxes, pred_bboxes)
        pred_html, pred = self.get_pred_html(pred_structures, matched_index,
                                             rec_res)
        return pred_html, pred

    def match_result(self, dt_boxes, pred_bboxes):

...@@ -104,11 +111,13 @@ class TableSystem(object):

            # gt_box = [np.min(gt_box[:, 0]), np.min(gt_box[:, 1]), np.max(gt_box[:, 0]), np.max(gt_box[:, 1])]
            distances = []
            for j, pred_box in enumerate(pred_bboxes):
                distances.append((distance(gt_box, pred_box),
                                  1. - compute_iou(gt_box, pred_box)
                                  ))  # L1 distance and 1 - IoU between every pair of cells
            sorted_distances = distances.copy()
            # pick the "closest" cell according to distance and IoU
            sorted_distances = sorted(
                sorted_distances, key=lambda item: (item[1], item[0]))
            if distances.index(sorted_distances[0]) not in matched.keys():
                matched[distances.index(sorted_distances[0])] = [i]
            else:

...@@ -122,7 +131,8 @@ class TableSystem(object):

                if '</td>' in tag:
                    if td_index in matched_index.keys():
                        b_with = False
                        if '<b>' in ocr_contents[matched_index[td_index][
                                0]] and len(matched_index[td_index]) > 1:
                            b_with = True
                            end_html.extend('<b>')
                        for i, td_index_index in enumerate(matched_index[td_index]):

...@@ -138,7 +148,8 @@ class TableSystem(object):

                            content = content[:-4]
                            if len(content) == 0:
                                continue
                            if i != len(matched_index[
                                    td_index]) - 1 and ' ' != content[-1]:
                                content += ' '
                            end_html.extend(content)
                        if b_with:

...@@ -187,18 +198,19 @@ def main(args):

    for i, image_file in enumerate(image_file_list):
        logger.info("[{}/{}] {}".format(i, img_num, image_file))
        img, flag = check_and_read_gif(image_file)
        excel_path = os.path.join(
            args.output, os.path.basename(image_file).split('.')[0] + '.xlsx')
        if not flag:
            img = cv2.imread(image_file)
        if img is None:
            logger.error("error in loading image:{}".format(image_file))
            continue
        starttime = time.time()
        pred_res = text_sys(img)
        pred_html = pred_res['html']
        logger.info(pred_html)
        to_excel(pred_html, excel_path)
        logger.info('excel saved to {}'.format(excel_path))
        elapse = time.time() - starttime
        logger.info("Predict time : {:.3f}s".format(elapse))
......
...@@ -15,7 +15,7 @@

import ast
from PIL import Image
import numpy as np
from tools.infer.utility import draw_ocr_box_txt, str2bool, init_args as infer_args


def init_args():

...@@ -30,6 +30,7 @@ def init_args():

        "--table_char_dict_path",
        type=str,
        default="../ppocr/utils/dict/table_structure_dict.txt")
    # params for layout
    parser.add_argument(
        "--layout_path_model",
        type=str,

...@@ -39,11 +40,27 @@ def init_args():

        type=ast.literal_eval,
        default=None,
        help='label map according to ppstructure/layout/README_ch.md')
    # params for inference
    parser.add_argument(
        "--mode",
        type=str,
        default='structure',
        help='structure and vqa is supported')
    parser.add_argument(
        "--layout",
        type=str2bool,
        default=True,
        help='Whether to enable layout analysis')
    parser.add_argument(
        "--table",
        type=str2bool,
        default=True,
        help='In the forward, whether the table area uses table recognition')
    parser.add_argument(
        "--ocr",
        type=str2bool,
        default=True,
        help='In the forward, whether the non-table area is recognition by ocr')
    return parser
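A small, illustrative check of the new switches added above (the chosen values are arbitrary):

```python
# Build the parser defined above and parse the three new str2bool switches.
parser = init_args()
args = parser.parse_args([
    '--layout=True',   # keep layout analysis on
    '--table=True',    # run table recognition inside table regions
    '--ocr=False',     # skip plain OCR for non-table regions
])
print(args.layout, args.table, args.ocr)  # True True False
```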
......
...@@ -51,9 +51,10 @@ We evaluate the algorithm on the Chinese dataset of [XFUND](https://github.com/d

**Note:** The test images are from the XFUND dataset.

<a name="31"></a>
### 3.1 SER

![](../docs/vqa/result_ser/zh_val_0_ser.jpg) | ![](../docs/vqa/result_ser/zh_val_42_ser.jpg)
---|---

Boxes with different colors in the figure represent different categories. For the XFUND dataset, there are 3 categories: `QUESTION`, `ANSWER`, `HEADER`

...@@ -64,9 +65,10 @@ Boxes with different colors in the figure represent different categories. For th

The corresponding categories and OCR recognition results are also marked on the upper left of the OCR detection frame.

<a name="32"></a>
### 3.2 RE

![](../docs/vqa/result_re/zh_val_21_re.jpg) | ![](../docs/vqa/result_re/zh_val_40_re.jpg)
---|---

...@@ -150,6 +152,7 @@ wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -x

cd ../
````

<a name="52"></a>
### 5.2 SER

Before starting training, you need to modify the following four fields

...@@ -203,6 +206,7 @@ export CUDA_VISIBLE_DEVICES=0

python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
````

<a name="53"></a>
### 5.3 RE

* start training
......
...@@ -12,3 +12,4 @@ cython

lxml
premailer
openpyxl
attrdict
===========================train_params===========================
model_name:ch_PPOCRv2_det
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train
norm_train:tools/train.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml -o
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml -o
quant_export:null
fpgm_export:
distill_export:null
export1:null
export2:null
inference_dir:Student
infer_model:./inference/ch_PP-OCRv2_det_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
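The `test_tipc` file above is a plain `key:value` list in which `|` separates the alternatives the test scripts iterate over. A rough, stand-alone sketch of reading one such line (this is not the project's actual parser):

```python
def parse_tipc_line(line):
    """Split 'Global.use_gpu:True|True' into a key and its list of variants."""
    key, _, value = line.rstrip('\n').partition(':')
    return key, value.split('|')

key, variants = parse_tipc_line(
    'Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500')
print(key)       # Global.epoch_num
print(variants)  # ['lite_train_lite_infer=1', 'whole_train_whole_infer=500']
```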
...@@ -6,7 +6,7 @@ Global.use_gpu:True|True

Global.auto_cast:fp32
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=1|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
......
===========================train_params===========================
model_name:ch_PPOCRv2_det_PACT
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:pact_train
norm_train:null
pact_train:deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml -o
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:deploy/slim/quantization/export_model.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml -o
fpgm_export:
distill_export:null
export1:null
export2:null
inference_dir:Student
infer_model:./inference/ch_PP-OCRv2_det_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
===========================train_params===========================
model_name:PPOCRv2_ocr_rec
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=3|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=16|whole_train_whole_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train
norm_train:tools/train.py -c test_tipc/configs/ch_PP-OCRv2_rec/ch_PP-OCRv2_rec_distillation.yml -o
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c test_tipc/configs/ch_PP-OCRv2_rec/ch_PP-OCRv2_rec_distillation.yml -o
quant_export:
fpgm_export:
distill_export:null
export1:null
export2:null
inference_dir:Student
infer_model:./inference/ch_PP-OCRv2_rec_infer
infer_export:null
infer_quant:False
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:False|True
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,320]}]
===========================train_params===========================
model_name:ch_PPOCRv2_rec_PACT
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=3|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=16|whole_train_whole_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:pact_train
norm_train:null
pact_train:deploy/slim/quantization/quant.py -c test_tipc/configs/ch_PP-OCRv2_rec/ch_PP-OCRv2_rec_distillation.yml -o
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:deploy/slim/quantization/export_model.py -c test_tipc/configs/ch_PP-OCRv2_rec/ch_PP-OCRv2_rec_distillation.yml -o
fpgm_export: null
distill_export:null
export1:null
export2:null
inference_dir:Student
infer_model:./inference/ch_PP-OCRv2_rec_slim_quant_infer
infer_export:null
infer_quant:True
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:False|True
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,320]}]
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=100|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
...@@ -12,10 +12,10 @@ train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train
norm_train:tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
...@@ -26,10 +26,10 @@ null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
...@@ -49,3 +49,5 @@ inference:tools/infer/predict_det.py
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
\ No newline at end of file
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_FPGM
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
......
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_FPGM
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=5|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:fpgm_train
norm_train:null
pact_train:null
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./pretrain_models/det_mv3_db_v2.0_train/best_accuracy
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:null
fpgm_export:deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
distill_export:null
export1:null
export2:null
inference_dir:null
train_model:null
infer_export:null
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
\ No newline at end of file
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_PACT
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=20|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:pact_train
norm_train:null
pact_train:deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
fpgm_export:null
distill_export:null
export1:null
export2:null
inference_dir:null
train_model:./inference/ch_ppocr_mobile_v2.0_det_prune_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=2|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=128|whole_train_whole_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train
norm_train:tools/train.py -c configs/rec/rec_icdar15_train.yml -o
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c configs/rec/rec_icdar15_train.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c configs/rec/rec_icdar15_train.yml -o
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_mobile_v2.0_rec_train/best_accuracy
infer_export:tools/export_model.py -c configs/rec/rec_icdar15_train.yml -o
infer_quant:False
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:True|False
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
--save_log_path:./test/output/
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,100]}]
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_FPGM
python:python3.7
gpu_list:0
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=128|whole_train_whole_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/ic15_data/test/word_1.png
null:null
##
trainer:fpgm_train
norm_train:null
pact_train:null
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./pretrain_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:null
fpgm_export:deploy/slim/prune/export_prune_model.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o
distill_export:null
export1:null
export2:null
inference_dir:null
train_model:null
infer_export:null
infer_quant:False
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
null:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,320]}]
\ No newline at end of file
===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_PACT
python:python3.7
gpu_list:0
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=128|whole_train_whole_infer=128
Global.checkpoints:null
train_model_name:latest
train_infer_img_dir:./train_data/ic15_data/test/word_1.png
null:null
##
trainer:pact_train
norm_train:null
pact_train:deploy/slim/quantization/quant.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:null
quant_export:deploy/slim/quantization/export_model.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o
fpgm_export:null
distill_export:null
export1:null
export2:null
inference_dir:null
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_slim_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ppocr_keys_v1.txt --rec_image_shape="3,32,100"
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:False|True
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
--save_log_path:./test/output/
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,320]}]
===========================train_params===========================
model_name:ch_ppocr_server_v2.0_det
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=2|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train
norm_train:tools/train.py -c test_tipc/configs/ch_ppocr_server_v2.0_det/det_r50_vd_db.yml -o
quant_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c test_tipc/configs/ch_ppocr_server_v2.0_det/det_r50_vd_db.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c test_tipc/configs/ch_ppocr_server_v2.0_det/det_r50_vd_db.yml -o
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_server_v2.0_det_train/best_accuracy
infer_export:tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--save_log_path:null
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,640,640]}];[{float32,[3,960,960]}]
\ No newline at end of file
===========================train_params===========================
model_name:ch_ppocr_server_v2.0_rec
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:amp
Global.epoch_num:lite_train_lite_infer=5|whole_train_whole_infer=100
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=128|whole_train_whole_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train
norm_train:tools/train.py -c test_tipc/configs/ch_ppocr_server_v2.0_rec/rec_icdar15_train.yml -o
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c test_tipc/configs/ch_ppocr_server_v2.0_rec/rec_icdar15_train.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c test_tipc/configs/ch_ppocr_server_v2.0_rec/rec_icdar15_train.yml -o
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_server_v2.0_rec_train/best_accuracy
infer_export:tools/export_model.py -c test_tipc/configs/ch_ppocr_server_v2.0_rec/rec_icdar15_train.yml -o
infer_quant:False
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:True|False
--precision:fp32|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
--save_log_path:./test/output/
--benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,32,100]}]
...@@ -37,7 +37,7 @@ export2:null

train_model:./inference/rec_mv3_tps_bilstm_att_v2.0_train/best_accuracy
infer_export:tools/export_model.py -c test_tipc/configs/rec_mv3_tps_bilstm_att_v2.0/rec_mv3_tps_bilstm_att.yml -o
infer_quant:False
inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dict.txt --rec_image_shape="3,32,100" --rec_algorithm="RARE" --min_subgraph_size=5
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
......
...@@ -47,14 +47,40 @@ def main():

    if config['Architecture']["algorithm"] in ["Distillation",
                                                ]:  # distillation model
        for key in config['Architecture']["Models"]:
            if config['Architecture']['Models'][key]['Head'][
                    'name'] == 'MultiHead':  # for multi head
                out_channels_list = {}
                if config['PostProcess'][
                        'name'] == 'DistillationSARLabelDecode':
                    char_num = char_num - 2
                out_channels_list['CTCLabelDecode'] = char_num
                out_channels_list['SARLabelDecode'] = char_num + 2
                config['Architecture']['Models'][key]['Head'][
                    'out_channels_list'] = out_channels_list
            else:
                config['Architecture']["Models"][key]["Head"][
                    'out_channels'] = char_num
    elif config['Architecture']['Head'][
            'name'] == 'MultiHead':  # for multi head
        out_channels_list = {}
        if config['PostProcess']['name'] == 'SARLabelDecode':
            char_num = char_num - 2
        out_channels_list['CTCLabelDecode'] = char_num
        out_channels_list['SARLabelDecode'] = char_num + 2
        config['Architecture']['Head'][
            'out_channels_list'] = out_channels_list
    else:  # base rec model
        config['Architecture']["Head"]['out_channels'] = char_num

    model = build_model(config['Architecture'])
    extra_input_models = ["SRN", "NRTR", "SAR", "SEED", "SVTR"]
    extra_input = False
    if config['Architecture']['algorithm'] == 'Distillation':
        for key in config['Architecture']["Models"]:
            extra_input = extra_input or config['Architecture']['Models'][key][
                'algorithm'] in extra_input_models
    else:
        extra_input = config['Architecture']['algorithm'] in extra_input_models
    if "model_type" in config['Architecture'].keys():
        model_type = config['Architecture']['model_type']
    else:
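For intuition, a tiny worked example of what the MultiHead branch above builds, using a made-up character count:

```python
# Assume a MultiHead recognizer whose PostProcess is SARLabelDecode and whose
# character list has 100 entries (hypothetical number).
char_num = 100
char_num = char_num - 2              # SARLabelDecode reserves two symbols
out_channels_list = {
    'CTCLabelDecode': char_num,      # 98 output channels for the CTC head
    'SARLabelDecode': char_num + 2,  # 100 output channels for the SAR head
}
print(out_channels_list)
```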
......
...@@ -31,7 +31,7 @@ from ppocr.utils.logging import get_logger

from tools.program import load_config, merge_config, ArgsParser


def export_single_model(model, arch_config, save_path, logger, quanter=None):
    if arch_config["algorithm"] == "SRN":
        max_text_length = arch_config["Head"]["max_text_length"]
        other_shape = [

...@@ -55,6 +55,18 @@ def export_single_model(model, arch_config, save_path, logger):

                shape=[None, 3, 48, 160], dtype="float32"),
        ]
        model = to_static(model, input_spec=other_shape)
    elif arch_config["algorithm"] == "SVTR":
        if arch_config["Head"]["name"] == 'MultiHead':
            other_shape = [
                paddle.static.InputSpec(
                    shape=[None, 3, 48, -1], dtype="float32"),
            ]
        else:
            other_shape = [
                paddle.static.InputSpec(
                    shape=[None, 3, 64, 256], dtype="float32"),
            ]
        model = to_static(model, input_spec=other_shape)
    elif arch_config["algorithm"] == "PREN":
        other_shape = [
            paddle.static.InputSpec(

...@@ -83,7 +95,10 @@ def export_single_model(model, arch_config, save_path, logger):

                shape=[None] + infer_shape, dtype="float32")
        ])

    if quanter is None:
        paddle.jit.save(model, save_path)
    else:
        quanter.save_quantized_model(model, save_path)
    logger.info("inference model is saved to {}".format(save_path))
    return

...@@ -105,13 +120,35 @@ def main():

    if config["Architecture"]["algorithm"] in ["Distillation",
                                                ]:  # distillation model
        for key in config["Architecture"]["Models"]:
            if config["Architecture"]["Models"][key]["Head"][
                    "name"] == 'MultiHead':  # multi head
                out_channels_list = {}
                if config['PostProcess'][
                        'name'] == 'DistillationSARLabelDecode':
                    char_num = char_num - 2
                out_channels_list['CTCLabelDecode'] = char_num
                out_channels_list['SARLabelDecode'] = char_num + 2
                config['Architecture']['Models'][key]['Head'][
                    'out_channels_list'] = out_channels_list
            else:
                config["Architecture"]["Models"][key]["Head"][
                    "out_channels"] = char_num
            # just one final tensor needs to be exported for inference
            config["Architecture"]["Models"][key][
                "return_all_feats"] = False
    elif config['Architecture']['Head'][
            'name'] == 'MultiHead':  # multi head
        out_channels_list = {}
        char_num = len(getattr(post_process_class, 'character'))
        if config['PostProcess']['name'] == 'SARLabelDecode':
            char_num = char_num - 2
        out_channels_list['CTCLabelDecode'] = char_num
        out_channels_list['SARLabelDecode'] = char_num + 2
        config['Architecture']['Head'][
            'out_channels_list'] = out_channels_list
    else:  # base rec model
        config["Architecture"]["Head"]["out_channels"] = char_num

    model = build_model(config["Architecture"])
    load_model(config, model)
    model.eval()
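One note on the SVTR export branch above: the `None` and `-1` entries of the InputSpec leave batch size and image width dynamic, which the MultiHead variant relies on. A minimal sketch:

```python
import paddle

# Dynamic batch and width; channels and height stay fixed (the MultiHead case above).
spec = paddle.static.InputSpec(shape=[None, 3, 48, -1], dtype='float32')
print(spec)
```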
......
...@@ -284,7 +284,7 @@ if __name__ == "__main__":

        total_time += elapse
        count += 1
        save_pred = os.path.basename(image_file) + "\t" + str(
            json.dumps([x.tolist() for x in dt_boxes])) + "\n"
        save_results.append(save_pred)
        logger.info(save_pred)
        logger.info("The predict time of {}: {}".format(image_file, elapse))
......
...@@ -107,7 +107,7 @@ class TextRecognizer(object):

            return norm_img.astype(np.float32) / 128. - 1.

        assert imgC == img.shape[2]
        imgW = int((imgH * max_wh_ratio))
        if self.use_onnx:
            w = self.input_tensor.shape[3:][0]
            if w is not None and w > 0:

...@@ -132,6 +132,17 @@ class TextRecognizer(object):

        padding_im[:, :, 0:resized_w] = resized_image
        return padding_im

    def resize_norm_img_svtr(self, img, image_shape):
        imgC, imgH, imgW = image_shape
        resized_image = cv2.resize(
            img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
        resized_image = resized_image.astype('float32')
        resized_image = resized_image.transpose((2, 0, 1)) / 255
        resized_image -= 0.5
        resized_image /= 0.5
        return resized_image

    def resize_norm_img_srn(self, img, image_shape):
        imgC, imgH, imgW = image_shape

...@@ -255,18 +266,16 @@ class TextRecognizer(object):

        for beg_img_no in range(0, img_num, batch_num):
            end_img_no = min(img_num, beg_img_no + batch_num)
            norm_img_batch = []
            imgC, imgH, imgW = self.rec_image_shape
            max_wh_ratio = imgW / imgH
            # max_wh_ratio = 0
            for ino in range(beg_img_no, end_img_no):
                h, w = img_list[indices[ino]].shape[0:2]
                wh_ratio = w * 1.0 / h
                max_wh_ratio = max(max_wh_ratio, wh_ratio)
            for ino in range(beg_img_no, end_img_no):
                if self.rec_algorithm == "SAR":
                    norm_img, _, _, valid_ratio = self.resize_norm_img_sar(
                        img_list[indices[ino]], self.rec_image_shape)
                    norm_img = norm_img[np.newaxis, :]

...@@ -274,7 +283,7 @@ class TextRecognizer(object):

                    valid_ratios = []
                    valid_ratios.append(valid_ratio)
                    norm_img_batch.append(norm_img)
                elif self.rec_algorithm == "SRN":
                    norm_img = self.process_image_srn(
                        img_list[indices[ino]], self.rec_image_shape, 8, 25)
                    encoder_word_pos_list = []

...@@ -286,6 +295,16 @@ class TextRecognizer(object):

                    gsrm_slf_attn_bias1_list.append(norm_img[3])
                    gsrm_slf_attn_bias2_list.append(norm_img[4])
                    norm_img_batch.append(norm_img[0])
                elif self.rec_algorithm == "SVTR":
                    norm_img = self.resize_norm_img_svtr(
                        img_list[indices[ino]], self.rec_image_shape)
                    norm_img = norm_img[np.newaxis, :]
                    norm_img_batch.append(norm_img)
                else:
                    norm_img = self.resize_norm_img(img_list[indices[ino]],
                                                    max_wh_ratio)
                    norm_img = norm_img[np.newaxis, :]
                    norm_img_batch.append(norm_img)
            norm_img_batch = np.concatenate(norm_img_batch)
            norm_img_batch = norm_img_batch.copy()
            if self.benchmark:
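To make the width computation above concrete, a small worked example assuming the common `rec_image_shape` of `3, 48, 320` (the actual value comes from the command line):

```python
imgC, imgH, imgW = 3, 48, 320           # assumed self.rec_image_shape
max_wh_ratio = imgW / imgH              # 6.67, the floor for this batch
for w, h in [(200, 32), (900, 32)]:     # two hypothetical text crops
    max_wh_ratio = max(max_wh_ratio, w / h)
# the widest crop (900x32, ratio 28.125) drives the padded width:
print(int(imgH * max_wh_ratio))         # 1350
```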
......
...@@ -271,9 +271,10 @@ def create_predictor(args, mode, logger):

        elif mode == "rec":
            if args.rec_algorithm != "CRNN":
                use_dynamic_shape = False
            imgH = int(args.rec_image_shape.split(',')[-2])
            min_input_shape = {"x": [1, 3, imgH, 10]}
            max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 1536]}
            opt_input_shape = {"x": [args.rec_batch_num, 3, imgH, 320]}
        elif mode == "cls":
            min_input_shape = {"x": [1, 3, 48, 10]}
            max_input_shape = {"x": [args.rec_batch_num, 3, 48, 1024]}

...@@ -300,8 +301,8 @@ def create_predictor(args, mode, logger):

        # enable memory optim
        config.enable_memory_optim()
        config.disable_glog_info()
        config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
        config.delete_pass("matmul_transpose_reshape_fuse_pass")
        if mode == 'table':
            config.delete_pass("fc_fuse_pass")  # not supported for table
        config.switch_use_feed_fetch_ops(False)
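As a worked example of the TensorRT dynamic shapes this branch now derives, assuming `--rec_image_shape="3, 48, 320"` and `--rec_batch_num=6` (hypothetical invocation):

```python
rec_image_shape = '3, 48, 320'
rec_batch_num = 6
imgH = int(rec_image_shape.split(',')[-2])                 # 48
min_input_shape = {'x': [1, 3, imgH, 10]}
max_input_shape = {'x': [rec_batch_num, 3, imgH, 1536]}
opt_input_shape = {'x': [rec_batch_num, 3, imgH, 320]}
print(min_input_shape, max_input_shape, opt_input_shape)
```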
......
...@@ -57,6 +57,8 @@ def main():

            continue
        elif op_name == 'KeepKeys':
            op[op_name]['keep_keys'] = ['image']
        elif op_name == "SSLRotateResize":
            op[op_name]["mode"] = "test"
        transforms.append(op)
    global_config['infer_mode'] = True
    ops = create_operators(transforms, global_config)
......
...@@ -51,8 +51,28 @@ def main():

    if config['Architecture']["algorithm"] in ["Distillation",
                                                ]:  # distillation model
        for key in config['Architecture']["Models"]:
            if config['Architecture']['Models'][key]['Head'][
                    'name'] == 'MultiHead':  # for multi head
                out_channels_list = {}
                if config['PostProcess'][
                        'name'] == 'DistillationSARLabelDecode':
                    char_num = char_num - 2
                out_channels_list['CTCLabelDecode'] = char_num
                out_channels_list['SARLabelDecode'] = char_num + 2
                config['Architecture']['Models'][key]['Head'][
                    'out_channels_list'] = out_channels_list
            else:
                config['Architecture']["Models"][key]["Head"][
                    'out_channels'] = char_num
    elif config['Architecture']['Head'][
            'name'] == 'MultiHead':  # for multi head loss
        out_channels_list = {}
        if config['PostProcess']['name'] == 'SARLabelDecode':
            char_num = char_num - 2
        out_channels_list['CTCLabelDecode'] = char_num
        out_channels_list['SARLabelDecode'] = char_num + 2
        config['Architecture']['Head'][
            'out_channels_list'] = out_channels_list
    else:  # base rec model
        config['Architecture']["Head"]['out_channels'] = char_num
......
...@@ -201,12 +201,19 @@ def train(config,

    model.train()

    use_srn = config['Architecture']['algorithm'] == "SRN"
    extra_input_models = ["SRN", "NRTR", "SAR", "SEED", "SVTR"]
    extra_input = False
    if config['Architecture']['algorithm'] == 'Distillation':
        for key in config['Architecture']["Models"]:
            extra_input = extra_input or config['Architecture']['Models'][key][
                'algorithm'] in extra_input_models
    else:
        extra_input = config['Architecture']['algorithm'] in extra_input_models
    try:
        model_type = config['Architecture']['model_type']
    except:
        model_type = None
    algorithm = config['Architecture']['algorithm']

    start_epoch = best_model_dict[

...@@ -268,6 +275,11 @@ def train(config,

                batch = [item.numpy() for item in batch]
                if model_type in ['table', 'kie']:
                    eval_class(preds, batch)
                else:
                    if config['Loss']['name'] in ['MultiLoss', 'MultiLoss_v2'
                                                  ]:  # for multi head loss
                        post_result = post_process_class(
                            preds['ctc'], batch[1])  # for CTC head out
                    else:
                        post_result = post_process_class(preds, batch[1])
                    eval_class(post_result, batch)

...@@ -541,7 +553,7 @@ def preprocess(is_train=False):

    assert alg in [
        'EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN',
        'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn', 'SAR', 'PSE',
        'SEED', 'SDMGR', 'LayoutXLM', 'LayoutLM', 'PREN', 'FCE', 'SVTR'
    ]

    device = 'cpu'
......
...@@ -74,11 +74,49 @@ def main(config, device, logger, vdl_writer):

    if config['Architecture']["algorithm"] in ["Distillation",
                                                ]:  # distillation model
        for key in config['Architecture']["Models"]:
            if config['Architecture']['Models'][key]['Head'][
                    'name'] == 'MultiHead':  # for multi head
                if config['PostProcess'][
                        'name'] == 'DistillationSARLabelDecode':
                    char_num = char_num - 2
                # update SARLoss params
                assert list(config['Loss']['loss_config_list'][-1].keys())[
                    0] == 'DistillationSARLoss'
                config['Loss']['loss_config_list'][-1][
                    'DistillationSARLoss']['ignore_index'] = char_num + 1
                out_channels_list = {}
                out_channels_list['CTCLabelDecode'] = char_num
                out_channels_list['SARLabelDecode'] = char_num + 2
                config['Architecture']['Models'][key]['Head'][
                    'out_channels_list'] = out_channels_list
            else:
                config['Architecture']["Models"][key]["Head"][
                    'out_channels'] = char_num
    elif config['Architecture']['Head'][
            'name'] == 'MultiHead':  # for multi head
        if config['PostProcess']['name'] == 'SARLabelDecode':
            char_num = char_num - 2
        # update SARLoss params
        assert list(config['Loss']['loss_config_list'][1].keys())[
            0] == 'SARLoss'
        if config['Loss']['loss_config_list'][1]['SARLoss'] is None:
            config['Loss']['loss_config_list'][1]['SARLoss'] = {
                'ignore_index': char_num + 1
            }
        else:
            config['Loss']['loss_config_list'][1]['SARLoss'][
                'ignore_index'] = char_num + 1
        out_channels_list = {}
        out_channels_list['CTCLabelDecode'] = char_num
        out_channels_list['SARLabelDecode'] = char_num + 2
        config['Architecture']['Head'][
            'out_channels_list'] = out_channels_list
    else:  # base rec model
        config['Architecture']["Head"]['out_channels'] = char_num

    if config['PostProcess']['name'] == 'SARLabelDecode':  # for SAR model
        config['Loss']['ignore_index'] = char_num - 1

    model = build_model(config['Architecture'])
    if config['Global']['distributed']:
        model = paddle.DataParallel(model)

...@@ -91,7 +129,7 @@ def main(config, device, logger, vdl_writer):

        config['Optimizer'],
        epochs=config['Global']['epoch_num'],
        step_each_epoch=len(train_dataloader),
        model=model)

    # build metric
    eval_class = build_metric(config['Metric'])
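To make the index bookkeeping above concrete, a small hypothetical example (the character count is made up):

```python
# Suppose the dictionary used with SARLabelDecode has 100 entries.
char_num = 100
char_num = char_num - 2          # SARLabelDecode drops two reserved entries
sar_ignore_index = char_num + 1  # 99, the label index SARLoss is told to ignore
ctc_out_channels = char_num      # 98 channels for the CTC head
sar_out_channels = char_num + 2  # 100 channels for the SAR head
print(sar_ignore_index, ctc_out_channels, sar_out_channels)
```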
......