Commit 96a8e0f8, authored by andyjpaddle

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into dygraph

...@@ -113,18 +113,19 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- [Quick Start](./ppstructure/docs/quickstart_en.md)
- [Model Zoo](./ppstructure/docs/models_list_en.md)
- [Model training](./doc/doc_en/training_en.md)
- [Layout Analysis](./ppstructure/layout/README.md)
- [Table Recognition](./ppstructure/table/README.md)
- [Key Information Extraction](./ppstructure/kie/README.md)
- [Inference and Deployment](./deploy/README.md)
- [Python Inference](./ppstructure/docs/inference_en.md)
- [C++ Inference](./deploy/cpp_infer/readme.md)
- [Serving](./deploy/pdserving/README.md)
- [Academic Algorithms](./doc/doc_en/algorithm_overview_en.md)
- [Text detection](./doc/doc_en/algorithm_overview_en.md)
- [Text recognition](./doc/doc_en/algorithm_overview_en.md)
- [End-to-end OCR](./doc/doc_en/algorithm_overview_en.md)
- [Table Recognition](./doc/doc_en/algorithm_overview_en.md)
- [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md)
- [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
- Data Annotation and Synthesis
- [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
...@@ -135,9 +136,9 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- [General OCR Datasets(Chinese/English)](doc/doc_en/dataset/datasets_en.md)
- [HandWritten_OCR_Datasets(Chinese)](doc/doc_en/dataset/handwritten_datasets_en.md)
- [Various OCR Datasets(multilingual)](doc/doc_en/dataset/vertical_and_multilingual_datasets_en.md)
- [Layout Analysis](doc/doc_en/dataset/layout_datasets_en.md)
- [Table Recognition](doc/doc_en/dataset/table_datasets_en.md)
- [Key Information Extraction](doc/doc_en/dataset/kie_datasets_en.md)
- [Code Structure](./doc/doc_en/tree_en.md)
- [Visualization](#Visualization)
- [Community](#Community)
...@@ -176,7 +177,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
</details>
<details open>
<summary>PP-Structurev2</summary>
- layout analysis + table recognition
<div align="center">
...@@ -185,12 +186,28 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- SER (Semantic entity recognition)
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
- RE (Relation Extraction)
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
</details>
......
...@@ -27,15 +27,8 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
## Recent updates

- **🔥2022.7 Release of the [OCR scene application collection](./applications)**
    - Releases **7 vertical models** covering digital tubes, LCD screens, license plates, a high-accuracy SVTR model, and more, addressing the main vertical OCR applications in the general, manufacturing, finance, and transportation industries.
- **🔥2022.5.9 Release of PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
    - Release [PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3): with comparable speed, accuracy in Chinese scenarios is 5% higher than PP-OCRv2, accuracy in English scenarios is 11% higher, and the average recognition accuracy of the 80-language multilingual models is improved by more than 5%;
...@@ -71,24 +64,22 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
## "Dive into OCR" e-book

- ["Dive into OCR" e-book 📚](./doc/doc_ch/ocr_book.md)
<a name="开源社区"></a>
## Open-source community

- **Project cooperation 📑:** If you are an enterprise developer with a clear vertical OCR requirement, fill in the [questionnaire](https://paddle.wjx.cn/vj/QwF7GKw.aspx) to start cooperation with the official team at different levels, free of charge.
- **Join the community 👬:** Scan the QR code below with WeChat and fill in the questionnaire to join the discussion group and receive:
    - **Replay links of the live course series "In-depth OCR Techniques and Industrial Applications" accompanying the latest PaddleOCR releases**
    - **A 10 GB OCR learning package:** the "Dive into OCR" e-book with companion videos and notebooks; 66 frontier OCR papers from top conferences such as CVPR, AAAI, IJCAI, and ICCV; replays of previous PaddleOCR release live courses; and project-sharing videos from outstanding community developers.
- **Community projects 🏅️:** the [community projects](./doc/doc_ch/thirdparty.md) page collects the tools and applications community users have built with PaddleOCR, as well as the features, documentation, and code they have contributed back. It serves as an honor wall for community developers and a place to promote high-quality projects.
- **Community regular competition 🎁:** a point-based event for OCR developers covering documentation, code, models, and applications; awards are evaluated and issued quarterly. See the [link](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for the tasks and how to sign up.

<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "150" height = "150" />
</div>

<a name="模型下载"></a>
## PP-OCR series model list (updating)
...@@ -96,14 +87,21 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
| ------------------------------------- | ----------------------- | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese & English ultra-lightweight PP-OCRv3 model (16.2M) | ch_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
| English ultra-lightweight PP-OCRv3 model (13.4M) | en_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
| Chinese & English ultra-lightweight PP-OCRv2 model (13.0M) | ch_PP-OCRv2_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
| Chinese & English ultra-lightweight PP-OCR mobile model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Chinese & English general PP-OCR server model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

- For more downloads of the ultra-lightweight OCR series (including multilingual models), refer to [PP-OCR series model downloads](./doc/doc_ch/models_list.md); for document-analysis models, refer to [PP-Structure series model downloads](./ppstructure/docs/models_list.md).

### PaddleOCR scene application models

| Industry | Category | Highlights | Documentation | Model download |
| ---- | ------------ | ---------------------------------- | ------------------------------------------------------------ | --------------------------------------------- |
| Manufacturing | Digital tube recognition | digital tube data synthesis, tuning for missed recognition | [Digital tube character recognition on optical power meters](./applications/光功率计数码管字符识别/光功率计数码管字符识别.md) | [download link](./applications/README.md#模型下载) |
| Finance | General form recognition | multimodal structured extraction of general forms | [Multimodal form recognition](./applications/多模态表单识别.md) | [download link](./applications/README.md#模型下载) |
| Transportation | License plate recognition | multi-angle image processing, lightweight model, on-device deployment | [Lightweight license plate recognition](./applications/轻量级车牌识别.md) | [download link](./applications/README.md#模型下载) |

- For more vertical OCR models for the manufacturing, finance, and transportation industries (such as electricity meters, LCD screens, and high-accuracy SVTR models), refer to the [scene application model downloads](./applications).

<a name="文档教程"></a>
## Tutorials
- [Environment preparation](./doc/doc_ch/environment.md)
...@@ -120,7 +118,7 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
- [Knowledge distillation](./doc/doc_ch/knowledge_distillation.md)
- [Inference and deployment](./deploy/README_ch.md)
- [Python inference](./doc/doc_ch/inference_ppocr.md)
- [C++ inference](./deploy/cpp_infer/readme_ch.md)
- [Serving](./deploy/pdserving/README_CN.md)
- [On-device deployment](./deploy/lite/readme.md)
- [Paddle2ONNX model conversion and prediction](./deploy/paddle2onnx/readme.md)
...@@ -132,16 +130,17 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
- [Model training](./doc/doc_ch/training.md)
- [Layout analysis](./ppstructure/layout/README_ch.md)
- [Table recognition](./ppstructure/table/README_ch.md)
- [Key information extraction](./ppstructure/kie/README_ch.md)
- [Inference and deployment](./deploy/README_ch.md)
- [Python inference](./ppstructure/docs/inference.md)
- [C++ inference](./deploy/cpp_infer/readme_ch.md)
- [Serving](./deploy/pdserving/README_CN.md)
- [Frontier algorithms and models 🚀](./doc/doc_ch/algorithm_overview.md)
- [Text detection algorithms](./doc/doc_ch/algorithm_overview.md)
- [Text recognition algorithms](./doc/doc_ch/algorithm_overview.md)
- [End-to-end OCR algorithms](./doc/doc_ch/algorithm_overview.md)
- [Table recognition algorithms](./doc/doc_ch/algorithm_overview.md)
- [Key information extraction algorithms](./doc/doc_ch/algorithm_overview.md)
- [Add new algorithms to PaddleOCR](./doc/doc_ch/add_new_algorithm.md)
- [Scene applications](./applications)
- Data annotation and synthesis
...@@ -155,7 +154,7 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
- [Vertical and multilingual OCR datasets](doc/doc_ch/dataset/vertical_and_multilingual_datasets.md)
- [Layout analysis datasets](doc/doc_ch/dataset/layout_datasets.md)
- [Table recognition datasets](doc/doc_ch/dataset/table_datasets.md)
- [Key information extraction datasets](doc/doc_ch/dataset/kie_datasets.md)
- [Code structure](./doc/doc_ch/tree.md)
- [Visualization](#效果展示)
- ["Dive into OCR" e-book 📚](./doc/doc_ch/ocr_book.md)
...@@ -214,12 +213,28 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps
- SER (semantic entity recognition)
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
- RE (relation extraction)
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
</details>
......
# Invoice Key Information Extraction Based on VI-LayoutXLM

- [1. Background and Significance](#1-background-and-significance)
- [2. Project Content](#2-project-content)
- [3. Environment Setup](#3-environment-setup)
- [4. Key Information Extraction](#4-key-information-extraction)
  - [4.1 Text Detection](#41-text-detection)
  - [4.2 Text Recognition](#42-text-recognition)
  - [4.3 Semantic Entity Recognition](#43-semantic-entity-recognition)
  - [4.4 Relation Extraction](#44-relation-extraction)

## 1. Background and Significance

Key information extraction (KIE) is widely used in document scenarios, for example extracting the name and address from an ID card, or the name and contact information from an express waybill. Traditional template-matching solutions require designing and adapting a template for each scenario, which is tedious and not robust. To address this, we use the key information extraction solution of the PaddleOCR toolkit provided by PaddlePaddle to extract the key information of value-added tax (VAT) invoices.

## 2. Project Content

Based on the open-source PaddleOCR toolkit and the VI-LayoutXLM multimodal key information extraction model, this project adapts the model to the VAT invoice scenario and extracts its key information.
## 3. Environment Setup

```bash
# First clone the official PaddleOCR repo and install the required dependencies
# (enable the following lines on the first run)
git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
# install the PaddleOCR dependencies
pip install -r requirements.txt
# install the dependencies of the key information extraction task
pip install -r ./ppstructure/vqa/requirements.txt
```
## 4. Key Information Extraction

Key information extraction from document images consists of three parts: (1) text detection, (2) text recognition, and (3) the key information extraction method itself, namely semantic entity recognition (SER) or relation extraction (RE). Each part is introduced below; a schematic sketch of how the three stages fit together follows.
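To make the data flow concrete, here is a minimal sketch of the three stages. The helper functions are purely illustrative placeholders (they are not PaddleOCR APIs); in practice they would wrap a detection model, a recognition model, and an SER/RE model.

```python
# Illustrative sketch only; detect_text / recognize_text / extract_kie are
# hypothetical stand-ins, not PaddleOCR functions.
def detect_text(image):
    # (1) text detection: return text-line boxes as lists of four (x, y) points
    return [[[0, 0], [100, 0], [100, 20], [0, 20]]]

def recognize_text(image, box):
    # (2) text recognition: return the transcription of one text line
    return "No"

def extract_kie(boxes, texts):
    # (3) key information extraction: assign an SER label to each text line
    #     (and, for RE, link question fields to their answer fields)
    return [{"box": b, "text": t, "label": "question"} for b, t in zip(boxes, texts)]

def run_kie(image):
    boxes = detect_text(image)
    texts = [recognize_text(image, b) for b in boxes]
    return extract_kie(boxes, texts)

print(run_kie(image=None))
```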
### 4.1 Text Detection

This tutorial focuses on training and predicting with the key information extraction model for invoices, so during key information extraction the annotated text detection and recognition results are used directly for testing. If you want to customize a text detection model for this scenario and build an end-to-end key information extraction pipeline, please refer to the [text detection model training tutorial](../doc/doc_ch/detection.md): prepare the data in the training data format and fine-tune a vertical text detection model for this scenario.

### 4.2 Text Recognition

This tutorial focuses on training and predicting with the key information extraction model for invoices, so during key information extraction the provided text detection and recognition annotations are used directly for testing. If you want to customize a text recognition model for this scenario and build an end-to-end key information extraction pipeline, please refer to the [text recognition model training tutorial](../doc/doc_ch/recognition.md): prepare the data in the training data format and fine-tune a vertical text recognition model for this scenario.

### 4.3 Semantic Entity Recognition

Semantic entity recognition (SER) determines the category of a given text line, such as `姓名` (name) or `住址` (address). PaddleOCR provides a VI-LayoutXLM based multimodal SER method that fuses text, position, and layout information. Compared with the LayoutXLM multimodal model, it removes the visual backbone used for feature extraction, introduces a text-line sorting method that follows reading order, and is trained with UDML joint mutual distillation, so it surpasses LayoutXLM in both accuracy and speed. For more details about the VI-LayoutXLM algorithm and its metrics, please refer to the [VI-LayoutXLM algorithm introduction](../doc/doc_ch/algorithm_kie_vi_layoutxlm.md).

#### 4.3.1 Data Preparation

Taking the invoice scenario as an example, we first annotate the key fields as `question-answer` key-value pairs. For instance, for the number "No 12270830", the `No` field is labeled as question and the `12270830` field is labeled as answer, as shown in the figure below.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185381131-76b6e260-04fe-46d9-baca-6bdd7fe0d0ce.jpg" width="800">
</div>
**Note:**

* If no detection boxes for **non-key content** were annotated when the text detection data was labeled, you do not need to annotate them for the key information extraction task either, as in the figure above. If such boxes were annotated, their label must be set to other.
* Annotation is done at the text-line level; there is no need to annotate the position of individual characters.

The processed VAT invoice dataset can be downloaded here: [VAT invoice dataset download link](https://aistudio.baidu.com/aistudio/datasetdetail/165561)

Download the invoice dataset and extract it into the train_data directory; the directory structure is as follows.
```
train_data
|--zzsfp
|---class_list.txt
|---imgs/
|---train.json
|---val.json
```
`class_list.txt` is the category list containing the 3 classes `other`, `question`, and `answer` (case-insensitive); the `imgs` directory holds the images, and `train.json` and `val.json` are the annotation files of the training and evaluation sets respectively. The training set contains 30 images and the validation set contains 8 images. Part of the annotation is shown below.
```py
b33.jpg	[{"transcription": "No", "label": "question", "points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]]}, {"transcription": "12269563", "label": "answer", "points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]]}]
```
Compared with OCR detection annotation, only the extra `label` field is added. A minimal parsing sketch is given below.
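For reference, a small sketch of how such a label line can be parsed, assuming the image name and the JSON annotation list are separated by a tab character, as is conventional for PaddleOCR label files:

```python
import json

# one line of the SER label file: "<image name>\t<JSON list of text-line dicts>"
line = ('b33.jpg\t[{"transcription": "No", "label": "question", '
        '"points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]]}, '
        '{"transcription": "12269563", "label": "answer", '
        '"points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]]}]')

img_name, annotation = line.split("\t", 1)
for item in json.loads(annotation):
    # each item carries the usual OCR fields plus the extra SER "label" field
    print(img_name, item["label"], item["transcription"], item["points"])
```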
#### 4.3.2 Training

The VI-LayoutXLM configuration file is [ser_vi_layoutxlm_xfund_zh_udml.yml](../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml); the data paths, the number of classes, and the class list file need to be modified in it.
```yml
Architecture:
  model_type: &model_type "vqa"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: &algorithm "LayoutXLM"
      Transform:
      Backbone:
        name: LayoutXLMForSer
        pretrained: True
        # one of base or vi
        mode: vi
        checkpoints:
        # number of classes
        num_classes: &num_classes 5
...

PostProcess:
  name: DistillationSerPostProcess
  model_name: ["Student", "Teacher"]
  key: backbone_out
  # class list file
  class_path: &class_path train_data/zzsfp/class_list.txt

Train:
  dataset:
    name: SimpleDataSet
    # training data directory and annotation file
    data_dir: train_data/zzsfp/imgs
    label_file_list:
      - train_data/zzsfp/train.json
...

Eval:
  dataset:
    # evaluation data directory and annotation file
    name: SimpleDataSet
    data_dir: train_data/zzsfp/imgs
    label_file_list:
      - train_data/zzsfp/val.json
...
```
The training results of LayoutXLM and VI-LayoutXLM on this scenario are shown below.

| Model | Epochs | Hmean |
| :---: | :---: | :---: |
| LayoutXLM | 50 | 100% |
| VI-LayoutXLM | 50 | 100% |

Because the amount of data is small and the scenario is simple, both models reach an Hmean of 100%.

#### 4.3.3 Evaluation

Knowledge distillation is used during training and only the student model's parameters are kept in the end. For evaluation, we therefore need to modify the student model's configuration file [ser_vi_layoutxlm_xfund_zh.yml](../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml) in the same way as the training configuration, including the **number of classes, class list file, and data directory**.

After the modification, run the following command to complete the evaluation.
```bash
# Note: adjust the evaluation command according to the path of your configuration file and of the saved model
python3 tools/eval.py -c ./fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy
```
The output is as follows.
```
[2022/08/18 08:49:58] ppocr INFO: metric eval ***************
[2022/08/18 08:49:58] ppocr INFO: precision:1.0
[2022/08/18 08:49:58] ppocr INFO: recall:1.0
[2022/08/18 08:49:58] ppocr INFO: hmean:1.0
[2022/08/18 08:49:58] ppocr INFO: fps:1.9740402401574881
```
#### 4.3.4 Prediction

Run the following command to predict.
```bash
python3 tools/infer_vqa_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/XFUND/zh_val/val.json Global.infer_mode=False
```
The prediction results are saved in the `Global.save_res_path` directory specified in the configuration file.

Part of the prediction results are shown below.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="800">
</div>
* Note: during prediction, the text detection and recognition results used are the annotated ones, read directly from the json file.

If you want to run inference on the results produced by the OCR engine instead, use the following command.
```bash
python3 tools/infer_vqa_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/imgs/b25.jpg Global.infer_mode=True
```
The result is shown below.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185384321-61153faa-e407-45c4-8e7c-a39540248189.jpg" width="800">
</div>
In this mode the PP-OCRv3 text detection and recognition models are used to obtain the text positions and contents.

Since no extra fields were annotated as the other category during training, most of the detected fields are predicted as question or answer.

If you want to use the OCR detection and recognition models trained on your own vertical scenario, pass the paths of their inference models as shown below to chain OCR text detection and recognition with SER.
```bash
python3 tools/infer_vqa_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/imgs/b25.jpg Global.infer_mode=True Global.kie_rec_model_dir="your_rec_model" Global.kie_det_model_dir="your_det_model"
```
### 4.4 Relation Extraction

With the SER model we can obtain all question and answer fields in the image together with their categories, but we still need the links between questions and answers. Therefore a relation extraction (RE) model is trained additionally to solve this. Here we also fine-tune the downstream RE task on the VI-LayoutXLM multimodal pretrained model.

#### 4.4.1 Data Preparation

Taking the invoice scenario as an example, compared with the SER task, the RE annotation additionally requires an id for each text line and the linking relations, as shown below.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d.jpg" width="800">
</div>
Part of the annotation file is shown below.
```py
b33.jpg [{"transcription": "No", "label": "question", "points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]], "id": 0, "linking": [[0, 1]]}, {"transcription": "12269563", "label": "answer", "points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]], "id": 1, "linking": [[0, 1]]}]
```
Compared with the SER annotation, the extra `id` and `linking` fields represent the unique identifier of each text line and the linking relations, respectively.
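A small sketch of how the `linking` pairs can be resolved into question-answer pairs (the field names follow the annotation example above; the tab separator is an assumption matching PaddleOCR label-file conventions):

```python
import json

line = ('b33.jpg\t[{"transcription": "No", "label": "question", '
        '"points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]], "id": 0, "linking": [[0, 1]]}, '
        '{"transcription": "12269563", "label": "answer", '
        '"points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]], "id": 1, "linking": [[0, 1]]}]')

_, annotation = line.split("\t", 1)
items = {item["id"]: item for item in json.loads(annotation)}

# every linking entry is a [question_id, answer_id] pair; both linked items
# repeat the pair, so collect them into a set first
pairs = {tuple(link) for item in items.values() for link in item["linking"]}
for q_id, a_id in sorted(pairs):
    print(items[q_id]["transcription"], "->", items[a_id]["transcription"])  # No -> 12269563
```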
The processed VAT invoice dataset can be downloaded here: [VAT invoice dataset download link](https://aistudio.baidu.com/aistudio/datasetdetail/165561)

#### 4.4.2 Training

The RE task configuration based on VI-LayoutXLM is [re_vi_layoutxlm_xfund_zh_udml.yml](../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml); the **data paths and class list file** need to be modified.
```yml
Train:
  dataset:
    name: SimpleDataSet
    # training data directory and annotation file
    data_dir: train_data/zzsfp/imgs
    label_file_list:
      - train_data/zzsfp/train.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: True
          algorithm: *algorithm
          class_path: &class_path train_data/zzsfp/class_list.txt
...

Eval:
  dataset:
    # evaluation data directory and annotation file
    name: SimpleDataSet
    data_dir: train_data/zzsfp/imgs
    label_file_list:
      - train_data/zzsfp/val.json
...
```
The training results of LayoutXLM and VI-LayoutXLM on this scenario are shown below.

| Model | Epochs | Hmean |
| :---: | :---: | :---: |
| LayoutXLM | 50 | 98.0% |
| VI-LayoutXLM | 50 | 99.3% |

The Hmean of VI-LayoutXLM is 1.3% higher than that of LayoutXLM.

#### 4.4.3 Evaluation

Knowledge distillation is used during training and only the student model's parameters are kept in the end. For evaluation, we therefore need to modify the student model's configuration file [re_vi_layoutxlm_xfund_zh.yml](../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml) in the same way as the training configuration, including the **class list file and data directory**.

After the modification, run the following command to complete the evaluation.
```bash
# Note: adjust the evaluation command according to the path of your configuration file and of the saved model
python3 tools/eval.py -c ./fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy
```
The output is as follows.

```
[2022/08/18 12:17:14] ppocr INFO: metric eval ***************
[2022/08/18 12:17:14] ppocr INFO: precision:1.0
[2022/08/18 12:17:14] ppocr INFO: recall:0.9873417721518988
[2022/08/18 12:17:14] ppocr INFO: hmean:0.9936305732484078
[2022/08/18 12:17:14] ppocr INFO: fps:2.765963539771157
```
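As a quick sanity check, the reported Hmean is simply the harmonic mean (F1 score) of the precision and recall above:

```python
precision, recall = 1.0, 0.9873417721518988
hmean = 2 * precision * recall / (precision + recall)
print(hmean)  # 0.9936305732484076, matching the logged value up to floating-point rounding
```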
#### 4.4.4 Prediction

Run the following command to predict.
```bash
# -c is followed by the configuration file of the RE task
# -o is followed by the options of the RE task
# -c_ser is followed by the configuration file of the SER task
# -o_ser is followed by the options of the SER task
python3 tools/infer_vqa_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=False -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy
```
The prediction results are saved in the `Global.save_res_path` directory specified in the configuration file.

Part of the prediction results are shown below.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="800">
</div>
* Note: during prediction, the text detection and recognition results used are the annotated ones, read directly from the json file.

If you want to run inference on the results produced by the OCR engine instead, use the following command.
```bash
python3 tools/infer_vqa_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=True -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy
```
If you want to use the OCR detection and recognition models trained on your own vertical scenario, pass their paths in the same way as shown below to chain SER and RE together.
```bash
python3 tools/infer_vqa_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=True -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.kie_rec_model_dir="your_rec_model" Global.kie_det_model_dir="your_det_model"
```
...@@ -14,6 +14,9 @@ Global:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt
  use_amp: False
  amp_level: O2
  amp_custom_black_list: ['exp']
Architecture:
  name: DistillationModel
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_21.jpg
  save_res_path: ./output/re_layoutlmv2_xfund_zh/res/
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutLMv2"
  Transform:
  Backbone:
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_21.jpg
  save_res_path: ./output/re_layoutxlm_xfund_zh/res/
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  save_res_path: ./output/re_layoutlm_xfund_zh/res
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutLM"
  Transform:
  Backbone:
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  save_res_path: ./output/ser_layoutlmv2_xfund_zh/res/
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutLMv2"
  Transform:
  Backbone:
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  save_res_path: ./output/ser_layoutxlm_xfund_zh/res
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
......
...@@ -11,11 +11,13 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_21.jpg
  save_res_path: ./output/re/xfund_zh/with_gt
  kie_rec_model_dir:
  kie_det_model_dir:
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
......
...@@ -11,11 +11,11 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_21.jpg
  save_res_path: ./output/re/xfund_zh/with_gt
Architecture:
  model_type: &model_type "kie"
  name: DistillationModel
  algorithm: Distillation
  Models:
......
...@@ -11,16 +11,18 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: train_data/XFUND/zh_val/val.json
  # infer_mode: False
  save_res_path: ./output/ser/xfund_zh/res
  kie_rec_model_dir:
  kie_det_model_dir:
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
......
...@@ -11,12 +11,12 @@ Global:
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  save_res_path: ./output/ser_layoutxlm_xfund_zh/res
Architecture:
  model_type: &model_type "kie"
  name: DistillationModel
  algorithm: Distillation
  Models:
......
Global:
  use_gpu: True
  epoch_num: 400
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/SLANet_ch
  save_epoch_step: 400
  # evaluation is run every 331 iterations after the 0th iteration
  eval_batch_step: [0, 331]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir: ./output/SLANet_ch/infer
  use_visualdl: False
  infer_img: doc/table/table.jpg
  # for data or label process
  character_dict_path: ppocr/utils/dict/table_structure_dict_ch.txt
  character_type: en
  max_text_length: &max_text_length 500
  box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
  infer_mode: False
  use_sync_bn: True
  save_res_path: output/infer

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 5.0
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00000

Architecture:
  model_type: table
  algorithm: SLANet
  Backbone:
    name: PPLCNet
    scale: 1.0
    pretrained: True
    use_ssld: True
  Neck:
    name: CSPPAN
    out_channels: 96
  Head:
    name: SLAHead
    hidden_size: 256
    max_text_length: *max_text_length
    loc_reg_num: &loc_reg_num 8

Loss:
  name: SLALoss
  structure_weight: 1.0
  loc_weight: 2.0
  loc_loss: smooth_l1

PostProcess:
  name: TableLabelDecode
  merge_no_span_structure: &merge_no_span_structure True

Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: False
  loc_reg_num: *loc_reg_num
  box_format: *box_format
  del_thead_tbody: True

Train:
  dataset:
    name: PubTabDataSet
    data_dir: train_data/table/train/
    label_file_list: [train_data/table/train.txt]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: *merge_no_span_structure
          replace_empty_cell_token: False
          loc_reg_num: *loc_reg_num
          max_text_length: *max_text_length
      - TableBoxEncode:
          in_box_format: *box_format
          out_box_format: *box_format
      - ResizeTableImage:
          max_len: 488
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
  loader:
    shuffle: True
    batch_size_per_card: 48
    drop_last: True
    num_workers: 1

Eval:
  dataset:
    name: PubTabDataSet
    data_dir: train_data/table/val/
    label_file_list: [train_data/table/val.txt]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: *merge_no_span_structure
          replace_empty_cell_token: False
          loc_reg_num: *loc_reg_num
          max_text_length: *max_text_length
      - TableBoxEncode:
          in_box_format: *box_format
          out_box_format: *box_format
      - ResizeTableImage:
          max_len: 488
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 48
    num_workers: 1
...@@ -17,7 +17,7 @@ Global:
  # for data or label process
  character_dict_path: ppocr/utils/dict/table_structure_dict.txt
  character_type: en
  max_text_length: &max_text_length 500
  box_format: &box_format 'xyxy' # 'xywh', 'xyxy', 'xyxyxyxy'
  infer_mode: False
...@@ -38,7 +38,8 @@ Architecture:
  Backbone:
    name: MobileNetV3
    scale: 1.0
    model_name: small
    disable_se: true
  Head:
    name: TableAttentionHead
    hidden_size: 256
...@@ -89,7 +90,7 @@ Train:
          keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
  loader:
    shuffle: True
    batch_size_per_card: 48
    drop_last: True
    num_workers: 1
...@@ -124,5 +125,5 @@ Eval:
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 48
    num_workers: 1
...@@ -47,7 +47,7 @@ str_to_cpu_mode(const std::string &cpu_mode) {
  std::string upper_key;
  std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(),
                 ::toupper);
  auto index = cpu_mode_map.find(upper_key.c_str());
  if (index == cpu_mode_map.end()) {
    LOGE("cpu_mode not found %s", upper_key.c_str());
    return paddle::lite_api::LITE_POWER_HIGH;
......
...@@ -54,7 +54,7 @@ public class OCRPredictorNative {
  }

  public void destory() {
    if (nativePointer != 0) {
      release(nativePointer);
      nativePointer = 0;
    }
......
...@@ -109,8 +109,10 @@ CUDA_LIB, CUDNN_LIB, TENSORRT_DIR, WITH_GPU, WITH_TENSORRT
Before running, copy the following files into the `build/Release/` folder (a small copy script is sketched below):

1. `paddle_inference/paddle/lib/paddle_inference.dll`
2. `paddle_inference/third_party/install/onnxruntime/lib/onnxruntime.dll`
3. `paddle_inference/third_party/install/paddle2onnx/lib/paddle2onnx.dll`
4. `opencv/build/x64/vc15/bin/opencv_world455.dll`
5. If you use the openblas build of the inference library, also copy `paddle_inference/third_party/install/openblas/lib/openblas.dll`
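For convenience, a minimal Python sketch that copies the files listed above into `build/Release/`; the relative source paths are taken from the list and may need to be adjusted to your local directory layout:

```python
import shutil
from pathlib import Path

dlls = [
    "paddle_inference/paddle/lib/paddle_inference.dll",
    "paddle_inference/third_party/install/onnxruntime/lib/onnxruntime.dll",
    "paddle_inference/third_party/install/paddle2onnx/lib/paddle2onnx.dll",
    "opencv/build/x64/vc15/bin/opencv_world455.dll",
    # only needed for the openblas build of the inference library:
    # "paddle_inference/third_party/install/openblas/lib/openblas.dll",
]

target = Path("build/Release")
target.mkdir(parents=True, exist_ok=True)
for dll in dlls:
    shutil.copy(dll, target)  # raises FileNotFoundError if a source path does not exist
```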
### Step 4: Prediction
......
...@@ -73,4 +73,4 @@ python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_
The parameters of the quantized model exported by the above steps are still stored as FP32, but their numerical range is int8.
The exported model can be converted with the `opt tool` of PaddleLite.

For quantized model deployment, please refer to [Mobile terminal model deployment](../../lite/readme.md)
# Frontier Algorithms and Models

PaddleOCR will **continuously add** support for frontier OCR algorithms and models. The supported models and tutorials are listed below:

- [Text detection algorithms](./algorithm_overview.md#11-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E7%AE%97%E6%B3%95)
- [Text recognition algorithms](./algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [End-to-end algorithms](./algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [Table recognition](./algorithm_overview.md#3-%E8%A1%A8%E6%A0%BC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)

**Developers are welcome to contribute more algorithms; accepted contributions are rewarded 🎁! See the [community regular competition](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for details.**

To add a new algorithm, refer to the tutorial:

- [Add new algorithms to PaddleOCR](./add_new_algorithm.md)
...@@ -66,10 +66,10 @@ The LayoutXLM model can be used for SER-task inference with the following command:
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
  --kie_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_layoutxlm_infer \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
  --vis_font_path=../doc/fonts/simfang.ttf
```
...@@ -77,7 +77,7 @@ python3 vqa/predict_vqa_token_ser.py \
The SER visualization results are saved in the `./output` folder by default. An example result:

<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
......
...@@ -59,10 +59,10 @@ The VI-LayoutXLM model can be used for SER-task inference with the following command:
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
  --kie_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_vi_layoutxlm_infer \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx"
...@@ -71,7 +71,7 @@ python3 vqa/predict_vqa_token_ser.py \
The SER visualization results are saved in the `./output` folder by default. An example result:

<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
......
# Frontier Algorithms and Models

- [1. Two-stage OCR algorithms](#1)
  - [1.1 Text detection algorithms](#11)
...@@ -7,8 +7,13 @@
- [3. Table recognition algorithms](#3)
- [4. Key information extraction algorithms](#4)

This page lists the OCR algorithms supported by PaddleOCR, together with their models and metrics on **public English datasets**. It is mainly intended for algorithm introduction and performance comparison; for models on other datasets, including Chinese, refer to [PP-OCRv3 series model downloads](./models_list.md).
>>
PaddleOCR will **continuously add** support for frontier OCR algorithms and models. **Developers are welcome to contribute more algorithms; accepted contributions are rewarded 🎁! See the [community regular competition](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for details.**
>>
To add a new algorithm, refer to the tutorial: [Add new algorithms to PaddleOCR](./add_new_algorithm.md)

<a name="1"></a>
......
...@@ -78,7 +78,7 @@ python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretr
For SRN text recognition model inference, run the following command:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_algorithm="SRN" --rec_char_dict_path=./ppocr/utils/ic15_dict.txt --use_space_char=False
```

<a name="4-2"></a>
......
# Key Information Extraction Datasets

This page collects commonly used DocVQA datasets, which are updated continuously. Dataset contributions are welcome!

- [FUNSD dataset](#funsd)
- [XFUND dataset](#xfund)
- [wildreceipt dataset](#wildreceipt)
......
...@@ -64,7 +64,7 @@ zh_train_1.jpg [{"transcription": "中国人体器官捐献", "label": "other"
The validation set is built in the same way as the training set.

**(3) Dictionary file**

The text lines in the training and validation sets carry label information. The list of all labels is stored in a dictionary file (such as `class_list.txt`), where each line is one category name.
...@@ -103,7 +103,7 @@ HEADER
## 1.3. Data download

If you do not have a dataset locally, you can download the data from the [XFUND](https://github.com/doc-analysis/XFUND) or [FUNSD](https://guillaumejaume.github.io/FUNSD/) official websites, then convert it with the XFUND and FUNSD processing scripts ([XFUND](../../ppstructure/kie/tools/trans_xfun_data.py), [FUNSD](../../ppstructure/kie/tools/trans_funsd_label.py)) into the data format used for PaddleOCR training, and quickly try the key information extraction pipeline on public datasets.

For more about the public datasets, refer to the [key information extraction dataset documentation](./dataset/kie_datasets.md).
...@@ -209,7 +209,7 @@ Architecture:
    num_classes: &num_classes 7

PostProcess:
  name: kieSerTokenLayoutLMPostProcess
  # change the dictionary file path to the dictionary of your custom dataset
  class_path: &class_path train_data/XFUND/class_list_xfun.txt
...@@ -347,25 +347,25 @@ output/ser_vi_layoutxlm_xfund_zh/
```bash
python3 tools/infer_kie_token_ser.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh/best_accuracy Global.infer_img=./ppstructure/docs/kie/input/zh_val_42.jpg
```

The predicted image is shown below; it is stored in the path given by `Global.save_res_path`.

<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>

During prediction, the PP-OCRv3 detection and recognition models are loaded by default for OCR information extraction. If you want to load pre-computed OCR results instead, set `Global.infer_img` to the annotation file, which contains the image paths and the OCR information, and set `Global.infer_mode` to False so the OCR inference engine is not used.

```bash
python3 tools/infer_kie_token_ser.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh/best_accuracy Global.infer_img=./train_data/XFUND/zh_val/val.json Global.infer_mode=False
```

For the image above, if the annotated OCR results are used for information extraction, the prediction results are as follows.

<div align="center">
<img src="../../ppstructure/docs/kie/result_ser_with_gt_ocr/zh_val_42_ser.jpg" width="800">
</div>

Some detection boxes become more accurate, but the overall information extraction results are basically the same.
...@@ -375,20 +375,26 @@ python3 tools/infer_vqa_token_ser.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxl
```bash
python3 ./tools/infer_kie_token_ser_re.py \
  -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml \
  -o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_udml_xfund_zh/best_accuracy/ \
  Global.infer_img=./train_data/XFUND/zh_val/image/ \
  -c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
  -o_ser Architecture.Backbone.checkpoints=pretrain_models/ser_vi_layoutxlm_udml_xfund_zh/best_accuracy/
```

The prediction results are shown below.

<div align="center">
<img src="../../ppstructure/docs/kie/result_re/zh_val_42_re.jpg" width="800">
</div>

If you want to use annotated or pre-computed OCR information for key information extraction, set `Global.infer_mode` to False and `Global.infer_img` to the annotation file, as above.

```bash
python3 ./tools/infer_kie_token_ser_re.py -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_udml_xfund_zh/re_layoutxlm_xfund_zh_v4_udml/best_accuracy/ Global.infer_img=./train_data/XFUND/zh_val/val.json Global.infer_mode=False -c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o_ser Architecture.Backbone.checkpoints=pretrain_models/ser_vi_layoutxlm_udml_xfund_zh/best_accuracy/
```

`-c_ser` specifies the SER configuration file, and `-o_ser` is followed by the SER options to override, such as the pretrained weights.
...@@ -417,8 +423,8 @@ inference model (saved with `paddle.jit.save`)
```bash
# set the yml configuration file of the training algorithm after -c
# set optional parameters after -o
# Architecture.Backbone.checkpoints sets the path of the trained model to be converted
# Global.save_inference_dir sets the path where the converted model is saved
python3 tools/export_model.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh/best_accuracy Global.save_inference_dir=./inference/ser_vi_layoutxlm
```
...@@ -440,10 +446,10 @@ The VI-LayoutXLM model can be used for SER-task inference with the following command:
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
  --kie_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_vi_layoutxlm \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx"
...@@ -452,7 +458,7 @@ python3 vqa/predict_vqa_token_ser.py \
The visualized SER results are saved in the `./output` folder by default. An example result:

<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
......
# Academic Algorithms and Models
PaddleOCR will add cutting-edge OCR algorithms and models continuously. Check out the supported models and tutorials by clicking the following list:
- [text detection algorithms](./algorithm_overview_en.md#11)
- [text recognition algorithms](./algorithm_overview_en.md#12)
- [end-to-end algorithms](./algorithm_overview_en.md#2)
- [table recognition algorithms](./algorithm_overview_en.md#3)
Developers are welcome to contribute more algorithms! Please refer to [add new algorithm](./add_new_algorithm_en.md) guideline.
# KIE Algorithm - LayoutXLM
- [1. Introduction](#1-introduction)
- [2. Environment](#2-environment)
- [3. Model Training / Evaluation / Prediction](#3-model-training--evaluation--prediction)
- [4. Inference and Deployment](#4-inference-and-deployment)
- [4.1 Python Inference](#41-python-inference)
- [4.2 C++ Inference](#42-c-inference)
- [4.3 Serving](#43-serving)
- [4.4 More](#44-more)
- [5. FAQ](#5-faq)
- [Citation](#Citation)
## 1. Introduction
Paper:
> [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836)
>
> Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
>
> 2021
On the XFUND_zh dataset, the reproduced Hmean of the algorithm is as follows.

|Model|Backbone|Task|Config|Hmean|Download link|
| --- | --- |--|--- | --- | --- |
|LayoutXLM|LayoutXLM-base|SER |[ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)/[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar)|
|LayoutXLM|LayoutXLM-base|RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)/[inference model(coming soon)]()|
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
Please refer to the [KIE tutorial](./kie_en.md). PaddleOCR has modularized the code structure, so you only need to **replace the configuration file** to train different models.
## 4. Inference and Deployment
### 4.1 Python Inference
**Note:** RE model inference is still being adapted, so we take the SER model as an example to introduce the KIE pipeline based on the LayoutXLM model.
First, export the trained model into an inference model. Take the LayoutXLM model trained on XFUND_zh as an example ([trained model download link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)), and use the following command to export it.
```bash
wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar
tar -xf ser_LayoutXLM_xfun_zh.tar
python3 tools/export_model.py -c configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./ser_LayoutXLM_xfun_zh/best_accuracy Global.save_inference_dir=./inference/ser_layoutxlm_infer
```
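If the export succeeds, `Global.save_inference_dir` should contain the static-graph files produced by `tools/export_model.py`. A quick sanity check is sketched below; the expected file names follow the usual Paddle export layout and are an assumption to verify against your output directory.
```python
import os

infer_dir = "./inference/ser_layoutxlm_infer"
# The exported inference model is expected to consist of these files.
for name in ["inference.pdmodel", "inference.pdiparams"]:
    path = os.path.join(infer_dir, name)
    print(path, "exists" if os.path.exists(path) else "MISSING")
```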
Use the following command to run SER inference with the exported LayoutXLM model.
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
--kie_algorithm=LayoutXLM \
--ser_model_dir=../inference/ser_layoutxlm_infer \
--image_dir=./docs/kie/input/zh_val_42.jpg \
--ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
--vis_font_path=../doc/fonts/simfang.ttf
```
The SER visualization results are saved in the `./output` directory by default. The results are as follows.
<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
### 4.2 C++ Inference
Not supported
### 4.3 Serving
Not supported
### 4.4 More
Not supported
## 5. FAQ
## Citation
```bibtex
@article{DBLP:journals/corr/abs-2104-08836,
author = {Yiheng Xu and
Tengchao Lv and
Lei Cui and
Guoxin Wang and
Yijuan Lu and
Dinei Flor{\^{e}}ncio and
Cha Zhang and
Furu Wei},
title = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
Document Understanding},
journal = {CoRR},
volume = {abs/2104.08836},
year = {2021},
url = {https://arxiv.org/abs/2104.08836},
eprinttype = {arXiv},
eprint = {2104.08836},
timestamp = {Thu, 14 Oct 2021 09:17:23 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-08836.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-1912-13318,
author = {Yiheng Xu and
Minghao Li and
Lei Cui and
Shaohan Huang and
Furu Wei and
Ming Zhou},
title = {LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
journal = {CoRR},
volume = {abs/1912.13318},
year = {2019},
url = {http://arxiv.org/abs/1912.13318},
eprinttype = {arXiv},
eprint = {1912.13318},
timestamp = {Mon, 01 Jun 2020 16:20:46 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1912-13318.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2012-14740,
author = {Yang Xu and
Yiheng Xu and
Tengchao Lv and
Lei Cui and
Furu Wei and
Guoxin Wang and
Yijuan Lu and
Dinei A. F. Flor{\^{e}}ncio and
Cha Zhang and
Wanxiang Che and
Min Zhang and
Lidong Zhou},
title = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
journal = {CoRR},
volume = {abs/2012.14740},
year = {2020},
url = {https://arxiv.org/abs/2012.14740},
eprinttype = {arXiv},
eprint = {2012.14740},
timestamp = {Tue, 27 Jul 2021 09:53:52 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2012-14740.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
# KIE Algorithm - SDMGR
- [1. Introduction](#1-introduction)
- [2. Environment](#2-environment)
- [3. Model Training / Evaluation / Prediction](#3-model-training--evaluation--prediction)
- [4. Inference and Deployment](#4-inference-and-deployment)
- [4.1 Python Inference](#41-python-inference)
- [4.2 C++ Inference](#42-c-inference)
- [4.3 Serving](#43-serving)
- [4.4 More](#44-more)
- [5. FAQ](#5-faq)
- [Citation](#Citation)
## 1. Introduction
Paper:
> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
>
> Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang
>
> 2021
On the wildreceipt dataset, the reproduced Hmean of the algorithm is as follows.
|Model|Backbone|Config|Hmean|Download link|
| --- | --- | --- | --- | --- |
|SDMGR|VGG16|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/[inference model(coming soon)]()|
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
SDMGR is a key information extraction algorithm that classifies each detected text line into predefined categories, such as order ID, invoice number, and amount.
The training and test data are collected in the wildreceipt dataset. Use the following command to download the dataset.
```bash
wget https://paddleocr.bj.bcebos.com/ppstructure/dataset/wildreceipt.tar && tar xf wildreceipt.tar
```
Create a soft link to the dataset in the `PaddleOCR/train_data` directory.
```bash
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
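After linking, a quick way to sanity-check the dataset layout is to count the annotation lines per split. This is only a minimal sketch; the `*.txt` pattern is an assumption about how the converted archive is organized, so adjust it to the files you actually see.
```python
import glob
import os

data_dir = "train_data/wildreceipt"
# Count annotation lines (one sample per line) in each split file.
for ann_file in sorted(glob.glob(os.path.join(data_dir, "*.txt"))):
    with open(ann_file, "r", encoding="utf-8") as f:
        print(ann_file, sum(1 for _ in f), "samples")
```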
### 3.1 Model training
The config file is `configs/kie/sdmgr/kie_unet_sdmgr.yml`, the default dataset path is `train_data/wildreceipt`.
Use the following command to train the model.
```bash
python3 tools/train.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
### 3.2 Model evaluation
Use the following command to evaluate the model.
```bash
python3 tools/eval.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
An example of output information is shown below.
```py
[2022/08/10 05:22:23] ppocr INFO: metric eval ***************
[2022/08/10 05:22:23] ppocr INFO: hmean:0.8670120239257812
[2022/08/10 05:22:23] ppocr INFO: fps:10.18816520530961
```
### 3.3 Model prediction
Use the following command to load the model and run prediction. During prediction, a text file containing the image paths and their OCR information must be prepared in advance and passed via `Global.infer_img`.
```bash
python3 tools/infer_kie.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=./train_data/wildreceipt/1.txt
```
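Each line of that text file pairs an image path with the OCR results for that image. The field layout used below (`transcription`, `points`, `label`) is an assumption based on the common PaddleOCR KIE annotation format, so compare it with the files shipped in the wildreceipt archive before relying on it.
```python
import json

# Assumed line format: "<image path>\t<JSON list of text instances>".
ocr_results = [
    {"transcription": "TOTAL", "points": [[10, 20], [80, 20], [80, 40], [10, 40]], "label": "Total_key"},
    {"transcription": "$12.50", "points": [[90, 20], [160, 20], [160, 40], [90, 40]], "label": "Total_value"},
]
line = "wildreceipt/image_files/demo.jpeg\t" + json.dumps(ocr_results, ensure_ascii=False)

with open("./train_data/wildreceipt/1.txt", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```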
The visualization results and texts are saved in the `./output/sdmgr_kie/` directory by default. The results are as follows.
<div align="center">
<img src="../../ppstructure/docs/imgs/sdmgr_result.png" width="800">
</div>
## 4. Inference and Deployment
### 4.1 Python Inference
Not supported
### 4.2 C++ Inference
Not supported
### 4.3 Serving
Not supported
### 4.4 More
Not supported
## 5. FAQ
## Citation
```bibtex
@misc{sun2021spatial,
title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
year={2021},
eprint={2103.14470},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
# KIE Algorithm - VI-LayoutXLM
- [1. Introduction](#1-introduction)
- [2. Environment](#2-environment)
- [3. Model Training / Evaluation / Prediction](#3-model-training--evaluation--prediction)
- [4. Inference and Deployment](#4-inference-and-deployment)
- [4.1 Python Inference](#41-python-inference)
- [4.2 C++ Inference](#42-c-inference)
- [4.3 Serving](#43-serving)
- [4.4 More](#44-more)
- [5. FAQ](#5-faq)
- [Citation](#Citation)
## 1. Introduction
VI-LayoutXLM is an improved version of LayoutXLM. The visual backbone module is removed during downstream fine-tuning, which further improves inference speed with almost no loss of accuracy.
On the XFUND_zh dataset, the reproduced Hmean of the algorithm is as follows.
|Model|Backbone|Task|Config|Hmean|Download link|
| --- | --- |---| --- | --- | --- |
|VI-LayoutXLM |VI-LayoutXLM-base | SER |[ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|93.19%|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)/[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar)|
|VI-LayoutXLM |VI-LayoutXLM-base |RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|83.92%|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)/[inference model(coming soon)]()|
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
Please refer to the [KIE tutorial](./kie_en.md). PaddleOCR has modularized the code structure, so you only need to **replace the configuration file** to train different models.
## 4. Inference and Deployment
### 4.1 Python Inference
**Note:** RE model inference is still being adapted, so we take the SER model as an example to introduce the KIE pipeline based on the VI-LayoutXLM model.
First, export the trained model into an inference model. Take the VI-LayoutXLM model trained on XFUND_zh as an example ([trained model download link](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)), and use the following command to export it.
```bash
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar
tar -xf ser_vi_layoutxlm_xfund_pretrained.tar
python3 tools/export_model.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./ser_vi_layoutxlm_xfund_pretrained/best_accuracy Global.save_inference_dir=./inference/ser_vi_layoutxlm_infer
```
Use the following command to run SER inference with the exported VI-LayoutXLM model.
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
--kie_algorithm=LayoutXLM \
--ser_model_dir=../inference/ser_vi_layoutxlm_infer \
--image_dir=./docs/kie/input/zh_val_42.jpg \
--ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--ocr_order_method="tb-yx"
```
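The `--ocr_order_method="tb-yx"` flag re-orders the detected text boxes top-to-bottom first and then left-to-right before they are fed to the SER model, which matters for reading-order-sensitive layouts. The sketch below only illustrates that ordering rule and is not the exact code used inside PaddleOCR; the `y_tol` line-grouping tolerance is an assumption.
```python
def sort_boxes_tb_yx(boxes, y_tol=10):
    """Sort 4-point boxes top-to-bottom, then left-to-right.

    y_tol: boxes whose top edges differ by less than y_tol pixels are
    treated as lying on the same text line (assumed tolerance).
    """
    # Primary key: top-left y quantized by y_tol; secondary key: top-left x.
    return sorted(boxes, key=lambda b: (round(b[0][1] / y_tol), b[0][0]))


boxes = [
    [[120, 33], [200, 33], [200, 60], [120, 60]],  # right box on the first line
    [[15, 30], [100, 30], [100, 58], [15, 58]],    # left box on the first line
    [[18, 90], [95, 90], [95, 120], [18, 120]],    # box on the second line
]
for box in sort_boxes_tb_yx(boxes):
    print(box[0])
```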
The SER visualization results are saved in the `./output` folder by default. The results are as follows.
<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
### 4.2 C++ Inference
Not supported
### 4.3 Serving
Not supported
### 4.4 More
Not supported
## 5. FAQ
## Citation
```bibtex
@article{DBLP:journals/corr/abs-2104-08836,
author = {Yiheng Xu and
Tengchao Lv and
Lei Cui and
Guoxin Wang and
Yijuan Lu and
Dinei Flor{\^{e}}ncio and
Cha Zhang and
Furu Wei},
title = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
Document Understanding},
journal = {CoRR},
volume = {abs/2104.08836},
year = {2021},
url = {https://arxiv.org/abs/2104.08836},
eprinttype = {arXiv},
eprint = {2104.08836},
timestamp = {Thu, 14 Oct 2021 09:17:23 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-08836.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-1912-13318,
author = {Yiheng Xu and
Minghao Li and
Lei Cui and
Shaohan Huang and
Furu Wei and
Ming Zhou},
title = {LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
journal = {CoRR},
volume = {abs/1912.13318},
year = {2019},
url = {http://arxiv.org/abs/1912.13318},
eprinttype = {arXiv},
eprint = {1912.13318},
timestamp = {Mon, 01 Jun 2020 16:20:46 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1912-13318.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2012-14740,
author = {Yang Xu and
Yiheng Xu and
Tengchao Lv and
Lei Cui and
Furu Wei and
Guoxin Wang and
Yijuan Lu and
Dinei A. F. Flor{\^{e}}ncio and
Cha Zhang and
Wanxiang Che and
Min Zhang and
Lidong Zhou},
title = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
journal = {CoRR},
volume = {abs/2012.14740},
year = {2020},
url = {https://arxiv.org/abs/2012.14740},
eprinttype = {arXiv},
eprint = {2012.14740},
timestamp = {Tue, 27 Jul 2021 09:53:52 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2012-14740.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
# Algorithms
- [1. Two-stage OCR Algorithms](#1)
  - [1.1 Text Detection Algorithms](#11)
  - [1.2 Text Recognition Algorithms](#12)
- [2. End-to-end OCR Algorithms](#2)
- [3. Table Recognition Algorithms](#3)
- [4. Key Information Extraction Algorithms](#4)
This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to the [PP-OCRv3 models list](./models_list_en.md).
Developers are welcome to contribute more algorithms! Please refer to the [add new algorithm](./add_new_algorithm_en.md) guideline.
<a name="1"></a>
## 1. Two-stage OCR Algorithms
<a name="11"></a>
...@@ -98,11 +102,12 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
<a name="2"></a>
## 2. End-to-end OCR Algorithms
Supported end-to-end algorithms (Click the link to get the tutorial):
- [x] [PGNet](./algorithm_e2e_pgnet_en.md)
<a name="3"></a>
## 3. Table Recognition Algorithms
...@@ -114,3 +119,34 @@ On the PubTabNet dataset, the algorithm result is as follows:
|Model|Backbone|Config|Acc|Download link|
|---|---|---|---|---|
|TableMaster|TableResNetExtra|[configs/table/table_master.yml](../../configs/table/table_master.yml)|77.47%|[trained](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar) / [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_infer.tar)|
<a name="4"></a>
## 4. Key Information Extraction Algorithms
Supported KIE algorithms (Click the link to get the tutorial):
- [x] [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm_en.md)
- [x] [LayoutLM](./algorithm_kie_laoutxlm_en.md)
- [x] [LayoutLMv2](./algorithm_kie_laoutxlm_en.md)
- [x] [LayoutXLM](./algorithm_kie_laoutxlm_en.md)
- [x] [SDMGR](./algorithm_kie_sdmgr_en.md)
On the wildreceipt dataset, the results are as follows:
|Model|Backbone|Config|Hmean|Download link|
| --- | --- | --- | --- | --- |
|SDMGR|VGG16|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
On the XFUND_zh dataset, the results are as follows:
|Model|Backbone|Task|Config|Hmean|Download link|
| --- | --- | --- | --- | --- | --- |
|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
|LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|
## Key Information Extraction dataset
Here are the common key information extraction datasets, which are being updated continuously. Welcome to contribute datasets.
- [FUNSD dataset](#funsd)
- [XFUND dataset](#xfund)
- [wildreceipt dataset](#wildreceipt)
<a name="funsd"></a>
#### 1. FUNSD dataset
...@@ -25,3 +27,21 @@
</div>
- **Download address**: https://github.com/doc-analysis/XFUND/releases/tag/v1.0
<a name="wildreceipt"></a>
## 3. wildreceipt dataset
- **Data source**: https://arxiv.org/abs/2103.14470
- **Data introduction**: wildreceipt is an English receipt dataset containing 26 different categories. It consists of 1,267 training images and 472 evaluation images, with about 50,000 annotated text lines and boxes. Part of the images and annotation boxes are visualized below.
<div align="center">
<img src="../../datasets/wildreceipt_demo/2769.jpeg" width="500">
<img src="../../datasets/wildreceipt_demo/1bbe854b8817dedb8585e0732089fd1f752d2cec.jpeg" width="500">
</div>
**Note:** Boxes with category `Ignore` or `Others` are not visualized here.
- **Download address**
- Official dataset: [link](https://download.openmmlab.com/mmocr/data/wildreceipt.tar)
- Dataset converted for PaddleOCR training process: [link](https://paddleocr.bj.bcebos.com/ppstructure/dataset/wildreceipt.tar)
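For the PaddleOCR-converted copy, the download and extraction can also be scripted; a minimal sketch is shown below, where the `./train_data` target directory is an assumption and should match whatever your configs expect.
```python
import tarfile
import urllib.request

url = "https://paddleocr.bj.bcebos.com/ppstructure/dataset/wildreceipt.tar"
archive = "wildreceipt.tar"

# Fetch the converted archive and unpack it under train_data/.
urllib.request.urlretrieve(url, archive)
with tarfile.open(archive) as tar:
    tar.extractall("./train_data")
```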
This diff is collapsed.
...@@ -35,7 +35,7 @@ from tools.infer import predict_system
from ppocr.utils.logging import get_logger
logger = get_logger()
from ppocr.utils.utility import check_and_read, get_image_file_list
from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url
from tools.infer.utility import draw_ocr, str2bool, check_gpu
from ppstructure.utility import init_args, draw_structure_result
...@@ -289,7 +289,8 @@ MODEL_URLS = {
        'ch': {
            'url':
            'https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar',
            'dict_path':
            'ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt'
        }
    }
}
...@@ -490,7 +491,7 @@ class PaddleOCR(predict_system.TextSystem):
                download_with_progressbar(img, 'tmp.jpg')
                img = 'tmp.jpg'
            image_file = img
            img, flag, _ = check_and_read(image_file)
            if not flag:
                with open(image_file, 'rb') as f:
                    np_arr = np.frombuffer(f.read(), dtype=np.uint8)
...@@ -584,7 +585,7 @@ class PPStructure(StructureSystem):
                download_with_progressbar(img, 'tmp.jpg')
                img = 'tmp.jpg'
            image_file = img
            img, flag, _ = check_and_read(image_file)
            if not flag:
                with open(image_file, 'rb') as f:
                    np_arr = np.frombuffer(f.read(), dtype=np.uint8)
...@@ -635,4 +636,6 @@ def main():
        for item in result:
            item.pop('img')
            item.pop('res')
            logger.info(item)
    logger.info('result save to {}'.format(args.output))
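The two lines added to `main()` show how the whl-package results are meant to be consumed: each returned item is a dict whose bulky `img` and `res` fields are dropped before logging. A minimal usage sketch along the same lines is given below, assuming the `paddleocr` whl with PP-Structure support is installed; the image path is a placeholder.
```python
import cv2
from paddleocr import PPStructure

engine = PPStructure(show_log=True)               # layout analysis + table recognition
img = cv2.imread("ppstructure/docs/table/1.png")  # placeholder image path
result = engine(img)

for item in result:
    item.pop("img")   # cropped region image
    item.pop("res")   # detailed recognition result
    print(item)       # remaining fields such as 'type' and 'bbox'
```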
...@@ -35,10 +35,12 @@ class CopyPaste(object):
        point_num = data['polys'].shape[1]
        src_img = data['image']
        src_polys = data['polys'].tolist()
        src_texts = data['texts']
        src_ignores = data['ignore_tags'].tolist()
        ext_data = data['ext_data'][0]
        ext_image = ext_data['image']
        ext_polys = ext_data['polys']
        ext_texts = ext_data['texts']
        ext_ignores = ext_data['ignore_tags']
        indexs = [i for i in range(len(ext_ignores)) if not ext_ignores[i]]
...@@ -53,7 +55,7 @@ class CopyPaste(object):
        src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
        ext_image = cv2.cvtColor(ext_image, cv2.COLOR_BGR2RGB)
        src_img = Image.fromarray(src_img).convert('RGBA')
        for idx, poly, tag in zip(select_idxs, select_polys, select_ignores):
            box_img = get_rotate_crop_image(ext_image, poly)
            src_img, box = self.paste_img(src_img, box_img, src_polys)
...@@ -62,6 +64,7 @@ class CopyPaste(object):
                for _ in range(len(box), point_num):
                    box.append(box[-1])
                src_polys.append(box)
                src_texts.append(ext_texts[idx])
                src_ignores.append(tag)
        src_img = cv2.cvtColor(np.array(src_img), cv2.COLOR_RGB2BGR)
        h, w = src_img.shape[:2]
...@@ -70,6 +73,7 @@ class CopyPaste(object):
        src_polys[:, :, 1] = np.clip(src_polys[:, :, 1], 0, h)
        data['image'] = src_img
        data['polys'] = src_polys
        data['texts'] = src_texts
        data['ignore_tags'] = np.array(src_ignores)
        return data
......
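With this change, `CopyPaste` reads and writes a `texts` entry alongside `polys` and `ignore_tags`, so samples flowing through the augmenter are expected to look roughly like the schematic below (an illustration, not taken from a real label file; shapes and strings are placeholders).
```python
import numpy as np

# Schematic structure of one detection sample entering CopyPaste.
sample = {
    "image": np.zeros((640, 640, 3), dtype=np.uint8),    # source image (BGR)
    "polys": np.zeros((2, 4, 2), dtype=np.float32),      # N x 4-point text polygons
    "texts": ["hello", "world"],                         # transcription per polygon
    "ignore_tags": np.array([False, False]),             # polygons to skip
    "ext_data": [                                        # extra sample to paste from
        {
            "image": np.zeros((640, 640, 3), dtype=np.uint8),
            "polys": np.zeros((1, 4, 2), dtype=np.float32),
            "texts": ["pasted"],
            "ignore_tags": np.array([False]),
        }
    ],
}
print(sorted(sample.keys()))
```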
...@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from rapidfuzz.distance import Levenshtein
import string
...@@ -46,8 +46,7 @@ class RecMetric(object):
            if self.is_filter:
                pred = self._normalize_text(pred)
                target = self._normalize_text(target)
            norm_edit_dis += Levenshtein.normalized_distance(pred, target)
            if pred == target:
                correct_num += 1
            all_num += 1
......
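The switch from the `Levenshtein` package to `rapidfuzz` keeps the metric unchanged: `normalized_distance` divides the edit distance by the longer string length, which is exactly what the removed expression computed by hand. A small check, assuming `rapidfuzz>=2.0` is installed:
```python
from rapidfuzz.distance import Levenshtein

pred, target = "PaddleOCR", "PaddelOCR"

old_style = Levenshtein.distance(pred, target) / max(len(pred), len(target), 1)
new_style = Levenshtein.normalized_distance(pred, target)

# Both expressions yield the same normalized edit distance.
print(old_style, new_style)
assert abs(old_style - new_style) < 1e-9
```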
...@@ -52,17 +52,15 @@ def build_backbone(config, model_type):
        support_dict = ['ResNet']
    elif model_type == 'kie':
        from .kie_unet_sdmgr import Kie_backbone
        from .vqa_layoutlm import LayoutLMForSer, LayoutLMv2ForSer, LayoutLMv2ForRe, LayoutXLMForSer, LayoutXLMForRe
        support_dict = [
            'Kie_backbone', 'LayoutLMForSer', 'LayoutLMv2ForSer',
            'LayoutLMv2ForRe', 'LayoutXLMForSer', 'LayoutXLMForRe'
        ]
    elif model_type == 'table':
        from .table_resnet_vd import ResNet
        from .table_mobilenet_v3 import MobileNetV3
        support_dict = ['ResNet', 'MobileNetV3']
    else:
        raise NotImplementedError
......
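After this change the former `vqa` backbones are registered under `model_type == 'kie'`, so a SER architecture is now selected with a config along the lines of the sketch below. This is a schematic fragment only: the backbone name comes from `support_dict` above, while the other fields (`Transform`, `pretrained`, `checkpoints`, `num_classes`) are illustrative assumptions.
```python
# Schematic Architecture section routed through the 'kie' branch of build_backbone.
architecture = {
    "model_type": "kie",
    "algorithm": "LayoutXLM",
    "Transform": None,
    "Backbone": {
        "name": "LayoutXLMForSer",   # one of the names in support_dict above
        "pretrained": True,
        "checkpoints": None,
        "num_classes": 7,            # illustrative number of SER labels
    },
}
print(architecture["Backbone"]["name"])
```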
...@@ -54,13 +54,15 @@ def load_model(config, model, optimizer=None, model_type='det'):
    pretrained_model = global_config.get('pretrained_model')
    best_model_dict = {}
    is_float16 = False
    is_nlp_model = model_type == 'kie' and config["Architecture"][
        "algorithm"] not in ["SDMGR"]
    if is_nlp_model is True:
        # NOTE: for kie model distillation, resume training is not supported now
        if config["Architecture"]["algorithm"] in ["Distillation"]:
            return best_model_dict
        checkpoints = config['Architecture']['Backbone']['checkpoints']
        # load kie method metric
        if checkpoints:
            if os.path.exists(os.path.join(checkpoints, 'metric.states')):
                with open(os.path.join(checkpoints, 'metric.states'),
...@@ -102,8 +104,9 @@ def load_model(config, model, optimizer=None, model_type='det'):
                    continue
                pre_value = params[key]
                if pre_value.dtype == paddle.float16:
                    is_float16 = True
                if pre_value.dtype != value.dtype:
                    pre_value = pre_value.astype(value.dtype)
                if list(value.shape) == list(pre_value.shape):
                    new_state_dict[key] = pre_value
                else:
...@@ -160,8 +163,9 @@ def load_pretrained_params(model, path):
            logger.warning("The pretrained params {} not in model".format(k1))
        else:
            if params[k1].dtype == paddle.float16:
                is_float16 = True
            if params[k1].dtype != state_dict[k1].dtype:
                params[k1] = params[k1].astype(state_dict[k1].dtype)
            if list(state_dict[k1].shape) == list(params[k1].shape):
                new_state_dict[k1] = params[k1]
            else:
...@@ -192,10 +196,13 @@ def save_model(model,
    _mkdir_if_not_exist(model_path, logger)
    model_prefix = os.path.join(model_path, prefix)
    paddle.save(optimizer.state_dict(), model_prefix + '.pdopt')
    is_nlp_model = config['Architecture']["model_type"] == 'kie' and config[
        "Architecture"]["algorithm"] not in ["SDMGR"]
    if is_nlp_model is not True:
        paddle.save(model.state_dict(), model_prefix + '.pdparams')
        metric_prefix = model_prefix
    else:  # for kie system, we follow the save/load rules in NLP
        if config['Global']['distributed']:
            arch = model._layers
        else:
......
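Both `load_model` and `save_model` now branch on the same rule: every `kie` model except SDMGR is treated as an NLP-style model and follows the directory-based NLP save/load convention instead of a single `.pdparams` file. The rule itself is easy to check in isolation; the example configs below are illustrative.
```python
def is_nlp_model(config):
    # Mirrors the predicate used in load_model/save_model above:
    # every 'kie' architecture except SDMGR follows the NLP-style rules.
    arch = config["Architecture"]
    return arch["model_type"] == "kie" and arch["algorithm"] not in ["SDMGR"]


print(is_nlp_model({"Architecture": {"model_type": "kie", "algorithm": "LayoutXLM"}}))  # True
print(is_nlp_model({"Architecture": {"model_type": "kie", "algorithm": "SDMGR"}}))      # False
print(is_nlp_model({"Architecture": {"model_type": "det", "algorithm": "DB"}}))         # False
```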
...@@ -50,7 +50,7 @@ def get_check_global_params(mode):
def _check_image_file(path):
    img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif', 'pdf'}
    return any([path.lower().endswith(e) for e in img_end])
...@@ -59,7 +59,7 @@ def get_image_file_list(img_file):
    if img_file is None or not os.path.exists(img_file):
        raise Exception("not found any img file in {}".format(img_file))
    img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif', 'pdf'}
    if os.path.isfile(img_file) and _check_image_file(img_file):
        imgs_lists.append(img_file)
    elif os.path.isdir(img_file):
...@@ -73,7 +73,7 @@ def get_image_file_list(img_file):
    return imgs_lists
def check_and_read(img_path):
    if os.path.basename(img_path)[-3:] in ['gif', 'GIF']:
        gif = cv2.VideoCapture(img_path)
        ret, frame = gif.read()
...@@ -84,8 +84,26 @@
        if len(frame.shape) == 2 or frame.shape[-1] == 1:
            frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2RGB)
        imgvalue = frame[:, :, ::-1]
        return imgvalue, True, False
    elif os.path.basename(img_path)[-3:] in ['pdf']:
        import fitz
        from PIL import Image
        imgs = []
        with fitz.open(img_path) as pdf:
            for pg in range(0, pdf.pageCount):
                page = pdf[pg]
                mat = fitz.Matrix(2, 2)
                pm = page.getPixmap(matrix=mat, alpha=False)
                # if width or height > 2000 pixels, don't enlarge the image
                if pm.width > 2000 or pm.height > 2000:
                    pm = page.getPixmap(matrix=fitz.Matrix(1, 1), alpha=False)
                img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
                img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
                imgs.append(img)
        return imgs, False, True
    return None, False, False
def load_vqa_bio_label_maps(label_map_path):
......
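Callers of `check_and_read` now unpack a 3-tuple `(content, is_gif, is_pdf)`: GIFs return a single decoded frame, PDFs return a list of page images, and anything else falls through to the normal image reading path. A minimal sketch of the calling convention is shown below; the file path is a placeholder and the import assumes the script is run from the repository root.
```python
import cv2
from ppocr.utils.utility import check_and_read  # assumes the repo root is on PYTHONPATH

img_path = "doc/imgs/1.jpg"  # placeholder; could also be a .gif or .pdf
content, is_gif, is_pdf = check_and_read(img_path)

if is_pdf:
    pages = content             # list of BGR numpy arrays, one per PDF page
    print("PDF with", len(pages), "pages")
elif is_gif:
    img = content               # first frame of the GIF
    print("GIF frame shape:", img.shape)
else:
    img = cv2.imread(img_path)  # plain image: read it the usual way
    print("image shape:", None if img is None else img.shape)
```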
English | [简体中文](README_ch.md)
- [1. Introduction](#1-introduction)
- [2. Features](#2-features)
- [3. Results](#3-results)
  - [3.1 Layout analysis and table recognition](#31-layout-analysis-and-table-recognition)
  - [3.2 Layout Recovery](#32-layout-recovery)
  - [3.3 KIE](#33-kie)
- [4. Quick start](#4-quick-start)
- [5. Model List](#5-model-list)
## 1. Introduction
PP-Structure is an intelligent document analysis system developed by the PaddleOCR team, which aims to help developers better complete tasks related to document understanding such as layout analysis and table recognition.
The pipeline of the PP-Structurev2 system is shown below. The document image first passes through the image direction correction module, which identifies the direction of the whole image and corrects it. After that, two kinds of tasks can be completed: layout information analysis and key information extraction.
- In the layout analysis task, the image first goes through the layout analysis model, which divides it into regions such as text, table, and figure; these regions are then analyzed separately. For example, the table region is sent to the table recognition module for structured recognition, and the text region is sent to the OCR engine for text recognition. Finally, the layout recovery module restores the result to a word or pdf file with the same layout as the original image.
- In the key information extraction task, the OCR engine is first used to extract the text content, then the SER (semantic entity recognition) module obtains the semantic entities in the image, and finally the RE (relation extraction) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
More technical details: 👉 [PP-Structurev2 Technical Report](docs/PP-Structurev2_introduction.md)
PP-Structurev2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:
- [Layout Analysis](layout/README.md)
- [Table Recognition](table/README.md)
- [Key Information Extraction](kie/README.md)
- [Layout Recovery](recovery/README.md)
## 2. Features
The main features of PP-Structurev2 are as follows:
- Support layout analysis of documents in the form of images/pdfs, which can be divided into areas such as **text, titles, tables, figures, formulas, etc.**;
- Support common Chinese and English **table detection** tasks;
- Support structured table recognition, and output the final result to an **Excel file**;
- Support multimodal-based Key Information Extraction (KIE) tasks - **Semantic Entity Recognition** (SER) and **Relation Extraction** (RE);
- Support **layout recovery**, that is, restore the document in word or pdf format with the same layout as the original image;
- Support customized training and multiple inference deployment methods such as the python whl package quick start;
- Connect with the semi-automatic data labeling tool PPOCRLabel, which supports the labeling of layout analysis, table recognition, and SER.
## 3. Results
PP-Structurev2 supports the independent use or flexible collocation of each module; for example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.
### 3.1 Layout analysis and table recognition
The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of figure, text, title and table by layout analysis; OCR detection and recognition is then performed on the figure, text and title areas, table recognition is performed on the table area, and the figure area is also stored for later use.
<img src="docs/table/ppstructure.GIF" width="100%"/>
### 3.2 Layout recovery
The following figure shows the effect of layout recovery based on the results of layout analysis and table recognition in the previous section.
<img src="./docs/recovery/recovery.jpg" width="100%"/>
### 3.3 KIE
* SER
Different colored boxes in the figure represent different categories.
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095702-9acef674-12af-4d09-97fc-abf4ab32600e.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
* RE
In the figure, the red box represents `Question`, the blue box represents `Answer`, and `Question` and `Answer` are connected by green lines.
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
## 4. Quick start
Start from [Quick Start](./docs/quickstart_en.md).
## 5. Model List
Some tasks need to use both the structured analysis models and the OCR models. For example, the table recognition task needs the table recognition model for structured analysis and the OCR models to recognize the text in the table. Please select the appropriate models according to your specific needs.
For structural analysis related model downloads, please refer to:
- [PP-Structure Model Zoo](./docs/models_list_en.md)
For OCR related model downloads, please refer to:
- [PP-OCR Model Zoo](../doc/doc_en/models_list_en.md)
...@@ -3,135 +3,120 @@
# PP-Structure Document Analysis
- [1. Introduction](#1)
- [2. Features](#2)
- [3. Results](#3)
  - [3.1 Layout analysis and table recognition](#31)
  - [3.2 Layout recovery](#32)
  - [3.3 Key information extraction](#33)
- [4. Quick start](#4)
- [5. Model list](#5)
<a name="1"></a>
## 1. Introduction
PP-Structure is an intelligent document analysis system developed by the PaddleOCR team, which aims to help developers better complete document understanding tasks such as layout analysis and table recognition.
The pipeline of the PP-Structurev2 system is shown below. The document image first passes through the image direction correction module, which identifies and corrects the direction of the whole image; after that, layout information analysis and key information extraction can be performed.
- In the layout analysis task, the image first goes through the layout analysis model and is divided into regions such as text, table and figure, which are then processed separately. For example, the table region is sent to the table recognition module for structured recognition and the text region is sent to the OCR engine for text recognition; finally, the layout recovery module restores the result to a word or pdf file with the same layout as the original image.
- In the key information extraction task, the OCR engine is first used to extract the text content, then the semantic entity recognition (SER) module obtains the semantic entities in the image, and finally the relation extraction (RE) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
More technical details: 👉 [PP-Structurev2 Technical Report](docs/PP-Structurev2_introduction.md)
PP-Structurev2 supports using each module independently or combining them flexibly; for example, layout analysis or table recognition can be used alone. Click the corresponding links below for the tutorials of the individual modules:
- [Layout Analysis](layout/README_ch.md)
- [Table Recognition](table/README_ch.md)
- [Key Information Extraction](kie/README_ch.md)
- [Layout Recovery](recovery/README_ch.md)
<a name="2"></a>
## 2. Features
The main features of PP-Structurev2 are as follows:
- Support layout analysis of documents in image/pdf form, which can be divided into regions such as **text, title, table, figure, formula, etc.**;
- Support general Chinese and English **table detection** tasks;
- Support structured recognition of table regions, with the final result exported to an **Excel file**;
- Support multimodal Key Information Extraction (KIE) tasks - **Semantic Entity Recognition** (SER) and **Relation Extraction** (RE);
- Support **layout recovery**, i.e. restoring a word or pdf file with the same layout as the original image;
- Support customized training and multiple inference deployment methods such as the python whl package; easy to use;
- Integrated with the semi-automatic data labeling tool PPOCRLabel, supporting annotation for the layout analysis, table recognition and SER tasks.
<a name="3"></a>
## 3. Results
PP-Structurev2 supports using each module independently or combining them flexibly; only the visualizations of several representative usage modes are shown here.
<a name="31"></a>
### 3.1 Layout analysis and table recognition
The figure below shows the overall pipeline of layout analysis + table recognition. The image is first divided into figure, text, title and table regions by layout analysis; OCR detection and recognition is then performed on the figure, text and title regions, table recognition is performed on the table region, and the figure region is also stored for later use.
<img src="./docs/table/ppstructure.GIF" width="100%"/>
<a name="32"></a>
### 3.2 Layout recovery
The figure below shows the layout recovery effect based on the layout analysis and table recognition results of the previous section.
<img src="./docs/recovery/recovery.jpg" width="100%"/>
<a name="33"></a>
### 3.3 Key information extraction
* SER
Different colored boxes in the figure represent different categories.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095702-9acef674-12af-4d09-97fc-abf4ab32600e.png" width="600">
</div>
* RE
In the figure, the red boxes represent `Question`, the blue boxes represent `Answer`, and `Question` and `Answer` are connected by green lines.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
<a name="4"></a>
## 4. Quick start
Please refer to the [Quick Start](./docs/quickstart.md) tutorial.
<a name="5"></a>
## 5. Model list
Some tasks require both the structured analysis models and the OCR models. For example, table recognition uses the table recognition model for structured parsing and the OCR models to recognize the text inside the table. Please choose the appropriate models according to your specific needs.
For structured analysis related model downloads, please refer to:
- [PP-Structure Model Zoo](./docs/models_list.md)
For OCR related model downloads, please refer to:
- [PP-OCR Model Zoo](../doc/doc_ch/models_list.md)
This diff is collapsed.
# 基于Python预测引擎推理

- [1. 版面信息抽取](#1)
  - [1.1 版面分析+表格识别](#1.1)
  - [1.2 版面分析](#1.2)
  - [1.3 表格识别](#1.3)
- [2. 关键信息抽取](#2)

<a name="1"></a>
## 1. 版面信息抽取

进入`ppstructure`目录

```bash
cd ppstructure
```

下载模型

```bash
mkdir inference && cd inference
# 下载PP-Structurev2版面分析模型并解压
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar xf picodet_lcnet_x1_0_layout_infer.tar
# 下载PP-OCRv3文本检测模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# 下载PP-OCRv3文本识别模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# 下载PP-Structurev2表格识别模型并解压
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
```

<a name="1.1"></a>
### 1.1 版面分析+表格识别

```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \
                          --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
                          --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
                          --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
                          --image_dir=./docs/table/1.png \
                          --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
                          --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
                          --output=../output \
                          --vis_font_path=../doc/fonts/simfang.ttf
```

<a name="1.2"></a>
### 1.2 版面分析

```bash
python3 predict_system.py --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
                          --image_dir=./docs/table/1.png \
                          --output=../output \
                          --table=false \
                          --ocr=false
```

运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,图片区域会被裁剪之后保存下来,图片名为表格在图片里的坐标。版面分析结果会存储在`res.txt`文件中。

<a name="1.3"></a>
### 1.3 表格识别

```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \
                          --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
                          --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
                          --image_dir=./docs/table/table.jpg \
                          --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
                          --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
                          --output=../output \
                          --vis_font_path=../doc/fonts/simfang.ttf \
                          --layout=false
```

运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,表格会存储为一个excel,excel文件名为`[0,0,img_h,img_w]`。

<a name="2"></a>
## 2. 关键信息抽取

```bash
cd ppstructure

mkdir inference && cd inference
# 下载SER XFUND 模型并解压
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
cd ..

python3 kie/predict_kie_token_ser.py \
  --kie_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx"
```

运行完成后,每张图片会在`output`字段指定的目录下的`kie`目录下存放可视化之后的图片,图片名和输入图片名一致。
# Python Inference

- [1. Layout Structured Analysis](#1)
  - [1.1 layout analysis + table recognition](#1.1)
  - [1.2 layout analysis](#1.2)
  - [1.3 table recognition](#1.3)
- [2. Key Information Extraction](#2)

<a name="1"></a>
## 1. Layout Structured Analysis

Go to the `ppstructure` directory

```bash
cd ppstructure
```

Download the models

```bash
mkdir inference && cd inference
# Download the PP-Structurev2 layout analysis model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar xf picodet_lcnet_x1_0_layout_infer.tar
# Download the PP-OCRv3 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the PP-OCRv3 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the PP-Structurev2 table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
```

<a name="1.1"></a>
### 1.1 layout analysis + table recognition

```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \
                          --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
                          --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
                          --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
                          --image_dir=./docs/table/1.png \
                          --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
                          --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
                          --output=../output \
                          --vis_font_path=../doc/fonts/simfang.ttf
```

<a name="1.2"></a>
### 1.2 layout analysis

```bash
python3 predict_system.py --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
                          --image_dir=./docs/table/1.png \
                          --output=../output \
                          --table=false \
                          --ocr=false
```

After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each picture region in the image is cropped and saved, and the filename of each cropped picture is its coordinates in the image. The layout analysis results are stored in the `res.txt` file.
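As a quick sanity check of the layout output, `res.txt` can be read back line by line. The sketch below is only an illustration, not part of the official workflow: it assumes each line of `res.txt` is a JSON object describing one detected region, and the file path is a hypothetical example under the `--output` directory.

```python
# Minimal sketch: inspect the layout analysis results written to res.txt.
# Assumes each line of res.txt is a JSON object describing one detected region;
# the path below is a hypothetical example under the --output directory.
import json

res_path = "../output/structure/1/res.txt"  # hypothetical path for 1.png
with open(res_path, "r", encoding="utf-8") as f:
    for line in f:
        region = json.loads(line)
        print(region.get("type"), region.get("bbox"))
```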
<a name="1.3"></a> <a name="1.3"></a>
### 1.3 table recognition ### 1.3 table recognition
```bash ```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \ --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \ --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
--image_dir=./docs/table/table.jpg \ --image_dir=./docs/table/table.jpg \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \ --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
--output=../output \ --output=../output \
--vis_font_path=../doc/fonts/simfang.ttf \ --vis_font_path=../doc/fonts/simfang.ttf \
--layout=false --layout=false
...@@ -63,19 +70,22 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_i ...@@ -63,19 +70,22 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_i
After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image will be stored as an excel. The filename of excel is their coordinates in the image. After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image will be stored as an excel. The filename of excel is their coordinates in the image.
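The exported Excel files can be loaded back for further processing, for example with pandas. The sketch below is only an illustration; the file path is hypothetical and follows the naming convention described above, and it assumes `pandas` and `openpyxl` are installed.

```python
# Minimal sketch: load a table exported by predict_system.py back into a DataFrame.
# The path below is a hypothetical example; substitute the actual file written
# under your --output directory.
import pandas as pd  # requires pandas and openpyxl

excel_path = "../output/structure/table/[0,0,1000,600].xlsx"  # hypothetical coordinates
df = pd.read_excel(excel_path)
print(df.shape)
print(df.head())
```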
<a name="2"></a> <a name="2"></a>
## 2. DocVQA ## 2. Key Information Extraction
```bash ```bash
cd ppstructure cd ppstructure
# download model
mkdir inference && cd inference mkdir inference && cd inference
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar # download model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
cd .. cd ..
python3 kie/predict_kie_token_ser.py \
python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \ --kie_algorithm=LayoutXLM \
--mode=vqa \ --ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
--image_dir=vqa/images/input/zh_val_0.jpg \ --image_dir=./docs/kie/input/zh_val_42.jpg \
--vis_font_path=../doc/fonts/simfang.ttf --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--ocr_order_method="tb-yx"
``` ```
After the operation is completed, each image will store the visualized image in the `vqa` directory under the directory specified by the `output` field, and the image name is the same as the input image name.
After the operation is completed, each image will store the visualized image in the `kie` directory under the directory specified by the `output` field, and the image name is the same as the input image name.
- [快速安装](#快速安装)
- [1. PaddlePaddle 和 PaddleOCR](#1-paddlepaddle-和-paddleocr)
- [2. 安装其他依赖](#2-安装其他依赖)
- [2.1 VQA所需依赖](#21--vqa所需依赖)
# 快速安装
## 1. PaddlePaddle 和 PaddleOCR
可参考[PaddleOCR安装文档](../../doc/doc_ch/installation.md)
## 2. 安装其他依赖
### 2.1 VQA所需依赖
* paddleocr
```bash
pip3 install paddleocr
```
* PaddleNLP
```bash
git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
cd PaddleNLP
pip3 install -e .
```
# Quick installation
- [1. PaddlePaddle and PaddleOCR](#1)
- [2. Install other dependencies](#2)
- [2.1 VQA](#21)
<a name="1"></a>
## 1. PaddlePaddle and PaddleOCR
Please refer to [PaddleOCR installation documentation](../../doc/doc_en/installation_en.md)
<a name="2"></a>
## 2. Install other dependencies
<a name="21"></a>
### 2.1 VQA
* paddleocr
```bash
pip3 install paddleocr
```
* PaddleNLP
```bash
git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
cd PaddleNLP
pip3 install -e .
```
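After installing the dependencies, a quick sanity check can confirm that both PaddlePaddle and PaddleNLP import correctly. This is an optional sketch and not part of the official installation steps.

```python
# Optional sanity check after installation (not part of the official steps).
import paddle
import paddlenlp

print("PaddlePaddle version:", paddle.__version__)
print("PaddleNLP version:", paddlenlp.__version__)

# Runs Paddle's built-in self-check to confirm the installation can execute a model.
paddle.utils.run_check()
```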
<a name="1"></a>
## 1. 版面分析模型

|模型名称|模型简介|推理模型大小|下载地址|dict path|
| --- | --- | --- | --- | --- |
| picodet_lcnet_x1_0_fgd_layout | 基于PicoDet LCNet_x1_0和FGD蒸馏在PubLayNet数据集训练的英文版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
| ppyolov2_r50vd_dcn_365e_publaynet | 基于PP-YOLOv2在PubLayNet数据集上训练的英文版面分析模型 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | 同上 |
| picodet_lcnet_x1_0_fgd_layout_cdla | CDLA数据集训练的中文版面分析模型,可以划分**文字、标题、图片、图片标题、表格、表格标题、页眉、页脚、引用、公式**10类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
| picodet_lcnet_x1_0_fgd_layout_table | 表格数据集训练的版面分析模型,支持中英文文档表格区域的检测 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | 基于PP-YOLOv2在TableBank Word数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | 同上 |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | 基于PP-YOLOv2在TableBank Latex数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | 同上 |

<a name="2"></a>
## 2. OCR和表格识别模型

<a name="21"></a>

|模型名称|模型简介|推理模型大小|下载地址|
| --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_det|PubTabNet数据集训练的英文表格场景的文字检测|4.7M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_det_train.tar) |
|en_ppocr_mobile_v2.0_table_rec|PubTabNet数据集训练的英文表格场景的文字识别|6.9M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) |

如需要使用其他OCR模型,可以在 [PP-OCR model_list](../../doc/doc_ch/models_list.md) 下载模型或者使用自己训练好的模型配置到 `det_model_dir`, `rec_model_dir`两个字段即可。

|模型名称|模型简介|推理模型大小|下载地址|
| --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|基于TableRec-RARE在PubTabNet数据集上训练的英文表格识别模型|6.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
|en_ppstructure_mobile_v2.0_SLANet|基于SLANet在PubTabNet数据集上训练的英文表格识别模型|9.2M|[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_train.tar) |
|ch_ppstructure_mobile_v2.0_SLANet|基于SLANet的中文表格识别模型|9.3M|[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |

<a name="3"></a>
- [2. OCR and Table Recognition](#2-ocr-and-table-recognition)
  - [2.1 OCR](#21-ocr)
  - [2.2 Table Recognition](#22-table-recognition)
- [3. KIE](#3-kie)

<a name="1"></a>
## 1. Layout Analysis

|model name| description | inference model size |download|dict path|
| --- | --- | --- | --- | --- |
| picodet_lcnet_x1_0_fgd_layout | The English layout analysis model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD distillation. The model can detect 5 types of regions: **Text, Title, Table, Picture and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
| ppyolov2_r50vd_dcn_365e_publaynet | The English layout analysis model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
| picodet_lcnet_x1_0_fgd_layout_cdla | The Chinese layout analysis model trained on the CDLA dataset. The model can detect 10 types of regions: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference and Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
| picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset. The model can detect tables in Chinese and English documents | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2. The model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2. The model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |

<a name="2"></a>
## 2. OCR and Table Recognition

|model| description |inference model size|download|
| --- |-----------------------------------------------------------------------------| --- | --- |
|en_ppocr_mobile_v2.0_table_structure| English table recognition model trained on PubTabNet dataset based on TableRec-RARE |6.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
|en_ppstructure_mobile_v2.0_SLANet|English table recognition model trained on PubTabNet dataset based on SLANet|9.2M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_train.tar) |
|ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |

<a name="3"></a>
## 3. KIE

On the XFUND_zh dataset, the accuracy and time cost of different models on a V100 GPU are as follows.

|Model|Backbone|Task|Config|Hmean|Time cost(ms)|Download link|
| --- | --- | --- | --- | --- | --- | --- |
|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49| [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 |[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
|LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|-|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|

* Note: The time cost above only covers inference, without preprocessing or postprocessing. Test environment: `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`

On the wildreceipt dataset, the algorithm results are as follows:

|Model|Backbone|Config|Hmean|Download link|
| --- | --- | --- | --- | --- |
|SDMGR|VGG16|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
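If you prefer to fetch one of the trained models listed above from a script rather than with `wget`, a small Python helper can download and unpack the archive. This is an optional sketch, not part of the official workflow; the URL is simply one of the download links in the KIE tables above.

```python
# Optional sketch: download and unpack one of the trained-model archives listed above.
import tarfile
import urllib.request

# One of the download links from the KIE table above.
url = "https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar"
archive = "kie_vgg16.tar"

urllib.request.urlretrieve(url, archive)        # fetch the tar archive
with tarfile.open(archive) as tar:
    tar.extractall(path="./pretrained_models")  # unpack next to the script
print("extracted to ./pretrained_models")
```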
# PP-Structure 快速开始

- [1. 准备环境](#1-准备环境)
- [2. 便捷使用](#2-便捷使用)
  - [2.1 命令行使用](#21-命令行使用)
    - [2.1.1 图像方向分类+版面分析+表格识别](#211-图像方向分类版面分析表格识别)
    - [2.1.2 版面分析+表格识别](#212-版面分析表格识别)
    - [2.1.3 版面分析](#213-版面分析)
    - [2.1.4 表格识别](#214-表格识别)
    - [2.1.5 关键信息抽取](#215-关键信息抽取)
    - [2.1.6 版面恢复](#216-版面恢复)
  - [2.2 Python脚本使用](#22-Python脚本使用)
    - [2.2.1 图像方向分类+版面分析+表格识别](#221-图像方向分类版面分析表格识别)
    - [2.2.2 版面分析+表格识别](#222-版面分析表格识别)
    - [2.2.3 版面分析](#223-版面分析)
    - [2.2.4 表格识别](#224-表格识别)
    - [2.2.5 关键信息抽取](#225-关键信息抽取)
    - [2.2.6 版面恢复](#226-版面恢复)
  - [2.3 返回结果说明](#23-返回结果说明)
    - [2.3.1 版面分析+表格识别](#231-版面分析表格识别)
    - [2.3.2 关键信息抽取](#232-关键信息抽取)
  - [2.4 参数说明](#24-参数说明)
- [3. 小结](#3-小结)

<a name="1"></a>
## 1. 准备环境
### 1.1 安装PaddlePaddle
> 如果您没有基础的Python运行环境,请参考[运行环境准备](../../doc/doc_ch/environment.md)。
- 您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- 您的机器是CPU,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
### 1.2 安装PaddleOCR whl包
```bash
# 安装 paddleocr,推荐使用2.6版本
pip3 install "paddleocr>=2.6"

# 安装 图像方向分类依赖包paddleclas(如不需要图像方向分类功能,可跳过)
pip3 install paddleclas

# 安装 关键信息抽取 依赖包(如不需要KIE功能,可跳过)
pip3 install -r kie/requirements.txt
```

<a name="2"></a>
## 2. 便捷使用

<a name="21"></a>
### 2.1 命令行使用

<a name="211"></a>
#### 2.1.1 图像方向分类+版面分析+表格识别

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --image_orientation=true
```

<a name="212"></a>
#### 2.1.2 版面分析+表格识别

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure
```

<a name="213"></a>
#### 2.1.3 版面分析

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="214"></a>
#### 2.1.4 表格识别

```bash
paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout=false
```

<a name="215"></a>
#### 2.1.5 关键信息抽取

关键信息抽取暂不支持通过whl包调用,详细使用教程请参考:[关键信息抽取教程](../kie/README_ch.md)

<a name="216"></a>
#### 2.1.6 版面恢复

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true
```
<a name="22"></a> <a name="22"></a>
### 2.2 代码使用
### 2.2 Python脚本使用
<a name="221"></a> <a name="221"></a>
#### 2.2.1 图像方向分类版面分析表格识别 #### 2.2.1 图像方向分类+版面分析+表格识别
```python ```python
import os import os
...@@ -80,7 +112,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res ...@@ -80,7 +112,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True, image_orientation=True) table_engine = PPStructure(show_log=True, image_orientation=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
...@@ -91,7 +123,7 @@ for line in result: ...@@ -91,7 +123,7 @@ for line in result:
from PIL import Image from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包 font_path = 'doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB') image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path) im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
...@@ -109,7 +141,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res ...@@ -109,7 +141,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True) table_engine = PPStructure(show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
...@@ -120,7 +152,7 @@ for line in result: ...@@ -120,7 +152,7 @@ for line in result:
from PIL import Image from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包 font_path = 'doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB') image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path) im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
...@@ -138,7 +170,7 @@ from paddleocr import PPStructure,save_structure_res ...@@ -138,7 +170,7 @@ from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(table=False, ocr=False, show_log=True) table_engine = PPStructure(table=False, ocr=False, show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
...@@ -149,6 +181,7 @@ for line in result: ...@@ -149,6 +181,7 @@ for line in result:
``` ```
<a name="224"></a> <a name="224"></a>
#### 2.2.4 表格识别 #### 2.2.4 表格识别
```python ```python
...@@ -159,7 +192,7 @@ from paddleocr import PPStructure,save_structure_res ...@@ -159,7 +192,7 @@ from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(layout=False, show_log=True) table_engine = PPStructure(layout=False, show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg' img_path = 'ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
...@@ -170,13 +203,40 @@ for line in result: ...@@ -170,13 +203,40 @@ for line in result:
``` ```
<a name="225"></a> <a name="225"></a>
#### 2.2.5 DocVQA #### 2.2.5 关键信息抽取
关键信息抽取暂不支持通过whl包调用,详细使用教程请参考:[关键信息抽取教程](../kie/README_ch.md)
请参考:[文档视觉问答](../vqa/README.md) <a name="226"></a>
#### 2.2.6 版面恢复
```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
from paddelocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx
table_engine = PPStructure(layout=False, show_log=True)
save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
for line in result:
line.pop('img')
print(line)
h, w, _ = img.shape
res = sorted_layout_boxes(res, w)
convert_info_docx(img, result, save_folder, os.path.basename(img_path).split('.')[0])
```
<a name="23"></a> <a name="23"></a>
### 2.3 返回结果说明 ### 2.3 返回结果说明
PP-Structure的返回结果为一个dict组成的list,示例如下 PP-Structure的返回结果为一个dict组成的list,示例如下
<a name="231"></a> <a name="231"></a>
#### 2.3.1 版面分析+表格识别 #### 2.3.1 版面分析+表格识别
...@@ -189,7 +249,7 @@ PP-Structure的返回结果为一个dict组成的list,示例如下 ...@@ -189,7 +249,7 @@ PP-Structure的返回结果为一个dict组成的list,示例如下
} }
] ]
``` ```
dict 里各个字段说明如下 dict 里各个字段说明如下
| 字段 | 说明| | 字段 | 说明|
| --- |---| | --- |---|
...@@ -208,9 +268,9 @@ dict 里各个字段说明如下 ...@@ -208,9 +268,9 @@ dict 里各个字段说明如下
``` ```
<a name="232"></a> <a name="232"></a>
#### 2.3.2 DocVQA #### 2.3.2 关键信息抽取
请参考:[文档视觉问答](../vqa/README.md) 请参考:[关键信息抽取教程](../kie/README_ch.md)
<a name="24"></a> <a name="24"></a>
### 2.4 参数说明 ### 2.4 参数说明
...@@ -226,15 +286,21 @@ dict 里各个字段说明如下 ...@@ -226,15 +286,21 @@ dict 里各个字段说明如下
| layout_dict_path | 版面分析模型字典| ../ppocr/utils/dict/layout_publaynet_dict.txt | | layout_dict_path | 版面分析模型字典| ../ppocr/utils/dict/layout_publaynet_dict.txt |
| layout_score_threshold | 版面分析模型检测框阈值| 0.5| | layout_score_threshold | 版面分析模型检测框阈值| 0.5|
| layout_nms_threshold | 版面分析模型nms阈值| 0.5| | layout_nms_threshold | 版面分析模型nms阈值| 0.5|
| vqa_algorithm | vqa模型算法| LayoutXLM| | kie_algorithm | kie模型算法| LayoutXLM|
| ser_model_dir | ser模型 inference 模型地址| None| | ser_model_dir | ser模型 inference 模型地址| None|
| ser_dict_path | ser模型字典| ../train_data/XFUND/class_list_xfun.txt| | ser_dict_path | ser模型字典| ../train_data/XFUND/class_list_xfun.txt|
| mode | structure or vqa | structure | | mode | structure or kie | structure |
| image_orientation | 前向中是否执行图像方向分类 | False | | image_orientation | 前向中是否执行图像方向分类 | False |
| layout | 前向中是否执行版面分析 | True | | layout | 前向中是否执行版面分析 | True |
| table | 前向中是否执行表格识别 | True | | table | 前向中是否执行表格识别 | True |
| ocr | 对于版面分析中的非表格区域,是否执行ocr。当layout为False时会被自动设置为False| True | | ocr | 对于版面分析中的非表格区域,是否执行ocr。当layout为False时会被自动设置为False| True |
| recovery | 前向中是否执行版面恢复| False | | recovery | 前向中是否执行版面恢复| False |
| save_pdf | 版面恢复导出docx文件的同时,是否导出pdf文件 | False |
| structure_version | 模型版本,可选 PP-structure和PP-structurev2 | PP-structure | | structure_version | 模型版本,可选 PP-structure和PP-structurev2 | PP-structure |
大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md) 大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md)
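上表中的开关大多可直接作为 `PPStructure` 的入参传入。下面是一个仅作示意的最小示例(并非官方示例),演示同时组合使用其中几个已在上文示例中出现过的开关,请根据实际任务调整。

```python
# 最小示例:在一次 PPStructure 调用中组合使用上表中的几个开关
# 以下开关组合仅作演示,请根据实际任务调整
import cv2
from paddleocr import PPStructure

table_engine = PPStructure(
    image_orientation=False,  # 不执行图像方向分类
    layout=True,              # 执行版面分析
    table=True,               # 执行表格识别
    ocr=True,                 # 对非表格区域执行OCR
    show_log=False,
)

img = cv2.imread('ppstructure/docs/table/1.png')
result = table_engine(img)
for line in result:
    line.pop('img')  # 打印前去掉图像数组字段,与上文示例一致
    print(line)
```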
<a name="3"></a>
## 3. 小结
通过本节内容,相信您已经熟练掌握通过PaddleOCR whl包调用PP-Structure相关功能的使用方法,您可以参考[文档教程](../../README_ch.md#文档教程),获取包括模型训练、推理部署等更详细的使用教程。
# PP-Structure Quick Start

- [1. Environment Preparation](#1-environment-preparation)
- [2. Quick Use](#2-quick-use)
  - [2.1 Use by command line](#21-use-by-command-line)
    - [2.1.1 image orientation + layout analysis + table recognition](#211-image-orientation--layout-analysis--table-recognition)
    - [2.1.2 layout analysis + table recognition](#212-layout-analysis--table-recognition)
    - [2.1.3 layout analysis](#213-layout-analysis)
    - [2.1.4 table recognition](#214-table-recognition)
    - [2.1.5 Key Information Extraction](#215-Key-Information-Extraction)
    - [2.1.6 layout recovery](#216-layout-recovery)
  - [2.2 Use by python script](#22-use-by-python-script)
    - [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition)
    - [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition)
    - [2.2.3 layout analysis](#223-layout-analysis)
    - [2.2.4 table recognition](#224-table-recognition)
    - [2.2.5 Key Information Extraction](#225-Key-Information-Extraction)
    - [2.2.6 layout recovery](#226-layout-recovery)
  - [2.3 Result description](#23-result-description)
    - [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition)
    - [2.3.2 Key Information Extraction](#232-Key-Information-Extraction)
  - [2.4 Parameter Description](#24-parameter-description)
- [3. Summary](#3-summary)

<a name="1"></a>
## 1. Environment Preparation
### 1.1 Install PaddlePaddle
> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you have no available GPU on your machine, please run the following command to install the CPU version
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
### 1.2 Install PaddleOCR Whl Package
```bash
# Install paddleocr, version 2.6 is recommended
pip3 install "paddleocr>=2.6"

# Install the image orientation classification dependency package paddleclas (if you do not use image orientation classification, you can skip it)
pip3 install paddleclas

# Install the KIE dependency packages (if you do not use KIE, you can skip it)
pip3 install -r kie/requirements.txt
```

<a name="2"></a>
## 2. Quick Use

<a name="21"></a>
### 2.1 Use by command line

<a name="211"></a>
#### 2.1.1 image orientation + layout analysis + table recognition

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --image_orientation=true
```

<a name="212"></a>
#### 2.1.2 layout analysis + table recognition

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure
```

<a name="213"></a>
#### 2.1.3 layout analysis

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="214"></a>
#### 2.1.4 table recognition

```bash
paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout=false
```

<a name="215"></a>
#### 2.1.5 Key Information Extraction

Key information extraction does not currently support use by the whl package. For detailed usage tutorials, please refer to: [Key Information Extraction](../kie/README.md).

<a name="216"></a>
#### 2.1.6 layout recovery

```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true
```

<a name="22"></a>
### 2.2 Use by python script

<a name="221"></a>
#### 2.2.1 image orientation + layout analysis + table recognition
```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res

table_engine = PPStructure(show_log=True, image_orientation=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

from PIL import Image

font_path = 'doc/fonts/simfang.ttf' # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

<a name="222"></a>
#### 2.2.2 layout analysis + table recognition

```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

from PIL import Image

font_path = 'doc/fonts/simfang.ttf' # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

<a name="223"></a>
#### 2.2.3 layout analysis

```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res

table_engine = PPStructure(table=False, ocr=False, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```

<a name="224"></a>
#### 2.2.4 table recognition

```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res

table_engine = PPStructure(layout=False, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```

<a name="225"></a>
#### 2.2.5 Key Information Extraction

Key information extraction does not currently support use by the whl package. For detailed usage tutorials, please refer to: [Key Information Extraction](../kie/README.md).

<a name="226"></a>
#### 2.2.6 layout recovery

```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res
from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx

table_engine = PPStructure(layout=False, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

h, w, _ = img.shape
res = sorted_layout_boxes(result, w)
convert_info_docx(img, res, save_folder, os.path.basename(img_path).split('.')[0])
```
<a name="23"></a> <a name="23"></a>
### 2.3 Result description ### 2.3 Result description
...@@ -208,9 +266,9 @@ After the recognition is completed, each image will have a directory with the sa ...@@ -208,9 +266,9 @@ After the recognition is completed, each image will have a directory with the sa
``` ```
<a name="232"></a> <a name="232"></a>
#### 2.3.2 DocVQA #### 2.3.2 Key Information Extraction
Please refer to: [Documentation Visual Q&A](../vqa/README.md) . Please refer to: [Key Information Extraction](../kie/README.md) .
<a name="24"></a> <a name="24"></a>
### 2.4 Parameter Description ### 2.4 Parameter Description
...@@ -226,15 +284,21 @@ Please refer to: [Documentation Visual Q&A](../vqa/README.md) . ...@@ -226,15 +284,21 @@ Please refer to: [Documentation Visual Q&A](../vqa/README.md) .
| layout_dict_path | The dictionary path of layout analysis model| ../ppocr/utils/dict/layout_publaynet_dict.txt | | layout_dict_path | The dictionary path of layout analysis model| ../ppocr/utils/dict/layout_publaynet_dict.txt |
| layout_score_threshold | The box threshold path of layout analysis model| 0.5| | layout_score_threshold | The box threshold path of layout analysis model| 0.5|
| layout_nms_threshold | The nms threshold path of layout analysis model| 0.5| | layout_nms_threshold | The nms threshold path of layout analysis model| 0.5|
| vqa_algorithm | vqa model algorithm| LayoutXLM| | kie_algorithm | kie model algorithm| LayoutXLM|
| ser_model_dir | Ser model inference model path| None| | ser_model_dir | Ser model inference model path| None|
| ser_dict_path | The dictionary path of Ser model| ../train_data/XFUND/class_list_xfun.txt| | ser_dict_path | The dictionary path of Ser model| ../train_data/XFUND/class_list_xfun.txt|
| mode | structure or vqa | structure | | mode | structure or kie | structure |
| image_orientation | Whether to perform image orientation classification in forward | False | | image_orientation | Whether to perform image orientation classification in forward | False |
| layout | Whether to perform layout analysis in forward | True | | layout | Whether to perform layout analysis in forward | True |
| table | Whether to perform table recognition in forward | True | | table | Whether to perform table recognition in forward | True |
| ocr | Whether to perform ocr for non-table areas in layout analysis. When layout is False, it will be automatically set to False| True | | ocr | Whether to perform ocr for non-table areas in layout analysis. When layout is False, it will be automatically set to False| True |
| recovery | Whether to perform layout recovery in forward| False | | recovery | Whether to perform layout recovery in forward| False |
| save_pdf | Whether to convert docx to pdf when recovery| False |
| structure_version | Structure version, optional PP-structure and PP-structurev2 | PP-structure | | structure_version | Structure version, optional PP-structure and PP-structurev2 | PP-structure |
Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md) Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md)
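Most of the switches in the table above can be passed directly as keyword arguments of `PPStructure`. The following minimal sketch (not an official example) shows several of them combined in one call, reusing only flags that already appear in the examples above; adjust the combination to your own task.

```python
# Minimal sketch: combining several of the documented switches in one PPStructure call.
# The flag combination below is only an illustration; adjust it to your own task.
import cv2
from paddleocr import PPStructure

table_engine = PPStructure(
    image_orientation=False,  # skip image orientation classification
    layout=True,              # run layout analysis
    table=True,               # run table recognition
    ocr=True,                 # run OCR on non-table regions
    show_log=False,
)

img = cv2.imread('ppstructure/docs/table/1.png')
result = table_engine(img)
for line in result:
    line.pop('img')  # drop the raw image array before printing, as in the examples above
    print(line)
```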
<a name="3"></a>
## 3. Summary
In this section you have learned how to use the PP-Structure related functions through the PaddleOCR whl package. Please refer to the [documentation tutorial](../../README.md) for more detailed usage, including model training, inference and deployment.
English | [简体中文](README_ch.md)
- [1. Introduction](#1-introduction)
- [2. Accuracy and performance](#2-Accuracy-and-performance)
- [3. Visualization](#3-Visualization)
- [3.1 SER](#31-ser)
- [3.2 RE](#32-re)
- [4. Usage](#4-usage)
- [4.1 Prepare for the environment](#41-Prepare-for-the-environment)
- [4.2 Quick start](#42-Quick-start)
- [4.3 More](#43-More)
- [5. Reference](#5-Reference)
- [6. License](#6-License)
## 1. Introduction
Key information extraction (KIE) refers to extracting key information from text or images. As a downstream task of OCR, the key information extraction task for document images has many practical application scenarios, such as form recognition, ticket information extraction, ID card information extraction, etc.
PP-Structure builds on the LayoutXLM multi-modal model and proposes VI-LayoutXLM, which removes the dependence on visual features when fine-tuning downstream tasks. A text-line sorting method that follows the reading order is also used, and UDML knowledge distillation is applied for higher accuracy. As a result, VI-LayoutXLM surpasses LayoutXLM in both accuracy and inference speed.
The main features of the key information extraction module in PP-Structure are as follows.
- Integrate multi-modal methods such as [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf), VI-LayoutXLM, and PP-OCR inference engine.
- Supports Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multi-modal methods. Based on the SER task, text recognition and classification in the image can be completed; based on the RE task, relations between the texts in the image can be extracted, such as matching question-answer pairs.
- Supports custom training for SER tasks and RE tasks.
- Supports end-to-end system prediction and evaluation of OCR+SER.
- Supports end-to-end system prediction of OCR+SER+RE.
- Supports SER model export and inference with Paddle Inference.
## 2. Accuracy and performance
We evaluate the methods on the Chinese dataset of [XFUND](https://github.com/doc-analysis/XFUND), and the performance is as follows
|Model | Backbone | Task | Config file | Hmean | Inference time (ms) | Download link|
| --- | --- | --- | --- | --- | --- | --- |
|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 | [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**| 15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%| 19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
* Note: inference environment: V100 GPU + CUDA 10.2 + cuDNN 8.1.1 + TensorRT 7.2.3.4, tested with FP16.
For more KIE models in PaddleOCR, please refer to [KIE model zoo](../../doc/doc_en/algorithm_overview_en.md).
## 3. Visualization
There are two main solutions to the key information extraction task based on the VI-LayoutXLM series of models.
(1) Text detection + text recognition + semantic entity recognition (SER)
(2) Text detection + text recognition + semantic entity recognition (SER) + relationship extraction (RE)
The following images are demo results of the SER and RE models. For more detailed introduction to the above solutions, please refer to [KIE Guide](./how_to_do_kie.md).
### 3.1 SER
Demo results for SER task are as follows.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539735-37b5c2ef-629d-43fe-9abb-44bb717ef7ee.jpg" width="600">
</div>
**Note:** test pictures are from [xfund dataset](https://github.com/doc-analysis/XFUND), [invoice dataset](https://aistudio.baidu.com/aistudio/datasetdetail/165561) and a composite ID card dataset.
Boxes of different colors in the image represent different categories.
The invoice and application form images have three categories: `question`, `answer` and `header`. The `question` and `answer` fields can be used to extract question-answer relations.
For the ID card image, the model can directly identify key information such as `name`, `gender` and `nationality`, so the subsequent relation extraction step is not required and the key information extraction task can be completed with a single model.
### 3.2 RE
Demo results for RE task are as follows.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540291-f64e5daf-6d42-4e7c-bbbb-471e3fac4fcc.png" width="600">
</div>
Red boxes are questions and blue boxes are answers. A green line means the two connected objects form a pair.
## 4. Usage
### 4.1 Prepare for the environment
Use the following command to install KIE dependencies.
```bash
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
pip install -r ppstructure/kie/requirements.txt
# install the PaddleOCR engine for prediction
pip install paddleocr -U
```
The visualized results of SER are saved in the `./output` folder by default. Examples of results are as follows.
<div align="center">
<img src="../../ppstructure/docs/kie/result_ser/zh_val_42_ser.jpg" width="800">
</div>
### 4.2 Quick start
Here we use the XFUND dataset to quickly try out the SER and RE models.
#### 4.2.1 Prepare for the dataset
```bash
mkdir train_data
cd train_data
# download and uncompress the dataset
wget https://paddleocr.bj.bcebos.com/ppstructure/dataset/XFUND.tar && tar -xf XFUND.tar
cd ..
```
#### 4.2.2 Predict images using the trained model
Use the following command to download the models.
```bash
mkdir pretrain_models
cd pretrain_models
# download and uncompress the SER trained model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar && tar -xf ser_vi_layoutxlm_xfund_pretrained.tar
# download and uncompress the RE trained model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar && tar -xf re_vi_layoutxlm_xfund_pretrained.tar
```
If you want to use the OCR engine to obtain end-to-end prediction results, run the following commands.
```bash
# just predict using SER trained model
python3 tools/infer_kie_token_ser.py \
-c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./ppstructure/docs/kie/input/zh_val_42.jpg
# predict using SER and RE trained model at the same time
python3 ./tools/infer_kie_token_ser_re.py \
-c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/image/zh_val_42.jpg \
-c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o_ser Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy
```
The visual result images and the predicted text file will be saved in the `Global.save_res_path` directory.
If you want to load previously annotated text detection and recognition results and only run prediction, use the following commands.
```bash
# just predict using SER trained model
python3 tools/infer_kie_token_ser.py \
-c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/val.json \
Global.infer_mode=False
# predict using SER and RE trained model at the same time
python3 ./tools/infer_kie_token_ser_re.py \
-c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/val.json \
Global.infer_mode=False \
-c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o_ser Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy
```
#### 4.2.3 Inference using PaddleInference
At present, only the SER model supports inference with Paddle Inference.
First, download the SER inference model.
```bash
mkdir inference
cd inference
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
```
Use the following command for inference.
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
--kie_algorithm=LayoutXLM \
--ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
--image_dir=./docs/kie/input/zh_val_42.jpg \
--ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--ocr_order_method="tb-yx"
```
The visualized results and the predicted text file will be saved in the `output` directory.
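If you want to post-process the saved predictions, a small sketch is shown below. It assumes the text file is named `infer.txt` (as written by the SER prediction script) and that each line holds the image path and a JSON payload separated by a tab; adjust the parsing to the exact format your version produces.
```python
# A minimal sketch for loading the saved SER predictions; the line format is an
# assumption, so adapt it to the actual contents of output/infer.txt.
import json

with open("output/infer.txt", "r", encoding="utf-8") as f:
    for line in f:
        image_path, payload = line.rstrip("\n").split("\t", 1)
        result = json.loads(payload)
        print(image_path, str(result)[:120])
```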
### 4.3 More
For the training, evaluation and inference tutorial for KIE models, please refer to the [KIE doc](../../doc/doc_en/kie_en.md).
For the training, evaluation and inference tutorial for text detection models, please refer to the [text detection doc](../../doc/doc_en/detection_en.md).
For the training, evaluation and inference tutorial for text recognition models, please refer to the [text recognition doc](../../doc/doc_en/recognition.md).
To complete the key information extraction task in your own scenario, from data preparation to model selection, please refer to the [Guide to End-to-end KIE](./how_to_do_kie_en.md).
## 5. Reference
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
- XFUND dataset, https://github.com/doc-analysis/XFUND
## 6. License
The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
[English](README.md) | 简体中文
# Key Information Extraction
- [1. Introduction](#1-introduction)
- [2. Accuracy and performance](#2-accuracy-and-performance)
- [3. Visualization](#3-visualization)
- [3.1 SER](#31-ser)
- [3.2 RE](#32-re)
- [4. Usage](#4-usage)
- [4.1 Prepare for the environment](#41-prepare-for-the-environment)
- [4.2 Quick start](#42-quick-start)
- [4.3 More](#43-more)
- [5. Reference](#5-reference)
- [6. License](#6-license)
## 1. Introduction
Key information extraction (KIE) refers to extracting key information from text or images. As a downstream task of OCR, the key information extraction task for document images has many practical application scenarios, such as form recognition, ticket information extraction, ID card information extraction, etc.
Based on the LayoutXLM series of multi-modal document methods, PP-Structure designs VI-LayoutXLM, a multi-modal model structure that is independent of visual features, and introduces a text-line sorting method that follows the reading order together with the UDML mutual-learning distillation strategy, finally surpassing LayoutXLM in both accuracy and speed.
The main features of the key information extraction module in PP-Structure are as follows:
- Integrates multi-modal models such as [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) and VI-LayoutXLM, as well as the PP-OCR inference engine.
- Supports Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multi-modal methods. Based on the SER task, text recognition and classification in the image can be completed; based on the RE task, relations between the texts in the image, such as question-answer pairs, can be extracted.
- Supports custom training for SER and RE tasks.
- Supports end-to-end system prediction and evaluation of OCR+SER.
- Supports end-to-end system prediction of OCR+SER+RE.
- Supports dynamic-to-static export of the SER model and inference with Paddle Inference.
## 2. Accuracy and performance
We evaluate the methods on the Chinese dataset of [XFUND](https://github.com/doc-analysis/XFUND); the SER and RE performance is as follows.
|Model | Backbone | Task | Config file | Hmean | Inference time (ms) | Download link|
| --- | --- | --- | --- | --- | --- | --- |
|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 | [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**| 15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%| 19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
* Note: inference environment: V100 GPU + CUDA 10.2 + cuDNN 8.1.1 + TensorRT 7.2.3.4, tested with FP16.
For more information about key information extraction models in PaddleOCR, please refer to the [KIE model zoo](../../doc/doc_ch/algorithm_overview.md).
## 3. Visualization
There are two main solutions to the key information extraction task based on multi-modal models.
(1) Text detection + text recognition + semantic entity recognition (SER)
(2) Text detection + text recognition + semantic entity recognition (SER) + relation extraction (RE)
The following are demo results of the SER and RE tasks. For a detailed introduction to the above solutions, please refer to the [KIE Guide](./how_to_do_kie.md).
### 3.1 SER
Demo results for the SER task are as follows.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539735-37b5c2ef-629d-43fe-9abb-44bb717ef7ee.jpg" width="600">
</div>
**Note:** test pictures are from the [XFUND dataset](https://github.com/doc-analysis/XFUND), the [invoice dataset](https://aistudio.baidu.com/aistudio/datasetdetail/165561) and a synthetic ID card dataset.
Boxes of different colors in the images represent different categories.
The invoice and application form images have three categories: `QUESTION`, `ANSWER` and `HEADER`. The recognized `QUESTION` and `ANSWER` fields can be used for subsequent question-answer relation extraction.
For the ID card image, the model directly identifies key information such as `name`, `gender` and `nationality`, so the subsequent relation extraction step is not needed and a single model completes the key information extraction.
### 3.2 RE
Demo results for the RE task are as follows.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540291-f64e5daf-6d42-4e7c-bbbb-471e3fac4fcc.png" width="600">
</div>
Red boxes are questions and blue boxes are answers. A green line means the two connected boxes form a key-value pair.
## 4. Usage
### 4.1 Prepare for the environment
Use the following commands to install the dependencies for SER and RE key information extraction.
```bash
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
pip install -r ppstructure/kie/requirements.txt
# install the PaddleOCR engine for prediction
pip install paddleocr -U
```
### 4.2 Quick start
Here we use the XFUND dataset to quickly try out the SER and RE models.
#### 4.2.1 Prepare the dataset
```bash
mkdir train_data
cd train_data
# download and uncompress the dataset
wget https://paddleocr.bj.bcebos.com/ppstructure/dataset/XFUND.tar && tar -xf XFUND.tar
cd ..
```
#### 4.2.2 Prediction with dygraph models
First, download the models.
```bash
mkdir pretrain_models
cd pretrain_models
# download and uncompress the SER trained model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar && tar -xf ser_vi_layoutxlm_xfund_pretrained.tar
# download and uncompress the RE trained model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar && tar -xf re_vi_layoutxlm_xfund_pretrained.tar
```
If you want to use the OCR engine to obtain end-to-end prediction results, use the following commands.
```bash
# predict using only the SER trained model
python3 tools/infer_kie_token_ser.py \
-c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./ppstructure/docs/kie/input/zh_val_42.jpg
# predict with the SER and RE trained models in series
python3 ./tools/infer_kie_token_ser_re.py \
-c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/image/zh_val_42.jpg \
-c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o_ser Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy
```
The visualized result images and the predicted text file will be saved in the `Global.save_res_path` directory.
If you want to load annotated text detection and recognition results and only run prediction, use the following commands.
```bash
# predict using only the SER trained model
python3 tools/infer_kie_token_ser.py \
-c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/val.json \
Global.infer_mode=False
# predict with the SER and RE trained models in series
python3 ./tools/infer_kie_token_ser_re.py \
-c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml \
-o Architecture.Backbone.checkpoints=./pretrain_models/re_vi_layoutxlm_xfund_pretrained/best_accuracy \
Global.infer_img=./train_data/XFUND/zh_val/val.json \
Global.infer_mode=False \
-c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml \
-o_ser Architecture.Backbone.checkpoints=./pretrain_models/ser_vi_layoutxlm_xfund_pretrained/best_accuracy
```
#### 4.2.3 Inference with Paddle Inference
At present, only the SER model supports inference with Paddle Inference.
First, download the SER inference model.
```bash
mkdir inference
cd inference
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
```
Run the following command for inference.
```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
--kie_algorithm=LayoutXLM \
--ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
--image_dir=./docs/kie/input/zh_val_42.jpg \
--ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--ocr_order_method="tb-yx"
```
The visualized results are saved in the `output` directory.
### 4.3 More
For training, evaluation and inference of KIE models, please refer to the [KIE tutorial](../../doc/doc_ch/kie.md).
For training, evaluation and inference of text detection models, please refer to the [text detection tutorial](../../doc/doc_ch/detection.md).
For training, evaluation and inference of text recognition models, please refer to the [text recognition tutorial](../../doc/doc_ch/recognition.md).
To complete the key information extraction task in your own scenario, please refer to the [Guide to End-to-end KIE](./how_to_do_kie.md).
## 5. Reference
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
- XFUND dataset, https://github.com/doc-analysis/XFUND
## 6. License
The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
# Key Information Extraction Pipeline
- [1. Introduction](#1-Introduction)
- [1.1 Background](#11-Background)
- [1.2 Mainstream Deep-learning Solutions](#12-Mainstream-Deep-learning-Solutions)
- [2. KIE Pipeline](#2-KIE-Pipeline)
- [2.1 Train OCR Models](#21-Train-OCR-Models)
- [2.2 Train KIE Models](#22-Train-KIE-Models)
- [3. Reference](#3-Reference)
## 1. Introduction
### 1.1 Background
Key information extraction (KIE) refers to extracting key information from text or images. As a downstream task of OCR, KIE for document images has many practical application scenarios, such as form recognition, ticket information extraction, ID card information extraction, etc. However, it is time-consuming and laborious to extract key information from these document images manually. It is challenging but also valuable to combine multi-modal features (visual, layout, text, etc.) to complete KIE tasks.
For document images in a specific scene, the position and layout of the key information are relatively fixed. Therefore, in the early stage of research, many methods based on template matching were used to extract key information. This approach is still widely used in simple scenarios today, but it takes a long time to adjust the template for each new scenario.
KIE in document images generally contains two subtasks, as shown below.
* (1) SER: semantic entity recognition, which classifies each detected text line, for example into categories such as name and ID number, as shown by the red boxes in the following figure.
* (2) RE: relation extraction, which matches questions and answers based on the SER results. In the figure below, the yellow arrows link each question with its answer.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185726510-faba470d-2c79-4784-b8da-6c1aa5af9572.png" width="800">
</div>
### 1.2 Mainstream Deep-learning Solutions
General KIE methods are based on Named Entity Recognition (NER), but such methods only use text information and ignore position and visual feature information, which leads to limited accuracy. In recent years, most researchers have started to combine multi-modal features to improve the accuracy of KIE models. The main approaches are as follows:
* (1) Grid-based methods. These methods mainly fuse multi-modal information at the image level. Most texts are treated at character granularity, and the embedding of text and structure information is simple, as in chargrid [1].
* (2) Token-based methods. These methods follow NLP methods such as BERT, encode position, visual and other features into a multi-modal model, and pre-train on large-scale datasets, so that in downstream tasks only a small amount of annotated data is required to obtain excellent results. Representative algorithms are LayoutLM [2], LayoutLMv2 [3], LayoutXLM [4] and StrucTexT [5].
* (3) GCN-based methods. These methods try to learn the structural information between images and characters in order to extract open-set information (templates not seen in the training set), such as GCN [6] and SDMGR [7].
* (4) End-to-end methods. These methods put the existing OCR text recognition and KIE tasks into a unified network for joint learning, so that the two tasks reinforce each other during training, such as TRIE [8].
For more detailed introduction of the algorithms, please refer to Chapter 6 of [Diving into OCR](https://aistudio.baidu.com/aistudio/education/group/info/25207).
## 2. KIE Pipeline
Token-based methods such as LayoutXLM are implemented in PaddleOCR. Moreover, in PP-Structurev2, we simplify the LayoutXLM model and propose VI-LayoutXLM, in which the visual feature extraction module is removed for speed-up. A text-line sorting strategy that conforms to the human reading order and a UDML knowledge distillation strategy are used for higher model accuracy.
In the non end-to-end approach, KIE needs at least **two steps**: first, an OCR model is used to extract the texts and their positions; second, a KIE model is used to extract the key information according to the image, the text positions and the text contents. A minimal sketch of this two-step flow is shown after this paragraph.
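The sketch below is only an illustration of the two-step flow. Step 1 uses the real `PaddleOCR` whl API; `run_ser` is a hypothetical placeholder, since in this repo that step is actually performed by `tools/infer_kie_token_ser.py` or `ppstructure/kie/predict_kie_token_ser.py` with a trained model.
```python
# A minimal sketch of the two-step pipeline, assuming the paddleocr whl package
# is installed. Step 1 uses the real PaddleOCR API; run_ser() is a hypothetical
# stand-in for the SER model invoked by the scripts mentioned above.
from paddleocr import PaddleOCR

def run_ser(ocr_results):
    """Hypothetical SER step: classify every (box, text) pair into categories
    such as question / answer / other. A real model also uses the image."""
    return [{"box": box, "text": text, "label": "other"} for box, text in ocr_results]

# Step 1: text detection + recognition.
ocr = PaddleOCR(use_angle_cls=False, lang="ch")
raw = ocr.ocr("./train_data/XFUND/zh_val/image/zh_val_42.jpg", cls=False)

# The exact nesting of the return value depends on the paddleocr version;
# here we assume one image whose entries look like [box, (text, score)].
ocr_results = [(item[0], item[1][0]) for item in raw[0]]

# Step 2: key information extraction on top of the OCR output.
ser_results = run_ser(ocr_results)
print(ser_results[:3])
```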
### 2.1 Train OCR Models
#### 2.1.1 Text Detection
**(1) Data**
Most of the models provided in PaddleOCR are general models. During text detection, adjacent text lines are generally separated according to their positional distance. As shown in the figure above, when the general English PP-OCRv3 detection model is used, two fields representing different properties can easily be detected as one. Therefore, it is suggested to first fine-tune a text detection model on your own scenario for the KIE task.
During data annotation, different key information fields need to be separated; otherwise the subsequent KIE task becomes more difficult.
For downstream tasks, generally speaking, `200~300` training images can guarantee a basic training effect. If there is not much prior knowledge, **`200~300`** images can be labeled first for subsequent text detection model training.
**(2) Model**
In terms of model selection, PP-OCRv3 detection model is recommended. For more information about the training methods of the detection model, please refer to: [Text detection tutorial](../../doc/doc_en/detection_en.md) and [PP-OCRv3 detection model tutorial](../../doc/doc_ch/PPOCRv3_det_train.md).
#### 2.1.2 Text recognition
Compared with natural scenes, text recognition in document images is generally easier (the background is not too complex), so **it is suggested** to first try the general PP-OCRv3 text recognition model provided in PaddleOCR ([PP-OCRv3 model list](../../doc/doc_en/models_list_en.md)).
**(1) Data**
However, there are also challenges in some document scenarios, such as rare characters in ID card scenarios and special fonts in invoices. These problems increase the difficulty of text recognition. If you want to ensure or further improve the model accuracy, it is recommended to fine-tune the PP-OCRv3 model on a text recognition dataset from the specific document scenario.
In the process of model finetuning, it is recommended to prepare at least `5000` vertical scene text recognition images to ensure the basic model fine-tuning effect. If you want to improve the accuracy and generalization ability of the model, you can synthesize more text recognition images similar to the scene, collect general real text recognition data from the public data set, and add them to the text recognition training process. In the training process, it is suggested that the ratio of real data, synthetic data and general data of each epoch should be around `1:1:1`, which can be controlled by setting the sampling ratio of different data sources. If there are 3 training text files, including 10k, 20k and 50k pieces of data respectively, the data can be set in the configuration file as follows:
```yml
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/
label_file_list:
- ./train_data/train_list_10k.txt
- ./train_data/train_list_20k.txt
- ./train_data/train_list_50k.txt
ratio_list: [1.0, 0.5, 0.2]
...
```
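With these settings, each epoch samples roughly 10k × 1.0, 20k × 0.5 and 50k × 0.2 ≈ 10k images from each file, which gives the suggested 1:1:1 ratio of real, synthetic and general data.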
**(2) Model**
In terms of model selection, PP-OCRv3 recognition model is recommended. For more information about the training methods of the recognition model, please refer to: [Text recognition tutorial](../../doc/doc_en/recognition_en.md) and [PP-OCRv3 model list](../../doc/doc_en/models_list_en.md).
### 2.2 Train KIE Models
There are two main methods to extract the key information from the recognized texts.
(1) Directly use the SER model to obtain the key information categories. For example, in the ID card scenario, we mark "Name" and "Geoff Sample" as "name_key" and "name_value", respectively. The **text field** finally identified as the category "name_value" is the key information we need (see the sketch after these two options).
(2) Use the SER and RE models jointly. In this case, we first use the SER model to obtain all questions (keys) and answers (values) in the image text, and then use the RE model to match keys with values, thereby completing the extraction of key information.
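As a concrete illustration of option (1), the snippet below filters a hypothetical list of SER results by label suffix; the `*_key` / `*_value` category names follow the ID-card example above, and the sample texts are invented.
```python
# A minimal sketch of option (1): pulling key fields directly out of SER output.
# ser_results is a made-up list of (text, label) pairs for illustration only.
ser_results = [
    ("Name", "name_key"),
    ("Geoff Sample", "name_value"),
    ("Some other text", "other"),
]

extracted = {
    label[: -len("_value")]: text
    for text, label in ser_results
    if label.endswith("_value")
}
print(extracted)  # {'name': 'Geoff Sample'}
```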
#### 2.2.1 SER
Take the ID card scenario as an example. The key information generally includes `name`, `DOB`, etc. We can directly mark the corresponding fields as specific categories, as shown in the following figure.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185728456-dc396f47-0880-4279-9c7c-c99601bf16a7.png" width="500">
</div>
**Note:**
- In the labeling process, text content that is irrelevant to the KIE task should be labeled as `other`, which is equivalent to background information. For example, in the ID card scenario, if we do not care about the `DOB` information, we can mark the categories of the `DOB` and `Area manager` fields as `other`.
- During annotation, label the position of each **text line** rather than individual characters.
In terms of data, generally speaking, for relatively fixed scenes, **50** training images can achieve acceptable results. You can refer to [PPOCRLabel](../../PPOCRLabel/README.md) to finish the labeling process.
In terms of model, it is recommended to use the VI-LayoutXLM model proposed in PP-Structurev2. It improves on the LayoutXLM model by removing the visual feature extraction module, which further improves inference speed without a significant reduction in accuracy. For more tutorials, please refer to the [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and the [KIE tutorial](../../doc/doc_en/kie_en.md).
#### 2.2.2 SER + RE
The SER model is mainly used to identify all keys and values in the document image, and the RE model is mainly used to match all keys and values.
Taking the ID card scenario as an example, the key information generally includes fields such as `name` and `DOB`. In the SER stage, we need to identify all questions (keys) and answers (values). A demo annotation is shown below: all keys can be annotated as `question` and all values as `answer`.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185728881-b6055e01-034c-4584-aaa6-97c9c25fb61b.png" width="500">
</div>
In the RE stage, the ID and connection information of each field need to be marked, as shown in the following figure.
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185728948-a4208013-5038-4025-9a93-0c6d51447488.png" width="500">
</div>
For each text line, you need to add `id` and `linking` fields. The `id` records the unique identifier of the text line; different text lines in the same image must not share an id. The `linking` field is a list that records the connections between different texts. If the id of the field "Name" is 0 and the id of the field "Geoff Sample" is 1, then both carry the `linking` mark `[[0, 1]]`, indicating that the fields with `id=0` and `id=1` form a key-value pair (fields such as DOB and Expires are handled similarly and are not repeated here).
**Note:**
- During annotation, if a value spans multiple text lines, a key-value pair can be added to `linking` for each line, such as `[[0, 1], [0, 2]]`.
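For illustration, the snippet below builds one label-file line in the XFUND-style format used by PaddleOCR's KIE tools (`transcription`, `points`, `label`, `id`, `linking`); the coordinates and texts are invented, so double-check the field names against your own exported label files.
```python
# A made-up example of the id / linking annotation for a single image.
import json

annotations = [
    {"transcription": "Name", "points": [[10, 10], [80, 10], [80, 30], [10, 30]],
     "label": "question", "id": 0, "linking": [[0, 1]]},
    {"transcription": "Geoff Sample", "points": [[90, 10], [220, 10], [220, 30], [90, 30]],
     "label": "answer", "id": 1, "linking": [[0, 1]]},
]

# One line of the label file: "<image name>\t<json list of textline annotations>"
print("id_card_demo.jpg\t" + json.dumps(annotations, ensure_ascii=False))
```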
In terms of data, generally speaking, for relatively fixed scenes, about **50** training images can achieve acceptable effects.
In terms of model, it is recommended to use the VI-LayoutXLM model proposed in PP-Structurev2. It improves on the LayoutXLM model by removing the visual feature extraction module, which further improves inference speed without a significant reduction in accuracy. For more tutorials, please refer to the [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and the [KIE tutorial](../../doc/doc_en/kie_en.md).
## 3. Reference
[1] Katti A R, Reisswig C, Guder C, et al. Chargrid: Towards understanding 2d documents[J]. arXiv preprint arXiv:1809.08799, 2018.
[2] Xu Y, Li M, Cui L, et al. Layoutlm: Pre-training of text and layout for document image understanding[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 1192-1200.
[3] Xu Y, Xu Y, Lv T, et al. LayoutLMv2: Multi-modal pre-training for visually-rich document understanding[J]. arXiv preprint arXiv:2012.14740, 2020.
[4] Xu Y, Lv T, Cui L, et al. Layoutxlm: Multimodal pre-training for multilingual visually-rich document understanding[J]. arXiv preprint arXiv:2104.08836, 2021.
[5] Li Y, Qian Y, Yu Y, et al. StrucTexT: Structured Text Understanding with Multi-Modal Transformers[C]//Proceedings of the 29th ACM International Conference on Multimedia. 2021: 1912-1920.
[6] Liu X, Gao F, Zhang Q, et al. Graph convolution for multimodal information extraction from visually rich documents[J]. arXiv preprint arXiv:1903.11279, 2019.
[7] Sun H, Kuang Z, Yue X, et al. Spatial Dual-Modality Graph Reasoning for Key Information Extraction[J]. arXiv preprint arXiv:2103.14470, 2021.
[8] Zhang P, Xu Y, Cheng Z, et al. Trie: End-to-end text reading and information extraction for document understanding[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1413-1422.
...@@ -30,7 +30,7 @@ from ppocr.data import create_operators, transform ...@@ -30,7 +30,7 @@ from ppocr.data import create_operators, transform
from ppocr.postprocess import build_post_process from ppocr.postprocess import build_post_process
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
from ppocr.utils.visual import draw_ser_results from ppocr.utils.visual import draw_ser_results
from ppocr.utils.utility import get_image_file_list, check_and_read_gif from ppocr.utils.utility import get_image_file_list, check_and_read
from ppstructure.utility import parse_args from ppstructure.utility import parse_args
from paddleocr import PaddleOCR from paddleocr import PaddleOCR
...@@ -49,7 +49,7 @@ class SerPredictor(object): ...@@ -49,7 +49,7 @@ class SerPredictor(object):
pre_process_list = [{ pre_process_list = [{
'VQATokenLabelEncode': { 'VQATokenLabelEncode': {
'algorithm': args.vqa_algorithm, 'algorithm': args.kie_algorithm,
'class_path': args.ser_dict_path, 'class_path': args.ser_dict_path,
'contains_re': False, 'contains_re': False,
'ocr_engine': self.ocr_engine, 'ocr_engine': self.ocr_engine,
...@@ -138,7 +138,7 @@ def main(args): ...@@ -138,7 +138,7 @@ def main(args):
os.path.join(args.output, 'infer.txt'), mode='w', os.path.join(args.output, 'infer.txt'), mode='w',
encoding='utf-8') as f_w: encoding='utf-8') as f_w:
for image_file in image_file_list: for image_file in image_file_list:
img, flag = check_and_read_gif(image_file) img, flag, _ = check_and_read(image_file)
if not flag: if not flag:
img = cv2.imread(image_file) img = cv2.imread(image_file)
img = img[:, :, ::-1] img = img[:, :, ::-1]
......
sentencepiece sentencepiece
yacs yacs
seqeval seqeval
paddlenlp>=2.2.1 git+https://github.com/PaddlePaddle/PaddleNLP
pypandoc pypandoc
attrdict attrdict
python_docx python_docx
...@@ -20,7 +20,7 @@ from shapely.geometry import Polygon ...@@ -20,7 +20,7 @@ from shapely.geometry import Polygon
import numpy as np import numpy as np
from collections import defaultdict from collections import defaultdict
import operator import operator
import Levenshtein from rapidfuzz.distance import Levenshtein
import argparse import argparse
import json import json
import copy import copy
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
...@@ -28,12 +28,13 @@ import tools.infer.utility as utility ...@@ -28,12 +28,13 @@ import tools.infer.utility as utility
from ppocr.data import create_operators, transform from ppocr.data import create_operators, transform
from ppocr.postprocess import build_post_process from ppocr.postprocess import build_post_process
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
from ppocr.utils.utility import get_image_file_list, check_and_read_gif from ppocr.utils.utility import get_image_file_list, check_and_read
from ppstructure.utility import parse_args from ppstructure.utility import parse_args
from picodet_postprocess import PicoDetPostProcess from picodet_postprocess import PicoDetPostProcess
logger = get_logger() logger = get_logger()
class LayoutPredictor(object): class LayoutPredictor(object):
def __init__(self, args): def __init__(self, args):
pre_process_list = [{ pre_process_list = [{
...@@ -109,7 +110,7 @@ def main(args): ...@@ -109,7 +110,7 @@ def main(args):
repeats = 50 repeats = 50
for image_file in image_file_list: for image_file in image_file_list:
img, flag = check_and_read_gif(image_file) img, flag, _ = check_and_read(image_file)
if not flag: if not flag:
img = cv2.imread(image_file) img = cv2.imread(image_file)
if img is None: if img is None:
......
...@@ -6,6 +6,8 @@ English | [简体中文](README_ch.md) ...@@ -6,6 +6,8 @@ English | [简体中文](README_ch.md)
- [2.1 Installation dependencies](#2.1) - [2.1 Installation dependencies](#2.1)
- [2.2 Install PaddleOCR](#2.2) - [2.2 Install PaddleOCR](#2.2)
- [3. Quick Start](#3) - [3. Quick Start](#3)
- [3.1 Download models](#3.1)
- [3.2 Layout recovery](#3.2)
<a name="1"></a> <a name="1"></a>
...@@ -17,8 +19,9 @@ Layout recovery combines [layout analysis](../layout/README.md)、[table recogni ...@@ -17,8 +19,9 @@ Layout recovery combines [layout analysis](../layout/README.md)、[table recogni
The following figure shows the result: The following figure shows the result:
<div align="center"> <div align="center">
<img src="../docs/table/recovery.jpg" width = "700" /> <img src="../docs/recovery/recovery.jpg" width = "700" />
</div> </div>
<a name="2"></a> <a name="2"></a>
## 2. Install ## 2. Install
...@@ -33,14 +36,14 @@ The following figure shows the result: ...@@ -33,14 +36,14 @@ The following figure shows the result:
python3 -m pip install --upgrade pip python3 -m pip install --upgrade pip
# GPU installation # GPU installation
python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
# CPU installation # CPU installation
python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
```` ````
For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick). For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
<a name="2.2"></a> <a name="2.2"></a>
...@@ -67,20 +70,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt ...@@ -67,20 +70,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
## 3. Quick Start ## 3. Quick Start
```python <a name="3.1"></a>
### 3.1 Download models
If input is English document, download English models:
```bash
cd PaddleOCR/ppstructure cd PaddleOCR/ppstructure
# download model # download model
mkdir inference && cd inference mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it # Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it # Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table inch model and unzip it # Download the ultra-lightweight English table inch model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of publaynet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
cd .. cd ..
# run ```
python3 predict_system.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_PP-OCRv3_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --rec_char_dict_path=../ppocr/utils/en_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output ./output/table --rec_image_shape=3,48,320 --vis_font_path=../doc/fonts/simfang.ttf --recovery=True --image_dir=./docs/table/1.png If the input is a Chinese document, download the Chinese models:
[Chinese and English ultra-lightweight PP-OCRv3 model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README.md#pp-ocr-series-model-listupdate-on-september-8th), [table recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型), [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
<a name="3.2"></a>
### 3.2 Layout recovery
```bash
python3 predict_system.py \
--image_dir=./docs/table/1.png \
--det_model_dir=inference/en_PP-OCRv3_det_infer \
--rec_model_dir=inference/en_PP-OCRv3_rec_infer \
--rec_char_dict_path=../ppocr/utils/en_dict.txt \
--table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
--layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--recovery=True \
--save_pdf=False \
--output=../output/
``` ```
After running, the docx of each picture will be saved in the directory specified by the output field After running, the docx of each picture will be saved in the directory specified by the output field
Parameters:
- image_dir: test file, which can be a picture, a picture directory, a pdf file or a pdf file directory
- det_model_dir: OCR detection model path
- rec_model_dir: OCR recognition model path
- rec_char_dict_path: OCR recognition dictionary path. If the Chinese model is used, change it to "../ppocr/utils/ppocr_keys_v1.txt"; if you trained the model on your own dataset, change it to your trained dictionary
- table_model_dir: table recognition model path
- table_char_dict_path: table recognition dictionary path. If the Chinese model is used, no change is needed
- layout_model_dir: layout analysis model path
- layout_dict_path: layout analysis dictionary path. If the Chinese model is used, change it to "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
- recovery: whether to enable layout recovery, default False
- save_pdf: whether to also save a pdf file when recovering the layout, default False
- output: path to save the recovery results
opencv-contrib-python==4.4.0.46
pypandoc pypandoc
python-docx python-docx
docx2pdf
fitz
PyMuPDF
\ No newline at end of file