Commit 21bf8918 authored by qq_25193841

Merge remote-tracking branch 'origin/release/2.6' into release2.6

@@ -26,8 +26,15 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
 </div>
 ## 📣 Recent updates
+- 💥 **Live Preview: Oct 24 - Oct 26, China Standard Time, 20:30**, Engineers@PaddleOCR will walk through the PP-StructureV2 optimization strategy over three days.
+    - Scan the QR code below using WeChat, follow the PaddlePaddle official account, and fill out the questionnaire to join the WeChat group, where you can get the live-stream link and 20G of OCR learning materials (including a PDF2Word application, 10 models for vertical scenarios, etc.)
+<div align="center">
+<img src="https://user-images.githubusercontent.com/50011306/196944258-0eb82df1-d730-4b96-a350-c1d370fdc2b1.jpg" width="150" height="150" />
+</div>
 - **🔥2022.8.24 Release PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
-    - Release [PP-Structurev2](./ppstructure/),with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
+    - Release [PP-StructureV2](./ppstructure/), with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
     - [Layout Analysis](./ppstructure/layout) optimization: model storage reduced by 95%, while speed increased by 11 times, and the average CPU time-cost is only 41ms;
     - [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
     - [Key Information Extraction](./ppstructure/kie) optimization: a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
@@ -183,7 +190,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 </details>
 <details open>
-<summary>PP-Structurev2</summary>
+<summary>PP-StructureV2</summary>
 - layout analysis + table recognition
 <div align="center">
@@ -192,7 +199,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 - SER (Semantic entity recognition)
 <div align="center">
-<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
+<img src="https://user-images.githubusercontent.com/14270174/197464552-69de557f-edff-4c7f-acbf-069df1ba097f.png" width="600">
 </div>
 <div align="center">
......
@@ -27,9 +27,14 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
 ## 📣 近期更新
+- **💥 直播预告:10.24-10.26日每晚8点半**,PaddleOCR研发团队详解PP-StructureV2优化策略。微信扫描下方二维码,关注公众号并填写问卷后进入官方交流群,获取直播链接与20G重磅OCR学习大礼包(内含PDF转Word应用程序、10种垂类模型、《动手学OCR》电子书等)
+<div align="center">
+<img src="https://user-images.githubusercontent.com/50011306/196944258-0eb82df1-d730-4b96-a350-c1d370fdc2b1.jpg" width="150" height="150" />
+</div>
 - **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
-    - 发布[PP-Structurev2](./ppstructure/README_ch.md),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery/README_ch.md),支持**一行命令完成PDF转Word**;
+    - 发布[PP-StructureV2](./ppstructure/README_ch.md),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery/README_ch.md),支持**一行命令完成PDF转Word**;
     - [版面分析](./ppstructure/layout/README_ch.md)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
     - [表格识别](./ppstructure/table/README_ch.md)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
     - [关键信息抽取](./ppstructure/kie/README_ch.md)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
@@ -38,11 +43,13 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
 - 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。
 - **2022.8 新增实现[8种前沿算法](doc/doc_ch/algorithm_overview.md)**
     - 文本检测:[FCENet](doc/doc_ch/algorithm_det_fcenet.md), [DB++](doc/doc_ch/algorithm_det_db.md)
     - 文本识别:[ViTSTR](doc/doc_ch/algorithm_rec_vitstr.md), [ABINet](doc/doc_ch/algorithm_rec_abinet.md), [VisionLAN](doc/doc_ch/algorithm_rec_visionlan.md), [SPIN](doc/doc_ch/algorithm_rec_spin.md), [RobustScanner](doc/doc_ch/algorithm_rec_robustscanner.md)
     - 表格识别:[TableMaster](doc/doc_ch/algorithm_table_master.md)
 - **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
     - 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
     - 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
@@ -230,7 +237,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
 </div>
 <div align="center">
-<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
+<img src="https://user-images.githubusercontent.com/14270174/197464552-69de557f-edff-4c7f-acbf-069df1ba097f.png" width="600">
 </div>
 - RE(关系提取)
......
@@ -58,8 +58,8 @@ PaddleOCR场景应用覆盖通用,制造、金融、交通行业的主要OCR
 | 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
 | ----------------- | ------------------------------ | -------------- | ----------------------------------- | ------------------------------------------------------------ |
 | 车牌识别 | 多角度图像、轻量模型、端侧部署 | [模型下载](#2) | [中文](./轻量级车牌识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7" width="200" height="100" /> |
 | 驾驶证/行驶证识别 | 请期待 | | | |
 | 快递单识别 | 请期待 | | | |
 <a name="2"></a>
......
@@ -158,7 +158,7 @@ build/paddle_inference_install_dir/
 <a name="21"></a>
 ### 2.1 Export the inference model
-* You can refer to [Model inference](../../doc/doc_ch/inference.md) and export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
+* You can refer to [Model inference](../../doc/doc_en/inference_en.md) and export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
 ```
 inference/
......
@@ -42,7 +42,7 @@ The introduction and tutorial of Paddle Serving service deployment framework ref
 PaddleOCR operating environment and Paddle Serving operating environment are needed.
-1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md).
+1. Please prepare the PaddleOCR operating environment with reference to this [link](../../doc/doc_en/installation_en.md).
    Download the corresponding paddlepaddle whl package according to the environment; it is recommended to install version 2.2.2.
 2. The steps to prepare the PaddleServing operating environment are as follows:
......
@@ -2,13 +2,13 @@
 # PP-OCRv3 文本检测模型训练
 - [1. 简介](#1)
-- [2. PPOCRv3检测训练](#2)
-- [3. 基于PPOCRv3检测的finetune训练](#3)
+- [2. PP-OCRv3检测训练](#2)
+- [3. 基于PP-OCRv3检测的finetune训练](#3)
 <a name="1"></a>
 ## 1. 简介
-PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PPOCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)
+PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PP-OCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)
 <a name="2"></a>
@@ -145,19 +145,19 @@ paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
 <a name="3"></a>
-## 3. 基于PPOCRv3检测finetune训练
+## 3. 基于PP-OCRv3检测finetune训练
-本节介绍如何使用PPOCRv3检测模型在其他场景上的finetune训练。
+本节介绍如何使用PP-OCRv3检测模型在其他场景上的finetune训练。
 finetune训练适用于三种场景:
-- 基于CML蒸馏方法的finetune训练,适用于教师模型在使用场景上精度高于PPOCRv3检测模型,且希望得到一个轻量检测模型。
-- 基于PPOCRv3轻量检测模型的finetune训练,无需训练教师模型,希望在PPOCRv3检测模型基础上提升使用场景上的精度。
+- 基于CML蒸馏方法的finetune训练,适用于教师模型在使用场景上精度高于PP-OCRv3检测模型,且希望得到一个轻量检测模型。
+- 基于PP-OCRv3轻量检测模型的finetune训练,无需训练教师模型,希望在PP-OCRv3检测模型基础上提升使用场景上的精度。
 - 基于DML蒸馏方法的finetune训练,适用于采用DML方法进一步提升精度的场景。
 **基于CML蒸馏方法的finetune训练**
-下载PPOCRv3训练模型:
+下载PP-OCRv3训练模型:
 ```
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
 tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -177,10 +177,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
     Global.save_model_dir=./output/
 ```
-**基于PPOCRv3轻量检测模型的finetune训练**
+**基于PP-OCRv3轻量检测模型的finetune训练**
-下载PPOCRv3训练模型,并提取Student结构的模型参数:
+下载PP-OCRv3训练模型,并提取Student结构的模型参数:
 ```
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
 tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -248,5 +248,3 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
     Architecture.Models.Student2.pretrained=./teacher \
     Global.save_model_dir=./output/
 ```
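For reference, the Student-parameter extraction mentioned above is visible in one hunk header (`paddle.save(s_params, "./pretrain_models/cml_student.pdparams")`); the step amounts to filtering the CML distillation checkpoint by key prefix. A minimal sketch, assuming the `ch_PP-OCRv3_det_distill_train` checkpoint downloaded above; the exact filtering in the collapsed portion of the doc may differ:

```python
import paddle

# CML distillation checkpoint: teacher and student weights share one state dict
all_params = paddle.load("./ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")

# keep only the Student sub-model's weights, stripping the "Student." prefix
s_params = {
    key[len("Student."):]: value
    for key, value in all_params.items() if key.startswith("Student.")
}
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
```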
@@ -99,9 +99,9 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
 |SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) |
 |ViTSTR|ViTSTR| 79.82% | rec_vitstr_none_ce | [训练模型](https://paddleocr.bj.bcebos.com/rec_vitstr_none_ce_train.tar) |
 |ABINet|Resnet45| 90.75% | rec_r45_abinet | [训练模型](https://paddleocr.bj.bcebos.com/rec_r45_abinet_train.tar) |
-|VisionLAN|Resnet45| 90.30% | rec_r45_visionlan | [训练模型](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar) |
+|VisionLAN|Resnet45| 90.30% | rec_r45_visionlan | [训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar) |
-|SPIN|ResNet32| 90.00% | rec_r32_gaspin_bilstm_att | coming soon |
+|SPIN|ResNet32| 90.00% | rec_r32_gaspin_bilstm_att | [训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar) |
-|RobustScanner|ResNet31| 87.77% | rec_r31_robustscanner | coming soon |
+|RobustScanner|ResNet31| 87.77% | rec_r31_robustscanner | [训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_r31_robustscanner.tar) |
 <a name="2"></a>
......
@@ -27,7 +27,7 @@
 |模型|骨干网络|配置文件|Acc|下载链接|
 | --- | --- | --- | --- | --- |
-|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar)|
+|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)|
 <a name="2"></a>
 ## 2. 环境配置
@@ -80,7 +80,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_r45_visionlan.yml -o Global.infer_
 <a name="4-1"></a>
 ### 4.1 Python推理
-首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar)),可以使用如下命令进行转换:
+首先将训练得到的best模型转换成inference model。这里以训练完成的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)),可以使用如下命令进行转换:
 ```shell
 # 注意将pretrained_model的路径设置为本地路径。
@@ -139,7 +139,7 @@ Predicts of ./doc/imgs_words/en/word_2.png:('yourself', 0.9999493)
 ## 5. FAQ
 1. MJSynth和SynthText两种数据集来自于[VisionLAN源repo](https://github.com/wangyuxin87/VisionLAN)
-2. 我们使用VisionLAN作者提供的预训练模型进行finetune训练。
+2. 我们使用VisionLAN作者提供的预训练模型进行finetune训练,预训练模型配套字典为`ppocr/utils/ic15_dict.txt`。
 ## 引用
......
@@ -100,6 +100,10 @@ PaddleOCR提供的配置文件是在8卡训练(相当于总的batch size是`8*
 * 数据分布:建议分布与实测场景尽量一致。如果实测场景包含大量短文本,则训练数据中建议也包含较多短文本;如果实测场景对于空格识别效果要求较高,则训练数据中建议也包含较多带空格的文本内容。
+* 数据合成:针对部分字符识别有误的情况,建议获取一批特定字符数据,加入到原数据中使用小学习率微调。其中原始数据与新增数据比例可尝试 10:1 ~ 5:1,避免单一场景数据过多导致模型过拟合,同时尽量平衡语料词频,确保常用字的出现频率不会过低。特定字符生成可以使用 TextRenderer 工具,合成例子可参考[数码管数据合成](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/applications/%E5%85%89%E5%8A%9F%E7%8E%87%E8%AE%A1%E6%95%B0%E7%A0%81%E7%AE%A1%E5%AD%97%E7%AC%A6%E8%AF%86%E5%88%AB/%E5%85%89%E5%8A%9F%E7%8E%87%E8%AE%A1%E6%95%B0%E7%A0%81%E7%AE%A1%E5%AD%97%E7%AC%A6%E8%AF%86%E5%88%AB.md#31-%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87),合成数据语料尽量来自真实使用场景,在贴近真实场景的基础上保持字体、背景的丰富性,有助于提升模型效果。
 * 通用中英文数据:在训练的时候,可以在训练集中添加通用真实数据(如在不更换字典的微调场景中,建议添加LSVT、RCTW、MTWI等真实数据),进一步提升模型的泛化性能。
@@ -168,3 +172,8 @@ Train:
       - general.txt
     ratio_list: [1.0, 0.1]
 ```
+### 3.4 训练调优
+训练过程并非一蹴而就的。完成一个阶段的训练评估后,建议收集分析当前模型在真实场景中的badcase,有针对性地调整训练数据比例,或者进一步新增合成数据。
+通过多次迭代训练,不断优化模型效果。
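The `ratio_list` in the snippet above controls how much of each label file is sampled per epoch (here: all of the first label file, 10% of `general.txt`). A toy illustration of the sampling semantics in Python; this is not PaddleOCR's actual sampler, just the idea:

```python
import random

def sample_epoch(label_lists, ratio_list):
    """Keep each source's lines with probability equal to its ratio, then shuffle."""
    epoch = []
    for lines, ratio in zip(label_lists, ratio_list):
        epoch.extend(line for line in lines if random.random() < ratio)
    random.shuffle(epoch)
    return epoch

specific = [f"specific_{i}" for i in range(200)]   # stands in for the first label file
general = [f"general_{i}" for i in range(5000)]    # stands in for general.txt
print(len(sample_epoch([specific, general], [1.0, 0.1])))  # roughly 200 + 500
```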
@@ -87,9 +87,9 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.9956803321838379)
 ```
 # 下载英文数字识别模型:
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
-tar xf en_PP-OCRv3_det_infer.tar
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./en_PP-OCRv3_det_infer/" --rec_char_dict_path="ppocr/utils/en_dict.txt"
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar
+tar xf en_PP-OCRv3_rec_infer.tar
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./en_PP-OCRv3_rec_infer/" --rec_char_dict_path="ppocr/utils/en_dict.txt"
 ```
 ![](../imgs_words/en/word_1.png)
......
@@ -96,9 +96,9 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |SVTR|SVTR-Tiny| 89.25% | rec_svtr_tiny_none_ctc_en | [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) |
 |ViTSTR|ViTSTR| 79.82% | rec_vitstr_none_ce | [trained model](https://paddleocr.bj.bcebos.com/rec_vitstr_none_none_train.tar) |
 |ABINet|Resnet45| 90.75% | rec_r45_abinet | [trained model](https://paddleocr.bj.bcebos.com/rec_r45_abinet_train.tar) |
-|VisionLAN|Resnet45| 90.30% | rec_r45_visionlan | [trained model](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar) |
+|VisionLAN|Resnet45| 90.30% | rec_r45_visionlan | [trained model](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar) |
-|SPIN|ResNet32| 90.00% | rec_r32_gaspin_bilstm_att | coming soon |
+|SPIN|ResNet32| 90.00% | rec_r32_gaspin_bilstm_att | [trained model](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar) |
-|RobustScanner|ResNet31| 87.77% | rec_r31_robustscanner | coming soon |
+|RobustScanner|ResNet31| 87.77% | rec_r31_robustscanner | [trained model](https://paddleocr.bj.bcebos.com/contribution/rec_r31_robustscanner.tar) |
 <a name="2"></a>
......
@@ -25,7 +25,7 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
 |Model|Backbone|config|Acc|Download link|
 | --- | --- | --- | --- | --- |
-|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[pretrained & trained model](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar)|
+|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[pretrained & trained model](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)|
 <a name="2"></a>
 ## 2. Environment
@@ -68,7 +68,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_r45_visionlan.yml -o Global.infer_
 <a name="4-1"></a>
 ### 4.1 Python Inference
-First, the model saved during the VisionLAN text recognition training process is converted into an inference model ([model download link](https://paddleocr.bj.bcebos.com/rec_r45_visionlan_train.tar)). You can use the following command to convert:
+First, the model saved during the VisionLAN text recognition training process is converted into an inference model ([model download link](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)). You can use the following command to convert:
 ```
 python3 tools/export_model.py -c configs/rec/rec_r45_visionlan.yml -o Global.pretrained_model=./rec_r45_visionlan_train/best_accuracy Global.save_inference_dir=./inference/rec_r45_visionlan/
@@ -120,7 +120,7 @@ Not supported
 ## 5. FAQ
 1. Note that the MJSynth and SynthText datasets come from [VisionLAN repo](https://github.com/wangyuxin87/VisionLAN).
-2. We use the pre-trained model provided by the VisionLAN authors for finetune training.
+2. We use the pre-trained model provided by the VisionLAN authors for finetune training. The dictionary for the pre-trained model is `ppocr/utils/ic15_dict.txt`.
 ## Citation
......
@@ -64,7 +64,7 @@ For more details, please refer to [PP-OCRv3 technical report](https://arxiv.org/
 For the performance comparison between PP-OCR series models, please check the [benchmark](./benchmark_en.md) documentation.
 <a name="4"></a>
-## 4. Visualization [more](./visualization.md)
+## 4. Visualization [more](./visualization_en.md)
 <details open>
 <summary>PP-OCRv3 Chinese model</summary>
......
@@ -26,7 +26,7 @@
 ### 1.1 DataSet Preparation
-To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets.md).
+To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
 PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
......
@@ -47,7 +47,7 @@ __all__ = [
 ]
 SUPPORT_DET_MODEL = ['DB']
-VERSION = '2.6.0.1'
+VERSION = '2.6.0.3'
 SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet']
 BASE_DIR = os.path.expanduser("~/.paddleocr/")
......
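The `VERSION` string above is what the pip package reports, so this hunk bumps the published version from 2.6.0.1 to 2.6.0.3. For context, a minimal sketch of the package's Python entry point; the image path is illustrative, and the nested result layout is what the 2.6 series returns (one list per input image):

```python
from paddleocr import PaddleOCR

# model files are downloaded into BASE_DIR (~/.paddleocr/) on first use
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('./doc/imgs_words/en/word_1.png', cls=True)
for box, (text, score) in result[0]:
    print(box, text, score)
```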
@@ -1344,8 +1344,6 @@ class VLLabelEncode(BaseRecLabelEncode):
                  **kwargs):
         super(VLLabelEncode, self).__init__(
             max_text_length, character_dict_path, use_space_char, lower)
-        self.character = self.character[10:] + self.character[
-            1:10] + [self.character[0]]
         self.dict = {}
         for i, char in enumerate(self.character):
             self.dict[char] = i
......
@@ -26,6 +26,7 @@ class BaseRecLabelDecode(object):
         self.end_str = "eos"
         self.reverse = False
         self.character_str = []
+
         if character_dict_path is None:
             self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
             dict_character = list(self.character_str)
@@ -720,8 +721,6 @@ class VLLabelDecode(BaseRecLabelDecode):
         super(VLLabelDecode, self).__init__(character_dict_path, use_space_char)
         self.max_text_length = kwargs.get('max_text_length', 25)
         self.nclass = len(self.character) + 1
-        self.character = self.character[10:] + self.character[
-            1:10] + [self.character[0]]

     def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
         """ convert text-index into text-label. """
......
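The two deletions above (in `VLLabelEncode` and `VLLabelDecode`) drop the same character-list permutation, so the VisionLAN encode and decode paths now use the dictionary order unchanged. A toy illustration of what the removed expression used to do:

```python
# toy illustration of the removed reordering; not part of the current code
character = list("0123456789abcde")
reordered = character[10:] + character[1:10] + [character[0]]
print(reordered)  # ['a', 'b', 'c', 'd', 'e', '1', '2', ..., '9', '0']
```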
@@ -51,20 +51,22 @@ def draw_ser_results(image,
         bbox = trans_poly_to_bbox(ocr_info["points"])
         draw_box_txt(bbox, text, draw, font, font_size, color)

-    img_new = Image.blend(image, img_new, 0.5)
+    img_new = Image.blend(image, img_new, 0.7)
     return np.array(img_new)


 def draw_box_txt(bbox, text, draw, font, font_size, color):
     # draw ocr results outline
     bbox = ((bbox[0], bbox[1]), (bbox[2], bbox[3]))
     draw.rectangle(bbox, fill=color)

     # draw ocr results
-    start_y = max(0, bbox[0][1] - font_size)
     tw = font.getsize(text)[0]
+    th = font.getsize(text)[1]
+    start_y = max(0, bbox[0][1] - th)
     draw.rectangle(
-        [(bbox[0][0] + 1, start_y), (bbox[0][0] + tw + 1, start_y + font_size)],
+        [(bbox[0][0] + 1, start_y), (bbox[0][0] + tw + 1, start_y + th)],
         fill=(0, 0, 255))
     draw.text((bbox[0][0] + 1, start_y), text, fill=(255, 255, 255), font=font)
......
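The `draw_box_txt` change above sizes the label band by the measured text height `th` instead of the nominal `font_size`, so the white text no longer overflows its blue background when the rendered height differs from the requested size. A standalone sketch of the corrected placement, assuming a Pillow version that still provides `font.getsize` (it was removed in Pillow 10):

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new('RGB', (400, 120), 'white')
draw = ImageDraw.Draw(img)
font = ImageFont.load_default()
text, bbox = 'QUESTION: name', ((60, 40), (340, 90))

draw.rectangle(bbox, outline=(0, 0, 255))   # the detected region
tw, th = font.getsize(text)                 # measured text width and height
start_y = max(0, bbox[0][1] - th)           # label band sized by th, not font_size
draw.rectangle([(bbox[0][0] + 1, start_y),
                (bbox[0][0] + tw + 1, start_y + th)], fill=(0, 0, 255))
draw.text((bbox[0][0] + 1, start_y), text, fill=(255, 255, 255), font=font)
img.save('ser_label_demo.png')
```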
@@ -15,15 +15,15 @@ English | [简体中文](README_ch.md)
 PP-Structure is an intelligent document analysis system developed by the PaddleOCR team, which aims to help developers better complete tasks related to document understanding, such as layout analysis and table recognition.
-The pipeline of PP-Structurev2 system is shown below. The document image first passes through the image direction correction module to identify the direction of the entire image and complete the direction correction. Then, two tasks of layout information analysis and key information extraction can be completed.
+The pipeline of the PP-StructureV2 system is shown below. The document image first passes through the image direction correction module to identify the direction of the entire image and complete the direction correction. Then, two tasks of layout information analysis and key information extraction can be completed.
 - In the layout analysis task, the image first goes through the layout analysis model to divide the image into different areas such as text, table, and figure, and then analyze these areas separately. For example, the table area is sent to the table recognition module for structured recognition, and the text area is sent to the OCR engine for text recognition. Finally, the layout recovery module restores it to a word or pdf file with the same layout as the original image;
 - In the key information extraction task, the OCR engine is first used to extract the text content, and then the SER (semantic entity recognition) module obtains the semantic entities in the image, and finally the RE (relationship extraction) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
-<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
+<img src="https://user-images.githubusercontent.com/14270174/195265734-6f4b5a7f-59b1-4fcc-af6d-89afc9bd51e1.jpg" width="100%"/>
-More technical details: 👉 [PP-Structurev2 Technical Report](docs/PP-Structurev2_introduction.md)
+More technical details: 👉 [PP-StructureV2 Technical Report](https://arxiv.org/abs/2210.05391)
-PP-Structurev2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:
+PP-StructureV2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:
 - [Layout Analysis](layout/README.md)
 - [Table Recognition](table/README.md)
@@ -32,7 +32,7 @@ PP-Structurev2 supports independent use or flexible collocation of each module.
 ## 2. Features
-The main features of PP-Structurev2 are as follows:
+The main features of PP-StructureV2 are as follows:
 - Support layout analysis of documents in the form of images/pdfs, which can be divided into areas such as **text, titles, tables, figures, formulas, etc.**;
 - Support common Chinese and English **table detection** tasks;
 - Support structured table recognition, and output the final result to an **Excel file**;
@@ -43,7 +43,7 @@ The main features of PP-Structurev2 are as follows:
 ## 3. Results
-PP-Structurev2 supports the independent use or flexible collocation of each module. For example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.
+PP-StructureV2 supports the independent use or flexible collocation of each module. For example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.
 ### 3.1 Layout analysis and table recognition
@@ -62,7 +62,7 @@ The following figure shows the effect of layout recovery based on the results of
 Different colored boxes in the figure represent different categories.
 <div align="center">
-<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
+<img src="https://user-images.githubusercontent.com/14270174/197464552-69de557f-edff-4c7f-acbf-069df1ba097f.png" width="600">
 </div>
 <div align="center">
@@ -114,4 +114,3 @@ For structural analysis related model downloads, please refer to:
 For OCR related model downloads, please refer to:
 - [PP-OCR Model Zoo](../doc/doc_en/models_list_en.md)
@@ -16,14 +16,15 @@
 PP-Structure是PaddleOCR团队自研的智能文档分析系统,旨在帮助开发者更好的完成版面分析、表格识别等文档理解相关任务。
-PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。
+PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。
 - 版面分析任务中,图像首先经过版面分析模型,将图像划分为文本、表格、图像等不同区域,随后对这些区域分别进行识别,如,将表格区域送入表格识别模块进行结构化识别,将文本区域送入OCR引擎进行文字识别,最后使用版面恢复模块将其恢复为与原始图像布局一致的word或者pdf格式的文件;
 - 关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。
-<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
+<img src="https://user-images.githubusercontent.com/14270174/195265734-6f4b5a7f-59b1-4fcc-af6d-89afc9bd51e1.jpg" width="100%"/>
-更多技术细节:👉 [PP-Structurev2技术报告](docs/PP-Structurev2_introduction.md)
+更多技术细节:👉 PP-StructureV2技术报告 [中文版](docs/PP-StructureV2_introduction.md)、[英文版](https://arxiv.org/abs/2210.05391)
-PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程:
+PP-StructureV2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程:
 - [版面分析](layout/README_ch.md)
 - [表格识别](table/README_ch.md)
@@ -33,7 +34,7 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
 <a name="2"></a>
 ## 2. 特性
-PP-Structurev2的主要特性如下:
+PP-StructureV2的主要特性如下:
 - 支持对图片/pdf形式的文档进行版面分析,可以划分**文字、标题、表格、图片、公式等**区域;
 - 支持通用的中英文**表格检测**任务;
 - 支持表格区域进行结构化识别,最终结果输出**Excel文件**;
@@ -44,7 +45,7 @@ PP-Structurev2的主要特性如下:
 <a name="3"></a>
 ## 3. 效果展示
-PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,这里仅展示几种代表性使用方式的可视化效果。
+PP-StructureV2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,这里仅展示几种代表性使用方式的可视化效果。
 <a name="31"></a>
 ### 3.1 版面分析和表格识别
@@ -77,7 +78,7 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
 </div>
 <div align="center">
-<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
+<img src="https://user-images.githubusercontent.com/14270174/197464552-69de557f-edff-4c7f-acbf-069df1ba097f.png" width="600">
 </div>
 <div align="center">
@@ -119,4 +120,3 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
 OCR相关模型下载可以参考:
 - [PP-OCR 模型库](../doc/doc_ch/models_list.md)
-# PP-Structurev2
+# PP-StructureV2
 ## 目录
@@ -16,11 +16,11 @@
 现实场景中包含大量的文档图像,它们以图片等非结构化形式存储。基于文档图像的结构化分析与信息抽取对于数据的数字化存储以及产业的数字化转型至关重要。基于该考虑,PaddleOCR自研并发布了PP-Structure智能文档分析系统,旨在帮助开发者更好的完成版面分析、表格识别、关键信息抽取等文档理解相关任务。
-近期,PaddleOCR团队针对PP-Structurev1的版面分析、表格识别、关键信息抽取模块,进行了共计8个方面的升级,同时新增整图方向矫正、文档复原等功能,打造出一个全新的、效果更优的文档分析系统:PP-Structurev2。
+近期,PaddleOCR团队针对PP-Structurev1的版面分析、表格识别、关键信息抽取模块,进行了共计8个方面的升级,同时新增整图方向矫正、文档复原等功能,打造出一个全新的、效果更优的文档分析系统:PP-StructureV2。
 ## 2. 简介
-PP-Structurev2在PP-Structurev1的基础上进一步改进,主要有以下3个方面升级:
+PP-StructureV2在PP-Structurev1的基础上进一步改进,主要有以下3个方面升级:
 * **系统功能升级** :新增图像矫正和版面复原模块,图像转word/pdf、关键信息抽取能力全覆盖!
 * **系统性能优化** :
@@ -29,7 +29,7 @@ PP-Structurev2在PP-Structurev1的基础上进一步改进,主要有以下3个
   * 关键信息抽取:设计视觉无关模型结构,语义实体识别精度提升**2.8%**,关系抽取精度提升**9.1%**;
 * **中文场景适配** :完成对版面分析与表格识别的中文场景适配,开源**开箱即用**的中文场景版面结构化模型!
-PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。版面分析任务中,图像首先经过版面分析模型,将图像划分为文本、表格、图像等不同区域,随后对这些区域分别进行识别,如,将表格区域送入表格识别模块进行结构化识别,将文本区域送入OCR引擎进行文字识别,最后使用版面恢复模块将其恢复为与原始图像布局一致的word或者pdf格式的文件;关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。
+PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。版面分析任务中,图像首先经过版面分析模型,将图像划分为文本、表格、图像等不同区域,随后对这些区域分别进行识别,如,将表格区域送入表格识别模块进行结构化识别,将文本区域送入OCR引擎进行文字识别,最后使用版面恢复模块将其恢复为与原始图像布局一致的word或者pdf格式的文件;关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。
 <div align="center">
 <img src="https://user-images.githubusercontent.com/14270174/185939247-57e53254-399c-46c4-a610-da4fa79232f5.png" width="1200">
@@ -62,7 +62,7 @@ PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正
 ## 3. 整图方向矫正
-由于训练集一般以正方向图像为主,旋转过的文档图像直接输入模型会增加识别难度,影响识别效果。PP-Structurev2引入了整图方向矫正模块来判断含文字图像的方向,并将其进行方向调整。
+由于训练集一般以正方向图像为主,旋转过的文档图像直接输入模型会增加识别难度,影响识别效果。PP-StructureV2引入了整图方向矫正模块来判断含文字图像的方向,并将其进行方向调整。
 我们直接调用PaddleClas中提供的文字图像方向分类模型-[PULC_text_image_orientation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/PULC/PULC_text_image_orientation.md),该模型部分数据集图像如下所示。不同于文本行方向分类器,文字图像方向分类模型针对整图进行方向判别。文字图像方向分类模型在验证集上精度高达99%,单张图像CPU预测耗时仅为`2.16ms`。
@@ -76,7 +76,7 @@ PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正
 版面分析指的是对图片形式的文档进行区域划分,定位其中的关键区域,如文字、标题、表格、图片等,PP-Structurev1使用了PaddleDetection中开源的高效检测算法PP-YOLOv2完成版面分析的任务。
-在PP-Structurev2中,我们发布基于PP-PicoDet的轻量级版面分析模型,并针对版面分析场景定制图像尺度,同时使用FGD知识蒸馏算法,进一步提升模型精度。最终CPU上`41ms`即可完成版面分析过程(仅包含模型推理时间,数据预处理耗时大约50ms左右)。在公开数据集PubLayNet上,消融实验如下:
+在PP-StructureV2中,我们发布基于PP-PicoDet的轻量级版面分析模型,并针对版面分析场景定制图像尺度,同时使用FGD知识蒸馏算法,进一步提升模型精度。最终CPU上`41ms`即可完成版面分析过程(仅包含模型推理时间,数据预处理耗时大约50ms左右)。在公开数据集PubLayNet上,消融实验如下:
 | 实验序号 | 策略 | 模型存储(M) | mAP | CPU预测耗时(ms) |
 |:------:|:------:|:------:|:------:|:------:|
@@ -95,7 +95,7 @@ PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正
 | 模型 | mAP | CPU预测耗时 |
 |-------------------|-----------|------------|
 | layoutparser (Detectron2) | 88.98% | 2.9s |
-| PP-Structurev2 (PP-PicoDet) | **94%** | 41.2ms |
+| PP-StructureV2 (PP-PicoDet) | **94%** | 41.2ms |
 [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)数据集是一个大型的文档图像数据集,包含Text、Title、Table、Figure、List,共5个类别。数据集中包含335,703张训练集、11,245张验证集和11,405张测试集。训练数据与标注示例图如下所示:
@@ -157,7 +157,7 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾
 ### 4.2 表格识别
-基于深度学习的表格识别算法种类丰富,PP-Structurev1中,我们基于文本识别算法RARE研发了端到端表格识别算法TableRec-RARE,模型输出为表格结构的HTML表示,进而可以方便地转化为Excel文件。PP-Structurev2中,我们对模型结构和损失函数等5个方面进行升级,提出了 SLANet (Structure Location Alignment Network) ,模型结构如下图所示:
+基于深度学习的表格识别算法种类丰富,PP-Structurev1中,我们基于文本识别算法RARE研发了端到端表格识别算法TableRec-RARE,模型输出为表格结构的HTML表示,进而可以方便地转化为Excel文件。PP-StructureV2中,我们对模型结构和损失函数等5个方面进行升级,提出了 SLANet (Structure Location Alignment Network) ,模型结构如下图所示:
 <div align="center">
 <img src="https://user-images.githubusercontent.com/14270174/185940811-089c9265-4be9-4776-b365-6d1125606b4b.png" width="1200">
@@ -189,7 +189,7 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾
 **(1) CPU友好型轻量级骨干网络PP-LCNet**
-PP-LCNet是结合Intel-CPU端侧推理特性而设计的轻量高性能骨干网络,该方案在图像分类任务上取得了比ShuffleNetV2、MobileNetV3、GhostNet等轻量级模型更优的“精度-速度”均衡。PP-Structurev2中,我们采用PP-LCNet作为骨干网络,表格识别模型精度从71.73%提升至72.98%;同时加载通过SSLD知识蒸馏方案训练得到的图像分类模型权重作为表格识别的预训练模型,最终精度进一步提升2.95%至74.71%。
+PP-LCNet是结合Intel-CPU端侧推理特性而设计的轻量高性能骨干网络,该方案在图像分类任务上取得了比ShuffleNetV2、MobileNetV3、GhostNet等轻量级模型更优的“精度-速度”均衡。PP-StructureV2中,我们采用PP-LCNet作为骨干网络,表格识别模型精度从71.73%提升至72.98%;同时加载通过SSLD知识蒸馏方案训练得到的图像分类模型权重作为表格识别的预训练模型,最终精度进一步提升2.95%至74.71%。
 **(2)轻量级高低层特征融合模块CSP-PAN**
@@ -199,7 +199,7 @@ PP-LCNet是结合Intel-CPU端侧推理特性而设计的轻量高性能骨干网
 TableRec-RARE的TableAttentionHead如下图a所示,TableAttentionHead在执行完全部step的计算后拿到最终隐藏层状态表征(hiddens),随后hiddens经由SDM(Structure Decode Module)和CLDM(Cell Location Decode Module)模块生成全部的表格结构token和单元格坐标。但是这种设计忽略了单元格token和坐标之间一一对应的关系。
-PP-Structurev2中,我们设计SLAHead模块,对单元格token和坐标之间做了对齐操作,如下图b所示。在SLAHead中,每一个step的隐藏层状态表征会分别送入SDM和CLDM来得到当前step的token和坐标,每个step的token和坐标输出分别进行concat得到表格的html表达和全部单元格的坐标。此外,考虑到表格识别模型的单元格准确率依赖于表格结构的识别准确,我们将损失函数中表格结构分支与单元格定位分支的权重比从1:1提升到8:1,并使用收敛更稳定的Smoothl1 Loss替换定位分支中的MSE Loss。最终模型精度从75.68%提高至77.7%。
+PP-StructureV2中,我们设计SLAHead模块,对单元格token和坐标之间做了对齐操作,如下图b所示。在SLAHead中,每一个step的隐藏层状态表征会分别送入SDM和CLDM来得到当前step的token和坐标,每个step的token和坐标输出分别进行concat得到表格的html表达和全部单元格的坐标。此外,考虑到表格识别模型的单元格准确率依赖于表格结构的识别准确,我们将损失函数中表格结构分支与单元格定位分支的权重比从1:1提升到8:1,并使用收敛更稳定的SmoothL1 Loss替换定位分支中的MSE Loss。最终模型精度从75.68%提高至77.7%。
 <div align="center">
@@ -211,7 +211,7 @@ PP-Structurev2中,我们设计SLAHead模块,对单元格token和坐标之间
 TableRec-RARE算法中,我们使用`<td>`和`</td>`两个单独的token来表示一个非跨行列单元格,这种表示方式限制了网络对于单元格数量较多表格的处理能力。
-PP-Structurev2中,我们参考TableMaster中的token处理方法,将`<td>`和`</td>`合并为一个token-`<td></td>`。合并token后,验证集中token长度大于500的图片也参与模型评估,最终模型精度降低为76.31%,但是端到端TEDS提升1.04%。
+PP-StructureV2中,我们参考TableMaster中的token处理方法,将`<td>`和`</td>`合并为一个token-`<td></td>`。合并token后,验证集中token长度大于500的图片也参与模型评估,最终模型精度降低为76.31%,但是端到端TEDS提升1.04%。
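A toy sketch of the token merge described above: non-spanning cells collapse the `<td>`, `</td>` pair into the single token `<td></td>`, while spanning cells (whose attributes sit between `<td` and `>`) keep their original tokens. This is an illustration only; the real tokenizer lives in PaddleOCR's table recognition code:

```python
def merge_td_tokens(tokens):
    merged, i = [], 0
    while i < len(tokens):
        if tokens[i] == '<td>' and i + 1 < len(tokens) and tokens[i + 1] == '</td>':
            merged.append('<td></td>')  # one token instead of two
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_td_tokens(['<tr>', '<td>', '</td>', '<td', ' colspan="2"', '>', '</td>', '</tr>']))
# ['<tr>', '<td></td>', '<td', ' colspan="2"', '>', '</td>', '</tr>']
```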
 #### 4.2.2 中文场景适配
@@ -249,7 +249,7 @@ PP-Structurev2中,我们参考TableMaster中的token处理方法,将`<td>`
 ### 4.3 版面恢复
-版面恢复指的是文档图像经过OCR识别、版面分析、表格识别等方法处理后的内容可以与原始文档保持相同的排版方式,并输出到word等文档中。PP-Structurev2中,我们版面恢复系统,包含版面分析、表格识别、OCR文本检测与识别等子模块。
+版面恢复指的是文档图像经过OCR识别、版面分析、表格识别等方法处理后的内容可以与原始文档保持相同的排版方式,并输出到word等文档中。PP-StructureV2中,我们的版面恢复系统包含版面分析、表格识别、OCR文本检测与识别等子模块。
 下图展示了版面恢复的结果:
 <div align="center">
@@ -258,7 +258,7 @@ PP-Structurev2中,我们参考TableMaster中的token处理方法,将`<td>`
 ## 5. 关键信息抽取
-关键信息抽取指的是针对文档图像的文字内容,提取出用户关注的关键信息,如身份证中的姓名、住址等字段。PP-Structure中支持了基于多模态LayoutLM系列模型的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。PP-Structurev2中,我们对模型结构以及下游任务训练方法进行升级,提出了VI-LayoutXLM(Visual-feature Independent LayoutXLM),具体流程图如下所示。
+关键信息抽取指的是针对文档图像的文字内容,提取出用户关注的关键信息,如身份证中的姓名、住址等字段。PP-Structure中支持了基于多模态LayoutLM系列模型的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。PP-StructureV2中,我们对模型结构以及下游任务训练方法进行升级,提出了VI-LayoutXLM(Visual-feature Independent LayoutXLM),具体流程图如下所示。
 <div align="center">
@@ -394,7 +394,7 @@ RE任务的可视化结果如下所示。
 | 实验序号 | 策略 | F1-score |
 |:------:|:------:|:------:|
 | 1 | LayoutXLM | 82.28% |
-| 2 | PP-Structurev2 SER | **87.79%** |
+| 2 | PP-StructureV2 SER | **87.79%** |
 **RE任务结果**
@@ -402,7 +402,7 @@ RE任务的可视化结果如下所示。
 | 实验序号 | 策略 | F1-score |
 |:------:|:------:|:------:|
 | 1 | LayoutXLM | 53.13% |
-| 2 | PP-Structurev2 SER | **74.87%** |
+| 2 | PP-StructureV2 RE | **74.87%** |
 ## 6. Reference
......
@@ -16,13 +16,13 @@ cd ppstructure
 下载模型
 ```bash
 mkdir inference && cd inference
-# 下载PP-Structurev2版面分析模型并解压
+# 下载PP-StructureV2版面分析模型并解压
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar xf picodet_lcnet_x1_0_layout_infer.tar
 # 下载PP-OCRv3文本检测模型并解压
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
 # 下载PP-OCRv3文本识别模型并解压
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
-# 下载PP-Structurev2表格识别模型并解压
+# 下载PP-StructureV2表格识别模型并解压
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
 cd ..
 ```
......
@@ -18,13 +18,13 @@ download model
 ```bash
 mkdir inference && cd inference
-# Download the PP-Structurev2 layout analysis model and unzip it
+# Download the PP-StructureV2 layout analysis model and unzip it
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar xf picodet_lcnet_x1_0_layout_infer.tar
 # Download the PP-OCRv3 text detection model and unzip it
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
 # Download the PP-OCRv3 text recognition model and unzip it
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
-# Download the PP-Structurev2 table recognition model and unzip it
+# Download the PP-StructureV2 table recognition model and unzip it
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
 cd ..
 ```
......
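With the models above in place, the whole pipeline is also reachable from Python. A minimal sketch assuming `paddleocr>=2.6` and an illustrative image path; `PPStructure` and `save_structure_res` are the entry points exported by the package:

```python
import cv2
from paddleocr import PPStructure, save_structure_res

table_engine = PPStructure(show_log=True)         # layout analysis + table recognition
img = cv2.imread('ppstructure/docs/table/1.png')  # illustrative input image
result = table_engine(img)
save_structure_res(result, './output', 'demo')
for region in result:
    print(region['type'], region['bbox'])
```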
@@ -49,12 +49,6 @@ pip3 install "paddleocr>=2.6"
 # 安装 图像方向分类依赖包paddleclas(如不需要图像方向分类功能,可跳过)
 pip3 install paddleclas>=2.4.3
-# 安装 关键信息抽取 依赖包(如不需要KIE功能,可跳过)
-pip3 install -r ppstructure/kie/requirements.txt
-# 安装 版面恢复 依赖包(如不需要版面恢复功能,可跳过)
-pip3 install -r ppstructure/recovery/requirements.txt
 ```
 <a name="2"></a>
......
@@ -51,12 +51,6 @@ pip3 install "paddleocr>=2.6"
 # Install the image direction classification dependency package paddleclas (if you do not use the image direction classification, you can skip it)
 pip3 install paddleclas>=2.4.3
-# Install the KIE dependency packages (if you do not use the KIE, you can skip it)
-pip3 install -r kie/requirements.txt
-# Install the layout recovery dependency packages (if you do not use the layout recovery, you can skip it)
-pip3 install -r recovery/requirements.txt
 ```
 <a name="2"></a>
......
@@ -42,7 +42,7 @@
 ## 2. 关键信息抽取任务流程
-PaddleOCR中实现了LayoutXLM等算法(基于Token),同时,在PP-Structurev2中,对LayoutXLM多模态预训练模型的网络结构进行简化,去除了其中的Visual backbone部分,设计了视觉无关的VI-LayoutXLM模型,同时引入符合人类阅读顺序的排序逻辑以及UDML知识蒸馏策略,最终同时提升了关键信息抽取模型的精度与推理速度。
+PaddleOCR中实现了LayoutXLM等算法(基于Token),同时,在PP-StructureV2中,对LayoutXLM多模态预训练模型的网络结构进行简化,去除了其中的Visual backbone部分,设计了视觉无关的VI-LayoutXLM模型,同时引入符合人类阅读顺序的排序逻辑以及UDML知识蒸馏策略,最终同时提升了关键信息抽取模型的精度与推理速度。
 下面介绍怎样基于PaddleOCR完成关键信息抽取任务。
@@ -115,7 +115,7 @@ Train:
 数据量方面,一般来说,对于比较固定的场景,**50张**左右的训练图片即可达到可以接受的效果,可以使用[PPOCRLabel](../../PPOCRLabel/README_ch.md)完成KIE的标注过程。
-模型方面,推荐使用PP-Structurev2中提出的VI-LayoutXLM模型,它基于LayoutXLM模型进行改进,去除其中的视觉特征提取模块,在精度基本无损的情况下,进一步提升了模型推理速度。更多教程请参考:[VI-LayoutXLM算法介绍](../../doc/doc_ch/algorithm_kie_vi_layoutxlm.md)与[KIE关键信息抽取使用教程](../../doc/doc_ch/kie.md)
+模型方面,推荐使用PP-StructureV2中提出的VI-LayoutXLM模型,它基于LayoutXLM模型进行改进,去除其中的视觉特征提取模块,在精度基本无损的情况下,进一步提升了模型推理速度。更多教程请参考:[VI-LayoutXLM算法介绍](../../doc/doc_ch/algorithm_kie_vi_layoutxlm.md)与[KIE关键信息抽取使用教程](../../doc/doc_ch/kie.md)
 #### 2.2.2 SER + RE
@@ -145,7 +145,7 @@ Train:
 数据量方面,一般来说,对于比较固定的场景,**50张**左右的训练图片即可达到可以接受的效果,可以使用PPOCRLabel完成KIE的标注过程。
-模型方面,推荐使用PP-Structurev2中提出的VI-LayoutXLM模型,它基于LayoutXLM模型进行改进,去除其中的视觉特征提取模块,在精度基本无损的情况下,进一步提升了模型推理速度。更多教程请参考:[VI-LayoutXLM算法介绍](../../doc/doc_ch/algorithm_kie_vi_layoutxlm.md)与[KIE关键信息抽取使用教程](../../doc/doc_ch/kie.md)
+模型方面,推荐使用PP-StructureV2中提出的VI-LayoutXLM模型,它基于LayoutXLM模型进行改进,去除其中的视觉特征提取模块,在精度基本无损的情况下,进一步提升了模型推理速度。更多教程请参考:[VI-LayoutXLM算法介绍](../../doc/doc_ch/algorithm_kie_vi_layoutxlm.md)与[KIE关键信息抽取使用教程](../../doc/doc_ch/kie.md)
 ## 3. 参考文献
......
@@ -48,7 +48,7 @@ For more detailed introduction of the algorithms, please refer to Chapter 6 of [
 ## 2. KIE Pipeline
-Token based methods such as LayoutXLM are implemented in PaddleOCR. What's more, in PP-Structurev2, we simplify the LayoutXLM model and proposed VI-LayoutXLM, in which the visual feature extraction module is removed for speed-up. The textline sorting strategy conforming to the human reading order and UDML knowledge distillation strategy are utilized for higher model accuracy.
+Token-based methods such as LayoutXLM are implemented in PaddleOCR. What's more, in PP-StructureV2, we simplified the LayoutXLM model and proposed VI-LayoutXLM, in which the visual feature extraction module is removed for speed-up. A textline sorting strategy conforming to the human reading order and a UDML knowledge distillation strategy are utilized for higher model accuracy.
 In the non-end-to-end KIE method, KIE needs at least **2 steps**. Firstly, the OCR model is used to extract the text and its position. Secondly, the KIE model is used to extract the key information according to the image, text position and text content.
@@ -125,7 +125,7 @@ Take the ID card scenario as an example. The key information generally includes
 In terms of data, generally speaking, for relatively fixed scenes, **50** training images can achieve acceptable effects. You can refer to [PPOCRLabel](../../PPOCRLabel/README.md) to finish the labeling process.
-In terms of model, it is recommended to use the VI-layoutXLM model proposed in PP-Structurev2. It is improved based on the LayoutXLM model, removing the visual feature extraction module, and further improving the model inference speed without the significant reduction on model accuracy. For more tutorials, please refer to [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and [KIE tutorial](../../doc/doc_en/kie_en.md).
+In terms of model, it is recommended to use the VI-LayoutXLM model proposed in PP-StructureV2. It is improved based on the LayoutXLM model, removing the visual feature extraction module, and further improving the model inference speed without significant reduction in model accuracy. For more tutorials, please refer to [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and [KIE tutorial](../../doc/doc_en/kie_en.md).
 #### 2.2.2 SER + RE
@@ -155,7 +155,7 @@ For each textline, you need to add 'ID' and 'linking' field information. The 'ID
 In terms of data, generally speaking, for relatively fixed scenes, about **50** training images can achieve acceptable effects.
-In terms of model, it is recommended to use the VI-layoutXLM model proposed in PP-Structurev2. It is improved based on the LayoutXLM model, removing the visual feature extraction module, and further improving the model inference speed without the significant reduction on model accuracy. For more tutorials, please refer to [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and [KIE tutorial](../../doc/doc_en/kie_en.md).
+In terms of model, it is recommended to use the VI-LayoutXLM model proposed in PP-StructureV2. It is improved based on the LayoutXLM model, removing the visual feature extraction module, and further improving the model inference speed without significant reduction in model accuracy. For more tutorials, please refer to [VI-LayoutXLM introduction](../../doc/doc_en/algorithm_kie_vi_layoutxlm_en.md) and [KIE tutorial](../../doc/doc_en/kie_en.md).
......
@@ -66,7 +66,7 @@ mkdir inference && cd inference
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
 # Download the PP-OCRv3 text recognition model and unzip it
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
-# Download the PP-Structurev2 form recognition model and unzip it
+# Download the PP-StructureV2 table recognition model and unzip it
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
 cd ..
 # run
......
@@ -71,7 +71,7 @@ mkdir inference && cd inference
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
 # 下载PP-OCRv3文本识别模型并解压
 wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
-# 下载PP-Structurev2中文表格识别模型并解压
+# 下载PP-StructureV2中文表格识别模型并解压
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
 cd ..
 # 执行表格识别
......
@@ -16,9 +16,17 @@ from setuptools import setup
 from io import open
 from paddleocr import VERSION

-with open('requirements.txt', encoding="utf-8-sig") as f:
-    requirements = f.readlines()
-    requirements.append('tqdm')
+
+def load_requirements(file_list=None):
+    if file_list is None:
+        file_list = ['requirements.txt']
+    if isinstance(file_list, str):
+        file_list = [file_list]
+    requirements = []
+    for file in file_list:
+        with open(file, encoding="utf-8-sig") as f:
+            requirements.extend(f.readlines())
+    return requirements


 def readme():
@@ -34,7 +42,8 @@ setup(
     include_package_data=True,
     entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]},
     version=VERSION,
-    install_requires=requirements,
+    install_requires=load_requirements(
+        ['requirements.txt', 'ppstructure/recovery/requirements.txt']),
     license='Apache License 2.0',
     description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)',
     long_description=readme(),
......
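Design note: merging `ppstructure/recovery/requirements.txt` into `install_requires` ships the layout-recovery dependencies with the base package, which matches the removal of the separate `pip3 install -r ppstructure/recovery/requirements.txt` step from the quick-start guides earlier in this commit.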