提交 2976dab9 编写于 作者: A an1018

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into add_layout_hub

...@@ -26,17 +26,16 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools ...@@ -26,17 +26,16 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
</div> </div>
## Recent updates ## Recent updates
- **🔥2022.8.24 Release PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- Release [PP-Structurev2](./ppstructure/),with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and [PDF to Word]();
- [Layout Analysis](./ppstructure/layout) optimization: model storage reduced by 95%, while speed increased by 11 times, and the average CPU time-cost is only 41ms;
- [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
- [Key Information Extraction](./ppstructure/kie) optimization:a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
- **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)** - **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%. - Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
- Release [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image. - Release [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
- Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology. - Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
- 2021.12.21 Release PaddleOCR [release/2.4](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4)
- Release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR).
- Release 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 [DocVQA](./ppstructure/vqa) algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 Release PaddleOCR [release/2.3](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.3)
- Release [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2). The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Release PaddleOCR [release/2.2](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.2)
- Release a new structured documents analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- [more](./doc/doc_en/update_en.md) - [more](./doc/doc_en/update_en.md)
...@@ -45,7 +44,9 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools ...@@ -45,7 +44,9 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
PaddleOCR support a variety of cutting-edge algorithms related to OCR, and developed industrial featured models/solution [PP-OCR](./doc/doc_en/ppocr_introduction_en.md) and [PP-Structure](./ppstructure/README.md) on this basis, and get through the whole process of data production, model training, compression, inference and deployment. PaddleOCR support a variety of cutting-edge algorithms related to OCR, and developed industrial featured models/solution [PP-OCR](./doc/doc_en/ppocr_introduction_en.md) and [PP-Structure](./ppstructure/README.md) on this basis, and get through the whole process of data production, model training, compression, inference and deployment.
![](./doc/features_en.png) <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186171245-40abc4d7-904f-4949-ade1-250f86ed3a90.png">
</div>
> It is recommended to start with the “quick experience” in the document tutorial > It is recommended to start with the “quick experience” in the document tutorial
...@@ -113,18 +114,19 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel ...@@ -113,18 +114,19 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- [Quick Start](./ppstructure/docs/quickstart_en.md) - [Quick Start](./ppstructure/docs/quickstart_en.md)
- [Model Zoo](./ppstructure/docs/models_list_en.md) - [Model Zoo](./ppstructure/docs/models_list_en.md)
- [Model training](./doc/doc_en/training_en.md) - [Model training](./doc/doc_en/training_en.md)
- [Layout Parser](./ppstructure/layout/README.md) - [Layout Analysis](./ppstructure/layout/README.md)
- [Table Recognition](./ppstructure/table/README.md) - [Table Recognition](./ppstructure/table/README.md)
- [DocVQA](./ppstructure/vqa/README.md) - [Key Information Extraction](./ppstructure/kie/README.md)
- [Key Information Extraction](./ppstructure/docs/kie_en.md)
- [Inference and Deployment](./deploy/README.md) - [Inference and Deployment](./deploy/README.md)
- [Python Inference](./ppstructure/docs/inference_en.md) - [Python Inference](./ppstructure/docs/inference_en.md)
- [C++ Inference]() - [C++ Inference](./deploy/cpp_infer/readme.md)
- [Serving](./deploy/pdserving/README.md) - [Serving](./deploy/pdserving/README.md)
- [Academic algorithms](./doc/doc_en/algorithms_en.md) - [Academic Algorithms](./doc/doc_en/algorithm_overview_en.md)
- [Text detection](./doc/doc_en/algorithm_overview_en.md) - [Text detection](./doc/doc_en/algorithm_overview_en.md)
- [Text recognition](./doc/doc_en/algorithm_overview_en.md) - [Text recognition](./doc/doc_en/algorithm_overview_en.md)
- [End-to-end](./doc/doc_en/algorithm_overview_en.md) - [End-to-end OCR](./doc/doc_en/algorithm_overview_en.md)
- [Table Recognition](./doc/doc_en/algorithm_overview_en.md)
- [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md)
- [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md) - [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
- Data Annotation and Synthesis - Data Annotation and Synthesis
- [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md) - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
...@@ -135,9 +137,9 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel ...@@ -135,9 +137,9 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- [General OCR Datasets(Chinese/English)](doc/doc_en/dataset/datasets_en.md) - [General OCR Datasets(Chinese/English)](doc/doc_en/dataset/datasets_en.md)
- [HandWritten_OCR_Datasets(Chinese)](doc/doc_en/dataset/handwritten_datasets_en.md) - [HandWritten_OCR_Datasets(Chinese)](doc/doc_en/dataset/handwritten_datasets_en.md)
- [Various OCR Datasets(multilingual)](doc/doc_en/dataset/vertical_and_multilingual_datasets_en.md) - [Various OCR Datasets(multilingual)](doc/doc_en/dataset/vertical_and_multilingual_datasets_en.md)
- [layout analysis](doc/doc_en/dataset/layout_datasets_en.md) - [Layout Analysis](doc/doc_en/dataset/layout_datasets_en.md)
- [table recognition](doc/doc_en/dataset/table_datasets_en.md) - [Table Recognition](doc/doc_en/dataset/table_datasets_en.md)
- [DocVQA](doc/doc_en/dataset/docvqa_datasets_en.md) - [Key Information Extraction](doc/doc_en/dataset/kie_datasets_en.md)
- [Code Structure](./doc/doc_en/tree_en.md) - [Code Structure](./doc/doc_en/tree_en.md)
- [Visualization](#Visualization) - [Visualization](#Visualization)
- [Community](#Community) - [Community](#Community)
...@@ -176,7 +178,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel ...@@ -176,7 +178,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
</details> </details>
<details open> <details open>
<summary>PP-Structure</summary> <summary>PP-Structurev2</summary>
- layout analysis + table recognition - layout analysis + table recognition
<div align="center"> <div align="center">
...@@ -185,12 +187,28 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel ...@@ -185,12 +187,28 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- SER (Semantic entity recognition) - SER (Semantic entity recognition)
<div align="center"> <div align="center">
<img src="./ppstructure/docs/vqa/result_ser/zh_val_0_ser.jpg" width="800"> <img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div> </div>
- RE (Relation Extraction) - RE (Relation Extraction)
<div align="center"> <div align="center">
<img src="./ppstructure/docs/vqa/result_re/zh_val_21_re.jpg" width="800"> <img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div> </div>
</details> </details>
......
...@@ -27,21 +27,20 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -27,21 +27,20 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 近期更新 ## 近期更新
- **🔥2022.7 发布[OCR场景应用集合](./applications)** - **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- 发布OCR场景应用集合,包含数码管、液晶屏、车牌、高精度SVTR模型等**7个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。 - 发布[PP-Structurev2](./ppstructure/),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery)[PDF转Word]();
- [版面分析](./ppstructure/layout)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
- [表格识别](./ppstructure/table)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
- [关键信息抽取](./ppstructure/kie)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
- **🔥2022.5.9 发布PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)** - **🔥2022.8 发布 [OCR场景应用集合](./applications)**
- 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。
- **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上; - 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
- 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能; - 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
- 发布OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求; - 发布OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求;
- 发布交互式OCR开源电子书[《动手学OCR》](./doc/doc_ch/ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。 - 发布交互式OCR开源电子书[《动手学OCR》](./doc/doc_ch/ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。
- 2021.12.21 发布PaddleOCR [release/2.4](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4)
- OCR算法新增1种文本检测算法([PSENet](./doc/doc_ch/algorithm_det_psenet.md)),3种文本识别算法([NRTR](./doc/doc_ch/algorithm_rec_nrtr.md)[SEED](./doc/doc_ch/algorithm_rec_seed.md)[SAR](./doc/doc_ch/algorithm_rec_sar.md));
- 文档结构化算法新增1种关键信息提取算法([SDMGR](./ppstructure/docs/kie.md)),3种[DocVQA](./ppstructure/vqa)算法(LayoutLM、LayoutLMv2,LayoutXLM)。
- 2021.9.7 发布PaddleOCR [release/2.3](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.3)
- 发布[PP-OCRv2](./doc/doc_ch/ppocr_introduction.md#pp-ocrv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR [release/2.2](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.2)
- 发布文档结构分析[PP-Structure](./ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
> [更多](./doc/doc_ch/update.md) > [更多](./doc/doc_ch/update.md)
...@@ -49,7 +48,9 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -49,7 +48,9 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
支持多种OCR相关前沿算法,在此基础上打造产业级特色模型[PP-OCR](./doc/doc_ch/ppocr_introduction.md)[PP-Structure](./ppstructure/README_ch.md),并打通数据生产、模型训练、压缩、预测部署全流程。 支持多种OCR相关前沿算法,在此基础上打造产业级特色模型[PP-OCR](./doc/doc_ch/ppocr_introduction.md)[PP-Structure](./ppstructure/README_ch.md),并打通数据生产、模型训练、压缩、预测部署全流程。
![](./doc/features.png) <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186170862-b8f80f6c-fee7-4b26-badc-de9c327c76ce.png">
</div>
> 上述内容的使用方法建议从文档教程中的快速开始体验 > 上述内容的使用方法建议从文档教程中的快速开始体验
...@@ -213,14 +214,30 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -213,14 +214,30 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- SER(语义实体识别) - SER(语义实体识别)
<div align="center"> <div align="center">
<img src="./ppstructure/docs/vqa/result_ser/zh_val_0_ser.jpg" width="800"> <img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div> </div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
- RE(关系提取) - RE(关系提取)
<div align="center"> <div align="center">
<img src="./ppstructure/docs/vqa/result_re/zh_val_21_re.jpg" width="800"> <img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div> </div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
</details> </details>
<a name="许可证书"></a> <a name="许可证书"></a>
......
...@@ -78,7 +78,7 @@ python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretr ...@@ -78,7 +78,7 @@ python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretr
SRN文本识别模型推理,可以执行如下命令: SRN文本识别模型推理,可以执行如下命令:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_char_type="ch" --rec_algorithm="SRN" --rec_char_dict_path=./ppocr/utils/ic15_dict.txt --use_space_char=False python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_algorithm="SRN" --rec_char_dict_path=./ppocr/utils/ic15_dict.txt --use_space_char=False
``` ```
<a name="4-2"></a> <a name="4-2"></a>
......
# Academic Algorithms and Models
PaddleOCR will add cutting-edge OCR algorithms and models continuously. Check out the supported models and tutorials by clicking the following list:
- [text detection algorithms](./algorithm_overview_en.md#11)
- [text recognition algorithms](./algorithm_overview_en.md#12)
- [end-to-end algorithms](./algorithm_overview_en.md#2)
- [table recognition algorithms](./algorithm_overview_en.md#3)
Developers are welcome to contribute more algorithms! Please refer to [add new algorithm](./add_new_algorithm_en.md) guideline.
...@@ -7,7 +7,11 @@ ...@@ -7,7 +7,11 @@
- [3. Table Recognition Algorithms](#3) - [3. Table Recognition Algorithms](#3)
- [4. Key Information Extraction Algorithms](#4) - [4. Key Information Extraction Algorithms](#4)
This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md). This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCRv3 models list](./models_list_en.md).
>>
Developers are welcome to contribute more algorithms! Please refer to [add new algorithm](./add_new_algorithm_en.md) guideline.
<a name="1"></a> <a name="1"></a>
......
...@@ -652,8 +652,9 @@ def main(): ...@@ -652,8 +652,9 @@ def main():
for index, pdf_img in enumerate(img): for index, pdf_img in enumerate(img):
os.makedirs( os.makedirs(
os.path.join(args.output, img_name), exist_ok=True) os.path.join(args.output, img_name), exist_ok=True)
pdf_img_path = os.path.join(args.output, img_name, img_name pdf_img_path = os.path.join(
+ '_' + str(index) + '.jpg') args.output, img_name,
img_name + '_' + str(index) + '.jpg')
cv2.imwrite(pdf_img_path, pdf_img) cv2.imwrite(pdf_img_path, pdf_img)
img_paths.append([pdf_img_path, pdf_img]) img_paths.append([pdf_img_path, pdf_img])
......
...@@ -104,8 +104,9 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -104,8 +104,9 @@ def load_model(config, model, optimizer=None, model_type='det'):
continue continue
pre_value = params[key] pre_value = params[key]
if pre_value.dtype == paddle.float16: if pre_value.dtype == paddle.float16:
pre_value = pre_value.astype(paddle.float32)
is_float16 = True is_float16 = True
if pre_value.dtype != value.dtype:
pre_value = pre_value.astype(value.dtype)
if list(value.shape) == list(pre_value.shape): if list(value.shape) == list(pre_value.shape):
new_state_dict[key] = pre_value new_state_dict[key] = pre_value
else: else:
...@@ -162,8 +163,9 @@ def load_pretrained_params(model, path): ...@@ -162,8 +163,9 @@ def load_pretrained_params(model, path):
logger.warning("The pretrained params {} not in model".format(k1)) logger.warning("The pretrained params {} not in model".format(k1))
else: else:
if params[k1].dtype == paddle.float16: if params[k1].dtype == paddle.float16:
params[k1] = params[k1].astype(paddle.float32)
is_float16 = True is_float16 = True
if params[k1].dtype != state_dict[k1].dtype:
params[k1] = params[k1].astype(state_dict[k1].dtype)
if list(state_dict[k1].shape) == list(params[k1].shape): if list(state_dict[k1].shape) == list(params[k1].shape):
new_state_dict[k1] = params[k1] new_state_dict[k1] = params[k1]
else: else:
......
English | [简体中文](README_ch.md) English | [简体中文](README_ch.md)
- [1. Introduction](#1-introduction) - [1. Introduction](#1-introduction)
- [2. Update log](#2-update-log) - [2. Features](#2-features)
- [3. Features](#3-features) - [3. Results](#3-results)
- [4. Results](#4-results) - [3.1 Layout analysis and table recognition](#31-layout-analysis-and-table-recognition)
- [4.1 Layout analysis and table recognition](#41-layout-analysis-and-table-recognition) - [3.2 Layout Recovery](#32-layout-recovery)
- [4.2 KIE](#42-kie) - [3.3 KIE](#33-kie)
- [5. Quick start](#5-quick-start) - [4. Quick start](#4-quick-start)
- [6. PP-Structure System](#6-pp-structure-system) - [5. Model List](#5-model-list)
- [6.1 Layout analysis and table recognition](#61-layout-analysis-and-table-recognition)
- [6.1.1 Layout analysis](#611-layout-analysis)
- [6.1.2 Table recognition](#612-table-recognition)
- [6.2 KIE](#62-kie)
- [7. Model List](#7-model-list)
- [7.1 Layout analysis model](#71-layout-analysis-model)
- [7.2 OCR and table recognition model](#72-ocr-and-table-recognition-model)
- [7.3 KIE model](#73-kie-model)
## 1. Introduction ## 1. Introduction
PP-Structure is an OCR toolkit that can be used for document analysis and processing with complex structures, designed to help developers better complete document understanding tasks PP-Structure is an intelligent document analysis system developed by the PaddleOCR team, which aims to help developers better complete tasks related to document understanding such as layout analysis and table recognition.
## 2. Update log The pipeline of PP-Structurev2 system is shown below. The document image first passes through the image direction correction module to identify the direction of the entire image and complete the direction correction. Then, two tasks of layout information analysis and key information extraction can be completed.
* 2022.02.12 KIE add LayoutLMv2 model。
* 2021.12.07 add [KIE SER and RE tasks](kie/README.md)
## 3. Features - In the layout analysis task, the image first goes through the layout analysis model to divide the image into different areas such as text, table, and figure, and then analyze these areas separately. For example, the table area is sent to the form recognition module for structured recognition, and the text area is sent to the OCR engine for text recognition. Finally, the layout recovery module restores it to a word or pdf file with the same layout as the original image;
- In the key information extraction task, the OCR engine is first used to extract the text content, and then the SER(semantic entity recognition) module obtains the semantic entities in the image, and finally the RE(relationship extraction) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
The main features of PP-Structure are as follows: More technical details: 👉 [PP-Structurev2 Technical Report](docs/PP-Structurev2_introduction.md)
- Support the layout analysis of documents, divide the documents into 5 types of areas **text, title, table, image and list** (conjunction with Layout-Parser) PP-Structurev2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:
- Support to extract the texts from the text, title, picture and list areas (used in conjunction with PP-OCR)
- Support to extract excel files from the table areas
- Support python whl package and command line usage, easy to use
- Support custom training for layout analysis and table structure tasks
- Support Document Key Information Extraction (KIE) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)
## 4. Results - [Layout Analysis](layout/README.md)
- [Table Recognition](table/README.md)
- [Key Information Extraction](kie/README.md)
- [Layout Recovery](recovery/README.md)
### 4.1 Layout analysis and table recognition ## 2. Features
<img src="docs/table/ppstructure.GIF" width="100%"/> The main features of PP-Structurev2 are as follows:
- Support layout analysis of documents in the form of images/pdfs, which can be divided into areas such as **text, titles, tables, figures, formulas, etc.**;
The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of image, text, title and table by layout analysis, and then OCR detection and recognition is performed on the three areas of image, text and title, and the table is performed table recognition, where the image will also be stored for use. - Support common Chinese and English **table detection** tasks;
- Support structured table recognition, and output the final result to **Excel file**;
### 4.2 KIE - Support multimodal-based Key Information Extraction (KIE) tasks - **Semantic Entity Recognition** (SER) and **Relation Extraction (RE);
- Support **layout recovery**, that is, restore the document in word or pdf format with the same layout as the original image;
* SER - Support customized training and multiple inference deployment methods such as python whl package quick start;
* - Connect with the semi-automatic data labeling tool PPOCRLabel, which supports the labeling of layout analysis, table recognition, and SER.
![](docs/kie/result_ser/zh_val_0_ser.jpg) | ![](docs/kie/result_ser/zh_val_42_ser.jpg)
---|---
Different colored boxes in the figure represent different categories. For xfun dataset, there are three categories: query, answer and header:
* Dark purple: header ## 3. Results
* Light purple: query
* Army green: answer
The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box. PP-Structurev2 supports the independent use or flexible collocation of each module. For example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.
### 3.1 Layout analysis and table recognition
* RE The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of image, text, title and table by layout analysis, and then OCR detection and recognition is performed on the three areas of image, text and title, and the table is performed table recognition, where the image will also be stored for use.
<img src="docs/table/ppstructure.GIF" width="100%"/>
![](docs/kie/result_re/zh_val_21_re.jpg) | ![](docs/kie/result_re/zh_val_40_re.jpg)
---|---
### 3.2 Layout recovery
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box. The following figure shows the effect of layout recovery based on the results of layout analysis and table recognition in the previous section.
<img src="./docs/recovery/recovery.jpg" width="100%"/>
## 5. Quick start ### 3.3 KIE
Start from [Quick Installation](./docs/quickstart.md) * SER
## 6. PP-Structure System Different colored boxes in the figure represent different categories.
### 6.1 Layout analysis and table recognition <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
![pipeline](docs/table/pipeline.jpg) <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095702-9acef674-12af-4d09-97fc-abf4ab32600e.png" width="600">
</div>
In PP-Structure, the image will be divided into 5 types of areas **text, title, image list and table**. For the first 4 types of areas, directly use PP-OCR system to complete the text detection and recognition. For the table area, after the table structuring process, the table in image is converted into an Excel file with the same table style. <div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>
#### 6.1.1 Layout analysis <div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
Layout analysis classifies image by region, including the use of Python scripts of layout analysis tools, extraction of designated category detection boxes, performance indicators, and custom training layout analysis models. For details, please refer to [document](layout/README.md). <div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>
#### 6.1.2 Table recognition * RE
Table recognition converts table images into excel documents, which include the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed instructions, please refer to [document](table/README.md) In the figure, the red box represents `Question`, the blue box represents `Answer`, and `Question` and `Answer` are connected by green lines.
### 6.2 KIE <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
Multi-modal based Key Information Extraction (KIE) methods include Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair. For details, please refer to [document](kie/README.md) <div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
## 7. Model List <div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>
PP-Structure Series Model List (Updating) <div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
### 7.1 Layout analysis model ## 4. Quick start
|model name|description|download|label_map| Start from [Quick Start](./docs/quickstart_en.md).
| --- | --- | --- |--- |
| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset can divide image into 5 types of areas **text, title, table, picture, and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
### 7.2 OCR and table recognition model ## 5. Model List
|model name|description|model size|download| Some tasks need to use both the structured analysis models and the OCR models. For example, the table recognition task needs to use the table recognition model for structured analysis, and the OCR model to recognize the text in the table. Please select the appropriate models according to your specific needs.
| --- | --- | --- | --- |
|ch_PP-OCRv3_det| [New] Lightweight model, supporting Chinese, English, multilingual text detection | 3.8M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)|
|ch_PP-OCRv3_rec| [New] Lightweight model, supporting Chinese, English, multilingual text recognition | 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
|ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |
### 7.3 KIE model For structural analysis related model downloads, please refer to:
- [PP-Structure Model Zoo](./docs/models_list_en.md)
|model name|description|model size|download| For OCR related model downloads, please refer to:
| --- | --- | --- | --- | - [PP-OCR Model Zoo](../doc/doc_en/models_list_en.md)
|ser_LayoutXLM_xfun_zhd|SER model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
|re_LayoutXLM_xfun_zh|RE model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
If you need to use other models, you can download the model in [PPOCR model_list](../doc/doc_en/models_list_en.md) and [PPStructure model_list](./docs/models_list.md)
...@@ -21,7 +21,7 @@ PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正 ...@@ -21,7 +21,7 @@ PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正
- 关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。 - 关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/> <img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
更多技术细节:👉 [PP-Structurev2技术报告]() 更多技术细节:👉 [PP-Structurev2技术报告](docs/PP-Structurev2_introduction.md)
PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程: PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程:
...@@ -76,6 +76,14 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独 ...@@ -76,6 +76,14 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600"> <img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div> </div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095702-9acef674-12af-4d09-97fc-abf4ab32600e.png" width="600">
</div>
* RE * RE
图中红色框表示`问题`,蓝色框表示`答案``问题``答案`之间使用绿色线连接。 图中红色框表示`问题`,蓝色框表示`答案``问题``答案`之间使用绿色线连接。
...@@ -88,6 +96,14 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独 ...@@ -88,6 +96,14 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
<img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600"> <img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div> </div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
<a name="4"></a> <a name="4"></a>
## 4. 快速体验 ## 4. 快速体验
......
此差异已折叠。
# Python Inference # Python Inference
- [1. Structure](#1) - [1. Layout Structured Analysis](#1)
- [1.1 layout analysis + table recognition](#1.1) - [1.1 layout analysis + table recognition](#1.1)
- [1.2 layout analysis](#1.2) - [1.2 layout analysis](#1.2)
- [1.3 table recognition](#1.3) - [1.3 table recognition](#1.3)
- [2. KIE](#2) - [2. Key Information Extraction](#2)
<a name="1"></a> <a name="1"></a>
## 1. Structure ## 1. Layout Structured Analysis
Go to the `ppstructure` directory Go to the `ppstructure` directory
```bash ```bash
...@@ -70,7 +70,7 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \ ...@@ -70,7 +70,7 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv3_det_infer \
After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image will be stored as an excel. The filename of excel is their coordinates in the image. After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image will be stored as an excel. The filename of excel is their coordinates in the image.
<a name="2"></a> <a name="2"></a>
## 2. KIE ## 2. Key Information Extraction
```bash ```bash
cd ppstructure cd ppstructure
......
# PP-Structure Quick Start # PP-Structure Quick Start
- [1. Install package](#1-install-package) - [1. Environment Preparation](#1-environment-preparation)
- [2. Use](#2-use) - [2. Quick Use](#2-quick-use)
- [2.1 Use by command line](#21-use-by-command-line) - [2.1 Use by command line](#21-use-by-command-line)
- [2.1.1 image orientation + layout analysis + table recognition](#211-image-orientation--layout-analysis--table-recognition) - [2.1.1 image orientation + layout analysis + table recognition](#211-image-orientation--layout-analysis--table-recognition)
- [2.1.2 layout analysis + table recognition](#212-layout-analysis--table-recognition) - [2.1.2 layout analysis + table recognition](#212-layout-analysis--table-recognition)
...@@ -9,22 +9,41 @@ ...@@ -9,22 +9,41 @@
- [2.1.4 table recognition](#214-table-recognition) - [2.1.4 table recognition](#214-table-recognition)
- [2.1.5 Key Information Extraction](#215-Key-Information-Extraction) - [2.1.5 Key Information Extraction](#215-Key-Information-Extraction)
- [2.1.6 layout recovery](#216-layout-recovery) - [2.1.6 layout recovery](#216-layout-recovery)
- [2.2 Use by code](#22-use-by-code) - [2.2 Use by python script](#22-use-by-python-script)
- [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition) - [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition)
- [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition) - [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition)
- [2.2.3 layout analysis](#223-layout-analysis) - [2.2.3 layout analysis](#223-layout-analysis)
- [2.2.4 table recognition](#224-table-recognition) - [2.2.4 table recognition](#224-table-recognition)
- [2.2.5 DocVQA](#225-dockie)
- [2.2.5 Key Information Extraction](#225-Key-Information-Extraction) - [2.2.5 Key Information Extraction](#225-Key-Information-Extraction)
- [2.2.6 layout recovery](#226-layout-recovery) - [2.2.6 layout recovery](#226-layout-recovery)
- [2.3 Result description](#23-result-description) - [2.3 Result description](#23-result-description)
- [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition) - [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition)
- [2.3.2 Key Information Extraction](#232-Key-Information-Extraction) - [2.3.2 Key Information Extraction](#232-Key-Information-Extraction)
- [2.4 Parameter Description](#24-parameter-description) - [2.4 Parameter Description](#24-parameter-description)
- [3. Summary](#3-summary)
<a name="1"></a> <a name="1"></a>
## 1. Install package ## 1. Environment Preparation
### 1.1 Install PaddlePaddle
> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you have no available GPU on your machine, please run the following command to install the CPU version
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
### 1.2 Install PaddleOCR Whl Package
```bash ```bash
# Install paddleocr, version 2.6 is recommended # Install paddleocr, version 2.6 is recommended
...@@ -34,17 +53,15 @@ pip3 install "paddleocr>=2.6" ...@@ -34,17 +53,15 @@ pip3 install "paddleocr>=2.6"
pip3 install paddleclas pip3 install paddleclas
# Install the KIE dependency packages (if you do not use the KIE, you can skip it) # Install the KIE dependency packages (if you do not use the KIE, you can skip it)
pip3 install -r ppstructure/kie/requirements.txt pip3 install -r kie/requirements.txt
# Install the layout recovery dependency packages (if you do not use the layout recovery, you can skip it) # Install the layout recovery dependency packages (if you do not use the layout recovery, you can skip it)
pip3 install -r ppstructure/recovery/requirements.txt pip3 install -r recovery/requirements.txt
``` ```
<a name="2"></a> <a name="2"></a>
## 2. Use ## 2. Quick Use
<a name="21"></a> <a name="21"></a>
### 2.1 Use by command line ### 2.1 Use by command line
...@@ -52,45 +69,41 @@ pip3 install -r ppstructure/recovery/requirements.txt ...@@ -52,45 +69,41 @@ pip3 install -r ppstructure/recovery/requirements.txt
<a name="211"></a> <a name="211"></a>
#### 2.1.1 image orientation + layout analysis + table recognition #### 2.1.1 image orientation + layout analysis + table recognition
```bash ```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --image_orientation=true paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --image_orientation=true
``` ```
<a name="212"></a> <a name="212"></a>
#### 2.1.2 layout analysis + table recognition #### 2.1.2 layout analysis + table recognition
```bash ```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure
``` ```
<a name="213"></a> <a name="213"></a>
#### 2.1.3 layout analysis #### 2.1.3 layout analysis
```bash ```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
``` ```
<a name="214"></a> <a name="214"></a>
#### 2.1.4 table recognition #### 2.1.4 table recognition
```bash ```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout=false
``` ```
<a name="215"></a> <a name="215"></a>
#### 2.1.5 Key Information Extraction #### 2.1.5 Key Information Extraction
Please refer to: [Key Information Extraction](../kie/README.md) . Key information extraction does not currently support use by the whl package. For detailed usage tutorials, please refer to: [Key Information Extraction](../kie/README.md).
<a name="216"></a> <a name="216"></a>
#### 2.1.6 layout recovery #### 2.1.6 layout recovery
``` ```
# Chinese pic paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true
# English pic
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# pdf file
paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
``` ```
<a name="22"></a> <a name="22"></a>
### 2.2 Use by code ### 2.2 Use by python script
<a name="221"></a> <a name="221"></a>
#### 2.2.1 image orientation + layout analysis + table recognition #### 2.2.1 image orientation + layout analysis + table recognition
...@@ -103,7 +116,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res ...@@ -103,7 +116,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True, image_orientation=True) table_engine = PPStructure(show_log=True, image_orientation=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
...@@ -114,7 +127,7 @@ for line in result: ...@@ -114,7 +127,7 @@ for line in result:
from PIL import Image from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包 font_path = 'doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB') image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path) im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
...@@ -132,7 +145,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res ...@@ -132,7 +145,7 @@ from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True) table_engine = PPStructure(show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
...@@ -143,7 +156,7 @@ for line in result: ...@@ -143,7 +156,7 @@ for line in result:
from PIL import Image from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # font provieded in PaddleOCR font_path = 'doc/fonts/simfang.ttf' # font provieded in PaddleOCR
image = Image.open(img_path).convert('RGB') image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path) im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
...@@ -161,7 +174,7 @@ from paddleocr import PPStructure,save_structure_res ...@@ -161,7 +174,7 @@ from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(table=False, ocr=False, show_log=True) table_engine = PPStructure(table=False, ocr=False, show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png' img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
...@@ -182,7 +195,7 @@ from paddleocr import PPStructure,save_structure_res ...@@ -182,7 +195,7 @@ from paddleocr import PPStructure,save_structure_res
table_engine = PPStructure(layout=False, show_log=True) table_engine = PPStructure(layout=False, show_log=True)
save_folder = './output' save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg' img_path = 'ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path) img = cv2.imread(img_path)
result = table_engine(img) result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])
...@@ -195,7 +208,7 @@ for line in result: ...@@ -195,7 +208,7 @@ for line in result:
<a name="225"></a> <a name="225"></a>
#### 2.2.5 Key Information Extraction #### 2.2.5 Key Information Extraction
Please refer to: [Key Information Extraction](../kie/README.md) . Key information extraction does not currently support use by the whl package. For detailed usage tutorials, please refer to: [Key Information Extraction](../kie/README.md).
<a name="226"></a> <a name="226"></a>
#### 2.2.6 layout recovery #### 2.2.6 layout recovery
...@@ -246,8 +259,8 @@ Each field in dict is described as follows: ...@@ -246,8 +259,8 @@ Each field in dict is described as follows:
| field | description | | field | description |
| --- |---| | --- |---|
|type| Type of image area. | |type| Type of image area. |
|bbox| The coordinates of the image area in the original image, respectively [upper left corner x, upper left corner y, lower right corner x, lower right corner y]. | |bbox| The coordinates of the image area in the original image, respectively [upper left corner x, upper left corner y, lower right corner x, lower right corner y]. |
|res| OCR or table recognition result of the image area. <br> table: a dict with field descriptions as follows: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: html str of table.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; In the code usage mode, set return_ocr_result_in_table=True whrn call can get the detection and recognition results of each text in the table area, corresponding to the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: text detection boxes.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: text recognition results.<br> OCR: A tuple containing the detection boxes and recognition results of each single text. | |res| OCR or table recognition result of the image area. <br> table: a dict with field descriptions as follows: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: html str of table.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; In the code usage mode, set return_ocr_result_in_table=True whrn call can get the detection and recognition results of each text in the table area, corresponding to the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: text detection boxes.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: text recognition results.<br> OCR: A tuple containing the detection boxes and recognition results of each single text. |
After the recognition is completed, each image will have a directory with the same name under the directory specified by the `output` field. Each table in the image will be stored as an excel, and the picture area will be cropped and saved. The filename of excel and picture is their coordinates in the image. After the recognition is completed, each image will have a directory with the same name under the directory specified by the `output` field. Each table in the image will be stored as an excel, and the picture area will be cropped and saved. The filename of excel and picture is their coordinates in the image.
...@@ -291,3 +304,8 @@ Please refer to: [Key Information Extraction](../kie/README.md) . ...@@ -291,3 +304,8 @@ Please refer to: [Key Information Extraction](../kie/README.md) .
| structure_version | Structure version, optional PP-structure and PP-structurev2 | PP-structure | | structure_version | Structure version, optional PP-structure and PP-structurev2 | PP-structure |
Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md) Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md)
<a name="3"></a>
## 3. Summary
Through the content in this section, you can master the use of PP-Structure related functions through PaddleOCR whl package. Please refer to [documentation tutorial](../../README.md) for more detailed usage tutorials including model training, inference and deployment, etc.
\ No newline at end of file
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_image_shape="3,48,320" ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_image_shape="3,48,320"
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -4,7 +4,7 @@ Global: ...@@ -4,7 +4,7 @@ Global:
log_smooth_window: 20 log_smooth_window: 20
print_batch_step: 5 print_batch_step: 5
save_model_dir: ./output/table_mv3/ save_model_dir: ./output/table_mv3/
save_epoch_step: 3 save_epoch_step: 400
# evaluation is run every 400 iterations after the 0th iteration # evaluation is run every 400 iterations after the 0th iteration
eval_batch_step: [0, 40000] eval_batch_step: [0, 40000]
cal_metric_during_train: True cal_metric_during_train: True
...@@ -17,7 +17,8 @@ Global: ...@@ -17,7 +17,8 @@ Global:
# for data or label process # for data or label process
character_dict_path: ppocr/utils/dict/table_structure_dict.txt character_dict_path: ppocr/utils/dict/table_structure_dict.txt
character_type: en character_type: en
max_text_length: 800 max_text_length: &max_text_length 500
box_format: &box_format 'xyxy' # 'xywh', 'xyxy', 'xyxyxyxy'
infer_mode: False infer_mode: False
Optimizer: Optimizer:
...@@ -37,12 +38,14 @@ Architecture: ...@@ -37,12 +38,14 @@ Architecture:
Backbone: Backbone:
name: MobileNetV3 name: MobileNetV3
scale: 1.0 scale: 1.0
model_name: large model_name: small
disable_se: true
Head: Head:
name: TableAttentionHead name: TableAttentionHead
hidden_size: 256 hidden_size: 256
loc_type: 2 loc_type: 2
max_text_length: 800 max_text_length: *max_text_length
loc_reg_num: &loc_reg_num 4
Loss: Loss:
name: TableAttentionLoss name: TableAttentionLoss
...@@ -70,6 +73,8 @@ Train: ...@@ -70,6 +73,8 @@ Train:
learn_empty_box: False learn_empty_box: False
merge_no_span_structure: False merge_no_span_structure: False
replace_empty_cell_token: False replace_empty_cell_token: False
loc_reg_num: *loc_reg_num
max_text_length: *max_text_length
- TableBoxEncode: - TableBoxEncode:
- ResizeTableImage: - ResizeTableImage:
max_len: 488 max_len: 488
...@@ -102,6 +107,8 @@ Eval: ...@@ -102,6 +107,8 @@ Eval:
learn_empty_box: False learn_empty_box: False
merge_no_span_structure: False merge_no_span_structure: False
replace_empty_cell_token: False replace_empty_cell_token: False
loc_reg_num: *loc_reg_num
max_text_length: *max_text_length
- TableBoxEncode: - TableBoxEncode:
- ResizeTableImage: - ResizeTableImage:
max_len: 488 max_len: 488
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/EN_symbo ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/EN_symbo
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
===========================train_params=========================== ===========================train_params===========================
model_name:rec_r31_robustscanner model_name:rec_r31_robustscanner
python:python python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
Global.auto_cast:null Global.auto_cast:null
...@@ -39,11 +39,11 @@ infer_export:tools/export_model.py -c test_tipc/configs/rec_r31_robustscanner/re ...@@ -39,11 +39,11 @@ infer_export:tools/export_model.py -c test_tipc/configs/rec_r31_robustscanner/re
infer_quant:False infer_quant:False
inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.txt --rec_image_shape="3,48,48,160" --use_space_char=False --rec_algorithm="RobustScanner" inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.txt --rec_image_shape="3,48,48,160" --use_space_char=False --rec_algorithm="RobustScanner"
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:True|False --enable_mkldnn:False
--cpu_threads:1|6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False|False --use_tensorrt:False
--precision:fp32|int8 --precision:fp32
--rec_model_dir: --rec_model_dir:
--image_dir:./inference/rec_inference --image_dir:./inference/rec_inference
--save_log_path:./test/output/ --save_log_path:./test/output/
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.t ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.t
--use_gpu:True --use_gpu:True
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict/spi ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict/spi
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/ic15_dic
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/EN_symbo ...@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/EN_symbo
--use_gpu:True|False --use_gpu:True|False
--enable_mkldnn:False --enable_mkldnn:False
--cpu_threads:6 --cpu_threads:6
--rec_batch_num:1|6 --rec_batch_num:1
--use_tensorrt:False --use_tensorrt:False
--precision:fp32 --precision:fp32
--rec_model_dir: --rec_model_dir:
......
...@@ -160,6 +160,8 @@ if [ ${MODE} = "lite_train_lite_infer" ];then ...@@ -160,6 +160,8 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
ln -s ./icdar2015_lite ./icdar2015 ln -s ./icdar2015_lite ./icdar2015
wget -nc -P ./ic15_data/ https://paddleocr.bj.bcebos.com/dataset/rec_gt_train_lite.txt --no-check-certificate wget -nc -P ./ic15_data/ https://paddleocr.bj.bcebos.com/dataset/rec_gt_train_lite.txt --no-check-certificate
wget -nc -P ./ic15_data/ https://paddleocr.bj.bcebos.com/dataset/rec_gt_test_lite.txt --no-check-certificate wget -nc -P ./ic15_data/ https://paddleocr.bj.bcebos.com/dataset/rec_gt_test_lite.txt --no-check-certificate
mv ic15_data/rec_gt_train_lite.txt ic15_data/rec_gt_train.txt
mv ic15_data/rec_gt_test_lite.txt ic15_data/rec_gt_test.txt
cd ../ cd ../
cd ./inference && tar xf rec_inference.tar && cd ../ cd ./inference && tar xf rec_inference.tar && cd ../
if [ ${model_name} == "ch_PP-OCRv2_det" ] || [ ${model_name} == "ch_PP-OCRv2_det_PACT" ]; then if [ ${model_name} == "ch_PP-OCRv2_det" ] || [ ${model_name} == "ch_PP-OCRv2_det_PACT" ]; then
...@@ -221,7 +223,6 @@ if [ ${MODE} = "lite_train_lite_infer" ];then ...@@ -221,7 +223,6 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
fi fi
if [ ${model_name} == "layoutxlm_ser" ] || [ ${model_name} == "vi_layoutxlm_ser" ]; then if [ ${model_name} == "layoutxlm_ser" ] || [ ${model_name} == "vi_layoutxlm_ser" ]; then
pip install -r ppstructure/kie/requirements.txt pip install -r ppstructure/kie/requirements.txt
pip install paddlenlp\>=2.3.5 --force-reinstall -i https://mirrors.aliyun.com/pypi/simple/
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/ppstructure/dataset/XFUND.tar --no-check-certificate wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/ppstructure/dataset/XFUND.tar --no-check-certificate
cd ./train_data/ && tar xf XFUND.tar cd ./train_data/ && tar xf XFUND.tar
cd ../ cd ../
......
...@@ -23,6 +23,7 @@ __dir__ = os.path.dirname(os.path.abspath(__file__)) ...@@ -23,6 +23,7 @@ __dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, __dir__) sys.path.insert(0, __dir__)
sys.path.insert(0, os.path.abspath(os.path.join(__dir__, '..'))) sys.path.insert(0, os.path.abspath(os.path.join(__dir__, '..')))
import paddle
from ppocr.data import build_dataloader from ppocr.data import build_dataloader
from ppocr.modeling.architectures import build_model from ppocr.modeling.architectures import build_model
from ppocr.postprocess import build_post_process from ppocr.postprocess import build_post_process
...@@ -86,6 +87,30 @@ def main(): ...@@ -86,6 +87,30 @@ def main():
else: else:
model_type = None model_type = None
# build metric
eval_class = build_metric(config['Metric'])
# amp
use_amp = config["Global"].get("use_amp", False)
amp_level = config["Global"].get("amp_level", 'O2')
amp_custom_black_list = config['Global'].get('amp_custom_black_list',[])
if use_amp:
AMP_RELATED_FLAGS_SETTING = {
'FLAGS_cudnn_batchnorm_spatial_persistent': 1,
'FLAGS_max_inplace_grad_add': 8,
}
paddle.fluid.set_flags(AMP_RELATED_FLAGS_SETTING)
scale_loss = config["Global"].get("scale_loss", 1.0)
use_dynamic_loss_scaling = config["Global"].get(
"use_dynamic_loss_scaling", False)
scaler = paddle.amp.GradScaler(
init_loss_scaling=scale_loss,
use_dynamic_loss_scaling=use_dynamic_loss_scaling)
if amp_level == "O2":
model = paddle.amp.decorate(
models=model, level=amp_level, master_weight=True)
else:
scaler = None
best_model_dict = load_model( best_model_dict = load_model(
config, model, model_type=config['Architecture']["model_type"]) config, model, model_type=config['Architecture']["model_type"])
if len(best_model_dict): if len(best_model_dict):
...@@ -93,11 +118,9 @@ def main(): ...@@ -93,11 +118,9 @@ def main():
for k, v in best_model_dict.items(): for k, v in best_model_dict.items():
logger.info('{}:{}'.format(k, v)) logger.info('{}:{}'.format(k, v))
# build metric
eval_class = build_metric(config['Metric'])
# start eval # start eval
metric = program.eval(model, valid_dataloader, post_process_class, metric = program.eval(model, valid_dataloader, post_process_class,
eval_class, model_type, extra_input) eval_class, model_type, extra_input, scaler, amp_level, amp_custom_black_list)
logger.info('metric eval ***************') logger.info('metric eval ***************')
for k, v in metric.items(): for k, v in metric.items():
logger.info('{}:{}'.format(k, v)) logger.info('{}:{}'.format(k, v))
......
...@@ -349,6 +349,13 @@ class TextRecognizer(object): ...@@ -349,6 +349,13 @@ class TextRecognizer(object):
for beg_img_no in range(0, img_num, batch_num): for beg_img_no in range(0, img_num, batch_num):
end_img_no = min(img_num, beg_img_no + batch_num) end_img_no = min(img_num, beg_img_no + batch_num)
norm_img_batch = [] norm_img_batch = []
if self.rec_algorithm == "SRN":
encoder_word_pos_list = []
gsrm_word_pos_list = []
gsrm_slf_attn_bias1_list = []
gsrm_slf_attn_bias2_list = []
if self.rec_algorithm == "SAR":
valid_ratios = []
imgC, imgH, imgW = self.rec_image_shape[:3] imgC, imgH, imgW = self.rec_image_shape[:3]
max_wh_ratio = imgW / imgH max_wh_ratio = imgW / imgH
# max_wh_ratio = 0 # max_wh_ratio = 0
...@@ -357,22 +364,16 @@ class TextRecognizer(object): ...@@ -357,22 +364,16 @@ class TextRecognizer(object):
wh_ratio = w * 1.0 / h wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio) max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no): for ino in range(beg_img_no, end_img_no):
if self.rec_algorithm == "SAR": if self.rec_algorithm == "SAR":
norm_img, _, _, valid_ratio = self.resize_norm_img_sar( norm_img, _, _, valid_ratio = self.resize_norm_img_sar(
img_list[indices[ino]], self.rec_image_shape) img_list[indices[ino]], self.rec_image_shape)
norm_img = norm_img[np.newaxis, :] norm_img = norm_img[np.newaxis, :]
valid_ratio = np.expand_dims(valid_ratio, axis=0) valid_ratio = np.expand_dims(valid_ratio, axis=0)
valid_ratios = []
valid_ratios.append(valid_ratio) valid_ratios.append(valid_ratio)
norm_img_batch.append(norm_img) norm_img_batch.append(norm_img)
elif self.rec_algorithm == "SRN": elif self.rec_algorithm == "SRN":
norm_img = self.process_image_srn( norm_img = self.process_image_srn(
img_list[indices[ino]], self.rec_image_shape, 8, 25) img_list[indices[ino]], self.rec_image_shape, 8, 25)
encoder_word_pos_list = []
gsrm_word_pos_list = []
gsrm_slf_attn_bias1_list = []
gsrm_slf_attn_bias2_list = []
encoder_word_pos_list.append(norm_img[1]) encoder_word_pos_list.append(norm_img[1])
gsrm_word_pos_list.append(norm_img[2]) gsrm_word_pos_list.append(norm_img[2])
gsrm_slf_attn_bias1_list.append(norm_img[3]) gsrm_slf_attn_bias1_list.append(norm_img[3])
......
...@@ -191,7 +191,8 @@ def train(config, ...@@ -191,7 +191,8 @@ def train(config,
logger, logger,
log_writer=None, log_writer=None,
scaler=None, scaler=None,
amp_level='O2'): amp_level='O2',
amp_custom_black_list=[]):
cal_metric_during_train = config['Global'].get('cal_metric_during_train', cal_metric_during_train = config['Global'].get('cal_metric_during_train',
False) False)
calc_epoch_interval = config['Global'].get('calc_epoch_interval', 1) calc_epoch_interval = config['Global'].get('calc_epoch_interval', 1)
...@@ -278,10 +279,7 @@ def train(config, ...@@ -278,10 +279,7 @@ def train(config,
model_average = True model_average = True
# use amp # use amp
if scaler: if scaler:
custom_black_list = config['Global'].get( with paddle.amp.auto_cast(level=amp_level, custom_black_list=amp_custom_black_list):
'amp_custom_black_list', [])
with paddle.amp.auto_cast(
level=amp_level, custom_black_list=custom_black_list):
if model_type == 'table' or extra_input: if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:]) preds = model(images, data=batch[1:])
elif model_type in ["kie"]: elif model_type in ["kie"]:
...@@ -386,7 +384,9 @@ def train(config, ...@@ -386,7 +384,9 @@ def train(config,
eval_class, eval_class,
model_type, model_type,
extra_input=extra_input, extra_input=extra_input,
scaler=scaler) scaler=scaler,
amp_level=amp_level,
amp_custom_black_list=amp_custom_black_list)
cur_metric_str = 'cur metric, {}'.format(', '.join( cur_metric_str = 'cur metric, {}'.format(', '.join(
['{}: {}'.format(k, v) for k, v in cur_metric.items()])) ['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
logger.info(cur_metric_str) logger.info(cur_metric_str)
...@@ -477,7 +477,9 @@ def eval(model, ...@@ -477,7 +477,9 @@ def eval(model,
eval_class, eval_class,
model_type=None, model_type=None,
extra_input=False, extra_input=False,
scaler=None): scaler=None,
amp_level='O2',
amp_custom_black_list = []):
model.eval() model.eval()
with paddle.no_grad(): with paddle.no_grad():
total_frame = 0.0 total_frame = 0.0
...@@ -498,7 +500,7 @@ def eval(model, ...@@ -498,7 +500,7 @@ def eval(model,
# use amp # use amp
if scaler: if scaler:
with paddle.amp.auto_cast(level='O2'): with paddle.amp.auto_cast(level=amp_level, custom_black_list=amp_custom_black_list):
if model_type == 'table' or extra_input: if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:]) preds = model(images, data=batch[1:])
elif model_type in ["kie"]: elif model_type in ["kie"]:
......
...@@ -138,9 +138,7 @@ def main(config, device, logger, vdl_writer): ...@@ -138,9 +138,7 @@ def main(config, device, logger, vdl_writer):
# build metric # build metric
eval_class = build_metric(config['Metric']) eval_class = build_metric(config['Metric'])
# load pretrain model
pre_best_model_dict = load_model(config, model, optimizer,
config['Architecture']["model_type"])
logger.info('train dataloader has {} iters'.format(len(train_dataloader))) logger.info('train dataloader has {} iters'.format(len(train_dataloader)))
if valid_dataloader is not None: if valid_dataloader is not None:
logger.info('valid dataloader has {} iters'.format( logger.info('valid dataloader has {} iters'.format(
...@@ -148,6 +146,7 @@ def main(config, device, logger, vdl_writer): ...@@ -148,6 +146,7 @@ def main(config, device, logger, vdl_writer):
use_amp = config["Global"].get("use_amp", False) use_amp = config["Global"].get("use_amp", False)
amp_level = config["Global"].get("amp_level", 'O2') amp_level = config["Global"].get("amp_level", 'O2')
amp_custom_black_list = config['Global'].get('amp_custom_black_list',[])
if use_amp: if use_amp:
AMP_RELATED_FLAGS_SETTING = { AMP_RELATED_FLAGS_SETTING = {
'FLAGS_cudnn_batchnorm_spatial_persistent': 1, 'FLAGS_cudnn_batchnorm_spatial_persistent': 1,
...@@ -166,12 +165,16 @@ def main(config, device, logger, vdl_writer): ...@@ -166,12 +165,16 @@ def main(config, device, logger, vdl_writer):
else: else:
scaler = None scaler = None
# load pretrain model
pre_best_model_dict = load_model(config, model, optimizer,
config['Architecture']["model_type"])
if config['Global']['distributed']: if config['Global']['distributed']:
model = paddle.DataParallel(model) model = paddle.DataParallel(model)
# start train # start train
program.train(config, train_dataloader, valid_dataloader, device, model, program.train(config, train_dataloader, valid_dataloader, device, model,
loss_class, optimizer, lr_scheduler, post_process_class, loss_class, optimizer, lr_scheduler, post_process_class,
eval_class, pre_best_model_dict, logger, vdl_writer, scaler,amp_level) eval_class, pre_best_model_dict, logger, vdl_writer, scaler,amp_level, amp_custom_black_list)
def test_reader(config, device, logger): def test_reader(config, device, logger):
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册