diff --git a/ppstructure/README.md b/ppstructure/README.md index a09a43299b11dccf99897d5a6c69704191253aaf..745d5f7ed3610c59b1cd069ebe300418d8fd4741 100644 --- a/ppstructure/README.md +++ b/ppstructure/README.md @@ -1,187 +1,140 @@ English | [简体中文](README_ch.md) -# PP-Structure +- [1. Introduction](#1) +- [2. Update log](#2) +- [3. Features](#3) +- [4. Results](#4) + * [4.1 Layout analysis and table recognition](#41) + * [4.2 DOC-VQA](#42) +- [5. Quick start](#5) +- [6. PP-Structure System](#6) + * [6.1 Layout analysis and table recognition](#61) + * [6.2 DOC-VQA](#62) +- [7. Model List](#7) -PP-Structure is an OCR toolkit that can be used for complex documents analysis. The main features are as follows: -- Support the layout analysis of documents, divide the documents into 5 types of areas **text, title, table, image and list** (conjunction with Layout-Parser) -- Support to extract the texts from the text, title, picture and list areas (used in conjunction with PP-OCR) -- Support to extract excel files from the table areas -- Support python whl package and command line usage, easy to use -- Support custom training for layout analysis and table structure tasks + -## 1. Visualization - - +## 1. Introduction +PP-Structure is an OCR toolkit for analyzing and processing documents with complex structures, designed to help developers complete document understanding tasks. + -## 2. Installation +## 2. Update log +* 2021.12.07: added [DOC-VQA SER and RE tasks](vqa/README.md). -### 2.1 Install requirements + -- **(1) Install PaddlePaddle** +## 3. 
Features -```bash -pip3 install --upgrade pip +The main features of PP-Structure are as follows: -# GPU -python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple - -# CPU - python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple +- Supports layout analysis of documents, dividing them into 5 types of areas: **text, title, table, image and list** (used in conjunction with Layout-Parser) +- Supports extracting text from the text, title, image and list areas (used in conjunction with PP-OCR) +- Supports exporting table areas to Excel files +- Supports both the Python whl package and command-line usage +- Supports custom training for layout analysis and table structure tasks +- Supports Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE) -``` -For more,refer [Installation](https://www.paddlepaddle.org.cn/install/quick) . -- **(2) Install Layout-Parser** + -```bash -pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl -``` +## 4. Results -### 2.2 Install PaddleOCR(including PP-OCR and PP-Structure) + -- **(1) PIP install PaddleOCR whl package(inference only)** +### 4.1 Layout analysis and table recognition -```bash -pip install "paddleocr>=2.2" -``` + -- **(2) Clone PaddleOCR(Inference+training)** +The figure shows the pipeline of layout analysis and table recognition. Layout analysis first divides the image into four types of areas: image, text, title and table. OCR detection and recognition is then performed on the image, text and title areas, table recognition is performed on the table areas, and the image areas are also saved for later use. -```bash -git clone https://github.com/PaddlePaddle/PaddleOCR -``` + +### 4.2 DOC-VQA -## 3. 
Quick Start +* SER -### 3.1 Use by command line +![](./vqa/images/result_ser/zh_val_0_ser.jpg) | ![](./vqa/images/result_ser/zh_val_42_ser.jpg) +---|--- -```bash -paddleocr --image_dir=../doc/table/1.png --type=structure -``` +Different colored boxes in the figure represent different categories. For xfun dataset, there are three categories: query, answer and header: -### 3.2 Use by python API +* Dark purple: header +* Light purple: query +* Army green: answer -```python -import os -import cv2 -from paddleocr import PPStructure,draw_structure_result,save_structure_res +The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box. -table_engine = PPStructure(show_log=True) -save_folder = './output/table' -img_path = '../doc/table/1.png' -img = cv2.imread(img_path) -result = table_engine(img) -save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0]) +* RE -for line in result: - line.pop('img') - print(line) +![](./vqa/images/result_re/zh_val_21_re.jpg) | ![](./vqa/images/result_re/zh_val_40_re.jpg) +---|--- -from PIL import Image -font_path = '../doc/fonts/simfang.ttf' -image = Image.open(img_path).convert('RGB') -im_show = draw_structure_result(image, result,font_path=font_path) -im_show = Image.fromarray(im_show) -im_show.save('result.jpg') -``` -### 3.3 Returned results format -The returned results of PP-Structure is a list composed of a dict, an example is as follows +In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box. -```shell -[ - { 'type': 'Text', - 'bbox': [34, 432, 345, 462], - 'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], - [('Tigure-6. 
The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)]) - } -] -``` -The description of each field in dict is as follows -| Parameter | Description | -| --------------- | -------------| -|type|Type of image area| -|bbox|The coordinates of the image area in the original image, respectively [left upper x, left upper y, right bottom x, right bottom y]| -|res|OCR or table recognition result of image area。
Table: HTML string of the table;
OCR: A tuple containing the detection coordinates and recognition results of each single line of text| + +## 5. Quick start -### 3.4 Parameter description: +Please refer to the [Quick Installation](./docs/quickstart.md) guide. -| Parameter | Description | Default value | -| --------------- | ---------------------------------------- | ------------------------------------------- | -| output | The path where excel and recognition results are saved | ./output/table | -| table_max_len | The long side of the image is resized in table structure model | 488 | -| table_model_dir | inference model path of table structure model | None | -| table_char_type | dict path of table structure model | ../ppocr/utils/dict/table_structure_dict.tx | + -Most of the parameters are consistent with the paddleocr whl package, see [doc of whl](../doc/doc_en/whl_en.md) +## 6. PP-Structure System -After running, each image will have a directory with the same name under the directory specified in the output field. Each table in the picture will be stored as an excel and figure area will be cropped and saved, the excel and image file name will be the coordinates of the table in the image. + -## 4. PP-Structure Pipeline -![pipeline](../doc/table/pipeline_en.jpg) +### 6.1 Layout analysis and table recognition -In PP-Structure, the image will be analyzed by layoutparser first. In the layout analysis, the area in the image will be classified, including **text, title, image, list and table** 5 categories. For the first 4 types of areas, directly use the PP-OCR to complete the text detection and recognition. The table area will be converted to an excel file of the same table style via Table OCR. +![pipeline](../doc/table/pipeline.jpg) -### 4.1 LayoutParser +In PP-Structure, the image is divided into 5 types of areas: **text, title, image, list and table**. For the first 4 types of areas, the PP-OCR system is used directly for text detection and recognition. 
For the table area, table structure recognition converts the table in the image into an Excel file with the same table style. -Layout analysis divides the document data into regions, including the use of Python scripts for layout analysis tools, extraction of special category detection boxes, performance indicators, and custom training layout analysis models. For details, please refer to [document](layout/README_en.md). +#### 6.1.1 Layout analysis -### 4.2 Table Recognition +Layout analysis divides an image into regions by category. Its documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of a designated category, performance indicators, and custom training of layout analysis models. For details, please refer to [document](layout/README.md). -Table Recognition converts table image into excel documents, which include the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed, please refer to [document](table/README.md) +#### 6.1.2 Table recognition -## 5. Prediction by inference engine +Table recognition converts table images into Excel documents, which involves detecting and recognizing table text and predicting table structure and cell coordinates. For detailed instructions, please refer to [document](table/README.md). -Use the following commands to complete the inference. 
+ -```python -cd PaddleOCR/ppstructure +### 6.2 DOC-VQA -# download model -mkdir inference && cd inference -# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it -wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar -# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it -wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar -# Download the table structure model of the ultra-lightweight Chinese OCR model and uncompress it -wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar -cd .. +Document Visual Question Answering (DOC-VQA) is a type of Visual Question Answering (VQA) that includes the Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. The SER task recognizes and classifies the text in an image. The RE task extracts relations between text contents in the image, such as identifying question-answer pairs. For details, please refer to [document](vqa/README.md). -python3 predict_system.py --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --output=../output/table --vis_font_path=../doc/fonts/simfang.ttf -``` -After running, each image will have a directory with the same name under the directory specified in the output field. Each table in the picture will be stored as an excel and figure area will be cropped and saved, the excel and image file name will be the coordinates of the table in the image. 
-**Model List** + -|model name|description|config|model size|download| -| --- | --- | --- | --- | --- | -|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) | +## 7. Model List -**Model List** +List of PP-Structure series models (being updated) -LayoutParser model +* Layout analysis model |model name|description|download| | --- | --- | --- | -| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet data set can be divided into 5 types of areas **text, title, table, picture and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | -| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset can only detect tables | [TableBank Word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | -| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset can only detect tables | [TableBank Latex](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | +| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; divides an image into 5 types of areas: **text, title, table, picture and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | -OCR and table recognition model + +* OCR and table recognition model |model name|description|model size|download| | --- | --- | --- | --- | |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) | |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | -|en_ppocr_mobile_v2.0_table_det|Text detection of English table scenes trained on PubLayNet dataset|4.7M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_det_train.tar) | -|en_ppocr_mobile_v2.0_table_rec|Text recognition of English table scene trained on PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) | -|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | +|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | + +* VQA model +|model name|description|model size|download| +| --- | 
--- | --- | --- | +|PP-Layout_v1.0_ser_pretrained|SER model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) | +|PP-Layout_v1.0_re_pretrained|RE model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) | -If you need to use other models, you can download the model in [model_list](../doc/doc_en/models_list_en.md) or use your own trained model to configure it to the three fields of `det_model_dir`, `rec_model_dir`, `table_model_dir` . +If you need to use other models, you can download the model in [PPOCR model_list](../doc/doc_en/models_list_en.md) and [PPStructure model_list](./docs/model_list.md) diff --git a/ppstructure/README_ch.md b/ppstructure/README_ch.md index 1957202f32bc2eabad82af8476f5b0af2d064a14..99e55437c222e2f4a40b7031d7cb9ef4b840a357 100644 --- a/ppstructure/README_ch.md +++ b/ppstructure/README_ch.md @@ -5,11 +5,11 @@ - [3. 特性](#3) - [4. 效果展示](#4) * [4.1 版面分析和表格识别](#41) - * [4.2 VQA](#42) + * [4.2 DOC-VQA](#42) - [5. 快速体验](#5) - [6. PP-Structure 介绍](#6) * [6.1 版面分析+表格识别](#61) - * [6.2 VQA](#62) + * [6.2 DOC-VQA](#62) - [7. 模型库](#7) @@ -20,13 +20,13 @@ PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包 ## 2. 近期更新 -* 2021.12.07 新增VQA任务-SER和RE。 +* 2021.12.07 新增DOC-[VQA任务SER和RE](vqa/README.md)。 ## 3. 特性 -PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包,主要特性如下: +PP-Structure的主要特性如下: - 支持对图片形式的文档进行版面分析,可以划分**文字、标题、表格、图片以及列表**5类区域(与Layout-Parser联合使用) - 支持文字、标题、图片以及列表区域提取为文字字段(与PP-OCR联合使用) - 支持表格区域进行结构化分析,最终结果输出Excel文件 @@ -45,7 +45,12 @@ PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包 -### 4.2 VQA +图中展示了版面分析+表格识别的整体流程,图片先有版面分析划分为图像、文本、标题和表格四种区域,然后对图像、文本和标题三种区域进行OCR的检测识别,对表格进行表格识别,其中图像还会被存储下来以便使用。 + + + + +### 4.2 DOC-VQA * SER @@ -72,13 +77,12 @@ PP-Structure是一个可用于复杂文档结构分析和处理的OCR工具包 ## 5. 
快速体验 -代码体验:从 [快速安装](./docs/quickstart.md) 开始 +请参考[快速安装](./docs/quickstart.md)教程。 ## 6. PP-Structure 介绍 -PP-Structure 内置 ### 6.1 版面分析+表格识别 @@ -93,13 +97,13 @@ PP-Structure 内置 #### 6.1.2 表格识别 -表格识别将表格图片转换为excel文档,其中包含对于表格文本的检测和识别以及对于表格结构和单元格坐标的预测,详细说明参考[文档](table/README_ch.md) +表格识别将表格图片转换为excel文档,其中包含对于表格文本的检测和识别以及对于表格结构和单元格坐标的预测,详细说明参考[文档](table/README_ch.md)。 -### 6.2 VQA +### 6.2 DOC-VQA -VQA指文档视觉问答,其中包括语义实体识别 (Semantic Entity Recognition, SER) 和关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair),详细说明参考[文档](vqa/README.md) +DOC-VQA指文档视觉问答,其中包括语义实体识别 (Semantic Entity Recognition, SER) 和关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair),详细说明参考[文档](vqa/README.md)。 @@ -107,7 +111,7 @@ VQA指文档视觉问答,其中包括语义实体识别 (Semantic Entity Recog PP-Structure系列模型列表(更新中) -* LayoutParser 模型 +* 版面分析模型 |模型名称|模型简介|下载地址| | --- | --- | --- | @@ -122,7 +126,7 @@ PP-Structure系列模型列表(更新中) |ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | -* VQA模型 +* DOC-VQA |模型名称|模型简介|模型大小|下载地址| | --- | --- | --- | --- | @@ -130,4 +134,4 @@ PP-Structure系列模型列表(更新中) |PP-Layout_v1.0_re_pretrained|基于LayoutXLM在xfun中文数据集上训练的RE模型|1.4G|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) | -更多模型下载,可以参考 [模型库](./docs/model_list.md) +更多模型下载,可以参考 [PPOCR model_list](../doc/doc_en/models_list.md) and [PPStructure model_list](./docs/model_list.md) \ No newline at end of file