English | [简体中文](README_ch.md)

...
<aname="1"></a>
## 1. Introduction
## 1. Introduction
PP-Structure is an OCR toolkit that can be used for document analysis and processing with complex structures, designed to help developers better complete document understanding tasks
PP-Structure is an OCR toolkit that can be used for document analysis and processing with complex structures, designed to help developers better complete document understanding tasks
<aname="2"></a>
## 2. Update log
## 2. Update log
* 2022.02.12 DOC-VQA add LayoutLMv2 model。
* 2022.02.12 DOC-VQA add LayoutLMv2 model。
* 2021.12.07 add [DOC-VQA SER and RE tasks](vqa/README.md)。
* 2021.12.07 add [DOC-VQA SER and RE tasks](vqa/README.md)。
<aname="3"></a>
## 3. Features
## 3. Features
The main features of PP-Structure are as follows:
The main features of PP-Structure are as follows:
...
- Support custom training for layout analysis and table structure tasks
- Support Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)

The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four types of regions (image, text, title, and table) by layout analysis. OCR detection and recognition is then performed on the image, text, and title regions, while table recognition is performed on the table regions; the image regions are also cropped and stored for later use.
<aname="42"></a>
### 4.2 DOC-VQA
### 4.2 DOC-VQA
* SER
* SER
...
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
<aname="5"></a>
## 5. Quick start
## 5. Quick start
Start from [Quick Installation](./docs/quickstart.md)
Start from [Quick Installation](./docs/quickstart.md)
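As an illustrative sketch of the whl-based workflow (assuming `paddleocr>=2.2`, which bundles PP-Structure; `example.jpg` is a placeholder image and flags may differ between versions):

```bash
# Install the PaddleOCR whl package; PP-Structure ships with it from v2.2 on
pip3 install "paddleocr>=2.2"

# Run layout analysis + table recognition on an image; per-region results,
# including Excel files for detected tables, are written under ./output
paddleocr --image_dir=example.jpg --type=structure
```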
<aname="6"></a>
## 6. PP-Structure System
## 6. PP-Structure System
<aname="61"></a>
### 6.1 Layout analysis and table recognition
### 6.1 Layout analysis and table recognition
![pipeline](../doc/table/pipeline.jpg)
![pipeline](../doc/table/pipeline.jpg)
...

Layout analysis classifies images by region, including the use of Python scripts of layout analysis tools, extraction of designated category detection boxes, performance indicators, and custom training of layout analysis models. For details, please refer to [document](layout/README.md).
Table recognition converts table images into Excel documents, which includes the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed instructions, please refer to [document](table/README.md).
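As a minimal sketch (the image path is a placeholder and the output layout may vary between versions), the structure pipeline can be pointed at a single table image, with each recognized table exported as an `.xlsx` file:

```bash
# Predict on one table image; the recognized table structure and cell text
# are combined into an Excel file in the output folder
paddleocr --image_dir=table_example.jpg --type=structure
```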
<aname="62"></a>
### 6.2 DOC-VQA
### 6.2 DOC-VQA
Document Visual Question Answering (DOC-VQA) if a type of Visual Question Answering (VQA), which includes Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair. For details, please refer to [document](vqa/README.md)
Document Visual Question Answering (DOC-VQA) if a type of Visual Question Answering (VQA), which includes Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair. For details, please refer to [document](vqa/README.md)
<aname="7"></a>
## 7. Model List
## 7. Model List
PP-Structure Series Model List (Updating)
PP-Structure Series Model List (Updating)
<aname="71"></a>
### 7.1 Layout analysis model
### 7.1 Layout analysis model
|model name|description|download|
|model name|description|download|
| --- | --- | --- |
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset can divide image into 5 types of areas **text, title, table, picture, and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset can divide image into 5 types of areas **text, title, table, picture, and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
<aname="72"></a>
### 7.2 OCR and table recognition model
### 7.2 OCR and table recognition model
|model name|description|model size|download|
|model name|description|model size|download|
...
@@ -140,8 +110,6 @@ PP-Structure Series Model List (Updating)
...
@@ -140,8 +110,6 @@ PP-Structure Series Model List (Updating)
|ch_PP-OCRv2_rec_slim|Slim quantization with distillation lightweight model, supporting Chinese, English and multilingual text recognition| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenes, trained on the PubLayNet dataset| 18.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
# Key Information Extraction (KIE)
This section provides a tutorial example on how to quickly use, train, and evaluate a key information extraction (KIE) model in PaddleOCR.
[SDMGR (Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.
* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)
<aname="1-----"></a>
## 1. Quick Use
## 1. Quick Use
The [Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos covering 25 classes and 50000 text boxes, and can be downloaded with wget:
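The download amounts to fetching and unpacking one tarball; the URL below is the one published in the PaddleOCR repository and should be verified before use:

```bash
# Fetch and unpack the wildreceipt dataset
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```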
...

The visualization results are shown in the figure below:

<div align="center">
    <img src="./imgs/0.png" width="800">
</div>
<aname="2-----"></a>
## 2. Model Training
## 2. Model Training
Create a softlink to the folder, `PaddleOCR/train_data`:
Create a softlink to the folder, `PaddleOCR/train_data`:
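A sketch of the linking step, assuming the dataset was extracted to `wildreceipt/` beside the PaddleOCR checkout:

```bash
# Create train_data inside PaddleOCR and link the extracted dataset into it
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```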
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`, whose default training data path is `train_data/wildreceipt`. After preparing the data, training can be launched as sketched below.
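A sketch of the training launch, using PaddleOCR's standard `tools/train.py` entry point (the `-o Global.save_model_dir` override is optional and shown only as an example):

```bash
# Train SDMGR with the default config, saving checkpoints to ./output/kie/
python3 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```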