English | [简体中文](README_ch.md)

- [1. Introduction](#1-introduction)
- [2. Features](#2-features)
- [3. Results](#3-results)
  - [3.1 Layout analysis and table recognition](#31-layout-analysis-and-table-recognition)
  - [3.2 Layout Recovery](#32-layout-recovery)
  - [3.3 KIE](#33-kie)
- [4. Quick start](#4-quick-start)
- [5. Model List](#5-model-list)

## 1. Introduction

PP-Structure is an intelligent document analysis system developed by the PaddleOCR team. It aims to help developers complete document understanding tasks such as layout analysis and table recognition.

The pipeline of the PP-Structurev2 system is shown below. The document image first passes through the image orientation correction module, which identifies the orientation of the whole image and corrects it. Two tasks can then be performed: layout analysis and key information extraction.

- In the layout analysis task, the layout analysis model first divides the image into areas such as text, table, and figure, and each area is then analyzed separately. For example, a table area is sent to the table recognition module for structured recognition, and a text area is sent to the OCR engine for text recognition. Finally, the layout recovery module restores the result to a Word or PDF file with the same layout as the original image.
- In the key information extraction task, the OCR engine first extracts the text content, the SER (semantic entity recognition) module then obtains the semantic entities in the image, and finally the RE (relation extraction) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
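
The stage ordering described above can be sketched in a few lines of Python. This is only an illustration of the order in which a document image passes through the modules. The stage names and the `stage_order` helper are hypothetical and are not identifiers from the PaddleOCR code base.

```python
# Hypothetical stage names that mirror the modules described above.
# They are NOT identifiers from the PaddleOCR code base.
COMMON_STAGES = ["image_orientation_correction"]

TASK_STAGES = {
    "layout": ["layout_analysis", "region_recognition", "layout_recovery"],
    "kie": ["ocr", "semantic_entity_recognition", "relation_extraction"],
}


def stage_order(task):
    """Return the ordered list of stages a document image passes through."""
    if task not in TASK_STAGES:
        raise ValueError(f"unknown task: {task!r}")
    # Orientation correction always runs first, then the task-specific stages.
    return COMMON_STAGES + TASK_STAGES[task]


for task in ("layout", "kie"):
    print(task, "->", " -> ".join(stage_order(task)))
```

Both tasks share the orientation correction step, which is why it is kept in a separate list rather than repeated per task.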

More technical details: 👉 [PP-Structurev2 Technical Report]()

PP-Structurev2 supports using each module independently or combining modules flexibly. For example, you can use layout analysis alone or table recognition alone. Follow the links below for the tutorial of each module:

- [Layout Analysis](layout/README.md)
- [Table Recognition](table/README.md)
- [Key Information Extraction](kie/README.md)
- [Layout Recovery](recovery/README.md)

## 2. Features

The main features of PP-Structurev2 are as follows:
- Supports layout analysis of documents in image or PDF form, dividing them into areas such as **text, titles, tables, figures, and formulas**;
- Supports common Chinese and English **table detection** tasks;
- Supports structured table recognition, with the final result output to an **Excel file**;
- Supports multimodal Key Information Extraction (KIE) tasks: **Semantic Entity Recognition** (SER) and **Relation Extraction** (RE);
- Supports **layout recovery**, that is, restoring the document to Word or PDF format with the same layout as the original image;
- Supports customized training and multiple inference deployment methods, including quick use through the Python whl package;
- Integrates with the semi-automatic data labeling tool PPOCRLabel, which supports labeling for layout analysis, table recognition, and SER.
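
To give a feel for what turning a recognized table into a spreadsheet involves, here is a minimal sketch. It assumes the recognized table structure is available as an HTML string, and it writes CSV rather than a real Excel file so that it needs only the Python standard library. The `TableRows` class, the `html_table_to_csv` helper, and the sample HTML are hypothetical, and merged cells (`rowspan`/`colspan`) are ignored.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Convert a simple HTML table (no merged cells) to CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical recognition output, used only for illustration.
sample_html = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Pen</td><td>3</td></tr></table>"
)
csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

A real export would also need to handle merged cells and cell styling, which this sketch deliberately leaves out.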

## 3. Results

PP-Structurev2 supports using each module independently or combining modules flexibly. For example, layout analysis or table recognition can each be used alone. Only the visualization results of several representative use cases are shown here.

### 3.1 Layout analysis and table recognition

The figure shows the pipeline of layout analysis + table recognition. Layout analysis first divides the image into four kinds of areas: image, text, title, and table. OCR detection and recognition is then performed on the image, text, and title areas, while table recognition is performed on the table areas. The image areas are also stored for later use.
<img src="docs/table/ppstructure.GIF" width="100%"/>
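
The routing of areas described in this section can be sketched as follows. The `Region` class and the `route_regions` helper are hypothetical names introduced for illustration. They only show which module each kind of area would be sent to and do not call any PaddleOCR model.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str    # one of "image", "text", "title", "table"
    bbox: tuple  # (x0, y0, x1, y1) in pixels


def route_regions(regions):
    """Group regions by the module that would process them."""
    plan = {"ocr": [], "table_recognition": [], "stored_images": []}
    for region in regions:
        if region.kind == "table":
            plan["table_recognition"].append(region)
        elif region.kind in ("image", "text", "title"):
            plan["ocr"].append(region)
            if region.kind == "image":
                # Image areas are also kept so they can be reused later.
                plan["stored_images"].append(region)
        else:
            raise ValueError(f"unexpected region kind: {region.kind!r}")
    return plan


# Made-up layout analysis output for a single page.
regions = [
    Region("title", (40, 20, 560, 60)),
    Region("text", (40, 80, 560, 300)),
    Region("table", (40, 320, 560, 520)),
    Region("image", (40, 540, 300, 760)),
]
plan = route_regions(regions)
for module, items in plan.items():
    print(module, [r.kind for r in items])
```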

### 3.2 Layout recovery

The following figure shows the result of layout recovery based on the layout analysis and table recognition results from the previous section.
<img src="./docs/recovery/recovery.jpg" width="100%"/>

### 3.3 KIE

* SER

Different colored boxes in the figure represent different categories. 

<div align="center">
    <img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>

<div align="center">
    <img src="https://user-images.githubusercontent.com/25809855/186095702-9acef674-12af-4d09-97fc-abf4ab32600e.png" width="600">
</div>

<div align="center">
    <img src="https://user-images.githubusercontent.com/14270174/185539141-68e71c75-5cf7-4529-b2ca-219d29fa5f68.jpg" width="600">
</div>

<div align="center">
    <img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>

<div align="center">
    <img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>

* RE

In the figure, the red box represents `Question`, the blue box represents `Answer`, and `Question` and `Answer` are connected by green lines.

<div align="center">
    <img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>  

<div align="center">
    <img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div> 

<div align="center">
    <img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
</div>

<div align="center">
    <img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
</div>
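
The `Question`/`Answer` links drawn in the RE figures can be thought of as pairs of entity ids. The sketch below shows one way such links could be turned into key-value pairs. The entity dictionaries, the `links` list, and the `extract_key_values` helper are hypothetical and do not reflect the actual output format of the SER or RE models.

```python
# Made-up SER output: each entity has an id, a label, and its recognized text.
entities = [
    {"id": 0, "label": "QUESTION", "text": "Name"},
    {"id": 1, "label": "ANSWER", "text": "Alice"},
    {"id": 2, "label": "QUESTION", "text": "Date"},
    {"id": 3, "label": "ANSWER", "text": "2022-08-23"},
    {"id": 4, "label": "HEADER", "text": "Registration Form"},
]

# Made-up RE output: (question_id, answer_id) pairs, one per green line.
links = [(0, 1), (2, 3)]


def extract_key_values(entities, links):
    """Map each question text to the list of answer texts linked to it."""
    by_id = {entity["id"]: entity for entity in entities}
    key_values = {}
    for question_id, answer_id in links:
        question, answer = by_id[question_id], by_id[answer_id]
        # Skip links that do not connect a question to an answer.
        if question["label"] != "QUESTION" or answer["label"] != "ANSWER":
            continue
        key_values.setdefault(question["text"], []).append(answer["text"])
    return key_values


key_values = extract_key_values(entities, links)
print(key_values)
```

Answers are collected in a list because a single question may be linked to more than one answer.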

## 4. Quick start

To get started, see the [Quick Start](./docs/quickstart_en.md) guide.

## 5. Model List

Some tasks need to use both the structured analysis models and the OCR models. For example, the table recognition task needs to use the table recognition model for structured analysis, and the OCR model to recognize the text in the table. Please select the appropriate models according to your specific needs.

To download the structure analysis models, please refer to:
- [PP-Structure Model Zoo](./docs/models_list_en.md)

To download the OCR models, please refer to:
- [PP-OCR Model Zoo](../doc/doc_en/models_list_en.md)