English | [简体中文](README_ch.md)

- [1. Introduction](#1-introduction)
- [2. Update log](#2-update-log)
- [3. Features](#3-features)
- [4. Results](#4-results)
  - [4.1 Layout analysis and table recognition](#41-layout-analysis-and-table-recognition)
  - [4.2 DOC-VQA](#42-doc-vqa)
- [5. Quick start](#5-quick-start)
- [6. PP-Structure System](#6-pp-structure-system)
  - [6.1 Layout analysis and table recognition](#61-layout-analysis-and-table-recognition)
    - [6.1.1 Layout analysis](#611-layout-analysis)
    - [6.1.2 Table recognition](#612-table-recognition)
  - [6.2 DOC-VQA](#62-doc-vqa)
- [7. Model List](#7-model-list)

<a name="1"></a>

## 1. Introduction

PP-Structure is an OCR toolkit for analyzing and processing documents with complex structures, designed to help developers complete document understanding tasks.

<a name="2"></a>

## 2. Update log
* 2021.12.07: added [DOC-VQA SER and RE tasks](vqa/README.md)

<a name="3"></a>

## 3. Features

The main features of PP-Structure are as follows:

- Supports layout analysis of documents, dividing them into 5 types of areas: **text, title, table, image and list** (used in conjunction with Layout-Parser)
- Supports extracting text from the text, title, image and list areas (used in conjunction with PP-OCR)
- Supports exporting table areas to Excel files
- Supports a Python whl package and command-line usage, making it easy to use
- Supports custom training for the layout analysis and table structure tasks
- Supports Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)

<a name="4"></a>

## 4. Results

<a name="41"></a>

### 4.1 Layout analysis and table recognition

<img src="../doc/table/ppstructure.GIF" width="100%"/>

The figure shows the pipeline of layout analysis and table recognition. Layout analysis first divides the image into four types of areas: image, text, title and table. OCR detection and recognition are then performed on the image, text and title areas, while table recognition is performed on the table areas. The image areas are also stored for later use.

<a name="42"></a>

### 4.2 DOC-VQA

* SER

![](../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../doc/vqa/result_ser/zh_val_42_ser.jpg)
---|---

Boxes of different colors in the figure represent different categories. For the XFUN dataset, there are three categories: query, answer and header:

* Dark purple: header
* Light purple: query
* Army green: answer

The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.

* RE

![](../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../doc/vqa/result_re/zh_val_40_re.jpg)
---|---

In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
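
The question-answer links described above can be pictured as a small data structure. The following is a minimal sketch, not the PP-Structure implementation: the entity dictionaries, the `label` and `text` keys, and the `pair_questions_answers` helper are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


def pair_questions_answers(entities, relations):
    """Group answer texts under the question text they are linked to.

    entities:  dict mapping an entity id to {"label": ..., "text": ...}
               (hypothetical format, assumed for this sketch)
    relations: list of (question_id, answer_id) links, as an RE step might emit
    """
    pairs = defaultdict(list)
    for question_id, answer_id in relations:
        question = entities[question_id]
        answer = entities[answer_id]
        # Keep only links that really go from a question to an answer.
        if question["label"] != "question" or answer["label"] != "answer":
            continue
        pairs[question["text"]].append(answer["text"])
    return dict(pairs)


entities = {
    0: {"label": "header", "text": "Registration Form"},
    1: {"label": "question", "text": "Name"},
    2: {"label": "answer", "text": "Li Lei"},
    3: {"label": "question", "text": "Phone"},
    4: {"label": "answer", "text": "123456"},
}
relations = [(1, 2), (3, 4), (0, 2)]  # the last link starts at a header and is dropped
linked = pair_questions_answers(entities, relations)
```

Here `linked` maps each question to the list of answers connected to it, which mirrors the green lines drawn between the red and blue boxes in the figure.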

<a name="5"></a>

## 5. Quick start

Start with [Quick Installation](./docs/quickstart.md).

<a name="6"></a>

## 6. PP-Structure System

<a name="61"></a>

### 6.1 Layout analysis and table recognition

![pipeline](../doc/table/pipeline.jpg)

In PP-Structure, the image is divided into 5 types of areas: **text, title, image, list and table**. For the first 4 types of areas, the PP-OCR system is used directly for text detection and recognition. For table areas, the table structuring process converts the table in the image into an Excel file with the same table style.
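
The routing described above can be sketched in a few lines of Python. This is an illustrative sketch only: the region dictionaries, the `type`/`bbox`/`res` keys, and the `route_regions`, `fake_ocr` and `fake_table` names are assumptions made for the example, and the two stand-in functions take the place of the real OCR and table recognition steps.

```python
# Area types that go straight to text detection and recognition.
OCR_TYPES = {"text", "title", "image", "list"}


def route_regions(regions, run_ocr, run_table):
    """Send each layout region to the OCR step or the table step by its type."""
    results = []
    for region in regions:
        kind = region["type"]
        if kind == "table":
            output = run_table(region["bbox"])
        elif kind in OCR_TYPES:
            output = run_ocr(region["bbox"])
        else:
            raise ValueError(f"unknown region type: {kind}")
        results.append({"type": kind, "bbox": region["bbox"], "res": output})
    return results


def fake_ocr(bbox):
    # Stand-in for text detection and recognition on the cropped area.
    return f"ocr@{bbox}"


def fake_table(bbox):
    # Stand-in for table structuring on the cropped area.
    return f"table@{bbox}"


regions = [
    {"type": "title", "bbox": (0, 0, 200, 20)},
    {"type": "text", "bbox": (0, 25, 200, 55)},
    {"type": "table", "bbox": (0, 60, 200, 160)},
]
routed = route_regions(regions, fake_ocr, fake_table)
```

Passing the two processing steps in as callables keeps the dispatch logic independent of any particular OCR or table model.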

#### 6.1.1 Layout analysis

Layout analysis classifies the regions of an image. The documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of specified categories, performance metrics, and training custom layout analysis models. For details, please refer to the [document](layout/README.md).

#### 6.1.2 Table recognition

Table recognition converts table images into Excel documents. It covers the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed instructions, please refer to the [document](table/README.md).
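
To show how recognized text, predicted structure and cell coordinates could fit together, here is a minimal sketch. It assumes the structure is given as HTML-like tokens and that each piece of text is assigned to the cell containing the center of its box. Both choices, and the `fill_table` helper itself, are assumptions made for illustration; the actual matching strategy is described in the linked document.

```python
def box_center(box):
    """Return the center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(cell, point):
    """Check whether a point lies inside an (x1, y1, x2, y2) cell box."""
    x1, y1, x2, y2 = cell
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def fill_table(structure_tokens, cell_boxes, ocr_results):
    """Place recognized text into the predicted table structure.

    structure_tokens: tokens such as "<tr>", "<td>", "</td>", "</tr>"
    cell_boxes:       one box per "<td>" token, in the same order
    ocr_results:      list of (text_box, text) pairs
    """
    cell_texts = [[] for _ in cell_boxes]
    for text_box, text in ocr_results:
        center = box_center(text_box)
        for index, cell in enumerate(cell_boxes):
            if contains(cell, center):
                cell_texts[index].append(text)
                break  # each text box belongs to at most one cell

    parts = []
    cell_index = 0
    for token in structure_tokens:
        parts.append(token)
        if token == "<td>":
            parts.append(" ".join(cell_texts[cell_index]))
            cell_index += 1
    return "<table>" + "".join(parts) + "</table>"


tokens = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
ocr = [((5, 5, 40, 15), "Name"), ((55, 5, 95, 15), "Score")]
html = fill_table(tokens, cells, ocr)
```

A table in this form can then be written to a spreadsheet by any HTML-to-Excel converter.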

<a name="62"></a>

### 6.2 DOC-VQA

Document Visual Question Answering (DOC-VQA) is a type of Visual Question Answering (VQA) that includes the Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. The SER task recognizes and classifies the text in an image. The RE task extracts relations between the text contents in the image, such as identifying question-answer pairs. For details, please refer to the [document](vqa/README.md).

<a name="7"></a>

## 7. Model List

PP-Structure series model list (being updated)

* Layout analysis model

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; divides an image into 5 types of areas: **text, title, table, picture and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |

* OCR and table recognition model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English and multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenes, trained on the PubTabNet dataset| |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |

* DOC-VQA model

|model name|description|model size|download|
| --- | --- | --- | --- |
|PP-Layout_v1.0_ser_pretrained|SER model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
|PP-Layout_v1.0_re_pretrained|RE model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |

If you need other models, you can download them from [PPOCR model_list](../doc/doc_en/models_list_en.md) and [PPStructure model_list](./docs/model_list.md).
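
The models listed above are distributed as `.tar` archives. As a convenience, the following sketch downloads an archive and unpacks it using only the Python standard library. The `fetch_model` helper and the `./inference` target directory are hypothetical names used for illustration, not part of PP-Structure.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_model(url, dest_dir):
    """Download a model archive (if not already present) and unpack it.

    Returns the directory the archive was extracted into.
    Only use this with archives from a source you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last component of the URL path.
    archive = dest / Path(urlparse(url).path).name
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return dest
```

For example, `fetch_model("https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar", "./inference")` would place the unpacked table structure model under `./inference`.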