English | [简体中文](README_ch.md)

# PP-Structure

PP-Structure is an OCR toolkit for analyzing complex documents. Its main features are:
- Supports layout analysis of documents, dividing them into five types of areas: **text, title, table, image and list** (combined with Layout-Parser)
- Supports extracting text from the text, title, image and list areas (combined with PP-OCR)
- Supports exporting table areas to Excel files
- Supports easy use through the Python whl package and the command line
- Support custom training for layout analysis and table structure tasks

## 1. Visualization

<img src="../doc/table/ppstructure.GIF" width="100%"/>



## 2. Installation

### 2.1 Install requirements

- **(1) Install PaddlePaddle**

```bash
pip3 install --upgrade pip

# GPU
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

# For more installation options, refer to the installation guide: https://www.paddlepaddle.org.cn/install/quick
```

- **(2) Install Layout-Parser**

```bash
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 2.2 Install PaddleOCR (including PP-OCR and PP-Structure)

- **(1) Install the PaddleOCR whl package with pip (inference only)**

```bash
pip install "paddleocr>=2.2"
```

- **(2) Clone PaddleOCR (inference + training)**

```bash
git clone https://github.com/PaddlePaddle/PaddleOCR
```


## 3. Quick Start

### 3.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

### 3.2 Use by python API

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

# Build the PP-Structure engine (layout analysis, OCR and table recognition)
table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)

# Save the results under save_folder, in a directory named after the image
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Print each region without its cropped image array
for line in result:
    line.pop('img')
    print(line)

# Draw the recognized regions on the image and save the visualization
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### 3.3 Returned results format

PP-Structure returns a list of dicts, one per detected area. An example:

```python
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Parameter            | Description           |
| --------------- | -------------|
|type|Type of the image area|
|bbox|Coordinates of the image area in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y]|
|res|OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and the recognition results of each line of text|
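
The field layout above can be read back with a few lines of Python. The following is a minimal sketch, not part of the PaddleOCR API: `summarize_regions` is a hypothetical helper, and it assumes that a table area carries its HTML in `res` as a plain string while an OCR area carries a `(boxes, [(text, score), ...])` tuple, as in the example result.

```python
from typing import Any, Dict, List


def summarize_regions(result: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce each area to its type, size and textual content.

    Hypothetical helper; assumes the result format described above.
    """
    summaries = []
    for region in result:
        left, top, right, bottom = region['bbox']
        res = region['res']
        if isinstance(res, str):
            # Table area: res is the HTML string of the table
            content = res
        else:
            # OCR area: res is (detection boxes, [(text, score), ...])
            _boxes, texts = res
            content = ' '.join(text.strip() for text, _score in texts)
        summaries.append({
            'type': region['type'],
            'width': right - left,
            'height': bottom - top,
            'content': content,
        })
    return summaries


# The example result shown earlier in this section
sample = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)],
        ),
    }
]

for summary in summarize_regions(sample):
    print(summary)
```

Checking the type of `res` rather than the `type` label keeps the sketch independent of the exact label strings the layout model emits.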


### 3.4 Parameter description:

| Parameter            | Description                                     | Default value                                        |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved | ./output/table                              |
| table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                         |
| table_model_dir | Inference model path of the table structure model | None                                        |
| table_char_type | dict path of table structure model                 | ../ppocr/utils/dict/table_structure_dict.tx |

Most parameters are the same as in the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).

After running, each image has a directory of the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of the corresponding area in the image.
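
To check what a run produced, the output directory can be walked with the standard library. This is a minimal sketch that only assumes the layout described above (one sub-directory per image under the output path); `list_outputs` is a hypothetical helper, and the sketch makes no assumption about the exact file names or extensions.

```python
from pathlib import Path
from typing import Dict, List


def list_outputs(output_dir: str) -> Dict[str, List[str]]:
    """Map each per-image directory name to the sorted file names inside it."""
    outputs: Dict[str, List[str]] = {}
    root = Path(output_dir)
    for image_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        outputs[image_dir.name] = sorted(
            f.name for f in image_dir.iterdir() if f.is_file()
        )
    return outputs


if __name__ == '__main__':
    out = Path('./output/table')  # default value of the output parameter
    if out.is_dir():
        for image_name, files in list_outputs(str(out)).items():
            print(image_name, files)
```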

## 4. PP-Structure Pipeline

The pipeline is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PP-Structure, the image is first analyzed by Layout-Parser, which classifies each area of the image into one of five categories: **text, title, image, list and table**. Areas of the first four types are passed directly to PP-OCR for text detection and recognition. Table areas are converted by Table OCR into Excel files that preserve the table layout.
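
The routing described above can be sketched as a small dispatch loop. This is an illustration only, not the actual PP-Structure implementation: `process_regions`, `fake_ocr` and `fake_table_ocr` are hypothetical names, the two recognizers are passed in as plain callables, and the label string `table` is an assumption about how the layout model names table areas.

```python
from typing import Any, Callable, Dict, List


def process_regions(
    regions: List[Dict[str, Any]],
    ocr: Callable[[Any], Any],
    table_ocr: Callable[[Any], Any],
) -> List[Dict[str, Any]]:
    """Send table areas to table recognition and every other area to OCR."""
    results = []
    for region in regions:
        if region['type'].lower() == 'table':
            res = table_ocr(region['img'])
        else:
            # text, title, image and list areas all go through plain OCR
            res = ocr(region['img'])
        results.append({'type': region['type'], 'bbox': region['bbox'], 'res': res})
    return results


# Stand-in recognizers so the sketch runs without any model
def fake_ocr(img: Any) -> str:
    return 'ocr:' + str(img)


def fake_table_ocr(img: Any) -> str:
    return 'table:' + str(img)


regions = [
    {'type': 'Title', 'bbox': [0, 0, 100, 20], 'img': 'crop-0'},
    {'type': 'Table', 'bbox': [0, 30, 100, 90], 'img': 'crop-1'},
]
for item in process_regions(regions, fake_ocr, fake_table_ocr):
    print(item)
```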

### 4.1 LayoutParser

Layout analysis divides a document into regions. The layout documentation covers using the layout analysis tool from Python scripts, extracting detection boxes of specific categories, performance indicators, and custom training of layout analysis models. For details, please refer to the [layout documentation](layout/README_en.md).

### 4.2 Table Recognition

Table recognition converts table images into Excel documents. It involves detecting and recognizing the text in the table and predicting the table structure and cell coordinates. For details, please refer to the [table documentation](table/README.md).
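
Since a recognized table is returned as an HTML string, it can be turned back into rows of cell text with the standard library alone. The following is a minimal sketch under stated assumptions: `TableRowParser` and `html_table_to_rows` are hypothetical helpers, the sample HTML is made up for illustration, and merged cells (`rowspan`/`colspan`) are not expanded.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('td', 'th'):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            if not self.rows:
                # Cell outside any <tr>: start an implicit row
                self.rows.append([])
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None


def html_table_to_rows(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up HTML, standing in for the 'res' string of a table area
sample_html = (
    '<html><body><table>'
    '<tr><td>name</td><td>score</td></tr>'
    '<tr><td>a</td><td><b>1</b></td></tr>'
    '</table></body></html>'
)
print(html_table_to_rows(sample_html))
```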

## 5. Prediction by inference engine

Use the following commands to complete the inference.

```bash
cd PaddleOCR/ppstructure

# Download the models
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the ultra-lightweight English table structure model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

# Run inference
python3 predict_system.py \
    --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \
    --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \
    --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=ch \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image has a directory of the same name under the directory given by `--output`. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of the corresponding area in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

**Model List**

LayoutParser model

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; detects 5 types of areas: **text, title, table, image and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; detects tables only | [TableBank Word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; detects tables only | [TableBank Latex](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) |

OCR and table recognition model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|en_ppocr_mobile_v2.0_table_det|Text detection of English table scenes trained on PubLayNet dataset|4.7M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) |
|en_ppocr_mobile_v2.0_table_rec|Text recognition of English table scene trained on PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

If you need other models, download them from the [model_list](../doc/doc_en/models_list_en.md) or use your own trained models, and set the `det_model_dir`, `rec_model_dir` and `table_model_dir` fields to their paths.
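
When switching between model sets, it can help to assemble the `predict_system.py` command programmatically. The following is a minimal sketch: `build_predict_command` is a hypothetical helper, the model directory names in the usage lines are placeholders, and only flags that appear in the inference command of this section are used. Dictionary and font flags from that command may also be needed, depending on the models.

```python
import shlex
from typing import List


def build_predict_command(
    det_model_dir: str,
    rec_model_dir: str,
    table_model_dir: str,
    image_dir: str,
    output: str = '../output/table',
) -> List[str]:
    """Build the argument list for predict_system.py with custom model paths."""
    return [
        'python3', 'predict_system.py',
        f'--det_model_dir={det_model_dir}',
        f'--rec_model_dir={rec_model_dir}',
        f'--table_model_dir={table_model_dir}',
        f'--image_dir={image_dir}',
        f'--output={output}',
    ]


# Placeholder directories; replace them with your downloaded or trained models
cmd = build_predict_command(
    det_model_dir='inference/my_det_infer',
    rec_model_dir='inference/my_rec_infer',
    table_model_dir='inference/my_table_structure_infer',
    image_dir='../doc/table/1.png',
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the `ppstructure` directory, which avoids shell quoting issues with unusual paths.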