# PP-Structure Quick Start

- [1. Environment Preparation](#1-environment-preparation)
- [2. Quick Use](#2-quick-use)
  - [2.1 Use by command line](#21-use-by-command-line)
    - [2.1.1 image orientation + layout analysis + table recognition](#211-image-orientation--layout-analysis--table-recognition)
    - [2.1.2 layout analysis + table recognition](#212-layout-analysis--table-recognition)
    - [2.1.3 layout analysis](#213-layout-analysis)
    - [2.1.4 table recognition](#214-table-recognition)
    - [2.1.5 Key Information Extraction](#215-key-information-extraction)
    - [2.1.6 layout recovery](#216-layout-recovery)
  - [2.2 Use by python script](#22-use-by-python-script)
    - [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition)
    - [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition)
    - [2.2.3 layout analysis](#223-layout-analysis)
    - [2.2.4 table recognition](#224-table-recognition)
    - [2.2.5 Key Information Extraction](#225-key-information-extraction)
    - [2.2.6 layout recovery](#226-layout-recovery)  
  - [2.3 Result description](#23-result-description)
    - [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition)
    - [2.3.2 Key Information Extraction](#232-key-information-extraction)
  - [2.4 Parameter Description](#24-parameter-description)
- [3. Summary](#3-summary)


<a name="1"></a>
## 1. Environment Preparation
### 1.1 Install PaddlePaddle

> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

- If you have CUDA 9 or CUDA 10 installed on your machine, run the following command to install the GPU version:

  ```bash
  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
  ```

- If your machine has no available GPU, run the following command to install the CPU version:

  ```bash
  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
  ```

For other software version requirements, refer to the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
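
If you are not sure which of the two packages applies to your machine, the sketch below picks one with a simple heuristic. It assumes that a usable NVIDIA GPU is present whenever the `nvidia-smi` tool is on your `PATH`; this is only an approximation and does not check the CUDA version, so treat the printed command as a suggestion. `paddle_package` and `install_command` are illustrative helpers, and `sys.executable` stands in for `python3`.

```python
import shutil
import sys

# Package names and mirror are taken from the two install commands above.
MIRROR = "https://mirror.baidu.com/pypi/simple"


def paddle_package(has_gpu):
    """Return the PaddlePaddle distribution name for the given hardware."""
    return "paddlepaddle-gpu" if has_gpu else "paddlepaddle"


def install_command(has_gpu):
    """Build the pip command from this section as an argument list."""
    return [sys.executable, "-m", "pip", "install", paddle_package(has_gpu), "-i", MIRROR]


if __name__ == "__main__":
    # Heuristic (assumption): the presence of nvidia-smi means "GPU available".
    gpu = shutil.which("nvidia-smi") is not None
    print(" ".join(install_command(gpu)))
```

Passing the list to `subprocess.run` would perform the installation; printing it first lets you check the choice.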

### 1.2 Install PaddleOCR Whl Package

```bash
# Install paddleocr (version 2.6 or later is recommended)
pip3 install "paddleocr>=2.6"

# Install paddleclas, required for image orientation classification (skip this if you do not use image orientation classification)
pip3 install paddleclas

# Install the KIE dependencies (skip this if you do not use KIE)
pip3 install -r kie/requirements.txt

# Install the layout recovery dependencies (skip this if you do not use layout recovery)
pip3 install -r recovery/requirements.txt
```
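
To confirm that the packages are visible to the interpreter you will use, a small check like the following can help. It reads installed versions through the standard `importlib.metadata` module (Python 3.8+). The version comparison is a simplified stand-in for pip's specifier logic, and `installed_version` and `version_tuple` are illustrative names, not part of PaddleOCR.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a release string such as '2.6.1.3' into (2, 6, 1, 3).

    Simplified: only leading digits of each component are kept, so suffixes
    such as 'rc1' are dropped and parsing stops at the first non-numeric part.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


if __name__ == "__main__":
    for dist in ("paddleocr", "paddleclas"):
        print(dist, installed_version(dist) or "not installed")
    ocr_version = installed_version("paddleocr")
    if ocr_version is not None and version_tuple(ocr_version) < (2, 6):
        print("paddleocr is older than the recommended 2.6")
```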

<a name="2"></a>

## 2. Quick Use

<a name="21"></a>
### 2.1 Use by command line

<a name="211"></a>
#### 2.1.1 image orientation + layout analysis + table recognition
```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --image_orientation=true
```

<a name="212"></a>
#### 2.1.2 layout analysis + table recognition
```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure
```

<a name="213"></a>
#### 2.1.3 layout analysis
```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="214"></a>
#### 2.1.4 table recognition
```bash
paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout=false
```

<a name="215"></a>

#### 2.1.5 Key Information Extraction

Key information extraction is not currently supported through the whl package. For a detailed tutorial, see [Key Information Extraction](../kie/README.md).

<a name="216"></a>
#### 2.1.6 layout recovery
```bash
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
```
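
The commands above differ only in the switches they pass. As an illustration, the hypothetical helper below (not part of PaddleOCR) assembles the same argument lists from keyword arguments, emitting a flag only when it differs from the defaults listed in the parameter table of 2.4. The resulting list could be passed to `subprocess.run`.

```python
import shlex


def structure_cli_args(image_dir, image_orientation=False, layout=True,
                       table=True, ocr=True, recovery=False, lang=None):
    """Build a `paddleocr --type=structure` command as an argument list.

    Defaults mirror the parameter table: image_orientation and recovery are
    off, layout/table/ocr are on, so only deviations are written out.
    """
    args = ["paddleocr", f"--image_dir={image_dir}", "--type=structure"]
    if image_orientation:
        args.append("--image_orientation=true")
    if not layout:
        args.append("--layout=false")
    if not table:
        args.append("--table=false")
    if not ocr:
        args.append("--ocr=false")
    if recovery:
        args.append("--recovery=true")
    if lang is not None:
        args.append(f"--lang={lang}")
    return args


if __name__ == "__main__":
    # Layout analysis only, the same switches as in 2.1.3.
    print(shlex.join(structure_cli_args("ppstructure/docs/table/1.png", table=False, ocr=False)))
```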

<a name="22"></a>
### 2.2 Use by python script

<a name="221"></a>
#### 2.2.1 image orientation + layout analysis + table recognition

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

# Enable image orientation classification on top of layout analysis and table recognition
table_engine = PPStructure(show_log=True, image_orientation=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save tables as Excel files and cropped figures under save_folder/<image name>/
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the cropped region image ('img') so the printed output stays readable
for line in result:
    line.pop('img')
    print(line)

font_path = 'doc/fonts/simfang.ttf'  # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
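
The loop above pops `img` because that field holds the cropped region as an image array, which is bulky to print and, we assume, not JSON-serializable. If you would rather keep `result` intact (for example, to draw it afterwards) and still dump it as JSON, a sketch like the following copies each region without that key. The helper names are illustrative, and the sample data is made up to mirror the structure described in 2.3.

```python
import json


def regions_without_images(result):
    """Return shallow copies of the region dicts with the bulky 'img' entry removed."""
    return [{k: v for k, v in region.items() if k != "img"} for region in result]


def result_to_json(result, **kwargs):
    """Serialize a PP-Structure style result to JSON; tuples become JSON arrays."""
    return json.dumps(regions_without_images(result), ensure_ascii=False, **kwargs)


if __name__ == "__main__":
    # Made-up sample shaped like the example in 2.3.1; object() stands in for the image array.
    sample = [{"type": "Text", "bbox": [34, 432, 345, 462],
               "res": ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
                       [("Figure-6", 0.90)]),
               "img": object()}]
    print(result_to_json(sample))
    assert "img" in sample[0]  # the original list is left untouched
```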

<a name="222"></a>
#### 2.2.2 layout analysis + table recognition

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

# Default pipeline: layout analysis + table recognition
table_engine = PPStructure(show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save tables as Excel files and cropped figures under save_folder/<image name>/
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the cropped region image ('img') so the printed output stays readable
for line in result:
    line.pop('img')
    print(line)

font_path = 'doc/fonts/simfang.ttf'  # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
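
In this pipeline's result, table regions carry their structure as HTML in `res['html']` (see the field description in 2.3.1). A minimal sketch for collecting those strings follows. `table_htmls` is a hypothetical helper, not a PaddleOCR function, and it compares the type label case-insensitively because we are not certain the casing is the same across layout models.

```python
def table_htmls(result):
    """Collect the HTML string of every table region in a PP-Structure result.

    Assumes the layout described in 2.3.1: a table region has type 'table'
    (any casing) and a dict 'res' containing an 'html' key.
    """
    htmls = []
    for region in result:
        is_table = region.get("type", "").lower() == "table"
        if is_table and isinstance(region.get("res"), dict):
            htmls.append(region["res"]["html"])
    return htmls


if __name__ == "__main__":
    # Made-up sample: one text region and one table region.
    sample = [{"type": "Text", "res": ([], [])},
              {"type": "Table", "res": {"html": "<table><tr><td>1</td></tr></table>"}}]
    for html in table_htmls(sample):
        print(html)
```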

<a name="223"></a>
#### 2.2.3 layout analysis

```python
import os

import cv2
from paddleocr import PPStructure, save_structure_res

# Layout analysis only: disable table recognition and OCR of the detected regions
table_engine = PPStructure(table=False, ocr=False, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the cropped region image ('img') so the printed output stays readable
for line in result:
    line.pop('img')
    print(line)
```
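
With `table=False` and `ocr=False`, the useful information in each region is its layout `type` and `bbox`. A short sketch for summarizing such output, assuming the dict keys described in 2.3.1; the helper names are illustrative, and type labels are compared case-insensitively because their exact casing may depend on the layout model.

```python
from collections import Counter


def count_region_types(result):
    """Count layout regions per type label, e.g. {'text': 5, 'figure': 1}."""
    return Counter(region["type"] for region in result)


def regions_of_type(result, wanted):
    """Return the bboxes of all regions whose type matches `wanted` (case-insensitive)."""
    wanted = wanted.lower()
    return [region["bbox"] for region in result if region["type"].lower() == wanted]


if __name__ == "__main__":
    # Made-up sample in the documented shape.
    sample = [{"type": "text", "bbox": [0, 0, 10, 10]},
              {"type": "figure", "bbox": [0, 20, 10, 30]},
              {"type": "text", "bbox": [0, 40, 10, 50]}]
    print(count_region_types(sample))
    print(regions_of_type(sample, "Text"))
```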

<a name="224"></a>
#### 2.2.4 table recognition

```python
import os

import cv2
from paddleocr import PPStructure, save_structure_res

# Table recognition only: skip layout analysis, so the whole image is processed as one table
table_engine = PPStructure(layout=False, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the cropped region image ('img') so the printed output stays readable
for line in result:
    line.pop('img')
    print(line)
```
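
The recognized table structure comes back as an HTML string in `res['html']` (see 2.3.1). If you need the cell texts as Python lists, for example to write a CSV file, the standard-library `html.parser` module is enough for a rough conversion. This is a sketch, not a PaddleOCR API: `html_table_to_rows` is a hypothetical helper, and it ignores `colspan`/`rowspan`, so a merged cell appears only once, in the row where it starts.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate cells without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_rows(html):
    """Flatten an HTML table string into a list of rows of cell strings."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The rows can then be written with `csv.writer(...).writerows(rows)`.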

<a name="225"></a>
#### 2.2.5 Key Information Extraction

Key information extraction is not currently supported through the whl package. For a detailed tutorial, see [Key Information Extraction](../kie/README.md).

<a name="226"></a>
#### 2.2.6 layout recovery

```python
import os

import cv2
from paddleocr import PPStructure, save_structure_res
from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx

# Chinese image
table_engine = PPStructure(recovery=True)
# English image
# table_engine = PPStructure(recovery=True, lang='en')

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the cropped region image ('img') so the printed output stays readable
for line in result:
    line.pop('img')
    print(line)

# Sort the regions into reading order, then write them to a .docx file under save_folder
h, w, _ = img.shape
res = sorted_layout_boxes(result, w)
convert_info_docx(img, res, save_folder, os.path.basename(img_path).split('.')[0])
```
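
`sorted_layout_boxes` puts the regions into reading order before `convert_info_docx` writes the Word file. To illustrate the idea on a single-column page, here is a simplified stand-in. `simple_reading_order` is hypothetical and is not the PaddleOCR implementation, which also takes the image width `w` and, we assume, uses it to handle multi-column layouts.

```python
def simple_reading_order(result, line_tolerance=10):
    """Order regions top-to-bottom, then left-to-right (single-column sketch).

    Regions whose top edges differ by less than `line_tolerance` pixels are
    treated as one line and ordered by their left edge. The input list is
    not modified.
    """
    ordered = sorted(result, key=lambda r: (r["bbox"][1], r["bbox"][0]))
    for i in range(1, len(ordered)):
        j = i
        # Move region j left while it shares a line with, and lies left of, its predecessor.
        while j > 0:
            prev, cur = ordered[j - 1]["bbox"], ordered[j]["bbox"]
            if abs(cur[1] - prev[1]) < line_tolerance and cur[0] < prev[0]:
                ordered[j - 1], ordered[j] = ordered[j], ordered[j - 1]
                j -= 1
            else:
                break
    return ordered
```

A region that starts a few pixels lower but further left is kept before its neighbour on the same line, which plain sorting by `(y, x)` would get wrong.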

<a name="23"></a>
### 2.3 Result description

PP-Structure returns a list of dicts. An example is shown below:

<a name="231"></a>
#### 2.3.1 layout analysis + table recognition
```python
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
Each field in the dict is described as follows:

| field | description  |
| --- |---|
|type| Type of the image area. |
|bbox| Coordinates of the image area in the original image, in the order [upper-left x, upper-left y, lower-right x, lower-right y]. |
|res| OCR or table recognition result of the image area. <br> Table: a dict with the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: HTML string of the table.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; When calling from Python, pass `return_ocr_result_in_table=True` to also get the detection and recognition results of each text line in the table area, in the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: text detection boxes.<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: text recognition results.<br> OCR: a tuple containing the detection boxes and the recognition results of each text line. |
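
For non-table regions, `res` is the tuple shown in the example above: a list of boxes and a list of `(text, score)` pairs. A small sketch for pairing them up and filtering by confidence follows. It assumes, from the example, that each box is a flat list of eight numbers holding four corner points; `ocr_lines` and `box_to_rect` are illustrative helpers, not PaddleOCR functions.

```python
def ocr_lines(region, min_score=0.0):
    """Return (text, score, box) triples from a non-table region's 'res'.

    Assumes the tuple layout of the example: ([box, ...], [(text, score), ...]).
    """
    boxes, recognitions = region["res"]
    lines = []
    for box, (text, score) in zip(boxes, recognitions):
        if score >= min_score:
            lines.append((text, score, box))
    return lines


def box_to_rect(box):
    """Convert an 8-number quadrilateral into [x_min, y_min, x_max, y_max]."""
    xs, ys = box[0::2], box[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]
```

With `min_score=0.5`, the low-confidence second line of the example (score about 0.47) would be dropped.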

After recognition, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after their coordinates in the image.
  ```
  /output/table/1/
    └─ res.txt
    └─ [454, 360, 824, 658].xlsx        table recognition result
    └─ [16, 2, 828, 305].jpg            cropped figure region
    └─ [17, 361, 404, 711].xlsx        table recognition result
  ```
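
To locate the file produced for a given region, you can rebuild its name from the bbox. The helper below is a hypothetical sketch that follows the naming shown in the tree above (Python's string form of the bbox list plus an extension). Check your own output folder before relying on it, since the exact scheme may vary between PaddleOCR versions.

```python
import os


def region_filename(region):
    """Name a saved region file after its bbox, following the tree above.

    Tables map to '<bbox>.xlsx' and figures to '<bbox>.jpg'; other region
    types are not saved as separate files, so None is returned.
    """
    kind = region["type"].lower()
    if kind == "table":
        return f"{region['bbox']}.xlsx"
    if kind == "figure":
        return f"{region['bbox']}.jpg"
    return None


def region_path(output_dir, image_name, region):
    """Join output directory, per-image folder and region filename (None if not saved)."""
    name = region_filename(region)
    return None if name is None else os.path.join(output_dir, image_name, name)
```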

<a name="232"></a>
#### 2.3.2 Key Information Extraction

Please refer to [Key Information Extraction](../kie/README.md).

<a name="24"></a>
### 2.4 Parameter Description

| field | description | default |
|---|---|---|
| output | Directory where results are saved | ./output/table |
| table_max_len | Length of the long side to which the image is resized for the table structure model | 488 |
| table_model_dir | Path of the table structure inference model | None |
| table_char_dict_path | Path of the table structure model dictionary | ../ppocr/utils/dict/table_structure_dict.txt |
| merge_no_span_structure | Whether the table recognition model merges '\<td>' and '\</td>' | False |
| layout_model_dir | Path of the layout analysis inference model | None |
| layout_dict_path | Path of the layout analysis model dictionary | ../ppocr/utils/dict/layout_publaynet_dict.txt |
| layout_score_threshold | Box score threshold of the layout analysis model | 0.5 |
| layout_nms_threshold | NMS threshold of the layout analysis model | 0.5 |
| kie_algorithm | KIE model algorithm | LayoutXLM |
| ser_model_dir | Path of the SER inference model | None |
| ser_dict_path | Path of the SER model dictionary | ../train_data/XFUND/class_list_xfun.txt |
| mode | Pipeline mode: structure or kie | structure |
| image_orientation | Whether to perform image orientation classification during prediction | False |
| layout | Whether to perform layout analysis during prediction | True |
| table | Whether to perform table recognition during prediction | True |
| ocr | Whether to perform OCR on the non-table areas found by layout analysis. Automatically set to False when layout is False | True |
| recovery | Whether to perform layout recovery during prediction | False |
| save_pdf | Whether to also convert the recovered docx file to PDF during layout recovery | False |
| structure_version | Structure version, either PP-structure or PP-structurev2 | PP-structure |

Most of the other parameters are the same as those of the PaddleOCR whl package; see the [whl package documentation](../../doc/doc_en/whl.md).
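
The on/off switches in the table interact in one documented way: `ocr` is forced to False when `layout` is False. A minimal sketch that captures the boolean switches and this rule in one place follows. `StructureOptions` is a hypothetical class, not part of PaddleOCR. Passing `as_kwargs()` on as `PPStructure(**opts.as_kwargs())` should work if, as the script examples above suggest, these names are accepted as keyword arguments; that is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass
class StructureOptions:
    """Boolean switches from the parameter table, with their documented defaults."""

    image_orientation: bool = False
    layout: bool = True
    table: bool = True
    ocr: bool = True
    recovery: bool = False
    save_pdf: bool = False

    def __post_init__(self):
        # Documented rule: ocr is automatically set to False when layout is False.
        if not self.layout:
            self.ocr = False

    def as_kwargs(self):
        """Return the switches as a plain dict of keyword arguments."""
        return asdict(self)


if __name__ == "__main__":
    print(StructureOptions(layout=False).as_kwargs())
```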

<a name="3"></a>
## 3. Summary

This section showed how to use the PP-Structure features through the PaddleOCR whl package. For more detailed tutorials, including model training, inference, and deployment, refer to the [documentation tutorial](../../README.md).