# PP-Structure Quick Start

- [1. Install package](#1-install-package)
- [2. Use](#2-use)
  - [2.1 Use by command line](#21-use-by-command-line)
    - [2.1.1 layout analysis + table recognition](#211-layout-analysis--table-recognition)
    - [2.1.2 layout analysis](#212-layout-analysis)
    - [2.1.3 table recognition](#213-table-recognition)
    - [2.1.4 DocVQA](#214-docvqa)
  - [2.2 Use by code](#22-use-by-code)
    - [2.2.1 layout analysis + table recognition](#221-layout-analysis--table-recognition)
    - [2.2.2 layout analysis](#222-layout-analysis)
    - [2.2.3 table recognition](#223-table-recognition)
    - [2.2.4 DocVQA](#224-docvqa)
  - [2.3 Result description](#23-result-description)
    - [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition)
    - [2.3.2 DocVQA](#232-docvqa)
  - [2.4 Parameter Description](#24-parameter-description)


<a name="1"></a>
## 1. Install package

```bash
# Install paddleocr, version 2.5+ is recommended
pip3 install "paddleocr>=2.5"
# Install the DocVQA dependency package paddlenlp (you can skip this if you do not use DocVQA)
pip3 install paddlenlp
```

<a name="2"></a>
## 2. Use

<a name="21"></a>
### 2.1 Use by command line

<a name="211"></a>
#### 2.1.1 layout analysis + table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
```

<a name="212"></a>
#### 2.1.2 layout analysis
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="213"></a>
#### 2.1.3 table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```

<a name="214"></a>
#### 2.1.4 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="22"></a>
### 2.2 Use by code

<a name="221"></a>
#### 2.2.1 layout analysis + table recognition

```python
import os
import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

font_path = 'PaddleOCR/doc/fonts/simfang.ttf'  # font file provided in the PaddleOCR repo
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

<a name="222"></a>
#### 2.2.2 layout analysis

```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res

table_engine = PPStructure(table=False, ocr=False, show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```

<a name="223"></a>
#### 2.2.3 table recognition

```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res

table_engine = PPStructure(layout=False, show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```

<a name="224"></a>
#### 2.2.4 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="23"></a>
### 2.3 Result description

PP-Structure returns a list of dicts; an example is shown below:

<a name="231"></a>
#### 2.3.1 layout analysis + table recognition
```shell
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
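Each dict in the list above describes one detected region. As a small illustration of working with this structure (a helper sketch, not part of the PaddleOCR API), regions can be sorted into rough reading order using their `bbox` values:

```python
# Sort PP-Structure regions top-to-bottom, then left-to-right, using the
# bbox convention [x1, y1, x2, y2] (upper-left and lower-right corners).
def reading_order(result):
    return sorted(result, key=lambda region: (region['bbox'][1], region['bbox'][0]))

regions = [
    {'type': 'Table',  'bbox': [17, 361, 404, 711]},
    {'type': 'Figure', 'bbox': [16, 2, 828, 305]},
    {'type': 'Table',  'bbox': [454, 360, 824, 658]},
]
print([r['bbox'] for r in reading_order(regions)])
# [[16, 2, 828, 305], [454, 360, 824, 658], [17, 361, 404, 711]]
```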
Each field in the dict is described as follows:

| field | description |
| ----- | ----------- |
| type  | Type of the image area. |
| bbox  | The coordinates of the image area in the original image: [x of upper-left corner, y of upper-left corner, x of lower-right corner, y of lower-right corner]. |
| res   | OCR or table recognition result of the image area. <br> table: a dict with the following fields: <br>&emsp;&emsp;`html`: HTML string of the table. <br>&emsp;&emsp;In code usage mode, set `return_ocr_result_in_table=True` when calling to also get the detection and recognition result of each text inside the table area, in the following fields: <br>&emsp;&emsp;`boxes`: text detection boxes. <br>&emsp;&emsp;`rec_res`: text recognition results. <br> OCR: a tuple containing the detection boxes and recognition results of each single text. |
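For `Text` regions, `res` is the `(boxes, rec_res)` tuple shown in the example above. A minimal sketch (again not part of the PaddleOCR API) that flattens such a result into plain recognized strings with confidences:

```python
# Extract (text, score) pairs from 'Text' regions of a PP-Structure result.
# For a 'Text' region, res is (detection_boxes, [(text, score), ...]).
def extract_text(result):
    texts = []
    for region in result:
        if region['type'] == 'Text':
            _boxes, rec_res = region['res']
            for text, score in rec_res:
                texts.append((text.strip(), score))
    return texts

sample = [
    {'type': 'Text',
     'bbox': [34, 432, 345, 462],
     'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
              [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
             [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
              ('Tent  ', 0.465441)])}
]
print(extract_text(sample))
```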

After recognition is completed, each image has a directory with the same name under the directory specified by the `output` parameter. Each table in the image is stored as an Excel file, and each picture area is cropped and saved as an image; the filenames of the Excel files and pictures are their coordinates in the original image.
  ```
  /output/table/1/
    └─ res.txt
    └─ [454, 360, 824, 658].xlsx        table recognition result
    └─ [16, 2, 828, 305].jpg            picture in image
    └─ [17, 361, 404, 711].xlsx         table recognition result
  ```
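Because each exported file is named after its bounding box, the coordinates can be recovered from the filename alone; a minimal sketch using only the standard library:

```python
import ast
import os

def bbox_from_filename(path):
    """Parse the [x1, y1, x2, y2] bounding box encoded in an exported
    filename such as '[454, 360, 824, 658].xlsx'."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return ast.literal_eval(stem)

print(bbox_from_filename('/output/table/1/[454, 360, 824, 658].xlsx'))
# [454, 360, 824, 658]
```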

<a name="232"></a>
#### 2.3.2 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="24"></a>
### 2.4 Parameter Description

| field                | description | default |
|----------------------|-------------|---------|
| output               | The save path of the result | ./output/table |
| table_max_len        | The long-side length to which the image is resized when the table structure model predicts | 488 |
| table_model_dir      | The path of the table structure model | None |
| table_char_dict_path | The dict path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
| layout_path_model    | The model path of the layout analysis model, which can be an online address or a local path. When it is a local path, layout_label_map needs to be set. In command-line mode, use --layout_label_map='{0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}' | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config |
| layout_label_map     | The label mapping dictionary path of the layout analysis model | None |
| model_name_or_path   | The model path of the VQA SER model | None |
| max_seq_length       | The max token length of the VQA SER model | 512 |
| label_map_path       | The label path of the VQA SER model | ./vqa/labels/labels_ser.txt |
| layout               | Whether to perform layout analysis in the forward pass | True |
| table                | Whether to perform table recognition in the forward pass | True |
| ocr                  | Whether to perform OCR for non-table areas during layout analysis; automatically set to False when layout is False | True |
| structure_version    | Table structure model version. Currently supported: PP-STRUCTURE (English table structure model) | PP-STRUCTURE |
Most of the parameters are consistent with the PaddleOCR whl package; see the [whl package documentation](../../doc/doc_en/whl.md).