# PPStructure

PPStructure is an OCR toolkit for complex layout analysis. It divides document images into 5 types of regions, **text, table, title, picture and list**, and extracts table regions as Excel files.
## 1. Quick start

### 1.1 Install

**Install PaddleOCR**

Refer to the [PaddleOCR whl doc](../doc/doc_en/whl_en.md).

**Install layoutparser**
```sh
pip3 install -U premailer https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 1.2 Use

#### 1.2.1 Use by command line
```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```
#### 1.2.2 Use by code
```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save tables as Excel files and cropped figures under <save_folder>/<image name>/
save_structure_res(result, save_folder, os.path.splitext(os.path.basename(img_path))[0])

# Drop the cropped region image before printing so the output stays readable
for line in result:
    line.pop('img')
    print(line)

# Visualize the layout analysis and recognition results
font_path = '../doc/fonts/simfang.ttf'  # font used to render the recognized text
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
#### 1.2.3 Description of the returned result

The return value of PPStructure is a list of dicts. An example is as follows:

```python
[
  {   'type': 'Text', 
      'bbox': [34, 432, 345, 462], 
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], 
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
Each field in the dict is described below:

| Field | Description |
| --------------- | -------------|
| type | Type of the image region |
| bbox | Coordinates of the image region in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y] |
| res | OCR or table recognition result of the image region.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each single line of text |
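
Because table regions and text regions store `res` differently, code that consumes the result usually branches on `type`. The sketch below is a minimal example, not part of the PPStructure API: `summarize_regions` is a hypothetical helper that flattens a result list into `(type, bbox, text)` tuples. It assumes table regions are labeled `'Table'` and that the other regions hold a `(boxes, [(text, score), ...])` tuple, as in the example above.

```python
from typing import Any, Dict, List, Tuple


def summarize_regions(result: List[Dict[str, Any]]) -> List[Tuple[str, List[int], str]]:
    """Flatten a PPStructure-style result into (type, bbox, text) tuples.

    Assumption: table regions are labeled 'Table' and carry an HTML string;
    all other regions carry (boxes, [(text, score), ...]).
    """
    summary = []
    for region in result:
        region_type = region['type']
        res = region['res']
        if region_type == 'Table':
            text = res  # already the HTML string of the table
        else:
            _boxes, recognitions = res
            # Join the recognized lines, ignoring their confidence scores
            text = ' '.join(line_text for line_text, _score in recognitions)
        summary.append((region_type, region['bbox'], text))
    return summary


# The example result from above ('img' key already removed)
example = [
    {'type': 'Text',
     'bbox': [34, 432, 345, 462],
     'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
              [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
             [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
              ('Tent  ', 0.465441)])},
]

for region_type, bbox, text in summarize_regions(example):
    print(region_type, bbox, text)
```

Keeping the branch on `type` in one place makes it easy to add per-type handling later, for example writing table HTML to a file and text to a search index.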

#### 1.2.4 Parameter description

| Parameter            | Description                                     | Default value                                        |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved | ./output/table                              |
| table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                         |
| table_model_dir | Inference model path of the table structure model | None                                        |
| table_char_dict_path | Dict path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |

Most of the other parameters are the same as those of the paddleocr whl package; see the [whl doc](../doc/doc_en/whl_en.md).

After running, each image gets a directory with the same name under the directory specified by the `output` field. Each table in the image is saved as an Excel file, and each figure region is cropped and saved as an image; the Excel and image file names are the coordinates of the corresponding region in the image.
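
To locate these files programmatically, a small helper can scan the per-image output directory. This is a sketch under assumptions: `collect_outputs` is a hypothetical helper, tables are assumed to be saved as `.xlsx` and figures as `.jpg`, and because the exact file-name format is not specified here, the helper simply extracts the four integers from each file name.

```python
import os
import re
from typing import Dict, List


def collect_outputs(save_folder: str, img_name: str) -> Dict[str, List[List[int]]]:
    """List the tables (.xlsx) and figures (.jpg) saved for one image.

    Assumption: <save_folder>/<img_name>/ holds one file per table or figure
    region, named after the region's coordinates. The exact name format is
    not assumed; the four integers are pulled out of the file name.
    """
    out_dir = os.path.join(save_folder, img_name)
    found: Dict[str, List[List[int]]] = {'tables': [], 'figures': []}
    for name in sorted(os.listdir(out_dir)):
        stem, ext = os.path.splitext(name)
        coords = [int(v) for v in re.findall(r'-?\d+', stem)]
        if len(coords) != 4:
            continue  # not a coordinate-named region file
        if ext == '.xlsx':
            found['tables'].append(coords)
        elif ext == '.jpg':
            found['figures'].append(coords)
    return found
```

With the default `output` value, `collect_outputs('./output/table', '1')` would list the regions saved for `../doc/table/1.png`.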

## 2. PPStructure Pipeline
The process is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PPStructure, the image is first analyzed by layoutparser. Layout analysis classifies the regions of the image into 5 categories: **text, title, image, list and table**. For the first 4 types of regions, PP-OCR performs text detection and recognition directly. Table regions are converted by Table OCR into Excel files with the same table layout.
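
The routing described above can be sketched as follows. `run_structure_pipeline`, `Region` and the three callables are hypothetical stand-ins for the layoutparser model, PP-OCR and Table OCR, not PPStructure's actual internals; the sketch only shows how each region is dispatched by its type and collected into the `type`/`bbox`/`res` dicts described in section 1.2.3.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Region:
    type: str        # layout label, e.g. 'Text', 'Title', 'Figure', 'List', 'Table'
    bbox: List[int]  # [x1, y1, x2, y2] in original-image coordinates


def run_structure_pipeline(
    image: Any,
    layout_fn: Callable[[Any], Sequence[Region]],
    ocr_fn: Callable[[Any, List[int]], Any],
    table_fn: Callable[[Any, List[int]], str],
) -> List[Dict[str, Any]]:
    """Route each layout region to an OCR or table recognizer (hypothetical helpers)."""
    results = []
    for region in layout_fn(image):
        if region.type == 'Table':
            # Table regions: structure + cell text, returned as an HTML string
            res = table_fn(image, region.bbox)
        else:
            # Text, title, figure and list regions: text detection + recognition
            res = ocr_fn(image, region.bbox)
        results.append({'type': region.type, 'bbox': region.bbox, 'res': res})
    return results


# Usage with stub callables standing in for the real models
demo = run_structure_pipeline(
    image=None,
    layout_fn=lambda img: [Region('Text', [0, 0, 10, 10]), Region('Table', [0, 20, 10, 40])],
    ocr_fn=lambda img, bbox: ([], [('hello', 0.99)]),
    table_fn=lambda img, bbox: '<table><tr><td>1</td></tr></table>',
)
print(demo)
```

Passing the models in as callables keeps the dispatch logic independent of any particular detector or recognizer.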

### 2.1 LayoutParser

Layout analysis divides document images into regions. Its documentation covers running the layout analysis tool from Python scripts, extracting detection boxes of specific categories, performance metrics, and training custom layout analysis models. For details, please refer to [document](layout/README_en.md).

### 2.2 Table OCR

Table OCR converts table images into Excel documents. This involves detecting and recognizing the text in the table and predicting the table structure and cell coordinates. For details, please refer to [document](table/README.md).
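
Since the table result (`res` of a table region) is an HTML string, it can also be consumed without the Excel export. The sketch below uses Python's standard `html.parser` to turn such a string into rows of cell text and `csv` to serialize them. This is a swapped-in, simplified approach, not how PPStructure writes its Excel files: it ignores `rowspan`/`colspan`, and `html_table_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (spans ignored)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text chunks of the open cell

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('td', 'th'):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag: str) -> None:
        if tag in ('td', 'th') and self._cell is not None:
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None

    def handle_data(self, data: str) -> None:
        if self._cell is not None:  # ignore text outside cells
            self._cell.append(data)


def html_table_to_rows(html: str) -> List[List[str]]:
    parser = _TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows: List[List[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


rows = html_table_to_rows('<table><tr><td>Model</td><td>Size</td></tr>'
                          '<tr><td>det</td><td>4.7M</td></tr></table>')
print(rows_to_csv(rows))
```

For tables with merged cells, a span-aware parser (or the Excel export itself) is the better choice.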

## 3. Prediction with the inference engine

Run inference with the following command:

```bash
python3 table/predict_system.py \
    --det_model_dir=path/to/det_model_dir \
    --rec_model_dir=path/to/rec_model_dir \
    --table_model_dir=path/to/table_model_dir \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=EN \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name under the directory specified by `--output`. Each table in the image is saved as an Excel file, named after the coordinates of the table in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_det|Text detection in English table scene|[ch_det_mv3_db_v2.0.yml](../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 4.7M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) |
|en_ppocr_mobile_v2.0_table_rec|Text recognition in English table scene|[rec_chinese_lite_train_v2.0.yml](../configs/rec/rec_mv3_none_bilstm_ctc.yml)|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |
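
The models in the table can be fetched with a short script like the one below. It is a sketch, not an official tool: `fetch_and_extract` and `MODEL_URLS` are hypothetical names, and it assumes each `.tar` archive unpacks into a directory named after the archive.

```python
import os
import tarfile
import urllib.request

BASE_URL = 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/'
# Download links taken from the model list above
MODEL_URLS = {
    'det': BASE_URL + 'en_ppocr_mobile_v2.0_table_det_infer.tar',
    'rec': BASE_URL + 'en_ppocr_mobile_v2.0_table_rec_infer.tar',
    'table': BASE_URL + 'en_ppocr_mobile_v2.0_table_structure_infer.tar',
}


def fetch_and_extract(url: str, dest_dir: str) -> str:
    """Download a .tar archive into dest_dir (unless already present) and unpack it.

    Returns the directory the archive is assumed to unpack into, i.e. the
    archive name without its '.tar' extension.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archive_name = os.path.basename(url)
    tar_path = os.path.join(dest_dir, archive_name)
    if not os.path.exists(tar_path):
        urllib.request.urlretrieve(url, tar_path)
    with tarfile.open(tar_path) as tar:
        # Only extract archives from trusted sources; on newer Python
        # versions consider passing filter='data' here.
        tar.extractall(dest_dir)
    return os.path.join(dest_dir, os.path.splitext(archive_name)[0])
```

Calling `fetch_and_extract(MODEL_URLS['det'], './inference')` (and likewise for `'rec'` and `'table'`) returns directories that can be passed to `--det_model_dir`, `--rec_model_dir` and `--table_model_dir` in the command of section 3.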