# PaddleStructure

PaddleStructure is an OCR toolkit for complex layout analysis. It divides a document image into five types of regions (**text, table, title, picture, and list**) and exports table regions to Excel files.
## 1. Quick start

### 1.1 Install

**Install layoutparser**
```sh
pip3 install -U premailer paddleocr https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```
**Install paddlestructure**

Install from PyPI:

```bash
pip install paddlestructure
```

Or build your own whl package and install it:
```bash
python3 setup.py bdist_wheel
pip3 install dist/paddlestructure-x.x.x-py3-none-any.whl # x.x.x is the version of paddlestructure
```

### 1.2 Use

#### 1.2.1 Use by command line

```bash
paddlestructure --image_dir=../doc/table/1.png
```

#### 1.2.2 Use by code

```python
import os

import cv2
from PIL import Image
from paddlestructure import PaddleStructure, draw_result, save_res

table_engine = PaddleStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)

# Save the results under save_folder, using the image name without its extension
save_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    print(line)

# Draw the results on the image; the font file is provided with PaddleOCR
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
#### 1.2.3 Description of the returned result
PaddleStructure returns a list of dicts, one per detected region. For example:

```python
[
  {   'type': 'Text', 
      'bbox': [34, 432, 345, 462], 
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], 
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Field | Description |
| --------------- | -------------|
|type|Type of the image region|
|bbox|Coordinates of the region in the original image: [top-left x, top-left y, bottom-right x, bottom-right y]|
|res|OCR or table recognition result for the region.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each line of text|
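
As a rough illustration of the fields above, the sketch below walks a result list and separates table HTML from recognized text lines. It is not part of the package: `summarize_regions` is a hypothetical helper, the `'Table'` type label and its HTML string are made up for the example, and the function tells the two cases apart only by whether `res` is a string.

```python
def summarize_regions(result):
    """Split a PaddleStructure-style result list into table HTML and text lines."""
    tables, texts = [], []
    for region in result:
        res = region['res']
        if isinstance(res, str):
            # Table regions carry an HTML string.
            tables.append({'bbox': region['bbox'], 'html': res})
        else:
            # Other regions carry (boxes, [(text, score), ...]).
            boxes, recognized = res
            for box, (text, score) in zip(boxes, recognized):
                texts.append({'type': region['type'], 'box': box,
                              'text': text, 'score': score})
    return tables, texts


sample = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)],
        ),
    },
    # Hypothetical table region; label and HTML are invented for illustration.
    {'type': 'Table', 'bbox': [10, 20, 300, 200],
     'res': '<table><tr><td>a</td></tr></table>'},
]

tables, texts = summarize_regions(sample)
for line in texts:
    print(f"{line['score']:.2f}  {line['text']!r}")
```

Keeping the recognition score next to each line makes it easy to drop low-confidence text later, for example the second line in the sample.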


#### 1.2.4 Parameter description

| Parameter       | Description                                                                          | Default value                               |
| --------------- | ------------------------------------------------------------------------------------ | ------------------------------------------- |
| output          | Directory where the Excel files and recognition results are saved                    | ./output/table                              |
| table_max_len   | Length to which the long side of the image is resized for the table structure model  | 488                                         |
| table_model_dir | Path to the inference model of the table structure model                             | None                                        |
| table_char_type | Path to the dictionary of the table structure model                                  | ../ppocr/utils/dict/table_structure_dict.tx |

Most parameters are the same as in the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).

After running, a directory named after each image is created under the directory given by the `output` parameter. Each table in the image is saved as an Excel file named after the coordinates of the table in the image.
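
To inspect that output layout from Python, a minimal sketch follows. `list_table_files` is a hypothetical helper, and the `.xlsx` extension is an assumption (the text only says "excel"); the sub-directory name is derived the same way as in the code example of section 1.2.2.

```python
from pathlib import Path


def list_table_files(output_dir, image_path, pattern='*.xlsx'):
    """Return the spreadsheet files saved for one image, sorted by name."""
    # Same naming rule as the example code: file name up to the first dot.
    image_name = Path(image_path).name.split('.')[0]
    image_dir = Path(output_dir) / image_name
    if not image_dir.is_dir():
        return []
    return sorted(image_dir.glob(pattern))


for path in list_table_files('./output/table', '../doc/table/1.png'):
    print(path.name)
```

If nothing has been written yet for the image, the function returns an empty list instead of raising.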

## 2. PaddleStructure Pipeline

The process is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PaddleStructure, the image is first analyzed by layoutparser. Layout analysis classifies the regions of the image into five categories: **text, title, image, list, and table**. For the first four categories, PP-OCR performs text detection and recognition directly. Table regions are converted by Table OCR into Excel files with the same table style.
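
The routing described above can be sketched in a few lines. This is only an illustration of the control flow, not the package's implementation: `analyze_document`, the region dict layout, and the two stub engines are all hypothetical stand-ins.

```python
def analyze_document(regions, ocr_fn, table_fn):
    """Route each layout region to OCR or to table recognition.

    regions  -- list of dicts with 'type', 'bbox' and 'image' (hypothetical layout output)
    ocr_fn   -- callable standing in for text detection and recognition
    table_fn -- callable standing in for Table OCR
    """
    results = []
    for region in regions:
        # Only table regions go through table recognition; the other
        # four categories are handled by plain OCR.
        handler = table_fn if region['type'].lower() == 'table' else ocr_fn
        results.append({
            'type': region['type'],
            'bbox': region['bbox'],
            'res': handler(region['image']),
        })
    return results


# Stub engines so the sketch runs without any models installed.
def fake_ocr(image):
    return f'ocr({image})'


def fake_table(image):
    return f'table({image})'


layout = [
    {'type': 'Title', 'bbox': [0, 0, 100, 20], 'image': 'crop0'},
    {'type': 'Table', 'bbox': [0, 30, 100, 90], 'image': 'crop1'},
]
for item in analyze_document(layout, fake_ocr, fake_table):
    print(item['type'], '->', item['res'])
```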

### 2.1 LayoutParser

Layout analysis divides document data into regions. The linked document covers using the layout analysis tools from Python scripts, extracting detection boxes of specific categories, performance metrics, and training custom layout analysis models. For details, please refer to the [document](layout/README_en.md).

### 2.2 Table OCR

Table OCR converts table images into Excel documents. This involves detecting and recognizing the text in the table and predicting the table structure and cell coordinates. For details, please refer to the [document](table/README.md).

## 3. Prediction with the inference engine

Use the following command to run inference.

```bash
python3 table/predict_system.py \
    --det_model_dir=path/to/det_model_dir \
    --rec_model_dir=path/to/rec_model_dir \
    --table_model_dir=path/to/table_model_dir \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=EN \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
After running, a directory named after each image is created under the directory given by `--output`. Each table in the image is saved as an Excel file named after the coordinates of the table in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_det|Text detection in English table scene|[ch_det_mv3_db_v2.0.yml](../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 4.7M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) |
|en_ppocr_mobile_v2.0_table_rec|Text recognition in English table scene|[rec_chinese_lite_train_v2.0.yml](../configs/rec/rec_mv3_none_bilstm_ctc.yml)|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |
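
The three download links share one URL pattern, so fetching them can be scripted. The sketch below uses only the Python standard library; `model_url` and `download_and_extract` are hypothetical helpers, the `./inference` destination is an arbitrary choice, and the directory layout inside each archive is not documented here, so check the extracted paths before passing them to `--det_model_dir`, `--rec_model_dir`, and `--table_model_dir`.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL prefix taken from the download links in the model list.
BASE_URL = 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table'
MODEL_NAMES = [
    'en_ppocr_mobile_v2.0_table_det',
    'en_ppocr_mobile_v2.0_table_rec',
    'en_ppocr_mobile_v2.0_table_structure',
]


def model_url(name):
    """Build the download URL for one model from the table above."""
    return f'{BASE_URL}/{name}_infer.tar'


def download_and_extract(name, dest='./inference'):
    """Download one model archive into dest (if missing) and unpack it there."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f'{name}_infer.tar'
    if not archive.exists():
        urllib.request.urlretrieve(model_url(name), str(archive))
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return dest


for name in MODEL_NAMES:
    print(model_url(name))
    # download_and_extract(name)  # uncomment to fetch and unpack the model
```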