English | [简体中文](README_ch.md)

# PP-Structure

PP-Structure is an OCR toolkit for analyzing complex documents. Its main features are:
- Layout analysis that divides a document into 5 types of areas: **text, title, table, image and list** (used in conjunction with Layout-Parser)
- Text extraction from the text, title, image and list areas (used in conjunction with PP-OCR)
- Extraction of table areas into Excel files
- Usage through a Python whl package or the command line
- Custom training for the layout analysis and table structure tasks
- A total model size of only about 18.6M (under continuous optimization)

## 1. Visualization

<img src="../doc/table/ppstructure.GIF" width="100%"/>



## 2. Installation

### 2.1 Install requirements

- **(1) Install PaddlePaddle**

```bash
pip3 install --upgrade pip

# GPU
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

# For more, refer to [Installation](https://www.paddlepaddle.org.cn/install/quick).
```

- **(2) Install Layout-Parser**

```bash
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 2.2 Install PaddleOCR (including PP-OCR and PP-Structure)

- **(1) Install the PaddleOCR whl package with pip (inference only)**

```bash
pip install "paddleocr>=2.2"
```

- **(2) Clone PaddleOCR (inference + training)**

```bash
git clone https://github.com/PaddlePaddle/PaddleOCR
```


## 3. Quick Start

### 3.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

### 3.2 Use by Python API

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results under save_folder, in a sub-folder named after the image
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Print each region without its 'img' entry
for line in result:
    line.pop('img')
    print(line)

# Draw the results on the original image and save the visualization
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### 3.3 Returned results format
PP-Structure returns a list of dicts, one for each image area. An example is as follows:

```python
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Field | Description |
| --- | --- |
| type | Type of the image area |
| bbox | Coordinates of the image area in the original image, in the order [top-left x, top-left y, bottom-right x, bottom-right y] |
| res | OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each line of text |
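
As an illustration of the OCR form of `res`, the following minimal sketch pairs each detection box with its recognized text and confidence, using the example dict shown earlier in this section. The helper name `iter_ocr_lines` is hypothetical and not part of PaddleOCR; it assumes `res` is the two-element tuple from the example.

```python
def iter_ocr_lines(region):
    """Yield (box, text, score) for each text line of an OCR region.

    Assumes region['res'] is a (boxes, recognitions) tuple as in the
    example output; table regions carry an HTML string instead.
    """
    res = region['res']
    if isinstance(res, str):
        return  # table region: res is an HTML string, nothing to pair
    boxes, recognitions = res
    for box, (text, score) in zip(boxes, recognitions):
        yield box, text, score


# Sample region copied from the example output in this section.
region = {
    'type': 'Text',
    'bbox': [34, 432, 345, 462],
    'res': (
        [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
         [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
        [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
         ('Tent  ', 0.465441)],
    ),
}

lines = list(iter_ocr_lines(region))
for box, text, score in lines:
    print(f"{score:.2f}  {text!r}  first corner: ({box[0]}, {box[1]})")
```

Filtering on `score` in the loop is one simple way to drop low-confidence lines such as the second one in this example.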


### 3.4 Parameter description

| Parameter       | Description                                                                          | Default value                               |
| --------------- | ------------------------------------------------------------------------------------ | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved                         | ./output/table                              |
| table_max_len   | Length to which the long side of the image is resized for the table structure model  | 488                                         |
| table_model_dir | Inference model path of the table structure model                                    | None                                        |
| table_char_type | Dictionary path of the table structure model                                         | ../ppocr/utils/dict/table_structure_dict.tx |

Most of the parameters are the same as those of the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).

After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of their area in the image.
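
To inspect what was written, a minimal sketch such as the following can list the files saved for one image. It only assumes the layout described above (one sub-directory per image under the output directory, named as in the Python API example); the helper name `list_outputs` is hypothetical, and no particular file extension or file-name pattern is assumed.

```python
import os


def list_outputs(output_dir, img_path):
    """Return the sorted file names saved for one image.

    Assumes the results for `img_path` live in a sub-directory of
    `output_dir` named after the image file (text before the first dot).
    """
    img_name = os.path.basename(img_path).split('.')[0]
    img_dir = os.path.join(output_dir, img_name)
    if not os.path.isdir(img_dir):
        return []  # nothing has been saved for this image yet
    return sorted(os.listdir(img_dir))


for name in list_outputs('./output/table', '../doc/table/1.png'):
    print(name)
```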

## 4. PP-Structure Pipeline

The process is as follows:
![pipeline](../doc/table/pipeline_en.jpg)

In PP-Structure, the image is first analyzed by layoutparser. Layout analysis classifies the areas of the image into 5 categories: **text, title, image, list and table**. For the first 4 types of areas, PP-OCR is used directly for text detection and recognition. A table area is converted by Table OCR into an Excel file with the same table style.
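
The routing described above can be sketched as follows. This is an illustration only, not PP-Structure's actual implementation: `run_ocr` and `run_table_ocr` are hypothetical stand-ins for PP-OCR and Table OCR, and matching the type names case-insensitively is an assumption made for the example.

```python
OCR_TYPES = {'text', 'title', 'image', 'list'}


def run_ocr(region):
    # Hypothetical stand-in for PP-OCR text detection and recognition.
    return f"ocr({region['type']})"


def run_table_ocr(region):
    # Hypothetical stand-in for Table OCR.
    return f"table_ocr({region['type']})"


def process_regions(regions):
    """Send each layout region to the matching recognizer."""
    results = []
    for region in regions:
        kind = region['type'].lower()
        if kind == 'table':
            res = run_table_ocr(region)
        elif kind in OCR_TYPES:
            res = run_ocr(region)
        else:
            raise ValueError(f"unknown region type: {region['type']}")
        results.append({'type': region['type'], 'bbox': region['bbox'], 'res': res})
    return results


# Two made-up regions, as a layout analysis step might return them.
layout = [
    {'type': 'Title', 'bbox': [0, 0, 100, 20]},
    {'type': 'Table', 'bbox': [0, 30, 100, 90]},
]
routed = process_regions(layout)
print(routed)
```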

### 4.1 LayoutParser

Layout analysis divides a document into regions. The layout documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of specific categories, performance indicators, and custom training of layout analysis models. For details, please refer to the [document](layout/README_en.md).

### 4.2 Table Recognition

Table recognition converts a table image into an Excel document. It includes the detection and recognition of the table text and the prediction of the table structure and cell coordinates. For details, please refer to the [document](table/README.md).
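
Section 3.3 notes that the result for a table area is an HTML string. As a minimal sketch, the rows and cells of such a string can be read back with Python's standard `html.parser` module. The class name `TableRows` and the sample HTML are made up for illustration, and the exact markup PP-Structure emits may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up table HTML, standing in for the 'res' value of a table area.
sample_html = ('<html><body><table>'
               '<tr><td>Model</td><td>Size</td></tr>'
               '<tr><td>table structure</td><td>18.6M</td></tr>'
               '</table></body></html>')
parser = TableRows()
parser.feed(sample_html)
parser.close()
print(parser.rows)
```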

## 5. Prediction by inference engine

Use the following commands to complete the inference.

```bash
cd PaddleOCR/ppstructure

# Download the models
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the table structure model for English table scenarios and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 predict_system.py \
    --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \
    --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \
    --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=ch \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of their area in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

**Model List**

LayoutParser model

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; detects 5 types of areas: **text, title, table, picture and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; detects tables only | [TableBank Word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; detects tables only | [TableBank Latex](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) |

OCR and table recognition model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|en_ppocr_mobile_v2.0_table_det|Text detection of English table scenes trained on PubLayNet dataset|4.7M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) |
|en_ppocr_mobile_v2.0_table_rec|Text recognition of English table scene trained on PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

To use other models, download them from the [model_list](../doc/doc_en/models_list_en.md) or use your own trained models, and set the three fields `det_model_dir`, `rec_model_dir` and `table_model_dir` accordingly.
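
For the command-line route, swapping models amounts to changing the three directory flags. The sketch below assembles the `predict_system.py` command from this section with custom model directories. The helper `build_command` and the `my_models/...` paths are hypothetical placeholders, while the flag names are those used in the inference command in this section.

```python
import shlex


def build_command(det_model_dir, rec_model_dir, table_model_dir, image_dir):
    """Return the predict_system.py argument list for the given models."""
    return [
        'python3', 'predict_system.py',
        f'--det_model_dir={det_model_dir}',
        f'--rec_model_dir={rec_model_dir}',
        f'--table_model_dir={table_model_dir}',
        f'--image_dir={image_dir}',
    ]


cmd = build_command(
    det_model_dir='my_models/det',      # placeholder path
    rec_model_dir='my_models/rec',      # placeholder path
    table_model_dir='my_models/table',  # placeholder path
    image_dir='../doc/table/1.png',
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the `ppstructure` directory. Depending on the models, the dictionary flags shown in the inference command (`--rec_char_dict_path`, `--table_char_dict_path`) may need to be set as well.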