# PPStructure

PPStructure is an OCR toolkit for complex layout analysis. It divides document images into 5 types of areas (**text, table, title, picture and list**) and extracts table areas as Excel files.
## 1. Quick start

### 1.1 Install

**Install PaddlePaddle 2.0**

```bash
pip3 install --upgrade pip

# If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple

# If you only have a CPU on your machine, please run the following command to install
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
```

For more version requirements, please refer to the instructions in the [installation document](https://www.paddlepaddle.org.cn/install/quick).

**Clone PaddleOCR repo**

```bash
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR

# If you cannot pull successfully due to network problems, you can also use the mirror hosted on Gitee:
git clone https://gitee.com/paddlepaddle/PaddleOCR

# Note: The Gitee mirror may not be synchronized with this GitHub project in real time; there might be a delay of 3-5 days. Please use the recommended method first.
```

**Install paddleocr**

Install from PyPI:
```bash
cd PaddleOCR
pip install "paddleocr>=2.2"  # version 2.2 or later is recommended
```

Or build your own whl package and install it:

```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```

**Install layoutparser**
```bash
pip3 install -U premailer https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 1.2 Use

#### 1.2.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

#### 1.2.2 Use by code

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

# Create the structure analysis engine
table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results into a sub-directory of save_folder named after the image
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Print each area's result after dropping the image data stored under 'img'
for line in result:
    line.pop('img')
    print(line)

# Draw the analysis result on the original image and save it
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

#### 1.2.3 Description of the returned result

The result returned by PPStructure is a list of dicts, one per detected area. An example is as follows:

```python
[
  {   'type': 'Text', 
      'bbox': [34, 432, 345, 462], 
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], 
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
Each field in the dict is described as follows:

| Field | Description |
| --- | --- |
| type | Type of the image area |
| bbox | Coordinates of the image area in the original image: [top-left x, top-left y, bottom-right x, bottom-right y] |
| res | OCR or table recognition result of the image area.<br> Table: HTML string of the table;<br> OCR: a tuple containing the detection coordinates and the recognition results of each line of text |
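
As a minimal sketch of how this structure can be consumed, the helper below splits a result list into table HTML strings and recognized text lines, optionally dropping low-confidence lines. It assumes table areas have `type` equal to `'Table'` (compared case-insensitively) and that OCR `res` values are `(boxes, [(text, score), ...])` tuples as in the example above; `split_result` and `min_score` are hypothetical names, not part of the paddleocr API.

```python
from typing import Any, Dict, List, Tuple


def split_result(
    result: List[Dict[str, Any]], min_score: float = 0.0
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Split a PPStructure-style result into table HTML strings and OCR text lines."""
    tables: List[str] = []
    lines: List[Tuple[str, float]] = []
    for region in result:
        if region["type"].lower() == "table":
            # Table areas (assumed type 'Table'): 'res' is the HTML string of the table
            tables.append(region["res"])
        else:
            # Other areas: 'res' is (detection boxes, [(text, score), ...])
            _boxes, recs = region["res"]
            for text, score in recs:
                if score >= min_score:
                    lines.append((text.strip(), score))
    return tables, lines


# The Text entry is copied from the example above; the Table entry is made up.
example = [
    {
        "type": "Text",
        "bbox": [34, 432, 345, 462],
        "res": (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [("Tigure-6. The performance of CNN and IPT models using difforen", 0.90060663),
             ("Tent  ", 0.465441)],
        ),
    },
    {"type": "Table", "bbox": [0, 0, 10, 10], "res": "<table><tr><td>1</td></tr></table>"},
]

tables, lines = split_result(example, min_score=0.5)
print(len(tables), lines)
```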

#### 1.2.4 Parameter description

| Parameter | Description | Default value |
| --- | --- | --- |
| output | Path where the Excel files and recognition results are saved | ./output/table |
| table_max_len | Length to which the long side of the image is resized for the table structure model | 488 |
| table_model_dir | Inference model path of the table structure model | None |
| table_char_dict_path | Dict path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |

Most of the other parameters are the same as those of the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).
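
The `table_max_len` row above can be read as an aspect-ratio-preserving resize. The following is a minimal sketch of that arithmetic, assuming the image is scaled up or down so that its longer side equals `table_max_len` and the shorter side is scaled by the same factor. PaddleOCR's actual preprocessing (rounding, padding) may differ, and `resize_long_side` is a hypothetical name.

```python
from typing import Tuple


def resize_long_side(height: int, width: int, max_len: int = 488) -> Tuple[int, int]:
    """Return (new_h, new_w) so that the longer side equals max_len, keeping aspect ratio."""
    scale = max_len / max(height, width)
    # Scale both sides by the same factor; keep at least 1 pixel per side
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    return new_h, new_w


print(resize_long_side(1000, 500))  # (488, 244)
```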

After running, a directory with the same name as each image is created under the directory specified by the `output` field. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. These files are named after the area's coordinates in the image.
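
The `bbox` values follow the `[x1, y1, x2, y2]` convention described in 1.2.3. Cropping an area out of an image loaded with `cv2.imread` (a NumPy array) is therefore plain array slicing, with rows first. The sketch below illustrates this using NumPy only. It is not the code PPStructure uses to save figure crops, and `crop_bbox` is a hypothetical helper.

```python
import numpy as np


def crop_bbox(img: np.ndarray, bbox) -> np.ndarray:
    """Crop [x1, y1, x2, y2] from an H x W (x C) image, clipping the box to the image bounds."""
    h, w = img.shape[:2]
    x1, y1, x2, y2 = (int(v) for v in bbox)
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    # NumPy images are indexed [row (y), column (x)]
    return img[y1:y2, x1:x2]


img = np.zeros((500, 400, 3), dtype=np.uint8)
crop = crop_bbox(img, [34, 432, 345, 462])
print(crop.shape)  # (30, 311, 3)
```

The crop can then be written out with `cv2.imwrite(path, crop)` if needed.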

## 2. PPStructure Pipeline

The process is as follows:
![pipeline](../doc/table/pipeline_en.jpg)

In PPStructure, the image is first analyzed by layoutparser. Layout analysis classifies the areas in the image into 5 categories: **text, title, image, list and table**. For the first 4 types, PP-OCR is used directly to detect and recognize the text. Table areas are converted by Table OCR into Excel files that preserve the table's structure.
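
The dispatch described above can be sketched as follows. The three callables (`layout_fn`, `ocr_fn`, `table_fn`) are hypothetical stand-ins for layout analysis, PP-OCR and Table OCR. This shows the control flow only and is not PPStructure's internal code. Each output dict mirrors the `type`/`bbox`/`res` fields described in section 1.2.3.

```python
from typing import Any, Callable, Dict, List

Region = Dict[str, Any]


def run_pipeline(
    image: Any,
    layout_fn: Callable[[Any], List[Region]],
    ocr_fn: Callable[[Any, List[int]], Any],
    table_fn: Callable[[Any, List[int]], str],
) -> List[Region]:
    """Route each layout region to Table OCR or PP-OCR and collect the results."""
    results: List[Region] = []
    for region in layout_fn(image):
        bbox = region["bbox"]
        if region["type"].lower() == "table":
            # Table areas: predict the structure and cell text, returned as HTML
            res = table_fn(image, bbox)
        else:
            # Text, title, image and list areas: text detection + recognition
            res = ocr_fn(image, bbox)
        results.append({"type": region["type"], "bbox": bbox, "res": res})
    return results


# Toy stand-ins so the sketch runs without any models
def fake_layout(image: Any) -> List[Region]:
    return [{"type": "Text", "bbox": [0, 0, 10, 5]}, {"type": "Table", "bbox": [0, 5, 10, 10]}]


out = run_pipeline(None, fake_layout, lambda img, b: "ocr", lambda img, b: "<table></table>")
print([r["res"] for r in out])  # ['ocr', '<table></table>']
```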

### 2.1 LayoutParser

Layout analysis divides document images into regions. The [layout document](layout/README_en.md) covers how to run the layout analysis tool from Python scripts, how to extract detection boxes of specific categories, the performance metrics, and how to train custom layout analysis models.
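
Layout results can be post-processed in plain Python. The sketch below shows one way to extract the boxes of a specific category and order them top-to-bottom, then left-to-right. It assumes regions are dicts with the `type` and `bbox` (`[x1, y1, x2, y2]`) fields described in section 1.2.3, not layoutparser's own objects. `regions_of_type` is a hypothetical helper, and the sample boxes are made up.

```python
from typing import Any, Dict, Iterable, List


def regions_of_type(regions: Iterable[Dict[str, Any]], wanted: str) -> List[Dict[str, Any]]:
    """Keep regions whose 'type' matches `wanted`, sorted top-to-bottom then left-to-right."""
    picked = [r for r in regions if r["type"].lower() == wanted.lower()]
    # bbox is [x1, y1, x2, y2]: sort by the top edge, then by the left edge
    return sorted(picked, key=lambda r: (r["bbox"][1], r["bbox"][0]))


layout = [
    {"type": "Table", "bbox": [50, 400, 500, 700]},
    {"type": "Text", "bbox": [34, 432, 345, 462]},
    {"type": "Table", "bbox": [40, 100, 500, 300]},
]
print([r["bbox"][1] for r in regions_of_type(layout, "table")])  # [100, 400]
```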

### 2.2 Table Structure

Table OCR converts table images into Excel documents. This involves detecting and recognizing the text in the table and predicting the table structure and cell coordinates. For details, please refer to the [table document](table/README.md).
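
Because the table result in `res` is an HTML string (see section 1.2.3), its cells can be read back with the standard library. The following sketch parses `<tr>`/`<td>`/`<th>` cells into rows and writes them as CSV, as a simple stand-in for the Excel export. It is not how PPStructure itself produces Excel files, and `parse_html_table`/`html_table_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._in_cell:
            self._cell.append(data)


def parse_html_table(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def html_table_to_csv(html: str) -> str:
    buf = io.StringIO()
    csv.writer(buf).writerows(parse_html_table(html))
    return buf.getvalue()


html = "<table><tr><td>Model</td><td>Size</td></tr><tr><td>mv3</td><td>18.6M</td></tr></table>"
print(html_table_to_csv(html))
```

This sketch ignores `rowspan`/`colspan`, so a merged cell is emitted once rather than expanded across the grid.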

## 3. Prediction with the inference engine

Use the following commands to complete the inference. 

```bash
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the English table structure model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 table/predict_system.py --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_type=ch --det_limit_side_len=736 --det_limit_type=min --output=../output/table --vis_font_path=../doc/fonts/simfang.ttf
```
After running, a directory with the same name as each image is created under the directory specified by the `output` field. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. These files are named after the area's coordinates in the image.
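
The one-line `predict_system.py` command above is long and hard to edit. As a convenience sketch (not part of PPStructure), the Python snippet below keeps the same flags in a dict and builds the argument list from it. The names `ARGS` and `build_command` are hypothetical, and the relative paths assume you run it from `PaddleOCR/ppstructure` as in the commands above.

```python
import sys
from typing import Dict, List

# Same flags and values as the predict_system.py command above;
# relative paths assume the working directory is PaddleOCR/ppstructure.
ARGS: Dict[str, str] = {
    "det_model_dir": "inference/ch_ppocr_mobile_v2.0_det_infer",
    "rec_model_dir": "inference/ch_ppocr_mobile_v2.0_rec_infer",
    "table_model_dir": "inference/en_ppocr_mobile_v2.0_table_structure_infer",
    "image_dir": "../doc/table/1.png",
    "rec_char_dict_path": "../ppocr/utils/ppocr_keys_v1.txt",
    "table_char_dict_path": "../ppocr/utils/dict/table_structure_dict.txt",
    "rec_char_type": "ch",
    "det_limit_side_len": "736",
    "det_limit_type": "min",
    "output": "../output/table",
    "vis_font_path": "../doc/fonts/simfang.ttf",
}


def build_command(script: str, args: Dict[str, str]) -> List[str]:
    """Build an argv list of the form [python, script, --key=value, ...]."""
    # sys.executable is the interpreter running this snippet (instead of a bare "python3")
    return [sys.executable, script] + [f"--{key}={value}" for key, value in args.items()]


cmd = build_command("table/predict_system.py", ARGS)
print(" ".join(cmd))
```

To actually run it, pass `cmd` to the standard library's `subprocess.run(cmd, check=True)`. Keeping the flags in a dict makes it easier to swap in other model directories or images.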

**Model List**

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |