English | [简体中文](README_ch.md)

# Table Recognition

- [Table Recognition](#table-recognition)
  - [1. Pipeline](#1-pipeline)
  - [2. Performance](#2-performance)
  - [3. Result](#3-result)
  - [4. How to use](#4-how-to-use)
    - [4.1 Quick start](#41-quick-start)
    - [4.2 Training, Evaluation and Inference](#42-training-evaluation-and-inference)
    - [4.3 Calculate TEDS](#43-calculate-teds)
  - [5. Reference](#5-reference)


## 1. Pipeline
Table recognition mainly involves three models:
1. Single line text detection-DB
2. Single line text recognition-CRNN
3. Table structure and cell coordinate prediction-SLANet

The table recognition flow chart is as follows:

![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)

1. The coordinates of single-line text are detected by the DB model and then sent to the recognition model to get the recognition results.
2. The table structure and cell coordinates are predicted by the SLANet model.
3. The recognition result of each cell is obtained by matching the single-line text coordinates and recognition results against the cell coordinates.
4. The cell recognition results and the table structure together construct the HTML string of the table.
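Steps 3 and 4 can be sketched as below. This is a simplified illustration, not the actual PP-Structure code: `box_overlap`, `assemble_table`, and the token format are hypothetical, and each text line is simply assigned to the cell it overlaps most.

```python
# Simplified sketch of matching text lines to cells and building the HTML
# (illustration only; the real implementation lives in table/predict_table.py).

def box_overlap(box, cell):
    """Intersection area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box[0], cell[0]); y1 = max(box[1], cell[1])
    x2 = min(box[2], cell[2]); y2 = min(box[3], cell[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def assemble_table(structure_tokens, cell_boxes, text_boxes, texts):
    """structure_tokens: structure model output, e.g. ['<table>', '<tr>', '<td>', ...].
    cell_boxes: one box per '<td>'; text_boxes/texts: detection + recognition output."""
    # assign every recognized text line to the cell it overlaps most
    cell_texts = [[] for _ in cell_boxes]
    for box, text in zip(text_boxes, texts):
        overlaps = [box_overlap(box, cell) for cell in cell_boxes]
        best = max(range(len(cell_boxes)), key=lambda i: overlaps[i])
        if overlaps[best] > 0:
            cell_texts[best].append(text)
    # splice cell contents into the structure token stream
    html, cell_idx = [], 0
    for token in structure_tokens:
        html.append(token)
        if token == '<td>':
            html.append(' '.join(cell_texts[cell_idx]))
            cell_idx += 1
    return ''.join(html)

tokens = ['<table>', '<tr>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</table>']
cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
boxes = [(5, 5, 45, 15), (55, 5, 95, 15)]
print(assemble_table(tokens, cells, boxes, ['name', 'value']))
# <table><tr><td>name</td><td>value</td></tr></table>
```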

## 2. Performance
We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:

|Method|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed|
| --- | --- | --- | --- |
| EDD<sup>[2]</sup> | x | 88.3 | x |
| TableRec-RARE(ours) | 71.73% | 93.88% | 779ms |
| SLANet(ours) | 76.31% | 95.89% | 766ms |

The performance indicators are explained as follows:
- Acc: the accuracy of the predicted table structure for each image; a single wrong token makes the whole prediction count as an error.
- TEDS: how faithfully the model restores the table information. This metric evaluates not only the table structure but also the text content in the table.
- Speed: the inference speed for a single image when the model runs on a CPU machine with MKL enabled.
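As an illustration of the Acc metric above (a minimal sketch, not the repository's evaluation code), the per-image structure accuracy can be written as:

```python
# Sketch of the Acc metric: a prediction is counted as correct only if its
# structure token sequence matches the ground truth exactly; one wrong token
# fails the whole image.

def structure_accuracy(preds, gts):
    """preds, gts: lists of structure-token sequences, one per image."""
    correct = sum(1 for p, g in zip(preds, gts) if p == g)
    return correct / len(gts)

gts = [['<tr>', '<td>', '</td>', '</tr>'],
       ['<tr>', '<td>', '</td>', '</tr>']]
preds = [['<tr>', '<td>', '</td>', '</tr>'],
         ['<tr>', '<td>', '</tr>']]  # one wrong token -> whole image wrong
print(structure_accuracy(preds, gts))  # 0.5
```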

## 3. Result

![](../docs/imgs/table_ch_result1.jpg)
![](../docs/imgs/table_ch_result2.jpg)
![](../docs/imgs/table_ch_result3.jpg)

## 4. How to use

### 4.1 Quick start

PP-Structure currently provides table recognition models in both Chinese and English. For the model download links, see [models_list](../docs/models_list.md). The following takes the Chinese table recognition model as an example to introduce how to recognize a table.

Use the following commands to quickly recognize a table.

```bash
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-OCRv3 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the PP-OCRv3 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the PP-Structurev2 table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_table.py \
    --det_model_dir=inference/ch_PP-OCRv3_det_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_infer  \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table

```

After the command finishes, the Excel table for each image will be saved to the directory specified by `--output`, and an HTML file will be produced in the same directory to visualize the cell coordinates and the recognized table.

**NOTE**
1. To use the English table recognition model, download the English text detection and recognition models and the English table recognition model from [models_list](../docs/models_list_en.md), and replace `table_structure_dict_ch.txt` with `table_structure_dict.txt`.
2. To use the TableRec-RARE model, replace `table_structure_dict_ch.txt` with `table_structure_dict.txt` and add the parameter `--merge_no_span_structure=False`.

### 4.2 Training, Evaluation and Inference

For the training, evaluation and inference process of the text detection model, refer to [detection](../../doc/doc_en/detection_en.md).

For the training, evaluation and inference process of the text recognition model, refer to [recognition](../../doc/doc_en/recognition_en.md).

For the training, evaluation and inference process of the table recognition model, refer to [table_recognition](../../doc/doc_en/table_recognition_en.md).

### 4.3 Calculate TEDS

Table recognition uses [TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and the gt file for evaluation needs to be prepared. An example of the gt file is as follows:
```txt
PMC5755158_010_01.png    <html><body><table><thead><tr><td></td><td><b>Weaning</b></td><td><b>Week 15</b></td><td><b>Off-test</b></td></tr></thead><tbody><tr><td>Weaning</td><td>–</td><td>–</td><td>–</td></tr><tr><td>Week 15</td><td>–</td><td>0.17 ± 0.08</td><td>0.16 ± 0.03</td></tr><tr><td>Off-test</td><td>–</td><td>0.80 ± 0.24</td><td>0.19 ± 0.09</td></tr></tbody></table></body></html>
```
Each line of the gt file consists of the file name and the HTML string of the table, separated by `\t`.
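For reference, a line in this format can be split as below (a minimal sketch; `parse_gt_line` is a hypothetical helper, not part of the repository):

```python
# Sketch of parsing one gt line: file name and HTML string are tab-separated.
# Split only on the first tab so the parsing stays unambiguous even if the
# HTML portion ever contained a tab.

def parse_gt_line(line):
    name, html = line.rstrip('\n').split('\t', 1)
    return name, html

name, html = parse_gt_line('PMC5755158_010_01.png\t<html><body><table></table></body></html>\n')
print(name)  # PMC5755158_010_01.png
```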

You can also use the following command to generate an evaluation gt file from the annotation file:
```bash
python3 ppstructure/table/convert_label2html.py --ori_gt_path /path/to/your_label_file --save_path /path/to/save_file
```

Use the following command to evaluate. After the evaluation is completed, the TEDS metric will be output.
```bash
python3 table/eval_table.py \
    --det_model_dir=path/to/det_model_dir \
    --rec_model_dir=path/to/rec_model_dir \
    --table_model_dir=path/to/table_model_dir \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --gt_path=path/to/gt.txt
```

Evaluate on the PubTabNet dataset using the English model:

```bash
cd PaddleOCR/ppstructure
# Download the model
mkdir inference && cd inference
# Download the text detection model trained on the PubTabNet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar && tar xf en_ppocr_mobile_v2.0_table_det_infer.tar
# Download the text recognition model trained on the PubTabNet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar && tar xf en_ppocr_mobile_v2.0_table_rec_infer.tar
# Download the table recognition model trained on the PubTabNet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..

python3 table/eval_table.py \
    --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer \
    --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer \
    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
    --image_dir=train_data/table/pubtabnet/val/ \
    --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --gt_path=path/to/gt.txt
```

The output is:
```bash
teds: 95.89
```

## 5. Reference
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683