English | [简体中文](README_ch.md)

# Table Recognition

- [1. Pipeline](#1-pipeline)
- [2. Performance](#2-performance)
- [3. How to Use](#3-how-to-use)
  - [3.1 Quick Start](#31-quick-start)
  - [3.2 Train](#32-train)
  - [3.3 Calculate TEDS](#33-calculate-teds)
- [4. Reference](#4-reference)


## 1. Pipeline
Table recognition mainly involves three models:
1. Single-line text detection: DB
2. Single-line text recognition: CRNN
3. Table structure and cell coordinate prediction: SLANet

The table recognition flow chart is as follows:

![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)

1. The DB model detects the coordinates of single-line texts, which are then sent to the recognition model to obtain the recognition results.
2. The SLANet model predicts the table structure and the cell coordinates.
3. The recognition result of each cell is obtained by matching the coordinates and recognition results of the single lines against the cell coordinates.
4. The cell recognition results and the table structure together construct the HTML string of the table.
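Steps 3 and 4 can be sketched as follows. This is a minimal illustration rather than PaddleOCR's actual matching logic; the `(x1, y1, x2, y2)` box format and the `<td></td>` structure token are simplifying assumptions:

```python
# Minimal sketch of steps 3-4: match recognized single-line texts to predicted
# table cells by position, then fill the structure tokens to build the HTML.

def box_center(box):
    # box is assumed to be (x1, y1, x2, y2)
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def center_in_box(center, box):
    x, y = center
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def build_table_html(structure_tokens, cell_boxes, text_boxes, texts):
    # Step 3: assign each recognized line to the first cell containing its center.
    cell_texts = ["" for _ in cell_boxes]
    for tbox, text in zip(text_boxes, texts):
        for i, cbox in enumerate(cell_boxes):
            if center_in_box(box_center(tbox), cbox):
                cell_texts[i] += text
                break
    # Step 4: insert the cell contents at each "<td></td>" structure token.
    html_parts, cell_idx = [], 0
    for token in structure_tokens:
        if token == "<td></td>" and cell_idx < len(cell_texts):
            html_parts.append("<td>%s</td>" % cell_texts[cell_idx])
            cell_idx += 1
        else:
            html_parts.append(token)
    return "".join(html_parts)
```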

## 2. Performance
We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:


|Method|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|
| --- | --- | --- |
| EDD<sup>[2]</sup> | x | 88.3 |
| TableRec-RARE(ours) | 73.8% | 93.32 |
| SLANet(ours) | 76.2% | 94.98 |

## 3. How to Use

### 3.1 Quick Start

Use the following commands to quickly perform table recognition on a single image.

```bash
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-OCRv3 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar && tar xf ch_PP-OCRv3_det_slim_infer.tar
# Download the PP-OCRv3 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar && tar xf ch_PP-OCRv3_rec_slim_infer.tar
# Download the PP-StructureV2 table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_table.py \
    --det_model_dir=inference/ch_PP-OCRv3_det_slim_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_slim_infer  \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table

```

After the run completes, an Excel file for each image will be saved to the directory specified by the `--output` argument, and an HTML file will also be produced there to visualize the cell coordinates and the recognized table.

### 3.2 Train

For the training, evaluation, and inference process of the text detection model, refer to [detection](../../doc/doc_en/detection_en.md).

For the training, evaluation, and inference process of the text recognition model, refer to [recognition](../../doc/doc_en/recognition_en.md).

For the training, evaluation, and inference process of the table recognition model, refer to [table_recognition](../../doc/doc_en/table_recognition_en.md).

### 3.3 Calculate TEDS

The table model uses [TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and the ground truth (gt) for evaluation needs to be prepared. An example of the gt is as follows:
```txt
PMC5755158_010_01.png    <html><body><table><thead><tr><td></td><td><b>Weaning</b></td><td><b>Week 15</b></td><td><b>Off-test</b></td></tr></thead><tbody><tr><td>Weaning</td><td>–</td><td>–</td><td>–</td></tr><tr><td>Week 15</td><td>–</td><td>0.17 ± 0.08</td><td>0.16 ± 0.03</td></tr><tr><td>Off-test</td><td>–</td><td>0.80 ± 0.24</td><td>0.19 ± 0.09</td></tr></tbody></table></body></html>
```
Each line of the gt file consists of a file name and the HTML string of the table, separated by `\t`.
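
A minimal sketch (with a hypothetical helper name) of reading such a gt file into a dict:

```python
# Each line of the gt file is "<image file name>\t<table HTML string>".
def load_table_gt(path):
    gt = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split only on the first tab; the HTML itself contains no tabs.
            name, html = line.split("\t", 1)
            gt[name] = html
    return gt
```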

You can also use the following command to generate an evaluation gt file from the annotation file:
```bash
python3 ppstructure/table/convert_label2html.py --ori_gt_path /path/to/your_label_file --save_path /path/to/save_file
```

Use the following command to evaluate. After the evaluation is completed, the TEDS metric will be output.
```bash
python3 table/eval_table.py \
    --det_model_dir=path/to/det_model_dir \
    --rec_model_dir=path/to/rec_model_dir \
    --table_model_dir=path/to/table_model_dir \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --gt_path=path/to/gt.txt
```

If the PubTabNet eval dataset is used, the output will be:
```bash
teds: 94.98
```
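
As defined in the PubTabNet paper, TEDS normalizes the tree edit distance between the predicted and ground-truth HTML table trees by the node count of the larger tree. A minimal sketch of that final normalization step (the tree edit distance itself is computed inside the evaluation script):

```python
# TEDS(Ta, Tb) = 1 - TED(Ta, Tb) / max(|Ta|, |Tb|), where TED is the tree
# edit distance and |T| is the number of nodes in the HTML table tree.
def teds_score(tree_edit_distance, n_nodes_pred, n_nodes_gt):
    return 1.0 - tree_edit_distance / max(n_nodes_pred, n_nodes_gt)

# e.g. 3 edit operations between trees with 50 and 60 nodes:
print(round(teds_score(3, 50, 60), 2))  # 0.95
```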

## 4. Reference
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683