- [Table Recognition](#table-recognition)
  - [1. Pipeline](#1-pipeline)
  - [2. Performance](#2-performance)
  - [3. How to use](#3-how-to-use)
    - [3.1 Quick start](#31-quick-start)
    - [3.2 Train](#32-train)
    - [3.3 Eval](#33-eval)
    - [3.4 Inference](#34-inference)


# Table Recognition

## 1. Pipeline
Table recognition mainly involves three models:
1. Single-line text detection: DB
2. Single-line text recognition: CRNN
3. Table structure and cell coordinate prediction: RARE

The table recognition flowchart is as follows:

![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)

1. The coordinates of single-line texts are detected by the DB model and then sent to the recognition model to obtain the recognition results.
2. The table structure and cell coordinates are predicted by the RARE model.
3. The recognition result of each cell is obtained by matching the coordinates of the recognized single lines against the coordinates of the cells (a simplified sketch follows this list).
4. The cell recognition results and the table structure together construct the HTML string of the table.
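
The matching in steps 3 and 4 can be pictured with a minimal sketch. The helper below is hypothetical and simplified, not PaddleOCR's actual implementation (the real matcher also considers IoU between boxes); all names and box formats here are assumptions for illustration.

```python
# Hypothetical sketch of steps 3-4: assign each recognized text line to the
# nearest predicted cell, then splice the cell contents into the structure
# tokens predicted by the table model.

def box_distance(text_box, cell_box):
    """L1 distance between two boxes, both given as [x1, y1, x2, y2]."""
    return sum(abs(a - b) for a, b in zip(text_box, cell_box))

def assemble_html(structure_tokens, cell_boxes, text_boxes, texts):
    # Step 3: match every recognized single-line text to its closest cell.
    cell_contents = ["" for _ in cell_boxes]
    for text_box, text in zip(text_boxes, texts):
        nearest = min(range(len(cell_boxes)),
                      key=lambda i: box_distance(text_box, cell_boxes[i]))
        cell_contents[nearest] += text
    # Step 4: fill the cell texts into the predicted structure tokens.
    html, cell_idx = [], 0
    for token in structure_tokens:  # e.g. "<tr>", "<td>", "</td>", ...
        html.append(token)
        if token == "<td>":
            html.append(cell_contents[cell_idx])
            cell_idx += 1
    return "".join(html)
```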

## 2. Performance
We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:

|Method|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|
| --- | --- |
| EDD<sup>[2]</sup> | 88.3 |
| TableRec-RARE(ours) | 93.32 |
| SLANet(ours) | 94.98 |

## 3. How to use

### 3.1 Quick start

- table recognition

```shell
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-OCRv3 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar && tar xf ch_PP-OCRv3_det_slim_infer.tar
# Download the PP-OCRv3 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar && tar xf ch_PP-OCRv3_rec_slim_infer.tar
# Download the PP-StructureV2 table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_table.py \
    --det_model_dir=inference/ch_PP-OCRv3_det_slim_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_slim_infer  \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table

```

After the command finishes, the Excel file for each image will be saved to the directory specified by the `output` field, and an HTML file will also be produced in that directory for visually checking the cell coordinates and the recognized table.
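
Alternatively, the pipeline can be called from Python through the `paddleocr` wheel package, which exposes a `PPStructure` class. The sketch below is based on recent `paddleocr` releases; the `layout` flag and result-field names may differ in your version:

```python
# pip install paddleocr paddlepaddle   (assumed prerequisites)
import cv2
from paddleocr import PPStructure

# layout=False restricts the pipeline to table recognition only.
table_engine = PPStructure(layout=False, show_log=True)

img = cv2.imread('docs/table/table.jpg')
for region in table_engine(img):
    # For a table region, `res` is expected to carry the reconstructed HTML.
    print(region['type'], region['res'].get('html', ''))
```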

- table structure recognition
```shell
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-StructureV2 table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_structure.py \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table
```
After the run completes, the visualization of the detected cell boxes will be saved to the directory specified by the `output` field.

### 3.2 Train

In this section, we only introduce the training of the table structure model. For training the [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md) models, please refer to the corresponding documents.

* Data preparation

For the Chinese model and the English model, the data sources are different:

English dataset: The training data uses the public [PubTabNet](https://arxiv.org/abs/1911.10683) dataset, which can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet). The PubTabNet dataset contains about 500,000 images, together with annotations in HTML format.

Chinese dataset: The Chinese dataset consists of the following two parts, which are sampled at a 1:1 ratio during training.
>   1. Generated dataset: 40,000 images generated with the [Table Generation Tool](https://github.com/WenmuZhou/TableGeneration).
>   2. Cropped dataset: 10,000 images cropped from [WTW](https://github.com/wangwen-whu/WTW-Dataset).

For a detailed introduction to public datasets, please refer to [table_datasets](../../doc/doc_en/dataset/table_datasets_en.md). The following training and evaluation procedures use the English dataset as an example.
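
As a quick sanity check after downloading, the PubTabNet annotations can be inspected with a few lines of Python. The field names below follow the `PubTabNet_2.0.0.jsonl` release and should be verified against the version you download:

```python
import json

# Each line of the jsonl file annotates one table image.
with open('PubTabNet_2.0.0.jsonl', encoding='utf-8') as f:
    sample = json.loads(f.readline())

print(sample['filename'], sample['split'])         # image name, train/val split
print(sample['html']['structure']['tokens'][:10])  # table structure tokens
print(sample['html']['cells'][0])                  # first cell: text tokens + bbox
```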

* Start training  
*If you have installed the CPU version of PaddlePaddle, please set the `use_gpu` field in the configuration file to `false`.*
```shell
# single GPU training
python3 tools/train.py -c configs/table/table_mv3.yml
# multi-GPU training
# Specify the GPU IDs to use via the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```

In the command above, `-c` selects `configs/table/table_mv3.yml` as the training configuration file.
For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).

* Load a trained model and continue training

If you want to load a trained model and continue training, you can specify the path of the model to load via the `Global.checkpoints` parameter.

```shell
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` has a higher priority than `Global.pretrain_weights`, that is, when both parameters are specified at the same time, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.

### 3.3 Eval

Table recognition uses [TEDS (Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and a ground-truth (gt) file needs to be prepared. An example gt line is as follows:
```txt
PMC5755158_010_01.png    <html><body><table><thead><tr><td></td><td><b>Weaning</b></td><td><b>Week 15</b></td><td><b>Off-test</b></td></tr></thead><tbody><tr><td>Weaning</td><td>–</td><td>–</td><td>–</td></tr><tr><td>Week 15</td><td>–</td><td>0.17 ± 0.08</td><td>0.16 ± 0.03</td></tr><tr><td>Off-test</td><td>–</td><td>0.80 ± 0.24</td><td>0.19 ± 0.09</td></tr></tbody></table></body></html>
```
Each line of the gt file consists of a file name and the HTML string of the table, separated by `\t`.
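
For reference, such a gt file can be loaded with a few lines of Python (a minimal sketch; `load_gt` is a hypothetical helper, not part of PaddleOCR):

```python
def load_gt(gt_path):
    """Map image file names to their ground-truth HTML strings."""
    gt = {}
    with open(gt_path, encoding='utf-8') as f:
        for line in f:
            # file name and HTML string are separated by a tab
            img_name, html = line.rstrip('\n').split('\t', 1)
            gt[img_name] = html
    return gt
```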

You can also use the following command to generate an evaluation gt file from the annotation file:
```shell
python3 ppstructure/table/convert_label2html.py --ori_gt_path /path/to/your_label_file --save_path /path/to/save_file
```

Use the following command to evaluate. After the evaluation is completed, the TEDS metric will be printed.
```shell
cd PaddleOCR/ppstructure
python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --gt_path=path/to/gt.txt
```

If the PubTabNet eval dataset is used, the output will be:
```bash
teds: 94.98
```
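
To score individual predictions yourself, you can use the `TEDS` class that the PubTabNet repository linked above provides in `src/metric.py`. The sketch below assumes that repository's current API, which may change:

```python
# Assumes https://github.com/ibm-aur-nlp/PubTabNet is cloned and its src/
# directory is on PYTHONPATH.
from metric import TEDS

teds = TEDS()  # structure_only=True would compare table structure only

pred_html = '<html><body><table><tr><td>1</td></tr></table></body></html>'
true_html = '<html><body><table><tr><td>1</td></tr></table></body></html>'

score = teds.evaluate(pred_html, true_html)  # similarity in [0, 1]
print(score)  # 1.0 for identical tables
```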

### 3.4 Inference

```shell
cd PaddleOCR/ppstructure
python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
```
After running, the Excel sheet for each image will be saved to the directory specified by the `output` field.
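
The generated `.xlsx` files can then be consumed programmatically, for example with pandas. A short sketch, assuming the output file is named after the input image:

```python
import pandas as pd  # reading .xlsx also requires the openpyxl package

# Load the sheet produced for ../doc/table/1.png back as a DataFrame.
df = pd.read_excel('../output/table/1.xlsx')
print(df.head())
```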

Reference
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683