提交 d1b31bf8 编写于 作者: 文幕地方's avatar 文幕地方

add ref

上级 929ee466
...@@ -2,21 +2,19 @@ Global: ...@@ -2,21 +2,19 @@ Global:
use_gpu: true use_gpu: true
epoch_num: 17 epoch_num: 17
log_smooth_window: 20 log_smooth_window: 20
print_batch_step: 5 print_batch_step: 100
save_model_dir: ./output/table_master/ save_model_dir: ./output/table_master/
save_epoch_step: 17 save_epoch_step: 17
# evaluation is run every 400 iterations after the 0th iteration eval_batch_step: [0, 6259]
eval_batch_step: [0, 400] cal_metric_during_train: true
cal_metric_during_train: True pretrained_model: null
pretrained_model:
checkpoints: checkpoints:
save_inference_dir: save_inference_dir: output/table_master/infer
use_visualdl: False use_visualdl: false
infer_img: ppstructure/docs/table/table.jpg infer_img: ppstructure/docs/table/table.jpg
save_res_path: output/table_master save_res_path: ./output/table_master
# for data or label process
character_dict_path: ppocr/utils/dict/table_master_structure_dict.txt character_dict_path: ppocr/utils/dict/table_master_structure_dict.txt
infer_mode: False infer_mode: false
max_text_length: 500 max_text_length: 500
process_total_num: 0 process_total_num: 0
process_cut_num: 0 process_cut_num: 0
...@@ -33,8 +31,8 @@ Optimizer: ...@@ -33,8 +31,8 @@ Optimizer:
gamma: 0.1 gamma: 0.1
warmup_epoch: 0.02 warmup_epoch: 0.02
regularizer: regularizer:
name: 'L2' name: L2
factor: 0.00000 factor: 0.0
Architecture: Architecture:
model_type: table model_type: table
...@@ -67,15 +65,15 @@ PostProcess: ...@@ -67,15 +65,15 @@ PostProcess:
Metric: Metric:
name: TableMetric name: TableMetric
main_indicator: acc main_indicator: acc
compute_bbox_metric: true # cost many time, set False for training compute_bbox_metric: False
Train: Train:
dataset: dataset:
name: PubTabDataSet name: PubTabDataSet
data_dir: /home/zhoujun20/table/PubTabNe/pubtabnet/train/ data_dir: train_data/table/pubtabnet/train/
label_file_list: [/home/zhoujun20/table/PubTabNe/pubtabnet/PubTabNet_2.0.0_train.jsonl] label_file_list: [train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl]
transforms: transforms:
- DecodeImage: # load image - DecodeImage:
img_mode: BGR img_mode: BGR
channel_first: False channel_first: False
- TableMasterLabelEncode: - TableMasterLabelEncode:
...@@ -88,20 +86,20 @@ Train: ...@@ -88,20 +86,20 @@ Train:
- PaddingTableImage: - PaddingTableImage:
size: [480, 480] size: [480, 480]
- TableBoxEncode: - TableBoxEncode:
use_xywh: true use_xywh: True
- NormalizeImage: - NormalizeImage:
scale: 1./255. scale: 1./255.
mean: [0.5, 0.5, 0.5] mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5] std: [0.5, 0.5, 0.5]
order: 'hwc' order: hwc
- ToCHWImage: - ToCHWImage: null
- KeepKeys: - KeepKeys:
keep_keys: ['image', 'structure', 'bboxes', 'bbox_masks','shape'] keep_keys: [image, structure, bboxes, bbox_masks, shape]
loader: loader:
shuffle: True shuffle: True
batch_size_per_card: 8 batch_size_per_card: 10
drop_last: True drop_last: True
num_workers: 1 num_workers: 8
Eval: Eval:
dataset: dataset:
...@@ -109,7 +107,7 @@ Eval: ...@@ -109,7 +107,7 @@ Eval:
data_dir: /home/zhoujun20/table/PubTabNe/pubtabnet/val/ data_dir: /home/zhoujun20/table/PubTabNe/pubtabnet/val/
label_file_list: [/home/zhoujun20/table/PubTabNe/pubtabnet/val_500.jsonl] label_file_list: [/home/zhoujun20/table/PubTabNe/pubtabnet/val_500.jsonl]
transforms: transforms:
- DecodeImage: # load image - DecodeImage:
img_mode: BGR img_mode: BGR
channel_first: False channel_first: False
- TableMasterLabelEncode: - TableMasterLabelEncode:
...@@ -120,19 +118,19 @@ Eval: ...@@ -120,19 +118,19 @@ Eval:
max_len: 480 max_len: 480
resize_bboxes: True resize_bboxes: True
- PaddingTableImage: - PaddingTableImage:
size: [ 480, 480 ] size: [480, 480]
- TableBoxEncode: - TableBoxEncode:
use_xywh: true use_xywh: True
- NormalizeImage: - NormalizeImage:
scale: 1./255. scale: 1./255.
mean: [ 0.5, 0.5, 0.5 ] mean: [0.5, 0.5, 0.5]
std: [ 0.5, 0.5, 0.5 ] std: [0.5, 0.5, 0.5]
order: 'hwc' order: hwc
- ToCHWImage: - ToCHWImage: null
- KeepKeys: - KeepKeys:
keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks','shape' ] keep_keys: [image, structure, bboxes, bbox_masks, shape]
loader: loader:
shuffle: False shuffle: False
drop_last: False drop_last: False
batch_size_per_card: 2 batch_size_per_card: 10
num_workers: 8 num_workers: 8
\ No newline at end of file
# FCENet # FCENet
- [1. 算法简介](#1) - [1. 算法简介](#1-算法简介)
- [2. 环境配置](#2) - [2. 环境配置](#2-环境配置)
- [3. 模型训练、评估、预测](#3) - [3. 模型训练、评估、预测](#3-模型训练评估预测)
- [3.1 训练](#3-1) - [4. 推理部署](#4-推理部署)
- [3.2 评估](#3-2) - [4.1 Python推理](#41-python推理)
- [3.3 预测](#3-3) - [4.2 C++推理](#42-c推理)
- [4. 推理部署](#4) - [4.3 Serving服务化部署](#43-serving服务化部署)
- [4.1 Python推理](#4-1) - [4.4 更多推理部署](#44-更多推理部署)
- [4.2 C++推理](#4-2) - [5. FAQ](#5-faq)
- [4.3 Serving服务化部署](#4-3) - [引用](#引用)
- [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
<a name="1"></a> <a name="1"></a>
## 1. 算法简介 ## 1. 算法简介
......
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
- [1.1 文本检测算法](#11-文本检测算法) - [1.1 文本检测算法](#11-文本检测算法)
- [1.2 文本识别算法](#12-文本识别算法) - [1.2 文本识别算法](#12-文本识别算法)
- [2. 端到端算法](#2-端到端算法) - [2. 端到端算法](#2-端到端算法)
- [3. 表格识别算法](#3-表格识别算法)
本文给出了PaddleOCR已支持的OCR算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md) 本文给出了PaddleOCR已支持的OCR算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)
...@@ -96,3 +97,14 @@ ...@@ -96,3 +97,14 @@
已支持的端到端OCR算法列表(戳链接获取使用教程): 已支持的端到端OCR算法列表(戳链接获取使用教程):
- [x] [PGNet](./algorithm_e2e_pgnet.md) - [x] [PGNet](./algorithm_e2e_pgnet.md)
## 3. 表格识别算法
已支持的表格识别算法列表(戳链接获取使用教程):
- [x] [TableMaster](./algorithm_table_master.md)
在PubTabNet表格识别公开数据集上,算法效果如下:
|模型|骨干网络|配置文件|acc|下载链接|
|---|---|---|---|---|
|TableMaster|TableResNetExtra|[configs/table/table_master.yml](../../configs/table/table_master.yml)|77.47%|[训练模型]|[训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar)/[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_infer.tar)|
# 表格识别算法-TableMASTER
- [1. 算法简介](#1-算法简介)
- [2. 环境配置](#2-环境配置)
- [3. 模型训练、评估、预测](#3-模型训练评估预测)
- [4. 推理部署](#4-推理部署)
- [4.1 Python推理](#41-python推理)
- [4.2 C++推理部署](#42-c推理部署)
- [4.3 Serving服务化部署](#43-serving服务化部署)
- [4.4 更多推理部署](#44-更多推理部署)
- [5. FAQ](#5-faq)
- [引用](#引用)
<a name="1"></a>
## 1. 算法简介
论文信息:
> [TableMaster: PINGAN-VCGROUP’S SOLUTION FOR ICDAR 2021 COMPETITION ON SCIENTIFIC LITERATURE PARSING TASK B: TABLE RECOGNITION TO HTML](https://arxiv.org/pdf/2105.01848.pdf)
> Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong
> 2021
在PubTabNet表格识别公开数据集上,算法复现效果如下:
|模型|骨干网络|配置文件|acc|下载链接|
| --- | --- | --- | --- | --- |
|TableMaster|TableResNetExtra|[configs/table/table_master.yml](../../configs/table/table_master.yml)|77.47%|[训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar)/[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_infer.tar)|
<a name="2"></a>
## 2. 环境配置
请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《项目克隆》](./clone.md)克隆项目代码。
<a name="3"></a>
## 3. 模型训练、评估、预测
上述TableMaster模型使用PubTabNet表格识别公开数据集训练得到,数据集下载可参考 [table_datasets](./dataset/table_datasets.md)
数据下载完成后,请参考[文本识别教程](./recognition.md)进行训练。PaddleOCR对代码进行了模块化,训练不同的模型只需要**更换配置文件**即可。
<a name="4"></a>
## 4. 推理部署
<a name="4-1"></a>
### 4.1 Python推理
首先将训练得到best模型,转换成inference model。以基于TableResNetExtra骨干网络,在PubTabNet数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/contribution/table_master.tar)),可以使用如下命令进行转换:
```shell
# 注意将pretrained_model的路径设置为本地路径。
python3 tools/export_model.py -c configs/table/table_master.yml -o Global.pretrained_model=output/table_master/best_accuracy Global.save_inference_dir=./inference/table_master
```
**注意:**
- 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的`character_dict_path`是否为所正确的字典文件。
转换成功后,在目录下有三个文件:
```
/inference/table_master/
├── inference.pdiparams # 识别inference模型的参数文件
├── inference.pdiparams.info # 识别inference模型的参数信息,可忽略
└── inference.pdmodel # 识别inference模型的program文件
```
执行如下命令进行模型推理:
```shell
cd ppstructure/
python3.7 table/predict_structure.py --table_model_dir=../output/table_master/table_structure_tablemaster_infer/ --table_algorithm=TableMaster --table_char_dict_path=../ppocr/utils/dict/table_master_structure_dict.txt --table_max_len=480 --image_dir=docs/table/table.jpg
# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='docs/table'。
```
执行命令后,上面图像的预测结果(结构信息和表格中每个单元格的坐标)会打印到屏幕上,同时会保存单元格坐标的可视化结果。示例如下:
结果如下:
```shell
[2022/06/16 13:06:54] ppocr INFO: result: ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'], [[72.17591094970703, 10.759100914001465, 60.29658508300781, 16.6805362701416], [161.85562133789062, 10.884308815002441, 14.9495210647583, 16.727018356323242], [277.79876708984375, 29.54340362548828, 31.490320205688477, 18.143272399902344],
...
[336.11724853515625, 280.3601989746094, 39.456939697265625, 18.121286392211914]]
[2022/06/16 13:06:54] ppocr INFO: save vis result to ./output/table.jpg
[2022/06/16 13:06:54] ppocr INFO: Predict time of docs/table/table.jpg: 17.36806297302246
```
**注意**
- TableMaster在推理时比较慢,建议使用GPU进行使用。
<a name="4-2"></a>
### 4.2 C++推理部署
由于C++预处理后处理还未支持TableMaster,所以暂未支持
<a name="4-3"></a>
### 4.3 Serving服务化部署
暂不支持
<a name="4-4"></a>
### 4.4 更多推理部署
暂不支持
<a name="5"></a>
## 5. FAQ
## 引用
```bibtex
@article{ye2021pingan,
title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
journal={arXiv preprint arXiv:2105.01848},
year={2021}
}
```
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
* [1.1 Text Detection Algorithms](#11) * [1.1 Text Detection Algorithms](#11)
* [1.2 Text Recognition Algorithms](#12) * [1.2 Text Recognition Algorithms](#12)
- [2. End-to-end Algorithms](#2) - [2. End-to-end Algorithms](#2)
- [3. Table Recognition Algorithms](#3)
This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md). This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md).
...@@ -95,3 +96,15 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r ...@@ -95,3 +96,15 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
Supported end-to-end algorithms (Click the link to get the tutorial): Supported end-to-end algorithms (Click the link to get the tutorial):
- [x] [PGNet](./algorithm_e2e_pgnet_en.md) - [x] [PGNet](./algorithm_e2e_pgnet_en.md)
<a name="3"></a>
## 3. Table Recognition Algorithms
Supported table recognition algorithms (Click the link to get the tutorial):
- [x] [TableMaster](./algorithm_table_master_en.md)
On the PubTabNet dataset, the algorithm result is as follows:
|Model|Backbone|Config|Acc|Download link|
|---|---|---|---|---|
|TableMaster|TableResNetExtra|[configs/table/table_master.yml](../../configs/table/table_master.yml)|77.47%|[训练模型]|[训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar)/[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_infer.tar)|
# Torm Recognition Algorithm-TableMASTER
- [1. Introduction](#1-introduction)
- [2. Environment](#2-environment)
- [3. Model Training / Evaluation / Prediction](#3-model-training--evaluation--prediction)
- [4. Inference and Deployment](#4-inference-and-deployment)
- [4.1 Python Inference](#41-python-inference)
- [4.2 C++ Inference](#42-c-inference)
- [4.3 Serving](#43-serving)
- [4.4 More](#44-more)
- [5. FAQ](#5-faq)
- [Citation](#citation)
<a name="1"></a>
## 1. Introduction
Paper:
> [TableMaster: PINGAN-VCGROUP’S SOLUTION FOR ICDAR 2021 COMPETITION ON SCIENTIFIC LITERATURE PARSING TASK B: TABLE RECOGNITION TO HTML](https://arxiv.org/pdf/2105.01848.pdf)
> Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong
> 2021
On the PubTabNet table recognition public data set, the algorithm reproduction acc is as follows:
|Model|Backbone|Cnnfig|Acc|Download link|
| --- | --- | --- | --- | --- |
|TableMaster|TableResNetExtra|[configs/table/table_master.yml](../../configs/table/table_master.yml)|77.47%|[train model](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar)/[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_infer.tar)|
<a name="2"></a>
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
The above TableMaster model is trained using the PubTabNet table recognition public dataset. For the download of the dataset, please refer to [table_datasets](./dataset/table_datasets_en.md).
After the data download is complete, please refer to [Text Recognition Training Tutorial](./recognition_en.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different models.
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved in the TableMaster table recognition training process into an inference model. Taking the model based on the TableResNetExtra backbone network and trained on the PubTabNet dataset as example ([model download link](https://paddleocr.bj.bcebos.com/contribution/table_master.tar)), you can use the following command to convert:
```shell
python3 tools/export_model.py -c configs/table/table_master.yml -o Global.pretrained_model=output/table_master/best_accuracy Global.save_inference_dir=./inference/table_master
```
**Note: **
- If you trained the model on your own dataset and adjusted the dictionary file, please pay attention to whether the `character_dict_path` in the modified configuration file is the correct dictionary file
Execute the following command for model inference:
```shell
cd ppstructure/
# When predicting all images in a folder, you can modify image_dir to a folder, such as --image_dir='docs/table'.
python3.7 table/predict_structure.py --table_model_dir=../output/table_master/table_structure_tablemaster_infer/ --table_algorithm=TableMaster --table_char_dict_path=../ppocr/utils/dict/table_master_structure_dict.txt --table_max_len=480 --image_dir=docs/table/table.jpg
```
After executing the command, the prediction results of the above image (structural information and the coordinates of each cell in the table) are printed to the screen, and the visualization of the cell coordinates is also saved. An example is as follows:
result:
```shell
[2022/06/16 13:06:54] ppocr INFO: result: ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'], [[72.17591094970703, 10.759100914001465, 60.29658508300781, 16.6805362701416], [161.85562133789062, 10.884308815002441, 14.9495210647583, 16.727018356323242], [277.79876708984375, 29.54340362548828, 31.490320205688477, 18.143272399902344],
...
[336.11724853515625, 280.3601989746094, 39.456939697265625, 18.121286392211914]]
[2022/06/16 13:06:54] ppocr INFO: save vis result to ./output/table.jpg
[2022/06/16 13:06:54] ppocr INFO: Predict time of docs/table/table.jpg: 17.36806297302246
```
**Note**:
- TableMaster is relatively slow during inference, and it is recommended to use GPU for use.
<a name="4-2"></a>
### 4.2 C++ Inference
Since the post-processing is not written in CPP, the TableMaster does not support CPP inference.
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
## Citation
```bibtex
@article{ye2021pingan,
title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
journal={arXiv preprint arXiv:2105.01848},
year={2021}
}
```
...@@ -670,6 +670,10 @@ class TableLabelEncode(AttnLabelEncode): ...@@ -670,6 +670,10 @@ class TableLabelEncode(AttnLabelEncode):
return data return data
def _merge_no_span_structure(self, structure): def _merge_no_span_structure(self, structure):
"""
This fun code is refer from:
https://github.com/JiaquanYe/TableMASTER-mmocr/blob/master/table_recognition/data_preprocess.py
"""
new_structure = [] new_structure = []
i = 0 i = 0
while i < len(structure): while i < len(structure):
...@@ -682,6 +686,11 @@ class TableLabelEncode(AttnLabelEncode): ...@@ -682,6 +686,11 @@ class TableLabelEncode(AttnLabelEncode):
return new_structure return new_structure
def _replace_empty_cell_token(self, token_list, cells): def _replace_empty_cell_token(self, token_list, cells):
"""
This fun code is refer from:
https://github.com/JiaquanYe/TableMASTER-mmocr/blob/master/table_recognition/data_preprocess.py
"""
bbox_idx = 0 bbox_idx = 0
add_empty_bbox_token_list = [] add_empty_bbox_token_list = []
for token in token_list: for token in token_list:
......
...@@ -11,6 +11,11 @@ ...@@ -11,6 +11,11 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""
This fun code is refer from:
https://github.com/JiaquanYe/TableMASTER-mmocr/tree/master/mmocr/models/textrecog/losses
"""
import paddle import paddle
from paddle import nn from paddle import nn
......
...@@ -11,6 +11,10 @@ ...@@ -11,6 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""
This fun code is refer from:
https://github.com/JiaquanYe/TableMASTER-mmocr/blob/master/mmocr/models/textrecog/backbones/table_resnet_extra.py
"""
import paddle import paddle
import paddle.nn as nn import paddle.nn as nn
......
...@@ -11,6 +11,11 @@ ...@@ -11,6 +11,11 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""
This fun code is refer from:
https://github.com/JiaquanYe/TableMASTER-mmocr/blob/master/mmocr/models/textrecog/decoders/master_decoder.py
"""
import copy import copy
import math import math
import paddle import paddle
......
...@@ -26,7 +26,7 @@ from .east_postprocess import EASTPostProcess ...@@ -26,7 +26,7 @@ from .east_postprocess import EASTPostProcess
from .sast_postprocess import SASTPostProcess from .sast_postprocess import SASTPostProcess
from .fce_postprocess import FCEPostProcess from .fce_postprocess import FCEPostProcess
from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \ from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \
DistillationCTCLabelDecode, TableLabelDecode, NRTRLabelDecode, SARLabelDecode, \ DistillationCTCLabelDecode, NRTRLabelDecode, SARLabelDecode, \
SEEDLabelDecode, PRENLabelDecode, ViTSTRLabelDecode, ABINetLabelDecode SEEDLabelDecode, PRENLabelDecode, ViTSTRLabelDecode, ABINetLabelDecode
from .cls_postprocess import ClsPostProcess from .cls_postprocess import ClsPostProcess
from .pg_postprocess import PGPostProcess from .pg_postprocess import PGPostProcess
......
...@@ -35,7 +35,7 @@ ...@@ -35,7 +35,7 @@
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|推理模型大小|下载地址|
| --- | --- | --- | --- | | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|PubLayNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | |en_ppocr_mobile_v2.0_table_structure|PubTabNet数据集训练的英文表格场景的表格结构预测|18.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
<a name="3"></a> <a name="3"></a>
## 3. VQA模型 ## 3. VQA模型
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册