diff --git a/configs/picodet/legacy_model/application/layout_analysis/README.md b/configs/picodet/legacy_model/application/layout_analysis/README.md
index 9fe3d361825beaddff761b8f0a4f43d5dbffed09..8bc43178b75292079457a059b865594f08a6666a 100644
--- a/configs/picodet/legacy_model/application/layout_analysis/README.md
+++ b/configs/picodet/legacy_model/application/layout_analysis/README.md
@@ -11,33 +11,38 @@
 
 ### 1.1 Dataset
 
-The following datasets were mainly used to train the layout analysis models.
-
-| Dataset | Description |
-| ------------------------------------------------------------ | ------------------------------------------------------------ |
-| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | Used for table detection (Track A) and table recognition (Track B). Images come from a historical set (file names starting with cTDaR_t0, e.g. cTDaR_t00872.jpg) and a modern set (file names starting with cTDaR_t1, e.g. cTDaR_t10482.jpg). |
-| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Built by manually annotating graphical or page objects in publicly available annual reports; 5 classes: table, figure, natural image, logo, and signature |
-| [CDLA](https://github.com/buptlihang/CDLA) | Chinese document layout analysis dataset for Chinese academic papers; 10 classes: Table, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation |
-| [TableBank](https://github.com/doc-analysis/TableBank) | Large-scale dataset for table detection and recognition, covering two document formats: Word and LaTeX |
-| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) built with weak supervision for document layout analysis; 12 classes: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title |
-
+The English document layout analysis model is trained on [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) [1], a dataset of English academic papers. It is split into a training set (333,703 annotated images), a validation set (11,245 annotated images), and a test set (11,405 images), and covers 5 classes: Table, Figure, Title, Text, and List. For more datasets, see [layout analysis datasets](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md#32).
 ### 1.2 Model zoo
 
+PicoDet is trained on PubLayNet, and FGD distillation is additionally applied. The pretrained models are listed below:
+
 | Model | Input size (H*W) | mAP<sup>val<br>0.5</sup> | Download | Config |
 | :-------- | :--------: | :----------------: | :---------------: | ----------------- |
-| PicoDet-LCNet_x1_0 | 800*608 | 93.5 | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams) | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) | [config](./picodet_lcnet_x1_0_layout.yml) |
-| PicoDet-LCNet_x1_0 + FGD | 800*608 | 94 | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar) | [teacher config](./picodet_lcnet_x2_5_layout.yml)|[student config](./picodet_lcnet_x1_0_layout.yml) |
+| PicoDet-LCNet_x1_0 | 800*608 | 93.5% | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams) / [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar) | [config](./picodet_lcnet_x1_0_layout.yml) |
+| PicoDet-LCNet_x1_0 + FGD | 800*608 | 94.0% | [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) / [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) | [teacher config](./picodet_lcnet_x2_5_layout.yml) / [student config](./picodet_lcnet_x1_0_layout.yml) |
+
+[Introduction to FGD distillation](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/slim/distill/README.md)
 
 ### 1.3 Model inference
 
-Download the inference model from the model zoo, then run the following command to perform inference for the layout recovery task:
+For the complete layout analysis workflow (data preparation, model training, evaluation, etc.), see [Layout analysis](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md); only model inference is shown here. First, download an inference model from the model zoo:
+
+```bash
+mkdir inference_model
+cd inference_model
+# Download and extract the PubLayNet inference model
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
+cd ..
+```
+
+Then run inference for the layout recovery task with the following command:
 
 ```bash
 python3 deploy/python/infer.py \
-    --model_dir=picodet_lcnet_x1_0_layout/ \
-    --image_file=docs/images/layout.jpg \
-    --device=CPU
+    --model_dir=inference_model/picodet_lcnet_x1_0_fgd_layout_infer/ \
+    --image_file=docs/images/layout.jpg \
+    --device=CPU
 ```
 
 The visualized layout result is shown below:
@@ -46,3 +51,6 @@ python3 deploy/python/infer.py \
+## 2 References
+
+[1] Zhong X, Tang J, Yepes A J. PubLayNet: largest dataset ever for document layout analysis[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 1015-1022.
diff --git a/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml b/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml
index 25acd05d37ecdb0dc886a92bb19b05789a5c0c85..b4bec58d7a9d036c943b67e6906bfa6d439ab052 100644
--- a/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml
+++ b/configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml
@@ -44,6 +44,10 @@ TestDataset:
 
 worker_num: 8
 
+eval_height: &eval_height 800
+eval_width: &eval_width 608
+eval_size: &eval_size [*eval_height, *eval_width]
+
 TrainReader:
   sample_transforms:
   - Decode: {}
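Where `wget` is not available (for example on Windows), the `mkdir` / `wget` / `tar xf` step of the README can be reproduced with the Python standard library. The helper below is a hypothetical sketch, not part of PaddleDetection; `download_and_extract` and `MODEL_URL` are illustrative names, and it assumes the archive unpacks into a directory named after the `.tar` file, which is what the `--model_dir` path in the inference command implies.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL of the FGD-distilled inference model from the model zoo table.
MODEL_URL = ("https://paddleocr.bj.bcebos.com/ppstructure/models/layout/"
             "picodet_lcnet_x1_0_fgd_layout_infer.tar")


def download_and_extract(url: str, dest_dir: str = "inference_model") -> Path:
    """Hypothetical helper: mkdir + wget + `tar xf`, using only the stdlib.

    Returns the directory the archive is assumed to unpack into
    (the archive file name without its `.tar` suffix).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)        # mkdir inference_model
    archive = dest / url.rsplit("/", 1)[-1]        # e.g. ..._layout_infer.tar
    urllib.request.urlretrieve(url, archive)       # wget <url>
    with tarfile.open(archive) as tar:             # tar xf <archive>
        if hasattr(tarfile, "data_filter"):
            # On Python versions with extraction filters, reject members
            # that would escape the target directory.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / archive.stem
```

Calling `download_and_extract(MODEL_URL)` from the repository root should leave the model in `inference_model/picodet_lcnet_x1_0_fgd_layout_infer/`, the directory passed to `--model_dir`. Returning the path keeps the caller from hard-coding the extracted directory name.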
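The model predicts the five PubLayNet classes. As a hedged sketch of consuming its detections, the snippet below assumes each detection is a row `[class_id, score, x1, y1, x2, y2]` (assumed to match the box layout used by PaddleDetection's Python deploy code) and that class IDs follow the order `text, title, list, table, figure`. Both are assumptions: the authoritative class order is the `label_list` in the exported model's `infer_cfg.yml`. `parse_layout`, `count_by_label`, `Region`, and the 0.5 score threshold are illustrative choices, not part of the README.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Sequence, Tuple

# Assumed class order (PubLayNet category ids 1..5 mapped to 0..4);
# confirm against `label_list` in the exported infer_cfg.yml.
PUBLAYNET_LABELS = ["text", "title", "list", "table", "figure"]


class Region(NamedTuple):
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def parse_layout(raw: Sequence[Sequence[float]],
                 labels: Sequence[str] = PUBLAYNET_LABELS,
                 threshold: float = 0.5) -> List[Region]:
    """Turn [class_id, score, x1, y1, x2, y2] rows into labeled regions.

    Rows scoring below `threshold` are dropped; input order is preserved.
    """
    regions = []
    for class_id, score, x1, y1, x2, y2 in raw:
        if score < threshold:
            continue
        regions.append(Region(labels[int(class_id)], float(score), (x1, y1, x2, y2)))
    return regions


def count_by_label(regions: Sequence[Region]) -> Dict[str, int]:
    """Count how many regions of each class were kept."""
    return dict(Counter(r.label for r in regions))
```

For example, `count_by_label(parse_layout(rows))` gives a quick per-class summary of a page, such as how many tables or figures survived the threshold.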