diff --git a/ppstructure/docs/models_list.md b/ppstructure/docs/models_list.md
index 0b2f41deb5588c82238e93d835dc8c606e4fde2e..3b8c3790e8b7cc7fd1da6f04958daa8fccdb382a 100644
--- a/ppstructure/docs/models_list.md
+++ b/ppstructure/docs/models_list.md
@@ -10,13 +10,14 @@
 ## 1. Layout Analysis Models
 
-|Model name|Description|Download|label_map|
+|Model name|Description|Inference model size|Download|
 | --- | --- | --- | --- |
-| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; detects 5 types of regions: **text, title, table, figure, and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
-| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; detects tables only | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; detects tables only | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
+| picodet_lcnet_x1_0_fgd_layout | Layout analysis model trained on the PubLayNet dataset; detects 5 types of regions: **text, title, table, figure, and list** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
+| picodet_lcnet_x1_0_fgd_layout_cdla | Layout analysis model trained on the CDLA dataset; detects 10 types of regions: **text, title, figure, figure caption, table, table caption, header, footer, reference, and equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
+| picodet_lcnet_x1_0_fgd_layout_table | Layout analysis model trained on a table dataset; detects tables only | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
+
 
 ## 2. OCR and Table Recognition Models
 
diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md
index 7ba1d30464287eaf67a0265464fcc261e3b4407f..300f4c5666b3cc3cdef2f82e9c1f41f54a092e9e 100644
--- a/ppstructure/docs/models_list_en.md
+++ b/ppstructure/docs/models_list_en.md
@@ -4,18 +4,17 @@
 - [2. OCR and Table Recognition](#2-ocr-and-table-recognition)
   - [2.1 OCR](#21-ocr)
   - [2.2 Table Recognition](#22-table-recognition)
-- [3. VQA](#3-kie)
-- [4. KIE](#4-kie)
+- [3. KIE](#3-kie)
 
 
 ## 1. Layout Analysis
 
-|model name| description |download|label_map|
-| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- |
-| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset, the model can recognition 5 types of areas such as **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
-| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
+|model name| description |download|
+| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
+| picodet_lcnet_x1_0_fgd_layout | Layout analysis model trained on the PubLayNet dataset; detects 5 types of regions: **Text, Title, Table, Figure, and List** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
+| picodet_lcnet_x1_0_fgd_layout_cdla | Layout analysis model trained on the CDLA dataset; detects 10 types of regions: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
+| picodet_lcnet_x1_0_fgd_layout_table | Layout analysis model trained on a table dataset; detects tables only | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
 
 
 ## 2. OCR and Table Recognition
 
@@ -40,19 +39,25 @@ If you need to use other OCR models, you can download the model in [PP-OCR model
 |ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model trained on PubTabNet dataset based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |
 
 
-## 3. VQA
-
-|model| description |inference model size|download|
-| --- |----------------------------------------------------------------| --- | --- |
-|ser_LayoutXLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
-|re_LayoutXLM_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
-|ser_LayoutLMv2_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLMv2 |778M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
-|re_LayoutLMv2_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLMv2 |765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
-|ser_LayoutLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLM |430M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
-
-
-## 4. KIE
-
-|model|description|model size|download|
-| --- | --- | --- | --- |
-|SDMGR|Key Information Extraction Model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+## 3. KIE
+
+On the XFUND_zh dataset, the accuracy and time cost of different models on a V100 GPU are as follows.
+
+|Model|Backbone|Task|Config|Hmean|Time cost (ms)|Download link|
+| --- | --- | --- | --- | --- | --- |--- |
+|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49| [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
+|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 |[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
+|LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|-|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
+|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
+|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|
+
+* Note: The time costs above cover inference only, excluding preprocessing and postprocessing. Test environment: `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`
+
+
+On the wildreceipt dataset, the results are as follows:
+
+|Model|Backbone|Config|Hmean|Download link|
+| --- | --- | --- | --- | --- |
+|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
diff --git a/ppstructure/docs/recovery/recovery.jpg b/ppstructure/docs/recovery/recovery.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a3817ab70eff5b380072701b70ab227ae6c8184c
Binary files /dev/null and b/ppstructure/docs/recovery/recovery.jpg differ
diff --git a/ppstructure/docs/table/recovery.jpg b/ppstructure/docs/table/recovery.jpg
deleted file mode 100644
index bee2e2fb3499ec4b348e2b2f1475a87c9c562190..0000000000000000000000000000000000000000
Binary files a/ppstructure/docs/table/recovery.jpg and /dev/null differ
diff --git a/ppstructure/layout/README.md b/ppstructure/layout/README.md
index 3762544b834d752a705216ca3f93d326aa1391ad..f2dc9c0d6925603fd4af89bfff549ffe4549a5ed 100644
--- a/ppstructure/layout/README.md
+++ b/ppstructure/layout/README.md
@@ -63,7 +63,7 @@ python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simp
 git clone https://github.com/PaddlePaddle/PaddleDetection.git
 ```
 
-- **(2) Install other dependencies **
+- **(2) Install other dependencies**
 
 ```bash
 cd PaddleDetection
@@ -138,7 +138,7 @@ The json file contains the annotations of all images, stored as nested dictionaries,
 
 ```
 {
-
+
     'segmentation':             # segmentation annotation of the object
     'area': 60518.099043117836, # area of the object
     'iscrowd': 0,               # iscrowd
@@ -166,15 +166,17 @@ The json file contains the annotations of all images, stored as nested dictionaries,
 
 Training, evaluation, and prediction scripts are provided. This section uses the PubLayNet pretrained model as an example.
 
-If you do not want to train and would rather try the evaluation, prediction, dynamic-to-static conversion, and inference steps below directly, you can download the provided pretrained model and skip this section.
+If you do not want to train and would rather try the evaluation, prediction, dynamic-to-static conversion, and inference steps below directly, you can download the provided pretrained model (PubLayNet dataset) and skip this section.
 
 ```
 mkdir pretrained_model
 cd pretrained_model
-# Download and unzip the PubLayNet pretrained model
+# Download the PubLayNet pretrained model
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams
 ```
 
+Download more [layout analysis models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-%E7%89%88%E9%9D%A2%E5%88%86%E6%9E%90%E6%A8%A1%E5%9E%8B) (pretrained models for the Chinese CDLA dataset and for tables).
+
 ### 4.1. Start training
 
 Start training:
@@ -184,7 +186,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
 
 To train on your own dataset, modify the dataset settings and the number of classes in the config file.
 
-Taking `configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` as an example, the changes are as follows.
+Taking `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` as an example, the changes are as follows.
 
 ```yaml
 metric: COCO
@@ -223,16 +225,20 @@ TestDataset:
 # Training logs are saved to the log directory automatically
 
 # Single-GPU training
+export CUDA_VISIBLE_DEVICES=0
 python3 tools/train.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
     --eval
 
 # Multi-GPU training; specify the GPU IDs with --gpus
+export CUDA_VISIBLE_DEVICES=0,1,2,3
 python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
     --eval
 ```
 
+**Note:** If training runs out of GPU memory, reduce `batch_size` in `TrainReader` and reduce `base_lr` in `LearningRate` by the same factor. All released configs were trained on 8 GPUs; if you train on 1 GPU instead, divide `base_lr` by 8.
+
 After training starts normally, you will see the following log output:
 
 ```
@@ -254,9 +260,11 @@ PaddleDetection supports FGD-based ([Focal and Global Knowledge Distillation for D
 To switch datasets, modify the dataset settings and the number of classes in the [TODO] config; see Section 4.1 for details. Start training:
 
 ```bash
-python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
-    --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+# Single-GPU training
+export CUDA_VISIBLE_DEVICES=0
+python3 tools/train.py \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
     --eval
 ```
 
@@ -267,13 +275,13 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
 
 ### 5.1. Metric evaluation
 
-During training, model parameters are saved to the `output/picodet_lcnet_x1_0_layout` directory by default. To evaluate, set `weights` to point to the saved parameter file. The evaluation dataset is configured through `image_dir`, `anno_path`, and `dataset_dir` under `EvalDataset` in `configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml`.
+During training, model parameters are saved to the `output/picodet_lcnet_x1_0_layout` directory by default. To evaluate, set `weights` to point to the saved parameter file. The evaluation dataset is configured through `image_dir`, `anno_path`, and `dataset_dir` under `EvalDataset` in `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml`.
 
 ```bash
 # GPU evaluation; weights is the checkpoint to evaluate
 python3 tools/eval.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
-    -o weigths=./output/picodet_lcnet_x1_0_layout/best_model
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    -o weights=./output/picodet_lcnet_x1_0_layout/best_model
 ```
 
 The following output is printed, including mAP, AP0.5, and other metrics.
@@ -299,8 +307,8 @@ python3 tools/eval.py \
 
 ```
 python3 tools/eval.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
-    --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
     -o weights=output/picodet_lcnet_x2_5_layout/best_model
 ```
 
@@ -311,18 +319,17 @@ python3 tools/eval.py \
 
 ### 5.2. Test the layout analysis results
 
-The config file used for prediction must be the same as the one used for training, e.g. the config you passed in `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` when training the model.
-
-With a model trained by PaddleDetection, you can run Chinese model prediction with the following command.
+The config file used for prediction must be the same as the one used for training, e.g. the config you passed in `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` when training the model.
+With a model trained by PaddleDetection, you can run model prediction with the following command.
 
 ```bash
 python3 tools/infer.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
     -o weights='output/picodet_lcnet_x1_0_layout/best_model.pdparams' \
     --infer_img='docs/images/layout.jpg' \
     --output_dir=output_dir/ \
-    --draw_threshold=0.4
+    --draw_threshold=0.5
 ```
 
 - `--infer_img`: image to run inference on; use `--infer_dir` instead to run inference on all images in a directory.
@@ -335,16 +342,15 @@ python3 tools/infer.py \
 
 ```
 python3 tools/infer.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
-    --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
     -o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \
     --infer_img='docs/images/layout.jpg' \
     --output_dir=output_dir/ \
-    --draw_threshold=0.4
+    --draw_threshold=0.5
 ```
 
-
 ## 6. Model export and prediction
 
@@ -356,7 +362,7 @@ An inference model (a model saved with `paddle.jit.save`) is generally produced after model training,
 
 ```bash
 python3 tools/export_model.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
     -o weights=output/picodet_lcnet_x1_0_layout/best_model \
     --output_dir=output_inference/
 ```
@@ -377,8 +383,8 @@ To convert an FGD-distilled model into an inference model:
 
 ```bash
 python3 tools/export_model.py \
-    -c configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_student.yml \
-    --slim_config configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_teacher.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
     -o weights=./output/picodet_lcnet_x2_5_layout/best_model \
     --output_dir=output_inference/
 ```
@@ -404,7 +410,7 @@ python3 deploy/python/infer.py \
 ------------------------------------------
 ----------- Model Configuration -----------
 Model Arch: PicoDet
-Transform Order: 
+Transform Order:
 --transform op: Resize
 --transform op: NormalizeImage
 --transform op: Permute
@@ -466,4 +472,3 @@ preprocess_time(ms): 2172.50, inference_time(ms): 11.90, postprocess_time(ms): 1
   year={2022}
 }
 ```
-
diff --git a/ppstructure/layout/__init__.py b/ppstructure/layout/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..1d11e265597c7c8e39098a228108da3bb954b892
--- /dev/null
+++ b/ppstructure/layout/__init__.py
@@ -0,0 +1,13 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/ppstructure/recovery/README.md b/ppstructure/recovery/README.md
index 713d0307dbbd66664db15d19df484af76efea75a..698bee08de07f6b308e79a6da52938cca18f83ef 100644
--- a/ppstructure/recovery/README.md
+++ b/ppstructure/recovery/README.md
@@ -9,7 +9,7 @@ English | [简体中文](README_ch.md)
 
 
 
-## 1. Introduction
+## 1. Introduction
 
 Layout recovery means that, after OCR recognition, the content keeps the arrangement of the original document image, and the paragraphs are output to a Word document in the same order.
 
@@ -33,14 +33,14 @@ The following figure shows the result:
 python3 -m pip install --upgrade pip
 
 # GPU installation
-python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
 
 # CPU installation
-python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
 
 ````
 
-For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick).
+For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
 
 
 
@@ -67,38 +67,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
 
 ## 3. Quick Start
+
+### 3.1 Download models
+
+If the input is an English document, download the English models:
+
 ```bash
 cd PaddleOCR/ppstructure
 
 # download model
 mkdir inference && cd inference
 # Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
 # Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
 # Download the ultra-lightweight English table recognition model and unzip it
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
 # Download the layout analysis model trained on the PubLayNet dataset and unzip it
-wget
-https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
 cd ..
-# run
+```
+If the input is a Chinese document, download the Chinese models:
+[Chinese and English ultra-lightweight PP-OCRv3 models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README.md#pp-ocr-series-model-listupdate-on-september-8th), [table recognition models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型), [layout analysis models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
+
+
+### 3.2 Layout recovery
+
+
+```bash
 python3 predict_system.py \
     --image_dir=./docs/table/1.png \
     --det_model_dir=inference/en_PP-OCRv3_det_infer \
-    --rec_model_dir=inference/en_PP-OCRv3_rec_infe \
+    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
     --rec_char_dict_path=../ppocr/utils/en_dict.txt \
-    --output=../output/ \
-    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
+    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
     --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
-    --table_max_len=488 \
-    --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
+    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
     --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
     --vis_font_path=../doc/fonts/simfang.ttf \
     --recovery=True \
-    --save_pdf=False
+    --save_pdf=False \
+    --output=../output/
 ```
 
-After running, the docx of each picture will be saved in the directory specified by the output field
-
-Recovery table to Word code[table_process.py] reference:https://github.com/pqzx/html2docx.git
\ No newline at end of file
+After running, the .docx file for each image is saved in the directory specified by the `output` field.
+
+Fields:
+
+- image_dir: input file; can be an image, a directory of images, a PDF file, or a directory of PDF files
+- det_model_dir: path of the OCR detection model
+- rec_model_dir: path of the OCR recognition model
+- rec_char_dict_path: path of the OCR recognition dictionary. If a Chinese model is used, change it to "../ppocr/utils/ppocr_keys_v1.txt"; if you trained the model on your own dataset, change it to the dictionary used for training
+- table_model_dir: path of the table recognition model
+- table_char_dict_path: path of the table recognition dictionary; no change is needed when a Chinese model is used
+- layout_model_dir: path of the layout analysis model
+- layout_dict_path: path of the layout analysis dictionary. If a Chinese model is used, change it to "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
+- recovery: whether to enable layout recovery; default False
+- save_pdf: whether to also save a PDF file when recovering the layout; default False
+- output: path where the recovery results are saved
diff --git a/ppstructure/recovery/README_ch.md b/ppstructure/recovery/README_ch.md
index 14ca8836a0332a5b0e119be4bf6bcb36fb011d1e..73405879faa3a934f00a4e9e3cdb3fb19434c66b 100644
--- a/ppstructure/recovery/README_ch.md
+++ b/ppstructure/recovery/README_ch.md
@@ -8,19 +8,22 @@
     - [2.2 Install PaddleOCR](#2.2)
 - [3. Usage](#3)
+    - [3.1 Download models](#3.1)
+    - [3.2 Layout recovery](#3.2)
 
-## 1. Introduction
+## 1. Introduction
 
 Layout recovery means that, after OCR recognition, the content keeps the arrangement of the original document image and is output to a Word document (or similar) with the same paragraphs in the same order.
 
-Layout recovery combines [layout analysis](../layout/README_ch.md) and [table recognition](../table/README_ch.md) to better recover images, tables, titles, and other content. The figure below shows the result of layout recovery:
+Layout recovery combines [layout analysis](../layout/README_ch.md) and [table recognition](../table/README_ch.md) to better recover images, tables, titles, and other content, and accepts PDF documents and document images as input. The figure below shows the result of layout recovery:
- +
+
 
 ## 2. Installation
 
@@ -35,10 +38,10 @@
 python3 -m pip install --upgrade pip
 
 # GPU installation
-python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
 
 # CPU installation
-python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
 ```
 
@@ -69,40 +72,66 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
 
 ## 3. Usage
 
-Recover the layout of a given document:
+
 
-```python
+### 3.1 Download models
+
+If the input is an English document, download the English models:
+
+```bash
 cd PaddleOCR/ppstructure
 
 # Download models
 mkdir inference && cd inference
-# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
-# Download the recognition model of the lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
-# Download the ultra-lightweight English table model and unzip it
-wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
+# Download and unzip the English ultra-lightweight PP-OCRv3 detection model
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
+# Download and unzip the English ultra-lightweight PP-OCRv3 recognition model
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
+# Download and unzip the English table recognition model
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
 # Download the English layout analysis model
-wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
 cd ..
+```
+
+If the input is a Chinese document, download the Chinese models from the following links:
 
-# Run prediction
+[PP-OCRv3 Chinese and English ultra-lightweight text detection and recognition models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README_ch.md#pp-ocr%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E5%88%97%E8%A1%A8%E6%9B%B4%E6%96%B0%E4%B8%AD), [table recognition models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型), [layout analysis models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
+
+
+
+### 3.2 Layout recovery
+
+Use the downloaded models to recover the layout of a given document. Taking the English models as an example, run:
+
+```bash
 python3 predict_system.py \
     --image_dir=./docs/table/1.png \
     --det_model_dir=inference/en_PP-OCRv3_det_infer \
-    --rec_model_dir=inference/en_PP-OCRv3_rec_infe \
+    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
     --rec_char_dict_path=../ppocr/utils/en_dict.txt \
-    --output=../output/ \
-    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
+    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
     --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
-    --table_max_len=488 \
-    --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
+    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
     --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
     --vis_font_path=../doc/fonts/simfang.ttf \
     --recovery=True \
-    --save_pdf=False
+    --save_pdf=False \
+    --output=../output/
 ```
-After running, the docx document of each image is saved in the directory specified by the `output` field
-
-The table-to-Word code [table_process.py] comes from: https://github.com/pqzx/html2docx.git
+After running, the .docx document with the recovered layout is saved in the directory specified by the `output` field.
+
+Field descriptions:
+
+- image_dir: input file; can be an image, a directory of images, a PDF file, or a directory of PDF files
+- det_model_dir: path of the OCR detection model
+- rec_model_dir: path of the OCR recognition model
+- rec_char_dict_path: OCR recognition dictionary. If you switch to a Chinese model, change it to "../ppocr/utils/ppocr_keys_v1.txt"; if you trained the model on your own dataset, change it to the dictionary file used for training
+- table_model_dir: path of the table recognition model
+- table_char_dict_path: table recognition dictionary; no change is needed when switching to a Chinese model
+- layout_model_dir: path of the layout analysis model
+- layout_dict_path: layout analysis dictionary. If you switch to a Chinese model, change it to "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
+- recovery: whether to perform layout recovery; default False
+- save_pdf: whether to also save a PDF file when exporting the recovered .docx document; default False
+- output: path where the layout recovery results are saved
diff --git a/ppstructure/recovery/__init__.py b/ppstructure/recovery/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..1d11e265597c7c8e39098a228108da3bb954b892
--- /dev/null
+++ b/ppstructure/recovery/__init__.py
@@ -0,0 +1,13 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/ppstructure/recovery/recovery_to_doc.py b/ppstructure/recovery/recovery_to_doc.py
index 4401b1f27cf10f8483ee9b2b4a61315ad6aad264..0a556093d17050f65440f8e962015a86de696107 100644
--- a/ppstructure/recovery/recovery_to_doc.py
+++ b/ppstructure/recovery/recovery_to_doc.py
@@ -24,7 +24,7 @@ from docx.enum.section import WD_SECTION
 from docx.oxml.ns import qn
 from docx.enum.table import WD_TABLE_ALIGNMENT
 
-from table_process import HtmlToDocx
+from ppstructure.recovery.table_process import HtmlToDocx
 
 from ppocr.utils.logging import get_logger
 logger = get_logger()
@@ -69,7 +69,7 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
                 new_table = deepcopy(table)
                 new_table.alignment = WD_TABLE_ALIGNMENT.CENTER
                 paragraph.add_run().element.addnext(new_table._tbl)
-                
+
             else:
                 paragraph = doc.add_paragraph()
                 paragraph_format = paragraph.paragraph_format
@@ -86,10 +86,10 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
 
     # save to pdf
     if save_pdf:
-        pdf = os.path.join(save_folder, '{}.pdf'.format(img_name))
+        pdf_path = os.path.join(save_folder, '{}.pdf'.format(img_name))
         from docx2pdf import convert
         convert(docx_path, pdf_path)
-        logger.info('pdf save to {}'.format(pdf))
+        logger.info('pdf save to {}'.format(pdf_path))
 
 
 def sorted_layout_boxes(res, w):
@@ -112,7 +112,7 @@ def sorted_layout_boxes(res, w):
     res_left = []
     res_right = []
     i = 0
-    
+
     while True:
         if i >= num_boxes:
             break
@@ -137,7 +137,7 @@ def sorted_layout_boxes(res, w):
                 res_left = []
                 res_right = []
                 break
-        elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3*w / 4:
+        elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3 * w / 4:
             _boxes[i]['layout'] = 'double'
             res_left.append(_boxes[i])
             i += 1
@@ -157,4 +157,4 @@ def sorted_layout_boxes(res, w):
         new_res += res_left
     if res_right:
         new_res += res_right
-    return new_res
\ No newline at end of file
+    return new_res
diff --git a/ppstructure/utility.py b/ppstructure/utility.py
index 270ee3aef9ced40f47eaa5dd9aac3054469d69a8..4df726118c0cca8ffc58a423357eda11e5221919 100644
--- a/ppstructure/utility.py
+++ b/ppstructure/utility.py
@@ -84,13 +84,18 @@ def init_args():
         type=str2bool,
         default=True,
         help='In the forward, whether the non-table area is recognition by ocr')
+    # param for recovery
     parser.add_argument(
         "--recovery",
-        type=bool,
+        type=str2bool,
         default=False,
         help='Whether to enable layout of recovery')
     parser.add_argument(
-        "--save_pdf", type=bool, default=False, help='Whether to save pdf file')
+        "--save_pdf",
+        type=str2bool,
+        default=False,
+        help='Whether to save pdf file')
+
     return parser
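The updated layout tables drop the `label_map` column; at inference time the class names come from the file passed as `--layout_dict_path` (for example `layout_publaynet_dict.txt` or `layout_cdla_dict.txt`). Assuming such a file lists one class name per line in class-index order, a map equivalent to the removed column can be rebuilt as below. `parse_label_map` and `load_label_map` are hypothetical helpers for illustration, not PaddleOCR functions.

```python
from pathlib import Path
from typing import Dict


def parse_label_map(text: str) -> Dict[int, str]:
    """Map class index -> label, assuming one label per non-empty line."""
    labels = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(enumerate(labels))


def load_label_map(dict_path: str) -> Dict[int, str]:
    """Read a layout dictionary file (hypothetical helper) and parse it."""
    return parse_label_map(Path(dict_path).read_text(encoding="utf-8"))
```

Under that assumption, a five-line PubLayNet dictionary yields the same index order as the old `{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}` column.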
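The new KIE tables report Hmean, which is the harmonic mean of precision and recall (the same quantity as F1). A minimal sketch of that formula from true-positive, false-positive, and false-negative counts; how PaddleOCR's SER/RE metrics count entities or relations is not shown in this diff, so the counts here are generic.

```python
def hmean(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, a model cannot score well by maximizing only precision or only recall; for example, `hmean(90, 10, 20)` equals `2 * 90 / (2 * 90 + 10 + 20)`.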
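The note added to Section 4.1 of the layout README amounts to scaling `base_lr` linearly with the total batch size: the released configs assume 8 GPUs, so training on 1 GPU with the same per-GPU `batch_size` means dividing `base_lr` by 8, and halving `batch_size` halves it again. A small helper under that assumption; the function name and the numbers in the example are hypothetical and not taken from the released configs.

```python
def scaled_base_lr(base_lr: float, ref_gpus: int, ref_batch_per_gpu: int,
                   gpus: int, batch_per_gpu: int) -> float:
    """Linearly rescale base_lr to a new total batch size (gpus * batch_per_gpu)."""
    ref_total = ref_gpus * ref_batch_per_gpu
    new_total = gpus * batch_per_gpu
    return base_lr * new_total / ref_total


# Hypothetical numbers: a config tuned on 8 GPUs, run on 1 GPU with the
# same per-GPU batch size, ends up with base_lr / 8.
lr = scaled_base_lr(0.32, ref_gpus=8, ref_batch_per_gpu=24, gpus=1, batch_per_gpu=24)
```

Linear scaling is a rule of thumb; very small batches may still need extra tuning of warmup or the schedule.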
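The prediction commands raise `--draw_threshold` from 0.4 to 0.5, so low-confidence boxes are no longer drawn. The sketch below mimics that effect on a hypothetical list of `(label, score, bbox)` detections. It is not PaddleDetection's drawing code, and whether PaddleDetection compares with `>` or `>=` is an assumption here (`>=` is used).

```python
from typing import List, Tuple

# Hypothetical detection record: (label, score, (x0, y0, x1, y1)).
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def keep_for_drawing(dets: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in dets if d[1] >= threshold]
```

Raising the threshold trades recall for cleaner visualizations; it affects only which boxes are drawn, not the model itself.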
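Several of the old download commands ran `tar` without `xf` (for example `tar picodet_lcnet_x1_0_layout_infer.tar`), which does not extract the archive; the fixed commands use `tar xf`. For environments without a shell, a rough standard-library equivalent of `wget URL && tar xf FILE` is sketched below. `fetch_and_extract` is a hypothetical helper, and it assumes each archive unpacks into a folder named after the file, as the `inference/..._infer` paths used by `predict_system.py` suggest.

```python
import tarfile
import urllib.request
from pathlib import Path


def archive_name(url: str) -> str:
    """File name at the end of a download URL."""
    return url.rsplit("/", 1)[-1]


def fetch_and_extract(url: str, dest: str = "inference") -> Path:
    """Roughly `wget URL && tar xf FILE`, run inside `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / archive_name(url)
    urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction filter (Python 3.12+ and recent patch releases).
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    # Assumption: the archive contains a top-level folder named after the file.
    return dest_dir / archive.stem
```

For example, `fetch_and_extract("https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar")` would be expected to return `inference/en_PP-OCRv3_det_infer`, the value passed to `--det_model_dir` in the README.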
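The field list in the recovery README says which dictionaries must be swapped when moving from the English to the Chinese models. The sketch below gathers those paths, copied from the README, into one hypothetical helper that builds the `predict_system.py` argument list. Model directories are supplied by the caller because the README only links to the Chinese models without naming their folders.

```python
from typing import Dict, List

# Dictionary paths as stated in the recovery README (relative to ppstructure/).
DICTS: Dict[str, Dict[str, str]] = {
    "en": {
        "rec_char_dict_path": "../ppocr/utils/en_dict.txt",
        "layout_dict_path": "../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt",
    },
    "ch": {
        "rec_char_dict_path": "../ppocr/utils/ppocr_keys_v1.txt",
        "layout_dict_path": "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt",
    },
}
# Per the README, the table dictionary does not change for Chinese models.
TABLE_DICT = "../ppocr/utils/dict/table_structure_dict.txt"


def recovery_args(lang: str, model_dirs: Dict[str, str], image_dir: str,
                  output: str = "../output/") -> List[str]:
    """Build a predict_system.py command line for layout recovery (hypothetical helper)."""
    if lang not in DICTS:
        raise ValueError(f"unsupported language: {lang}")
    opts = {
        "image_dir": image_dir,
        **model_dirs,  # det/rec/table/layout model directories
        **DICTS[lang],
        "table_char_dict_path": TABLE_DICT,
        "vis_font_path": "../doc/fonts/simfang.ttf",
        "recovery": "True",
        "save_pdf": "False",
        "output": output,
    }
    return ["python3", "predict_system.py"] + [f"--{k}={v}" for k, v in opts.items()]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the `ppstructure` directory.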
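The `utility.py` hunk switches `--recovery` and `--save_pdf` from `type=bool` to `type=str2bool`. With `type=bool`, argparse calls `bool()` on the raw string, and every non-empty string, including `"False"`, is truthy, so `--recovery=False` would have enabled recovery. The sketch reproduces the problem; its `str2bool` is a minimal stand-in written for illustration and may differ from the helper PaddleOCR defines.

```python
import argparse


def str2bool(v: str) -> bool:
    # Illustrative stand-in: accept common spellings of "true".
    return v.lower() in ("true", "t", "yes", "y", "1")


def parse(argv):
    parser = argparse.ArgumentParser()
    # Old style: argparse calls bool("False"), and any non-empty string is True.
    parser.add_argument("--recovery_old", type=bool, default=False)
    # New style: the string value is interpreted explicitly.
    parser.add_argument("--recovery", type=str2bool, default=False)
    return parser.parse_args(argv)


args = parse(["--recovery_old=False", "--recovery=False"])
print(args.recovery_old)  # True, although "False" was passed
print(args.recovery)      # False
```

The README commands pass `--recovery=True` and `--save_pdf=False` as strings, which is why the explicit conversion matters.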