diff --git a/ppstructure/docs/models_list.md b/ppstructure/docs/models_list.md index ef7048faa6f367316871e1fe8cfbd72ceb805e59..935d12d756eec467574f9ae32d48c70a3ea054c3 100644 --- a/ppstructure/docs/models_list.md +++ b/ppstructure/docs/models_list.md @@ -10,13 +10,17 @@ ## 1. 版面分析模型 -|模型名称|模型简介|下载地址|label_map| -| --- | --- | --- | --- | -| ppyolov2_r50vd_dcn_365e_publaynet | PubLayNet 数据集训练的版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| -| ppyolov2_r50vd_dcn_365e_tableBank_word | TableBank Word 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| -| ppyolov2_r50vd_dcn_365e_tableBank_latex | TableBank Latex 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| +|模型名称|模型简介|推理模型大小|下载地址|dict path| +| --- | --- | --- | --- | --- | +| picodet_lcnet_x1_0_fgd_layout | 基于PicoDet LCNet_x1_0和FGD蒸馏在PubLayNet 数据集训练的英文版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) | +| ppyolov2_r50vd_dcn_365e_publaynet | 基于PP-YOLOv2在PubLayNet数据集上训练的英文版面分析模型 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | 同上 | +| 
picodet_lcnet_x1_0_fgd_layout_cdla | CDLA数据集训练的中文版面分析模型,可以划分为**文字、标题、图片、图片标题、表格、表格标题、页眉、页脚、引用、公式**10类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) | +| picodet_lcnet_x1_0_fgd_layout_table | 表格数据集训练的版面分析模型,支持中英文文档表格区域的检测 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) | +| ppyolov2_r50vd_dcn_365e_tableBank_word | 基于PP-YOLOv2在TableBank Word 数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | 同上 | +| ppyolov2_r50vd_dcn_365e_tableBank_latex | 基于PP-YOLOv2在TableBank Latex数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | 同上 | + ## 2. OCR和表格识别模型 diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md index 27b444d5a71f14d1135847a1fd2e9345f54b59ac..85531fb753c4e32f0cdc9296ab97a9faebbb0ebd 100644 --- a/ppstructure/docs/models_list_en.md +++ b/ppstructure/docs/models_list_en.md @@ -6,15 +6,18 @@ - [2.2 Table Recognition](#22-table-recognition) - [3. KIE](#3-kie) - + ## 1. 
Layout Analysis -|model name| description |download|label_map| -| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | -| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset, the model can recognition 5 types of areas such as **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| -| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| -| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| +|model name| description | inference model size |download|dict path| +| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | --- | +| picodet_lcnet_x1_0_fgd_layout | The English layout analysis model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD. 
The model can recognize 5 types of areas: **Text, Title, Table, Picture and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) | +| ppyolov2_r50vd_dcn_365e_publaynet | The English layout analysis model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above | +| picodet_lcnet_x1_0_fgd_layout_cdla | The Chinese layout analysis model trained on the CDLA dataset. The model can recognize 10 types of areas: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) | +| picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset. The model can detect tables in Chinese and English documents | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) | +| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on 
PP-YOLOv2. The model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above | +| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2. The model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above | ## 2. OCR and Table Recognition diff --git a/ppstructure/docs/quickstart.md b/ppstructure/docs/quickstart.md index 517703d7b6c172937b45dda15ffc816946a5ef53..38a37bebcb561228ddfb2b2970b9ddbaebbcb19a 100644 --- a/ppstructure/docs/quickstart.md +++ b/ppstructure/docs/quickstart.md @@ -8,15 +8,21 @@ - [2.1.3 版面分析](#213-版面分析) - [2.1.4 表格识别](#214-表格识别) - [2.1.5 关键信息抽取](#215-关键信息抽取) + - [2.1.6 版面恢复](#216-版面恢复) - [2.2 代码使用](#22-代码使用) - - [2.2.1 图像方向分类版面分析表格识别](#221-图像方向分类版面分析表格识别) + + - [2.2.1 图像方向分类+版面分析+表格识别](#221-图像方向分类版面分析表格识别) - [2.2.2 版面分析+表格识别](#222-版面分析表格识别) - [2.2.3 版面分析](#223-版面分析) - [2.2.4 表格识别](#224-表格识别) + - [2.2.5 关键信息抽取](#225-关键信息抽取) + - [2.2.6 版面恢复](#226-版面恢复) + - [2.3 返回结果说明](#23-返回结果说明) - [2.3.1 版面分析+表格识别](#231-版面分析表格识别) - [2.3.2 关键信息抽取](#232-关键信息抽取) + - [2.4 参数说明](#24-参数说明) @@ -24,11 +30,12 @@ ## 1. 
安装依赖包 ```bash -# 安装 paddleocr,推荐使用2.5+版本 -pip3 install "paddleocr>=2.5" +# 安装 paddleocr,推荐使用2.6版本 +pip3 install "paddleocr>=2.6" # 安装 关键信息抽取 依赖包(如不需要KIE功能,可跳过) pip install -r kie/requirements.txt - +# 安装 图像方向分类依赖包paddleclas(如不需要图像方向分类功能,可跳过) +pip3 install paddleclas ``` @@ -62,15 +69,24 @@ paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structur ``` -#### 2.1.5 关键信息抽取 +#### 2.1.5 关键信息抽取 请参考:[关键信息抽取教程](../kie/README_ch.md)。 + + +#### 2.1.6 版面恢复 + +```bash +paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true +``` + + ### 2.2 代码使用 -#### 2.2.1 图像方向分类版面分析表格识别 +#### 2.2.1 图像方向分类+版面分析+表格识别 ```python import os @@ -149,6 +165,7 @@ for line in result: ``` + #### 2.2.4 表格识别 ```python @@ -174,6 +191,33 @@ for line in result: 请参考:[关键信息抽取教程](../kie/README_ch.md)。 + + +#### 2.2.6 版面恢复 + +```python +import os +import cv2 +from paddleocr import PPStructure,save_structure_res +from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx + +table_engine = PPStructure(layout=False, show_log=True) + +save_folder = './output' +img_path = 'PaddleOCR/ppstructure/docs/table/1.png' +img = cv2.imread(img_path) +result = table_engine(img) +save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) + +for line in result: + line.pop('img') + print(line) + +h, w, _ = img.shape +res = sorted_layout_boxes(result, w) +convert_info_docx(img, res, save_folder, os.path.basename(img_path).split('.')[0]) +``` + ### 2.3 返回结果说明 PP-Structure的返回结果为一个dict组成的list,示例如下 @@ -235,6 +279,7 @@ dict 里各个字段说明如下 | table | 前向中是否执行表格识别 | True | | ocr | 对于版面分析中的非表格区域,是否执行ocr。当layout为False时会被自动设置为False| True | | recovery | 前向中是否执行版面恢复| False | +| save_pdf | 版面恢复导出docx文件的同时,是否导出pdf文件 | False | | structure_version | 模型版本,可选 PP-structure和PP-structurev2 | PP-structure | 大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md) diff --git a/ppstructure/docs/quickstart_en.md 
b/ppstructure/docs/quickstart_en.md index 3a4e7a2d60bb26fd4f453314a0783d4180dc7ad9..dbfbf43b01c94bd6f9c729f2f6edcd1dd6aee056 100644 --- a/ppstructure/docs/quickstart_en.md +++ b/ppstructure/docs/quickstart_en.md @@ -8,12 +8,15 @@ - [2.1.3 layout analysis](#213-layout-analysis) - [2.1.4 table recognition](#214-table-recognition) - [2.1.5 Key Information Extraction](#215-Key-Information-Extraction) + - [2.1.6 layout recovery](#216-layout-recovery) - [2.2 Use by code](#22-use-by-code) - [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition) - [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition) - [2.2.3 layout analysis](#223-layout-analysis) - [2.2.4 table recognition](#224-table-recognition) - [2.2.5 Key Information Extraction](#225-Key-Information-Extraction) + - [2.2.6 layout recovery](#226-layout-recovery) - [2.3 Result description](#23-result-description) - [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition) - [2.3.2 Key Information Extraction](#232-Key-Information-Extraction) @@ -24,14 +27,16 @@ ## 1. Install package ```bash -# Install paddleocr, version 2.5+ is recommended -pip3 install "paddleocr>=2.5" +# Install paddleocr, version 2.6 is recommended +pip3 install "paddleocr>=2.6" # Install the KIE dependency packages (if you do not use the KIE, you can skip it) pip install -r kie/requirements.txt - +# Install the image orientation classification dependency package paddleclas (if you do not use image orientation classification, you can skip it) +pip3 install paddleclas ``` + ## 2. Use @@ -66,6 +71,12 @@ paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structur Please refer to: [Key Information Extraction](../kie/README.md) . 
+ +#### 2.1.6 layout recovery +```bash +paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true +``` + ### 2.2 Use by code @@ -174,6 +185,32 @@ for line in result: Please refer to: [Key Information Extraction](../kie/README.md) . + +#### 2.2.6 layout recovery + +```python +import os +import cv2 +from paddleocr import PPStructure,save_structure_res +from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx + +table_engine = PPStructure(layout=False, show_log=True) + +save_folder = './output' +img_path = 'PaddleOCR/ppstructure/docs/table/1.png' +img = cv2.imread(img_path) +result = table_engine(img) +save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) + +for line in result: + line.pop('img') + print(line) + +h, w, _ = img.shape +res = sorted_layout_boxes(result, w) +convert_info_docx(img, res, save_folder, os.path.basename(img_path).split('.')[0]) +``` + ### 2.3 Result description @@ -235,6 +272,7 @@ Please refer to: [Key Information Extraction](../kie/README.md) . | table | Whether to perform table recognition in forward | True | | ocr | Whether to perform ocr for non-table areas in layout analysis. 
When layout is False, it will be automatically set to False| True | | recovery | Whether to perform layout recovery in forward| False | +| save_pdf | Whether to also export a PDF file when performing layout recovery | False | | structure_version | Structure version, optional PP-structure and PP-structurev2 | PP-structure | Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md) diff --git a/ppstructure/docs/recovery/recovery.jpg b/ppstructure/docs/recovery/recovery.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a3817ab70eff5b380072701b70ab227ae6c8184c Binary files /dev/null and b/ppstructure/docs/recovery/recovery.jpg differ diff --git a/ppstructure/docs/table/recovery.jpg b/ppstructure/docs/table/recovery.jpg deleted file mode 100644 index bee2e2fb3499ec4b348e2b2f1475a87c9c562190..0000000000000000000000000000000000000000 Binary files a/ppstructure/docs/table/recovery.jpg and /dev/null differ diff --git a/ppstructure/layout/README.md b/ppstructure/layout/README.md index 3762544b834d752a705216ca3f93d326aa1391ad..45386da348fc8d2d76da64cffbd2d1a4482812af 100644 --- a/ppstructure/layout/README.md +++ b/ppstructure/layout/README.md @@ -63,7 +63,7 @@ python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simp git clone https://github.com/PaddlePaddle/PaddleDetection.git ``` -- **(2)安装其他依赖 ** +- **(2)安装其他依赖** ```bash cd PaddleDetection @@ -138,7 +138,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放, ``` { - + 'segmentation': # 物体的分割标注 'area': 60518.099043117836, # 物体的区域面积 'iscrowd': 0, # iscrowd @@ -166,15 +166,17 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放, 提供了训练脚本、评估脚本和预测脚本,本节将以PubLayNet预训练模型为例进行讲解。 -如果不希望训练,直接体验后面的模型评估、预测、动转静、推理的流程,可以下载提供的预训练模型,并跳过本部分。 +如果不希望训练,直接体验后面的模型评估、预测、动转静、推理的流程,可以下载提供的预训练模型(PubLayNet数据集),并跳过本部分。 ``` mkdir pretrained_model cd pretrained_model -# 下载并解压PubLayNet预训练模型 +# 下载PubLayNet预训练模型 wget 
https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams ``` +下载更多[版面分析模型](../docs/models_list.md)(中文CDLA数据集预训练模型、表格预训练模型) + ### 4.1. 启动训练 开始训练: @@ -184,7 +186,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_ 如果你希望训练自己的数据集,需要修改配置文件中的数据配置、类别数。 -以`configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 为例,修改的内容如下所示。 +以`configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 为例,修改的内容如下所示。 ```yaml metric: COCO @@ -223,16 +225,20 @@ TestDataset: # 训练日志会自动保存到 log 目录中 # 单卡训练 +export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ --eval # 多卡训练,通过--gpus参数指定卡号 +export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ --eval ``` +**注意:**如果训练时显存不足(out of memory),可将TrainReader中的batch_size调小,同时将LearningRate中的base_lr等比例减小。发布的config均由8卡训练得到,如果改变GPU卡数为1,那么base_lr需要减小8倍。 + 正常启动训练后,会看到以下log输出: ``` @@ -254,9 +260,11 @@ PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for D 更换数据集,修改【TODO】配置中的数据配置、类别数,具体可以参考4.1。启动训练: ```bash -python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ - --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ +# 单卡训练 +export CUDA_VISIBLE_DEVICES=0 +python3 tools/train.py \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ + --slim_config 
configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \ --eval ``` @@ -267,13 +275,13 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ ### 5.1. 指标评估 -训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir`、`anno_path`和`dataset_dir` 设置。 +训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir`、`anno_path`和`dataset_dir` 设置。 ```bash # GPU 评估, weights 为待测权重 python3 tools/eval.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ - -o weigths=./output/picodet_lcnet_x1_0_layout/best_model + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ + -o weights=./output/picodet_lcnet_x1_0_layout/best_model ``` 会输出以下信息,打印出mAP、AP0.5等信息。 @@ -299,8 +307,8 @@ python3 tools/eval.py \ ``` python3 tools/eval.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ - --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ + --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \ -o weights=output/picodet_lcnet_x2_5_layout/best_model ``` @@ -311,18 +319,17 @@ python3 tools/eval.py \ ### 5.2. 
测试版面分析结果 -预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。 - -使用 PaddleDetection 训练好的模型,您可以使用如下命令进行中文模型预测。 +预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。 +使用 PaddleDetection 训练好的模型,您可以使用如下命令进行模型预测。 ```bash python3 tools/infer.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ -o weights='output/picodet_lcnet_x1_0_layout/best_model.pdparams' \ --infer_img='docs/images/layout.jpg' \ --output_dir=output_dir/ \ - --draw_threshold=0.4 + --draw_threshold=0.5 ``` - `--infer_img`: 推理单张图片,也可以通过`--infer_dir`推理文件中的所有图片。 @@ -335,16 +342,15 @@ python3 tools/infer.py \ ``` python3 tools/infer.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ - --slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ + --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \ -o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \ --infer_img='docs/images/layout.jpg' \ --output_dir=output_dir/ \ - --draw_threshold=0.4 + --draw_threshold=0.5 ``` - ## 6. 
模型导出与预测 @@ -356,7 +362,7 @@ inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练, ```bash python3 tools/export_model.py \ - -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ -o weights=output/picodet_lcnet_x1_0_layout/best_model \ --output_dir=output_inference/ ``` @@ -377,8 +383,8 @@ FGD蒸馏模型转inference模型步骤如下: ```bash python3 tools/export_model.py \ - -c configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_student.yml \ - --slim_config configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_teacher.yml \ + -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \ + --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \ -o weights=./output/picodet_lcnet_x2_5_layout/best_model \ --output_dir=output_inference/ ``` @@ -404,7 +410,7 @@ python3 deploy/python/infer.py \ ------------------------------------------ ----------- Model Configuration ----------- Model Arch: PicoDet -Transform Order: +Transform Order: --transform op: Resize --transform op: NormalizeImage --transform op: Permute @@ -466,4 +472,3 @@ preprocess_time(ms): 2172.50, inference_time(ms): 11.90, postprocess_time(ms): 1 year={2022} } ``` - diff --git a/ppstructure/layout/__init__.py b/ppstructure/layout/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1d11e265597c7c8e39098a228108da3bb954b892 --- /dev/null +++ b/ppstructure/layout/__init__.py @@ -0,0 +1,13 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/ppstructure/recovery/README.md b/ppstructure/recovery/README.md index 713d0307dbbd66664db15d19df484af76efea75a..90a6a2c3c4189dc885d698e4cac2d1a24a49d1df 100644 --- a/ppstructure/recovery/README.md +++ b/ppstructure/recovery/README.md @@ -6,10 +6,12 @@ English | [简体中文](README_ch.md) - [2.1 Installation dependencies](#2.1) - [2.2 Install PaddleOCR](#2.2) - [3. Quick Start](#3) + - [3.1 Download models](#3.1) + - [3.2 Layout recovery](#3.2) -## 1. Introduction +## 1. Introduction Layout recovery means that after OCR recognition, the content is still arranged like the original document pictures, and the paragraphs are output to word document in the same order. @@ -17,8 +19,9 @@ Layout recovery combines [layout analysis](../layout/README.md)、[table recogni The following figure shows the result:
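
The layout-recovery docs in this PR rely on sorting detected layout boxes into natural reading order before writing them to a docx file. As a rough illustration of that idea only (a simplified sketch of column-aware sorting, not PaddleOCR's actual `sorted_layout_boxes` implementation, which also handles blocks that span both columns), a two-column page could be ordered like this:

```python
# Simplified illustration of reading-order sorting for layout recovery.
# NOT PaddleOCR's real sorted_layout_boxes: the actual function also deals
# with full-width blocks interleaved between columns, among other cases.

def sort_reading_order(boxes, page_width):
    """Sort layout boxes into reading order for a two-column page.

    Each box is a dict with a 'bbox' key: [x1, y1, x2, y2] in pixels.
    A box whose left edge starts past the page midpoint is treated as
    belonging to the right column; everything else goes to the left column.
    """
    mid = page_width / 2
    left_col, right_col = [], []
    for box in boxes:
        x1 = box["bbox"][0]
        (right_col if x1 > mid else left_col).append(box)
    # Within each column, read from top to bottom (smaller y1 first).
    left_col.sort(key=lambda b: b["bbox"][1])
    right_col.sort(key=lambda b: b["bbox"][1])
    # Left column is read in full before the right column.
    return left_col + right_col
```

The sorted list can then be fed, block by block, into a document writer in the same order a human would read the page.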