提交 357ab78f 编写于 作者: A an1018

update doc

上级 dd063fc9
...@@ -10,13 +10,14 @@ ...@@ -10,13 +10,14 @@
<a name="1"></a> <a name="1"></a>
## 1. 版面分析模型 ## 1. 版面分析模型
|模型名称|模型简介|下载地址|label_map| |模型名称|模型简介|推理模型大小|下载地址|
| --- | --- | --- | --- | | --- | --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | PubLayNet 数据集训练的版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| | picodet_lcnet_x1_0_fgd_layout | PubLayNet 数据集训练的版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | TableBank Word 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| | picodet_lcnet_x1_0_fgd_layout_cdla | CDLA数据集训练的版面分析模型,可以划分为**表格、图片、图片标题、表格、表格标题、页眉、脚本、引用、公式**10类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | TableBank Latex 数据集训练的版面分析模型,只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| | picodet_lcnet_x1_0_fgd_layout_table | 表格数据集训练的版面分析模型,只能检测表格 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
<a name="2"></a> <a name="2"></a>
## 2. OCR和表格识别模型 ## 2. OCR和表格识别模型
<a name="21"></a> <a name="21"></a>
......
...@@ -4,18 +4,17 @@ ...@@ -4,18 +4,17 @@
- [2. OCR and Table Recognition](#2-ocr-and-table-recognition) - [2. OCR and Table Recognition](#2-ocr-and-table-recognition)
- [2.1 OCR](#21-ocr) - [2.1 OCR](#21-ocr)
- [2.2 Table Recognition](#22-table-recognition) - [2.2 Table Recognition](#22-table-recognition)
- [3. VQA](#3-kie) - [3. KIE](#3-kie)
- [4. KIE](#4-kie)
<a name="1"></a> <a name="1"></a>
## 1. Layout Analysis ## 1. Layout Analysis
|model name| description |download|label_map| |model name| description |download|
| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | | --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset, the model can recognition 5 types of areas such as **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| | picodet_lcnet_x1_0_fgd_layout | The layout analysis model trained on the PubLayNet dataset, the model can recognition 5 types of areas such as **Text, Title, Table, Picture and List** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| | picodet_lcnet_x1_0_fgd_layout_cdla | The layout analysis model trained on the CDLA dataset, the model can recognition 10 types of areas such as **Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset, the model can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| | picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset, the model can only detect tables | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
<a name="2"></a> <a name="2"></a>
## 2. OCR and Table Recognition ## 2. OCR and Table Recognition
...@@ -40,19 +39,25 @@ If you need to use other OCR models, you can download the model in [PP-OCR model ...@@ -40,19 +39,25 @@ If you need to use other OCR models, you can download the model in [PP-OCR model
|ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model trained on PubTabNet dataset based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) | |ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model trained on PubTabNet dataset based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |
<a name="3"></a> <a name="3"></a>
## 3. VQA ## 3. KIE
|model| description |inference model size|download| On XFUND_zh dataset, Accuracy and time cost of different models on V100 GPU are as follows.
| --- |----------------------------------------------------------------| --- | --- |
|ser_LayoutXLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) | |Model|Backbone|Task|Config|Hmean|Time cost(ms)|Download link|
|re_LayoutXLM_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) | | --- | --- | --- | --- | --- | --- |--- |
|ser_LayoutLMv2_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLMv2 |778M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) | |VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49| [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
|re_LayoutLMv2_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLMv2 |765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) | |LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 |[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
|ser_LayoutLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLM |430M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) | |LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|-|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
<a name="4"></a> |VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
## 4. KIE |LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|
|model|description|model size|download|
| --- | --- | --- | --- | * Note: The above time cost information just considers inference time without preprocess or postprocess, test environment: `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`
|SDMGR|Key Information Extraction Model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
On wildreceipt dataset, the algorithm result is as follows:
|Model|Backbone|Config|Hmean|Download link|
| --- | --- | --- | --- | --- |
|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
...@@ -63,7 +63,7 @@ python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simp ...@@ -63,7 +63,7 @@ python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simp
git clone https://github.com/PaddlePaddle/PaddleDetection.git git clone https://github.com/PaddlePaddle/PaddleDetection.git
``` ```
- **(2)安装其他依赖 ** - **(2)安装其他依赖**
```bash ```bash
cd PaddleDetection cd PaddleDetection
...@@ -138,7 +138,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放, ...@@ -138,7 +138,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
``` ```
{ {
'segmentation': # 物体的分割标注 'segmentation': # 物体的分割标注
'area': 60518.099043117836, # 物体的区域面积 'area': 60518.099043117836, # 物体的区域面积
'iscrowd': 0, # iscrowd 'iscrowd': 0, # iscrowd
...@@ -166,15 +166,17 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放, ...@@ -166,15 +166,17 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
提供了训练脚本、评估脚本和预测脚本,本节将以PubLayNet预训练模型为例进行讲解。 提供了训练脚本、评估脚本和预测脚本,本节将以PubLayNet预训练模型为例进行讲解。
如果不希望训练,直接体验后面的模型评估、预测、动转静、推理的流程,可以下载提供的预训练模型,并跳过本部分。 如果不希望训练,直接体验后面的模型评估、预测、动转静、推理的流程,可以下载提供的预训练模型(PubLayNet数据集),并跳过本部分。
``` ```
mkdir pretrained_model mkdir pretrained_model
cd pretrained_model cd pretrained_model
# 下载并解压PubLayNet预训练模型 # 下载PubLayNet预训练模型
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams
``` ```
下载更多[版面分析模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-%E7%89%88%E9%9D%A2%E5%88%86%E6%9E%90%E6%A8%A1%E5%9E%8B)(中文CDLA数据集预训练模型、表格预训练模型)
### 4.1. 启动训练 ### 4.1. 启动训练
开始训练: 开始训练:
...@@ -184,7 +186,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_ ...@@ -184,7 +186,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
如果你希望训练自己的数据集,需要修改配置文件中的数据配置、类别数。 如果你希望训练自己的数据集,需要修改配置文件中的数据配置、类别数。
`configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 为例,修改的内容如下所示。 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 为例,修改的内容如下所示。
```yaml ```yaml
metric: COCO metric: COCO
...@@ -223,16 +225,20 @@ TestDataset: ...@@ -223,16 +225,20 @@ TestDataset:
# 训练日志会自动保存到 log 目录中 # 训练日志会自动保存到 log 目录中
# 单卡训练 # 单卡训练
export CUDA_VISIBLE_DEVICES=0
python3 tools/train.py \ python3 tools/train.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--eval --eval
# 多卡训练,通过--gpus参数指定卡号 # 多卡训练,通过--gpus参数指定卡号
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--eval --eval
``` ```
**注意:**如果训练时显存out memory,将TrainReader中batch_size调小,同时LearningRate中base_lr等比例减小。发布的config均由8卡训练得到,如果改变GPU卡数为1,那么base_lr需要减小8倍。
正常启动训练后,会看到以下log输出: 正常启动训练后,会看到以下log输出:
``` ```
...@@ -254,9 +260,11 @@ PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for D ...@@ -254,9 +260,11 @@ PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for D
更换数据集,修改【TODO】配置中的数据配置、类别数,具体可以参考4.1。启动训练: 更换数据集,修改【TODO】配置中的数据配置、类别数,具体可以参考4.1。启动训练:
```bash ```bash
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ # 单卡训练
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ export CUDA_VISIBLE_DEVICES=0
--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ python3 tools/train.py \
-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
--eval --eval
``` ```
...@@ -267,13 +275,13 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \ ...@@ -267,13 +275,13 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
### 5.1. 指标评估 ### 5.1. 指标评估
训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir``anno_path``dataset_dir` 设置。 训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir``anno_path``dataset_dir` 设置。
```bash ```bash
# GPU 评估, weights 为待测权重 # GPU 评估, weights 为待测权重
python3 tools/eval.py \ python3 tools/eval.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-o weigths=./output/picodet_lcnet_x1_0_layout/best_model -o weights=./output/picodet_lcnet_x1_0_layout/best_model
``` ```
会输出以下信息,打印出mAP、AP0.5等信息。 会输出以下信息,打印出mAP、AP0.5等信息。
...@@ -299,8 +307,8 @@ python3 tools/eval.py \ ...@@ -299,8 +307,8 @@ python3 tools/eval.py \
``` ```
python3 tools/eval.py \ python3 tools/eval.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-o weights=output/picodet_lcnet_x2_5_layout/best_model -o weights=output/picodet_lcnet_x2_5_layout/best_model
``` ```
...@@ -311,18 +319,17 @@ python3 tools/eval.py \ ...@@ -311,18 +319,17 @@ python3 tools/eval.py \
### 5.2. 测试版面分析结果 ### 5.2. 测试版面分析结果
预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。 预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。
使用 PaddleDetection 训练好的模型,您可以使用如下命令进行中文模型预测。
使用 PaddleDetection 训练好的模型,您可以使用如下命令进行模型预测。
```bash ```bash
python3 tools/infer.py \ python3 tools/infer.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-o weights='output/picodet_lcnet_x1_0_layout/best_model.pdparams' \ -o weights='output/picodet_lcnet_x1_0_layout/best_model.pdparams' \
--infer_img='docs/images/layout.jpg' \ --infer_img='docs/images/layout.jpg' \
--output_dir=output_dir/ \ --output_dir=output_dir/ \
--draw_threshold=0.4 --draw_threshold=0.5
``` ```
- `--infer_img`: 推理单张图片,也可以通过`--infer_dir`推理文件中的所有图片。 - `--infer_img`: 推理单张图片,也可以通过`--infer_dir`推理文件中的所有图片。
...@@ -335,16 +342,15 @@ python3 tools/infer.py \ ...@@ -335,16 +342,15 @@ python3 tools/infer.py \
``` ```
python3 tools/infer.py \ python3 tools/infer.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \ --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \ -o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \
--infer_img='docs/images/layout.jpg' \ --infer_img='docs/images/layout.jpg' \
--output_dir=output_dir/ \ --output_dir=output_dir/ \
--draw_threshold=0.4 --draw_threshold=0.5
``` ```
## 6. 模型导出与预测 ## 6. 模型导出与预测
...@@ -356,7 +362,7 @@ inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练, ...@@ -356,7 +362,7 @@ inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练,
```bash ```bash
python3 tools/export_model.py \ python3 tools/export_model.py \
-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-o weights=output/picodet_lcnet_x1_0_layout/best_model \ -o weights=output/picodet_lcnet_x1_0_layout/best_model \
--output_dir=output_inference/ --output_dir=output_inference/
``` ```
...@@ -377,8 +383,8 @@ FGD蒸馏模型转inference模型步骤如下: ...@@ -377,8 +383,8 @@ FGD蒸馏模型转inference模型步骤如下:
```bash ```bash
python3 tools/export_model.py \ python3 tools/export_model.py \
-c configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_student.yml \ -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
--slim_config configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_teacher.yml \ --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-o weights=./output/picodet_lcnet_x2_5_layout/best_model \ -o weights=./output/picodet_lcnet_x2_5_layout/best_model \
--output_dir=output_inference/ --output_dir=output_inference/
``` ```
...@@ -404,7 +410,7 @@ python3 deploy/python/infer.py \ ...@@ -404,7 +410,7 @@ python3 deploy/python/infer.py \
------------------------------------------ ------------------------------------------
----------- Model Configuration ----------- ----------- Model Configuration -----------
Model Arch: PicoDet Model Arch: PicoDet
Transform Order: Transform Order:
--transform op: Resize --transform op: Resize
--transform op: NormalizeImage --transform op: NormalizeImage
--transform op: Permute --transform op: Permute
...@@ -466,4 +472,3 @@ preprocess_time(ms): 2172.50, inference_time(ms): 11.90, postprocess_time(ms): 1 ...@@ -466,4 +472,3 @@ preprocess_time(ms): 2172.50, inference_time(ms): 11.90, postprocess_time(ms): 1
year={2022} year={2022}
} }
``` ```
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
...@@ -9,7 +9,7 @@ English | [简体中文](README_ch.md) ...@@ -9,7 +9,7 @@ English | [简体中文](README_ch.md)
<a name="1"></a> <a name="1"></a>
## 1. Introduction ## 1. Introduction
Layout recovery means that after OCR recognition, the content is still arranged like the original document pictures, and the paragraphs are output to word document in the same order. Layout recovery means that after OCR recognition, the content is still arranged like the original document pictures, and the paragraphs are output to word document in the same order.
...@@ -33,14 +33,14 @@ The following figure shows the result: ...@@ -33,14 +33,14 @@ The following figure shows the result:
python3 -m pip install --upgrade pip python3 -m pip install --upgrade pip
# GPU installation # GPU installation
python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
# CPU installation # CPU installation
python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
```` ````
For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick). For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
<a name="2.2"></a> <a name="2.2"></a>
...@@ -67,38 +67,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt ...@@ -67,38 +67,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
## 3. Quick Start ## 3. Quick Start
<a name="3.1"></a>
### 3.1 下载模型
If input is English document, download English models:
```python ```python
cd PaddleOCR/ppstructure cd PaddleOCR/ppstructure
# download model # download model
mkdir inference && cd inference mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it # Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it # Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table inch model and unzip it # Download the ultra-lightweight English table inch model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of publaynet dataset and unzip it # Download the layout model of publaynet dataset and unzip it
wget wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar
cd .. cd ..
# run ```
If input is Chinese document,download Chinese models:
[Chinese and English ultra-lightweight PP-OCRv3 model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README.md#pp-ocr-series-model-listupdate-on-september-8th)、[表格识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型)、[版面分析模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
<a name="3.2"></a>
### 3.2 版面恢复
```bash
python3 predict_system.py \ python3 predict_system.py \
--image_dir=./docs/table/1.png \ --image_dir=./docs/table/1.png \
--det_model_dir=inference/en_PP-OCRv3_det_infer \ --det_model_dir=inference/en_PP-OCRv3_det_infer \
--rec_model_dir=inference/en_PP-OCRv3_rec_infe \ --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
--rec_char_dict_path=../ppocr/utils/en_dict.txt \ --rec_char_dict_path=../ppocr/utils/en_dict.txt \
--output=../output/ \ --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
--table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--table_max_len=488 \ --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
--layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
--layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \ --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
--vis_font_path=../doc/fonts/simfang.ttf \ --vis_font_path=../doc/fonts/simfang.ttf \
--recovery=True \ --recovery=True \
--save_pdf=False --save_pdf=False \
--output=../output/
``` ```
After running, the docx of each picture will be saved in the directory specified by the output field After running, the docx of each picture will be saved in the directory specified by the output field
Recovery table to Word code[table_process.py] reference:https://github.com/pqzx/html2docx.git Field:
\ No newline at end of file
- image_dir:test file测试文件, can be picture, picture directory, pdf file, pdf file directory
- det_model_dir:OCR detection model path
- rec_model_dir:OCR recognition model path
- rec_char_dict_path:OCR recognition dict path. If the Chinese model is used, change to "../ppocr/utils/ppocr_keys_v1.txt". And if you trained the model on your own dataset, change to the trained dictionary
- table_model_dir:tabel recognition model path
- table_char_dict_path:tabel recognition dict path. If the Chinese model is used, no need to change
- layout_model_dir:layout analysis model path
- layout_dict_path:layout analysis dict path. If the Chinese model is used, change to "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
- recovery:whether to enable layout of recovery, default False
- save_pdf:when recovery file, whether to save pdf file, default False
- output:save the recovery result path
...@@ -8,19 +8,22 @@ ...@@ -8,19 +8,22 @@
- [2.2 安装PaddleOCR](#2.2) - [2.2 安装PaddleOCR](#2.2)
- [3. 使用](#3) - [3. 使用](#3)
- [3.1 下载模型](#3.1)
- [3.2 版面恢复](#3.2)
<a name="1"></a> <a name="1"></a>
## 1. 简介 ## 1. 简介
版面恢复就是在OCR识别后,内容仍然像原文档图片那样排列着,段落不变、顺序不变的输出到word文档中等。 版面恢复就是在OCR识别后,内容仍然像原文档图片那样排列着,段落不变、顺序不变的输出到word文档中等。
版面恢复结合了[版面分析](../layout/README_ch.md)[表格识别](../table/README_ch.md)技术,从而更好地恢复图片、表格、标题等内容,下图展示了版面恢复的结果: 版面恢复结合了[版面分析](../layout/README_ch.md)[表格识别](../table/README_ch.md)技术,从而更好地恢复图片、表格、标题等内容,支持pdf文档、文档图片格式的输入文件,下图展示了版面恢复的结果:
<div align="center"> <div align="center">
<img src="../docs/table/recovery.jpg" width = "700" /> <img src="../docs/recovery/recovery.jpg" width = "700" />
</div> </div>
<a name="2"></a> <a name="2"></a>
## 2. 安装 ## 2. 安装
...@@ -35,10 +38,10 @@ ...@@ -35,10 +38,10 @@
python3 -m pip install --upgrade pip python3 -m pip install --upgrade pip
# GPU安装 # GPU安装
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
# CPU安装 # CPU安装
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
``` ```
...@@ -69,40 +72,66 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt ...@@ -69,40 +72,66 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
## 3. 使用 ## 3. 使用
恢复给定文档的版面: <a name="3.1"></a>
```python ### 3.1 下载模型
如果输入为英文文档类型,下载英文模型
```
cd PaddleOCR/ppstructure cd PaddleOCR/ppstructure
# 下载模型 # 下载模型
mkdir inference && cd inference mkdir inference && cd inference
# 下载超英文轻量级PP-OCRv3模型的检测模型并解压 # 下载英文超轻量PP-OCRv3检测模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# 下载英文轻量级PP-OCRv3模型的识别模型并解压 # 下载英文超轻量PP-OCRv3识别模型并解压
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# 下载超轻量级英文表格英寸模型并解压 # 下载英文表格识别模型并解压
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# 下载英文版面分析模型 # 下载英文版面分析模型
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
cd .. cd ..
```
如果输入为中文文档类型,在下述链接中下载中文模型即可:
# 执行预测 [PP-OCRv3中英文超轻量文本检测和识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README_ch.md#pp-ocr%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E5%88%97%E8%A1%A8%E6%9B%B4%E6%96%B0%E4%B8%AD)[表格识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型)[版面分析模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
<a name="3.2"></a>
### 3.2 版面恢复
使用下载的模型恢复给定文档的版面,以英文模型为例,执行如下命令:
```python
python3 predict_system.py \ python3 predict_system.py \
--image_dir=./docs/table/1.png \ --image_dir=./docs/table/1.png \
--det_model_dir=inference/en_PP-OCRv3_det_infer \ --det_model_dir=inference/en_PP-OCRv3_det_infer \
--rec_model_dir=inference/en_PP-OCRv3_rec_infe \ --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
--rec_char_dict_path=../ppocr/utils/en_dict.txt \ --rec_char_dict_path=../ppocr/utils/en_dict.txt \
--output=../output/ \ --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
--table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--table_max_len=488 \ --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
--layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
--layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \ --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
--vis_font_path=../doc/fonts/simfang.ttf \ --vis_font_path=../doc/fonts/simfang.ttf \
--recovery=True \ --recovery=True \
--save_pdf=False --save_pdf=False \
--output=../output/
``` ```
运行完成后,每张图片的docx文档会保存到`output`字段指定的目录下 运行完成后,恢复版面的docx文档会保存到`output`字段指定的目录下
表格恢复到Word代码[table_process.py]来自:https://github.com/pqzx/html2docx.git 字段含义:
- image_dir:测试文件,可以是图片、图片目录、pdf文件、pdf文件目录
- det_model_dir:OCR检测模型路径
- rec_model_dir:OCR识别模型路径
- rec_char_dict_path:OCR识别字典,如果更换为中文模型,需要更改为"../ppocr/utils/ppocr_keys_v1.txt",如果您在自己的数据集上训练的模型,则更改为训练的字典的文件
- table_model_dir:表格识别模型路径
- table_char_dict_path:表格识别字典,如果更换为中文模型,不需要更换字典
- layout_model_dir:版面分析模型路径
- layout_dict_path:版面分析字典,如果更换为中文模型,需要更改为"../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
- recovery:是否进行版面恢复,默认False
- save_pdf:进行版面恢复导出docx文档的同时,是否保存为pdf文件,默认为False
- output:版面恢复结果保存路径
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
...@@ -24,7 +24,7 @@ from docx.enum.section import WD_SECTION ...@@ -24,7 +24,7 @@ from docx.enum.section import WD_SECTION
from docx.oxml.ns import qn from docx.oxml.ns import qn
from docx.enum.table import WD_TABLE_ALIGNMENT from docx.enum.table import WD_TABLE_ALIGNMENT
from table_process import HtmlToDocx from ppstructure.recovery.table_process import HtmlToDocx
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
logger = get_logger() logger = get_logger()
...@@ -69,7 +69,7 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf): ...@@ -69,7 +69,7 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
new_table = deepcopy(table) new_table = deepcopy(table)
new_table.alignment = WD_TABLE_ALIGNMENT.CENTER new_table.alignment = WD_TABLE_ALIGNMENT.CENTER
paragraph.add_run().element.addnext(new_table._tbl) paragraph.add_run().element.addnext(new_table._tbl)
else: else:
paragraph = doc.add_paragraph() paragraph = doc.add_paragraph()
paragraph_format = paragraph.paragraph_format paragraph_format = paragraph.paragraph_format
...@@ -86,10 +86,10 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf): ...@@ -86,10 +86,10 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
# save to pdf # save to pdf
if save_pdf: if save_pdf:
pdf = os.path.join(save_folder, '{}.pdf'.format(img_name)) pdf_path = os.path.join(save_folder, '{}.pdf'.format(img_name))
from docx2pdf import convert from docx2pdf import convert
convert(docx_path, pdf_path) convert(docx_path, pdf_path)
logger.info('pdf save to {}'.format(pdf)) logger.info('pdf save to {}'.format(pdf_path))
def sorted_layout_boxes(res, w): def sorted_layout_boxes(res, w):
...@@ -112,7 +112,7 @@ def sorted_layout_boxes(res, w): ...@@ -112,7 +112,7 @@ def sorted_layout_boxes(res, w):
res_left = [] res_left = []
res_right = [] res_right = []
i = 0 i = 0
while True: while True:
if i >= num_boxes: if i >= num_boxes:
break break
...@@ -137,7 +137,7 @@ def sorted_layout_boxes(res, w): ...@@ -137,7 +137,7 @@ def sorted_layout_boxes(res, w):
res_left = [] res_left = []
res_right = [] res_right = []
break break
elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3*w / 4: elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3 * w / 4:
_boxes[i]['layout'] = 'double' _boxes[i]['layout'] = 'double'
res_left.append(_boxes[i]) res_left.append(_boxes[i])
i += 1 i += 1
...@@ -157,4 +157,4 @@ def sorted_layout_boxes(res, w): ...@@ -157,4 +157,4 @@ def sorted_layout_boxes(res, w):
new_res += res_left new_res += res_left
if res_right: if res_right:
new_res += res_right new_res += res_right
return new_res return new_res
\ No newline at end of file
...@@ -84,13 +84,18 @@ def init_args(): ...@@ -84,13 +84,18 @@ def init_args():
type=str2bool, type=str2bool,
default=True, default=True,
help='In the forward, whether the non-table area is recognition by ocr') help='In the forward, whether the non-table area is recognition by ocr')
# param for recovery
parser.add_argument( parser.add_argument(
"--recovery", "--recovery",
type=bool, type=str2bool,
default=False, default=False,
help='Whether to enable layout of recovery') help='Whether to enable layout of recovery')
parser.add_argument( parser.add_argument(
"--save_pdf", type=bool, default=False, help='Whether to save pdf file') "--save_pdf",
type=str2bool,
default=False,
help='Whether to save pdf file')
return parser return parser
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册