update doc

357ab78f · an1018 · dd063fc9
11 changed files
--- a/ppstructure/docs/models_list.md
+++ b/ppstructure/docs/models_list.md
@@ -10,13 +10,14 @@
 <a name="1"></a>
 ## 1. 版面分析模型
-|模型名称|模型简介|下载地址|label_map|
+|模型名称|模型简介|推理模型大小|下载地址|
 | --- | --- | --- | --- |
-| ppyolov2_r50vd_dcn_365e_publaynet | PubLayNet 数据集训练的版面分析模型，可以划分**文字、标题、表格、图片以及列表**5类区域 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
+| picodet_lcnet_x1_0_fgd_layout | PubLayNet 数据集训练的版面分析模型，可以划分**文字、标题、表格、图片以及列表**5类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
-| ppyolov2_r50vd_dcn_365e_tableBank_word | TableBank Word 数据集训练的版面分析模型，只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
+| picodet_lcnet_x1_0_fgd_layout_cdla | CDLA数据集训练的版面分析模型，可以划分为**文字、标题、图片、图片标题、表格、表格标题、页眉、页脚、引用、公式**10类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | TableBank Latex 数据集训练的版面分析模型，只能检测表格 | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
+| picodet_lcnet_x1_0_fgd_layout_table | 表格数据集训练的版面分析模型，只能检测表格 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
 <a name="2"></a>
 ## 2. OCR和表格识别模型
 <a name="21"></a>

--- a/ppstructure/docs/models_list_en.md
+++ b/ppstructure/docs/models_list_en.md
@@ -4,18 +4,17 @@
 - [2. OCR and Table Recognition](#2-ocr-and-table-recognition)
  - [2.1 OCR](#21-ocr)
  - [2.2 Table Recognition](#22-table-recognition)
- [3. VQA](#3-kie)
+- [3. KIE](#3-kie)
- [4. KIE](#4-kie)
 <a name="1"></a>
 ## 1. Layout Analysis
-|model name| description                                                                                                                                             |download|label_map|
+|model name| description                                                                                                                                             |download|
-| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- |
+| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- |
-| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset, the model can recognition 5 types of areas such as **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}|
+| picodet_lcnet_x1_0_fgd_layout | Layout analysis model trained on the PubLayNet dataset; it recognizes 5 types of regions: **Text, Title, Table, Picture and List** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) |
-| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset, the model can only detect tables                                                       | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}|
+| picodet_lcnet_x1_0_fgd_layout_cdla | Layout analysis model trained on the CDLA dataset; it recognizes 10 types of regions: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation** | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) |
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset, the model can only detect tables                                                      | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}|
+| picodet_lcnet_x1_0_fgd_layout_table | Layout analysis model trained on a table dataset; it can only detect tables | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) |
 <a name="2"></a>
 ## 2. OCR and Table Recognition
@@ -40,19 +39,25 @@ If you need to use other OCR models, you can download the model in [PP-OCR model
 |ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model trained on PubTabNet dataset based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |
 <a name="3"></a>
-## 3. VQA
+## 3. KIE
-|model| description                                                    |inference model size|download|
+On the XFUND_zh dataset, the accuracy and time cost of different models on a V100 GPU are as follows.
-| --- |----------------------------------------------------------------| --- | --- |
-|ser_LayoutXLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLM   |1.4G|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
+|Model|Backbone|Task|Config|Hmean|Time cost(ms)|Download link|
-|re_LayoutXLM_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLM    |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
+| --- | --- |  --- | --- | --- | --- |--- |
-|ser_LayoutLMv2_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLMv2 |778M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
+|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49| [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
-|re_LayoutLMv2_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLMv2  |765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 |[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
-|ser_LayoutLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLM    |430M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+|LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|-|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
-<a name="4"></a>
+|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
-## 4. KIE
+|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|
-|model|description|model size|download|
-| --- | --- | --- | --- |
+* Note: The time cost above covers inference only, excluding preprocessing and postprocessing. Test environment: `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`
-|SDMGR|Key Information Extraction Model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+On the wildreceipt dataset, the results are as follows:
+|Model|Backbone|Config|Hmean|Download link|
+| --- | --- | --- | --- | --- |
+|SDMGR|VGG16|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
--- a/ppstructure/docs/recovery/recovery.jpg
+++ b/ppstructure/docs/recovery/recovery.jpg
--- a/ppstructure/docs/table/recovery.jpg
+++ b/ppstructure/docs/table/recovery.jpg
--- a/ppstructure/layout/README.md
+++ b/ppstructure/layout/README.md
@@ -63,7 +63,7 @@ python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simp
 git clone https://github.com/PaddlePaddle/PaddleDetection.git
 ```
- **（2）安装其他依赖 **
+- **（2）安装其他依赖**
 ```bash
 cd PaddleDetection
@@ -138,7 +138,7 @@ json文件包含所有图像的标注，数据以字典嵌套的方式存放，
  ```
  {
      'segmentation':             # 物体的分割标注
      'area': 60518.099043117836, # 物体的区域面积
      'iscrowd': 0,               # iscrowd
@@ -166,15 +166,17 @@ json文件包含所有图像的标注，数据以字典嵌套的方式存放，
 提供了训练脚本、评估脚本和预测脚本，本节将以PubLayNet预训练模型为例进行讲解。
-如果不希望训练，直接体验后面的模型评估、预测、动转静、推理的流程，可以下载提供的预训练模型，并跳过本部分。
+如果不希望训练，直接体验后面的模型评估、预测、动转静、推理的流程，可以下载提供的预训练模型(PubLayNet数据集)，并跳过本部分。
 ```
 mkdir pretrained_model
 cd pretrained_model
-# 下载并解压PubLayNet预训练模型
+# 下载PubLayNet预训练模型
 wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams
 ```
+下载更多[版面分析模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-%E7%89%88%E9%9D%A2%E5%88%86%E6%9E%90%E6%A8%A1%E5%9E%8B)（中文CDLA数据集预训练模型、表格预训练模型）
 ### 4.1. 启动训练
 开始训练:
@@ -184,7 +186,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
 如果你希望训练自己的数据集，需要修改配置文件中的数据配置、类别数。
-以`configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 为例，修改的内容如下所示。
+以`configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 为例，修改的内容如下所示。
 ```yaml
 metric: COCO
@@ -223,16 +225,20 @@ TestDataset:
 # 训练日志会自动保存到 log 目录中
 # 单卡训练
+export CUDA_VISIBLE_DEVICES=0
 python3 tools/train.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
 	--eval
 # 多卡训练，通过--gpus参数指定卡号
+export CUDA_VISIBLE_DEVICES=0,1,2,3
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
 	--eval
 ```
+**注意：**如果训练时显存不足（out of memory），将TrainReader中的batch_size调小，同时将LearningRate中的base_lr等比例减小。发布的config均由8卡训练得到，如果将GPU卡数改为1，那么base_lr需要减小为原来的1/8。
 正常启动训练后，会看到以下log输出：
 ```
@@ -254,9 +260,11 @@ PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for D
 更换数据集，修改【TODO】配置中的数据配置、类别数，具体可以参考4.1。启动训练：
 ```bash
-python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
+# 单卡训练
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+export CUDA_VISIBLE_DEVICES=0
-	--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+python3 tools/train.py \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
 	--eval
 ```
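The note added in section 4.1 of the layout README says the released configs were trained on 8 GPUs and that `base_lr` should shrink in proportion when the GPU count or `batch_size` is reduced. A minimal sketch of that linear scaling rule follows. The function name and the example numbers (a reference `base_lr` of 0.4 and 24 images per card) are illustrative assumptions, not values read from the config.

```python
def scaled_base_lr(ref_lr, ref_gpus, ref_batch, gpus, batch):
    """Scale base_lr linearly with the total batch size (gpus * per-card batch)."""
    if min(ref_gpus, ref_batch, gpus, batch) <= 0:
        raise ValueError("GPU counts and batch sizes must be positive")
    return ref_lr * (gpus * batch) / (ref_gpus * ref_batch)


# Hypothetical reference setup: 8 cards, 24 images per card, base_lr = 0.4.
# Going from 8 cards to 1 card divides the learning rate by 8.
print(scaled_base_lr(0.4, 8, 24, gpus=1, batch=24))
# Halving the per-card batch size as well divides it by 16.
print(scaled_base_lr(0.4, 8, 24, gpus=1, batch=12))
```

The computed value would then be written into `LearningRate.base_lr` in the config by hand, alongside the smaller `batch_size` in `TrainReader`.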
@@ -267,13 +275,13 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
 ### 5.1. 指标评估
-训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时，需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml`  修改`EvalDataset`中的 `image_dir`、`anno_path`和`dataset_dir` 设置。
+训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时，需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml`  修改`EvalDataset`中的 `image_dir`、`anno_path`和`dataset_dir` 设置。
 ```bash
 # GPU 评估， weights 为待测权重
 python3 tools/eval.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	-o weigths=./output/picodet_lcnet_x1_0_layout/best_model
+	-o weights=./output/picodet_lcnet_x1_0_layout/best_model
 ```
 会输出以下信息，打印出mAP、AP0.5等信息。
@@ -299,8 +307,8 @@ python3 tools/eval.py \
 ```
 python3 tools/eval.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
 	-o weights=output/picodet_lcnet_x2_5_layout/best_model
 ```
@@ -311,18 +319,17 @@ python3 tools/eval.py \
 ### 5.2. 测试版面分析结果
-预测使用的配置文件必须与训练一致，如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。
+预测使用的配置文件必须与训练一致，如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。
-使用 PaddleDetection 训练好的模型，您可以使用如下命令进行中文模型预测。
+使用 PaddleDetection 训练好的模型，您可以使用如下命令进行模型预测。
 ```bash
 python3 tools/infer.py \
-    -c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
    -o weights='output/picodet_lcnet_x1_0_layout/best_model.pdparams' \
    --infer_img='docs/images/layout.jpg' \
    --output_dir=output_dir/ \
-    --draw_threshold=0.4
+    --draw_threshold=0.5
 ```
 - `--infer_img`: 推理单张图片，也可以通过`--infer_dir`推理文件中的所有图片。
@@ -335,16 +342,15 @@ python3 tools/infer.py \
 ```
 python3 tools/infer.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x2_5_layout.yml \
+	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
 	-o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \
 	--infer_img='docs/images/layout.jpg' \
 	--output_dir=output_dir/ \
-	--draw_threshold=0.4
+	--draw_threshold=0.5
 ```
 ## 6. 模型导出与预测
@@ -356,7 +362,7 @@ inference 模型（`paddle.jit.save`保存的模型） 一般是模型训练，
 ```bash
 python3 tools/export_model.py \
-	-c configs/picodet/legacy_model/application/layout_detection/picodet_lcnet_x1_0_layout.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
 	-o weights=output/picodet_lcnet_x1_0_layout/best_model \
 	--output_dir=output_inference/
 ```
@@ -377,8 +383,8 @@ FGD蒸馏模型转inference模型步骤如下：
 ```bash
 python3 tools/export_model.py \
-	-c configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_student.yml \
+	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/publayernet_lcnet_x1_5/picodet_teacher.yml \
+	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
 	-o weights=./output/picodet_lcnet_x2_5_layout/best_model \
 	--output_dir=output_inference/
 ```
@@ -404,7 +410,7 @@ python3 deploy/python/infer.py \
 ------------------------------------------
 -----------  Model Configuration -----------
 Model Arch: PicoDet
-Transform Order: 
+Transform Order:
 --transform op: Resize
 --transform op: NormalizeImage
 --transform op: Permute
@@ -466,4 +472,3 @@ preprocess_time(ms): 2172.50, inference_time(ms): 11.90, postprocess_time(ms): 1
  year={2022}
 }
 ```
--- a/ppstructure/layout/__init__.py
+++ b/ppstructure/layout/__init__.py
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/ppstructure/recovery/README.md
+++ b/ppstructure/recovery/README.md
@@ -9,7 +9,7 @@ English | [简体中文](README_ch.md)
 <a name="1"></a>
-## 1.  Introduction
+## 1. Introduction
 Layout recovery means that after OCR recognition, the content is still arranged like the original document pictures, and the paragraphs are output to word document in the same order.
@@ -33,14 +33,14 @@ The following figure shows the result：
 python3 -m pip install --upgrade pip
 # GPU installation
-python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
 # CPU installation
-python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
 ````
-For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick).
+For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
 <a name="2.2"></a>
@@ -67,38 +67,61 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
 ## 3. Quick Start
+<a name="3.1"></a>
+### 3.1 Download models
+If the input is an English document, download the English models:
 ```python
 cd PaddleOCR/ppstructure
 # download model
 mkdir inference && cd inference
 # Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
 # Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf  ch_PP-OCRv3_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
 # Download the ultra-lightweight English table inch model and unzip it
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
 # Download the layout model of publaynet dataset and unzip it
-wget 
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
-https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar
 cd ..
-# run
+```
+If the input is a Chinese document, download the Chinese models:
+[Chinese and English ultra-lightweight PP-OCRv3 model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README.md#pp-ocr-series-model-listupdate-on-september-8th), [table recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型), [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
+<a name="3.2"></a>
+### 3.2 Layout recovery
+```bash
 python3 predict_system.py \
    --image_dir=./docs/table/1.png \
    --det_model_dir=inference/en_PP-OCRv3_det_infer \
-    --rec_model_dir=inference/en_PP-OCRv3_rec_infe \
+    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/en_dict.txt \
-    --output=../output/ \
+    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
-    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
-    --table_max_len=488 \
+    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
-    --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
    --vis_font_path=../doc/fonts/simfang.ttf \
    --recovery=True \
-		--save_pdf=False
+    --save_pdf=False \
+    --output=../output/
 ```
-After running, the docx  of each picture will be saved in the directory specified by the output field
+After running, the docx of each picture will be saved in the directory specified by the output field
-Recovery table to Word code[table_process.py] reference：https://github.com/pqzx/html2docx.git
+Fields:
\ No newline at end of file
+- image_dir: test file; can be a picture, a picture directory, a pdf file, or a pdf file directory
+- det_model_dir: OCR detection model path
+- rec_model_dir: OCR recognition model path
+- rec_char_dict_path: OCR recognition dict path. If the Chinese model is used, change it to "../ppocr/utils/ppocr_keys_v1.txt". If you trained the model on your own dataset, change it to the dictionary used for training
+- table_model_dir: table recognition model path
+- table_char_dict_path: table recognition dict path. If the Chinese model is used, no change is needed
+- layout_model_dir: layout analysis model path
+- layout_dict_path: layout analysis dict path. If the Chinese model is used, change it to "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
+- recovery: whether to enable layout recovery, default False
+- save_pdf: when recovering the layout, whether to also save a pdf file, default False
+- output: path where the recovery result is saved
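The field list above says which dictionary paths change when switching from the English models to the Chinese ones. As a rough sketch, the switch can be captured in a small lookup that emits the dictionary flags for `predict_system.py`. The `"en"`/`"ch"` keys and the helper name `dict_args` are hypothetical; the paths are the ones quoted in the README, and model directories are left out because the Chinese model folder names are not given here.

```python
# Dictionary paths quoted in the README field list.
# The "en" / "ch" keys are hypothetical labels chosen for this sketch.
DICT_PATHS = {
    "en": {
        "rec_char_dict_path": "../ppocr/utils/en_dict.txt",
        "layout_dict_path": "../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt",
    },
    "ch": {
        "rec_char_dict_path": "../ppocr/utils/ppocr_keys_v1.txt",
        "layout_dict_path": "../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt",
    },
}

# The table structure dictionary is the same for both languages.
TABLE_DICT = "../ppocr/utils/dict/table_structure_dict.txt"


def dict_args(lang):
    """Return the dictionary-related command-line flags for the given language."""
    paths = DICT_PATHS[lang]
    return [
        "--rec_char_dict_path=" + paths["rec_char_dict_path"],
        "--table_char_dict_path=" + TABLE_DICT,
        "--layout_dict_path=" + paths["layout_dict_path"],
    ]


# Print the flags in the same backslash-continued style as the README command.
print(" \\\n    ".join(dict_args("ch")))
```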
--- a/ppstructure/recovery/README_ch.md
+++ b/ppstructure/recovery/README_ch.md
@@ -8,19 +8,22 @@
  - [2.2 安装PaddleOCR](#2.2)
 - [3. 使用](#3)
+  - [3.1 下载模型](#3.1)
+  - [3.2 版面恢复](#3.2)
 <a name="1"></a>
-## 1.  简介
+## 1. 简介
 版面恢复就是在OCR识别后，内容仍然像原文档图片那样排列着，段落不变、顺序不变的输出到word文档中等。
-版面恢复结合了[版面分析](../layout/README_ch.md)、[表格识别](../table/README_ch.md)技术，从而更好地恢复图片、表格、标题等内容，下图展示了版面恢复的结果：
+版面恢复结合了[版面分析](../layout/README_ch.md)、[表格识别](../table/README_ch.md)技术，从而更好地恢复图片、表格、标题等内容，支持pdf文档、文档图片格式的输入文件，下图展示了版面恢复的结果：
 <div align="center">
-<img src="../docs/table/recovery.jpg"  width = "700" />
+<img src="../docs/recovery/recovery.jpg"  width = "700" />
 </div>
 <a name="2"></a>
 ## 2. 安装
@@ -35,10 +38,10 @@
 python3 -m pip install --upgrade pip
 # GPU安装
-python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simple
 # CPU安装
-python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
 ```
@@ -69,40 +72,66 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
 ## 3. 使用
-恢复给定文档的版面：
+<a name="3.1"></a>
-```python
+### 3.1 下载模型
+如果输入为英文文档类型，下载英文模型
+```
 cd PaddleOCR/ppstructure
 # 下载模型
 mkdir inference && cd inference
-# 下载超英文轻量级PP-OCRv3模型的检测模型并解压
+# 下载英文超轻量PP-OCRv3检测模型并解压
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
-# 下载英文轻量级PP-OCRv3模型的识别模型并解压
+# 下载英文超轻量PP-OCRv3识别模型并解压
-wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf  ch_PP-OCRv3_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
-# 下载超轻量级英文表格英寸模型并解压
+# 下载英文表格识别模型并解压
-wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
 # 下载英文版面分析模型
-wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout_infer.tar && tar picodet_lcnet_x1_0_layout_infer.tar
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
 cd ..
+```
+如果输入为中文文档类型，在下述链接中下载中文模型即可：
-# 执行预测
+[PP-OCRv3中英文超轻量文本检测和识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/README_ch.md#pp-ocr%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E5%88%97%E8%A1%A8%E6%9B%B4%E6%96%B0%E4%B8%AD)、[表格识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#22-表格识别模型)、[版面分析模型](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md#1-版面分析模型)
+<a name="3.2"></a>
+### 3.2 版面恢复
+使用下载的模型恢复给定文档的版面，以英文模型为例，执行如下命令：
+```bash
 python3 predict_system.py \
    --image_dir=./docs/table/1.png \
    --det_model_dir=inference/en_PP-OCRv3_det_infer \
-    --rec_model_dir=inference/en_PP-OCRv3_rec_infe \
+    --rec_model_dir=inference/en_PP-OCRv3_rec_infer \
    --rec_char_dict_path=../ppocr/utils/en_dict.txt \
-    --output=../output/ \
+    --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
-    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
-    --table_max_len=488 \
+    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
-    --layout_model_dir=inference/picodet_lcnet_x1_0_layout_infer \
    --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
    --vis_font_path=../doc/fonts/simfang.ttf \
    --recovery=True \
-		--save_pdf=False
+    --save_pdf=False \
+    --output=../output/
 ```
-运行完成后，每张图片的docx文档会保存到`output`字段指定的目录下
+运行完成后，恢复版面的docx文档会保存到`output`字段指定的目录下
-表格恢复到Word代码[table_process.py]来自：https://github.com/pqzx/html2docx.git
+字段含义：
+- image_dir：测试文件，可以是图片、图片目录、pdf文件、pdf文件目录
+- det_model_dir：OCR检测模型路径
+- rec_model_dir：OCR识别模型路径
+- rec_char_dict_path：OCR识别字典，如果更换为中文模型，需要更改为"../ppocr/utils/ppocr_keys_v1.txt"，如果您在自己的数据集上训练的模型，则更改为训练的字典的文件
+- table_model_dir：表格识别模型路径
+- table_char_dict_path：表格识别字典，如果更换为中文模型，不需要更换字典
+- layout_model_dir：版面分析模型路径
+- layout_dict_path：版面分析字典，如果更换为中文模型，需要更改为"../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt"
+- recovery：是否进行版面恢复，默认False
+- save_pdf：进行版面恢复导出docx文档的同时，是否保存为pdf文件，默认为False
+- output：版面恢复结果保存路径
--- a/ppstructure/recovery/__init__.py
+++ b/ppstructure/recovery/__init__.py
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/ppstructure/recovery/recovery_to_doc.py
+++ b/ppstructure/recovery/recovery_to_doc.py
@@ -24,7 +24,7 @@ from docx.enum.section import WD_SECTION
 from docx.oxml.ns import qn
 from docx.enum.table import WD_TABLE_ALIGNMENT
-from table_process import HtmlToDocx
+from ppstructure.recovery.table_process import HtmlToDocx
 from ppocr.utils.logging import get_logger
 logger = get_logger()
@@ -69,7 +69,7 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
            new_table = deepcopy(table)
            new_table.alignment = WD_TABLE_ALIGNMENT.CENTER
            paragraph.add_run().element.addnext(new_table._tbl)
        else:
            paragraph = doc.add_paragraph()
            paragraph_format = paragraph.paragraph_format
@@ -86,10 +86,10 @@ def convert_info_docx(img, res, save_folder, img_name, save_pdf):
    # save to pdf
    if save_pdf:
-        pdf = os.path.join(save_folder, '{}.pdf'.format(img_name))
+        pdf_path = os.path.join(save_folder, '{}.pdf'.format(img_name))
        from docx2pdf import convert
        convert(docx_path, pdf_path)
-        logger.info('pdf save to {}'.format(pdf))
+        logger.info('pdf save to {}'.format(pdf_path))
 def sorted_layout_boxes(res, w):
@@ -112,7 +112,7 @@ def sorted_layout_boxes(res, w):
    res_left = []
    res_right = []
    i = 0
    while True:
        if i >= num_boxes:
            break
@@ -137,7 +137,7 @@ def sorted_layout_boxes(res, w):
            res_left = []
            res_right = []
            break
-        elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3*w / 4:
+        elif _boxes[i]['bbox'][0] < w / 4 and _boxes[i]['bbox'][2] < 3 * w / 4:
            _boxes[i]['layout'] = 'double'
            res_left.append(_boxes[i])
            i += 1
@@ -157,4 +157,4 @@ def sorted_layout_boxes(res, w):
        new_res += res_left
    if res_right:
        new_res += res_right
    return new_res
\ No newline at end of file
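The reformatted condition in `sorted_layout_boxes` tags a box as belonging to the left column of a two-column page when its left edge lies in the first quarter of the page width and its right edge stops before the three-quarter mark. A minimal standalone sketch of just that test is shown here. The helper name `is_left_column` and the `[x_min, y_min, x_max, y_max]` box layout are assumptions made for illustration; only the comparison itself comes from the diff.

```python
def is_left_column(bbox, page_width):
    """Left-column test from the diff: x_min < w / 4 and x_max < 3 * w / 4.

    bbox is assumed to be [x_min, y_min, x_max, y_max] in pixels.
    """
    x_min, _, x_max, _ = bbox
    return x_min < page_width / 4 and x_max < 3 * page_width / 4


w = 1000
print(is_left_column([50, 100, 480, 300], w))   # True: narrow box on the left
print(is_left_column([520, 100, 950, 300], w))  # False: box starts past w / 4
print(is_left_column([50, 100, 950, 300], w))   # False: box spans the full width
```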
--- a/ppstructure/utility.py
+++ b/ppstructure/utility.py
@@ -84,13 +84,18 @@ def init_args():
        type=str2bool,
        default=True,
        help='In the forward, whether the non-table area is recognition by ocr')
+    # param for recovery
    parser.add_argument(
        "--recovery",
-        type=bool,
+        type=str2bool,
        default=False,
        help='Whether to enable layout of recovery')
    parser.add_argument(
-        "--save_pdf", type=bool, default=False, help='Whether to save pdf file')
+        "--save_pdf",
+        type=str2bool,
+        default=False,
+        help='Whether to save pdf file')
    return parser
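The switch from `type=bool` to `type=str2bool` matters because argparse passes the raw string to the type callable, and `bool("False")` is `True` for any non-empty string. The sketch below reproduces the difference with a local stand-in for `str2bool`; the project's own helper is not shown in this diff, so its exact accepted spellings are an assumption.

```python
import argparse


def str2bool(value):
    # Stand-in for the project's helper; assumed to accept "true", "t" and "1".
    return value.lower() in ("true", "t", "1")


def parse(flag_type, argv):
    """Parse --save_pdf with the given type callable and return its value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--save_pdf", type=flag_type, default=False)
    return parser.parse_args(argv).save_pdf


argv = ["--save_pdf=False"]
print(parse(bool, argv))      # True: bool("False") is truthy (non-empty string)
print(parse(str2bool, argv))  # False
```

With `type=bool`, the README command `--save_pdf=False` would have enabled PDF export, which is the behaviour the change in `utility.py` avoids.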