merge dygraph

a5003201 · 文幕地方 · 3bf40c76 · 0b220341 · a5003201 · a5003201
35 changed files
--- a/PPOCRLabel/README_ch.md
+++ b/PPOCRLabel/README_ch.md
@@ -86,17 +86,18 @@ PPOCRLabel --lang ch --kie True  # 启动 【KIE 模式】，用于打【检测+

 > 如果上述安装出现问题，可以参考3.6节 错误提示

-#### 1.2.2 本地构建whl包并安装
+#### 1.2.2 通过Python脚本运行PPOCRLabel
+
+如果您对PPOCRLabel文件有所更改（例如指定新的内置模型），通过Python脚本运行可以更加方便地看到更改的结果。如果仍然需要通过whl包启动，则需要参考下节重新编译whl包。

 ```bash
-cd PaddleOCR/PPOCRLabel
-python3 setup.py bdist_wheel 
-pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple
+cd ./PPOCRLabel  # 切换到PPOCRLabel目录
+python PPOCRLabel.py --lang ch
 ```

-#### 1.2.3 通过Python脚本运行PPOCRLabel
+#### 1.2.3 本地构建whl包并安装

-如果您对PPOCRLabel文件有所更改，通过Python脚本运行会更加方面的看到更改的结果
+编译与安装新的whl包，其中1.0.2为版本号，可在 `setup.py` 中指定新版本。

 ```bash
 cd ./PPOCRLabel  # 切换到PPOCRLabel目录
@@ -107,7 +108,6 @@ python PPOCRLabel.py --lang ch --kie True  # 启动 【KIE 模式】，用于打
 ```


-
 ## 2. 使用

 ### 2.1 操作步骤

--- a/README.md
+++ b/README.md
@@ -112,6 +112,10 @@ For a new language request, please refer to [Guideline for new language_requests
        - [Text Recognition](./doc/doc_en/recognition_en.md)
        - [Text Direction Classification](./doc/doc_en/angle_class_en.md)
        - [Yml Configuration](./doc/doc_en/config_en.md)
+    - PP-OCR Models Compression
+        - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md)
+        - [Model Quantization](./deploy/slim/quantization/README_en.md)
+        - [Model Pruning](./deploy/slim/prune/README_en.md)
    - Inference and Deployment
        - [C++ Inference](./deploy/cpp_infer/readme_en.md)
        - [Serving](./deploy/pdserving/README.md)

--- a/README_ch.md
+++ b/README_ch.md
@@ -79,22 +79,26 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库，助力
 ## 文档教程

 - [运行环境准备](./doc/doc_ch/environment.md)
- [快速开始（中英文/多语言/文档分析）](./doc/doc_ch/quickstart.md)
+- [快速开始（中英文/多语言/版面分析）](./doc/doc_ch/quickstart.md)
 - [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md)
 - PP-OCR产业落地：从训练到部署
    - [PP-OCR模型库](./doc/doc_ch/models.md)
        - [PP-OCR模型下载](./doc/doc_ch/models_list.md)
-        - [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md)
+        - [Python引擎的PP-OCR模型库推理](./doc/doc_ch/inference_ppocr.md)
    - [PP-OCR模型训练](./doc/doc_ch/training.md)
        - [文本检测](./doc/doc_ch/detection.md)
        - [文本识别](./doc/doc_ch/recognition.md)
        - [文本方向分类器](./doc/doc_ch/angle_class.md)
-        - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
        - [配置文件内容与生成](./doc/doc_ch/config.md)
+    - PP-OCR模型压缩
+        - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
+        - [模型量化](./deploy/slim/quantization/README.md)
+        - [模型裁剪](./deploy/slim/prune/README.md)
    - PP-OCR模型推理部署
        - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
        - [服务化部署](./deploy/pdserving/README_CN.md)
        - [端侧部署](./deploy/lite/readme.md)
+        - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md)
        - [Benchmark](./doc/doc_ch/benchmark.md)
 - [PP-Structure信息提取](./ppstructure/README_ch.md)
    - [版面分析](./ppstructure/layout/README_ch.md)

--- a/deploy/paddle2onnx/readme.md
+++ b/deploy/paddle2onnx/readme.md
-# paddle2onnx 模型转化与预测
+# Paddle2ONNX模型转化与预测

 本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型，并基于 ONNXRuntime 引擎预测。


--- a/deploy/pdserving/README_CN.md
+++ b/deploy/pdserving/README_CN.md
@@ -8,8 +8,7 @@ PaddleOCR提供2种服务部署方式：

 # 基于PaddleServing的服务部署

-本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PPOCR
-动态图模型的pipeline在线服务。
+本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。

 相比较于hubserving部署，PaddleServing具备以下优点：
 - 支持客户端和服务端之间高并发和高效通信
@@ -59,7 +58,7 @@ pip3 install paddle_serving_app-0.7.0-py3-none-any.whl

 使用PaddleServing做服务化部署时，需要将保存的inference模型转换为serving易于部署的模型。

-首先，下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
+首先，下载PP-OCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)

 ```bash
 # 下载并解压 OCR 文本检测模型
@@ -107,7 +106,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
 1. 下载PaddleOCR代码，若已下载可跳过此步骤
    ```
    git clone https://github.com/PaddlePaddle/PaddleOCR
-
+    
    # 进入到工作目录
    cd PaddleOCR/deploy/pdserving/
    ```

--- a/deploy/slim/prune/README.md
+++ b/deploy/slim/prune/README.md

-## 介绍
+# PP-OCR模型裁剪

 复杂的模型有利于提高模型的性能，但也导致模型中存在一定冗余，模型裁剪通过移出网络模型中的子模型来减少这种冗余，达到减少模型计算复杂度，提高模型推理性能的目的。
 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
@@ -7,13 +7,13 @@


 在开始本教程之前，建议先了解：
-1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)
+1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)
 2. [模型裁剪教程](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md)

-
 ## 快速开始

 模型裁剪主要包括四个步骤：
+
 1. 安装 PaddleSlim
 2. 准备训练好的模型
 3. 敏感度分析、裁剪训练
@@ -35,17 +35,20 @@ python3 setup.py install

 加载预训练模型后，通过对现有模型的每个网络层进行敏感度分析，得到敏感度文件：sen.pickle，可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/9b01b195f0c4bc34a1ab434751cb260e13d64d9e/paddleslim/dygraph/prune/filter_pruner.py#L75)加载文件，获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度，决定每个网络层的裁剪比例。
 敏感度文件内容格式：
-    sen.pickle(Dict){
+```
+sen.pickle(Dict){
            'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
            'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
        }

-    例子：
+例子：
        {
            'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
            'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
        }
-加载敏感度文件后会返回一个字典，字典中的keys为网络模型参数模型的名字，values为一个字典，里面保存了相应网络层的裁剪敏感度信息。例如在例子中，conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%，详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md)
+```
+
+加载敏感度文件后会返回一个字典，字典中的keys为网络模型参数模型的名字，values为一个字典，里面保存了相应网络层的裁剪敏感度信息。例如在例子中，conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%，详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)

 进入PaddleOCR根目录，通过以下命令对模型进行敏感度分析训练：
 ```bash

--- a/deploy/slim/prune/README_en.md
+++ b/deploy/slim/prune/README_en.md

-## Introduction
+# PP-OCR Models Pruning

 Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.

@@ -37,25 +37,27 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.

  After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle.  After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see：[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md)
  The data format of sensitivity file：
-      sen.pickle(Dict){
+
+```      
+sen.pickle(Dict){
              'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
              'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
          }
-
-      example：
+example：
          {
              'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
              'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
          }
  The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md)
+```
+
+  The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)


 Enter the PaddleOCR root directory，perform sensitivity analysis on the model with the following command：

 ```bash
-
 python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model"  Global.save_model_dir=./output/prune_model/
-
 ```
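The sensitivity dictionary described in this README maps each parameter name to `{pruning_ratio: accuracy_loss}`. As a rough illustration of how such a dictionary could be turned into per-layer pruning ratios, the sketch below picks, for each layer, the largest ratio whose loss stays within a budget. `pick_ratios` and `max_loss` are hypothetical names introduced for this example and are not part of the PaddleSlim API; the numbers are the first four ratios of the README's own example.

```python
def pick_ratios(sensitivities, max_loss):
    """For each layer, pick the largest pruning ratio whose loss is <= max_loss.

    Hypothetical helper for illustration only; layers with no acceptable
    ratio are left out of the result (i.e. they are not pruned).
    """
    ratios = {}
    for name, losses in sensitivities.items():
        acceptable = [ratio for ratio, loss in losses.items() if loss <= max_loss]
        if acceptable:
            ratios[name] = max(acceptable)
    return ratios


# Subset of the example values shown in the README (ratios 0.1 to 0.4).
sens = {
    "conv10_expand_weights": {
        0.1: 0.006509952684312718,
        0.2: 0.01827734339798862,
        0.3: 0.014528405644659832,
        0.4: 0.030615754498018757,
    },
    "conv10_linear_weights": {
        0.1: 0.05113190831455035,
        0.2: 0.07705573833558801,
        0.3: 0.12096721757739311,
        0.4: 0.1819252083008504,
    },
}

print(pick_ratios(sens, max_loss=0.02))  # {'conv10_expand_weights': 0.3}
```

With a 2% loss budget, `conv10_expand_weights` tolerates pruning 30% of its filters, while `conv10_linear_weights` already loses about 5% at a 10% ratio and is therefore left unpruned in this sketch.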



--- a/deploy/slim/quantization/README.md
+++ b/deploy/slim/quantization/README.md

-## 介绍
+# PP-OCR模型量化
 复杂的模型有利于提高模型的性能，但也导致模型中存在一定冗余，模型量化将全精度缩减到定点数减少这种冗余，达到减少模型计算复杂度，提高模型推理性能的目的。
 模型量化可以在基本不损失模型的精度的情况下，将FP32精度的模型参数转换为Int8精度，减小模型参数大小并加速计算，使用量化后的模型在移动端等部署时更具备速度优势。

 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
 [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化（包括量化训练和离线量化）、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能，如果您感兴趣，可以关注并了解。

-在开始本教程之前，建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
+在开始本教程之前，建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)


 ## 快速开始

--- a/deploy/slim/quantization/README_en.md
+++ b/deploy/slim/quantization/README_en.md

-## Introduction
+# PP-OCR Models Quantization

 Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model.
 Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number,

--- a/deploy/slim/quantization/export_model.py
+++ b/deploy/slim/quantization/export_model.py
@@ -127,6 +127,7 @@ def main():
    arch_config = config["Architecture"]
    if arch_config["algorithm"] in ["Distillation", ]:  # distillation model
        for idx, name in enumerate(model.model_name_list):
+            model.model_list[idx].eval()
            sub_model_save_path = os.path.join(save_path, name, "inference")
            export_single_model(quanter, model.model_list[idx], infer_shape,
                                sub_model_save_path, logger)

--- a/doc/doc_ch/environment.md
+++ b/doc/doc_ch/environment.md
 # 运行环境准备

-Windows和Mac用户推荐使用Anaconda搭建Python环境，Linux用户建议使用docker搭建PyThon环境。
+Windows和Mac用户推荐使用Anaconda搭建Python环境，Linux用户建议使用docker搭建Python环境。

 推荐环境：
- PaddlePaddle >= 2.0.0 (2.1.2)
- python3.7
+- PaddlePaddle >= 2.1.2
+- Python 3.7
 - CUDA10.1 / CUDA10.2
 - CUDNN 7.6

-如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。
+> 如果您已经安装Python环境，可以直接参考[PaddleOCR快速开始](./quickstart.md)

 * [1. Python环境搭建](#1)
  + [1.1 Windows](#1.1)
  + [1.2 Mac](#1.2)
  + [1.3 Linux](#1.3)
-* [2. 安装PaddlePaddle](#2)

 <a name="1"></a>

@@ -212,7 +211,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
    wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh

  # 若您要下载其他版本，需要将最后1个/后的文件名改成您希望下载的版本
-    ```
+  ```

 - 安装Anaconda：

@@ -311,21 +310,3 @@ sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=hos
 # ctrl+P+Q可退出docker 容器，重新进入docker 容器使用如下命令
 sudo docker container exec -it ppocr /bin/bash
 ```
-
-<a name="2"></a>
-
-## 2. 安装PaddlePaddle
-
- 如果您的机器安装的是CUDA9或CUDA10，请运行以下命令安装
-
-```bash
-python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
-```
-
- 如果您的机器是CPU，请运行以下命令安装
-
-```bash
-python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
-```
-
-更多的版本需求，请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
--- a/doc/doc_ch/finetune.md
+++ b/doc/doc_ch/finetune.md
+# 模型微调
+
+## 1. 模型微调背景与意义
+
+PaddleOCR提供的PP-OCR系列模型在通用场景中性能优异，能够解决绝大多数情况下的检测与识别问题。在垂类场景中，如果希望获取更优的模型效果，可以通过模型微调的方法，进一步提升PP-OCR系列检测与识别模型的精度。
+
+本文主要介绍文本检测与识别模型在模型微调时的一些注意事项，最终希望您在自己的场景中，通过模型微调，可以获取精度更高的文本检测与识别模型。
+
+本文核心要点如下所示。
+
+1. PP-OCR提供的预训练模型有较好的泛化能力
+2. 加入少量真实数据（检测任务>=500张, 识别任务>=5000张），会大幅提升垂类场景的检测与识别效果
+3. 在模型微调时，加入真实通用场景数据，可以进一步提升模型精度与泛化性能
+4. 在图像检测任务中，增大图像的预测尺度，能够进一步提升较小文字区域的检测效果
+5. 在模型微调时，需要适当调整超参数（学习率，batch size最为重要），以获得更优的微调效果。
+
+更多详细内容，请参考第2章与第3章。
+
+## 2. 文本检测模型微调
+
+### 2.1 数据选择
+
+* 数据量：建议至少准备500张的文本检测数据集用于模型微调。
+
+* 数据标注：单行文本标注格式，建议标注的检测框与实际语义内容一致。如在火车票场景中，姓氏与名字可能离得较远，但是它们在语义上属于同一个检测字段，这里也需要将整个姓名标注为1个检测框。
+
+### 2.2 模型选择
+
+建议选择PP-OCRv2模型（配置文件：[ch_PP-OCRv2_det_student.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml)，预训练模型：[ch_PP-OCRv2_det_distill_train.tar](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)）进行微调，其精度与泛化性能是目前提供的最优预训练模型。
+
+更多PP-OCR系列模型，请参考[PaddleOCR 首页说明文档](../../README_ch.md)。
+
+注意：在使用上述预训练模型的时候，由于保存的模型中包含教师模型，因此需要将其中的学生模型单独提取出来，再加载学生模型即可进行模型微调。
+
+```python
+import paddle
+# 加载完整的检测预训练模型
+all_params = paddle.load("ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
+# 提取学生模型的参数：只保留以"student_model."开头的键，并去掉该前缀
+prefix = "student_model."
+student_params = {k[len(prefix):]: v for k, v in all_params.items() if k.startswith(prefix)}
+# 保存模型，用于后续模型微调
+paddle.save(student_params, "ch_PP-OCRv2_det_student.pdparams")
+```
+
+
+### 2.3 训练超参选择
+
+在模型微调的时候，最重要的超参就是预训练模型路径`pretrained_model`, 学习率`learning_rate`与`batch_size`，部分配置文件如下所示。
+
+```yaml
+Global:
+  pretrained_model: ./pretrain_models/student.pdparams # 预训练模型路径
+Optimizer:
+  lr:
+    name: Cosine
+    learning_rate: 0.001 # 学习率
+    warmup_epoch: 2
+  regularizer:
+    name: 'L2'
+    factor: 0
+
+Train:
+  loader:
+    shuffle: True
+    drop_last: False
+    batch_size_per_card: 8  # 单卡batch size
+    num_workers: 4
+```
+
+上述配置文件中，首先需要将`pretrained_model`字段指定为2.2章节中提取出来的`ch_PP-OCRv2_det_student.pdparams`文件路径。
+
+PaddleOCR提供的配置文件是在8卡训练（相当于总的batch size是`8*8=64`）、且没有加载预训练模型情况下的配置文件，因此您的场景中，学习率与总的batch size需要对应线性调整，例如
+
+* 如果您的场景中是单卡训练，单卡batch_size=8，则总的batch_size=8，建议将学习率调整为`1e-4`左右。
+* 如果您的场景中是单卡训练，由于显存限制，只能设置单卡batch_size=4，则总的batch_size=4，建议将学习率调整为`5e-5`左右。
+
+### 2.4 预测超参选择
+
+对训练好的模型导出并进行推理时，可以通过进一步调整预测的图像尺度，来提升小面积文本的检测效果，下面是DBNet推理时的一些超参数，可以通过适当调整，提升效果。
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_db_thresh | float | 0.3 | DB输出的概率图中，得分大于该阈值的像素点才会被认为是文字像素点 |
+|  det_db_box_thresh | float | 0.6 | 检测结果边框内，所有像素点的平均得分大于该阈值时，该结果会被认为是文字区域 |
+|  det_db_unclip_ratio | float | 1.5 | `Vatti clipping`算法的扩张系数，使用该方法对文字区域进行扩张 |
+|  max_batch_size | int | 10 | 预测的batch size |
+|  use_dilation | bool | False | 是否对分割结果进行膨胀以获取更优检测效果 |
+|  det_db_score_mode | str | "fast" | DB的检测结果得分计算方法，支持`fast`和`slow`，`fast`是根据polygon的外接矩形边框内的所有像素计算平均得分，`slow`是根据原始polygon内的所有像素计算平均得分，计算速度相对较慢一些，但是更加准确一些。 |
+
+
+更多关于推理方法的介绍可以参考[Paddle Inference推理教程](./inference.md)。
+
+
+## 3. 文本识别模型微调
+
+
+### 3.1 数据选择
+
+* 数据量：不更换字典的情况下，建议至少准备5000张的文本识别数据集用于模型微调；如果更换了字典（不建议），需要的数量更多。
+
+* 数据分布：建议分布与实测场景尽量一致。如果实测场景包含大量短文本，则训练数据中建议也包含较多短文本，如果实测场景对于空格识别效果要求较高，则训练数据中建议也包含较多带空格的文本内容。
+
+
+* 通用中英文数据：在训练的时候，可以在训练集中添加通用真实数据（如在不更换字典的微调场景中，建议添加LSVT、RCTW、MTWI等真实数据），进一步提升模型的泛化性能。
+
+### 3.2 模型选择
+
+建议选择PP-OCRv2模型（配置文件：[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)，预训练模型：[ch_PP-OCRv2_rec_train.tar](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)）进行微调，其精度与泛化性能是目前提供的最优预训练模型。
+
+更多PP-OCR系列模型，请参考[PaddleOCR 首页说明文档](../../README_ch.md)。
+
+
+### 3.3 训练超参选择
+
+与文本检测任务微调相同，在识别模型微调的时候，最重要的超参就是预训练模型路径`pretrained_model`, 学习率`learning_rate`与`batch_size`，部分默认配置文件如下所示。
+
+```yaml
+Global:
+  pretrained_model:  # 预训练模型路径
+Optimizer:
+  lr:
+    name: Piecewise
+    decay_epochs : [700, 800]
+    values : [0.001, 0.0001]  # 学习率
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    factor: 0
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/
+    label_file_list:
+    - ./train_data/train_list.txt
+    ratio_list: [1.0] # 采样比例，默认值是[1.0]
+  loader:
+    shuffle: True
+    drop_last: False
+    batch_size_per_card: 128 # 单卡batch size
+    num_workers: 8
+
+```
+
+
+上述配置文件中，首先需要将`pretrained_model`字段指定为3.2章节中解压得到的`ch_PP-OCRv2_rec_train/best_accuracy.pdparams`文件路径。
+
+PaddleOCR提供的配置文件是在8卡训练（相当于总的batch size是`8*128=1024`）、且没有加载预训练模型情况下的配置文件，因此您的场景中，学习率与总的batch size需要对应线性调整，例如：
+
+* 如果您的场景中是单卡训练，单卡batch_size=128，则总的batch_size=128，在加载预训练模型的情况下，建议将学习率调整为`[1e-4, 2e-5]`左右（piecewise学习率策略，需设置2个值，下同）。
+* 如果您的场景中是单卡训练，因为显存限制，只能设置单卡batch_size=64，则总的batch_size=64，在加载预训练模型的情况下，建议将学习率调整为`[5e-5, 1e-5]`左右。
+
+
+如果有通用真实场景数据加进来，建议每个epoch中，垂类场景数据与真实场景的数据量保持在1:1左右。
+
+比如：您自己的垂类场景识别数据量为1W，数据标签文件为`vertical.txt`，收集到的通用场景识别数据量为10W，数据标签文件为`general.txt`，
+
+
+那么，可以设置`label_file_list`和`ratio_list`参数如下所示。每个epoch中，`vertical.txt`中会进行全采样（采样比例为1.0），包含1W条数据；`general.txt`中会按照0.1的采样比例进行采样，包含`10W*0.1=1W`条数据，最终二者的比例为`1:1`。
+
+```yaml
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/
+    label_file_list:
+    - vertical.txt
+    - general.txt
+    ratio_list: [1.0, 0.1]
+```
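The two pieces of arithmetic in section 3.3 of `finetune.md` (scaling the learning rate linearly with the total batch size, and working out how many samples each label file contributes per epoch) can be sketched in a few lines. This is only an illustration under stated assumptions: `scale_learning_rate` and `samples_per_epoch` are hypothetical helper names, and the sample count is assumed to be simply `size * ratio`; how PaddleOCR's `SimpleDataSet` actually draws samples is not shown in this diff.

```python
def scale_learning_rate(base_lr, base_total_batch, cards, batch_per_card):
    """Linearly rescale a learning rate to a new total batch size."""
    total_batch = cards * batch_per_card
    return base_lr * total_batch / base_total_batch


def samples_per_epoch(dataset_sizes, ratio_list):
    """Number of samples taken from each label file in one epoch (assumed size * ratio)."""
    return [round(size * ratio) for size, ratio in zip(dataset_sizes, ratio_list)]


# Recognition config: trained on 8 cards x 128 per card with values [0.001, 0.0001].
# Rescaled for a single card with batch_size_per_card=128.
rec_lrs = [
    scale_learning_rate(lr, base_total_batch=8 * 128, cards=1, batch_per_card=128)
    for lr in (0.001, 0.0001)
]
print(rec_lrs)

# vertical.txt holds 10,000 samples and general.txt holds 100,000.
print(samples_per_epoch([10_000, 100_000], [1.0, 0.1]))  # [10000, 10000]
```

Strict linear scaling gives about `1.25e-4` for the first value, which is in line with the "around `1e-4`" recommended in the text; the document's suggested values are rounded starting points for fine-tuning rather than exact results of this formula.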
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md
@@ -36,6 +36,8 @@ inference 模型（`paddle.jit.save`保存的模型）

 - [六、参数解释](#参数解释)

+- [七、FAQ](#FAQ)
+

 <a name="训练模型转inference模型"></a>
 ## 一、训练模型转inference模型
@@ -520,3 +522,9 @@ PSE算法相关参数如下
 |  label_list | list | ['0', '180'] | class id对应的角度值 |
 |  cls_batch_num | int | 6 | 方向分类器预测的batch size |
 |  cls_thresh | float | 0.9 | 预测阈值，模型预测结果为180度，且得分大于该阈值时，认为最终预测结果为180度，需要翻转 |
+
+
+
+# 七、FAQ
+
+* 如果是使用paddle2.0之前版本的代码导出的`inference模型`，则其文件名为`model`与`params`，分别对应paddle2.0或者之后版本导出的`inference.pdmodel`与`inference.pdiparams`；不过目前PaddleOCR的release分支已经不支持paddle2.0之前版本导出的inference 模型，如果希望使用，需要使用develop分支（静态图分支）的代码与文档。
--- a/doc/doc_ch/quickstart.md
+++ b/doc/doc_ch/quickstart.md
-# PaddleOCR快速开始
-
 - [PaddleOCR快速开始](#paddleocr快速开始)
-  - [1. 安装PaddleOCR whl包](#1-安装paddleocr-whl包)
+  - [1. 安装](#1-安装)
+    - [1.1 安装PaddlePaddle](#11-安装paddlepaddle)
+    - [1.2 安装PaddleOCR whl包](#12-安装paddleocr-whl包)
  - [2. 便捷使用](#2-便捷使用)
    - [2.1 命令行使用](#21-命令行使用)
      - [2.1.1 中英文模型](#211-中英文模型)
@@ -10,10 +10,37 @@
    - [2.2 Python脚本使用](#22-python脚本使用)
      - [2.2.1 中英文与多语言使用](#221-中英文与多语言使用)
      - [2.2.2 版面分析](#222-版面分析)
+  - [3. 小结](#3-小结)
+
+# PaddleOCR快速开始

 <a name="1"></a>

-## 1. 安装PaddleOCR whl包
+## 1. 安装
+
+<a name="11"></a>
+
+### 1.1 安装PaddlePaddle
+
+> 如果您没有基础的Python运行环境，请参考[运行环境准备](./environment.md)。
+
+- 您的机器安装的是CUDA9或CUDA10，请运行以下命令安装
+
+  ```bash
+  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+  ```
+
+- 您的机器是CPU，请运行以下命令安装
+
+  ```bash
+  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+  ```
+
+更多的版本需求，请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+<a name="12"></a>
+
+### 1.2 安装PaddleOCR whl包

 ```bash
 pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
@@ -258,3 +285,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
 im_show = Image.fromarray(im_show)
 im_show.save('result.jpg')
 ```
+
+<a name="3"></a>
+
+## 3. 小结
+
+通过本节内容，相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。
+
+PaddleOCR是一套丰富领先实用的OCR工具库，打通数据、模型训练、压缩和推理部署全流程，因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图，然后克隆PaddleOCR项目，正式开启PaddleOCR的应用之旅。
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -248,7 +248,10 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
 | rec_r31_sar.yml               | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
 | rec_resnet_stn_bilstm_att.yml | SEED | Aster_Resnet | STN | BiLSTM | att |

-*其中SEED模型需要额外加载FastText训练好的[语言模型](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz)
+*其中SEED模型需要额外加载FastText训练好的[语言模型](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz)，并且安装 fasttext 依赖：
+```
+python3.7 -m pip install fasttext==0.9.1
+```

 训练中文数据，推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)，如您希望尝试其他算法在中文数据集上的效果，请参考下列说明修改配置文件：


--- a/doc/doc_en/environment_en.md
+++ b/doc/doc_en/environment_en.md
 # Environment Preparation

-Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle.
+Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. 

 Recommended working environment:
- PaddlePaddle >= 2.0.0 (2.1.2)
+- PaddlePaddle >= 2.1.2
 - Python 3.7
 - CUDA 10.1 / CUDA 10.2
 - cuDNN 7.6

+> If you already have a Python environment installed, you can skip to [PaddleOCR Quick Start](./quickstart_en.md).
+
 * [1. Python Environment Setup](#1)
  + [1.1 Windows](#1.1)
  + [1.2 Mac](#1.2)
  + [1.3 Linux](#1.3)
-* [2. Install PaddlePaddle 2.0](#2)


 <a name="1"></a>
@@ -330,21 +331,3 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags
 # ctrl+P+Q to exit docker, to re-enter docker using the following command:
 sudo docker container exec -it ppocr /bin/bash
 ```
-
-<a name="2"></a>
-
-## 2. Install PaddlePaddle 2.0
-
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
-
-```bash
-python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
-```
-
- If you have no available GPU on your machine, please run the following command to install the CPU version
-
-```bash
-python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
-```
-
-For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
-
-# PaddleOCR Quick Start
-
 - [PaddleOCR Quick Start](#paddleocr-quick-start)
-  - [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
+  - [1. Installation](#1-installation)
+    - [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
+    - [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
  - [2. Easy-to-Use](#2-easy-to-use)
    - [2.1 Use by Command Line](#21-use-by-command-line)
      - [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
@@ -11,12 +10,38 @@
    - [2.2 Use by Code](#22-use-by-code)
      - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
      - [2.2.2 Layout Analysis](#222-layout-analysis)
+  - [3. Summary](#3-summary)
+
+# PaddleOCR Quick Start
+
+
+<a name="1-installation"></a>
+
+## 1. Installation
+
+<a name="11-install-paddlepaddle"></a>
+
+### 1.1 Install PaddlePaddle
+
+> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

+- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install the GPU version

+  ```bash
+  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+  ```
+
+- If you have no available GPU on your machine, please run the following command to install the CPU version
+
+  ```bash
+  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+  ```

-<a name="1-install-paddleocr-whl-package"></a>
+For other version requirements, please follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).

-## 1. Install PaddleOCR Whl Package
+<a name="12-install-paddleocr-whl-package"></a>
+
+### 1.2 Install PaddleOCR Whl Package

 ```bash
 pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
@@ -249,3 +274,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
 im_show = Image.fromarray(im_show)
 im_show.save('result.jpg')
 ```
+
+<a name="3"></a>
+
+## 3. Summary
+
+After working through this section, you should be able to use the PaddleOCR whl package and have obtained initial results.
+
+PaddleOCR is a rich and practical OCR tool library that covers the whole pipeline of data, model training, compression, and inference deployment. The [next section](./paddleOCR_overview_en.md) first gives an overview of PaddleOCR and then clones the PaddleOCR project, so that you can start building applications with it.
--- a/doc/joinus.PNG
+++ b/doc/joinus.PNG
--- a/paddleocr.py
+++ b/paddleocr.py
@@ -47,7 +47,7 @@ __all__ = [
 ]

 SUPPORT_DET_MODEL = ['DB']
-VERSION = '2.4.0.3'
+VERSION = '2.4.0.4'
 SUPPORT_REC_MODEL = ['CRNN']
 BASE_DIR = os.path.expanduser("~/.paddleocr/")


--- a/ppocr/losses/kie_sdmgr_loss.py
+++ b/ppocr/losses/kie_sdmgr_loss.py
-# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+# Reference: https://github.com/open-mmlab/mmocr/blob/main/mmocr/models/kie/losses/sdmgr_loss.py
+
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

--- a/ppocr/metrics/kie_metric.py
+++ b/ppocr/metrics/kie_metric.py
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+# Reference: https://github.com/open-mmlab/mmocr/blob/main/mmocr/core/evaluation/kie_metric.py

 from __future__ import absolute_import
 from __future__ import division

--- a/ppocr/modeling/heads/kie_sdmgr_head.py
+++ b/ppocr/modeling/heads/kie_sdmgr_head.py
-# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+# Reference: https://github.com/open-mmlab/mmocr/blob/main/mmocr/models/kie/heads/sdmgr_head.py

 from __future__ import absolute_import
 from __future__ import division

--- a/ppocr/modeling/heads/rec_ctc_head.py
+++ b/ppocr/modeling/heads/rec_ctc_head.py
@@ -80,7 +80,6 @@ class CTCHead(nn.Layer):
            result = (x, predicts)
        else:
            result = predicts
-
        if not self.training:
            predicts = F.softmax(predicts, axis=2)
            result = predicts

--- a/ppocr/modeling/heads/rec_sar_head.py
+++ b/ppocr/modeling/heads/rec_sar_head.py
@@ -216,7 +216,7 @@ class ParallelSARDecoder(BaseDecoder):
        self.pred_dropout = nn.Dropout(pred_dropout)
        pred_num_classes = self.num_classes - 1
        if pred_concat:
-            fc_in_channel = decoder_rnn_out_size + d_model + d_enc
+            fc_in_channel = decoder_rnn_out_size + d_model + encoder_rnn_out_size
        else:
            fc_in_channel = d_model
        self.prediction = nn.Linear(fc_in_channel, pred_num_classes)

--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@@ -89,7 +89,7 @@ class CTCLabelDecode(BaseRecLabelDecode):
                                             use_space_char)

    def __call__(self, preds, label=None, *args, **kwargs):
-        if isinstance(preds, tuple):
+        if isinstance(preds, (tuple, list)):
            preds = preds[-1]
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
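The change above lets `CTCLabelDecode.__call__` accept a list as well as a tuple from the head and decode only the last element. A minimal stand-alone sketch of that selection step is shown below; `select_prediction` is a hypothetical name used only for this illustration, and plain strings stand in for the tensors the real head returns.

```python
def select_prediction(preds):
    """Return the last element when the head returns several outputs.

    Mirrors the tuple-or-list check in the hunk above; anything that is
    not a tuple or list is passed through unchanged.
    """
    if isinstance(preds, (tuple, list)):
        preds = preds[-1]
    return preds


print(select_prediction(("features", "logits")))  # logits
print(select_prediction(["features", "logits"]))  # logits
print(select_prediction("logits"))                # logits
```

Passing a tuple of types to `isinstance` is equivalent to chaining two `isinstance` calls with `or`, so the behaviour matches the line added in the diff.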

--- a/ppstructure/vqa/README.md
+++ b/ppstructure/vqa/README.md
- [文档视觉问答（DOC-VQA）](#文档视觉问答doc-vqa)
-  - [1. 简介](#1-简介)
-  - [2. 性能](#2-性能)
-  - [3. 效果演示](#3-效果演示)
+English | [简体中文](README_ch.md)
+
+- [Document Visual Question Answering (Doc-VQA)](#document-visual-question-answering)
+  - [1. Introduction](#1-introduction)
+  - [2. Performance](#2-performance)
+  - [3. Effect demo](#3-effect-demo)
    - [3.1 SER](#31-ser)
    - [3.2 RE](#32-re)
-  - [4. 安装](#4-安装)
-    - [4.1 安装依赖](#41-安装依赖)
-    - [4.2 安装PaddleOCR（包含 PP-OCR 和 VQA）](#42-安装paddleocr包含-pp-ocr-和-vqa)
-  - [5. 使用](#5-使用)
-    - [5.1 数据和预训练模型准备](#51-数据和预训练模型准备)
+  - [4. Install](#4-install)
+    - [4.1 Installation dependencies](#41-install-dependencies)
+    - [4.2 Install PaddleOCR](#42-install-paddleocr)
+  - [5. Usage](#5-usage)
+    - [5.1 Data and Model Preparation](#51-data-and-model-preparation)
    - [5.2 SER](#52-ser)
    - [5.3 RE](#53-re)
-  - [6. 参考链接](#6-参考链接)
-
+  - [6. Reference](#6-reference-links)

-# 文档视觉问答（DOC-VQA）
+# Document Visual Question Answering

-## 1. 简介
+## 1. Introduction

-VQA指视觉问答，主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种，DOC-VQA主要针对文本图像的文字内容提出问题。
+VQA refers to visual question answering, which poses and answers questions about image content. DOC-VQA is one type of VQA task; it mainly asks questions about the text content of document images.

-PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
+The DOC-VQA algorithm in PP-Structure is developed based on the PaddleNLP natural language processing algorithm library.

-主要特性如下：
+The main features are as follows:

- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务，可以完成对图像中的文本识别与分类；基于 RE 任务，可以完成对图象中的文本内容的关系提取，如判断问题对(pair)。
- 支持SER任务和RE任务的自定义训练。
- 支持OCR+SER的端到端系统预测与评估。
- 支持OCR+SER+RE的端到端系统预测。
+- Integrate [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) model and PP-OCR prediction engine.
+- Supports Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multimodal methods. The SER task recognizes and classifies the text in an image; the RE task extracts relationships between text contents in the image, such as identifying question-answer pairs.
+- Supports custom training for SER tasks and RE tasks.
+- Supports end-to-end system prediction and evaluation of OCR+SER.
+- Supports end-to-end system prediction of OCR+SER+RE.


-本项目是 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) 在 Paddle 2.2上的开源实现，
-包含了在 [XFUND数据集](https://github.com/doc-analysis/XFUND) 上的微调代码。
+This project is an open source implementation of [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) on Paddle 2.2,
+and includes fine-tuning code for the [XFUND dataset](https://github.com/doc-analysis/XFUND).

-## 2. 性能
+## 2. Performance

-我们在 [XFUN](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估，性能如下
+We evaluated the algorithm on the Chinese dataset of [XFUND](https://github.com/doc-analysis/XFUND). The performance is as follows:

-| 模型 | 任务 | hmean | 模型下载地址 |
+| Model | Task | hmean | Model download address |
 |:---:|:---:|:---:| :---:|
-| LayoutXLM | SER | 0.9038 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
-| LayoutXLM | RE | 0.7483 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
-| LayoutLMv2 | SER | 0.8544 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)
-| LayoutLMv2 | RE | 0.6777 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
-| LayoutLM | SER | 0.7731 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+| LayoutXLM | SER | 0.9038 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
+| LayoutXLM | RE | 0.7483 | [link](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
+| LayoutLMv2 | SER | 0.8544 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
+| LayoutLMv2 | RE | 0.6777 | [link](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+| LayoutLM | SER | 0.7731 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |

-## 3. 效果演示
+## 3. Effect demo

-**注意：** 测试图片来源于XFUN数据集。
+**Note:** The test images are from the XFUND dataset.

 ### 3.1 SER

 ![](../../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../../doc/vqa/result_ser/zh_val_42_ser.jpg)
 ---|---

-图中不同颜色的框表示不同的类别，对于XFUN数据集，有`QUESTION`, `ANSWER`, `HEADER` 3种类别
+Boxes with different colors in the figure represent different categories. The XFUND dataset has 3 categories: `QUESTION`, `ANSWER`, and `HEADER`.

-* 深紫色：HEADER
-* 浅紫色：QUESTION
-* 军绿色：ANSWER
+* Dark purple: HEADER
+* Light purple: QUESTION
+* Army green: ANSWER

-在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+The corresponding category and OCR recognition result are also marked at the upper left of each OCR detection box.

 ### 3.2 RE

@@ -69,176 +70,190 @@ PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进
 ---|---


-图中红色框表示问题，蓝色框表示答案，问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+Red boxes in the figure represent questions, blue boxes represent answers, and each question is connected to its answer by a green line. The corresponding category and OCR recognition result are also marked at the upper left of each OCR detection box.

-## 4. 安装
+## 4. Install

-### 4.1 安装依赖
+### 4.1 Install dependencies

- **（1) 安装PaddlePaddle**
+- **(1) Install PaddlePaddle**

 ```bash
 python3 -m pip install --upgrade pip

-# GPU安装
+# GPU installation
 python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple

-# CPU安装
+# CPU installation
 python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple

-```
-更多需求，请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+```
+For other installation requirements, refer to the instructions in the [installation documentation](https://www.paddlepaddle.org.cn/install/quick).

-### 4.2 安装PaddleOCR（包含 PP-OCR 和 VQA）
+### 4.2 Install PaddleOCR

- **（1）pip快速安装PaddleOCR whl包（仅预测）**
+- **(1) Quickly install the PaddleOCR whl package with pip (prediction only)**

 ```bash
 python3 -m pip install paddleocr
-```
+```

- **（2）下载VQA源码（预测+训练）**
+- **(2) Download VQA source code (prediction + training)**

 ```bash
-【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
+git clone https://github.com/PaddlePaddle/PaddleOCR  # recommended

-# 如果因为网络问题无法pull成功，也可选择使用码云上的托管：
+# If cloning from GitHub fails due to network problems, you can use the mirror hosted on Gitee instead:
 git clone https://gitee.com/paddlepaddle/PaddleOCR

-# 注：码云托管代码可能无法实时同步本github项目更新，存在3~5天延时，请优先使用推荐方式。
-```
+# Note: The Gitee mirror may not be synchronized with this GitHub project in real time and can lag by 3 to 5 days, so prefer the recommended method.
+```

- **（3）安装VQA的`requirements`**
+- **(3) Install VQA's `requirements`**

 ```bash
 python3 -m pip install -r ppstructure/vqa/requirements.txt
-```
+```

-## 5. 使用
+## 5. Usage

-### 5.1 数据和预训练模型准备
+### 5.1 Data and Model Preparation

-如果希望直接体验预测过程，可以下载我们提供的预训练模型，跳过训练过程，直接预测即可。
+If you want to try prediction directly, you can download the pretrained models we provide and skip the training process.

-* 下载处理好的数据集
+* Download the processed dataset

-处理好的XFUN中文数据集下载地址：[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)。
+The processed XFUND Chinese dataset can be downloaded from: [https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar).


-下载并解压该数据集，解压后将数据集放置在当前目录下。
+Download and extract the dataset, then place the extracted dataset in the current directory.

 ```shell
 wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
-```
+```
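
The `wget` command above only downloads the archive; the extraction step is described in the text but not shown. A minimal Python sketch of that step, assuming `XFUND.tar` is already in the current directory. The helper name `extract_archive` is hypothetical and is not part of PaddleOCR:

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a tar archive into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        tar.extractall(dest)
    # Keep only the first path component of each archive member.
    return sorted({Path(name).parts[0] for name in names if Path(name).parts})


if __name__ == "__main__":
    archive = Path("XFUND.tar")  # assumption: downloaded beforehand with wget
    if archive.exists():
        print(extract_archive(archive))
```

Running `tar -xf XFUND.tar` in the shell achieves the same result; the Python form is only convenient when the preparation is scripted together with other steps.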
+
+* Convert the dataset

-* 转换数据集
+To train on other XFUND datasets, convert them with the following command:

-若需进行其他XFUN数据集的训练，可使用下面的命令进行数据集的转换
+```bash
+python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
+```

+* Download the pretrained models
 ```bash
-python3 ppstructure/vqa/helper/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
-```
+mkdir pretrain && cd pretrain
+# Download the SER model
+wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
+# Download the RE model
+wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
+cd ../
+```

 ### 5.2 SER

-启动训练之前，需要修改下面的四个字段
+Before starting training, modify the following four fields in the configuration file:

-1. `Train.dataset.data_dir`：指向训练集图片存放目录
-2. `Train.dataset.label_file_list`：指向训练集标注文件
-3. `Eval.dataset.data_dir`：指指向验证集图片存放目录
-4. `Eval.dataset.label_file_list`：指向验证集标注文件
+1. `Train.dataset.data_dir`: points to the directory where the training set images are stored
+2. `Train.dataset.label_file_list`: points to the training set label file
+3. `Eval.dataset.data_dir`: points to the directory where the validation set images are stored
+4. `Eval.dataset.label_file_list`: points to the validation set label file

-* 启动训练
+* Start training
 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml
-```
+```

-最终会打印出`precision`, `recall`, `hmean`等指标。
-在`./output/ser_layoutxlm/`文件夹中会保存训练日志，最优的模型和最新epoch的模型。
+When training finishes, metrics such as `precision`, `recall`, and `hmean` are printed.
+The training log, the best model, and the model from the latest epoch are saved in the `./output/ser_layoutxlm/` folder.

-* 恢复训练
+* Resume training

-恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To resume training, assign the folder path of the previously trained model to the `Architecture.Backbone.checkpoints` field.

 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
-```
+```
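
The `-o` argument above sets a nested config field through a dotted key. A minimal sketch of how such a key can map onto a nested dictionary; `apply_overrides` is a hypothetical helper for illustration, not PaddleOCR's actual option parser, and it keeps every value as a plain string:

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        keys = dotted_key.split(".")
        node = config
        # Walk (or create) the intermediate levels, then set the leaf.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


config = {"Architecture": {"Backbone": {"checkpoints": None}}}
apply_overrides(config, ["Architecture.Backbone.checkpoints=path/to/model_dir"])
print(config["Architecture"]["Backbone"]["checkpoints"])  # path/to/model_dir
```

Several overrides can be passed at once, which mirrors commands in this README that set both `Architecture.Backbone.checkpoints` and `Global.infer_img` after a single `-o`.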

-* 评估
+* Evaluate

-评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To evaluate, assign the folder path of the model to be evaluated to the `Architecture.Backbone.checkpoints` field.

 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
-```
-最终会打印出`precision`, `recall`, `hmean`等指标
+```
+When evaluation finishes, metrics such as `precision`, `recall`, and `hmean` are printed.
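
For reference, a minimal sketch of how the three metrics relate, assuming `hmean` is the usual harmonic mean of precision and recall. The function below is illustrative and is not taken from the evaluation code:

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(hmean(0.5, 1.0), 4))  # 0.6667
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high `hmean` by trading recall away for precision or the reverse.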

-* 使用`OCR引擎 + SER`串联预测
+* Cascaded prediction with `OCR engine + SER`

-使用如下命令即可完成`OCR引擎 + SER`的串联预测
+Use the following command to run cascaded `OCR engine + SER` prediction, taking the pretrained SER model as an example:

 ```shell
-CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml  -o Architecture.Backbone.checkpoints=ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
-```
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
+```

-最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件，预测结果文本文件名为`infer_results.txt`。
+The visualized prediction image and the prediction result text file are saved in the directory configured by the `config.Global.save_res_path` field. The text file is named `infer_results.txt`.

-* 对`OCR引擎 + SER`预测系统进行端到端评估
+* End-to-end evaluation of `OCR engine + SER` prediction system

-首先使用 `tools/infer_vqa_token_ser.py` 脚本完成数据集的预测，然后使用下面的命令进行评估。
+First use the `tools/infer_vqa_token_ser.py` script to run prediction on the dataset, then evaluate the results with the following command.

 ```shell
 export CUDA_VISIBLE_DEVICES=0
-python3 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json  --pred_json_path output_res/infer_results.txt
-```
+python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
+```

 ### 5.3 RE

-* 启动训练
+* Start training

-启动训练之前，需要修改下面的四个字段
+Before starting training, modify the following four fields in the configuration file:

-1. `Train.dataset.data_dir`：指向训练集图片存放目录
-2. `Train.dataset.label_file_list`：指向训练集标注文件
-3. `Eval.dataset.data_dir`：指指向验证集图片存放目录
-4. `Eval.dataset.label_file_list`：指向验证集标注文件
+1. `Train.dataset.data_dir`: points to the directory where the training set images are stored
+2. `Train.dataset.label_file_list`: points to the training set label file
+3. `Eval.dataset.data_dir`: points to the directory where the validation set images are stored
+4. `Eval.dataset.label_file_list`: points to the validation set label file

 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
-```
+```

-最终会打印出`precision`, `recall`, `hmean`等指标。
-在`./output/re_layoutxlm/`文件夹中会保存训练日志，最优的模型和最新epoch的模型。
+When training finishes, metrics such as `precision`, `recall`, and `hmean` are printed.
+The training log, the best model, and the model from the latest epoch are saved in the `./output/re_layoutxlm/` folder.

-* 恢复训练
+* Resume training

-恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To resume training, assign the folder path of the previously trained model to the `Architecture.Backbone.checkpoints` field.

 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
-```
+```

-* 评估
+* Evaluate

-评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To evaluate, assign the folder path of the model to be evaluated to the `Architecture.Backbone.checkpoints` field.

 ```shell
 CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
-```
-最终会打印出`precision`, `recall`, `hmean`等指标
+```
+When evaluation finishes, metrics such as `precision`, `recall`, and `hmean` are printed.

-* 使用`OCR引擎 + SER + RE`串联预测
+* Cascaded prediction with `OCR engine + SER + RE`

-使用如下命令即可完成`OCR引擎 + SER + RE`的串联预测
+Use the following command to run cascaded `OCR engine + SER + RE` prediction, taking the pretrained SER and RE models as an example:
 ```shell
 export CUDA_VISIBLE_DEVICES=0
-python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=ser_LayoutXLM_xfun_zh/
-```
+python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+```

-最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件，预测结果文本文件名为`infer_results.txt`。
+The visualized prediction image and the prediction result text file are saved in the directory configured by the `config.Global.save_res_path` field. The text file is named `infer_results.txt`.

-## 6. 参考链接
+## 6. Reference Links

 - LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
 - microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
 - XFUND dataset, https://github.com/doc-analysis/XFUND
+
+## License
+
+The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
--- a/ppstructure/vqa/README_ch.md
+++ b/ppstructure/vqa/README_ch.md
+[English](README.md) | 简体中文
+
+- [文档视觉问答（DOC-VQA）](#文档视觉问答doc-vqa)
+  - [1. 简介](#1-简介)
+  - [2. 性能](#2-性能)
+  - [3. 效果演示](#3-效果演示)
+    - [3.1 SER](#31-ser)
+    - [3.2 RE](#32-re)
+  - [4. 安装](#4-安装)
+    - [4.1 安装依赖](#41-安装依赖)
+    - [4.2 安装PaddleOCR（包含 PP-OCR 和 VQA）](#42-安装paddleocr包含-pp-ocr-和-vqa)
+  - [5. 使用](#5-使用)
+    - [5.1 数据和预训练模型准备](#51-数据和预训练模型准备)
+    - [5.2 SER](#52-ser)
+    - [5.3 RE](#53-re)
+  - [6. 参考链接](#6-参考链接)
+
+# 文档视觉问答（DOC-VQA）
+
+## 1. 简介
+
+VQA指视觉问答，主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种，DOC-VQA主要针对文本图像的文字内容提出问题。
+
+PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
+
+主要特性如下：
+
+- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
+- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务，可以完成对图像中的文本识别与分类；基于 RE 任务，可以完成对图象中的文本内容的关系提取，如判断问题对(pair)。
+- 支持SER任务和RE任务的自定义训练。
+- 支持OCR+SER的端到端系统预测与评估。
+- 支持OCR+SER+RE的端到端系统预测。
+
+本项目是 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) 在 Paddle 2.2上的开源实现，
+包含了在 [XFUND数据集](https://github.com/doc-analysis/XFUND) 上的微调代码。
+
+## 2. 性能
+
+我们在 [XFUND](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估，性能如下
+
+| 模型 | 任务 | hmean | 模型下载地址 |
+|:---:|:---:|:---:| :---:|
+| LayoutXLM | SER | 0.9038 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
+| LayoutXLM | RE | 0.7483 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
+| LayoutLMv2 | SER | 0.8544 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
+| LayoutLMv2 | RE | 0.6777 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+| LayoutLM | SER | 0.7731 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+
+## 3. 效果演示
+
+**注意：** 测试图片来源于XFUND数据集。
+
+### 3.1 SER
+
+![](../../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../../doc/vqa/result_ser/zh_val_42_ser.jpg)
+---|---
+
+图中不同颜色的框表示不同的类别，对于XFUND数据集，有`QUESTION`, `ANSWER`, `HEADER` 3种类别
+
+* 深紫色：HEADER
+* 浅紫色：QUESTION
+* 军绿色：ANSWER
+
+在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+
+### 3.2 RE
+
+![](../../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../../doc/vqa/result_re/zh_val_40_re.jpg)
+---|---
+
+
+图中红色框表示问题，蓝色框表示答案，问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+
+## 4. 安装
+
+### 4.1 安装依赖
+
+- **（1) 安装PaddlePaddle**
+
+```bash
+python3 -m pip install --upgrade pip
+
+# GPU安装
+python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+# CPU安装
+python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+```
+更多需求，请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+### 4.2 安装PaddleOCR（包含 PP-OCR 和 VQA）
+
+- **（1）pip快速安装PaddleOCR whl包（仅预测）**
+
+```bash
+python3 -m pip install paddleocr
+```
+
+- **（2）下载VQA源码（预测+训练）**
+
+```bash
+【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
+
+# 如果因为网络问题无法pull成功，也可选择使用码云上的托管：
+git clone https://gitee.com/paddlepaddle/PaddleOCR
+
+# 注：码云托管代码可能无法实时同步本github项目更新，存在3~5天延时，请优先使用推荐方式。
+```
+
+- **（3）安装VQA的`requirements`**
+
+```bash
+python3 -m pip install -r ppstructure/vqa/requirements.txt
+```
+
+## 5. 使用
+
+### 5.1 数据和预训练模型准备
+
+如果希望直接体验预测过程，可以下载我们提供的预训练模型，跳过训练过程，直接预测即可。
+
+* 下载处理好的数据集
+
+处理好的XFUND中文数据集下载地址：[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)。
+
+
+下载并解压该数据集，解压后将数据集放置在当前目录下。
+
+```shell
+wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
+```
+
+* 转换数据集
+
+若需进行其他XFUND数据集的训练，可使用下面的命令进行数据集的转换
+
+```bash
+python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
+```
+
+* 下载预训练模型
+```bash
+mkdir pretrain && cd pretrain
+#下载SER模型
+wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
+#下载RE模型
+wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
+cd ../
+```
+
+### 5.2 SER
+
+启动训练之前，需要修改下面的四个字段
+
+1. `Train.dataset.data_dir`：指向训练集图片存放目录
+2. `Train.dataset.label_file_list`：指向训练集标注文件
+3. `Eval.dataset.data_dir`：指向验证集图片存放目录
+4. `Eval.dataset.label_file_list`：指向验证集标注文件
+
+* 启动训练
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml
+```
+
+最终会打印出`precision`, `recall`, `hmean`等指标。
+在`./output/ser_layoutxlm/`文件夹中会保存训练日志，最优的模型和最新epoch的模型。
+
+* 恢复训练
+
+恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+
+* 评估
+
+评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+最终会打印出`precision`, `recall`, `hmean`等指标
+
+* 使用`OCR引擎 + SER`串联预测
+
+使用如下命令即可完成`OCR引擎 + SER`的串联预测, 以SER预训练模型为例:
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml  -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
+```
+
+最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件，预测结果文本文件名为`infer_results.txt`。
+
+* 对`OCR引擎 + SER`预测系统进行端到端评估
+
+首先使用 `tools/infer_vqa_token_ser.py` 脚本完成数据集的预测，然后使用下面的命令进行评估。
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json  --pred_json_path output_res/infer_results.txt
+```
+
+### 5.3 RE
+
+* 启动训练
+
+启动训练之前，需要修改下面的四个字段
+
+1. `Train.dataset.data_dir`：指向训练集图片存放目录
+2. `Train.dataset.label_file_list`：指向训练集标注文件
+3. `Eval.dataset.data_dir`：指向验证集图片存放目录
+4. `Eval.dataset.label_file_list`：指向验证集标注文件
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
+```
+
+最终会打印出`precision`, `recall`, `hmean`等指标。
+在`./output/re_layoutxlm/`文件夹中会保存训练日志，最优的模型和最新epoch的模型。
+
+* 恢复训练
+
+恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+
+* 评估
+
+评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+最终会打印出`precision`, `recall`, `hmean`等指标
+
+* 使用`OCR引擎 + SER + RE`串联预测
+
+使用如下命令即可完成`OCR引擎 + SER + RE`的串联预测, 以预训练SER和RE模型为例：
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+```
+
+最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件，预测结果文本文件名为`infer_results.txt`。
+
+## 6. 参考链接
+
+- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
+- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
+- XFUND dataset, https://github.com/doc-analysis/XFUND
+
+## License
+
+The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
--- a/ppstructure/vqa/helper/eval_with_label_end2end.py
+++ b/ppstructure/vqa/helper/eval_with_label_end2end.py
--- a/ppstructure/vqa/helper/trans_xfun_data.py
+++ b/ppstructure/vqa/helper/trans_xfun_data.py
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,4 +12,3 @@ cython
 lxml
 premailer
 openpyxl
-fasttext==0.9.1
--- a/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt
+++ b/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt
 ===========================train_params===========================
-model_name:PPOCRv2_ocr_rec_pact
+model_name:ch_PPOCRv2_rec_PACT
 python:python3.7
 gpu_list:0|0,1
 Global.use_gpu:True|True
 Global.auto_cast:fp32
-Global.epoch_num:lite_train_lite_infer=3|whole_train_whole_infer=300
+Global.epoch_num:lite_train_lite_infer=6|whole_train_whole_infer=300
 Global.save_model_dir:./output/
 Train.loader.batch_size_per_card:lite_train_lite_infer=16|whole_train_whole_infer=128
-Global.pretrained_model:null
+Global.pretrained_model:pretrain_models/ch_PP-OCRv2_rec_train/best_accuracy
 train_model_name:latest
 train_infer_img_dir:./inference/rec_inference
 null:null

--- a/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt
+++ b/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt
 ===========================train_params===========================
 model_name:det_mv3_east_v2.0
 python:python3.7
-gpu_list:0
+gpu_list:0|0,1
 Global.use_gpu:True|True
 Global.auto_cast:fp32
 Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500
 Global.save_model_dir:./output/
 Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
-Global.pretrained_model:null
+Global.pretrained_model:./pretrain_models/det_mv3_east_v2.0_train/best_accuracy
 train_model_name:latest
 train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
 null:null

--- a/test_tipc/configs/rec_r31_sar/train_infer_python.txt
+++ b/test_tipc/configs/rec_r31_sar/train_infer_python.txt
@@ -50,4 +50,4 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.t
 --benchmark:True
 null:null
 ===========================infer_benchmark_params==========================
-random_infer_input:[{float32,[3,48,48,160]}]
+random_infer_input:[{float32,[3,48,160]}]
--- a/test_tipc/prepare.sh
+++ b/test_tipc/prepare.sh
@@ -60,9 +60,13 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
    ln -s ./icdar2015_lite ./icdar2015
    cd ../
    cd ./inference && tar xf rec_inference.tar && cd ../
-    if [ ${model_name} == "ch_PPOCRv2_det" ]; then
-        wget  -nc -P  ./pretrain_models/  https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar --no-check-certificate
-        cd ./pretrain_models/ && tar xf ch_ppocr_mobile_v2.0_det_train.tar  && cd ../
+    if [ ${model_name} == "ch_PPOCRv2_det" ] || [ ${model_name} == "ch_PPOCRv2_det_PACT" ]; then
+        wget  -nc -P  ./pretrain_models/  https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar --no-check-certificate
+        cd ./pretrain_models/ && tar xf ch_ppocr_server_v2.0_det_train.tar  && cd ../
+    fi
+    if [ ${model_name} == "ch_PPOCRv2_rec" ] || [ ${model_name} == "ch_PPOCRv2_rec_PACT" ]; then
+        wget  -nc -P  ./pretrain_models/  https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar --no-check-certificate
+        cd ./pretrain_models/ && tar xf ch_PP-OCRv2_rec_train.tar && cd ../
    fi
    if [ ${model_name} == "det_r18_db_v2_0" ]; then
        wget -nc -P ./pretrain_models/  https://paddleocr.bj.bcebos.com/pretrained/ResNet18_vd_pretrained.pdparams  --no-check-certificate
@@ -91,6 +95,10 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
        wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar --no-check-certificate
        cd ./pretrain_models/ && tar xf ch_ppocr_mobile_v2.0_rec_train.tar && cd ../
    fi
+    if [ ${model_name} == "det_mv3_east_v2.0" ]; then
+        wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar --no-check-certificate
+        cd ./pretrain_models/ && tar xf det_mv3_east_v2.0_train.tar && cd ../
+    fi

 elif [ ${MODE} = "whole_train_whole_infer" ];then
    wget -nc -P  ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams --no-check-certificate

--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -312,12 +312,22 @@ def create_predictor(args, mode, logger):
        input_names = predictor.get_input_names()
        for name in input_names:
            input_tensor = predictor.get_input_handle(name)
-        output_names = predictor.get_output_names()
-        output_tensors = []
+        output_tensors = get_output_tensors(args, mode, predictor)
+        return predictor, input_tensor, output_tensors, config
+
+
+def get_output_tensors(args, mode, predictor):
+    output_names = predictor.get_output_names()
+    output_tensors = []
+    if mode == "rec" and args.rec_algorithm == "CRNN":
+        output_name = 'softmax_0.tmp_0'
+        if output_name in output_names:
+            return [predictor.get_output_handle(output_name)]
+    else:
        for output_name in output_names:
            output_tensor = predictor.get_output_handle(output_name)
            output_tensors.append(output_tensor)
-        return predictor, input_tensor, output_tensors, config
+    return output_tensors


 def get_infer_gpuid():
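
One edge case in the `get_output_tensors` hunk above: when `mode == "rec"` and `args.rec_algorithm == "CRNN"` but `softmax_0.tmp_0` is not among the output names, the `else` loop is skipped and an empty list is returned. The sketch below shows a variant that falls back to all outputs in that case. `FakePredictor` is a hypothetical stand-in so the example runs without Paddle, and its handles are plain strings rather than real tensors:

```python
from types import SimpleNamespace


class FakePredictor:
    """Hypothetical stand-in for the inference predictor used in utility.py."""

    def __init__(self, names):
        self._names = names

    def get_output_names(self):
        return list(self._names)

    def get_output_handle(self, name):
        return "handle:" + name


def get_output_tensors(args, mode, predictor):
    output_names = predictor.get_output_names()
    if mode == "rec" and args.rec_algorithm == "CRNN":
        output_name = "softmax_0.tmp_0"
        if output_name in output_names:
            return [predictor.get_output_handle(output_name)]
    # Fallback: one handle per output, which also covers a CRNN model
    # whose softmax output has a different name.
    return [predictor.get_output_handle(name) for name in output_names]


args = SimpleNamespace(rec_algorithm="CRNN")
print(get_output_tensors(args, "rec", FakePredictor(["softmax_0.tmp_0", "other"])))
# ['handle:softmax_0.tmp_0']
print(get_output_tensors(args, "rec", FakePredictor(["other"])))
# ['handle:other']
```

Whether the fallback is the desired behavior depends on how callers treat the returned list; the sketch only illustrates the control flow.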