diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md
index 9bf898fd79b6b1642ce20fabda3009708473c354..f9c736d56e0b1b2a9b0a270149404c6afd4ec2bf 100644
--- a/PPOCRLabel/README_ch.md
+++ b/PPOCRLabel/README_ch.md
@@ -86,17 +86,18 @@ PPOCRLabel --lang ch --kie True # 启动 【KIE 模式】,用于打【检测+
> 如果上述安装出现问题,可以参考3.6节 错误提示
-#### 1.2.2 本地构建whl包并安装
+#### 1.2.2 通过Python脚本运行PPOCRLabel
+
+如果您对PPOCRLabel文件有所更改(例如指定新的内置模型),通过Python脚本运行会更加方便地看到更改的结果。如果仍然需要通过whl包启动,则需要参考下节重新编译whl包。
```bash
-cd PaddleOCR/PPOCRLabel
-python3 setup.py bdist_wheel
-pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple
+cd ./PPOCRLabel # 切换到PPOCRLabel目录
+python PPOCRLabel.py --lang ch
```
-#### 1.2.3 通过Python脚本运行PPOCRLabel
+#### 1.2.3 本地构建whl包并安装
-如果您对PPOCRLabel文件有所更改,通过Python脚本运行会更加方面的看到更改的结果
+编译与安装新的whl包,其中1.0.2为版本号,可在 `setup.py` 中指定新版本。
```bash
cd ./PPOCRLabel # 切换到PPOCRLabel目录
@@ -107,7 +108,6 @@ python PPOCRLabel.py --lang ch --kie True # 启动 【KIE 模式】,用于打
```
-
## 2. 使用
### 2.1 操作步骤
diff --git a/README.md b/README.md
index 95f35277a1d634c87d5720c7151d066b09dbdae7..99d20357b427477c13a560b54849dcbb831d0ee0 100644
--- a/README.md
+++ b/README.md
@@ -112,6 +112,10 @@ For a new language request, please refer to [Guideline for new language_requests
- [Text Recognition](./doc/doc_en/recognition_en.md)
- [Text Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
+ - PP-OCR Models Compression
+ - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md)
+ - [Model Quantization](./deploy/slim/quantization/README_en.md)
+ - [Model Pruning](./deploy/slim/prune/README_en.md)
- Inference and Deployment
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
- [Serving](./deploy/pdserving/README.md)
diff --git a/README_ch.md b/README_ch.md
index 3788f9f0d4003a0a8aa636cd1dd6148936598411..6eb57fecebb6e9b22dd09f730c84f9f15ac8dc9c 100755
--- a/README_ch.md
+++ b/README_ch.md
@@ -79,22 +79,26 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 文档教程
- [运行环境准备](./doc/doc_ch/environment.md)
-- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
+- [快速开始(中英文/多语言/版面分析)](./doc/doc_ch/quickstart.md)
- [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md)
- PP-OCR产业落地:从训练到部署
- [PP-OCR模型库](./doc/doc_ch/models.md)
- [PP-OCR模型下载](./doc/doc_ch/models_list.md)
- - [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md)
+ - [Python引擎的PP-OCR模型库推理](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR模型训练](./doc/doc_ch/training.md)
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [文本方向分类器](./doc/doc_ch/angle_class.md)
- - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
+ - PP-OCR模型压缩
+ - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
+ - [模型量化](./deploy/slim/quantization/README.md)
+ - [模型裁剪](./deploy/slim/prune/README.md)
- PP-OCR模型推理部署
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/pdserving/README_CN.md)
- [端侧部署](./deploy/lite/readme.md)
+ - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md)
- [Benchmark](./doc/doc_ch/benchmark.md)
- [PP-Structure信息提取](./ppstructure/README_ch.md)
- [版面分析](./ppstructure/layout/README_ch.md)
diff --git a/deploy/paddle2onnx/readme.md b/deploy/paddle2onnx/readme.md
index e08f2adee5d315cecba703ecdf515c09cd1569d2..8e821892142d65caddd6fa3bd8ff24a372fe9a5d 100644
--- a/deploy/paddle2onnx/readme.md
+++ b/deploy/paddle2onnx/readme.md
@@ -1,4 +1,4 @@
-# paddle2onnx 模型转化与预测
+# Paddle2ONNX模型转化与预测
本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型,并基于 ONNXRuntime 引擎预测。
diff --git a/deploy/pdserving/README_CN.md b/deploy/pdserving/README_CN.md
index ee83b73b851d6188072bdb79d6130a809c3823e0..afd355bac098a3c13c36476e2967d8f94e8cd306 100644
--- a/deploy/pdserving/README_CN.md
+++ b/deploy/pdserving/README_CN.md
@@ -8,8 +8,7 @@ PaddleOCR提供2种服务部署方式:
# 基于PaddleServing的服务部署
-本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PPOCR
-动态图模型的pipeline在线服务。
+本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。
相比较于hubserving部署,PaddleServing具备以下优点:
- 支持客户端和服务端之间高并发和高效通信
@@ -59,7 +58,7 @@ pip3 install paddle_serving_app-0.7.0-py3-none-any.whl
使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。
-首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
+首先,下载PP-OCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
```bash
# 下载并解压 OCR 文本检测模型
@@ -107,7 +106,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
1. 下载PaddleOCR代码,若已下载可跳过此步骤
```
git clone https://github.com/PaddlePaddle/PaddleOCR
-
+
# 进入到工作目录
cd PaddleOCR/deploy/pdserving/
```
diff --git a/deploy/slim/prune/README.md b/deploy/slim/prune/README.md
index c438572318f57fdfe9066ff2135156d7129bee4c..6d04f1648705071d70c1e9f17cd30d6825f92467 100644
--- a/deploy/slim/prune/README.md
+++ b/deploy/slim/prune/README.md
@@ -1,5 +1,5 @@
-## 介绍
+# PP-OCR模型裁剪
-复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
+复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移除网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度、提高模型推理性能的目的。
本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
@@ -7,13 +7,13 @@
在开始本教程之前,建议先了解:
-1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)
+1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)
2. [模型裁剪教程](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md)
-
## 快速开始
模型裁剪主要包括四个步骤:
+
1. 安装 PaddleSlim
2. 准备训练好的模型
3. 敏感度分析、裁剪训练
@@ -35,17 +35,20 @@ python3 setup.py install
加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sen.pickle,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/9b01b195f0c4bc34a1ab434751cb260e13d64d9e/paddleslim/dygraph/prune/filter_pruner.py#L75)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。
敏感度文件内容格式:
- sen.pickle(Dict){
+```
+sen.pickle(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
- 例子:
+例子:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
-加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md)
+```
+
+加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
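+
+例如,可以直接使用 pickle 读取敏感度文件,查看某一网络层在给定裁剪比例下的精度损失(示意代码,层名取自上面的例子):
+
+```python
+import pickle
+
+# 加载敏感度分析步骤生成的文件
+with open("sen.pickle", "rb") as f:
+    sen = pickle.load(f)
+
+# conv10_expand_weights 在裁剪比例0.1下的精度损失(约0.0065)
+print(sen["conv10_expand_weights"][0.1])
+```
+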
进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练:
```bash
diff --git a/deploy/slim/prune/README_en.md b/deploy/slim/prune/README_en.md
index f8fbed47ca1c788ea816cc76f1092b17f0ea5219..ceb9e204a74b5460d74766e8df154d22f39bd843 100644
--- a/deploy/slim/prune/README_en.md
+++ b/deploy/slim/prune/README_en.md
@@ -1,5 +1,5 @@
-## Introduction
+# PP-OCR Models Pruning
-Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
+Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model. Model pruning is a technique that reduces this redundancy by removing sub-models from the neural network, so as to reduce model computation complexity and improve model inference performance.
@@ -37,25 +37,27 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.
After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md)
The data format of sensitivity file:
- sen.pickle(Dict){
+
+```
+sen.pickle(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
-
- example:
+example:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
-The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md)
+```
+
+The function returns a dict after loading the sensitivity file. The keys of the dict are the parameter names of each layer, and the values store the pruning sensitivity information of the corresponding layer. In the example, pruning 10% of the filters of the layer corresponding to conv10_expand_weights would degrade model performance by 0.65%. The details can be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
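+
+For example, the sensitivity file can be inspected directly with `pickle` (a minimal sketch; the layer name is taken from the example above):
+
+```python
+import pickle
+
+# load the sensitivity file produced by the analysis step
+with open("sen.pickle", "rb") as f:
+    sen = pickle.load(f)
+
+# accuracy loss of conv10_expand_weights at a 10% pruning ratio (~0.0065)
+print(sen["conv10_expand_weights"][0.1])
+```
+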
Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command:
```bash
-
python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model" Global.save_model_dir=./output/prune_model/
-
```
diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md
index 62bc408f5eeda6d8366834200e8d8a20d1dc82cd..8d3f779e0028a62d8396601166283f0ee54d43a7 100644
--- a/deploy/slim/quantization/README.md
+++ b/deploy/slim/quantization/README.md
@@ -1,12 +1,12 @@
-## 介绍
+# PP-OCR模型量化
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。
本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。
-在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
+在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
## 快速开始
diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md
index d3bf12d625b076c7bc18016bc9973d1212b3d70b..3f1fe67c9aa3b0b95ee006d97e39e3ce6a19ca22 100644
--- a/deploy/slim/quantization/README_en.md
+++ b/deploy/slim/quantization/README_en.md
@@ -1,5 +1,5 @@
-## Introduction
+# PP-OCR Models Quantization
Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model.
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number,
diff --git a/deploy/slim/quantization/export_model.py b/deploy/slim/quantization/export_model.py
index 34cf80f5e5566707a08d15ddeaaa51348dcd9acf..bbd291c3347929bf394d7859e277286cb4932042 100755
--- a/deploy/slim/quantization/export_model.py
+++ b/deploy/slim/quantization/export_model.py
@@ -127,6 +127,7 @@ def main():
arch_config = config["Architecture"]
if arch_config["algorithm"] in ["Distillation", ]: # distillation model
for idx, name in enumerate(model.model_name_list):
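+            # switch each sub-model to eval mode before export so that BN/Dropout layers behave deterministically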
+ model.model_list[idx].eval()
sub_model_save_path = os.path.join(save_path, name, "inference")
export_single_model(quanter, model.model_list[idx], infer_shape,
sub_model_save_path, logger)
diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md
index 3a266c4bb8fe5516f844bea9f0aa21359d51660e..23bec4b978ab34f144a2ec7256e09412f5440646 100644
--- a/doc/doc_ch/environment.md
+++ b/doc/doc_ch/environment.md
@@ -1,20 +1,19 @@
# 运行环境准备
-Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。
+Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建Python环境。
推荐环境:
-- PaddlePaddle >= 2.0.0 (2.1.2)
-- python3.7
+- PaddlePaddle >= 2.1.2
+- Python 3.7
- CUDA10.1 / CUDA10.2
- CUDNN 7.6
-如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。
+> 如果您已经安装了Python环境,可以直接参考[PaddleOCR快速开始](./quickstart.md)。
* [1. Python环境搭建](#1)
+ [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3)
-* [2. 安装PaddlePaddle](#2)
@@ -212,7 +211,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh
# 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本
- ```
+ ```
- 安装Anaconda:
@@ -311,21 +310,3 @@ sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=hos
# ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令
sudo docker container exec -it ppocr /bin/bash
```
-
-
-
-## 2. 安装PaddlePaddle
-
-- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
-
-```bash
-python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
-```
-
-- 如果您的机器是CPU,请运行以下命令安装
-
-```bash
-python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
-```
-
-更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
diff --git a/doc/doc_ch/finetune.md b/doc/doc_ch/finetune.md
new file mode 100644
index 0000000000000000000000000000000000000000..e8f146aadc079444c37e000d16ada8b6bda8ba18
--- /dev/null
+++ b/doc/doc_ch/finetune.md
@@ -0,0 +1,170 @@
+# 模型微调
+
+## 1. 模型微调背景与意义
+
+PaddleOCR提供的PP-OCR系列模型在通用场景中性能优异,能够解决绝大多数情况下的检测与识别问题。在垂类场景中,如果希望获取更优的模型效果,可以通过模型微调的方法,进一步提升PP-OCR系列检测与识别模型的精度。
+
+本文主要介绍文本检测与识别模型在模型微调时的一些注意事项,最终希望您在自己的场景中,通过模型微调,可以获取精度更高的文本检测与识别模型。
+
+本文核心要点如下所示。
+
+1. PP-OCR提供的预训练模型有较好的泛化能力
+2. 加入少量真实数据(检测任务>=500张, 识别任务>=5000张),会大幅提升垂类场景的检测与识别效果
+3. 在模型微调时,加入真实通用场景数据,可以进一步提升模型精度与泛化性能
+4. 在图像检测任务中,增大图像的预测尺度,能够进一步提升较小文字区域的检测效果
+5. 在模型微调时,需要适当调整超参数(学习率,batch size最为重要),以获得更优的微调效果。
+
+更多详细内容,请参考第2章与第3章。
+
+## 2. 文本检测模型微调
+
+### 2.1 数据选择
+
+* 数据量:建议至少准备500张的文本检测数据集用于模型微调。
+
+* 数据标注:单行文本标注格式,建议标注的检测框与实际语义内容一致。如在火车票场景中,姓氏与名字可能离得较远,但是它们在语义上属于同一个检测字段,这里也需要将整个姓名标注为1个检测框。
+
+### 2.2 模型选择
+
+建议选择PP-OCRv2模型(配置文件:[ch_PP-OCRv2_det_student.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml),预训练模型:[ch_PP-OCRv2_det_distill_train.tar](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar))进行微调,其精度与泛化性能是目前提供的最优预训练模型。
+
+更多PP-OCR系列模型,请参考[PaddleOCR 首页说明文档](../../README_ch.md)。
+
+注意:在使用上述预训练模型的时候,由于保存的模型中包含教师模型,因此需要将其中的学生模型单独提取出来,再加载学生模型即可进行模型微调。
+
+```python
+import paddle
+# 加载完整的检测预训练模型
+a = paddle.load("ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
+# 提取学生模型的参数
+b = {k[len("student_model."):]: a[k] for k in a if "student_model." in k}
+# 保存模型,用于后续模型微调
+paddle.save(b, "ch_PP-OCRv2_det_student.pdparams")
+```
+
+
+### 2.3 训练超参选择
+
+在模型微调的时候,最重要的超参就是预训练模型路径`pretrained_model`, 学习率`learning_rate`与`batch_size`,部分配置文件如下所示。
+
+```yaml
+Global:
+ pretrained_model: ./pretrain_models/student.pdparams # 预训练模型路径
+Optimizer:
+ lr:
+ name: Cosine
+ learning_rate: 0.001 # 学习率
+ warmup_epoch: 2
+ regularizer:
+ name: 'L2'
+ factor: 0
+
+Train:
+ loader:
+ shuffle: True
+ drop_last: False
+ batch_size_per_card: 8 # 单卡batch size
+ num_workers: 4
+```
+
+上述配置文件中,首先需要将`pretrained_model`字段指定为2.2章节中提取出来的`ch_PP-OCRv2_det_student.pdparams`文件路径。
+
+PaddleOCR提供的配置文件是在8卡训练(相当于总的batch size是`8*8=64`)、且没有加载预训练模型情况下的配置文件,因此您的场景中,学习率与总的batch size需要对应线性调整,例如:
+
+* 如果您的场景中是单卡训练,单卡batch_size=8,则总的batch_size=8,建议将学习率调整为`1e-4`左右。
+* 如果您的场景中是单卡训练,由于显存限制,只能设置单卡batch_size=4,则总的batch_size=4,建议将学习率调整为`5e-5`左右。
+
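+下面给出线性缩放规则的一个简单示意(仅为说明性代码,取值需按您的实际训练配置调整):
+
+```python
+# 线性缩放规则:新学习率 = 基准学习率 × (新总batch size / 基准总batch size)
+base_lr = 0.001   # 配置文件中的基准学习率
+base_bs = 8 * 8   # 基准总batch size(8卡 × 单卡batch size 8)
+new_bs = 1 * 8    # 示例:单卡训练,单卡batch size为8
+new_lr = base_lr * new_bs / base_bs
+print(f"建议学习率约为 {new_lr:.2e}")  # 1.25e-04,与上面建议的1e-4量级一致
+```
+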
+### 2.4 预测超参选择
+
+对训练好的模型导出并进行推理时,可以通过进一步调整预测的图像尺度,来提升小面积文本的检测效果,下面是DBNet推理时的一些超参数,可以通过适当调整,提升效果。
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+| det_db_thresh | float | 0.3 | DB输出的概率图中,得分大于该阈值的像素点才会被认为是文字像素点 |
+| det_db_box_thresh | float | 0.6 | 检测结果边框内,所有像素点的平均得分大于该阈值时,该结果会被认为是文字区域 |
+| det_db_unclip_ratio | float | 1.5 | `Vatti clipping`算法的扩张系数,使用该方法对文字区域进行扩张 |
+| max_batch_size | int | 10 | 预测的batch size |
+| use_dilation | bool | False | 是否对分割结果进行膨胀以获取更优检测效果 |
+| det_db_score_mode | str | "fast" | DB的检测结果得分计算方法,支持`fast`和`slow`,`fast`是根据polygon的外接矩形边框内的所有像素计算平均得分,`slow`是根据原始polygon内的所有像素计算平均得分,计算速度相对较慢一些,但是更加准确一些。 |
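+
+例如,基于Paddle Inference使用检测模型推理时,可以通过命令行参数调整上述超参数(示意命令,模型与图片路径需替换为实际路径):
+
+```bash
+python3 tools/infer/predict_det.py \
+    --det_model_dir="./ch_PP-OCRv2_det_infer/" \
+    --image_dir="./doc/imgs/00018069.jpg" \
+    --det_db_unclip_ratio=2.0 \
+    --use_dilation=True \
+    --det_db_score_mode="slow"
+```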
+
+
+更多关于推理方法的介绍可以参考[Paddle Inference推理教程](./inference.md)。
+
+
+## 3. 文本识别模型微调
+
+
+### 3.1 数据选择
+
+* 数据量:不更换字典的情况下,建议至少准备5000张的文本识别数据集用于模型微调;如果更换了字典(不建议),需要的数量更多。
+
+* 数据分布:建议分布与实测场景尽量一致。如果实测场景包含大量短文本,则训练数据中建议也包含较多短文本,如果实测场景对于空格识别效果要求较高,则训练数据中建议也包含较多带空格的文本内容。
+
+
+* 通用中英文数据:在训练的时候,可以在训练集中添加通用真实数据(如在不更换字典的微调场景中,建议添加LSVT、RCTW、MTWI等真实数据),进一步提升模型的泛化性能。
+
+### 3.2 模型选择
+
+建议选择PP-OCRv2模型(配置文件:[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml),预训练模型:[ch_PP-OCRv2_rec_train.tar](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar))进行微调,其精度与泛化性能是目前提供的最优预训练模型。
+
+更多PP-OCR系列模型,请参考[PaddleOCR 首页说明文档](../../README_ch.md)。
+
+
+### 3.3 训练超参选择
+
+与文本检测任务微调相同,在识别模型微调的时候,最重要的超参就是预训练模型路径`pretrained_model`, 学习率`learning_rate`与`batch_size`,部分默认配置文件如下所示。
+
+```yaml
+Global:
+ pretrained_model: # 预训练模型路径
+Optimizer:
+ lr:
+ name: Piecewise
+ decay_epochs : [700, 800]
+ values : [0.001, 0.0001] # 学习率
+ warmup_epoch: 5
+ regularizer:
+ name: 'L2'
+ factor: 0
+
+Train:
+ dataset:
+ name: SimpleDataSet
+ data_dir: ./train_data/
+ label_file_list:
+ - ./train_data/train_list.txt
+ ratio_list: [1.0] # 采样比例,默认值是[1.0]
+ loader:
+ shuffle: True
+ drop_last: False
+ batch_size_per_card: 128 # 单卡batch size
+ num_workers: 8
+
+```
+
+
+上述配置文件中,首先需要将`pretrained_model`字段指定为3.2章节中解压得到的`ch_PP-OCRv2_rec_train/best_accuracy.pdparams`文件路径。
+
+PaddleOCR提供的配置文件是在8卡训练(相当于总的batch size是`8*128=1024`)、且没有加载预训练模型情况下的配置文件,因此您的场景中,学习率与总的batch size需要对应线性调整,例如:
+
+* 如果您的场景中是单卡训练,单卡batch_size=128,则总的batch_size=128,在加载预训练模型的情况下,建议将学习率调整为`[1e-4, 2e-5]`左右(piecewise学习率策略,需设置2个值,下同)。
+* 如果您的场景中是单卡训练,因为显存限制,只能设置单卡batch_size=64,则总的batch_size=64,在加载预训练模型的情况下,建议将学习率调整为`[5e-5, 1e-5]`左右。
+
+
+如果有通用真实场景数据加进来,建议每个epoch中,垂类场景数据与真实场景的数据量保持在1:1左右。
+
+比如:您自己的垂类场景识别数据量为1W,数据标签文件为`vertical.txt`,收集到的通用场景识别数据量为10W,数据标签文件为`general.txt`。
+
+
+那么,可以设置`label_file_list`和`ratio_list`参数如下所示。每个epoch中,`vertical.txt`中会进行全采样(采样比例为1.0),包含1W条数据;`general.txt`中会按照0.1的采样比例进行采样,包含`10W*0.1=1W`条数据,最终二者的比例为`1:1`。
+
+```yaml
+Train:
+ dataset:
+ name: SimpleDataSet
+ data_dir: ./train_data/
+ label_file_list:
+ - vertical.txt
+ - general.txt
+ ratio_list: [1.0, 0.1]
+```
diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md
index c02da14af495cd807668dca6d7f3823d1de6820d..ade1a2dbdf728ac785efef3e5a82b4c932674b87 100755
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md
@@ -36,6 +36,8 @@ inference 模型(`paddle.jit.save`保存的模型)
- [六、参数解释](#参数解释)
+- [七、FAQ](#FAQ)
+
## 一、训练模型转inference模型
@@ -520,3 +522,9 @@ PSE算法相关参数如下
| label_list | list | ['0', '180'] | class id对应的角度值 |
| cls_batch_num | int | 6 | 方向分类器预测的batch size |
| cls_thresh | float | 0.9 | 预测阈值,模型预测结果为180度,且得分大于该阈值时,认为最终预测结果为180度,需要翻转 |
+
+
+
+## 七、FAQ
+
+* 如果是使用paddle 2.0之前版本的代码导出的`inference模型`,其文件名为`model`与`params`,分别对应paddle 2.0及之后版本导出的`inference.pdmodel`与`inference.pdiparams`;目前PaddleOCR的release分支已不再支持paddle 2.0之前版本导出的inference模型,如需使用,请参考develop分支(静态图分支)的代码与文档。
diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md
index 49b0e7743ebd52ac6d4afa7cc665df71d8d16b7f..57931aa26143f2f442f3e4d579abc2549c11322b 100644
--- a/doc/doc_ch/quickstart.md
+++ b/doc/doc_ch/quickstart.md
@@ -1,7 +1,7 @@
-# PaddleOCR快速开始
-
- [PaddleOCR快速开始](#paddleocr快速开始)
- - [1. 安装PaddleOCR whl包](#1-安装paddleocr-whl包)
+ - [1. 安装](#1-安装)
+ - [1.1 安装PaddlePaddle](#11-安装paddlepaddle)
+ - [1.2 安装PaddleOCR whl包](#12-安装paddleocr-whl包)
- [2. 便捷使用](#2-便捷使用)
- [2.1 命令行使用](#21-命令行使用)
- [2.1.1 中英文模型](#211-中英文模型)
@@ -10,10 +10,37 @@
- [2.2 Python脚本使用](#22-python脚本使用)
- [2.2.1 中英文与多语言使用](#221-中英文与多语言使用)
- [2.2.2 版面分析](#222-版面分析)
+ - [3. 小结](#3-小结)
+
+# PaddleOCR快速开始
-## 1. 安装PaddleOCR whl包
+## 1. 安装
+
+
+
+### 1.1 安装PaddlePaddle
+
+> 如果您没有基础的Python运行环境,请参考[运行环境准备](./environment.md)。
+
+- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
+
+ ```bash
+ python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+ ```
+
+- 如果您的机器是CPU,请运行以下命令安装
+
+ ```bash
+ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+ ```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
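+
+安装完成后,可以使用下面的命令进行简单自检(`paddle.utils.run_check()` 是飞桨自带的安装自检接口):
+
+```bash
+python3 -c "import paddle; paddle.utils.run_check()"
+```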
+
+
+
+### 1.2 安装PaddleOCR whl包
```bash
pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
@@ -258,3 +285,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
+
+
+
+## 3. 小结
+
+通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。
+
+PaddleOCR是一套丰富、领先且实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图,然后克隆PaddleOCR项目,正式开启PaddleOCR的应用之旅。
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
index e7e7842273ffe326e8a89cdaa92fbe22ddf518c1..6cdd547517ebb8888374b22c1b52314da53eebab 100644
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -248,7 +248,10 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
| rec_resnet_stn_bilstm_att.yml | SEED | Aster_Resnet | STN | BiLSTM | att |
-*其中SEED模型需要额外加载FastText训练好的[语言模型](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz)
+*其中SEED模型需要额外加载FastText训练好的[语言模型](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz) ,并且安装 fasttext 依赖:
+```
+python3.7 -m pip install fasttext==0.9.1
+```
训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
diff --git a/doc/doc_en/environment_en.md b/doc/doc_en/environment_en.md
index fc87f10c104628df0268bc6f8910c5914aeba225..6521d3c4144aa579be2075d14826e9dcb9ad9dd6 100644
--- a/doc/doc_en/environment_en.md
+++ b/doc/doc_en/environment_en.md
@@ -1,18 +1,19 @@
# Environment Preparation
-Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle.
+Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment.
Recommended working environment:
-- PaddlePaddle >= 2.0.0 (2.1.2)
+- PaddlePaddle >= 2.1.2
- Python 3.7
- CUDA 10.1 / CUDA 10.2
- cuDNN 7.6
+> If you already have a Python environment installed, you can skip to [PaddleOCR Quick Start](./quickstart_en.md).
+
* [1. Python Environment Setup](#1)
+ [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3)
-* [2. Install PaddlePaddle 2.0](#2)
@@ -330,21 +331,3 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags
# ctrl+P+Q to exit docker, to re-enter docker using the following command:
sudo docker container exec -it ppocr /bin/bash
```
-
-
-
-## 2. Install PaddlePaddle 2.0
-
-- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
-
-```bash
-python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
-```
-
-- If you have no available GPU on your machine, please run the following command to install the CPU version
-
-```bash
-python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
-```
-
-For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md
index f7f65089a754ec14bf24e61c2fea6df168052002..8a9c38069f384dcef06db60f6b1266e6eb116d84 100644
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
@@ -1,8 +1,7 @@
-
-# PaddleOCR Quick Start
-
- [PaddleOCR Quick Start](#paddleocr-quick-start)
- - [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
+ - [1. Installation](#1-installation)
+ - [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
+ - [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
- [2. Easy-to-Use](#2-easy-to-use)
- [2.1 Use by Command Line](#21-use-by-command-line)
- [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
@@ -11,12 +10,38 @@
- [2.2 Use by Code](#22-use-by-code)
- [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
- [2.2.2 Layout Analysis](#222-layout-analysis)
+ - [3. Summary](#3-summary)
+
+# PaddleOCR Quick Start
+
+
+
+
+## 1. Installation
+
+
+
+### 1.1 Install PaddlePaddle
+
+> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).
+
+- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
+ ```bash
+ python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+ ```
+
+- If you have no available GPU on your machine, please run the following command to install the CPU version
+
+ ```bash
+ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+ ```
-
+For more software version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
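+
+After installation, you can run a quick sanity check (`paddle.utils.run_check()` is PaddlePaddle's built-in installation self-check):
+
+```bash
+python3 -c "import paddle; paddle.utils.run_check()"
+```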
-## 1. Install PaddleOCR Whl Package
+
+
+### 1.2 Install PaddleOCR Whl Package
```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
@@ -249,3 +274,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
+
+
+
+## 3. Summary
+
+In this section, you have learned how to use the PaddleOCR whl package and obtained initial results.
+
+PaddleOCR is a rich, leading, and practical OCR tool library that covers the whole pipeline of data production, model training, compression, and inference deployment. In the [next section](./paddleOCR_overview_en.md), we will first introduce the overview of PaddleOCR, and then clone the PaddleOCR project to formally start the PaddleOCR application journey.
diff --git a/doc/joinus.PNG b/doc/joinus.PNG
index 774f4ef2f809004950af852e95858da8d5c85a8d..5838a96bc8317178de07a16d246966bf6cc7df63 100644
Binary files a/doc/joinus.PNG and b/doc/joinus.PNG differ
diff --git a/paddleocr.py b/paddleocr.py
index 3a06158b0c92ea70e4320646d38e8c9e9295e9db..d07082f0ddc1133b3e9b3a7a7703d87f7cfeeedb 100644
--- a/paddleocr.py
+++ b/paddleocr.py
@@ -47,7 +47,7 @@ __all__ = [
]
SUPPORT_DET_MODEL = ['DB']
-VERSION = '2.4.0.3'
+VERSION = '2.4.0.4'
SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/")
diff --git a/ppocr/losses/kie_sdmgr_loss.py b/ppocr/losses/kie_sdmgr_loss.py
index 8f2173e49904926ebab2c450890c4fafe3f36b50..745671f58da91c108624097faea72d55c1877f6b 100644
--- a/ppocr/losses/kie_sdmgr_loss.py
+++ b/ppocr/losses/kie_sdmgr_loss.py
@@ -1,4 +1,4 @@
-# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+# Reference: https://github.com/open-mmlab/mmocr/blob/main/mmocr/models/kie/losses/sdmgr_loss.py
+
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
diff --git a/ppocr/metrics/kie_metric.py b/ppocr/metrics/kie_metric.py
index 761965cfcc25d2a6de30342769d01b36d6212d98..f3bce0411d6521b1756892cbd7b4c6fcb7bcfb6c 100644
--- a/ppocr/metrics/kie_metric.py
+++ b/ppocr/metrics/kie_metric.py
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
+# The code is adapted from: https://github.com/open-mmlab/mmocr/blob/main/mmocr/core/evaluation/kie_metric.py
from __future__ import absolute_import
from __future__ import division
diff --git a/ppocr/modeling/heads/kie_sdmgr_head.py b/ppocr/modeling/heads/kie_sdmgr_head.py
index 46ac0ed8dcaccb7628ef87fbe851a2b6acd60d55..ac5f73fa7e5b182faa1456e069da79118d6f7068 100644
--- a/ppocr/modeling/heads/kie_sdmgr_head.py
+++ b/ppocr/modeling/heads/kie_sdmgr_head.py
@@ -1,4 +1,4 @@
-# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
+# Reference: https://github.com/open-mmlab/mmocr/blob/main/mmocr/models/kie/heads/sdmgr_head.py
from __future__ import absolute_import
from __future__ import division
diff --git a/ppocr/modeling/heads/rec_ctc_head.py b/ppocr/modeling/heads/rec_ctc_head.py
index 35d33d5f56b3b378286565cbfa9755f43343b278..6c1cf0659607186d54dfee6983b135f34d542446 100755
--- a/ppocr/modeling/heads/rec_ctc_head.py
+++ b/ppocr/modeling/heads/rec_ctc_head.py
@@ -80,7 +80,6 @@ class CTCHead(nn.Layer):
result = (x, predicts)
else:
result = predicts
-
if not self.training:
predicts = F.softmax(predicts, axis=2)
result = predicts
diff --git a/ppocr/modeling/heads/rec_sar_head.py b/ppocr/modeling/heads/rec_sar_head.py
index a46cce7de2c8e59cf797db96fc6fcb7e25fa549a..3b7674268772d8a332b963fd6b82dfb71ee40212 100644
--- a/ppocr/modeling/heads/rec_sar_head.py
+++ b/ppocr/modeling/heads/rec_sar_head.py
@@ -216,7 +216,7 @@ class ParallelSARDecoder(BaseDecoder):
self.pred_dropout = nn.Dropout(pred_dropout)
pred_num_classes = self.num_classes - 1
if pred_concat:
- fc_in_channel = decoder_rnn_out_size + d_model + d_enc
+ fc_in_channel = decoder_rnn_out_size + d_model + encoder_rnn_out_size
else:
fc_in_channel = d_model
self.prediction = nn.Linear(fc_in_channel, pred_num_classes)
diff --git a/ppocr/postprocess/rec_postprocess.py b/ppocr/postprocess/rec_postprocess.py
index de771acca86a8956b06b366b840aac7e21f835a4..3bc7bcdf9b388bb8da6c656682e2e06a18a0f4fb 100644
--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@@ -89,7 +89,7 @@ class CTCLabelDecode(BaseRecLabelDecode):
use_space_char)
def __call__(self, preds, label=None, *args, **kwargs):
- if isinstance(preds, tuple):
+        if isinstance(preds, (tuple, list)):
preds = preds[-1]
if isinstance(preds, paddle.Tensor):
preds = preds.numpy()
diff --git a/ppstructure/vqa/README.md b/ppstructure/vqa/README.md
index b9a82cc5fd971800aaebd9bc4553ba6f0700845e..a2117c9e0601360750e354d4faecd43b2a2a0a68 100644
--- a/ppstructure/vqa/README.md
+++ b/ppstructure/vqa/README.md
@@ -1,67 +1,68 @@
-- [文档视觉问答(DOC-VQA)](#文档视觉问答doc-vqa)
- - [1. 简介](#1-简介)
- - [2. 性能](#2-性能)
- - [3. 效果演示](#3-效果演示)
+English | [简体中文](README_ch.md)
+
+- [Document Visual Question Answering (Doc-VQA)](#Document-Visual-Question-Answering)
+ - [1. Introduction](#1-Introduction)
+ - [2. Performance](#2-performance)
+ - [3. Effect demo](#3-Effect-demo)
- [3.1 SER](#31-ser)
- [3.2 RE](#32-re)
- - [4. 安装](#4-安装)
- - [4.1 安装依赖](#41-安装依赖)
- - [4.2 安装PaddleOCR(包含 PP-OCR 和 VQA)](#42-安装paddleocr包含-pp-ocr-和-vqa)
- - [5. 使用](#5-使用)
- - [5.1 数据和预训练模型准备](#51-数据和预训练模型准备)
+ - [4. Install](#4-Install)
+  - [4.1 Install dependencies](#41-Install-dependencies)
+ - [4.2 Install PaddleOCR](#42-Install-PaddleOCR)
+ - [5. Usage](#5-Usage)
+ - [5.1 Data and Model Preparation](#51-Data-and-Model-Preparation)
- [5.2 SER](#52-ser)
- [5.3 RE](#53-re)
- - [6. 参考链接](#6-参考链接)
-
+  - [6. Reference Links](#6-Reference-Links)
-# 文档视觉问答(DOC-VQA)
+# Document Visual Question Answering
-## 1. 简介
+## 1. Introduction
-VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。
+VQA refers to visual question answering, which mainly asks and answers questions about image content. DOC-VQA is one of the VQA tasks; it mainly asks questions about the textual content of document images.
-PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
+The DOC-VQA algorithm in PP-Structure is developed based on the PaddleNLP natural language processing algorithm library.
-主要特性如下:
+The main features are as follows:
-- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
-- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair)。
-- 支持SER任务和RE任务的自定义训练。
-- 支持OCR+SER的端到端系统预测与评估。
-- 支持OCR+SER+RE的端到端系统预测。
+- Integrate [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) model and PP-OCR prediction engine.
+- Supports Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multimodal methods. Based on the SER task, text recognition and classification in the image can be completed; based on the RE task, relation extraction of the text content in the image can be completed, such as identifying question-answer pairs.
+- Supports custom training for SER tasks and RE tasks.
+- Supports end-to-end system prediction and evaluation of OCR+SER.
+- Supports end-to-end system prediction of OCR+SER+RE.
-本项目是 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) 在 Paddle 2.2上的开源实现,
-包含了在 [XFUND数据集](https://github.com/doc-analysis/XFUND) 上的微调代码。
+This project is an open-source implementation of [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) on Paddle 2.2,
+including the fine-tuning code on the [XFUND dataset](https://github.com/doc-analysis/XFUND).
-## 2. 性能
+## 2. Performance
-我们在 [XFUN](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估,性能如下
+We evaluate the algorithm on the Chinese dataset of [XFUND](https://github.com/doc-analysis/XFUND), and the performance is as follows
-| 模型 | 任务 | hmean | 模型下载地址 |
+| Model | Task | hmean | Model download address |
|:---:|:---:|:---:| :---:|
-| LayoutXLM | SER | 0.9038 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
-| LayoutXLM | RE | 0.7483 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
-| LayoutLMv2 | SER | 0.8544 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)
-| LayoutLMv2 | RE | 0.6777 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
-| LayoutLM | SER | 0.7731 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+| LayoutXLM | SER | 0.9038 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
+| LayoutXLM | RE | 0.7483 | [link](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
+| LayoutLMv2 | SER | 0.8544 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)
+| LayoutLMv2 | RE | 0.6777 | [link](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+| LayoutLM | SER | 0.7731 | [link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
-## 3. 效果演示
+## 3. Effect demo
-**注意:** 测试图片来源于XFUN数据集。
+**Note:** The test images are from the XFUND dataset.
### 3.1 SER
 | 
---|---
-图中不同颜色的框表示不同的类别,对于XFUN数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
+Boxes with different colors in the figure represent different categories. For the XFUND dataset, there are 3 categories: `QUESTION`, `ANSWER`, `HEADER`
-* 深紫色:HEADER
-* 浅紫色:QUESTION
-* 军绿色:ANSWER
+* Dark purple: HEADER
+* Light purple: QUESTION
+* Army Green: ANSWER
-在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+The corresponding categories and OCR recognition results are also marked on the upper left of the OCR detection frame.
### 3.2 RE
@@ -69,176 +70,190 @@ PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进
---|---
-图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+The red box in the figure represents the question, the blue box represents the answer, and the question and the answer are connected by a green line. The corresponding categories and OCR recognition results are also marked on the upper left of the OCR detection frame.
-## 4. 安装
+## 4. Install
-### 4.1 安装依赖
+### 4.1 Install dependencies
-- **(1) 安装PaddlePaddle**
+- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
-# GPU安装
+# GPU installation
python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
-# CPU安装
+# CPU installation
python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
 ```
-更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+For more requirements, please refer to the instructions in the [Installation Documentation](https://www.paddlepaddle.org.cn/install/quick).
-### 4.2 安装PaddleOCR(包含 PP-OCR 和 VQA)
+### 4.2 Install PaddleOCR
-- **(1)pip快速安装PaddleOCR whl包(仅预测)**
+- **(1) Quickly install the PaddleOCR whl package via pip (prediction only)**
```bash
python3 -m pip install paddleocr
 ```
-- **(2)下载VQA源码(预测+训练)**
+- **(2) Download VQA source code (prediction + training)**
```bash
-【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
+# [Recommended]
+git clone https://github.com/PaddlePaddle/PaddleOCR
-# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
+# If the pull fails due to network problems, you can also use the Gitee mirror:
 git clone https://gitee.com/paddlepaddle/PaddleOCR
-# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
-```
+# Note: the Gitee mirror may lag behind this GitHub project by 3 to 5 days; please prefer the recommended method.
+```
-- **(3)安装VQA的`requirements`**
+- **(3) Install VQA's `requirements`**
```bash
python3 -m pip install -r ppstructure/vqa/requirements.txt
 ```
-## 5. 使用
+## 5. Usage
-### 5.1 数据和预训练模型准备
+### 5.1 Data and Model Preparation
-如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。
+If you want to experience the prediction process directly, you can download the pretrained models we provide, skip the training process, and predict directly.
-* 下载处理好的数据集
+* Download the processed dataset
-处理好的XFUN中文数据集下载地址:[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)。
+The download address of the processed XFUND Chinese dataset: [https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar).
-下载并解压该数据集,解压后将数据集放置在当前目录下。
+Download and unzip the dataset, and place the dataset in the current directory after unzipping.
```shell
wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
 ```
+
+* Convert the dataset
-* 转换数据集
+If you need to train on other XFUND datasets, you can use the following command to convert the dataset:
-若需进行其他XFUN数据集的训练,可使用下面的命令进行数据集的转换
+```bash
+python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
+```
+* Download the pretrained models
```bash
-python3 ppstructure/vqa/helper/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
-```
+mkdir pretrain && cd pretrain
+# Download the SER model
+wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
+# Download the RE model
+wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
+cd ../
+```
### 5.2 SER
-启动训练之前,需要修改下面的四个字段
+Before starting training, you need to modify the following four fields:
-1. `Train.dataset.data_dir`:指向训练集图片存放目录
-2. `Train.dataset.label_file_list`:指向训练集标注文件
-3. `Eval.dataset.data_dir`:指指向验证集图片存放目录
-4. `Eval.dataset.label_file_list`:指向验证集标注文件
+1. `Train.dataset.data_dir`: points to the directory where the training set images are stored
+2. `Train.dataset.label_file_list`: points to the training set label file
+3. `Eval.dataset.data_dir`: points to the directory where the validation set images are stored
+4. `Eval.dataset.label_file_list`: points to the validation set label file
-* 启动训练
+* Start training
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml
 ```
-最终会打印出`precision`, `recall`, `hmean`等指标。
-在`./output/ser_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型。
+Finally, metrics such as `precision`, `recall`, and `hmean` will be printed.
+The training log, the best model, and the model of the latest epoch will be saved in the `./output/ser_layoutxlm/` folder.
-* 恢复训练
+* Resume training
-恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To resume training, assign the folder path of the previously trained model to the `Architecture.Backbone.checkpoints` field.
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
 ```
-* 评估
+* Evaluate
-评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+Evaluation requires assigning the folder path of the model to be evaluated to the `Architecture.Backbone.checkpoints` field.
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
 ```
-最终会打印出`precision`, `recall`, `hmean`等指标
+Finally, metrics such as `precision`, `recall`, and `hmean` will be printed.
-* 使用`OCR引擎 + SER`串联预测
+* Cascade prediction with `OCR engine + SER`
-使用如下命令即可完成`OCR引擎 + SER`的串联预测
+Run the following command to perform cascade prediction with `OCR engine + SER`, taking the pretrained SER model as an example:
```shell
-CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
-```
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
+```
-最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为`infer_results.txt`。
+Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
-* 对`OCR引擎 + SER`预测系统进行端到端评估
+* End-to-end evaluation of `OCR engine + SER` prediction system
-首先使用 `tools/infer_vqa_token_ser.py` 脚本完成数据集的预测,然后使用下面的命令进行评估。
+First use the `tools/infer_vqa_token_ser.py` script to complete the prediction of the dataset, then use the following command to evaluate.
```shell
export CUDA_VISIBLE_DEVICES=0
-python3 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
-```
+python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
+```
### 5.3 RE
-* 启动训练
+* Start training
-启动训练之前,需要修改下面的四个字段
+Before starting training, you need to modify the following four fields:
-1. `Train.dataset.data_dir`:指向训练集图片存放目录
-2. `Train.dataset.label_file_list`:指向训练集标注文件
-3. `Eval.dataset.data_dir`:指指向验证集图片存放目录
-4. `Eval.dataset.label_file_list`:指向验证集标注文件
+1. `Train.dataset.data_dir`: points to the directory where the training set images are stored
+2. `Train.dataset.label_file_list`: points to the training set label file
+3. `Eval.dataset.data_dir`: points to the directory where the validation set images are stored
+4. `Eval.dataset.label_file_list`: points to the validation set label file
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
 ```
-最终会打印出`precision`, `recall`, `hmean`等指标。
-在`./output/re_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型。
+Finally, metrics such as `precision`, `recall`, and `hmean` will be printed.
+The training log, the best model, and the model of the latest epoch will be saved in the `./output/re_layoutxlm/` folder.
-* 恢复训练
+* Resume training
-恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+To resume training, assign the folder path of the previously trained model to the `Architecture.Backbone.checkpoints` field.
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
 ```
-* 评估
+* Evaluate
-评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+Evaluation requires assigning the folder path of the model to be evaluated to the `Architecture.Backbone.checkpoints` field.
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
 ```
-最终会打印出`precision`, `recall`, `hmean`等指标
+Finally, metrics such as `precision`, `recall`, and `hmean` will be printed.
-* 使用`OCR引擎 + SER + RE`串联预测
+* Cascade prediction with `OCR engine + SER + RE`
-使用如下命令即可完成`OCR引擎 + SER + RE`的串联预测
+Run the following command to perform cascade prediction with `OCR engine + SER + RE`, taking the pretrained SER and RE models as an example:
```shell
export CUDA_VISIBLE_DEVICES=0
-python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=ser_LayoutXLM_xfun_zh/
-```
+python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+```
-最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为`infer_results.txt`。
+Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
-## 6. 参考链接
+## 6. Reference Links
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
- XFUND dataset, https://github.com/doc-analysis/XFUND
+
+## License
+
+The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
diff --git a/ppstructure/vqa/README_ch.md b/ppstructure/vqa/README_ch.md
new file mode 100644
index 0000000000000000000000000000000000000000..ff513f8f7d603d66a372ce383883f3bcf97a7880
--- /dev/null
+++ b/ppstructure/vqa/README_ch.md
@@ -0,0 +1,257 @@
+[English](README.md) | 简体中文
+
+- [文档视觉问答(DOC-VQA)](#文档视觉问答doc-vqa)
+ - [1. 简介](#1-简介)
+ - [2. 性能](#2-性能)
+ - [3. 效果演示](#3-效果演示)
+ - [3.1 SER](#31-ser)
+ - [3.2 RE](#32-re)
+ - [4. 安装](#4-安装)
+ - [4.1 安装依赖](#41-安装依赖)
+ - [4.2 安装PaddleOCR(包含 PP-OCR 和 VQA)](#42-安装paddleocr包含-pp-ocr-和-vqa)
+ - [5. 使用](#5-使用)
+ - [5.1 数据和预训练模型准备](#51-数据和预训练模型准备)
+ - [5.2 SER](#52-ser)
+ - [5.3 RE](#53-re)
+ - [6. 参考链接](#6-参考链接)
+
+# 文档视觉问答(DOC-VQA)
+
+## 1. 简介
+
+VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。
+
+PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
+
+主要特性如下:
+
+- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
+- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair)。
+- 支持SER任务和RE任务的自定义训练。
+- 支持OCR+SER的端到端系统预测与评估。
+- 支持OCR+SER+RE的端到端系统预测。
+
+本项目是 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/pdf/2104.08836.pdf) 在 Paddle 2.2上的开源实现,
+包含了在 [XFUND数据集](https://github.com/doc-analysis/XFUND) 上的微调代码。
+
+## 2. 性能
+
+我们在 [XFUND](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估,性能如下
+
+| 模型 | 任务 | hmean | 模型下载地址 |
+|:---:|:---:|:---:| :---:|
+| LayoutXLM | SER | 0.9038 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
+| LayoutXLM | RE | 0.7483 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
+| LayoutLMv2 | SER | 0.8544 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)
+| LayoutLMv2 | RE | 0.6777 | [链接](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+| LayoutLM | SER | 0.7731 | [链接](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+
+## 3. 效果演示
+
+**注意:** 测试图片来源于XFUND数据集。
+
+### 3.1 SER
+
+ | 
+---|---
+
+图中不同颜色的框表示不同的类别,对于XFUND数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
+
+* 深紫色:HEADER
+* 浅紫色:QUESTION
+* 军绿色:ANSWER
+
+在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+
+### 3.2 RE
+
+ | 
+---|---
+
+
+图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
+
+## 4. 安装
+
+### 4.1 安装依赖
+
+- **(1) 安装PaddlePaddle**
+
+```bash
+python3 -m pip install --upgrade pip
+
+# GPU安装
+python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+# CPU安装
+python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+
+```
+更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+### 4.2 安装PaddleOCR(包含 PP-OCR 和 VQA)
+
+- **(1)pip快速安装PaddleOCR whl包(仅预测)**
+
+```bash
+python3 -m pip install paddleocr
+```
+
+- **(2)下载VQA源码(预测+训练)**
+
+```bash
+【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
+
+# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
+git clone https://gitee.com/paddlepaddle/PaddleOCR
+
+# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
+```
+
+- **(3)安装VQA的`requirements`**
+
+```bash
+python3 -m pip install -r ppstructure/vqa/requirements.txt
+```
+
+## 5. 使用
+
+### 5.1 数据和预训练模型准备
+
+如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。
+
+* 下载处理好的数据集
+
+处理好的XFUND中文数据集下载地址:[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)。
+
+
+下载并解压该数据集,解压后将数据集放置在当前目录下。
+
+```shell
+wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
+```
+
+* 转换数据集
+
+若需进行其他XFUND数据集的训练,可使用下面的命令进行数据集的转换
+
+```bash
+python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
+```
+
+* 下载预训练模型
+```bash
+mkdir pretrain && cd pretrain
+#下载SER模型
+wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
+#下载RE模型
+wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
+cd ../
+```
+
+### 5.2 SER
+
+启动训练之前,需要修改下面的四个字段
+
+1. `Train.dataset.data_dir`:指向训练集图片存放目录
+2. `Train.dataset.label_file_list`:指向训练集标注文件
+3. `Eval.dataset.data_dir`:指向验证集图片存放目录
+4. `Eval.dataset.label_file_list`:指向验证集标注文件
+
+* 启动训练
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml
+```
+
+最终会打印出`precision`, `recall`, `hmean`等指标。
+在`./output/ser_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型。
+
+* 恢复训练
+
+恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+
+* 评估
+
+评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+最终会打印出`precision`, `recall`, `hmean`等指标
+
+* 使用`OCR引擎 + SER`串联预测
+
+使用如下命令即可完成`OCR引擎 + SER`的串联预测, 以SER预训练模型为例:
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
+```
+
+最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为`infer_results.txt`。
+
+* 对`OCR引擎 + SER`预测系统进行端到端评估
+
+首先使用 `tools/infer_vqa_token_ser.py` 脚本完成数据集的预测,然后使用下面的命令进行评估。
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
+```
+
+### 5.3 RE
+
+* 启动训练
+
+启动训练之前,需要修改下面的四个字段
+
+1. `Train.dataset.data_dir`:指向训练集图片存放目录
+2. `Train.dataset.label_file_list`:指向训练集标注文件
+3. `Eval.dataset.data_dir`:指向验证集图片存放目录
+4. `Eval.dataset.label_file_list`:指向验证集标注文件
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
+```
+
+最终会打印出`precision`, `recall`, `hmean`等指标。
+在`./output/re_layoutxlm/`文件夹中会保存训练日志,最优的模型和最新epoch的模型。
+
+* 恢复训练
+
+恢复训练需要将之前训练好的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+
+* 评估
+
+评估需要将待评估的模型所在文件夹路径赋值给 `Architecture.Backbone.checkpoints` 字段。
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=path/to/model_dir
+```
+最终会打印出`precision`, `recall`, `hmean`等指标
+
+* 使用`OCR引擎 + SER + RE`串联预测
+
+使用如下命令即可完成`OCR引擎 + SER + RE`的串联预测, 以预训练SER和RE模型为例:
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+```
+
+最终会在`config.Global.save_res_path`字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为`infer_results.txt`。
+
+## 6. 参考链接
+
+- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
+- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
+- XFUND dataset, https://github.com/doc-analysis/XFUND
+
+## License
+
+The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
diff --git a/ppstructure/vqa/helper/eval_with_label_end2end.py b/ppstructure/vqa/tools/eval_with_label_end2end.py
similarity index 100%
rename from ppstructure/vqa/helper/eval_with_label_end2end.py
rename to ppstructure/vqa/tools/eval_with_label_end2end.py
diff --git a/ppstructure/vqa/helper/trans_xfun_data.py b/ppstructure/vqa/tools/trans_xfun_data.py
similarity index 100%
rename from ppstructure/vqa/helper/trans_xfun_data.py
rename to ppstructure/vqa/tools/trans_xfun_data.py
diff --git a/requirements.txt b/requirements.txt
index 1d9522aa0167c60ffce263a35b86640efb1438b2..b60d48371337e38bde6e51171aa6ecfb9573fb4d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,4 +12,3 @@ cython
lxml
premailer
openpyxl
-fasttext==0.9.1
diff --git a/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt b/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt
index 03d749f55765b2ea9e82d538cb4e6fb3d29e0b9f..98c125229d7f968cd3f650c3885ba4edb0de754c 100644
--- a/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt
+++ b/test_tipc/configs/ch_PP-OCRv2_rec_PACT/train_infer_python.txt
@@ -1,13 +1,13 @@
===========================train_params===========================
-model_name:PPOCRv2_ocr_rec_pact
+model_name:ch_PPOCRv2_rec_PACT
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:fp32
-Global.epoch_num:lite_train_lite_infer=3|whole_train_whole_infer=300
+Global.epoch_num:lite_train_lite_infer=6|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=16|whole_train_whole_infer=128
-Global.pretrained_model:null
+Global.pretrained_model:pretrain_models/ch_PP-OCRv2_rec_train/best_accuracy
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
diff --git a/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt b/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt
index 0603fa10a640fd6d7b71582a92b92f026b4d1d51..5634297973bafbdad6c168e369d15520db09aba3 100644
--- a/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt
+++ b/test_tipc/configs/det_mv3_east_v2.0/train_infer_python.txt
@@ -1,13 +1,13 @@
===========================train_params===========================
model_name:det_mv3_east_v2.0
python:python3.7
-gpu_list:0
+gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:fp32
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=500
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
-Global.pretrained_model:null
+Global.pretrained_model:./pretrain_models/det_mv3_east_v2.0_train/best_accuracy
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
diff --git a/test_tipc/configs/rec_r31_sar/train_infer_python.txt b/test_tipc/configs/rec_r31_sar/train_infer_python.txt
index c5018500f9a58297b30729e9f68b42806a7631e2..1a32a3d507d8923a8b51be726c7624ea2049ae14 100644
--- a/test_tipc/configs/rec_r31_sar/train_infer_python.txt
+++ b/test_tipc/configs/rec_r31_sar/train_infer_python.txt
@@ -50,4 +50,4 @@ inference:tools/infer/predict_rec.py --rec_char_dict_path=./ppocr/utils/dict90.t
--benchmark:True
null:null
===========================infer_benchmark_params==========================
-random_infer_input:[{float32,[3,48,48,160]}]
+random_infer_input:[{float32,[3,48,160]}]
diff --git a/test_tipc/prepare.sh b/test_tipc/prepare.sh
index dd66b3dbf94391e472c96d409291398245337d93..6a8983009e527b8a59b41c1d9b950e8e3f349ef2 100644
--- a/test_tipc/prepare.sh
+++ b/test_tipc/prepare.sh
@@ -60,9 +60,13 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
ln -s ./icdar2015_lite ./icdar2015
cd ../
cd ./inference && tar xf rec_inference.tar && cd ../
- if [ ${model_name} == "ch_PPOCRv2_det" ]; then
- wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar --no-check-certificate
- cd ./pretrain_models/ && tar xf ch_ppocr_mobile_v2.0_det_train.tar && cd ../
+ if [ ${model_name} == "ch_PPOCRv2_det" ] || [ ${model_name} == "ch_PPOCRv2_det_PACT" ]; then
+ wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar --no-check-certificate
+ cd ./pretrain_models/ && tar xf ch_ppocr_server_v2.0_det_train.tar && cd ../
+ fi
+ if [ ${model_name} == "ch_PPOCRv2_rec" ] || [ ${model_name} == "ch_PPOCRv2_rec_PACT" ]; then
+ wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar --no-check-certificate
+ cd ./pretrain_models/ && tar xf ch_PP-OCRv2_rec_train.tar && cd ../
fi
if [ ${model_name} == "det_r18_db_v2_0" ]; then
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet18_vd_pretrained.pdparams --no-check-certificate
@@ -91,6 +95,10 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar --no-check-certificate
cd ./pretrain_models/ && tar xf ch_ppocr_mobile_v2.0_rec_train.tar && cd ../
fi
+ if [ ${model_name} == "det_mv3_east_v2.0" ]; then
+ wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar --no-check-certificate
+ cd ./pretrain_models/ && tar xf det_mv3_east_v2.0_train.tar && cd ../
+ fi
elif [ ${MODE} = "whole_train_whole_infer" ];then
wget -nc -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams --no-check-certificate
diff --git a/tools/infer/utility.py b/tools/infer/utility.py
index 80abba67b293e3412afa6c1ea8da0291331ef8de..fd2605d963db38ed74908f4e069fa480cfd01baf 100644
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -312,12 +312,22 @@ def create_predictor(args, mode, logger):
input_names = predictor.get_input_names()
for name in input_names:
input_tensor = predictor.get_input_handle(name)
- output_names = predictor.get_output_names()
- output_tensors = []
+ output_tensors = get_output_tensors(args, mode, predictor)
+ return predictor, input_tensor, output_tensors, config
+
+
+def get_output_tensors(args, mode, predictor):
+ output_names = predictor.get_output_names()
+ output_tensors = []
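+    # for the CRNN rec model the exported softmax output has a fixed name;
+    # return it directly when available, otherwise fall back to all outputs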
+ if mode == "rec" and args.rec_algorithm == "CRNN":
+ output_name = 'softmax_0.tmp_0'
+ if output_name in output_names:
+ return [predictor.get_output_handle(output_name)]
for output_name in output_names:
output_tensor = predictor.get_output_handle(output_name)
output_tensors.append(output_tensor)
- return predictor, input_tensor, output_tensors, config
+ return output_tensors
def get_infer_gpuid():