diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index 2ae98481d65d47838269a34d9aeb7bec7e13af11..250d1fc0c989e6db50deffc06d1c9db47ba4c2f0 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -206,7 +206,7 @@ the table and pop up Excel at the same time.
- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languagesinclude French, German, Korean, and Japanese.
For specific model download links, please refer to [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating)
-- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/PPOCRLabel/PPOCRLabel.py#L86) :
+- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/PPOCRLabel/PPOCRLabel.py#L97) :
add parameter `det_model_dir` in `self.ocr = PaddleOCR(use_pdserving=False, use_angle_cls=True, det=True, cls=True, use_gpu=gpu, lang=lang) `
diff --git a/doc/doc_ch/knowledge_distillation.md b/doc/doc_ch/knowledge_distillation.md
index 76ab5e62e0136bb8707a51537bf6693fc5687047..2adba3659e101fe214d31b805d0800fd5128595c 100644
--- a/doc/doc_ch/knowledge_distillation.md
+++ b/doc/doc_ch/knowledge_distillation.md
@@ -293,8 +293,8 @@ Loss:
以上述配置为例,最终蒸馏训练的损失函数包含下面5个部分。
-- `Student`和`Teacher`最终输出(`head_out`)的CTC分支与gt的CTC loss,权重为1。在这里因为2个子网络都需要更新参数,因此2者都需要计算与g的loss。
-- `Student`和`Teacher`最终输出(`head_out`)的SAR分支与gt的SAR loss,权重为1.0。在这里因为2个子网络都需要更新参数,因此2者都需要计算与g的loss。
+- `Student`和`Teacher`最终输出(`head_out`)的CTC分支与gt的CTC loss,权重为1。在这里因为2个子网络都需要更新参数,因此2者都需要计算与gt的loss。
+- `Student`和`Teacher`最终输出(`head_out`)的SAR分支与gt的SAR loss,权重为1.0。在这里因为2个子网络都需要更新参数,因此2者都需要计算与gt的loss。
- `Student`和`Teacher`最终输出(`head_out`)的CTC分支之间的DML loss,权重为1。
- `Student`和`Teacher`最终输出(`head_out`)的SAR分支之间的DML loss,权重为0.5。
- `Student`和`Teacher`的骨干网络输出(`backbone_out`)之间的l2 loss,权重为1。
@@ -374,7 +374,7 @@ paddle.save(s_params, "ch_PP-OCRv3_rec_train/student.pdparams")
### 2.2 检测配置文件解析
-检测模型蒸馏的配置文件在PaddleOCR/configs/det/ch_PP-OCRv3/目录下,包含两个个蒸馏配置文件:
+检测模型蒸馏的配置文件在PaddleOCR/configs/det/ch_PP-OCRv3/目录下,包含两个蒸馏配置文件:
- ch_PP-OCRv3_det_cml.yml,采用cml蒸馏,采用一个大模型蒸馏两个小模型,且两个小模型互相学习的方法
- ch_PP-OCRv3_det_dml.yml,采用DML的蒸馏,两个Student模型互蒸馏的方法
@@ -383,7 +383,7 @@ paddle.save(s_params, "ch_PP-OCRv3_rec_train/student.pdparams")
知识蒸馏任务中,模型结构配置如下所示:
-```
+```yaml
Architecture:
name: DistillationModel # 结构名称,蒸馏任务中,为DistillationModel,用于构建对应的结构
algorithm: Distillation # 算法名称
@@ -424,11 +424,11 @@ Architecture:
```
-如果是采用DML,即两个小模型互相学习的方法,上述配置文件里的Teacher网络结构需要设置为Student模型一样的配置,具体参考配置文件[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。
-
-下面介绍[ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)的配置文件参数:
+如果是采用DML,即两个小模型互相学习的方法,上述配置文件里的Teacher网络结构需要设置为Student模型一样的配置,具体参考配置文件[ch_PP-OCRv3_det_dml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。
-```
+下面介绍[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)的配置文件参数:
+
+```yaml
Architecture:
name: DistillationModel
algorithm: Distillation
diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md
index 318d5874f5e01390976723ccdb98012b95a6eb7f..c36cac037e1f90a41da24bc64cacbbb860e04c6b 100644
--- a/doc/doc_ch/models_list.md
+++ b/doc/doc_ch/models_list.md
@@ -22,7 +22,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|模型类型|模型格式|简介|
|--- | --- | --- |
-|推理模型|inference.pdmodel、inference.pdiparams|用于预测引擎推理,[详情](./inference.md)|
+|推理模型|inference.pdmodel、inference.pdiparams|用于预测引擎推理,[详情](./inference_ppocr.md)|
|训练模型、预训练模型|\*.pdparams、\*.pdopt、\*.states |训练过程中保存的模型的参数、优化器状态和训练中间信息,多用于模型指标评估和恢复训练|
|nb模型|\*.nb|经过飞桨Paddle-Lite工具优化后的模型,适用于移动端/IoT端等端侧部署场景(需使用飞桨Paddle Lite部署)。|
@@ -114,7 +114,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt |卡纳达文识别|[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) |
| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |泰米尔文识别|[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | 拉丁文识别 | [latin_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/latin_PP-OCRv3_rec.yml) |9.7M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
-| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | 阿拉伯字母 | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/rec_arabic_lite_train.yml) |9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
+| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | 阿拉伯字母 | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml) |9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | 斯拉夫字母 | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt |梵文字母 | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
diff --git a/doc/doc_ch/multi_languages.md b/doc/doc_ch/multi_languages.md
index 499fdd9881563b3a784b5f4ba4feace54f1a3a6a..1f337bdf49f4c6f90b8b35af85bc1316dae308a4 100644
--- a/doc/doc_ch/multi_languages.md
+++ b/doc/doc_ch/multi_languages.md
@@ -238,10 +238,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
## 4 预测部署
除了安装whl包进行快速预测,ppocr 也提供了多种预测部署方式,如有需求可阅读相关文档:
-- [基于Python脚本预测引擎推理](./inference.md)
-- [基于C++预测引擎推理](../../deploy/cpp_infer/readme.md)
+- [基于Python脚本预测引擎推理](./inference_ppocr.md)
+- [基于C++预测引擎推理](../../deploy/cpp_infer/readme_ch.md)
- [服务化部署](../../deploy/hubserving/readme.md)
-- [端侧部署](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme.md)
+- [端侧部署](../../deploy/lite/readme_ch.md)
- [Benchmark](./benchmark.md)
diff --git a/doc/doc_ch/serving_inference.md b/doc/doc_ch/serving_inference.md
deleted file mode 100644
index 30ea7ee7c11692ba02e8314036d74a21c2f090e5..0000000000000000000000000000000000000000
--- a/doc/doc_ch/serving_inference.md
+++ /dev/null
@@ -1,238 +0,0 @@
-# 使用Paddle Serving预测推理
-
-阅读本文档之前,请先阅读文档 [基于Python预测引擎推理](./inference.md)
-
-同本地执行预测一样,我们需要保存一份可以用于Paddle Serving的模型。
-
-接下来首先介绍如何将训练的模型转换成Paddle Serving模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。
-
-### 一、 准备环境
-我们先安装Paddle Serving相关组件
-我们推荐用户使用GPU来做Paddle Serving的OCR服务部署
-
-**CUDA版本:9.X/10.X**
-
-**CUDNN版本:7.X**
-
-**操作系统版本:Linux/Windows**
-
-**Python版本: 2.7/3.5/3.6/3.7**
-
-**Python操作指南:**
-
-目前Serving用于OCR的部分功能还在测试当中,因此在这里我们给出[Servnig latest package](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md)
-大家根据自己的环境选择需要安装的whl包即可,例如以Python 3.5为例,执行下列命令
-```
-#CPU/GPU版本选择一个
-#GPU版本服务端
-#CUDA 9
-python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py3-none-any.whl
-#CUDA 10
-python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py3-none-any.whl
-#CPU版本服务端
-python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
-#客户端和App包使用以下链接(CPU,GPU通用)
-python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp36-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py3-none-any.whl
-```
-
-## 二、训练模型转Serving模型
-
-在前序文档 [基于Python预测引擎推理](./inference.md) 中,我们提供了如何把训练的checkpoint转换成Paddle模型。Paddle模型通常由一个文件夹构成,内含模型结构描述文件`model`和模型参数文件`params`。Serving模型由两个文件夹构成,用于存放客户端和服务端的配置。
-
-我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在:
-
-```
-wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
-tar xf ch_rec_r34_vd_crnn_infer.tar
-```
-因此我们按照Serving模型转换教程,运行下列python文件。
-```
-python tools/inference_to_serving.py --model_dir ch_rec_r34_vd_crnn
-```
-最终会在`serving_client_dir`和`serving_server_dir`生成客户端和服务端的模型配置。其中`serving_server_dir`和`serving_client_dir`的名字可以自定义。最终文件结构如下
-
-```
-/ch_rec_r34_vd_crnn/
-├── serving_client_dir # 客户端配置文件夹
-└── serving_server_dir # 服务端配置文件夹
-```
-
-## 三、文本检测模型Serving推理
-
-启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表:
-
-|版本|特点|适用场景|
-|-|-|-|
-|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况|
-|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景,Windows用户只能选择快速版|
-
-接下来的命令中,我们会指定快速版和标准版的命令。需要说明的是,标准版只能用Linux平台,快速版可以支持Linux/Windows。
-文本检测模型推理,默认使用DB模型的配置参数,识别默认为CRNN。
-
-配置文件在`params.py`中,我们贴出配置部分,如果需要做改动,也在这个文件内部进行修改。
-
-```
-def read_params():
- cfg = Config()
- #use gpu
- cfg.use_gpu = False # 是否使用GPU
- cfg.use_pdserving = True # 是否使用paddleserving,必须为True
-
- #params for text detector
- cfg.det_algorithm = "DB" # 检测算法, DB/EAST等
- cfg.det_model_dir = "./det_mv_server/" # 检测算法模型路径
- cfg.det_max_side_len = 960
-
- #DB params
- cfg.det_db_thresh =0.3
- cfg.det_db_box_thresh =0.5
- cfg.det_db_unclip_ratio =2.0
-
- #EAST params
- cfg.det_east_score_thresh = 0.8
- cfg.det_east_cover_thresh = 0.1
- cfg.det_east_nms_thresh = 0.2
-
- #params for text recognizer
- cfg.rec_algorithm = "CRNN" # 识别算法, CRNN/RARE等
- cfg.rec_model_dir = "./ocr_rec_server/" # 识别算法模型路径
-
- cfg.rec_image_shape = "3, 32, 320"
- cfg.rec_batch_num = 30
- cfg.max_text_length = 25
-
- cfg.rec_char_dict_path = "./ppocr_keys_v1.txt" # 识别算法字典文件
- cfg.use_space_char = True
-
- #params for text classifier
- cfg.use_angle_cls = True # 是否启用分类算法
- cfg.cls_model_dir = "./ocr_clas_server/" # 分类算法模型路径
- cfg.cls_image_shape = "3, 48, 192"
- cfg.label_list = ['0', '180']
- cfg.cls_batch_num = 30
- cfg.cls_thresh = 0.9
-
- return cfg
-```
-与本地预测不同的是,Serving预测需要一个客户端和一个服务端,因此接下来的教程都是两行代码。
-
-在正式执行服务端启动命令之前,先export PYTHONPATH到工程主目录下。
-```
-export PYTHONPATH=$PWD:$PYTHONPATH
-cd deploy/pdserving
-```
-为了方便用户复现Demo程序,我们提供了Chinese and English ultra-lightweight OCR model (8.1M)版本的Serving模型
-```
-wget --no-check-certificate https://paddleocr.bj.bcebos.com/deploy/pdserving/ocr_pdserving_suite.tar.gz
-tar xf ocr_pdserving_suite.tar.gz
-```
-
-### 1. 超轻量中文检测模型推理
-
-超轻量中文检测模型推理,可以执行如下命令启动服务端:
-
-```
-#根据环境只需要启动其中一个就可以
-python det_rpc_server.py #标准版,Linux用户
-python det_local_server.py #快速版,Windows/Linux用户
-```
-
-客户端
-
-```
-python det_web_client.py
-```
-
-
-Serving的推测和本地预测不同点在于,客户端发送请求到服务端,服务端需要检测到文字框之后返回框的坐标,此处没有后处理的图片,只能看到坐标值。
-
-## 四、文本识别模型Serving推理
-
-下面将介绍超轻量中文识别模型推理、基于CTC损失的识别模型推理和基于Attention损失的识别模型推理。对于中文文本识别,建议优先选择基于CTC损失的识别模型,实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。此外,如果训练时修改了文本的字典,请参考下面的自定义文本识别字典的推理。
-
-### 1. 超轻量中文识别模型推理
-
-超轻量中文识别模型推理,可以执行如下命令启动服务端:
-需要注意params.py中的`--use_gpu`的值
-```
-#根据环境只需要启动其中一个就可以
-python rec_rpc_server.py #标准版,Linux用户
-python rec_local_server.py #快速版,Windows/Linux用户
-```
-如果需要使用CPU版本,还需增加 `--use_gpu False`。
-
-客户端
-
-```
-python rec_web_client.py
-```
-
-![](../imgs_words/ch/word_4.jpg)
-
-执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:
-
-```
-{u'result': {u'score': [u'0.89547354'], u'pred_text': ['实力活力']}}
-```
-
-
-
-## 五、方向分类模型推理
-
-下面将介绍方向分类模型推理。
-
-
-
-### 1. 方向分类模型推理
-
-方向分类模型推理, 可以执行如下命令启动服务端:
-需要注意params.py中的`--use_gpu`的值
-```
-#根据环境只需要启动其中一个就可以
-python clas_rpc_server.py #标准版,Linux用户
-python clas_local_server.py #快速版,Windows/Linux用户
-```
-
-客户端
-
-```
-python rec_web_client.py
-```
-
-![](../imgs_words/ch/word_4.jpg)
-
-执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下:
-
-```
-{u'result': {u'direction': [u'0'], u'score': [u'0.9999963']}}
-```
-
-
-## 六、文本检测、方向分类和文字识别串联Serving推理
-
-### 1. 超轻量中文OCR模型推理
-
-在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir`和`rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。与本地预测不同的是,为了减少网络传输耗时,可视化识别结果目前不做处理,用户收到的是推理得到的文字字段。
-
-执行如下命令启动服务端:
-需要注意params.py中的`--use_gpu`的值
-```
-#标准版,Linux用户
-#GPU用户
-python -m paddle_serving_server_gpu.serve --model det_infer_server --port 9293 --gpu_id 0
-python -m paddle_serving_server_gpu.serve --model cls_infer_server --port 9294 --gpu_id 0
-python ocr_rpc_server.py
-#CPU用户
-python -m paddle_serving_server.serve --model det_infer_server --port 9293
-python -m paddle_serving_server.serve --model cls_infer_server --port 9294
-python ocr_rpc_server.py
-
-#快速版,Windows/Linux用户
-python ocr_local_server.py
-```
-
-客户端
-
-```
-python rec_web_client.py
-```
diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md
index 9927a228d270c0e4c5ddbd5ab2a0d35911c6a75a..07591ea126f5b168389657141673142c084e67ad 100644
--- a/doc/doc_ch/update.md
+++ b/doc/doc_ch/update.md
@@ -1,9 +1,9 @@
# 更新
- 2022.5.9 发布PaddleOCR v2.5。发布内容包括:
- - [PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
- - 半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
+ - [PP-OCRv3](./ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
+ - 半自动标注工具[PPOCRLabelv2](../../PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
- OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求
- - 交互式OCR开源电子书[《动手学OCR》](./doc/doc_ch/ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。
+ - 交互式OCR开源电子书[《动手学OCR》](./ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。
- 2022.5.7 添加对[Weights & Biases](https://docs.wandb.ai/)训练日志记录工具的支持。
- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。
diff --git a/doc/doc_en/algorithm_det_east_en.md b/doc/doc_en/algorithm_det_east_en.md
index 3955809a49a595aa59717bafcfbb23146ae96bd2..07c434a9b162d9d373f5f357522cbd752be1afc1 100644
--- a/doc/doc_en/algorithm_det_east_en.md
+++ b/doc/doc_en/algorithm_det_east_en.md
@@ -40,7 +40,7 @@ Please prepare your environment referring to [prepare the environment](./environ
The above EAST model is trained using the ICDAR2015 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
-After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
+After the data download is complete, please refer to [Text Detection Training Tutorial](./detection_en.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
diff --git a/doc/doc_en/algorithm_det_fcenet_en.md b/doc/doc_en/algorithm_det_fcenet_en.md
index e15fb9a07ede3296d3de83c134457194d4639a1c..f3c51a91a486342b86828167de5c1b386b42cc66 100644
--- a/doc/doc_en/algorithm_det_fcenet_en.md
+++ b/doc/doc_en/algorithm_det_fcenet_en.md
@@ -37,7 +37,7 @@ Please prepare your environment referring to [prepare the environment](./environ
The above FCE model is trained using the CTW1500 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
-After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
+After the data download is complete, please refer to [Text Detection Training Tutorial](./detection_en.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
## 4. Inference and Deployment
diff --git a/doc/doc_en/algorithm_det_psenet_en.md b/doc/doc_en/algorithm_det_psenet_en.md
index d4cb3ea7d1e82a3f9c261c6e44cd6df6b0f6bf1e..3977a156ace3beb899e105bc381e27af6e825d6a 100644
--- a/doc/doc_en/algorithm_det_psenet_en.md
+++ b/doc/doc_en/algorithm_det_psenet_en.md
@@ -39,7 +39,7 @@ Please prepare your environment referring to [prepare the environment](./environ
The above PSE model is trained using the ICDAR2015 text detection public dataset. For the download of the dataset, please refer to [ocr_datasets](./dataset/ocr_datasets_en.md).
-After the data download is complete, please refer to [Text Detection Training Tutorial](./detection.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
+After the data download is complete, please refer to [Text Detection Training Tutorial](./detection_en.md) for training. PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different detection models.
## 4. Inference and Deployment
diff --git a/doc/doc_en/algorithm_e2e_pgnet_en.md b/doc/doc_en/algorithm_e2e_pgnet_en.md
index c7cb3221ccfd897e2fd9062a828c2fe0ceb42024..ab74c57bc3d4d97852641cd708a2dceea5732ba7 100644
--- a/doc/doc_en/algorithm_e2e_pgnet_en.md
+++ b/doc/doc_en/algorithm_e2e_pgnet_en.md
@@ -36,7 +36,7 @@ The results of detection and recognition are as follows:
## 2. Environment Configuration
-Please refer to [Operation Environment Preparation](./environment_en.md) to configure PaddleOCR operating environment first, refer to [PaddleOCR Overview and Project Clone](./paddleOCR_overview_en.md) to clone the project
+Please refer to [Operation Environment Preparation](./environment_en.md) to configure the PaddleOCR operating environment first, then refer to [Project Clone](./clone_en.md) to clone the project.
## 3. Quick Use
diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md
index 18c9cd7d51bdf0129245afca8a759afab5d9d589..383cbe39bbd2eb8ca85f497888920ce87cb1837e 100755
--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@@ -41,6 +41,12 @@ On Total-Text dataset, the text detection result is as follows:
| --- | --- | --- | --- | --- | --- |
|SAST|ResNet50_vd|89.63%|78.44%|83.66%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)|
+On CTW1500 dataset, the text detection result is as follows:
+
+|Model|Backbone|Precision|Recall|Hmean| Download link|
+| --- | --- | --- | --- | --- |---|
+|FCE|ResNet50_dcn|88.39%|82.18%|85.27%| [trained model](https://paddleocr.bj.bcebos.com/contribution/det_r50_dcn_fce_ctw_v2.0_train.tar) |
+
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from:
* [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
* [Google Drive](https://drive.google.com/drive/folders/1ll2-XEVyCQLpJjawLDiRlvo_i4BqHCJe?usp=sharing)
diff --git a/doc/doc_en/algorithm_rec_aster_en.md b/doc/doc_en/algorithm_rec_aster_en.md
index 1540681a19f94160e221c37173510395d0fd407f..b949cb5b37c985cc55a2aa01ea5ae4096946bb05 100644
--- a/doc/doc_en/algorithm_rec_aster_en.md
+++ b/doc/doc_en/algorithm_rec_aster_en.md
@@ -33,13 +33,13 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_crnn_en.md b/doc/doc_en/algorithm_rec_crnn_en.md
index 571569ee445d756ca7bdfeea6d5f960187a5a666..8548c2fa625b713d7e7e278506ff5c46713303ed 100644
--- a/doc/doc_en/algorithm_rec_crnn_en.md
+++ b/doc/doc_en/algorithm_rec_crnn_en.md
@@ -33,13 +33,13 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_nrtr_en.md b/doc/doc_en/algorithm_rec_nrtr_en.md
index 3f8fd0adee900cf889d70e8b78fb1122d54c7d08..40c9b91629964dfaed74bf96aaa947f3fce99446 100644
--- a/doc/doc_en/algorithm_rec_nrtr_en.md
+++ b/doc/doc_en/algorithm_rec_nrtr_en.md
@@ -29,13 +29,13 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_sar_en.md b/doc/doc_en/algorithm_rec_sar_en.md
index 8c1e6dbbfa5cc05da4d7423a535c6db74cf8f4c3..24b87c10c3b2839909392bf3de0e0c850112fcdc 100644
--- a/doc/doc_en/algorithm_rec_sar_en.md
+++ b/doc/doc_en/algorithm_rec_sar_en.md
@@ -31,13 +31,13 @@ Note:In addition to using the two text recognition datasets MJSynth and SynthTex
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_seed_en.md b/doc/doc_en/algorithm_rec_seed_en.md
index 21679f42fd6302228804db49d731f9b69ec692b2..f8d7ae6d3f34ab8a4f510c88002b22dbce7a10e8 100644
--- a/doc/doc_en/algorithm_rec_seed_en.md
+++ b/doc/doc_en/algorithm_rec_seed_en.md
@@ -31,13 +31,13 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_srn_en.md b/doc/doc_en/algorithm_rec_srn_en.md
index c022a81f9e5797c531c79de7e793d44d9a22552c..1d7fc07dc29e0de021165fc5656cbca704b45284 100644
--- a/doc/doc_en/algorithm_rec_srn_en.md
+++ b/doc/doc_en/algorithm_rec_srn_en.md
@@ -30,13 +30,13 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/algorithm_rec_starnet.md b/doc/doc_en/algorithm_rec_starnet.md
new file mode 100644
index 0000000000000000000000000000000000000000..dbb53a9c737c16fa249483fa97b0b49cf25b2137
--- /dev/null
+++ b/doc/doc_en/algorithm_rec_starnet.md
@@ -0,0 +1,139 @@
+# STAR-Net
+
+- [1. Introduction](#1)
+- [2. Environment](#2)
+- [3. Model Training / Evaluation / Prediction](#3)
+ - [3.1 Training](#3-1)
+ - [3.2 Evaluation](#3-2)
+ - [3.3 Prediction](#3-3)
+- [4. Inference and Deployment](#4)
+ - [4.1 Python Inference](#4-1)
+ - [4.2 C++ Inference](#4-2)
+ - [4.3 Serving](#4-3)
+ - [4.4 More](#4-4)
+- [5. FAQ](#5)
+
+
+## 1. Introduction
+
+Paper information:
+> [STAR-Net: a spatial attention residue network for scene text recognition.](http://www.bmva.org/bmvc/2016/papers/paper043/paper043.pdf)
+> Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong, Zhizhong Su and Junyu Han.
+> BMVC, pages 43.1-43.13, 2016
+
+Following the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, the model is trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets. The reproduced results are as follows:
+
+|Models|Backbone Networks|Avg Accuracy|Configuration Files|Download Links|
+| --- | --- | --- | --- | --- |
+|StarNet|Resnet34_vd|84.44%|[configs/rec/rec_r34_vd_tps_bilstm_ctc.yml](../../configs/rec/rec_r34_vd_tps_bilstm_ctc.yml)|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_ctc_v2.0_train.tar)|
+|StarNet|MobileNetV3|81.42%|[configs/rec/rec_mv3_tps_bilstm_ctc.yml](../../configs/rec/rec_mv3_tps_bilstm_ctc.yml)|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_ctc_v2.0_train.tar)|
+
+
+
+## 2. Environment
+Please refer to [Operating Environment Preparation](./environment_en.md) to configure the PaddleOCR operating environment, and refer to [Project Clone](./clone_en.md) to clone the project code.
+
+
+## 3. Model Training / Evaluation / Prediction
+
+Please refer to [Text Recognition Training Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**. Take the backbone network based on Resnet34_vd as an example:
+
+
+### 3.1 Training
+After the data preparation is complete, the training can be started. The training command is as follows:
+
+```shell
+# Single-card training (long training period, not recommended)
+python3 tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml
+# Multi-card training, specify the card numbers through the --gpus parameter
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml
+```
+
+
+### 3.2 Evaluation
+
+```shell
+# GPU evaluation, Global.pretrained_model is the model to be evaluated
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+```
+
+
+### 3.3 Prediction
+
+```shell
+# The configuration file used for prediction must match the one used for training
+python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+```
+
+
+## 4. Inference and Deployment
+
+
+### 4.1 Python Inference
+First, convert the model saved during STAR-Net text recognition training into an inference model. Take the model trained on the MJSynth and SynthText text recognition datasets with the Resnet34_vd backbone network as an example ([model download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_ctc_v2.0_train.tar)). It can be converted using the following command:
+
+```shell
+python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model=./rec_r34_vd_tps_bilstm_ctc_v2.0_train/best_accuracy Global.save_inference_dir=./inference/rec_starnet
+```
+
+To run inference with the STAR-Net text recognition model, execute the following command:
+
+```shell
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+```
+
+![](../imgs_words_en/word_336.png)
+
+The inference results are as follows:
+
+
+```bash
+Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073)
+```
+
+**Attention:** Because the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it differs from the training of the ultra-lightweight Chinese recognition model in two respects:
+
+- Image resolution. The model above is trained with an image resolution of [3, 32, 100], whereas the Chinese model is trained with [3, 32, 320] to preserve recognition quality on long text. The inference program defaults to the Chinese training resolution, [3, 32, 320], so when running inference with the English model above, the image shape must be set through the `rec_image_shape` parameter.
+
+- Character list. The experiments in the DTRB paper cover only the 26 lowercase English letters and 10 digits, 36 characters in total. All uppercase letters are converted to lowercase, and characters outside this set are ignored and treated as spaces. No character dictionary file is therefore passed in during training; the dictionary is instead generated by the code below. During inference, the parameter `rec_char_dict_path` must be set to the English dictionary "./ppocr/utils/ic15_dict.txt".
+
+```
+self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+dict_character = list(self.character_str)
+```
+
+
+### 4.2 C++ Inference
+
+After preparing the inference model, follow the [cpp infer](../../deploy/cpp_infer/) tutorial.
+
+
+### 4.3 Serving
+
+After preparing the inference model, refer to the [pdserving](../../deploy/pdserving/) tutorial for Serving deployment, including two modes: Python Serving and C++ Serving.
+
+
+### 4.4 More
+
+The STAR-Net model also supports the following inference deployment methods:
+
+- Paddle2ONNX Inference: After preparing the inference model, refer to the [paddle2onnx](../../deploy/paddle2onnx/) tutorial.
+
+
+## 5. FAQ
+
+## Citation
+
+```bibtex
+@inproceedings{liu2016star,
+ title={STAR-Net: a spatial attention residue network for scene text recognition.},
+ author={Liu, Wei and Chen, Chaofeng and Wong, Kwan-Yee K and Su, Zhizhong and Han, Junyu},
+ booktitle={BMVC},
+ volume={2},
+ pages={7},
+ year={2016}
+}
+```
+
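
Review note on section 4.1 of the STAR-Net document above: the character-list bullet implies that labels are lowercased and restricted to a 36-character set. The Python sketch below illustrates one way to read that rule. It assumes characters outside the set are simply dropped, and `normalize_label` and `encode_label` are hypothetical helper names, not PaddleOCR functions.

```python
# Illustrative only: mirrors the 36-character dictionary quoted in the document.
CHARACTER_STR = "0123456789abcdefghijklmnopqrstuvwxyz"
DICT_CHARACTER = list(CHARACTER_STR)
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(DICT_CHARACTER)}


def normalize_label(text):
    """Lowercase the label and drop characters outside the dictionary (assumed policy)."""
    return "".join(ch for ch in text.lower() if ch in CHAR_TO_INDEX)


def encode_label(text):
    """Map a label to dictionary indices after normalization."""
    return [CHAR_TO_INDEX[ch] for ch in normalize_label(text)]


if __name__ == "__main__":
    print(normalize_label("Super-8!"))
    print(encode_label("Ab1"))
```

The index mapping here starts at 0 for `"0"`; a real CTC pipeline typically reserves an extra index for the blank token, which this sketch leaves out.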
+
diff --git a/doc/doc_en/algorithm_rec_svtr_en.md b/doc/doc_en/algorithm_rec_svtr_en.md
index 2e7deb4c077ce508773c4789e2e76bdda7dfe8c8..d402a6b49194009a061c6a200322047be5b50a30 100644
--- a/doc/doc_en/algorithm_rec_svtr_en.md
+++ b/doc/doc_en/algorithm_rec_svtr_en.md
@@ -34,7 +34,7 @@ The accuracy (%) and model files of SVTR on the public dataset of scene text rec
## 2. Environment
-Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
#### Dataset Preparation
@@ -44,7 +44,7 @@ Please refer to ["Environment Preparation"](./environment.md) to configure the P
## 3. Model Training / Evaluation / Prediction
-Please refer to [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
diff --git a/doc/doc_en/knowledge_distillation_en.md b/doc/doc_en/knowledge_distillation_en.md
index bd36907c98c6d556fe1dea85712ece0e717fe426..52725e5c0586b7f7b3e8fdc86d0c24ea38030d53 100755
--- a/doc/doc_en/knowledge_distillation_en.md
+++ b/doc/doc_en/knowledge_distillation_en.md
@@ -438,10 +438,10 @@ Architecture:
```
If DML is used, that is, the method of two small models learning from each other, the Teacher network structure in the above configuration file needs to be set to the same configuration as the Student model.
-Refer to the configuration file for details. [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)
+Refer to the configuration file for details. [ch_PP-OCRv3_det_dml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)
-The following describes the configuration file parameters [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml):
+The following describes the configuration file parameters [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml):
```
Architecture:
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md
index 8e8c1f2fe11bcd0748d556d34fd184fed4b3a86f..c52f71dfe4124302b8cb308980a6228a89589bd6 100644
--- a/doc/doc_en/models_list_en.md
+++ b/doc/doc_en/models_list_en.md
@@ -20,7 +20,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
|model type|model format|description|
|--- | --- | --- |
-|inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_en.md)|
+|inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_ppocr_en.md)|
|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
|nb model|\*.nb| Model optimized by Paddle-Lite, which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for nb model deployment). |
@@ -37,7 +37,7 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|ch_PP-OCRv3_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/ch/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)|
+|ch_PP-OCRv3_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)|
|ch_PP-OCRv3_det| [New] Original lightweight model, supporting Chinese, English, multilingual text detection |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)|
|ch_PP-OCRv2_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_det| [New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
@@ -75,7 +75,7 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|ch_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/ch/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) |
+|ch_PP-OCRv3_rec_slim | [New] Slim quantization with distillation lightweight model, supporting Chinese, English text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) |
|ch_PP-OCRv3_rec| [New] Original lightweight model, supporting Chinese, English, multilingual text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
|ch_PP-OCRv2_rec_slim| Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|ch_PP-OCRv2_rec| Original lightweight model, supporting Chinese, English, multilingual text recognition |[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
@@ -91,7 +91,7 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|en_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting english, English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
+|en_PP-OCRv3_rec_slim | [New] Slim quantization with distillation lightweight model, supporting English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) |
|en_PP-OCRv3_rec| [New] Original lightweight model, supporting english, English, multilingual text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
@@ -108,7 +108,7 @@ Relationship of the above models is as follows.
| ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition |[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) |
| ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |Lightweight model for Tamil recognition|[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) |
| latin_PP-OCRv3_rec | ppocr/utils/dict/latin_dict.txt | Lightweight model for latin recognition | [latin_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/latin_PP-OCRv3_rec.yml) |9.7M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_train.tar) |
-| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for arabic recognition | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/rec_arabic_lite_train.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
+| arabic_PP-OCRv3_rec | ppocr/utils/dict/arabic_dict.txt | Lightweight model for arabic recognition | [arabic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar) |
| cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for cyrillic recognition | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) |
| devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for devanagari recognition | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) |
diff --git a/doc/doc_en/multi_languages_en.md b/doc/doc_en/multi_languages_en.md
index 4696a3e842242517d19bcac7d7bdef3b4c233b12..d9cb180f706eebdba4727f7909499487794545b9 100644
--- a/doc/doc_en/multi_languages_en.md
+++ b/doc/doc_en/multi_languages_en.md
@@ -187,10 +187,10 @@ In addition to installing the whl package for quick forecasting,
PPOCR also provides a variety of forecasting deployment methods.
If necessary, you can read related documents:
-- [Python Inference](./inference_en.md)
-- [C++ Inference](../../deploy/cpp_infer/readme_en.md)
+- [Python Inference](./inference_ppocr_en.md)
+- [C++ Inference](../../deploy/cpp_infer/readme.md)
- [Serving](../../deploy/hubserving/readme_en.md)
-- [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md)
+- [Mobile](../../deploy/lite/readme.md)
- [Benchmark](./benchmark_en.md)
diff --git a/doc/doc_en/ppocr_introduction_en.md b/doc/doc_en/ppocr_introduction_en.md
index 8fe6bc683ac69bdff0e3b4297f2eaa95b934fa17..b13d7f9bf1915de4bbbbec7b384d278e1d7ab8b4 100644
--- a/doc/doc_en/ppocr_introduction_en.md
+++ b/doc/doc_en/ppocr_introduction_en.md
@@ -38,7 +38,7 @@ On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detec
PP-OCRv3 upgraded the detection model and recognition model in 9 aspects based on PP-OCRv2:
- PP-OCRv3 detector upgrades the CML(Collaborative Mutual Learning) text detection strategy proposed in PP-OCRv2, and further optimizes the effect of teacher model and student model respectively. In the optimization of teacher model, a pan module with large receptive field named LK-PAN is proposed and the DML distillation strategy is adopted; In the optimization of student model, a FPN module with residual attention mechanism named RSE-FPN is proposed.
-- PP-OCRv3 recognizer is optimized based on text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). SVTR no longer adopts RNN by introducing transformers structure, which can mine the context information of text line image more effectively, so as to improve the ability of text recognition. PP-OCRv3 adopts lightweight text recognition network SVTR_LCNet, guided training of CTC loss by attention loss, data augmentation strategy TextConAug, better pre-trained model by self-supervised TextRotNet, UDML(Unified Deep Mutual Learning), and UIM (Unlabeled Images Mining) to accelerate the model and improve the effect.
+- The PP-OCRv3 recognizer is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). SVTR replaces the RNN with a transformer structure, which mines the context information of text line images more effectively and thereby improves text recognition. PP-OCRv3 adopts the lightweight text recognition network SVTR_LCNet, attention-guided training of CTC, the data augmentation strategy TextConAug, a better pre-trained model obtained with self-supervised TextRotNet, UDML (Unified Deep Mutual Learning), and UIM (Unlabeled Images Mining) to accelerate the model and improve its effect.
PP-OCRv3 pipeline is as follows:
diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md
index a900219b2462524425fc4303ea3bd571efcbab8f..a44dd0d70c611e1b5fb59d1e58b382704d0bbae8 100644
--- a/doc/doc_en/update_en.md
+++ b/doc/doc_en/update_en.md
@@ -1,8 +1,8 @@
# RECENT UPDATES
- 2022.5.9 release PaddleOCR v2.5, including:
- - [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
- - [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
- - Interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
+ - [PP-OCRv3](./ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
+ - [PPOCRLabelv2](../../PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
+ - Interactive e-book [*"Dive into OCR"*](./ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
- 2022.5.7 Add support for metric and model logging during training to [Weights & Biases](https://docs.wandb.ai/).
- 2021.12.21 OCR open source online course starts. The lesson starts at 8:30 every night and lasts for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM、LayoutLMv2,LayoutXLM).
diff --git a/ppocr/data/lmdb_dataset.py b/ppocr/data/lmdb_dataset.py
index e1b49809d199096ad06b90c4562aa5dbfa634db1..2b1ccaddcea437acb3901d6b0391dd0c7b2954b7 100644
--- a/ppocr/data/lmdb_dataset.py
+++ b/ppocr/data/lmdb_dataset.py
@@ -37,6 +37,8 @@ class LMDBDataSet(Dataset):
         if self.do_shuffle:
             np.random.shuffle(self.data_idx_order_list)
         self.ops = create_operators(dataset_config['transforms'], global_config)
+        self.ext_op_transform_idx = dataset_config.get("ext_op_transform_idx",
+                                                       2)

         ratio_list = dataset_config.get("ratio_list", [1.0])
         self.need_reset = True in [x < 1 for x in ratio_list]
@@ -88,6 +90,29 @@ class LMDBDataSet(Dataset):
         if imgori is None:
             return None
         return imgori
+
+    def get_ext_data(self):
+        ext_data_num = 0
+        for op in self.ops:
+            if hasattr(op, 'ext_data_num'):
+                ext_data_num = getattr(op, 'ext_data_num')
+                break
+        load_data_ops = self.ops[:self.ext_op_transform_idx]
+        ext_data = []
+
+        while len(ext_data) < ext_data_num:
+            lmdb_idx, file_idx = self.data_idx_order_list[np.random.randint(self.__len__())]
+            lmdb_idx = int(lmdb_idx)
+            file_idx = int(file_idx)
+            sample_info = self.get_lmdb_sample_info(self.lmdb_sets[lmdb_idx]['txn'],
+                                                    file_idx)
+            if sample_info is None:
+                continue
+            img, label = sample_info
+            data = {'image': img, 'label': label}
+            # Use the transformed result and skip samples the load ops reject.
+            data = transform(data, load_data_ops)
+            if data is None:
+                continue
+            ext_data.append(data)
+        return ext_data

     def get_lmdb_sample_info(self, txn, index):
         label_key = 'label-%09d'.encode() % index
@@ -109,6 +134,7 @@ class LMDBDataSet(Dataset):
             return self.__getitem__(np.random.randint(self.__len__()))
         img, label = sample_info
         data = {'image': img, 'label': label}
+        data['ext_data'] = self.get_ext_data()
         outs = transform(data, self.ops)
         if outs is None:
             return self.__getitem__(np.random.randint(self.__len__()))
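
Review note on `get_ext_data` above: the method keeps drawing random samples until it has collected `ext_data_num` extra items, running only the first few "load" transforms on each. A framework-free sketch of that loop follows. It assumes each op is a callable that returns a dict or `None`; `sample_ext_data` and the two stand-in ops are hypothetical names used for illustration, not part of PaddleOCR.

```python
import random


def sample_ext_data(samples, load_ops, ext_data_num, rng=None):
    """Collect ext_data_num randomly drawn samples that survive the load ops."""
    rng = rng or random.Random()
    ext_data = []
    while len(ext_data) < ext_data_num:
        img, label = rng.choice(samples)
        data = {"image": img, "label": label}
        for op in load_ops:
            data = op(data)
            if data is None:  # an op rejected the sample
                break
        if data is None:
            continue  # draw another sample instead
        ext_data.append(data)
    return ext_data


def decode_image(data):
    """Stand-in decoder: rejects empty images."""
    return data if data["image"] else None


def encode_label(data):
    """Stand-in label encoder: stores the label length."""
    data["length"] = len(data["label"])
    return data


if __name__ == "__main__":
    pool = [("img_a", "ab"), ("", "bad"), ("img_c", "cde")]
    extra = sample_ext_data(pool, [decode_image, encode_label], 3, random.Random(0))
    print([d["label"] for d in extra])
```

As in the patched method, the loop never terminates if every sample is rejected, so a production version may want a retry cap.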
diff --git a/ppocr/modeling/architectures/__init__.py b/ppocr/modeling/architectures/__init__.py
index e9a01cf0281b91d29f2cce88375be3aaf43feb2e..1c955ef3abe9c38e816616cc9b5399c6832aa5f1 100755
--- a/ppocr/modeling/architectures/__init__.py
+++ b/ppocr/modeling/architectures/__init__.py
@@ -15,10 +15,13 @@
import copy
import importlib
+from paddle.jit import to_static
+from paddle.static import InputSpec
+
from .base_model import BaseModel
from .distillation_model import DistillationModel
-__all__ = ['build_model']
+__all__ = ["build_model", "apply_to_static"]
def build_model(config):
@@ -30,3 +33,36 @@ def build_model(config):
     mod = importlib.import_module(__name__)
     arch = getattr(mod, name)(config)
     return arch
+
+
+def apply_to_static(model, config, logger):
+    if config["Global"].get("to_static", False) is not True:
+        return model
+    assert "image_shape" in config[
+        "Global"], "image_shape must be assigned for static training mode..."
+    supported_list = ["DB", "SVTR"]
+    if config["Architecture"]["algorithm"] in ["Distillation"]:
+        algo = list(config["Architecture"]["Models"].values())[0]["algorithm"]
+    else:
+        algo = config["Architecture"]["algorithm"]
+    assert algo in supported_list, f"algorithms that support static training must be in {supported_list}, but got {algo}"
+
+    specs = [
+        InputSpec(
+            [None] + config["Global"]["image_shape"], dtype='float32')
+    ]
+
+    if algo == "SVTR":
+        max_text_length = config["Global"]["max_text_length"]
+        specs.append([
+            InputSpec([None, max_text_length], dtype='int64'),
+            InputSpec([None, max_text_length], dtype='int64'),
+            InputSpec([None], dtype='int64'),
+            InputSpec([None], dtype='float64'),
+        ])
+
+    model = to_static(model, input_spec=specs)
+    logger.info("Successfully applied @to_static with specs: {}".format(specs))
+    return model
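
Review note on `apply_to_static` above: the input specification is a list whose first entry describes the image batch and whose optional second entry is itself a list of four label-related specs for SVTR. The sketch below reproduces only that nesting with plain `(shape, dtype)` tuples so it runs without Paddle; `build_spec_shapes` is a hypothetical helper, and the tuples merely stand in for `InputSpec` objects.

```python
def build_spec_shapes(global_cfg, algo):
    """Return the nested (shape, dtype) layout that the patch passes as input_spec."""
    # First entry: image batch with a dynamic batch dimension.
    specs = [([None] + list(global_cfg["image_shape"]), "float32")]
    if algo == "SVTR":
        max_len = global_cfg["max_text_length"]
        # Second entry: one nested list holding the four extra inputs.
        specs.append([
            ([None, max_len], "int64"),
            ([None, max_len], "int64"),
            ([None], "int64"),
            ([None], "float64"),
        ])
    return specs


if __name__ == "__main__":
    cfg = {"image_shape": [3, 48, 320], "max_text_length": 25}
    for entry in build_spec_shapes(cfg, "SVTR"):
        print(entry)
```

Note that the SVTR branch appends a single nested list rather than extending the outer list, so the traced model sees two positional inputs: the image tensor and a list of label tensors.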
diff --git a/ppocr/modeling/heads/rec_sar_head.py b/ppocr/modeling/heads/rec_sar_head.py
index 0e6b34404b61b44bebcbc7d67ddfd0a95382c39b..5e64cae85afafc555f2519ed6dd3f05eafff7ea2 100644
--- a/ppocr/modeling/heads/rec_sar_head.py
+++ b/ppocr/modeling/heads/rec_sar_head.py
@@ -83,7 +83,7 @@ class SAREncoder(nn.Layer):
     def forward(self, feat, img_metas=None):
         if img_metas is not None:
-            assert len(img_metas[0]) == feat.shape[0]
+            assert len(img_metas[0]) == paddle.shape(feat)[0]
         valid_ratios = None
         if img_metas is not None and self.mask:
@@ -98,9 +98,10 @@ class SAREncoder(nn.Layer):
         if valid_ratios is not None:
             valid_hf = []
-            T = holistic_feat.shape[1]
-            for i in range(len(valid_ratios)):
-                valid_step = min(T, math.ceil(T * valid_ratios[i])) - 1
+            T = paddle.shape(holistic_feat)[1]
+            for i in range(paddle.shape(valid_ratios)[0]):
+                valid_step = paddle.minimum(
+                    T, paddle.ceil(valid_ratios[i] * T).astype('int32')) - 1
                 valid_hf.append(holistic_feat[i, valid_step, :])
             valid_hf = paddle.stack(valid_hf, axis=0)
         else:
@@ -247,13 +248,14 @@ class ParallelSARDecoder(BaseDecoder):
         # bsz * (seq_len + 1) * h * w * attn_size
         attn_weight = self.conv1x1_2(attn_weight)
         # bsz * (seq_len + 1) * h * w * 1
-        bsz, T, h, w, c = attn_weight.shape
+        bsz, T, h, w, c = paddle.shape(attn_weight)
         assert c == 1
         if valid_ratios is not None:
             # cal mask of attention weight
-            for i in range(len(valid_ratios)):
-                valid_width = min(w, math.ceil(w * valid_ratios[i]))
+            for i in range(paddle.shape(valid_ratios)[0]):
+                valid_width = paddle.minimum(
+                    w, paddle.ceil(valid_ratios[i] * w).astype("int32"))
                 if valid_width < w:
                     attn_weight[i, :, :, valid_width:, :] = float('-inf')
@@ -288,7 +290,7 @@ class ParallelSARDecoder(BaseDecoder):
         img_metas: [label, valid_ratio]
         '''
         if img_metas is not None:
-            assert len(img_metas[0]) == feat.shape[0]
+            assert paddle.shape(img_metas[0])[0] == paddle.shape(feat)[0]
         valid_ratios = None
         if img_metas is not None and self.mask:
@@ -302,7 +304,6 @@ class ParallelSARDecoder(BaseDecoder):
         # bsz * (seq_len + 1) * C
         out_dec = self._2d_attention(
             in_dec, feat, out_enc, valid_ratios=valid_ratios)
-        # bsz * (seq_len + 1) * num_classes
         return out_dec[:, 1:, :]  # bsz * seq_len * num_classes
@@ -395,7 +396,6 @@ class SARHead(nn.Layer):
         if self.training:
             label = targets[0]  # label
-            label = paddle.to_tensor(label, dtype='int64')
             final_out = self.decoder(
                 feat, holistic_feat, label, img_metas=targets)
         else:
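
Review note on the `rec_sar_head.py` changes above: the patch swaps Python-level `min` and `math.ceil` for tensor operations but keeps the arithmetic. A plain-Python reference for the two quantities may help when checking the tensor version. `valid_step` and `valid_width` are hypothetical function names chosen for this sketch.

```python
import math


def valid_step(seq_len, valid_ratio):
    """Index of the last valid time step: min(T, ceil(T * ratio)) - 1."""
    return min(seq_len, math.ceil(seq_len * valid_ratio)) - 1


def valid_width(width, valid_ratio):
    """Number of valid feature columns: min(w, ceil(w * ratio))."""
    return min(width, math.ceil(width * valid_ratio))


if __name__ == "__main__":
    print(valid_step(25, 0.5))
    print(valid_width(48, 0.25))
```

With a ratio of 1.0 the step index is `T - 1` and the width equals `w`, so no attention columns are masked in that case.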
diff --git a/ppstructure/vqa/README_ch.md b/ppstructure/vqa/README_ch.md
index ff513f8f7d603d66a372ce383883f3bcf97a7880..b677dc07bce6c1a752d753b6a1c538b4d3f99271 100644
--- a/ppstructure/vqa/README_ch.md
+++ b/ppstructure/vqa/README_ch.md
@@ -52,7 +52,7 @@ PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进
### 3.1 SER
-![](../../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../../doc/vqa/result_ser/zh_val_42_ser.jpg)
+![](../docs/vqa/result_ser/zh_val_0_ser.jpg) | ![](../docs/vqa/result_ser/zh_val_42_ser.jpg)
---|---
图中不同颜色的框表示不同的类别,对于XFUND数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
@@ -65,7 +65,7 @@ PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进
### 3.2 RE
-![](../../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../../doc/vqa/result_re/zh_val_40_re.jpg)
+![](../docs/vqa/result_re/zh_val_21_re.jpg) | ![](../docs/vqa/result_re/zh_val_40_re.jpg)
---|---
diff --git a/test_tipc/docs/jeston_test_train_inference_python.md b/test_tipc/docs/jeston_test_train_inference_python.md
index 9e9d15fb674ca04558b1f8cb616dc4e44934dbb9..b25175ed0071dd3728ae22c7588ca20535af0505 100644
--- a/test_tipc/docs/jeston_test_train_inference_python.md
+++ b/test_tipc/docs/jeston_test_train_inference_python.md
@@ -115,4 +115,4 @@ ValueError: The results of python_infer_gpu_usetrt_True_precision_fp32_batchsize
## 3. 更多教程
本文档为功能测试用,更丰富的训练预测使用教程请参考:
[模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/training.md)
-[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference.md)
+[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference_ppocr.md)
diff --git a/test_tipc/docs/mac_test_train_inference_python.md b/test_tipc/docs/mac_test_train_inference_python.md
index ea6e0218b12b342347677ea512d3afd89053261a..c37291a8fc9b239564adce8f556565f51f2a9475 100644
--- a/test_tipc/docs/mac_test_train_inference_python.md
+++ b/test_tipc/docs/mac_test_train_inference_python.md
@@ -152,4 +152,4 @@ ValueError: The results of python_infer_cpu_usemkldnn_False_threads_1_batchsize_
## 3. 更多教程
本文档为功能测试用,更丰富的训练预测使用教程请参考:
[模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/training.md)
-[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference.md)
+[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference_ppocr.md)
diff --git a/test_tipc/docs/test_train_inference_python.md b/test_tipc/docs/test_train_inference_python.md
index fa969cbe1b9b6fd524efdaad5002afcfcf40e119..99de9400797493f429f8176a9b6b374a76df4872 100644
--- a/test_tipc/docs/test_train_inference_python.md
+++ b/test_tipc/docs/test_train_inference_python.md
@@ -153,4 +153,4 @@ python3.7 test_tipc/compare_results.py --gt_file=./test_tipc/results/python_*.tx
## 3. 更多教程
本文档为功能测试用,更丰富的训练预测使用教程请参考:
[模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/training.md)
-[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference.md)
+[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference_ppocr.md)
diff --git a/test_tipc/docs/win_test_train_inference_python.md b/test_tipc/docs/win_test_train_inference_python.md
index 95585af0380be410c230a799b7d61e607de5f654..6e3ce93bb3123133075b9d65c64850a87de5f828 100644
--- a/test_tipc/docs/win_test_train_inference_python.md
+++ b/test_tipc/docs/win_test_train_inference_python.md
@@ -156,4 +156,4 @@ ValueError: The results of python_infer_cpu_usemkldnn_False_threads_1_batchsize_
## 3. 更多教程
本文档为功能测试用,更丰富的训练预测使用教程请参考:
[模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/training.md)
-[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference.md)
+[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference_ppocr.md)
diff --git a/tools/export_model.py b/tools/export_model.py
index e971f6cb20025d529d0387d287ec87a76abbdbe7..c0cbcd361cec31c51616a7154836c234f076a86e 100755
--- a/tools/export_model.py
+++ b/tools/export_model.py
@@ -17,7 +17,7 @@ import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
-sys.path.append(os.path.abspath(os.path.join(__dir__, "..")))
+sys.path.insert(0, os.path.abspath(os.path.join(__dir__, "..")))
import argparse
diff --git a/tools/infer/utility.py b/tools/infer/utility.py
index 74ec42ec842abe0f214f13eea6b30a613cfc517b..d27aec63edd2fb5c0240ff0254ce1057b62162b0 100644
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -34,6 +34,7 @@ def init_args():
     parser = argparse.ArgumentParser()
     # params for prediction engine
     parser.add_argument("--use_gpu", type=str2bool, default=True)
+    parser.add_argument("--use_xpu", type=str2bool, default=False)
     parser.add_argument("--ir_optim", type=str2bool, default=True)
     parser.add_argument("--use_tensorrt", type=str2bool, default=False)
     parser.add_argument("--min_subgraph_size", type=int, default=15)
@@ -285,6 +286,8 @@ def create_predictor(args, mode, logger):
config.set_trt_dynamic_shape_info(
min_input_shape, max_input_shape, opt_input_shape)
+ elif args.use_xpu:
+ config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
if hasattr(args, "cpu_threads"):
diff --git a/tools/infer_rec.py b/tools/infer_rec.py
index 193e24a4de12392130d16b86a3407db74602e1f4..a08fa25b467482da4a2996912ad2cc8cc7c398da 100755
--- a/tools/infer_rec.py
+++ b/tools/infer_rec.py
@@ -157,7 +157,7 @@ def main():
if info is not None:
logger.info("\t result: {}".format(info))
- fout.write(file + "\t" + info)
+ fout.write(file + "\t" + info + "\n")
logger.info("success!")
diff --git a/tools/program.py b/tools/program.py
index 7c02dc0149f36085ef05ca378b79d27e92d6dd57..5d726b820739a166f3375b8ab303d34b22e25d87 100755
--- a/tools/program.py
+++ b/tools/program.py
@@ -112,20 +112,25 @@ def merge_config(config, opts):
return config
-def check_gpu(use_gpu):
+def check_device(use_gpu, use_xpu=False):
"""
Log error and exit when set use_gpu=true in paddlepaddle
cpu version.
"""
- err = "Config use_gpu cannot be set as true while you are " \
- "using paddlepaddle cpu version ! \nPlease try: \n" \
- "\t1. Install paddlepaddle-gpu to run model on GPU \n" \
- "\t2. Set use_gpu as false in config file to run " \
+ err = "Config {} cannot be set as true while your paddle " \
+ "is not compiled with {} ! \nPlease try: \n" \
+ "\t1. Install paddlepaddle to run model on {} \n" \
+ "\t2. Set {} as false in config file to run " \
"model on CPU"
try:
+ if use_gpu and use_xpu:
+ print("use_xpu and use_gpu cannot both be true.")
if use_gpu and not paddle.is_compiled_with_cuda():
- print(err)
+ print(err.format("use_gpu", "cuda", "gpu", "use_gpu"))
+ sys.exit(1)
+ if use_xpu and not paddle.device.is_compiled_with_xpu():
+ print(err.format("use_xpu", "xpu", "xpu", "use_xpu"))
sys.exit(1)
except Exception as e:
pass
@@ -301,6 +306,7 @@ def train(config,
stats['lr'] = lr
train_stats.update(stats)
+
if log_writer is not None and dist.get_rank() == 0:
log_writer.log_metrics(metrics=train_stats.get(), prefix="TRAIN", step=global_step)
@@ -547,7 +553,7 @@ def preprocess(is_train=False):
# check if set use_gpu=True in paddlepaddle cpu version
use_gpu = config['Global']['use_gpu']
- check_gpu(use_gpu)
+ use_xpu = config['Global'].get('use_xpu', False)
# check if set use_xpu=True in paddlepaddle cpu/gpu version
use_xpu = False
@@ -562,11 +568,13 @@ def preprocess(is_train=False):
'SEED', 'SDMGR', 'LayoutXLM', 'LayoutLM', 'PREN', 'FCE', 'SVTR'
]
- device = 'cpu'
- if use_gpu:
- device = 'gpu:{}'.format(dist.ParallelEnv().dev_id)
if use_xpu:
- device = 'xpu'
+ device = 'xpu:{0}'.format(os.getenv('FLAGS_selected_xpus', 0))
+ else:
+ device = 'gpu:{}'.format(dist.ParallelEnv()
+ .dev_id) if use_gpu else 'cpu'
+ check_device(use_gpu, use_xpu)
+
device = paddle.set_device(device)
config['Global']['distributed'] = dist.get_world_size() != 1
diff --git a/tools/train.py b/tools/train.py
index 42aba548d6bf5fc35f033ef2baca0fb54d79e75a..b7c25e34231fb650fd2c7c89dc17320f561962f9 100755
--- a/tools/train.py
+++ b/tools/train.py
@@ -35,6 +35,7 @@ from ppocr.postprocess import build_post_process
from ppocr.metrics import build_metric
from ppocr.utils.save_load import load_model
from ppocr.utils.utility import set_seed
+from ppocr.modeling.architectures import apply_to_static
import tools.program as program
dist.get_world_size()
@@ -121,6 +122,8 @@ def main(config, device, logger, vdl_writer):
if config['Global']['distributed']:
model = paddle.DataParallel(model)
+ model = apply_to_static(model, config, logger)
+
# build loss
loss_class = build_loss(config['Loss'])