diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md
index 8a71b75c249b794e7ecda0ad14dc8cd2f07447e0..2cf0732219ac9cd2309ae24896d7de1499986461 100644
--- a/doc/doc_ch/detection.md
+++ b/doc/doc_ch/detection.md
@@ -13,6 +13,7 @@
   - [2.5 分布式训练](#25-分布式训练)
   - [2.6 知识蒸馏训练](#26-知识蒸馏训练)
   - [2.7 其他训练环境](#27-其他训练环境)
+  - [2.8 模型微调](#28-模型微调)
 - [3. 模型评估与预测](#3-模型评估与预测)
   - [3.1 指标评估](#31-指标评估)
   - [3.2 测试检测效果](#32-测试检测效果)
@@ -141,7 +142,8 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
      Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
  ```
 
-<a name="26---fleet---"></a>
+<a name="25---fleet---"></a>
+
 ## 2.5 分布式训练
 
 多机多卡训练时，通过 `--ips` 参数设置使用的机器IP地址，通过 `--gpus` 参数设置使用的GPU ID：
@@ -151,7 +153,7 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1
      -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
 ```
 
-**注意:** 采用多机多卡训练时，需要替换上面命令中的ips值为您机器的地址，机器之间需要能够相互ping通。另外，训练时需要在多个机器上分别启动命令。查看机器ip地址的命令为`ifconfig`。
+**注意:** （1）采用多机多卡训练时，需要将上面命令中的ips值替换为您机器的地址，机器之间需要能够相互ping通；（2）训练时需要在多台机器上分别启动命令，查看机器ip地址的命令为`ifconfig`；（3）更多关于分布式训练的性能优势等信息，请参考：[分布式训练教程](./distributed_training.md)。
 
 
 <a name="26---distill---"></a>
@@ -177,6 +179,13 @@ Windows平台只支持`单卡`的训练与预测，指定GPU进行训练`set CUD
 - Linux DCU
 DCU设备上运行需要设置环境变量 `export HIP_VISIBLE_DEVICES=0,1,2,3`，其余训练评估预测命令与Linux GPU完全相同。
 
+<a name="28-模型微调"></a>
+
+## 2.8 模型微调
+
+实际使用过程中，建议加载官方提供的预训练模型，在自己的数据集上进行微调。关于检测模型的微调方法，请参考：[模型微调教程](./finetune.md)。
+
+
 <a name="3--------"></a>
 # 3. 模型评估与预测
 
@@ -196,6 +205,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{pat
 ## 3.2 测试检测效果
 
 测试单张图像的检测效果：
+
 ```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
 ```
@@ -226,14 +236,19 @@ python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained
 ```
 
 DB检测模型inference 模型预测：
+
 ```shell
 python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
 ```
 如果是其他检测，比如EAST模型，det_algorithm参数需要修改为EAST，默认为DB算法：
+
 ```shell
 python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
 ```
 
+更多关于推理超参数的配置与解释，请参考：[模型推理超参数解释教程](./inference_args.md)。
+
+
 <a name="5-faq"></a>
 # 5. FAQ
 
diff --git a/doc/doc_ch/distributed_training.md b/doc/doc_ch/distributed_training.md
index e0251b21ea1157084e4e1b1d77429264d452aa20..6afa4a5b9f77ce238cb18fcb4160e49f7b465369 100644
--- a/doc/doc_ch/distributed_training.md
+++ b/doc/doc_ch/distributed_training.md
@@ -41,11 +41,16 @@ python3 -m paddle.distributed.launch \
 
 ## 性能效果测试
 
-* 基于单机8卡P40，和2机8卡P40，在26W公开识别数据集(LSVT, RCTW, MTWI)上进行训练，最终耗时如下。
+* 在2机8卡P40的机器上，基于26W公开识别数据集(LSVT, RCTW, MTWI)进行训练，最终耗时如下。
 
-|         模型             |     配置文件 |  机器数量    | 每台机器的GPU数量  |   训练时间    | 识别Acc    | 加速比 |
-| :----------------------: | :------------: | :------------: | :---------------: | :----------: | :-----------: | :-----------: |
-|          CRNN        |   configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml  | 1     |  8  |  60h  |  66.7% | - |
-|          CRNN        |   configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml   | 2   |  8  |  40h  |  67.0% | 150% |
+| 模型   | 配置  | 精度     | 单机8卡耗时 | 2机8卡耗时 | 加速比 |
+|------|-----|--------|--------|--------|-----|
+| CRNN | [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | 67.0% | 2.50d   | 1.67d  | **1.5** |
 
-可以看出，精度没有下降的情况下，训练时间由60h缩短为了40h，加速比可以达到60h/40h=150%，效率为60h/(40h*2)=75%。
+
+* 在4机8卡V100的机器上，基于全量数据进行训练，最终耗时如下。
+
+
+| 模型   | 配置  | 精度     | 单机8卡耗时 | 4机8卡耗时 | 加速比 |
+|------|-----|--------|--------|--------|-----|
+| SVTR | [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | 74.0% | 10d   | 2.84d  | **3.5** |
diff --git a/doc/doc_ch/inference_args.md b/doc/doc_ch/inference_args.md
new file mode 100644
index 0000000000000000000000000000000000000000..fa188ab7c800eaabae8a4ff54413af162dd60e43
--- /dev/null
+++ b/doc/doc_ch/inference_args.md
@@ -0,0 +1,120 @@
+# PaddleOCR模型推理参数解释
+
+在使用PaddleOCR进行模型推理时，可以自定义修改参数，来修改模型、数据、预处理、后处理等内容（参数文件：[utility.py](../../tools/infer/utility.py)），详细的参数解释如下所示。
+
+* 全局信息
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  image_dir | str | 无，必须显式指定 | 图像或者文件夹路径 |
+|  vis_font_path | str | "./doc/fonts/simfang.ttf" | 用于可视化的字体路径 |
+|  drop_score | float | 0.5 | 识别得分小于该值的结果会被丢弃，不会作为返回结果 |
+|  use_pdserving | bool | False | 是否使用Paddle Serving进行预测 |
+|  warmup | bool | False | 是否开启warmup，在统计预测耗时的时候，可以使用这种方法 |
+|  draw_img_save_dir | str | "./inference_results" | 系统串联预测OCR结果的保存文件夹 |
+|  save_crop_res | bool | False  | 是否保存OCR的识别文本图像 |
+|  crop_res_save_dir | str | "./output" | 保存OCR识别出来的文本图像路径 |
+|  use_mp | bool | False | 是否开启多进程预测  |
+|  total_process_num | int | 6 | 开启的进程数，`use_mp`为`True`时生效  |
+|  process_id | int | 0 | 当前进程的id号，无需自己修改  |
+|  benchmark | bool | False | 是否开启benchmark，对预测速度、显存占用等进行统计  |
+|  save_log_path | str | "./log_output/" | 开启`benchmark`时，日志结果的保存文件夹 |
+|  show_log | bool | True | 是否显示预测中的日志信息  |
+|  use_onnx | bool | False | 是否开启onnx预测 |
+
+
+* 预测引擎相关
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  use_gpu | bool | True | 是否使用GPU进行预测 |
+|  ir_optim | bool | True | 是否对计算图进行分析与优化，开启后可以加速预测过程 |
+|  use_tensorrt | bool | False | 是否开启tensorrt |
+|  min_subgraph_size | int | 15 | tensorrt中最小子图size，当子图的size大于该值时，才会尝试对该子图使用trt engine计算 |
+|  precision | str | "fp32" | 预测的精度，支持`fp32`, `fp16`, `int8` 3种输入 |
+|  enable_mkldnn | bool | False | 是否开启mkldnn |
+|  cpu_threads | int | 10 | 开启mkldnn时，cpu预测的线程数 |
+
+* 文本检测模型相关
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_algorithm | str | "DB" | 文本检测算法名称，目前支持`DB`, `EAST`, `SAST`, `PSE`  |
+|  det_model_dir | str | 无，如果使用检测模型，该项是必填项 | 检测inference模型路径 |
+|  det_limit_side_len | int | 960 | 检测的图像边长限制 |
+|  det_limit_type | str | "max" | 检测的边长限制类型，目前支持`min`, `max`，`min`表示保证图像最短边不小于`det_limit_side_len`，`max`表示保证图像最长边不大于`det_limit_side_len` |
+
+其中，DB算法相关参数如下
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_db_thresh | float | 0.3 | DB输出的概率图中，得分大于该阈值的像素点才会被认为是文字像素点 |
+|  det_db_box_thresh | float | 0.6 | 检测结果边框内，所有像素点的平均得分大于该阈值时，该结果会被认为是文字区域 |
+|  det_db_unclip_ratio | float | 1.5 | `Vatti clipping`算法的扩张系数，使用该方法对文字区域进行扩张 |
+|  max_batch_size | int | 10 | 预测的batch size |
+|  use_dilation | bool | False | 是否对分割结果进行膨胀以获取更优检测效果 |
+|  det_db_score_mode | str | "fast" | DB的检测结果得分计算方法，支持`fast`和`slow`，`fast`是根据polygon的外接矩形边框内的所有像素计算平均得分，`slow`是根据原始polygon内的所有像素计算平均得分，计算速度相对较慢一些，但是更加准确一些。 |
+
+EAST算法相关参数如下
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_east_score_thresh | float | 0.8 | EAST后处理中score map的阈值 |
+|  det_east_cover_thresh | float | 0.1 | EAST后处理中文本框的平均得分阈值 |
+|  det_east_nms_thresh | float | 0.2 | EAST后处理中nms的阈值 |
+
+SAST算法相关参数如下
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_sast_score_thresh | float | 0.5 | SAST后处理中的得分阈值 |
+|  det_sast_nms_thresh | float | 0.5 | SAST后处理中nms的阈值 |
+|  det_sast_polygon | bool | False | 是否多边形检测，弯曲文本场景（如Total-Text）设置为True |
+
+PSE算法相关参数如下
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  det_pse_thresh | float | 0.0 | 对输出图做二值化的阈值 |
+|  det_pse_box_thresh | float | 0.85 | 对box进行过滤的阈值，低于此阈值的丢弃 |
+|  det_pse_min_area | float | 16 | box的最小面积，低于此阈值的丢弃 |
+|  det_pse_box_type | str | "box" | 返回框的类型，`box`：四点坐标，`poly`：弯曲文本的所有点坐标 |
+|  det_pse_scale | int | 1 | 输入图像相对于进后处理的图的比例，如`640*640`的图像，网络输出为`160*160`，scale为2的情况下，进后处理的图片shape为`320*320`。这个值调大可以加快后处理速度，但是会带来精度的下降 |
+
+* 文本识别模型相关
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  rec_algorithm | str | "CRNN" | 文本识别算法名称，目前支持`CRNN`, `SRN`, `RARE`, `NRTR`, `SAR` |
+|  rec_model_dir | str | 无，如果使用识别模型，该项是必填项 | 识别inference模型路径 |
+|  rec_image_shape | list | [3, 32, 320] | 识别时的图像尺寸 |
+|  rec_batch_num | int | 6 | 识别的batch size |
+|  max_text_length | int | 25 | 识别结果最大长度，在`SRN`中有效 |
+|  rec_char_dict_path | str | "./ppocr/utils/ppocr_keys_v1.txt" | 识别的字符字典文件 |
+|  use_space_char | bool | True | 是否包含空格，如果为`True`，则会在字符字典的最后补充`空格`字符 |
+
+
+* 端到端文本检测与识别模型相关
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  e2e_algorithm | str | "PGNet" | 端到端算法名称，目前支持`PGNet` |
+|  e2e_model_dir | str | 无，如果使用端到端模型，该项是必填项 | 端到端模型inference模型路径 |
+|  e2e_limit_side_len | int | 768 | 端到端的输入图像边长限制 |
+|  e2e_limit_type | str | "max" | 端到端的边长限制类型，目前支持`min`, `max`，`min`表示保证图像最短边不小于`e2e_limit_side_len`，`max`表示保证图像最长边不大于`e2e_limit_side_len` |
+|  e2e_pgnet_score_thresh | float | 0.5 | 端到端得分阈值，小于该阈值的结果会被丢弃 |
+|  e2e_char_dict_path | str | "./ppocr/utils/ic15_dict.txt" | 识别的字典文件路径 |
+|  e2e_pgnet_valid_set | str | "totaltext" | 验证集名称，目前支持`totaltext`, `partvgg`，不同数据集对应的后处理方式不同，与训练过程保持一致即可 |
+|  e2e_pgnet_mode | str | "fast" | PGNet的检测结果得分计算方法，支持`fast`和`slow`，`fast`是根据polygon的外接矩形边框内的所有像素计算平均得分，`slow`是根据原始polygon内的所有像素计算平均得分，计算速度相对较慢一些，但是更加准确一些。 |
+
+
+* 方向分类器模型相关
+
+| 参数名称 | 类型 | 默认值 | 含义 |
+| :--: | :--: | :--: | :--: |
+|  use_angle_cls | bool | False | 是否使用方向分类器 |
+|  cls_model_dir | str | 无，如果需要使用，则必须显式指定路径 | 方向分类器inference模型路径 |
+|  cls_image_shape | list | [3, 48, 192] | 预测尺度 |
+|  label_list | list | ['0', '180'] | class id对应的角度值 |
+|  cls_batch_num | int | 6 | 方向分类器预测的batch size |
+|  cls_thresh | float | 0.9 | 预测阈值，模型预测结果为180度，且得分大于该阈值时，认为最终预测结果为180度，需要翻转 |
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
index 5134422417fbd78e5635fea5dbede4f3516aeee5..acf09f7bdecf41adfeee26efde6afaff8db7a41e 100644
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -18,6 +18,7 @@
   - [2.6. 知识蒸馏训练](#26-知识蒸馏训练)
   - [2.7. 多语言模型训练](#27-多语言模型训练)
   - [2.8. 其他训练环境](#28-其他训练环境)
+  - [2.9. 模型微调](#29-模型微调)
 - [3. 模型评估与预测](#3-模型评估与预测)
   - [3.1. 指标评估](#31-指标评估)
   - [3.2. 测试识别效果](#32-测试识别效果)
@@ -387,7 +388,7 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1
      -o Global.pretrained_model=./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
 ```
 
-**注意:** 采用多机多卡训练时，需要替换上面命令中的ips值为您机器的地址，机器之间需要能够相互ping通。另外，训练时需要在多个机器上分别启动命令。查看机器ip地址的命令为`ifconfig`。
+**注意:** （1）采用多机多卡训练时，需要将上面命令中的ips值替换为您机器的地址，机器之间需要能够相互ping通；（2）训练时需要在多台机器上分别启动命令，查看机器ip地址的命令为`ifconfig`；（3）更多关于分布式训练的性能优势等信息，请参考：[分布式训练教程](./distributed_training.md)。
 
 ## 2.6. 知识蒸馏训练
 
@@ -462,6 +463,11 @@ Windows平台只支持`单卡`的训练与预测，指定GPU进行训练`set CUD
 - Linux DCU
 DCU设备上运行需要设置环境变量 `export HIP_VISIBLE_DEVICES=0,1,2,3`，其余训练评估预测命令与Linux GPU完全相同。
 
+## 2.9. 模型微调
+
+实际使用过程中，建议加载官方提供的预训练模型，在自己的数据集上进行微调。关于识别模型的微调方法，请参考：[模型微调教程](./finetune.md)。
+
+
 # 3. 模型评估与预测
 
 ## 3.1. 指标评估
@@ -564,12 +570,13 @@ inference/en_PP-OCRv3_rec/
 
 - 自定义模型推理
 
-  如果训练时修改了文本的字典，在使用inference模型预测时，需要通过`--rec_char_dict_path`指定使用的字典路径
+  如果训练时修改了文本的字典，在使用inference模型预测时，需要通过`--rec_char_dict_path`指定使用的字典路径。更多关于推理超参数的配置与解释，请参考：[模型推理超参数解释教程](./inference_args.md)。
 
   ```
   python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 48, 320" --rec_char_dict_path="your text dict path"
   ```
 
+
 # 5. FAQ
 
 Q1: 训练模型转inference 模型之后预测效果不一致？
diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index 76e0f8509b92dfaae62dce7ba2b4b73d39da1600..f85bf585cb66332d90de8d66ed315cb04ece7636 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -159,7 +159,7 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1
      -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
 ```
 
-**Note:** When using multi-machine and multi-gpu training, you need to replace the ips value in the above command with the address of your machine, and the machines need to be able to ping each other. In addition, training needs to be launched separately on multiple machines. The command to view the ip address of the machine is `ifconfig`.
+**Note:** (1) For multi-machine, multi-GPU training, replace the `ips` value in the command above with the addresses of your machines; the machines must be able to ping each other. (2) Launch the training command separately on each machine. Use `ifconfig` to view a machine's IP address. (3) For more details about the distributed training speedup ratio, please refer to the [Distributed Training Tutorial](./distributed_training_en.md).
 
 ### 2.6 Training with knowledge distillation
 
diff --git a/doc/doc_en/distributed_training.md b/doc/doc_en/distributed_training_en.md
similarity index 70%
rename from doc/doc_en/distributed_training.md
rename to doc/doc_en/distributed_training_en.md
index 2822ee5e4ea52720a458e4060d8a09be7b98846b..5a219ed2b494d6239096ff634dfdc702c4be9419 100644
--- a/doc/doc_en/distributed_training.md
+++ b/doc/doc_en/distributed_training_en.md
@@ -40,11 +40,17 @@ python3 -m paddle.distributed.launch \
 
 ## Performance comparison
 
-* Based on 26W public recognition dataset (LSVT, rctw, mtwi), training on single 8-card P40 and dual 8-card P40, the final time consumption is as follows.
+* On two machines with 8 P40 GPUs each, training on the public recognition datasets (LSVT, RCTW, MTWI; about 260k images in total) gives the following training time and speedup ratio.
 
-|   Model   |   Config file  |  Number of machines |   Number of GPUs per machine   |   Training time      | Recognition acc  | Speedup ratio |
-| :-------: | :------------: |  :----------------: | :----------------------------: | :------------------: | :--------------: | :-----------: |
-|   CRNN    |   configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml   |   1          |  8  |  60h  |  66.7% | - |
-|   CRNN    |   configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml   |   2          |  8  |  40h  |  67.0% | 150% |
 
-It can be seen that the training time is shortened from 60h to 40h, the speedup ratio can reach 150% (60h / 40h), and the efficiency is 75% (60h / (40h * 2)).
+| Model   | Config file  | Recognition acc     | Training time (1 machine, 8 GPUs) | Training time (2 machines, 8 GPUs each) | Speedup ratio |
+|------|-----|--------|--------|--------|-----|
+| CRNN | [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | 67.0% | 2.50d   | 1.67d  | **1.5** |
+
+
+* On four machines with 8 V100 GPUs each, training on the full dataset gives the following training time and speedup ratio.
+
+
+| Model   | Config file  | Recognition acc     | Training time (1 machine, 8 GPUs) | Training time (4 machines, 8 GPUs each) | Speedup ratio |
+|------|-----|--------|--------|--------|-----|
+| SVTR | [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | 74.0% | 10d   | 2.84d  | **3.5** |
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 60b4a1b26b373adc562ab9624e55ffe59a775a35..7d31b0ffe28c59ad3397d06fa178bcf8cbb822e9 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -306,7 +306,7 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1
      -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
 ```
 
-**Note:** When using multi-machine and multi-gpu training, you need to replace the ips value in the above command with the address of your machine, and the machines need to be able to ping each other. In addition, training needs to be launched separately on multiple machines. The command to view the ip address of the machine is `ifconfig`.
+**Note:** (1) For multi-machine, multi-GPU training, replace the `ips` value in the command above with the addresses of your machines; the machines must be able to ping each other. (2) Launch the training command separately on each machine. Use `ifconfig` to view a machine's IP address. (3) For more details about the distributed training speedup ratio, please refer to the [Distributed Training Tutorial](./distributed_training_en.md).
 
 <a name="kd"></a>
 ### 2.6 Training with Knowledge Distillation