Merge pull request #7308 from andyjpaddle/add_args_doc_en

Add args doc en

a9ee616c · MissPenguin · GitHub · 7663f45d · 6ac2180f · a9ee616c
5 changed files
--- a/doc/doc_ch/inference_args.md
+++ b/doc/doc_ch/inference_args.md
@@ -15,7 +15,7 @@
 |  save_crop_res | bool | False  | 是否保存OCR的识别文本图像 |
 |  crop_res_save_dir | str | "./output" | 保存OCR识别出来的文本图像路径 |
 |  use_mp | bool | False | 是否开启多进程预测  |
-|  total_process_num | int | 6 | 开启的进城数，`use_mp`为`True`时生效  |
+|  total_process_num | int | 6 | 开启的进程数，`use_mp`为`True`时生效  |
 |  process_id | int | 0 | 当前进程的id号，无需自己修改  |
 |  benchmark | bool | False | 是否开启benchmark，对预测速度、显存占用等进行统计  |
 |  save_log_path | str | "./log_output/" | 开启`benchmark`时，日志结果的保存文件夹 |
@@ -39,10 +39,10 @@
 | 参数名称 | 类型 | 默认值 | 含义 |
 | :--: | :--: | :--: | :--: |
-|  det_algorithm | str | "DB" | 文本检测算法名称，目前支持`DB`, `EAST`, `SAST`, `PSE`  |
+|  det_algorithm | str | "DB" | 文本检测算法名称，目前支持`DB`, `EAST`, `SAST`, `PSE`, `DB++`, `FCE`  |
 |  det_model_dir | str | xx | 检测inference模型路径 |
 |  det_limit_side_len | int | 960 | 检测的图像边长限制 |
-|  det_limit_type | str | "max" | 检测的变成限制类型，目前支持`min`, `max`，`min`表示保证图像最短边不小于`det_limit_side_len`，`max`表示保证图像最长边不大于`det_limit_side_len` |
+|  det_limit_type | str | "max" | 检测的边长限制类型，目前支持`min`和`max`，`min`表示保证图像最短边不小于`det_limit_side_len`，`max`表示保证图像最长边不大于`det_limit_side_len` |
 其中，DB算法相关参数如下
@@ -85,9 +85,9 @@ PSE算法相关参数如下
 | 参数名称 | 类型 | 默认值 | 含义 |
 | :--: | :--: | :--: | :--: |
-|  rec_algorithm | str | "CRNN" | 文本识别算法名称，目前支持`CRNN`, `SRN`, `RARE`, `NETR`, `SAR` |
+|  rec_algorithm | str | "CRNN" | 文本识别算法名称，目前支持`CRNN`, `SRN`, `RARE`, `NETR`, `SAR`, `ViTSTR`, `ABINet`, `VisionLAN`, `SPIN`, `RobustScanner`, `SVTR`, `SVTR_LCNet` |
 |  rec_model_dir | str | 无，如果使用识别模型，该项是必填项 | 识别inference模型路径 |
-|  rec_image_shape | list | [3, 32, 320] | 识别时的图像尺寸， |
+|  rec_image_shape | list | [3, 48, 320] | 识别时的图像尺寸 |
 |  rec_batch_num | int | 6 | 识别的batch size |
 |  max_text_length | int | 25 | 识别结果最大长度，在`SRN`中有效 |
 |  rec_char_dict_path | str | "./ppocr/utils/ppocr_keys_v1.txt" | 识别的字符字典文件 |

--- a/doc/doc_ch/inference_ppocr.md
+++ b/doc/doc_ch/inference_ppocr.md
@@ -158,3 +158,5 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de
 执行命令后，识别结果图像如下：
 ![](../imgs_results/system_res_00018069_v3.jpg)
+更多关于推理超参数的配置与解释，请参考：[模型推理超参数解释教程](./inference_args.md)。
--- a/doc/doc_en/inference_args_en.md
+++ b/doc/doc_en/inference_args_en.md
+# PaddleOCR Model Inference Parameter Explanation
+When using PaddleOCR for model inference, you can customize parameters to control the model, data, preprocessing, postprocessing, etc. (parameter file: [utility.py](../../tools/infer/utility.py)). The parameters are explained in detail below:
+* Global parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  image_dir | str | None, must be specified explicitly | Path to an image or a folder of images |
+|  vis_font_path | str | "./doc/fonts/simfang.ttf" | Font path used for visualization |
+|  drop_score | float | 0.5 | Results with a recognition score below this value are discarded and not returned |
+|  use_pdserving | bool | False | Whether to use Paddle Serving for prediction |
+|  warmup | bool | False | Whether to enable warmup; useful when measuring prediction time |
+|  draw_img_save_dir | str | "./inference_results" | Folder where the OCR results of the cascaded system prediction are saved |
+|  save_crop_res | bool | False  | Whether to save the cropped text images recognized by OCR |
+|  crop_res_save_dir | str | "./output" | Folder where the cropped text images recognized by OCR are saved |
+|  use_mp | bool | False | Whether to enable multi-process prediction  |
+|  total_process_num | int | 6 | The number of processes; takes effect when `use_mp` is `True` |
+|  process_id | int | 0 | The id of the current process; it does not need to be set manually |
+|  benchmark | bool | False | Whether to enable benchmarking, which collects statistics on prediction speed, GPU memory usage, etc. |
+|  save_log_path | str | "./log_output/" | Folder where log results are saved when `benchmark` is enabled |
+|  show_log | bool | True | Whether to show log information during inference |
+|  use_onnx | bool | False | Whether to enable ONNX prediction |
+* Prediction engine related parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  use_gpu | bool | True | Whether to use GPU for prediction |
+|  ir_optim | bool | True | Whether to analyze and optimize the computation graph. Prediction can be accelerated when `ir_optim` is enabled |
+|  use_tensorrt | bool | False | Whether to enable TensorRT |
+|  min_subgraph_size | int | 15 | The minimum subgraph size for TensorRT. When a subgraph is larger than this value, inference will try to run it with the TensorRT engine |
+|  precision | str | fp32 | The prediction precision; supports `fp32`, `fp16`, `int8` |
+|  enable_mkldnn | bool | True | Whether to enable MKL-DNN |
+|  cpu_threads | int | 10 | The number of CPU threads used for prediction when MKL-DNN is enabled |
+* Text detection model related parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  det_algorithm | str | "DB" | Text detection algorithm name, currently supports `DB`, `EAST`, `SAST`, `PSE`, `DB++`, `FCE` |
+|  det_model_dir | str | xx | Path to the detection inference model |
+|  det_limit_side_len | int | 960 | Side length limit of the image used for detection |
+|  det_limit_type | str | "max" | The side length limit type, currently supports `min` and `max`. `min` ensures that the shortest side of the image is not less than `det_limit_side_len`; `max` ensures that the longest side of the image is not greater than `det_limit_side_len` |
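
Reviewer's illustration of the `det_limit_type` semantics described in the table above. This is a minimal sketch and not PaddleOCR's actual preprocessing code: the helper name `limit_scale` is hypothetical, and the real pipeline may additionally round the resized sides to a multiple required by the network.

```python
def limit_scale(height, width, limit_side_len=960, limit_type="max"):
    """Return the resize ratio implied by det_limit_side_len / det_limit_type.

    Hypothetical helper, for illustration only.
    """
    if limit_type == "max":
        # Shrink only when the longest side exceeds the limit.
        longest = max(height, width)
        return limit_side_len / longest if longest > limit_side_len else 1.0
    if limit_type == "min":
        # Enlarge only when the shortest side is below the limit.
        shortest = min(height, width)
        return limit_side_len / shortest if shortest < limit_side_len else 1.0
    raise ValueError(f"unsupported limit_type: {limit_type!r}")
```

Under these assumptions, a `1920*1080` image with the default `max` setting would be scaled by 0.5, while a `480*640` image would be left unchanged.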
+The relevant parameters of the DB algorithm are as follows
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  det_db_thresh | float | 0.3 | In the probability map output by DB, only pixels with a score greater than this threshold will be considered as text pixels |
+|  det_db_box_thresh | float | 0.6 | A detection box is considered a text area when the average score of all pixels inside it is greater than this threshold |
+|  det_db_unclip_ratio | float | 1.5 | The expansion factor of the `Vatti clipping` algorithm, which is used to expand the text area |
+|  max_batch_size | int | 10 | Maximum batch size |
+|  use_dilation | bool | False | Whether to dilate the segmentation result to obtain better detection results |
+|  det_db_score_mode | str | "fast" | How the score of a DB detection result is calculated; supports `fast` and `slow`. `fast` averages the scores of all pixels within the bounding rectangle of the polygon; `slow` averages the scores of all pixels within the original polygon, which is slower but more accurate |
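
Reviewer's illustration of how `det_db_unclip_ratio` acts as an expansion factor. The DB method expands a shrunk text polygon by an offset distance `D = area * unclip_ratio / perimeter`. The sketch below only computes that distance in plain Python; the function name `unclip_distance` is hypothetical, and the polygon offsetting itself (Vatti clipping) is not shown.

```python
def unclip_distance(polygon, unclip_ratio=1.5):
    """Offset distance D = area * unclip_ratio / perimeter.

    polygon: list of (x, y) vertices in order. Hypothetical helper.
    """
    n = len(polygon)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    area = abs(twice_area) / 2.0
    return area * unclip_ratio / perimeter
```

For a `100*20` rectangle and the default ratio of 1.5, the area is 2000 and the perimeter is 240, so the box would be expanded by 12.5 pixels on each side.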
+The relevant parameters of the EAST algorithm are as follows
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  det_east_score_thresh | float | 0.8 | Threshold for score map in EAST postprocess |
+|  det_east_cover_thresh | float | 0.1 | Average score threshold for text boxes in EAST postprocess |
+|  det_east_nms_thresh | float | 0.2 | NMS threshold in EAST postprocess |
+The relevant parameters of the SAST algorithm are as follows
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  det_sast_score_thresh | float | 0.5 | Score threshold in SAST postprocess |
+|  det_sast_nms_thresh | float | 0.5 | NMS threshold in SAST postprocess |
+|  det_sast_polygon | bool | False | Whether to use polygon detection; set to True for curved text scenes (such as Total-Text) |
+The relevant parameters of the PSE algorithm are as follows
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  det_pse_thresh | float | 0.0 | Threshold for binarizing the output image |
+|  det_pse_box_thresh | float | 0.85 | Threshold for filtering boxes; boxes scoring below this threshold are discarded |
+|  det_pse_min_area | float | 16 | The minimum area of a box; boxes with a smaller area are discarded |
+|  det_pse_box_type | str | "box" | The type of the returned box. `box`: coordinates of four points; `poly`: coordinates of all points of the curved text |
+|  det_pse_scale | int | 1 | The ratio of the input image relative to the post-processed image. For example, for a `640*640` image the network output is `160*160`, and with a scale of 2 the post-processed image is `320*320`. Increasing this value speeds up post-processing but reduces accuracy |
+* Text recognition model related parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  rec_algorithm | str | "CRNN" | Text recognition algorithm name, currently supports `CRNN`, `SRN`, `RARE`, `NRTR`, `SAR`, `ViTSTR`, `ABINet`, `VisionLAN`, `SPIN`, `RobustScanner`, `SVTR`, `SVTR_LCNet` |
+|  rec_model_dir | str | None, required when using a recognition model | Path to the recognition inference model |
+|  rec_image_shape | list | [3, 48, 320] | Image size used for recognition |
+|  rec_batch_num | int | 6 | Recognition batch size |
+|  max_text_length | int | 25 | The maximum length of the recognition result, valid in `SRN` |
+|  rec_char_dict_path | str | "./ppocr/utils/ppocr_keys_v1.txt" | Character dictionary file used for recognition |
+|  use_space_char | bool | True | Whether to include the space character; if `True`, the `space` character is appended to the end of the character dictionary |
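
Reviewer's illustration of the `use_space_char` behavior described above: the space character is appended after the entries read from the dictionary file. This is a minimal sketch, assuming one character per line in the dictionary; `build_charset` is a hypothetical name and any special tokens a specific recognition algorithm adds are not modeled.

```python
def build_charset(dict_lines, use_space_char=True):
    """Build the character list from dictionary lines (one character per line).

    Hypothetical helper, for illustration only.
    """
    chars = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        # The space character goes at the end of the dictionary.
        chars.append(" ")
    return chars
```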
+* End-to-end text detection and recognition model related parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  e2e_algorithm | str | "PGNet" | End-to-end algorithm name, currently supports `PGNet` |
+|  e2e_model_dir | str | None, required when using the end-to-end model | Path to the end-to-end inference model |
+|  e2e_limit_side_len | int | 768 | End-to-end input image side length limit |
+|  e2e_limit_type | str | "max" | End-to-end side length limit type, currently supports `min` and `max`. `min` ensures that the shortest side of the image is not less than `e2e_limit_side_len`; `max` ensures that the longest side of the image is not greater than `e2e_limit_side_len` |
+|  e2e_pgnet_score_thresh | float | 0.5 | End-to-end score threshold; results below this threshold are discarded |
+|  e2e_char_dict_path | str | "./ppocr/utils/ic15_dict.txt" | Recognition dictionary file path |
+|  e2e_pgnet_valid_set | str | "totaltext" | The name of the validation set, currently supports `totaltext` and `partvgg`. Post-processing differs between datasets, so this should be kept consistent with the training process |
+|  e2e_pgnet_mode | str | "fast" | How the score of a PGNet detection result is calculated; supports `fast` and `slow`. `fast` averages the scores of all pixels within the bounding rectangle of the polygon; `slow` averages the scores of all pixels within the original polygon, which is slower but more accurate |
+* Angle classifier model related parameters
+| parameter | type | default | description |
+| :--: | :--: | :--: | :--: |
+|  use_angle_cls | bool | False | Whether to use an angle classifier |
+|  cls_model_dir | str | None, must be specified explicitly when the classifier is used | Path to the angle classifier inference model |
+|  cls_image_shape | list | [3, 48, 192] | Image shape used for prediction |
+|  label_list | list | ['0', '180'] | The angle value corresponding to each class id |
+|  cls_batch_num | int | 6 | Classifier batch size |
+|  cls_thresh | float | 0.9 | Prediction threshold. When the model predicts 180 degrees with a score greater than this threshold, the final prediction is considered to be 180 degrees and the image needs to be flipped |
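
Reviewer's illustration of how `label_list` and `cls_thresh` interact according to the table above. This is a minimal sketch under the documented defaults; the names `pick_label` and `should_flip` are hypothetical and the actual image rotation is left out.

```python
def pick_label(probs, label_list=("0", "180")):
    """Map the highest-scoring class id to its angle label and score."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return label_list[idx], probs[idx]


def should_flip(label, score, cls_thresh=0.9):
    """True when the text crop should be rotated by 180 degrees."""
    return label == "180" and score > cls_thresh
```

With the default threshold, a crop predicted as `180` with score 0.95 would be flipped, while one predicted as `180` with score 0.8 would be left as is.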
--- a/doc/doc_en/inference_ppocr_en.md
+++ b/doc/doc_en/inference_ppocr_en.md
@@ -160,3 +160,5 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de
 After executing the command, the recognition result image is as follows:
 ![](../imgs_results/system_res_00018069_v3.jpg)
+For more on configuring inference parameters and what they mean, please refer to: [Model Inference Parameters Explained Tutorial](./inference_args_en.md).
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -248,17 +248,17 @@ def create_predictor(args, mode, logger):
            config.enable_xpu(10 * 1024 * 1024)
        else:
            config.disable_gpu()
-            if hasattr(args, "cpu_threads"):
-                config.set_cpu_math_library_num_threads(args.cpu_threads)
-            else:
-                # default cpu threads as 10
-                config.set_cpu_math_library_num_threads(10)
            if args.enable_mkldnn:
                # cache 10 different shapes for mkldnn to avoid memory leak
                config.set_mkldnn_cache_capacity(10)
                config.enable_mkldnn()
                if args.precision == "fp16":
                    config.enable_mkldnn_bfloat16()
+                if hasattr(args, "cpu_threads"):
+                    config.set_cpu_math_library_num_threads(args.cpu_threads)
+                else:
+                    # default cpu threads as 10
+                    config.set_cpu_math_library_num_threads(10)
        # enable memory optim
        config.enable_memory_optim()
        config.disable_glog_info()
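
Reviewer's note on the `utility.py` hunk above: the thread-count call moves inside the `enable_mkldnn` branch, so `cpu_threads` now takes effect only when MKL-DNN is enabled, which matches the `cpu_threads` description in the new English doc. The sketch below mirrors the CPU branch after the change. `RecordingConfig` is a hypothetical stand-in that only records method calls (it is not the real Paddle inference config), and `configure_cpu` is a hypothetical wrapper around the branch.

```python
from types import SimpleNamespace


class RecordingConfig:
    """Hypothetical stand-in for the inference config; records calls only."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown attribute becomes a method that logs its invocation.
        def record(*args):
            self.calls.append((name, args))
        return record


def configure_cpu(config, args):
    """Mirror of the CPU branch of create_predictor after this change."""
    config.disable_gpu()
    if args.enable_mkldnn:
        # cache 10 different shapes for mkldnn to avoid memory leak
        config.set_mkldnn_cache_capacity(10)
        config.enable_mkldnn()
        if args.precision == "fp16":
            config.enable_mkldnn_bfloat16()
        # Thread count is applied only on the MKL-DNN path; defaults to 10.
        threads = getattr(args, "cpu_threads", 10)
        config.set_cpu_math_library_num_threads(threads)


cfg = RecordingConfig()
configure_cpu(cfg, SimpleNamespace(enable_mkldnn=False, precision="fp32", cpu_threads=4))
print(cfg.calls)  # only the disable_gpu call is recorded
```

Before this change, the same arguments would also have triggered `set_cpu_math_library_num_threads(4)` even with MKL-DNN disabled.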