"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim=True
cfg.enable_mkldnn=enable_mkldnn
self.ser_predictor=SerPredictor(cfg)
defmerge_configs(self,):
# deafult cfg
backup_argv=copy.deepcopy(sys.argv)
sys.argv=sys.argv[:1]
cfg=parse_args()
update_cfg_map=vars(read_params())
forkeyinupdate_cfg_map:
cfg.__setattr__(key,update_cfg_map[key])
sys.argv=copy.deepcopy(backup_argv)
returncfg
defread_images(self,paths=[]):
images=[]
forimg_pathinpaths:
assertos.path.isfile(
img_path),"The {} isn't a valid file.".format(img_path)
img=cv2.imread(img_path)
ifimgisNone:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
returnimages
defpredict(self,images=[],paths=[]):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
Returns:
res (list): The result of chinese texts and save path of images.
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim=True
cfg.enable_mkldnn=enable_mkldnn
self.ser_re_predictor=SerRePredictor(cfg)
defmerge_configs(self,):
# deafult cfg
backup_argv=copy.deepcopy(sys.argv)
sys.argv=sys.argv[:1]
cfg=parse_args()
update_cfg_map=vars(read_params())
forkeyinupdate_cfg_map:
cfg.__setattr__(key,update_cfg_map[key])
sys.argv=copy.deepcopy(backup_argv)
returncfg
defread_images(self,paths=[]):
images=[]
forimg_pathinpaths:
assertos.path.isfile(
img_path),"The {} isn't a valid file.".format(img_path)
img=cv2.imread(img_path)
ifimgisNone:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
returnimages
defpredict(self,images=[],paths=[]):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
Returns:
res (list): The result of chinese texts and save path of images.
**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
`http://127.0.0.1:8869/predict/structure_table`
`http://127.0.0.1:8870/predict/structure_system`
`http://127.0.0.1:8870/predict/structure_layout`
`http://127.0.0.1:8871/predict/kie_ser`
`http://127.0.0.1:8872/predict/kie_ser_re`
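To send a test request to one of these endpoints, PaddleOCR ships a test client. Below is a minimal sketch, assuming the kie_ser service was started on its default port and that test images live under `./doc/imgs/`; adjust both to your deployment. Its main parameters are described after the block.

```bash
# Hypothetical invocation of the bundled test client; adjust server_url and
# image_dir to the module you started and your own test images.
python tools/test_hubserving.py \
    --server_url=http://127.0.0.1:8871/predict/kie_ser \
    --image_dir=./doc/imgs/ \
    --visualize=false \
    --output=./hubserving_result
```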
- **image_dir**: Test image path; can be a single image path or an image directory path
- **visualize**: Whether to visualize the results; the default value is False
- **output**: The folder to save the visualized results; the default value is `./hubserving_result`
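The service can also be exercised without the helper script. The sketch below assumes the default PaddleHub serving JSON schema, in which `images` carries base64-encoded image bytes, and a hypothetical test image path:

```bash
# Raw request sketch; base64 -w 0 (GNU coreutils) keeps the payload on one line.
curl -s -X POST http://127.0.0.1:8871/predict/kie_ser \
    -H "Content-Type: application/json" \
    -d "{\"images\": [\"$(base64 -w 0 ./doc/imgs/1.jpg)\"]}"
```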
...
...
The returned result is a list. Each item in the list is a dict, and the dict may contain the following fields.
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
| field name/module name | ocr_det | ocr_cls | ocr_rec | ocr_system | structure_table | structure_system | structure_layout |
| --- | --- | --- | --- | --- | --- | --- | --- |
**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
Paddle2ONNX supports converting models in the PaddlePaddle format to the ONNX format. Operators can currently be exported stably to ONNX Opset 9~11, and some Paddle operators also support conversion to lower ONNX Opsets.
For more details, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_en.md)
- Install Paddle2ONNX
```
python3.7 -m pip install paddle2onnx
```
- Install ONNXRuntime
```
# It is recommended to install version 1.9.0; change the version number to suit your environment
python3.7 -m pip install onnxruntime==1.9.0
```
There are two ways to obtain the Paddle model:
- Download the prediction models provided by PaddleOCR from the [model_list](../../doc/doc_en/models_list_en.md);
- Refer to the [Model Export Instructions](../../doc/doc_en/inference_en.md#1-convert-training-model-to-inference-model) to convert trained weights to an inference model.
Take the PP-OCRv3 Chinese detection, recognition, and classification models as an example:
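As a concrete sketch, the commands below download the PP-OCRv3 Chinese detection model and convert it to ONNX; the recognition and classification models are converted analogously. The download URL follows the model list above, and the output directory `./inference/det_onnx/` is an assumption:

```bash
# Download and unpack the PP-OCRv3 Chinese detection inference model.
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar -xf ch_PP-OCRv3_det_infer.tar

# Convert it to ONNX (note the dynamic input shape, see the note below).
paddle2onnx --model_dir ./ch_PP-OCRv3_det_infer \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file ./inference/det_onnx/model.onnx \
    --opset_version 10 \
    --input_shape_dict="{'x': [-1, 3, -1, -1]}" \
    --enable_onnx_checker True
```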
* Note: For the OCR models, the conversion must use dynamic shapes, i.e. add the option `--input_shape_dict="{'x': [-1, 3, -1, -1]}"`; otherwise the prediction results may differ slightly from those obtained by predicting directly with Paddle.
In addition, the following models do not currently support conversion to ONNX models:
NRTR, SAR, RARE, SRN
## 3. Inference and Prediction
Take the English OCR model as an example. To predict with **ONNXRuntime**, execute the following command:
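A sketch of such a command, assuming the ONNX models were saved to the directories used in the conversion step above and that a demo image exists at the given path:

```bash
# Run the full OCR pipeline with ONNXRuntime as the execution backend.
python3.7 tools/infer/predict_system.py --use_gpu=False --use_onnx=True \
    --det_model_dir=./inference/det_onnx/model.onnx \
    --rec_model_dir=./inference/rec_onnx/model.onnx \
    --cls_model_dir=./inference/cls_onnx/model.onnx \
    --image_dir=./doc/imgs_en/img_12.jpg
```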
After executing the command, the predicted recognition results are printed in the terminal, and the visualized results are saved under `./inference_results/`.
Referring to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (trained on MJSynth and SynthText, evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) are as follows:
...
...
- [3. Model Training / Evaluation / Prediction](#3)
  - [3.1 Training](#3-1)
  - [3.2 Evaluation](#3-2)
  - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
  - [4.1 Python Inference](#4-1)
  - [4.2 C++ Inference](#4-2)
  - [4.3 Serving](#4-3)
  - [4.4 More](#4-4)
- [5. FAQ](#5)
<aname="1"></a>
## 1. Introduction
Paper:
> [Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition](https://arxiv.org/abs/2105.06229.pdf)
> Hui Jiang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Wenqi Ren, Fei Wu, and Wenming Tan
> ICDAR, 2021
Using the MJSynth and SynthText text recognition datasets for training and evaluating on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets, the reproduced results are as follows:
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
<aname="3"></a>
## 3. Model Training / Evaluation / Prediction
PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Training:
Specifically, after the data preparation is completed, the training can be started. The training command is as follows:
```
# step 1: train the CNT branch
# Single-GPU training (long training period, not recommended)
# (the config path below assumes the RFL visual-branch config shipped with PaddleOCR)
python3 tools/train.py -c configs/rec/rec_resnet_rfl_visual.yml

# Multi-GPU training, specify the GPU ids through the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_resnet_rfl_visual.yml
```
First, convert the model saved during RFL text recognition training into an inference model ([model download link](https://paddleocr.bj.bcebos.com/contribution/rec_resnet_rfl.tar)). The conversion can be done with the command below:
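A sketch of the export command, assuming the downloaded weights are unpacked to `./rec_resnet_rfl_att_train/` and the attention-branch config shipped with PaddleOCR is used:

```bash
# Export the trained RFL (Att branch) weights as an inference model.
python3 tools/export_model.py -c configs/rec/rec_resnet_rfl_att.yml \
    -o Global.pretrained_model=./rec_resnet_rfl_att_train/best_accuracy \
    Global.save_inference_dir=./inference/rec_resnet_rfl_att/
```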
- If you are training the model on your own dataset and have modified the dictionary file, please pay attention to modify the `character_dict_path` in the configuration file to the modified dictionary file.
- If you modified the input size during training, please modify the `infer_shape` corresponding to RFL in the `tools/export_model.py` file.
After the conversion is successful, there are three files in the directory:
```
/inference/rec_resnet_rfl_att/
├── inference.pdiparams
├── inference.pdiparams.info
└── inference.pdmodel
```
For RFL text recognition model inference, the following commands can be executed:
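For instance, a sketch using the inference model exported above and one of the demo word images bundled with the repository:

```bash
# Recognize a single word image with the RFL inference model.
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' \
    --rec_model_dir='./inference/rec_resnet_rfl_att/' \
    --rec_algorithm='RFL' \
    --rec_image_shape='1,32,100'
```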
* We conducted model training on 2x8 P40 GPUs with a public recognition dataset (LSVT, RCTW, MTWI) containing 260k images. The accuracy, training time, and multi-machine speedup ratios of different models are shown below.

| Model | Config file | Recognition acc | single 8-card training time | two 8-card training time | Speedup ratio |
| --- | --- | --- | --- | --- | --- |

* We also conducted model training on 3x8 GPUs. The accuracy, training time, and multi-machine speedup ratios are shown below.

| Model | Configuration | Accuracy | single 8-card training time / Accuracy | 3x8 GPU training time / Accuracy | Speedup ratio |
| --- | --- | --- | --- | --- | --- |

> Note: when training with 3x8 GPUs, the single-card batch size is unchanged compared with the 1x8 GPU training process, and the learning rate is multiplied by 2 (if it is multiplied by 3 by default, the accuracy is only 73.42%).

* We conducted model training on 4x8 V100 GPUs. The accuracy, training time, and multi-machine speedup ratios of different models are shown below.

| Model | Configuration | Accuracy | single 8-card training time / Accuracy | 4x8 GPU training time / Accuracy | Speedup ratio |
| --- | --- | --- | --- | --- | --- |
After executing the command, the prediction results (classification angle and score) of the image above are printed to the screen.
**Note**: The input shape used by the recognition model of `PP-OCRv3` is `3, 48, 320`. If you use other recognition models, you need to set the parameter `--rec_image_shape` according to the model. In addition, the `rec_algorithm` used by the recognition model of `PP-OCRv3` is `SVTR_LCNet` by default. Note the difference from the original `SVTR`.
When performing prediction, you need to specify the path of a single image or an image folder through the parameter `image_dir` (a PDF file is also supported). The parameter `det_model_dir` specifies the path of the detection inference model, `cls_model_dir` specifies the path of the angle classification inference model, and `rec_model_dir` specifies the path of the recognition inference model. The parameter `use_angle_cls` controls whether to enable the angle classification model. `use_mp` specifies whether to use multiple processes for inference, and `total_process_num` specifies the number of processes when multi-processing is used. The visualized recognition results are saved to the `./inference_results` folder by default.
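As a sketch, a full detection + classification + recognition run might look like the following; the model directories are assumptions based on the PP-OCRv3 models mentioned above:

```bash
# End-to-end prediction with detection, angle classification, and recognition,
# using 6 processes for multi-process inference.
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" \
    --det_model_dir="./ch_PP-OCRv3_det_infer/" \
    --cls_model_dir="./ch_ppocr_mobile_v2.0_cls_infer/" \
    --rec_model_dir="./ch_PP-OCRv3_rec_infer/" \
    --use_angle_cls=true --use_mp=true --total_process_num=6
```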
| Parameter | Description | Default |
| --- | --- | --- |
| gpu_mem | GPU memory size used for initialization | 8000M |
| image_dir | The image path or folder path for prediction when used from the command line | |
| page_num | Valid when the input is a PDF file; predict only the first `page_num` pages. All pages are predicted by default | 0 |
| det_algorithm | Type of the selected detection algorithm | DB |
| det_model_dir | The text detection inference model folder. The parameter can be passed in two ways: 1. None: automatically download the built-in model to `~/.paddleocr/det`; 2. the path of an inference model converted by yourself; the model and params files must be in this path | None |
| det_max_side_len | The maximum size of the long side of the image. When the long side exceeds this value, it is resized to this size and the short side is scaled proportionally | 960 |
After the operation is completed, the visualized image of each input is stored in the `kie` directory under the directory specified by the `output` field, with the same name as the input image.
### 2.2 RE+SER
```bash
cd ppstructure
mkdir inference && cd inference
# download model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_infer.tar && tar -xf re_vi_layoutxlm_xfund_infer.tar
```
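With the models in place, SER and RE are run jointly through the KIE prediction script. The following is a sketch; the script name, demo image path, and dictionary path are assumptions based on the PaddleOCR repository layout, so verify them against your checkout:

```bash
cd ..
# Run SER + RE jointly with the vi-LayoutXLM inference models downloaded above.
python3 kie/predict_kie_token_ser_re.py \
    --kie_algorithm=LayoutXLM \
    --ser_model_dir=./inference/ser_vi_layoutxlm_xfund_infer \
    --re_model_dir=./inference/re_vi_layoutxlm_xfund_infer \
    --use_visual_backbone=False \
    --image_dir=./docs/kie/input/zh_val_42.jpg \
    --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
    --vis_font_path=../doc/fonts/simfang.ttf \
    --ocr_order_method="tb-yx"
```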
After the operation is completed, each image has a directory with the same name in the `kie` directory under the directory specified by the `output` field, where the visualized images and prediction results are stored.