diff --git a/README.md b/README.md index 4a938f2f049e4a6e05c1f4f4a795f0d898944ea9..95f35277a1d634c87d5720c7151d066b09dbdae7 100644 --- a/README.md +++ b/README.md @@ -181,16 +181,11 @@ For a new language request, please refer to [Guideline for new language_requests ## Guideline for New Language Requests -If you want to request a new language support, a PR with 2 following files are needed: +If you want to request a new language support, a PR with 1 following files are needed: 1. In folder [ppocr/utils/dict](./ppocr/utils/dict), it is necessary to submit the dict text to this path and name it with `{language}_dict.txt` that contains a list of all characters. Please see the format example from other files in that folder. -2. In folder [ppocr/utils/corpus](./ppocr/utils/corpus), -it is necessary to submit the corpus to this path and name it with `{language}_corpus.txt` that contains a list of words in your language. -Maybe, 50000 words per language is necessary at least. -Of course, the more, the better. - If your language has unique elements, please tell me in advance within any way, such as useful links, wikipedia and so on. More details, please refer to [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048). diff --git a/deploy/slim/prune/README.md b/deploy/slim/prune/README.md index 7b8dd169c5fa9d01421070f1ccc2bd4e8ed543a2..c438572318f57fdfe9066ff2135156d7129bee4c 100644 --- a/deploy/slim/prune/README.md +++ b/deploy/slim/prune/README.md @@ -45,7 +45,7 @@ python3 setup.py install 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} } -加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) +加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md) 进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练: ```bash diff --git a/deploy/slim/prune/README_en.md b/deploy/slim/prune/README_en.md index f0d652f249686c1d462cd2aa71f4766cf39e763e..f8fbed47ca1c788ea816cc76f1092b17f0ea5219 100644 --- a/deploy/slim/prune/README_en.md +++ b/deploy/slim/prune/README_en.md @@ -3,7 +3,7 @@ Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance. -This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model. +This example uses PaddleSlim provided[APIs of Pruning](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/docs/zh_cn/api_cn/dygraph/pruners) to compress the OCR model. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry. It is recommended that you could understand following pages before reading this example: @@ -35,7 +35,7 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en. ### 3. Pruning sensitivity analysis - After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md) + After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md) The data format of sensitivity file: sen.pickle(Dict){ 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} @@ -47,7 +47,7 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en. 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} } - The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) + The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0-alpha/docs/zh_cn/algo/algo.md) Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command: diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md index 4cafe5f44e48a479cf5b0e4209b8e335a7e4917d..d3bf12d625b076c7bc18016bc9973d1212b3d70b 100644 --- a/deploy/slim/quantization/README_en.md +++ b/deploy/slim/quantization/README_en.md @@ -5,11 +5,11 @@ Generally, a more complex model would achieve better performance in the task, bu Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number, so as to reduce model calculation complexity and improve model inference performance. -This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model. +This example uses PaddleSlim provided [APIs of Quantization](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/quanter/qat.rst) to compress the OCR model. It is recommended that you could understand following pages before reading this example: - [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md) -- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) +- [PaddleSlim Document](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/quanter/qat.rst) ## Quick Start Quantization is mostly suitable for the deployment of lightweight models on mobile terminals. diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md index cd5369f64bbfcf8584f3b3af30d65568770b6033..22e7ad7fc1838008be4e5a6daa6b9d273ea0ea78 100644 --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -11,7 +11,7 @@ PaddleOCR收集整理了自从开源以来在issues和用户群中的常见问 OCR领域大佬众多,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也**希望有识之士帮忙补充和修正**,万分感谢。 - [FAQ](#faq) - + * [1. 通用问题](#1) + [1.1 检测](#11) + [1.2 识别](#12) @@ -20,7 +20,7 @@ OCR领域大佬众多,本文档回答主要依赖有限的项目实践,难 + [1.5 垂类场景实现思路](#15) + [1.6 训练过程与模型调优](#16) + [1.7 补充资料](#17) - + * [2. PaddleOCR实战问题](#2) + [2.1 PaddleOCR repo](#21) + [2.2 安装环境](#22) @@ -734,7 +734,7 @@ C++TensorRT预测需要使用支持TRT的预测库并在编译时打开[-DWITH_T #### Q:PaddleOCR中,对于模型预测加速,CPU加速的途径有哪些?基于TenorRT加速GPU对输入有什么要求? -**A**:(1)CPU可以使用mkldnn进行加速;对于python inference的话,可以把enable_mkldnn改为true,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/tools/infer/utility.py#L99),对于cpp inference的话,在配置文件里面配置use_mkldnn 1即可,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/deploy/cpp_infer/tools/config.txt#L6) +**A**:(1)CPU可以使用mkldnn进行加速;对于python inference的话,可以把enable_mkldnn改为true,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/tools/infer/utility.py#L99),对于cpp inference的话,可参考[文档](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph/deploy/cpp_infer) (2)GPU需要注意变长输入问题等,TRT6 之后才支持变长输入 @@ -838,4 +838,4 @@ nvidia-smi --lock-gpu-clocks=1590 -i 0 #### Q: 预测时显存爆炸、内存泄漏问题? -**A**: 打开显存/内存优化开关`enable_memory_optim`可以解决该问题,相关代码已合入,[查看详情](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153)。 \ No newline at end of file +**A**: 打开显存/内存优化开关`enable_memory_optim`可以解决该问题,相关代码已合入,[查看详情](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153)。 diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index 40c63905c3f03070a9dcbf0176ada31378b14fee..1668eba19eb0bcec6bfe3abd39bb6ca73b8f6c14 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -66,7 +66,7 @@ | :---------------------: | :---------------------: | :--------------: | :--------------------: | | model_type | 网络类型 | rec | 目前支持`rec`,`det`,`cls` | | algorithm | 模型名称 | CRNN | 支持列表见[algorithm_overview](./algorithm_overview.md) | -| **Transform** | 设置变换方式 | - | 目前仅rec类型的算法支持, 具体见[ppocr/modeling/transform](../../ppocr/modeling/transform) | +| **Transform** | 设置变换方式 | - | 目前仅rec类型的算法支持, 具体见[ppocr/modeling/transforms](../../ppocr/modeling/transforms) | | name | 变换方式类名 | TPS | 目前支持`TPS` | | num_fiducial | TPS控制点数 | 20 | 上下边各十个 | | loc_lr | 定位网络学习率 | 0.1 | \ | @@ -176,7 +176,7 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi --dict {path/of/dict} \ # 字典文件路径 -o Global.use_gpu=False # 是否使用gpu ... - + ``` 意大利文由拉丁字母组成,因此执行完命令后会得到名为 rec_latin_lite_train.yml 的配置文件。 @@ -191,21 +191,21 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi epoch_num: 500 ... character_dict_path: {path/of/dict} # 字典文件所在路径 - + Train: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/train_list.txt"] # 训练集label路径 ... - + Eval: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/val_list.txt"] # 验证集label路径 ... - + ``` 目前PaddleOCR支持的多语言算法有: diff --git a/doc/doc_ch/serving_inference.md b/doc/doc_ch/serving_inference.md index 7a53628e2f93d4d0ec944ec18ec5f06452698512..fea5a24546ddd2141085f56eeb99cdf72577bff3 100644 --- a/doc/doc_ch/serving_inference.md +++ b/doc/doc_ch/serving_inference.md @@ -20,7 +20,7 @@ **Python操作指南:** -目前Serving用于OCR的部分功能还在测试当中,因此在这里我们给出[Servnig latest package](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md) +目前Serving用于OCR的部分功能还在测试当中,因此在这里我们给出[Servnig latest package](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md) 大家根据自己的环境选择需要安装的whl包即可,例如以Python 3.5为例,执行下列命令 ``` #CPU/GPU版本选择一个 diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index eda1e13da956ab1eede72b97e62d76b915e02169..d7bf5eaddd7b10d178cd472caf8081c4706f15b6 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -66,7 +66,7 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck | :---------------------: | :---------------------: | :--------------: | :--------------------: | | model_type | Network Type | rec | Currently support`rec`,`det`,`cls` | | algorithm | Model name | CRNN | See [algorithm_overview](./algorithm_overview_en.md) for the support list | -| **Transform** | Set the transformation method | - | Currently only recognition algorithms are supported, see [ppocr/modeling/transform](../../ppocr/modeling/transform) for details | +| **Transform** | Set the transformation method | - | Currently only recognition algorithms are supported, see [ppocr/modeling/transforms](../../ppocr/modeling/transforms) for details | | name | Transformation class name | TPS | Currently supports `TPS` | | num_fiducial | Number of TPS control points | 20 | Ten on the top and bottom | | loc_lr | Localization network learning rate | 0.1 | \ | diff --git a/doc/doc_en/tricks_en.md b/doc/doc_en/tricks_en.md index eab9c89236ca86d4e473fbb2776941fdd3e7567d..4d59857a04f3985c9f8c189e6b0fc54a6cc1cc0f 100644 --- a/doc/doc_en/tricks_en.md +++ b/doc/doc_en/tricks_en.md @@ -12,25 +12,25 @@ Here we have sorted out some Chinese OCR training and prediction tricks, which a At present, ResNet_vd series and MobileNetV3 series are the backbone networks used in PaddleOCR, whether replacing the other backbone networks will help to improve the accuracy? What should be paid attention to when replacing? - **Tips** - - Whether text detection or text recognition, the choice of backbone network is a trade-off between prediction effect and prediction efficiency. Generally, a larger backbone network is selected, e.g. ResNet101_vd, then the performance of the detection or recognition is more accurate, but the time cost will increase accordingly. And a smaller backbone network is selected, e.g. MobileNetV3_small_x0_35, the prediction speed is faster, but the accuracy of detection or recognition will be reduced. Fortunately, the detection or recognition effect of different backbone networks is positively correlated with the performance of ImageNet 1000 classification task. [**PaddleClas**](https://github.com/PaddlePaddle/PaddleClas/blob/master/README_en.md) have sorted out the 23 series of classification network structures, such as ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet. It provides the top1 accuracy of classification, the time cost of GPU(V100 and T4) and CPU(SD 855), and the 117 pretrained models [**download addresses**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html). - + - Whether text detection or text recognition, the choice of backbone network is a trade-off between prediction effect and prediction efficiency. Generally, a larger backbone network is selected, e.g. ResNet101_vd, then the performance of the detection or recognition is more accurate, but the time cost will increase accordingly. And a smaller backbone network is selected, e.g. MobileNetV3_small_x0_35, the prediction speed is faster, but the accuracy of detection or recognition will be reduced. Fortunately, the detection or recognition effect of different backbone networks is positively correlated with the performance of ImageNet 1000 classification task. [**PaddleClas**](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/en/models/models_intro_en.md) have sorted out the 23 series of classification network structures, such as ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet. It provides the top1 accuracy of classification, the time cost of GPU(V100 and T4) and CPU(SD 855), and the 117 pretrained models [**download addresses**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html). + - Similar as the 4 stages of ResNet, the replacement of text detection backbone network is to determine those four stages to facilitate the integration of FPN like the object detection heads. In addition, for the text detection problem, the pre trained model in ImageNet1000 can accelerate the convergence and improve the accuracy. - + - In order to replace the backbone network of text recognition, we need to pay attention to the descending position of network width and height stride. Since the ratio between width and height is large in chinese text recognition, the frequency of height decrease is less and the frequency of width decrease is more. You can refer the [modifies of MobileNetV3](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR. #### 2、Long Chinese Text Recognition -- **Problem Description** +- **Problem Description** The maximum resolution of Chinese recognition model during training is [3,32,320], if the text image to be recognized is too long, as shown in the figure below, how to adapt? - +
- + - **Tips** During the training, the training samples are not directly resized to [3,32,320]. At first, the height of samples are resized to 32 and keep the ratio between the width and the height. When the width is less than 320, the excess parts are padding 0. Besides, when the ratio between the width and the height of the samples is larger than 10, these samples will be ignored. When the prediction for one image, do as above, but do not limit the max ratio between the width and the height. When the prediction for an images batch, do as training, but the resized target width is the longest width of the images in the batch. [Code as following](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py): - + ``` def resize_norm_img(self, img, max_wh_ratio): imgC, imgH, imgW = self.rec_image_shape @@ -58,11 +58,11 @@ Here we have sorted out some Chinese OCR training and prediction tricks, which a - **Problem Description** As shown in the figure below, for Chinese and English mixed scenes, in order to facilitate reading and using the recognition results, it is often necessary to recognize the spaces between words. How can this situation be adapted? - +
- + - **Tips** - + There are two possible methods for space recognition. (1) Optimize the text detection. For spliting the text at the space in detection results, it needs to divide the text line with space into many segments when label the data for detection. (2) Optimize the text recognition. The space character is introduced into the recognition dictionary. Label the blank line in the training data for text recognition. In addition, we can also concat multiple word lines to synthesize the training data with spaces. PaddleOCR currently uses the second method. diff --git a/ppstructure/README.md b/ppstructure/README.md index bf5a7dcf74cdd6865497daebfc719c40de3eb550..236b6a39045d814b1ad3a00f658b5f778ac207c5 100644 --- a/ppstructure/README.md +++ b/ppstructure/README.md @@ -117,4 +117,4 @@ PP-Structure Series Model List (Updating) |ser_LayoutXLM_xfun_zhd|SER model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) | |re_LayoutXLM_xfun_zh|RE model trained on xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) | -If you need to use other models, you can download the model in [PPOCR model_list](../doc/doc_en/models_list_en.md) and [PPStructure model_list](./docs/model_list.md) +If you need to use other models, you can download the model in [PPOCR model_list](../doc/doc_en/models_list_en.md) and [PPStructure model_list](./docs/models_list.md) diff --git a/ppstructure/README_ch.md b/ppstructure/README_ch.md index 1013c619bf706a9d653371f48c566c087668b812..71456fd03196adec2e4dcff196f084411bb69af6 100644 --- a/ppstructure/README_ch.md +++ b/ppstructure/README_ch.md @@ -116,4 +116,4 @@ PP-Structure系列模型列表(更新中) |re_LayoutXLM_xfun_zh|基于LayoutXLM在xfun中文数据集上训练的RE模型|1.4G|[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) | -更多模型下载,可以参考 [PP-OCR model_list](../doc/doc_en/models_list.md) and [PP-Structure model_list](./docs/models_list.md) +更多模型下载,可以参考 [PP-OCR model_list](../doc/doc_ch/models_list.md) and [PP-Structure model_list](./docs/models_list.md)