Unverified commit 511a2f96, authored by Evezerest, committed by GitHub

Merge pull request #3956 from Evezerest/2.3

Update training.md and inference_ppocr.md
@@ -21,12 +21,14 @@
 ```
 # Download the ultra-lightweight Chinese detection model:
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tartar xf ch_ppocr_mobile_v2.0_det_infer.tarpython3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
+tar xf ch_ppocr_mobile_v2.0_det_infer.tar
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
 ```
 The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
-![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/det_res_00018069.jpg)
+![](../imgs_results/det_res_00018069.jpg)
 The parameters `limit_type` and `det_limit_side_len` are used to limit the image size;
 `limit_type` accepts [`max`, `min`],
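The resize rule behind these two parameters can be sketched as follows. This is a minimal sketch, assuming that `max` shrinks an image whose longer side exceeds `det_limit_side_len` down to that length, that `min` enlarges an image whose shorter side is below it, and that both sides are then rounded to a multiple of 32; `compute_resize_shape` is a hypothetical helper, not a PaddleOCR function.

```python
def compute_resize_shape(h, w, limit_side_len, limit_type="max"):
    """Return (new_h, new_w) for an h x w image (hypothetical helper).

    Assumed behaviour: "max" caps the longer side, "min" raises the shorter side.
    """
    if limit_type == "max":
        # Shrink only when the longer side is above the limit.
        longest = max(h, w)
        ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shorter side is below the limit.
        shortest = min(h, w)
        ratio = limit_side_len / shortest if shortest < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    # Round each side to a multiple of 32, never below 32 (assumed constraint).
    new_h = max(int(round(h * ratio / 32) * 32), 32)
    new_w = max(int(round(w * ratio / 32) * 32), 32)
    return new_h, new_w


if __name__ == "__main__":
    # A 1920x1080 image with the longer side capped at 960 (example values).
    print(compute_resize_shape(1920, 1080, 960, "max"))  # → (960, 544)
```

With `max`, small images pass through at roughly their original size, which keeps inference cheap; `min` trades speed for detail on small inputs.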
@@ -67,7 +69,7 @@ tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer"
 ```
-![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_4.jpg)
+![](../imgs_words/ch/word_4.jpg)
 After the command runs, the prediction result for the image above (the recognized text and its score) is printed to the screen, for example:
@@ -86,7 +88,7 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.98458153)
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
 ```
-![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/korean/1.jpg)
+![](../imgs_words/korean/1.jpg)
 After the command runs, the prediction result for the image above is:
@@ -107,7 +109,7 @@ tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
 python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer"
 ```
-![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_1.jpg)
+![](../imgs_words/ch/word_1.jpg)
 After the command runs, the prediction result for the image above (the classified orientation and its score) is printed to the screen, for example:
@@ -132,5 +134,5 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de
 After the command runs, the resulting visualization image is shown below:
-![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/system_res_00018069.jpg)
+![](../imgs_results/system_res_00018069.jpg)
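-# Model Training
+# PP-OCR Model Training
 This article introduces the basic concepts you need to understand for model training, as well as tuning methods used during training.
 It also briefly introduces the components of PaddleOCR model training data and how to prepare data to finetune a model for vertical scenarios.
-- [1. Basic Concepts](#基本概念)
-* [1.1 Learning Rate](#学习率)
-* [1.2 Regularization](#正则化)
-* [1.3 Evaluation Metrics](#评估指标)
-- [2. Data and Vertical Scenarios](#数据与垂类场景)
-* [2.1 Training Data](#训练数据)
-* [2.2 Vertical Scenarios](#垂类场景)
-* [2.3 Building Your Own Dataset](#自己构建数据集)
-* [3. FAQ](#常见问题)
+- [1. Configuration File](#配置文件)
+- [2. Basic Concepts](#基本概念)
+* [2.1 Learning Rate](#学习率)
+* [2.2 Regularization](#正则化)
+* [2.3 Evaluation Metrics](#评估指标)
+- [3. Data and Vertical Scenarios](#数据与垂类场景)
+* [3.1 Training Data](#训练数据)
+* [3.2 Vertical Scenarios](#垂类场景)
+* [3.3 Building Your Own Dataset](#自己构建数据集)
+* [4. FAQ](#常见问题)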
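+<a name="配置文件"></a>
+## 1. Configuration File Description
+PaddleOCR models use configuration files to manage the parameters for network training and evaluation. In a configuration file you can set the parameters for building the model, the optimizer, the loss function, and the model's pre- and post-processing. PaddleOCR reads these parameters from the configuration file, assembles the complete training pipeline from them, and trains the model. When the model needs to be optimized, you only need to modify the parameters in the configuration file, which is simple to use and easy to change.
+For a complete description of the configuration file, see [Configuration File](./config.md)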
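 <a name="基本概念"></a>
-## 1. Basic Concepts
-OCR (Optical Character Recognition) is the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task
-and usually consists of two subtasks: text detection and text recognition.
-The following parameters need attention when tuning the model:
+## 2. Basic Concepts
+During model training, some hyperparameters have to be adjusted manually to help the model reach the best metrics at the lowest cost. Different amounts of data may require different hyperparameters. When you want to finetune on your own data or tune model performance, the following parameter adjustment strategies can serve as a reference:
 <a name="学习率"></a>
 ### 1.1 Learning Rate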
......
 # MODEL TRAINING
-- [1. Basic concepts](#1-basic-concepts)
-* [1.1 Learning rate](#11-learning-rate)
-* [1.2 Regularization](#12-regularization)
-* [1.3 Evaluation indicators](#13-evaluation-indicators-)
-- [2. Data and vertical scenes](#2-data-and-vertical-scenes)
-* [2.1 Training data](#21-training-data)
-* [2.2 Vertical scene](#22-vertical-scene)
-* [2.3 Build your own data set](#23-build-your-own-data-set)
-* [3. FAQ](#3-faq)
+- [1.Yml Configuration ](#1-Yml-Configuration)
+- [2. Basic concepts](#1-basic-concepts)
+* [2.1 Learning rate](#11-learning-rate)
+* [2.2 Regularization](#12-regularization)
+* [2.3 Evaluation indicators](#13-evaluation-indicators-)
+- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
+* [3.1 Training data](#21-training-data)
+* [3.2 Vertical scene](#22-vertical-scene)
+* [3.3 Build your own data set](#23-build-your-own-data-set)
+* [4. FAQ](#3-faq)
 This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training.
 At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene.
+<a name="1-Yml-Configuration"></a>
+## 1. Yml configuration
+The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to complete the model training. When optimized, the configuration can be completed by modifying the parameters in the configuration file, which is simple to use and convenient to modify.
+For the complete configuration file description, please refer to [Configuration File](./config_en.md)
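Changing a training setting amounts to editing one nested key in the parsed configuration. The sketch below assumes the YAML file has already been loaded into a Python dict; the `Global`, `lr`, and `regularizer` sections and the exact key layout are illustrative assumptions (only `Optimizer`, `factor`, and `warmup_epoch` appear in this diff), and `apply_override` is a hypothetical helper rather than PaddleOCR's own config loader.

```python
import copy

# Illustrative config as it might look after parsing a YAML file
# (assumed layout; values are examples only).
BASE_CONFIG = {
    "Global": {"epoch_num": 500},
    "Optimizer": {
        "lr": {"learning_rate": 0.001, "warmup_epoch": 5},
        "regularizer": {"name": "L2", "factor": 2.0e-05},
    },
}


def apply_override(config, dotted_key, value):
    """Return a copy of `config` with a dotted key such as
    "Optimizer.lr.learning_rate" set to `value`.

    Hypothetical helper: missing intermediate sections are created as empty dicts.
    """
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    *parents, leaf = dotted_key.split(".")
    node = new_config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return new_config


if __name__ == "__main__":
    tuned = apply_override(BASE_CONFIG, "Optimizer.regularizer.factor", 1.0e-04)
    print(tuned["Optimizer"]["regularizer"]["factor"])
    print(BASE_CONFIG["Optimizer"]["regularizer"]["factor"])
```

Copying before writing keeps the base configuration reusable, so several tuned variants can be derived from one file.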
 <a name="1-basic-concepts"></a>
-# 1. Basic concepts
-OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
-It usually consists of two subtasks: text detection and text recognition.
-The following parameters need to be paid attention to when tuning the model:
+## 2. Basic concepts
+In the process of model training, some hyperparameters need to be manually adjusted to help the model obtain the optimal index at the least loss. Different data volumes may require different hyper-parameters. When you want to finetune your own data or tune the model effect, there are several parameter adjustment strategies for reference:
 <a name="11-learning-rate"></a>
-## 1.1 Learning rate
+### 2.1 Learning rate
 The learning rate is one of the important hyperparameters for training neural networks. It represents the step length of the gradient moving to the optimal solution of the loss function in each iteration.
 A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
@@ -45,7 +52,7 @@ and the learning rate is the same in each stage.
 warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
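The warmup and staged behaviour described above can be sketched as a plain function of the epoch. This is a minimal sketch, assuming a linear ramp from 0 to the first-stage rate over `warmup_epoch` epochs and a constant rate within each stage; `piecewise_lr`, `boundaries`, and `values` are hypothetical names, the boundary epochs in the usage block are example values, and PaddleOCR itself may update the rate per iteration rather than per epoch. `learning_rate.py` remains the reference for the real strategies.

```python
import bisect


def piecewise_lr(epoch, boundaries, values, warmup_epoch=0):
    """Learning rate at a (possibly fractional) epoch.

    boundaries: sorted epochs at which the rate switches to the next stage.
    values: one rate per stage, so len(values) == len(boundaries) + 1.
    warmup_epoch: ramp linearly from 0 to values[0] over this many epochs.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    if epoch < warmup_epoch:
        # Linear warmup: 0 at epoch 0, reaching the base rate at warmup_epoch.
        return values[0] * epoch / warmup_epoch
    # Constant rate within a stage; bisect_right returns the stage index.
    return values[bisect.bisect_right(boundaries, epoch)]


if __name__ == "__main__":
    bounds, rates = [700, 800], [1e-3, 1e-4, 1e-5]  # example schedule
    for e in (0, 2.5, 5, 699, 700, 800):
        print(e, piecewise_lr(e, bounds, rates, warmup_epoch=5))
```

Using `bisect_right` means the new rate takes effect exactly at each boundary epoch, so epoch 700 already runs at the second-stage rate.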
 <a name="12-regularization"></a>
-## 1.2 Regularization
+### 2.2 Regularization
 Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
 L1 and L2 regularization are the most commonly used regularization methods.
@@ -61,7 +68,7 @@ Optimizer:
 factor: 2.0e-05
 ```
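The two penalties can be written out directly. A minimal sketch, assuming the convention documented for Paddle's `L1Decay` and `L2Decay` regularizers (L1 term `factor * sum(|w|)`, L2 term `0.5 * factor * sum(w**2)`, so the L2 gradient is `factor * w`); the function names are hypothetical, and `factor=2.0e-05` is taken from the configuration fragment above.

```python
def l1_penalty(weights, factor):
    """L1 term: factor * sum(|w|); tends to drive small weights to exactly zero."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """L2 term: 0.5 * factor * sum(w^2); its gradient w.r.t. each w is factor * w."""
    return 0.5 * factor * sum(w * w for w in weights)


def regularized_loss(task_loss, weights, factor, kind="L2"):
    """Total objective = task loss + the chosen regularization term."""
    penalties = {"L1": l1_penalty, "L2": l2_penalty}
    if kind not in penalties:
        raise ValueError(f"unknown regularizer: {kind!r}")
    return task_loss + penalties[kind](weights, factor)


if __name__ == "__main__":
    w = [1.0, -2.0, 3.0]  # toy weights
    print(regularized_loss(1.0, w, 2.0e-05, "L1"))
    print(regularized_loss(1.0, w, 2.0e-05, "L2"))
```

A small `factor` such as 2.0e-05 keeps the penalty tiny relative to the task loss, so it nudges weights toward zero without dominating training.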
 <a name="13-evaluation-indicators-"></a>
-## 1.3 Evaluation indicators
+### 2.3 Evaluation indicators
 (1) Detection stage: First, evaluate according to the IOU of the detection frame and the labeled frame. If the IOU is greater than a certain threshold, it is judged that the detection is accurate. Here, the detection frame and the label frame are different from the general target detection frame, and they are represented by polygons. Detection accuracy: the percentage of the correct detection frame number in all detection frames is mainly used to judge the detection index. Detection recall rate: the percentage of correct detection frames in all marked frames, which is mainly an indicator of missed detection.
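The matching rule above can be sketched for convex polygons such as quadrilateral text boxes. This is a minimal sketch, not PaddleOCR's evaluator: it computes polygon IoU with Sutherland-Hodgman clipping and the shoelace formula, matches each prediction greedily to the unmatched ground-truth box with the highest IoU above the threshold, and assumes a threshold of 0.5 (the text only says "a certain threshold"). The harmonic mean `hmean` is added as a common way to combine precision and recall; all function names are hypothetical.

```python
def polygon_area(poly):
    """Signed area by the shoelace formula (positive for counter-clockwise order)."""
    area = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def _ccw(poly):
    """Return the vertices in counter-clockwise order."""
    return list(poly) if polygon_area(poly) >= 0 else list(poly)[::-1]


def _side(a, b, p):
    """> 0 if p lies left of the directed edge a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _crossing(p, q, sp, sq):
    """Point where segment p->q crosses the clip line, given side values sp, sq."""
    t = sp / (sp - sq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: the part of `subject` inside the convex polygon `clip`."""
    subject, clip = _ccw(subject), _ccw(clip)
    output = subject
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        candidates, output = output, []
        if not candidates:
            break
        prev = candidates[-1]
        for cur in candidates:
            sp, sc = _side(a, b, prev), _side(a, b, cur)
            if sc >= 0:  # current vertex is inside this edge's half-plane
                if sp < 0:  # entering: add the crossing point first
                    output.append(_crossing(prev, cur, sp, sc))
                output.append(cur)
            elif sp >= 0:  # leaving: keep only the crossing point
                output.append(_crossing(prev, cur, sp, sc))
            prev = cur
    return output


def polygon_iou(p, q):
    """Intersection over union of two convex polygons."""
    inter = clip_polygon(p, q)
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(polygon_area(p)) + abs(polygon_area(q)) - inter_area
    return inter_area / union if union > 0 else 0.0


def detection_metrics(preds, gts, iou_thresh=0.5):
    """Precision, recall and hmean with greedy one-to-one matching (IoU > iou_thresh)."""
    matched, correct = set(), 0
    for pred in preds:
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(gts):
            if j not in matched:
                iou = polygon_iou(pred, gt)
                if iou > best_iou:
                    best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            correct += 1
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(gts) if gts else 0.0
    hmean = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, hmean


if __name__ == "__main__":
    gt = [[(0, 0), (2, 0), (2, 2), (0, 2)]]
    preds = [[(1, 0), (3, 0), (3, 2), (1, 2)], [(0, 0), (2, 0), (2, 2), (0, 2)]]
    print(detection_metrics(preds, gt))
```

Precision penalizes spurious boxes (the shifted box above) while recall penalizes missed ones, which is why the two are reported together.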
@@ -71,11 +78,11 @@ Optimizer:
 <a name="2-data-and-vertical-scenes"></a>
-# 2. Data and vertical scenes
+## 3. Data and vertical scenes
 <a name="21-training-data"></a>
-## 2.1 Training data
+### 3.1 Training data
 The current open source models, data sets and magnitudes are as follows:
@@ -92,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
 <a name="22-vertical-scene"></a>
-## 2.2 Vertical scene
+### 3.2 Vertical scene
 PaddleOCR mainly focuses on general OCR. If you have vertical requirements, you can use PaddleOCR + vertical data to train yourself;
 If there is a lack of labeled data, or if you do not want to invest in research and development costs, it is recommended to directly call the open API, which covers some of the more common vertical categories.
 <a name="23-build-your-own-data-set"></a>
-## 2.3 Build your own data set
+### 3.3 Build your own data set
 There are several experiences for reference when constructing the data set:
@@ -117,7 +124,7 @@ There are several experiences for reference when constructing the data set:
 <a name="3-faq"></a>
-# 3. FAQ
+## 4. FAQ
 **Q**: How to choose a suitable network input shape when training CRNN recognition?
......