提交 0feb8577 编写于 作者: L LDOUBLEV

fix conflicts

......@@ -26,7 +26,7 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
**Recent updates**
- PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Live Address](https://live.bilibili.com/21689802).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile. ([Technical Report](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/PP-OCRv2.pdf))
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
......@@ -98,16 +98,16 @@ For a new language request, please refer to [Guideline for new language_requests
## Tutorials
- [Environment Preparation](./doc/doc_en/environment_en.md)
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md)
- [PaddleOCR Overview and Project Clone](./doc/doc_en/paddleOCR_overview_en.md)
- PP-OCR Industry Landing: from Training to Deployment
- [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md)
- [PP-OCR Model Zoo](./doc/doc_en/models_en.md)
- [PP-OCR Model Download](./doc/doc_en/models_list_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md)
- [PP-OCR Training](./doc/doc_en/training_en.md)
- [Text Detection](./doc/doc_en/detection_en.md)
- [Text Recognition](./doc/doc_en/recognition_en.md)
- [Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
- [Serving](./deploy/pdserving/README.md)
......@@ -146,7 +146,7 @@ For a new language request, please refer to [Guideline for new language_requests
[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (arXiv link is coming soon).
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the [technical report](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/PP-OCRv2.pdf) of PP-OCRv2.
......
......@@ -25,7 +25,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
**近期更新**
- PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[直播地址](https://live.bilibili.com/21689802)
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。([技术报告](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/PP-OCRv2.pdf))
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。
- 2021.4.8 release 2.1版本,新增AAAI 2021论文[端到端识别算法PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md)开源,[多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md)支持种类增加到80+。
......@@ -92,14 +92,14 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
- [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md)
- PP-OCR产业落地:从训练到部署
- [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md)
- [PP-OCR模型](./doc/doc_ch/models.md)
- [PP-OCR模型下载](./doc/doc_ch/models_list.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR模型库Python推理](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR模型训练](./doc/doc_ch/training.md)
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [方向分类器](./doc/doc_ch/angle_class.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- PP-OCR模型推理部署
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/pdserving/README_CN.md)
......@@ -142,7 +142,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
[1] PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
[2] PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和Enhanced CTC loss损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCR技术方案(arxiv链接生成中)
[2] PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和Enhanced CTC loss损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/PP-OCRv2.pdf)
<a name="效果展示"></a>
......
# 运行环境准备
Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。
Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建Python环境。
如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。
......
# PP-OCR模型库快速推理
# PP-OCR模型库Python推理
本文介绍针对PP-OCR模型库的Python推理引擎使用方法,内容依次为文本检测、文本识别、方向分类器以及三者串联在CPU、GPU上的预测方法。
......@@ -24,11 +24,12 @@
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar
tar xf ch_PP-OCRv2_det_infer.tar
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv2_det_infer.tar/"
```
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/det_res_00018069.jpg)
![](../imgs_results/det_res_00018069.jpg)
通过参数`limit_type``det_limit_side_len`来对图片的尺寸进行限制,
`limit_type`可选参数为[`max`, `min`],
......@@ -69,7 +70,7 @@ tar xf ch_PP-OCRv2_rec_infer.tar
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./ch_PP-OCRv2_rec_infer/"
```
![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_4.jpg)
![](../imgs_words/ch/word_4.jpg)
执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:
......@@ -87,7 +88,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/korean/1.jpg)
![](../imgs_words/korean/1.jpg)
执行命令后,上图的预测结果为:
......@@ -108,7 +109,7 @@ tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer"
```
![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_1.jpg)
![](../imgs_words/ch/word_1.jpg)
执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下:
......@@ -133,4 +134,4 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de
执行命令后,识别结果图像如下:
![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/system_res_00018069.jpg)
![](../imgs_results/system_res_00018069.jpg)
# PP-OCR模型与配置文件
PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验
# PP-OCR模型
PP-OCR模型一节主要补充一些OCR模型的基本概念以及如何快速运用PP-OCR模型库中的模型
章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。
节包含两个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[PP-OCR模型库Python推理](./inference_ppocr.md)中介绍PP-OCR模型库的使用方法,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。
------
......
# 模型训练
# PP-OCR模型训练
本文将介绍模型训练时需掌握的基本概念,和训练时的调优方法。
同时会简单介绍PaddleOCR模型训练数据的组成部分,以及如何在垂类场景中准备数据finetune模型。
- [1. 基本概念](#基本概念)
* [1.1 学习率](#学习率)
* [1.2 正则化](#正则化)
* [1.3 评估指标](#评估指标)
- [2. 数据与垂类场景](#数据与垂类场景)
* [2.1 训练数据](#训练数据)
* [2.2 垂类场景](#垂类场景)
* [2.3 自己构建数据集](#自己构建数据集)
* [3. 常见问题](#常见问题)
- [1.配置文件](#配置文件)
- [2. 基本概念](#基本概念)
* [2.1 学习率](#学习率)
* [2.2 正则化](#正则化)
* [2.3 评估指标](#评估指标)
- [3. 数据与垂类场景](#数据与垂类场景)
* [3.1 训练数据](#训练数据)
* [3.2 垂类场景](#垂类场景)
* [3.3 自己构建数据集](#自己构建数据集)
* [4. 常见问题](#常见问题)
<a name="配置文件"></a>
## 1. 配置文件说明
PaddleOCR模型使用配置文件管理网络训练、评估的参数。在配置文件中,可以设置组建模型、优化器、损失函数、模型前后处理的参数,PaddleOCR从配置文件中读取到这些参数,进而组建出完整的训练流程,完成模型训练,在需要对模型进行优化的时,可以通过修改配置文件中的参数完成配置,使用简单且方便修改。
完整的配置文件说明可以参考[配置文件](./config.md)
<a name="基本概念"></a>
## 1. 基本概念
OCR(Optical Character Recognition,光学字符识别)是指对图像进行分析识别处理,获取文字和版面信息的过程,是典型的计算机视觉任务,
通常由文本检测和文本识别两个子任务构成。
## 2. 基本概念
模型调优时需要关注以下参数
模型训练过程中需要手动调整一些超参数,帮助模型以最小的代价获得最优指标。不同的数据量可能需要不同的超参,当您希望在自己的数据上finetune或对模型效果调优时,有以下几个参数调整策略可供参考
<a name="学习率"></a>
### 1.1 学习率
......
# PP-OCR Model and Configuration
The chapter on PP-OCR model and configuration file mainly adds some basic concepts of OCR model and the content and role of configuration file to have a better experience in the subsequent parameter adjustment and training of the model.
# PP-OCR Model Zoo
The PP-OCR model zoo section explains some basic concepts of the OCR model and how to quickly use the models in the PP-OCR model library.
This chapter contains three parts. Firstly, [PP-OCR Model Download](. /models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then in [Yml Configuration](. /config_en.md) details the parameters needed to fine-tune the PP-OCR models. The final [Python Inference for PP-OCR Model Library](. /inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library in the first section, which can quickly utilize the rich model library models to obtain test results through the Python inference engine.
This section contains two parts. Firstly, [PP-OCR Model Download](. /models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. The next [Python Inference for PP-OCR Model Library](. /inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library, which can quickly utilize the rich model library models to obtain test results through the Python inference engine.
------
......
# MODEL TRAINING
- [1. Basic concepts](#1-basic-concepts)
* [1.1 Learning rate](#11-learning-rate)
* [1.2 Regularization](#12-regularization)
* [1.3 Evaluation indicators](#13-evaluation-indicators-)
- [2. Data and vertical scenes](#2-data-and-vertical-scenes)
* [2.1 Training data](#21-training-data)
* [2.2 Vertical scene](#22-vertical-scene)
* [2.3 Build your own data set](#23-build-your-own-data-set)
* [3. FAQ](#3-faq)
- [1.Yml Configuration ](#1-Yml-Configuration)
- [2. Basic concepts](#1-basic-concepts)
* [2.1 Learning rate](#11-learning-rate)
* [2.2 Regularization](#12-regularization)
* [2.3 Evaluation indicators](#13-evaluation-indicators-)
- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
* [3.1 Training data](#21-training-data)
* [3.2 Vertical scene](#22-vertical-scene)
* [3.3 Build your own data set](#23-build-your-own-data-set)
* [4. FAQ](#3-faq)
This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training.
At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene.
<a name="1-Yml-Configuration"></a>
## 1. Yml configuration
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to complete the model training. When optimized, the configuration can be completed by modifying the parameters in the configuration file, which is simple to use and convenient to modify.
For the complete configuration file description, please refer to [Configuration File](./config_en.md)
<a name="1-basic-concepts"></a>
# 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
It usually consists of two subtasks: text detection and text recognition.
## 2. Basic concepts
The following parameters need to be paid attention to when tuning the model:
In the process of model training, some hyperparameters need to be manually adjusted to help the model obtain the optimal index at the least loss. Different data volumes may require different hyper-parameters. When you want to finetune your own data or tune the model effect, there are several parameter adjustment strategies for reference:
<a name="11-learning-rate"></a>
## 1.1 Learning rate
### 2.1 Learning rate
The learning rate is one of the important hyperparameters for training neural networks. It represents the step length of the gradient moving to the optimal solution of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
......@@ -45,7 +52,7 @@ and the learning rate is the same in each stage.
warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
<a name="12-regularization"></a>
## 1.2 Regularization
### 2.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods.
......@@ -61,7 +68,7 @@ Optimizer:
factor: 2.0e-05
```
<a name="13-evaluation-indicators-"></a>
## 1.3 Evaluation indicators
### 2.3 Evaluation indicators
(1) Detection stage: First, evaluate according to the IOU of the detection frame and the labeled frame. If the IOU is greater than a certain threshold, it is judged that the detection is accurate. Here, the detection frame and the label frame are different from the general general target detection frame, and they are represented by polygons. Detection accuracy: the percentage of the correct detection frame number in all detection frames is mainly used to judge the detection index. Detection recall rate: the percentage of correct detection frames in all marked frames, which is mainly an indicator of missed detection.
......@@ -71,11 +78,11 @@ Optimizer:
<a name="2-data-and-vertical-scenes"></a>
# 2. Data and vertical scenes
## 3. Data and vertical scenes
<a name="21-training-data"></a>
## 2.1 Training data
### 3.1 Training data
The current open source models, data sets and magnitudes are as follows:
......@@ -92,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
<a name="22-vertical-scene"></a>
## 2.2 Vertical scene
### 3.2 Vertical scene
PaddleOCR mainly focuses on general OCR. If you have vertical requirements, you can use PaddleOCR + vertical data to train yourself;
If there is a lack of labeled data, or if you do not want to invest in research and development costs, it is recommended to directly call the open API, which covers some of the more common vertical categories.
<a name="23-build-your-own-data-set"></a>
## 2.3 Build your own data set
### 3.3 Build your own data set
There are several experiences for reference when constructing the data set:
......@@ -117,7 +124,7 @@ There are several experiences for reference when constructing the data set:
<a name="3-faq"></a>
# 3. FAQ
## 4. FAQ
**Q**: How to choose a suitable network input shape when training CRNN recognition?
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册