diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index 29e46c0dd136ee02e7a157cecea4664f693a7af1..a729b900d4419706c35fa029f163fba3b4afec1e 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -168,7 +168,7 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi --dict {path/of/dict} \ # 字典文件路径 -o Global.use_gpu=False # 是否使用gpu ... - + ``` 意大利文由拉丁字母组成,因此执行完命令后会得到名为 rec_latin_lite_train.yml 的配置文件。 @@ -184,21 +184,21 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi ... character_type: it # 需要识别的语种 character_dict_path: {path/of/dict} # 字典文件所在路径 - + Train: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/train_list.txt"] # 训练集label路径 ... - + Eval: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/val_list.txt"] # 验证集label路径 ... - + ``` 目前PaddleOCR支持的多语言算法有: @@ -217,7 +217,3 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi | rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | devanagari | 更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) - -多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以通过下面两种方式下载。 -* [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA)。提取码:frgi。 -* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md index 4f2acc29d9f70e75a0ed18ea358b747f77cd4a9e..8efc31983a9d7cee50f922a3f84a2b1a2de23889 100644 --- a/doc/doc_ch/environment.md +++ b/doc/doc_ch/environment.md @@ -1,7 +1,5 @@ # 运行环境准备 -[运行环境准备](#运行环境准备) - * [1. 
Python环境搭建](#1) + [1.1 Windows](#1.1) + [1.2 Mac](#1.2) diff --git a/doc/doc_ch/inference_ppocr.md b/doc/doc_ch/inference_ppocr.md new file mode 100644 index 0000000000000000000000000000000000000000..493a4c9868621b762895e1ee11f76ac250918453 --- /dev/null +++ b/doc/doc_ch/inference_ppocr.md @@ -0,0 +1,136 @@ +# PP-OCR模型库快速推理 + +本文介绍针对PP-OCR模型库的Python推理引擎使用方法,内容依次为文本检测、文本识别、方向分类器以及三者串联在CPU、GPU上的预测方法。 + + +- [1. 文本检测模型推理](#文本检测模型推理) + +- [2. 文本识别模型推理](#文本识别模型推理) + - [2.1 超轻量中文识别模型推理](#超轻量中文识别模型推理) + - [2.2 多语言模型的推理](#多语言模型的推理) + +- [3. 方向分类模型推理](#方向分类模型推理) + +- [4. 文本检测、方向分类和文字识别串联推理](#文本检测、方向分类和文字识别串联推理) + + + +## 1. 文本检测模型推理 + +文本检测模型推理,默认使用DB模型的配置参数。超轻量中文检测模型推理,可以执行如下命令: + +``` +# 下载超轻量中文检测模型: +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar +tar xf ch_ppocr_mobile_v2.0_det_infer.tar +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/" +``` + +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: + +![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/det_res_00018069.jpg) + +通过参数`det_limit_type`和`det_limit_side_len`来对图片的尺寸进行限制, +`det_limit_type`可选参数为[`max`, `min`], +`det_limit_side_len` 为正整数,一般设置为32 的倍数,比如960。 + +参数默认设置为`det_limit_type='max', det_limit_side_len=960`。表示网络输入图像的最长边不能超过960, +如果超过这个值,会对图像做等宽比的resize操作,确保最长边为`det_limit_side_len`。 +设置为`det_limit_type='min', det_limit_side_len=960` 则表示限制图像的最短边为960。 + +如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以设置det_limit_side_len 为想要的值,比如1216: + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216 +``` + +如果想使用CPU进行预测,执行命令如下 + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False +``` + + + + + +## 2. 
文本识别模型推理 + + + +### 2.1 超轻量中文识别模型推理 + +超轻量中文识别模型推理,可以执行如下命令: + +``` +# 下载超轻量中文识别模型: +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar +tar xf ch_ppocr_mobile_v2.0_rec_infer.tar +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer" +``` + +![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_4.jpg) + +执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下: + +```bash +Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.98458153) +``` + + + +### 2.2 多语言模型的推理 + +如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, +需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/fonts/` 路径下有默认提供的小语种字体,例如韩文识别: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" +``` + +![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/korean/1.jpg) + +执行命令后,上图的预测结果为: + +``` text +Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904) +``` + + + +## 3. 方向分类模型推理 + +方向分类模型推理,可以执行如下命令: + +``` +# 下载超轻量中文方向分类器模型: +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar xf ch_ppocr_mobile_v2.0_cls_infer.tar +python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer" +``` + +![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_1.jpg) + +执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下: + +``` +Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999982] +``` + + + +## 4. 
文本检测、方向分类和文字识别串联推理 + +以超轻量中文OCR模型推理为例,在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir`和`rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。`use_mp`表示是否使用多进程。`total_process_num`表示在使用多进程时的进程数。可视化识别结果默认保存到 ./inference_results 文件夹里面。 + +```shell +# 使用方向分类器 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true +# 不使用方向分类器 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false +# 使用多进程 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6 +``` + +执行命令后,识别结果图像如下: + +![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/system_res_00018069.jpg) + diff --git a/doc/doc_ch/models_and_config.md b/doc/doc_ch/models_and_config.md index 167b7ec2cb039a5b7943cda98474d809019a57b7..0797fad34763234dacb9d64cbf586e03f9d846af 100644 --- a/doc/doc_ch/models_and_config.md +++ b/doc/doc_ch/models_and_config.md @@ -1,12 +1,22 @@ -# 目录 +# PP-OCR模型与配置文件 +PP-OCR模型与配置文件一节主要介绍OCR模型的基本概念、配置文件的内容与作用以便在后续模型训练过程中拥有更好的体验。 + +本节包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./doc/doc_ch/config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference.md)是对PP-OCR模型库使用方法的介绍,可以。 + +总体而言, + +下面我们首先了解一些OCR相关的基本概念: + + + - [1. OCR 简要介绍](#1-ocr-----) * [1.1 OCR 检测模型基本概念](#11-ocr---------) * [1.2 OCR 识别模型基本概念](#12-ocr---------) * [1.3 PP-OCR模型](#13-pp-ocr--) -# 1. OCR 简要介绍 +## 1. 
OCR 简要介绍 本节简要介绍OCR检测模型、识别模型的基本概念,并介绍PaddleOCR的PP-OCR模型。 OCR(Optical Character Recognition,光学字符识别)目前是文字识别的统称,已不限于文档或书本文字识别,更包括识别自然场景下的文字,又可以称为STR(Scene Text Recognition)。 @@ -14,7 +24,7 @@ OCR(Optical Character Recognition,光学字符识别)目前是文字识别 OCR文字识别一般包括两个部分,文本检测和文本识别;文本检测首先利用检测算法检测到图像中的文本行;然后检测到的文本行用识别算法去识别到具体文字。 -## 1.1 OCR 检测模型基本概念 +### 1.1 OCR 检测模型基本概念 文本检测就是要定位图像中的文字区域,然后通常以边界框的形式将单词或文本行标记出来。传统的文字检测算法多是通过手工提取特征的方式,特点是速度快,简单场景效果好,但是面对自然场景,效果会大打折扣。当前多是采用深度学习方法来做。 @@ -24,15 +34,16 @@ OCR文字识别一般包括两个部分,文本检测和文本识别;文本 3. 混合目标检测和分割的方法; -## 1.2 OCR 识别模型基本概念 +### 1.2 OCR 识别模型基本概念 OCR识别算法的输入数据一般是文本行,背景信息不多,文字占据主要部分,识别算法目前可以分为两类算法: 1. 基于CTC的方法;即识别算法的文字预测模块是基于CTC的,常用的算法组合为CNN+RNN+CTC。目前也有一些算法尝试在网络中加入transformer模块等等。 2. 基于Attention的方法;即识别算法的文字预测模块是基于Attention的,常用算法组合是CNN+RNN+Attention。 -## 1.3 PP-OCR模型 +### 1.3 PP-OCR模型 PaddleOCR 中集成了很多OCR算法,文本检测算法有DB、EAST、SAST等等,文本识别算法有CRNN、RARE、StarNet、Rosetta、SRN等算法。 其中PaddleOCR针对中英文自然场景通用OCR,推出了PP-OCR系列模型,PP-OCR模型由DB+CRNN算法组成,利用海量中文数据训练加上模型调优方法,在中文场景上具备较高的文本检测识别能力。并且PaddleOCR推出了高精度超轻量PP-OCRv2模型,检测模型仅3M,识别模型仅8.5M,利用[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)的模型量化方法,可以在保持精度不降低的情况下,将检测模型压缩到0.8M,识别压缩到3M,更加适用于移动端部署场景。 + diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 43671bf5b051b85a7d0728253bfeab069cd82642..793a4130c56b4d99f0edb39cf9e5a3781f2775c3 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -66,46 +66,6 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 #### 3. 多语言识别模型(更多语言持续更新中...) -**说明:** 新增的多语言模型的配置文件通过代码方式生成,您可以通过`--help`参数查看当前PaddleOCR支持生成哪些多语言的配置文件: -```bash -# 该代码需要在指定目录运行 -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -python3 generate_multi_language_configs.py --help -``` -下面以生成意大利语配置文件为例: -##### 1. 
生成意大利语配置文件测试现有模型 - -如果您仅仅想用配置文件测试PaddleOCR提供的多语言模型可以通过下面命令生成默认的配置文件,使用PaddleOCR提供的小语种字典进行预测。 -```bash -# 该代码需要在指定目录运行 -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -# 通过-l或者--language参数设置需要生成的语种的配置文件,该命令会将默认参数写入配置文件 -python3 generate_multi_language_configs.py -l it -``` -##### 2. 生成意大利语配置文件训练自己的数据 -如果您想训练自己的小语种模型,可以准备好训练集文件、验证集文件、字典文件和训练数据路径,这里假设准备的意大利语的训练集、验证集、字典和训练数据路径为: -- 训练集:{your/path/}PaddleOCR/train_data/train_list.txt -- 验证集:{your/path/}PaddleOCR/train_data/val_list.txt -- 使用PaddleOCR提供的默认字典:{your/path/}PaddleOCR/ppocr/utils/dict/it_dict.txt -- 训练数据路径:{your/path/}PaddleOCR/train_data - -使用以下命令生成配置文件: -```bash -# 该代码需要在指定目录运行 -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -# -l或者--language字段是必须的 -# --train修改训练集,--val修改验证集,--data_dir修改数据集目录,-o修改对应默认参数 -# --dict命令改变字典路径,示例使用默认字典路径则该参数可不填 -python3 generate_multi_language_configs.py -l it \ ---train train_data/train_list.txt \ ---val train_data/val_list.txt \ ---data_dir train_data \ --o Global.use_gpu=False -``` - - -##### 3. 多语言模型与配置文件 - |模型名称|字典文件|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- |--- | --- | | french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt |法文识别|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) | diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 9df686501de48234dbc1821d7d645d7f12bda21a..74d7004b35b06a79dd0acfe7d443ed128d63df88 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -1,5 +1,8 @@ # PaddleOCR快速开始 + + - [PaddleOCR快速开始](#paddleocr) + + [1. 安装PaddleOCR whl包](#1) * [2. 
便捷使用](#2) + [2.1 命令行使用](#21) @@ -8,7 +11,7 @@ - [2.1.3 版面分析](#213) + [2.2 Python脚本使用](#22) - [2.2.1 中英文与多语言使用](#221) - - [2.2.2 版面分析使用](#222) + - [2.2.2 版面分析](#222) @@ -87,7 +90,7 @@ cd /path/to/ppocr_img ``` -更多whl包使用包括, whl包参数说明 +更多whl包使用可参考[whl包文档](./whl.md) @@ -127,8 +130,11 @@ paddleocr --image_dir ./imgs_en/254.jpg --lang=en 全部语种及其对应的缩写列表可查看[多语言模型教程](./multi_languages.md) + #### 2.1.3 版面分析 +版面分析是指对文档图片中的文字、标题、列表、图片和表格5类区域进行划分。对于前三类区域,直接使用OCR模型完成对应区域文字检测与识别,并将结果保存在txt中。对于表格类区域,经过表格结构化处理后,表格图片转换为相同表格样式的Excel文件。图片区域会被单独裁剪成图像。 + 使用PaddleOCR的版面分析功能,需要指定`--type=structure` ```bash @@ -175,7 +181,7 @@ paddleocr --image_dir=./table/1.png --type=structure | table_model_dir | 表格结构模型 inference 模型地址 | None | | table_char_type | 表格结构模型所用字典地址 | ../ppocr/utils/dict/table_structure_dict.txt | - 大部分参数和paddleocr whl包保持一致,见 [whl包文档](../doc/doc_ch/whl.md) + 大部分参数和paddleocr whl包保持一致,见 [whl包文档](./whl.md) @@ -184,7 +190,7 @@ paddleocr --image_dir=./table/1.png --type=structure #### 2.2.1 中英文与多语言使用 -通过脚本使用PaddleOCR whl包。whl包会自动下载ppocr轻量级模型作为默认模型, +通过Python脚本使用PaddleOCR whl包,whl包会自动下载ppocr轻量级模型作为默认模型。 * 检测+方向分类器+识别全流程 @@ -226,7 +232,7 @@ im_show.save('result.jpg') -#### 2.2.2 版面分析使用 +#### 2.2.2 版面分析 ```python import os diff --git a/doc/doc_ch/training.md b/doc/doc_ch/training.md index 9adfd8f14c42fefca5659478e23dd6779d84fd90..fb7f94a9e86cf392421ab6ed6f99cf2d49390096 100644 --- a/doc/doc_ch/training.md +++ b/doc/doc_ch/training.md @@ -8,11 +8,11 @@ * [1.1 学习率](#学习率) * [1.2 正则化](#正则化) * [1.3 评估指标](#评估指标) -- [2. 常见问题](#常见问题) -- [3. 数据与垂类场景](#数据与垂类场景) - * [3.1 训练数据](#训练数据) - * [3.2 垂类场景](#垂类场景) - * [3.3 自己构建数据集](#自己构建数据集) +- [2. 数据与垂类场景](#数据与垂类场景) + * [2.1 训练数据](#训练数据) + * [2.2 垂类场景](#垂类场景) + * [2.3 自己构建数据集](#自己构建数据集) +* [3. 常见问题](#常见问题) ## 1. 基本概念 @@ -63,40 +63,18 @@ Optimizer: (3)端到端统计: 端对端召回率:准确检测并正确识别文本行在全部标注文本行的占比; 端到端准确率:准确检测并正确识别文本行在 检测到的文本行数量 的占比; 准确检测的标准是检测框与标注框的IOU大于某个阈值,正确识别的的检测框中的文本与标注的文本相同。 - -## 2. 常见问题 - -**Q**:训练CRNN识别时,如何选择合适的网络输入shape? 
- - A:一般高度采用32,最长宽度的选择,有两种方法: - - (1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。 - - (2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。 - -**Q**:识别训练时,训练集精度已经到达90了,但验证集精度一直在70,涨不上去怎么办? - - A:训练集精度90,测试集70多的话,应该是过拟合了,有两个可尝试的方法: - - (1)加入更多的增广方式或者调大增广prob的[概率](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341),默认为0.4。 - - (2)调大系统的[l2 dcay值](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47) - -**Q**: 识别模型训练时,loss能正常下降,但acc一直为0 - - A:识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。 - -## 3. 数据与垂类场景 + +## 2. 数据与垂类场景 -### 3.1 训练数据 +### 2.1 训练数据 目前开源的模型,数据集和量级如下: - 检测: - 英文数据集,ICDAR2015 - 中文数据集,LSVT街景数据集训练数据3w张图片 - + - 识别: - 英文数据集,MJSynth和SynthText合成数据,数据量上千万。 - 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。 @@ -105,13 +83,13 @@ Optimizer: 其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer) 、[SynthText](https://github.com/ankush-me/SynthText) 、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) 等。 -### 3.2 垂类场景 +### 2.2 垂类场景 PaddleOCR主要聚焦通用OCR,如果有垂类需求,您可以用PaddleOCR+垂类数据自己训练; 如果缺少带标注的数据,或者不想投入研发成本,建议直接调用开放的API,开放的API覆盖了目前比较常见的一些垂类。 -### 3.3 自己构建数据集 +### 2.3 自己构建数据集 在构建数据集时有几个经验可供参考: @@ -126,3 +104,28 @@ PaddleOCR主要聚焦通用OCR,如果有垂类需求,您可以用PaddleOCR+ a. 人工采集更多的训练数据,最直接也是最有效的方式。 b. 基于PIL和opencv基本图像处理或者变换。例如PIL中ImageFont, Image, ImageDraw三个模块将文字写到背景中,opencv的旋转仿射变换,高斯滤波等。 c. 利用数据生成算法合成数据,例如pix2pix或StyleText等算法。 + + + +## 3. 常见问题 + +**Q**:训练CRNN识别时,如何选择合适的网络输入shape? + + A:一般高度采用32,最长宽度的选择,有两种方法: + + (1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。 + + (2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。 + +**Q**:识别训练时,训练集精度已经到达90了,但验证集精度一直在70,涨不上去怎么办? 
+ + A:训练集精度90,测试集70多的话,应该是过拟合了,有两个可尝试的方法: + + (1)加入更多的增广方式或者调大增广prob的[概率](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341),默认为0.4。 + + (2)调大系统的[l2 dcay值](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47) + +**Q**: 识别模型训练时,loss能正常下降,但acc一直为0 + + A:识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。 + diff --git a/doc/doc_en/inference_ppocr_en.md b/doc/doc_en/inference_ppocr_en.md new file mode 100755 index 0000000000000000000000000000000000000000..5442d0c578027c33890fbb063a0d2c78ecd226c5 --- /dev/null +++ b/doc/doc_en/inference_ppocr_en.md @@ -0,0 +1,135 @@ + +# Reasoning based on Python prediction engine + +This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU. + + +- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE) + +- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE) + - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION) + - [2. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE) + +- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) + +- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION) + + + +## TEXT DETECTION MODEL INFERENCE + +The default configuration is based on the inference setting of the DB text detection model. 
For lightweight Chinese detection model inference, you can execute the following commands: + +``` +# download DB text detection inference model +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar +tar xf ch_ppocr_mobile_v2.0_det_infer.tar +# predict +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/" +``` + +The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below: + +![](../imgs_results/det_res_00018069.jpg) + +You can use the parameters `det_limit_type` and `det_limit_side_len` to limit the size of the input image. +The valid values of `det_limit_type` are [`max`, `min`], and +`det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960. + +The default setting is `det_limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960. +If this value is exceeded, the image is resized with its aspect ratio preserved so that the longest side equals `det_limit_side_len`. +Setting `det_limit_type='min', det_limit_side_len=960` means that the shortest side of the image is limited to 960. + +If the resolution of the input image is relatively large and you want to predict at a larger resolution, you can set `det_limit_side_len` to the desired value, such as 1216: +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216 +``` + +If you want to use the CPU for prediction, execute the following command: +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False +``` + + + +## TEXT RECOGNITION MODEL INFERENCE + + + +### 1. 
LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL INFERENCE + +For lightweight Chinese recognition model inference, you can execute the following commands: + +``` +# download CRNN text recognition inference model +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar +tar xf ch_ppocr_mobile_v2.0_rec_infer.tar +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer" +``` + +![](../imgs_words_en/word_10.png) + +After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen. + +```bash +Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658) +``` + + + +### 2. MULTILINGUAL MODEL INFERENCE +If you need to run inference with a model for another language, you need to specify the dictionary path through `--rec_char_dict_path`. To get correct visualization results, +you also need to specify the font path through `--vis_font_path`. 
Fonts for minority languages are provided by default under the `doc/fonts` path. For example, for Korean recognition: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" +``` +![](../imgs_words/korean/1.jpg) + +After executing the command, the prediction result of the above figure is: + +``` text +Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904) +``` + + + +## ANGLE CLASSIFICATION MODEL INFERENCE + +For angle classification model inference, you can execute the following commands: + + +``` +# download text angle class inference model: +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar xf ch_ppocr_mobile_v2.0_cls_infer.tar +python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words_en/word_10.png" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer" +``` +![](../imgs_words_en/word_10.png) + +After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen. + +``` + Predicts of ./doc/imgs_words_en/word_10.png:['0', 0.9999995] +``` + + +## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION + +When performing prediction, specify the path of a single image or a folder of images through the parameter `image_dir`. The parameters `det_model_dir`, `cls_model_dir` and `rec_model_dir` specify the paths to the detection, angle classification and recognition inference models respectively. The parameter `use_angle_cls` controls whether the angle classification model is enabled. The parameter `use_mp` specifies whether to use multiple processes for inference, and `total_process_num` specifies the number of processes when multi-process inference is enabled. 
The visualized recognition results are saved to the `./inference_results` folder by default. + +```shell +# use direction classifier +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true + +# do not use the direction classifier +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" + +# use multi-process +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6 +``` + + +After executing the command, the recognition result image is as follows: + +![](../imgs_results/system_res_00018069.jpg) diff --git a/doc/doc_en/models_and_config_en.md b/doc/doc_en/models_and_config_en.md index c88120b5531347304976919cc2175aa54c9f5597..f80c00715f148974db616f411740580c281659ed 100644 --- a/doc/doc_en/models_and_config_en.md +++ b/doc/doc_en/models_and_config_en.md @@ -1,5 +1,4 @@ -# CONTENT -- [Paste Your Document In Here](#paste-your-document-in-here) +# PP-OCR Model and Configuration - [INTRODUCTION ABOUT OCR](#introduction-about-ocr) * [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model) * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model) @@ -8,7 +7,7 @@ * [On the right](#on-the-right) -# INTRODUCTION ABOUT OCR +## 1. INTRODUCTION ABOUT OCR This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model. @@ -17,7 +16,7 @@ OCR (Optical Character Recognition, Optical Character Recognition) is currently OCR text recognition generally includes two parts, text detection and text recognition. 
The text detection module first uses detection algorithms to detect text lines in the image. And then the recognition algorithm to identify the specific text in the text line. -## BASIC CONCEPTS OF OCR DETECTION MODEL +### 1.1 BASIC CONCEPTS OF OCR DETECTION MODEL Text detection can locate the text area in the image, and then usually mark the word or text line in the form of a bounding box. Traditional text detection algorithms mostly extract features manually, which are characterized by fast speed and good effect in simple scenes, but the effect will be greatly reduced when faced with natural scenes. Currently, deep learning methods are mostly used. @@ -27,14 +26,14 @@ Text detection algorithms based on deep learning can be roughly divided into the 3. Hybrid target detection and segmentation method. -## Basic concepts of OCR recognition model +### 1.2 Basic concepts of OCR recognition model The input of the OCR recognition algorithm is generally text lines images which has less background information, and the text information occupies the main part. The recognition algorithm can be divided into two types of algorithms: 1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on. 2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention. -## PP-OCR model +### 1.3 PP-OCR model PaddleOCR integrates many OCR algorithms, text detection algorithms include DB, EAST, SAST, etc., text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and other algorithms. 
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md index 1f9ee1489a87e5814f672a1615920ded41d41e03..52a2cf4946aa454b2bce0418ce9b34f1dbdd0b70 100644 --- a/doc/doc_en/models_list_en.md +++ b/doc/doc_en/models_list_en.md @@ -62,45 +62,6 @@ Relationship of the above models is as follows. #### Multilingual Recognition Model(Updating...) -**Note:** The configuration file of the new multi language model is generated by code. You can use the `--help` parameter to check which multi language are supported by current PaddleOCR. - -```bash -# The code needs to run in the specified directory -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -python3 generate_multi_language_configs.py --help -``` - -Take the Italian configuration file as an example: -##### 1.Generate Italian configuration file to test the model provided -you can generate the default configuration file through the following command, and use the default language dictionary provided by paddleocr for prediction. -```bash -# The code needs to run in the specified directory -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -# Set the required language configuration file through -l or --language parameter -# This command will write the default parameter to the configuration file. -python3 generate_multi_language_configs.py -l it -``` -##### 2. Generate Italian configuration file to train your own data -If you want to train your own model, you can prepare the training set file, verification set file, dictionary file and training data path. 
Here we assume that the Italian training set, verification set, dictionary and training data path are: -- Training set:{your/path/}PaddleOCR/train_data/train_list.txt -- Validation set: {your/path/}PaddleOCR/train_data/val_list.txt -- Use the default dictionary provided by paddleocr:{your/path/}PaddleOCR/ppocr/utils/dict/it_dict.txt -- Training data path:{your/path/}PaddleOCR/train_data -```bash -# The code needs to run in the specified directory -cd {your/path/}PaddleOCR/configs/rec/multi_language/ -# The -l or --language parameter is required -# --train modify train_list path -# --val modify eval_list path -# --data_dir modify data dir -# -o modify default parameters -# --dict Change the dictionary path. The example uses the default dictionary path, so that this parameter can be empty. -python3 generate_multi_language_configs.py -l it \ ---train {path/to/train_list} \ ---val {path/to/val_list} \ ---data_dir {path/to/data_dir} \ --o Global.use_gpu=False -``` |model name| dict file | description|config|model size|download| | --- | --- | --- |--- | --- | --- | | french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt | Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) | diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 637e9407ccddfbc27b941a99ec5404ba5173e7e8..19d6f327a64134be4ef5763ded7ce3fd5d8590ef 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -8,10 +8,10 @@ + [2.1 Use by command line](#21-use-by-command-line) - [2.1.1 English and Chinese Model](#211-english-and-chinese-model) - [2.1.2 Multi-language Model](#212-multi-language-model) - - [2.1.3 LayoutParser](#213-layoutparser) + - [2.1.3 Layout 
Analysis](#213-layoutAnalysis) + [2.2 Use by Code](#22-use-by-code) - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model) - - [2.2.2 LayoutParser](#222-layoutparser) + - [2.2.2 Layout Analysis](#222-layoutAnalysis) @@ -132,9 +132,11 @@ Commonly used multilingual abbreviations include | Chinese Traditional | chinese_cht | | Italian | it | | Russian | ru | A list of all languages and their corresponding abbreviations can be found in [Multi-Language Model Tutorial](./multi_languages_en.md) - + -#### 2.1.3 LayoutParser +#### 2.1.3 Layout Analysis + +Layout analysis refers to the division of 5 types of areas of the document, including text, title, list, picture and table. For the first three types of regions, directly use the OCR model to complete the text detection and recognition of the corresponding regions, and save the results in txt. For the table area, after the table structuring process, the table picture is converted into an Excel file of the same table style. The picture area will be individually cropped into an image. To use the layout analysis function of PaddleOCR, you need to specify `--type=structure` @@ -219,9 +221,9 @@ Visualization of results
- + -#### 2.2.2 LayoutParser +#### 2.2.2 Layout Analysis ```python import os @@ -248,4 +250,3 @@ im_show = draw_structure_result(image, result,font_path=font_path) im_show = Image.fromarray(im_show) im_show.save('result.jpg') ``` - diff --git a/doc/doc_en/training_en.md b/doc/doc_en/training_en.md index 357645b51679de4eb04cd6fd5456d54ee300c5a3..eaae2d1e31a2849ea4c0d9315d145888aaeca4cf 100644 --- a/doc/doc_en/training_en.md +++ b/doc/doc_en/training_en.md @@ -4,11 +4,11 @@ * [1.1 Learning rate](#11-learning-rate) * [1.2 Regularization](#12-regularization) * [1.3 Evaluation indicators](#13-evaluation-indicators-) -- [2. FAQ](#2-faq) -- [3. Data and vertical scenes](#3-data-and-vertical-scenes) - * [3.1 Training data](#31-training-data) - * [3.2 Vertical scene](#32-vertical-scene) - * [3.3 Build your own data set](#33-build-your-own-data-set) +- [2. Data and vertical scenes](#2-data-and-vertical-scenes) + * [2.1 Training data](#21-training-data) + * [2.2 Vertical scene](#22-vertical-scene) + * [2.3 Build your own data set](#23-build-your-own-data-set) +* [3. FAQ](#3-faq) This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training. @@ -69,34 +69,13 @@ Optimizer: (3) End-to-end statistics: End-to-end recall rate: accurately detect and correctly identify the proportion of text lines in all labeled text lines; End-to-end accuracy rate: accurately detect and correctly identify the number of text lines in the detected text lines The standard for accurate detection is that the IOU of the detection box and the labeled box is greater than a certain threshold, and the text in the correctly identified detection box is the same as the labeled text. - -# 2. FAQ + -**Q**: How to choose a suitable network input shape when training CRNN recognition? - - A: The general height is 32, the longest width is selected, there are two methods: - - (1) Calculate the aspect ratio distribution of training sample images. 
The selection of the maximum aspect ratio considers 80% of the training samples. - - (2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated. - -**Q**: During the recognition training, the accuracy of the training set has reached 90, but the accuracy of the verification set has been kept at 70, what should I do? - - A: If the accuracy of the training set is 90 and the test set is more than 70, it should be over-fitting. There are two methods to try: - - (1) Add more augmentation methods or increase the [probability] of augmented prob (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), The default is 0.4. - - (2) Increase the [l2 dcay value] of the system (https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47) +# 2. Data and vertical scenes -**Q**: When the recognition model is trained, loss can drop normally, but acc is always 0 - - A: It is normal for the acc to be 0 at the beginning of the recognition model training, and the indicator will come up after a longer training period. + - -# 3. Data and vertical scenes - - -## 3.1 Training data +## 2.1 Training data The current open source models, data sets and magnitudes are as follows: @@ -111,14 +90,16 @@ The current open source models, data sets and magnitudes are as follows: Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](./datasets.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. 
Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc. - -## 3.2 Vertical scene + + +## 2.2 Vertical scene PaddleOCR mainly focuses on general OCR. If you have vertical requirements, you can use PaddleOCR + vertical data to train yourself; If there is a lack of labeled data, or if you do not want to invest in research and development costs, it is recommended to directly call the open API, which covers some of the more common vertical categories. - -## 3.3 Build your own data set + + +## 2.3 Build your own data set There are several experiences for reference when constructing the data set: @@ -133,3 +114,28 @@ There are several experiences for reference when constructing the data set: a. Manually collect more training data, the most direct and effective way. b. Basic image processing or transformation based on PIL and opencv. For example, the three modules of ImageFont, Image, ImageDraw in PIL write text into the background, opencv's rotating affine transformation, Gaussian filtering and so on. c. Use data generation algorithms to synthesize data, such as algorithms such as pix2pix. + + + +# 3. FAQ + +**Q**: How to choose a suitable network input shape when training CRNN recognition? + + A: The general height is 32, the longest width is selected, there are two methods: + + (1) Calculate the aspect ratio distribution of training sample images. The selection of the maximum aspect ratio considers 80% of the training samples. + + (2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated. 
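
The two width-selection methods in the answer above can be sketched in plain Python. This is only an illustrative sketch, not PaddleOCR code: the helper names (`nearest_rank_percentile`, `width_from_aspect_ratios`, `width_from_char_counts`) and the `char_aspect` parameter are hypothetical, and the nearest-rank percentile is one assumed reading of "considers 80% of the training samples".

```python
import math


def nearest_rank_percentile(values, percent):
    """Return the smallest value that covers `percent`% of the samples (nearest-rank)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, clamped to at least 1.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


def width_from_aspect_ratios(sizes, height=32, percent=80):
    """Method (1): `sizes` is a list of (width, height) pixel pairs of training images."""
    ratios = [w / h for w, h in sizes]
    ratio = nearest_rank_percentile(ratios, percent)
    return math.ceil(height * ratio)


def width_from_char_counts(labels, char_aspect, height=32, percent=80):
    """Method (2): `char_aspect` is the assumed width/height ratio of one character,
    supplied by the caller for the script being recognized."""
    counts = [len(text) for text in labels]
    n_chars = nearest_rank_percentile(counts, percent)
    return math.ceil(n_chars * char_aspect * height)


if __name__ == "__main__":
    sizes = [(100, 32), (200, 32), (320, 32), (400, 32), (640, 32)]
    print(width_from_aspect_ratios(sizes))
    print(width_from_char_counts(["ab", "abcd", "abcdef", "abcdefgh"], char_aspect=1.0))
```

Either function yields a candidate maximum width for a fixed input height of 32; samples wider than the result would need to be resized or truncated.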
+ +**Q**: During recognition training, the accuracy on the training set has reached 90, but the accuracy on the validation set stays at about 70. What should I do? + + A: A training accuracy of 90 with a validation accuracy of only a little over 70 indicates over-fitting. There are two methods to try: + + (1) Add more augmentation methods or increase the [augmentation probability](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), which defaults to 0.4. + + (2) Increase the [L2 decay value](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47). + +**Q**: When training the recognition model, the loss drops normally, but acc is always 0. + + A: It is normal for acc to be 0 at the beginning of recognition model training; the metric will rise after a longer training period. +
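
For the over-fitting question above, the effect of raising the augmentation probability can be illustrated with a small probability gate. This is a hypothetical sketch, not the code in `rec_img_aug.py`: the function `maybe_augment` and its parameters are made up for illustration, and only the default of 0.4 comes from the text.

```python
import random


def maybe_augment(sample, augment, prob=0.4, rng=random):
    """Apply `augment` to `sample` with probability `prob`, else return it unchanged."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    # rng.random() is in [0.0, 1.0), so prob=0 never augments and prob=1 always does.
    if rng.random() < prob:
        return augment(sample)
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    # Count how often a dummy augmentation fires at two probabilities.
    for prob in (0.4, 0.8):
        fired = sum(maybe_augment(0, lambda x: 1, prob=prob, rng=rng) for _ in range(1000))
        print(prob, fired)
```

A higher `prob` means more training samples are perturbed per epoch, which is the regularizing effect the answer relies on.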