diff --git a/README.md b/README.md index f6a73a7e15739443b5af7d4a73893f63ae1c0a20..4dc8ec48e491e4583524501f1084a87bbe2991d2 100644 --- a/README.md +++ b/README.md @@ -102,7 +102,6 @@ For a new language request, please refer to [Guideline for new language_requests - PP-OCR Industry Landing: from Training to Deployment - [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md) - [PP-OCR Model Download](./doc/doc_en/models_list_en.md) - - [Yml Configuration](./doc/doc_en/config_en.md) - [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md) - [PP-OCR Training](./doc/doc_en/training_en.md) - [Text Detection](./doc/doc_en/detection_en.md) diff --git a/README_ch.md b/README_ch.md index 7091523d2f6e24dd4256aad9e6a978a6929f9443..67973ed4d4d6d9aef5d16d1d7b1ed800c21e3c19 100755 --- a/README_ch.md +++ b/README_ch.md @@ -94,13 +94,16 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - PP-OCR产业落地:从训练到部署 - [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md) - [PP-OCR模型下载](./doc/doc_ch/models_list.md) - - [配置文件内容与生成](./doc/doc_ch/config.md) - [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md) - [PP-OCR模型训练](./doc/doc_ch/training.md) - [文本检测](./doc/doc_ch/detection.md) - [文本识别](./doc/doc_ch/recognition.md) - [文本方向分类器](./doc/doc_ch/angle_class.md) - [配置文件内容与生成](./doc/doc_ch/config.md) + - PP-OCR模型压缩 + - [模型蒸馏](./doc/doc_ch/knowledge_distillation.md) + - [模型量化](./deploy/slim/quantization/README.md) + - [模型裁剪](./deploy/slim/prune/README.md) - PP-OCR模型推理部署 - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [服务化部署](./deploy/pdserving/README_CN.md) diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index dcd0318ed908375c896d7a6730cd72db4cc4b848..600d5bdb120444ec89222360af02adb3f96a8640 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -37,9 +37,10 @@ | checkpoints | 加载模型参数路径 | None | 用于中断后加载参数继续训练 | | use_visualdl | 设置是否启用visualdl进行可视化log展示 | False | [教程地址](https://www.paddlepaddle.org.cn/paddle/visualdl) | | infer_img | 设置预测图像路径或文件夹路径 | ./infer_img | \| -| character_dict_path | 设置字典路径 | ./ppocr/utils/ppocr_keys_v1.txt | 如果为空,则默认使用小写字母+数字作为字典 | +| character_dict_path | 设置字典路径 | ./ppocr/utils/ppocr_keys_v1.txt | \ | | max_text_length | 设置文本最大长度 | 25 | \ | -| use_space_char | 设置是否识别空格 | True | | +| character_type | 设置字符类型 | ch | en/ch, en时将使用默认dict,ch时使用自定义dict| +| use_space_char | 设置是否识别空格 | True | 仅在 character_type=ch 时支持空格 | | label_list | 设置方向分类器支持的角度 | ['0','180'] | 仅在方向分类器中生效 | | save_res_path | 设置检测模型的结果保存地址 | ./output/det_db/predicts_db.txt | 仅在检测模型中生效 | @@ -176,7 +177,7 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi --dict {path/of/dict} \ # 字典文件路径 -o Global.use_gpu=False # 是否使用gpu ... - + ``` 意大利文由拉丁字母组成,因此执行完命令后会得到名为 rec_latin_lite_train.yml 的配置文件。 @@ -190,37 +191,38 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi use_gpu: True epoch_num: 500 ... + character_type: it # 需要识别的语种 character_dict_path: {path/of/dict} # 字典文件所在路径 - + Train: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/train_list.txt"] # 训练集label路径 ... - + Eval: dataset: name: SimpleDataSet data_dir: train_data/ # 数据存放根目录 label_file_list: ["./train_data/val_list.txt"] # 验证集label路径 ... 
- + ``` 目前PaddleOCR支持的多语言算法有: -| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | -| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | -| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | -| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | -| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | -| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | -| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | -| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | -| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 拉丁字母 | -| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 阿拉伯字母 | -| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | -| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | +| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | +| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | chinese_cht| +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | EN | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | french | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | german | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | japan | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | korean | +| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 拉丁字母 | latin | +| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 阿拉伯字母 | ar | +| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | cyrillic | +| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | devanagari | 更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index dc50e8388128017d608a4ff38471d24bcb143bf3..5ec880b7996015d4baad4e50442ac67f1d260f69 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -1,22 +1,24 @@ -# 目录 -- [1. 文字检测](#1-----) - * [1.1 数据准备](#11-----) - * [1.2 下载预训练模型](#12--------) - * [1.3 启动训练](#13-----) - * [1.4 断点训练](#14-----) - * [1.5 更换Backbone 训练](#15---backbone---) - * [1.6 指标评估](#16-----) - * [1.7 测试检测效果](#17-------) - * [1.8 转inference模型测试](#18--inference----) -- [2. FAQ](#2-faq) - - - -# 1. 文字检测 +# 文字检测 本节以icdar2015数据集为例,介绍PaddleOCR中检测模型训练、评估、测试的使用方式。 +- [1. 准备数据和模型](#1--------) + * [1.1 数据准备](#11-----) + * [1.2 下载预训练模型](#12--------) +- [2. 开始训练](#2-----) + * [2.1 启动训练](#21-----) + * [2.2 断点训练](#22-----) + * [2.3 更换Backbone 训练](#23---backbone---) +- [3. 模型评估与预测](#3--------) + * [3.1 指标评估](#31-----) + * [3.2 测试检测效果](#32-------) +- [4. 模型导出与预测](#4--------) +- [5. FAQ](#5-faq) + + +# 1. 
准备数据和模型 + ## 1.1 数据准备 @@ -83,8 +85,11 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams ``` - -## 1.3 启动训练 + +# 2. 开始训练 + + +## 2.1 启动训练 *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* @@ -106,8 +111,8 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 ``` - -## 1.4 断点训练 + +## 2.2 断点训练 如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径: ```shell @@ -116,8 +121,8 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you **注意**:`Global.checkpoints`的优先级高于`Global.pretrained_model`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrained_model`指定的模型。 - -## 1.5 更换Backbone 训练 + +## 2.3 更换Backbone 训练 PaddleOCR将网络划分为四部分,分别在[ppocr/modeling](../../ppocr/modeling)下。 进入网络的数据将按照顺序(transforms->backbones-> necks->heads)依次通过这四个部分。 @@ -164,8 +169,11 @@ args1: args1 **注意**:如果要更换网络的其他模块,可以参考[文档](./add_new_algorithm.md)。 - -## 1.6 指标评估 + +# 3. 模型评估与预测 + + +## 3.1 指标评估 PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean(F-Score)。 @@ -177,8 +185,8 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{pat * 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置 - -## 1.7 测试检测效果 + +## 3.2 测试检测效果 测试单张图像的检测效果 ```shell @@ -195,8 +203,8 @@ python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./ python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy" ``` - -## 1.8 转inference模型测试 + +# 4. 模型导出与预测 inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 @@ -218,10 +226,11 @@ python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./outpu python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True ``` - -# 2. FAQ + +# 5. FAQ Q1: 训练模型转inference 模型之后预测效果不一致? 
+ **A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。以det_mv3_db.yml配置文件训练的模型为例,训练模型、inference模型预测结果不一致问题解决方式如下: - 检查[trained model预处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116),和[inference model的预测预处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42)函数是否一致。算法在评估的时候,输入图像大小会影响精度,为了和论文保持一致,训练icdar15配置文件中将图像resize到[736, 1280],但是在inference model预测的时候只有一套默认参数,会考虑到预测速度问题,默认限制图像最长边为960做resize的。训练模型预处理和inference模型的预处理函数位于[ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147) - 检查[trained model后处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51),和[inference 后处理参数](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50)是否一致。 diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index 4e0f1d131e2547f0d4a8bdf35c0f4a6f8bf2e7a3..b9be1e4cb2d1b256a05b82ef5d6db49dfcb2f31f 100755 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -273,7 +273,7 @@ python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o G CRNN 文本识别模型推理,可以执行如下命令: ``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` ![](../imgs_words_en/word_336.png) @@ -288,7 +288,7 @@ Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073) - 训练时采用的图像分辨率不同,训练上述模型采用的图像分辨率是[3,32,100],而中文模型训练时,为了保证长文本的识别效果,训练时采用的图像分辨率是[3, 32, 320]。预测推理程序默认的的形状参数是训练中文采用的图像分辨率,即[3, 32, 320]。因此,这里推理上述英文模型时,需要通过参数rec_image_shape设置识别图像的形状。 -- 字符列表,DTRB论文中实验只是针对26个小写英文本母和10个数字进行实验,总共36个字符。所有大小字符都转成了小写字符,不在上面列表的字符都忽略,认为是空格。因此这里没有输入字符字典,而是通过如下命令生成字典.因此在推理时需要设置参数rec_char_dict_path,指定为英文字典"./ppocr/utils/ic15_dict.txt"。 +- 字符列表,DTRB论文中实验只是针对26个小写英文本母和10个数字进行实验,总共36个字符。所有大小字符都转成了小写字符,不在上面列表的字符都忽略,认为是空格。因此这里没有输入字符字典,而是通过如下命令生成字典.因此在推理时需要设置参数rec_char_type,指定为英文"en"。 ``` self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" @@ -303,15 +303,15 @@ dict_character = list(self.character_str) python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \ --rec_model_dir="./inference/srn/" \ --rec_image_shape="1, 64, 256" \ - --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" \ + --rec_char_type="en" \ --rec_algorithm="SRN" ``` ### 4. 
自定义文本识别字典的推理 -如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径 +如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch` ``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path" +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" ``` @@ -320,7 +320,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png 需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/fonts/` 路径下有默认提供的小语种字体,例如韩文识别: ``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" ``` ![](../imgs_words/korean/1.jpg) @@ -388,7 +388,7 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de 下面给出基于EAST文本检测和STAR-Net文本识别执行命令: ``` -python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" +python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` 执行命令后,识别结果图像如下: diff --git a/doc/doc_ch/inference_ppocr.md b/doc/doc_ch/inference_ppocr.md index 493a4c9868621b762895e1ee11f76ac250918453..3e46f17d3a781839dfe5e632f85aabcd03d0fd17 100644 --- a/doc/doc_ch/inference_ppocr.md +++ b/doc/doc_ch/inference_ppocr.md @@ -1,16 +1,13 @@ -# PP-OCR模型库快速推理 +# 基于Python引擎的PP-OCR模型库推理 本文介绍针对PP-OCR模型库的Python推理引擎使用方法,内容依次为文本检测、文本识别、方向分类器以及三者串联在CPU、GPU上的预测方法。 - [1. 文本检测模型推理](#文本检测模型推理) - - [2. 文本识别模型推理](#文本识别模型推理) - [2.1 超轻量中文识别模型推理](#超轻量中文识别模型推理) - [2.2 多语言模型的推理](#多语言模型的推理) - - [3. 方向分类模型推理](#方向分类模型推理) - - [4. 
文本检测、方向分类和文字识别串联推理](#文本检测、方向分类和文字识别串联推理) @@ -21,12 +18,15 @@ ``` # 下载超轻量中文检测模型: -wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tartar xf ch_ppocr_mobile_v2.0_det_infer.tarpython3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/" +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar +tar xf ch_PP-OCRv2_det_infer.tar +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv2_det_infer/" + ``` 可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/det_res_00018069.jpg) +![](../imgs_results/det_res_00018069.jpg) 通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制, `limit_type`可选参数为[`max`, `min`], @@ -39,13 +39,13 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_i 如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以设置det_limit_side_len 为想要的值,比如1216: ``` -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216 +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --det_limit_type=max --det_limit_side_len=1216 ``` 如果想使用CPU进行预测,执行命令如下 ``` -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --use_gpu=False ``` @@ -62,12 +62,12 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_di ``` # 下载超轻量中文识别模型: -wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar -tar xf ch_ppocr_mobile_v2.0_rec_infer.tar -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer" +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar +tar xf ch_PP-OCRv2_rec_infer.tar +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./ch_PP-OCRv2_rec_infer/" ``` -![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_4.jpg) +![](../imgs_words/ch/word_4.jpg) 执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下: @@ -79,14 +79,13 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.98458153) ### 2.2 多语言模型的推理 -如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, -需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/fonts/` 路径下有默认提供的小语种字体,例如韩文识别: - +如果您需要预测的是其他语言模型,可以在[此链接](./models_list.md#%E5%A4%9A%E8%AF%AD%E8%A8%80%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B)中找到对应语言的inference模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/fonts/` 路径下有默认提供的小语种字体,例如韩文识别: ``` +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" ``` -![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/korean/1.jpg) +![](../imgs_words/korean/1.jpg) 执行命令后,上图的预测结果为: @@ -107,7 +106,7 @@ tar xf ch_ppocr_mobile_v2.0_cls_infer.tar python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" 
--cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer" ``` -![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_words/ch/word_1.jpg) +![](../imgs_words/ch/word_1.jpg) 执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下: @@ -123,14 +122,13 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999982] ```shell # 使用方向分类器 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=true # 不使用方向分类器 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false # 使用多进程 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false --use_mp=True --total_process_num=6 ``` 执行命令后,识别结果图像如下: -![](/Users/zhulingfeng01/OCR/PaddleOCR/doc/imgs_results/system_res_00018069.jpg) - +![](../imgs_results/system_res_00018069.jpg) diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 5e78795bcda96cf005f24b97cfcc0a8580b2ae1e..31ab6a2c1c2df6528c3b08c32a39a94115b946b8 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -1,4 +1,4 @@ -## OCR模型列表(V2.1,2021年9月6日更新) +# OCR模型列表(V2.1,2021年9月6日更新) > **说明** > 1. 2.1版模型相比2.0版模型,2.1的模型在模型精度上做了提升 @@ -6,13 +6,13 @@ > 3. 本文档提供的是PPOCR自研模型列表,更多基于公开数据集的算法介绍与预训练模型可以参考:[算法概览文档](./algorithm_overview.md)。 -- [一、文本检测模型](#文本检测模型) -- [二、文本识别模型](#文本识别模型) - - [1. 中文识别模型](#中文识别模型) - - [2. 英文识别模型](#英文识别模型) - - [3. 多语言识别模型](#多语言识别模型) -- [三、文本方向分类模型](#文本方向分类模型) -- [四、Paddle-Lite 模型](#Paddle-Lite模型) +- [1. 文本检测模型](#文本检测模型) +- [2. 文本识别模型](#文本识别模型) + - [2.1 中文识别模型](#中文识别模型) + - [2.2 英文识别模型](#英文识别模型) + - [2.3 多语言识别模型](#多语言识别模型) +- [3. 文本方向分类模型](#文本方向分类模型) +- [4. Paddle-Lite 模型](#Paddle-Lite模型) PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训练模型`、`slim模型`,模型区别说明如下: @@ -29,27 +29,28 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 -### 一、文本检测模型 +## 1. 
文本检测模型 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_PP-OCRv2_det_slim|slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| -|ch_PP-OCRv2_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| +|ch_PP-OCRv2_det_slim|【最新】slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| +|ch_PP-OCRv2_det|【最新】原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| |ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| -### 二、文本识别模型 +## 2. 文本识别模型 -#### 1. 
中文识别模型 + +### 2.1 中文识别模型 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_PP-OCRv2_rec_slim|slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | -|ch_PP-OCRv2_rec|原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | +|ch_PP-OCRv2_rec_slim|【最新】slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec|【最新】原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -57,7 +58,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 **说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。 -#### 2. 英文识别模型 +### 2.2 英文识别模型 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | @@ -65,7 +66,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | -#### 3. 多语言识别模型(更多语言持续更新中...) +### 2.3 多语言识别模型(更多语言持续更新中...) |模型名称|字典文件|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- |--- | --- | @@ -86,7 +87,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 -### 三、文本方向分类模型 +## 3. 文本方向分类模型 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | @@ -95,7 +96,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 -### 四、Paddle-Lite 模型 +## 4. 
Paddle-Lite 模型 |模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| |---|---|---|---|---|---|---| diff --git a/doc/doc_ch/pgnet.md b/doc/doc_ch/pgnet.md index e79c2388ea0047a75ceec3dcb3c1d762c3702b2b..9aa7f255e54ce8dec3a20d475cccb71847d95cc7 100644 --- a/doc/doc_ch/pgnet.md +++ b/doc/doc_ch/pgnet.md @@ -28,9 +28,9 @@ PGNet算法细节详见[论文](https://www.aaai.org/AAAI21Papers/AAAI-2885.Wang ### 性能指标 -测试集: Total Text +#### 测试集: Total Text -测试环境: NVIDIA Tesla V100-SXM2-16GB +#### 测试环境: NVIDIA Tesla V100-SXM2-16GB |PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|下载| | --- | --- | --- | --- | --- | --- | --- | --- | --- | @@ -43,7 +43,7 @@ PGNet算法细节详见[论文](https://www.aaai.org/AAAI21Papers/AAAI-2885.Wang ## 二、环境配置 -请先参考[快速安装](./installation.md)配置PaddleOCR运行环境。 +请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境,参考[《PaddleOCR全景图与项目克隆》](./paddleOCR_overview.md)克隆项目 ## 三、快速使用 @@ -92,7 +92,7 @@ python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/im |- train.txt # total_text数据集的训练标注 ``` -total_text.txt标注文件格式如下,文件名和标注信息中间用"\t"分隔: +train.txt标注文件格式如下,文件名和标注信息中间用"\t"分隔: ``` " 图像文件名 json.dumps编码的图像标注信息" rgb/img11.jpg [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, 308.0], [259.0, 296.0], [286.0, 291.0], [313.0, 295.0], [338.0, 305.0], [362.0, 320.0], [349.0, 347.0], [330.0, 337.0], [310.0, 329.0], [290.0, 324.0], [269.0, 328.0], [249.0, 336.0], [231.0, 346.0]]}, {...}] diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 1896d7a137f0768c6b2a8e0c02b18ff61fbfd03c..d9ff5a628fbd8d8effd50fb2b276d89d5e13225a 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -47,10 +47,10 @@ cd /path/to/ppocr_img #### 2.1.1 中英文模型 -* 检测+方向分类器+识别全流程:设置方向分类器参数`--use_angle_cls true`后可对竖排文本进行识别。 +* 检测+方向分类器+识别全流程:`--use_angle_cls true`设置使用方向分类器识别180度旋转文字,`--use_gpu false`设置不使用GPU ```bash - paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true + paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true --use_gpu false ``` 结果是一个list,每个item包含了文本框,文字和识别置信度 diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index bb7d01712a85c92a02109e41814059e6c98c7cdc..b2d33346fe4f401cc8a02f777605851ceb7329d5 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -33,7 +33,7 @@ ln -sf /train_data/dataset mklink /d /train_data/dataset ``` - + ### 1.1 自定义数据集 下面以通用数据集为例, 介绍如何准备数据集: @@ -86,10 +86,7 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单 若您本地没有数据集,可以在官网下载 [ICDAR2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,下载 benchmark 所需的lmdb格式数据集。 -如果希望复现SAR的论文指标,需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码:627x。此外,真实数据集icdar2013, icdar2015, cocotext, IIIT5也作为训练数据的一部分。具体数据细节可以参考论文SAR。 - 如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件,通过以下方式下载: - ``` # 训练集标签 wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt @@ -159,6 +156,7 @@ PaddleOCR内置了一部分字典,可以按需使用。 - 自定义字典 如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。 +并将 `character_type` 设置为 `ch`。 ### 1.4 添加空格类别 @@ -232,10 +230,6 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t | rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att | | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn | | rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer 
decoder | -| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder | -| rec_resnet_stn_bilstm_att.yml | SEED | Aster_Resnet | STN | BiLSTM | att | - -*其中SEED模型需要额外加载FastText训练好的[语言模型](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz) 训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: @@ -245,6 +239,8 @@ Global: ... # 添加自定义字典,如修改字典请将路径指向新字典 character_dict_path: ppocr/utils/ppocr_keys_v1.txt + # 修改字符类型 + character_type: ch ... # 识别空格 use_space_char: True @@ -308,18 +304,18 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi 按语系划分,目前PaddleOCR支持的语种有: -| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | -| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | -| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | -| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | -| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | -| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | -| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | -| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | -| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 拉丁字母 | -| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 阿拉伯字母 | -| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | -| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | +| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | +| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | chinese_cht| +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | EN | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | french | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | german | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | japan | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | korean | +| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 拉丁字母 | latin | +| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 阿拉伯字母 | ar | +| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | cyrillic | +| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | devanagari | 更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) @@ -460,3 +456,5 @@ python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_trai ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" ``` + + diff --git a/doc/doc_ch/reference.md b/doc/doc_ch/reference.md index 
f1337dedc96c685173cbcc8450a57c259d2c0029..3347447741a7852b30e793bd0d30696c190598a0 100644 --- a/doc/doc_ch/reference.md +++ b/doc/doc_ch/reference.md @@ -112,4 +112,14 @@ year={2016} } +13.NRTR +@misc{sheng2019nrtr, + title={NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition}, + author={Fenfen Sheng and Zhineng Chen and Bo Xu}, + year={2019}, + eprint={1806.00926}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + ``` diff --git a/doc/doc_ch/training.md b/doc/doc_ch/training.md index d2deea6f205e0402e8c9c0010d5d8dd3e325fa3b..8dc94546cbf0bf683adf35393cbf3a3a6c95f06f 100644 --- a/doc/doc_ch/training.md +++ b/doc/doc_ch/training.md @@ -1,4 +1,4 @@ -# 模型训练 +# PP-OCR模型训练 本文将介绍模型训练时需掌握的基本概念,和训练时的调优方法。 @@ -15,13 +15,19 @@ * [3.3 自己构建数据集](#自己构建数据集) * [4. 常见问题](#常见问题) + + +## 1. 配置文件说明 + +PaddleOCR模型使用配置文件管理网络训练、评估的参数。在配置文件中,可以设置组建模型、优化器、损失函数、模型前后处理的参数,PaddleOCR从配置文件中读取到这些参数,进而组建出完整的训练流程,完成模型训练,在需要对模型进行优化的时,可以通过修改配置文件中的参数完成配置,使用简单且方便修改。 + +完整的配置文件说明可以参考[配置文件](./config.md) + -## 1. 基本概念 -OCR(Optical Character Recognition,光学字符识别)是指对图像进行分析识别处理,获取文字和版面信息的过程,是典型的计算机视觉任务, -通常由文本检测和文本识别两个子任务构成。 +## 2. 基本概念 -模型调优时需要关注以下参数: +模型训练过程中需要手动调整一些超参数,帮助模型以最小的代价获得最优指标。不同的数据量可能需要不同的超参,当您希望在自己的数据上finetune或对模型效果调优时,有以下几个参数调整策略可供参考: ### 2.1 学习率 @@ -131,9 +137,3 @@ PaddleOCR主要聚焦通用OCR,如果有垂类需求,您可以用PaddleOCR+ A:识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。 - -*** -具体的训练教程可点击下方链接跳转: -- [文本检测模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/detection.md) -- [文本识别模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/recognition.md) -- [文本方向分类器训练](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/angle_class.md) \ No newline at end of file diff --git a/doc/doc_en/add_new_algorithm_en.md b/doc/doc_en/add_new_algorithm_en.md index dc81f1820f5d72a54f66fcddd716f18e5f6607e4..db72fe7d4b4b5b5d12e06ff34140c9da1186319b 100644 --- a/doc/doc_en/add_new_algorithm_en.md +++ b/doc/doc_en/add_new_algorithm_en.md @@ -1,4 +1,4 @@ -# Add new algorithm +# Add New Algorithm PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms. @@ -263,7 +263,7 @@ Metric: main_indicator: acc ``` -## 优化器 +## Optimizer The optimizer is used to train the network. The optimizer also contains network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in Commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common regularization modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common learning rate decay modules such as `L1Decay` and `L2Decay`. diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md index df8a4ce3ef5fbcadb7ebdfd8ddf2bdf59637783e..45aa8784eb7c65347c663eedb0cfb3df1f820dd4 100755 --- a/doc/doc_en/algorithm_overview_en.md +++ b/doc/doc_en/algorithm_overview_en.md @@ -1,5 +1,14 @@ +# Two-stage Algorithm + +- [1. Algorithm Introduction](#1-algorithm-introduction) + * [1.1 Text Detection Algorithm](#11-text-detection-algorithm) + * [1.2 Text Recognition Algorithm](#12-text-recognition-algorithm) +- [2. Training](#2-training) +- [3. Inference](#3-inference) + -## Algorithm introduction + +## 1. Algorithm Introduction This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. 
It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md). @@ -8,13 +17,13 @@ This tutorial lists the text detection algorithms and text recognition algorithm - [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM) -### 1. Text Detection Algorithm + +### 1.1 Text Detection Algorithm PaddleOCR open source text detection algorithms list: -- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) -- [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498)) -- [x] PSE([paper](https://arxiv.org/abs/1903.12473v2)) +- [x] EAST([paper](https://arxiv.org/abs/1704.03155))[2] +- [x] DB([paper](https://arxiv.org/abs/1911.08947))[1] +- [x] SAST([paper](https://arxiv.org/abs/1908.05498))[4] On the ICDAR2015 dataset, the text detection result is as follows: @@ -25,8 +34,6 @@ On the ICDAR2015 dataset, the text detection result is as follows: |DB|ResNet50_vd|86.41%|78.72%|82.38%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| |DB|MobileNetV3|77.29%|73.08%|75.12%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)| -|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)| -|PSE|MobileNetV3|82.20%|70.48%|75.89%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)| On Total-Text dataset, the text detection result is as follows: @@ -41,16 +48,15 @@ On Total-Text dataset, the text detection result is as follows: For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md) -### 2. 
Text Recognition Algorithm +### 1.2 Text Recognition Algorithm PaddleOCR open-source text recognition algorithms list: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) -- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) -- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [x] SRN([paper](https://arxiv.org/abs/2003.12294)) +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))[7] +- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))[10] +- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))[11] +- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12] +- [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5] - [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2)) -- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2)) Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: @@ -66,6 +72,13 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)| |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) | -|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) | -Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./recognition_en.md) +Please refer to the document for training guide and use of PaddleOCR + +## 2. Training + +For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md). For text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./recognition_en.md) + +## 3. Inference + +Except for the PP-OCR series models of the above models, the other models only support inference based on the Python engine. For details, please refer to [Inference based on Python prediction engine](./inference_en.md) diff --git a/doc/doc_en/angle_class_en.md b/doc/doc_en/angle_class_en.md index 46d91bee43de3af99659651b7f31cf1148e7b294..b7fcd63e070318d3aab37714a1213ad9f56cb6fc 100644 --- a/doc/doc_en/angle_class_en.md +++ b/doc/doc_en/angle_class_en.md @@ -15,7 +15,6 @@ The text line image obtained after text detection is sent to the recognition mod Example of 0 and 180 degree data samples: ![](../imgs_results/angle_class_example.jpg) -### DATA PREPARATION ## 2. Data Preparation @@ -131,8 +130,6 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/ ## 5. Prediction -### PREDICTION - * Training engine prediction Using the model trained by paddleocr, you can quickly get prediction through the following script. 
diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index ce76da9b2f39532b387e3e45ca2ff497b0408635..aa78263e4b73a3ac35250e5483a394ab77450c90 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -1,4 +1,4 @@ -# Configuration +# Configuration - [1. Optional Parameter List](#1-optional-parameter-list) - [2. Intorduction to Global Parameters of Configuration File](#2-intorduction-to-global-parameters-of-configuration-file) @@ -37,8 +37,9 @@ Take rec_chinese_lite_train_v2.0.yml as an example | checkpoints | set model parameter path | None | Used to load parameters after interruption to continue training| | use_visualdl | Set whether to enable visualdl for visual log display | False | [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) | | infer_img | Set inference image path or folder path | ./infer_img | \| -| character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If the character_dict_path is None, model can only recognize number and lower letters | +| character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | \ | | max_text_length | Set the maximum length of text | 25 | \ | +| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for ch | | use_space_char | Set whether to recognize spaces | True | Only support in character_type=ch mode | | label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in angle classifier model | | save_res_path | Set the save address of the test model results | ./output/det_db/predicts_db.txt | Only valid in the text detection model | @@ -195,39 +196,40 @@ Italian is made up of Latin letters, so after executing the command, you will ge use_gpu: True epoch_num: 500 ... + character_type: it # language character_dict_path: {path/of/dict} # path of dict - + Train: dataset: name: SimpleDataSet data_dir: train_data/ # root directory of training data label_file_list: ["./train_data/train_list.txt"] # train label path ... - + Eval: dataset: name: SimpleDataSet data_dir: train_data/ # root directory of val data label_file_list: ["./train_data/val_list.txt"] # val label path ... 
- + ``` Currently, the multi-language algorithms supported by PaddleOCR are: -| Configuration file | Algorithm name | backbone | trans | seq | pred | language | -| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | -| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | -| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | -| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | -| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | -| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | -| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | -| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | -| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | -| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | -| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | +| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | +| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht| +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean | +| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin | +| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar | +| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic | +| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | devanagari | For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations) diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index df96fd5336cd64049e8f5d9b898f60c55b82b7b4..4f8143d03f09a83d51bdbe3a9674bbfe4e3181ef 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -223,6 +223,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./out ## 5. FAQ Q1: The prediction results of trained model and inference model are inconsistent? + **A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. 
Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows: - Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147) - Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50). diff --git a/doc/doc_en/environment_en.md b/doc/doc_en/environment_en.md index 9aad92cafb809a0eec519808b1c1755403b39318..8ec080f86a51e5a241689d3bda75d0afb614bd0a 100644 --- a/doc/doc_en/environment_en.md +++ b/doc/doc_en/environment_en.md @@ -1,10 +1,6 @@ # Environment Preparation -Recommended working environment: -- PaddlePaddle >= 2.0.0 (2.1.2) -- python3.7 -- CUDA10.1 / CUDA10.2 -- CUDNN 7.6 +Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle. * [1. Python Environment Setup](#1) + [1.1 Windows](#1.1) @@ -12,7 +8,6 @@ Recommended working environment: + [1.3 Linux](#1.3) * [2. Install PaddlePaddle 2.0](#2) - ## 1. Python Environment Setup @@ -45,7 +40,7 @@ Recommended working environment: - Check conda to add environment variables and ignore the warning that add conda to path - + #### 1.1.2 Opening the terminal and creating the conda environment @@ -76,7 +71,7 @@ Recommended working environment: # View the current location of python where python ``` - + create environment The above anaconda environment and python environment are installed @@ -140,13 +135,13 @@ The above anaconda environment and python environment are installed # !!! Contents within this block are managed by 'conda init' !!! __conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)" if [ $? -eq 0 ]; then - eval "$__conda_setup" + eval "$__conda_setup" else - if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then - . "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" - else - export PATH="/Users/xxx/opt/anaconda3/bin:$PATH" - fi + if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then + . 
"/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" + else + export PATH="/Users/xxx/opt/anaconda3/bin:$PATH" + fi fi unset __conda_setup # <<< conda initialize <<< @@ -204,10 +199,11 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit - **Download Anaconda**. - Download at: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - + + - Select the appropriate version for your operating system - Type `uname -m` in the terminal to check the command set used by your system @@ -222,12 +218,12 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit sudo yum install wget # CentOS ``` ```bash - # Then use wget to download from Tsinghua source + # Then use wget to download from Tsinghua source # If you want to download Anaconda3-2021.05-Linux-x86_64.sh, the download command is as follows wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh # If you want to download another version, you need to change the file name after the last 1 / to the version you want to download ``` - + - To install Anaconda. - Type `sh Anaconda3-2021.05-Linux-x86_64.sh` at the command line @@ -315,18 +311,11 @@ cd /home/Projects # Create a docker container named ppocr and map the current directory to the /paddle directory of the container # If using CPU, use docker instead of nvidia-docker to create docker -sudo docker run --name ppocr -v $PWD:/paddle --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda10.2-cudnn7 /bin/bash +sudo docker run --name ppocr -v $PWD:/paddle --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda10.2-cudnn7 /bin/bash # If using GPU, use nvidia-docker to create docker # docker image registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda11.2-cudnn8 is recommended for CUDA11.2 + CUDNN8. sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda10.2-cudnn7 /bin/bash - -``` -You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine. - -``` -# ctrl+P+Q to exit docker, to re-enter docker using the following command: -sudo docker container exec -it ppocr /bin/bash ``` @@ -346,3 +335,4 @@ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple ``` For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. + diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index 019ac4d0ac15aceed89286048d2c4d88a259e501..b445232feeefadc355e0f38b329050e26ccc0368 100755 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -21,7 +21,7 @@ Next, we first introduce how to convert a trained model into an inference model, - [2.2 DB Text Detection Model Inference](#DB_DETECTION) - [2.3 East Text Detection Model Inference](#EAST_DETECTION) - [2.4 Sast Text Detection Model Inference](#SAST_DETECTION) - + - [3. 
Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE) - [3.1 Lightweight Chinese Text Recognition Model Reference](#LIGHTWEIGHT_RECOGNITION) - [3.2 CTC-Based Text Recognition Model Inference](#CTC-BASED_RECOGNITION) @@ -281,7 +281,7 @@ python3 tools/export_model.py -c configs/det/rec_r34_vd_none_bilstm_ctc.yml -o G For CRNN text recognition model inference, execute the following commands: ``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` ![](../imgs_words_en/word_336.png) @@ -314,7 +314,7 @@ with the training, such as: --rec_image_shape="1, 64, 256" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \ --rec_model_dir="./inference/srn/" \ --rec_image_shape="1, 64, 256" \ - --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" \ + --rec_char_type="en" \ --rec_algorithm="SRN" ``` @@ -323,7 +323,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch` ``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path" +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" ``` @@ -333,7 +333,7 @@ If you need to predict other language models, when using inference model predict You need to specify the visual font path through `--vis_font_path`. 
There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
@@ -399,7 +399,7 @@ If you want to try other detection algorithms or recognition algorithms, please
The following command uses the combination of the EAST text detection and STAR-Net text recognition:
```
-python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
After executing the command, the recognition result image is as follows:
diff --git a/doc/doc_en/inference_ppocr_en.md b/doc/doc_en/inference_ppocr_en.md
index fa3b1c88713f01e8e411cf95d107b4b58dd7f4e1..62a672885c86119ae56dc93ef76c2bb746084a05 100755
--- a/doc/doc_en/inference_ppocr_en.md
+++ b/doc/doc_en/inference_ppocr_en.md
@@ -1,17 +1,14 @@
-# Python Inference for PP-OCR Model Library
+# Python Inference for PP-OCR Model Zoo
This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU.
- [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
-
- [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
- [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
- [2. Multilingaul Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
-
- [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
-
- [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
@@ -22,10 +19,10 @@ The default configuration is based on the inference setting of the DB text detec
```
# download DB text detection inference model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
-tar xf ch_ppocr_mobile_v2.0_det_infer.tar
-# predict
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/"
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar
+tar xf ch_PP-OCRv2_det_infer.tar
+# run inference
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv2_det_infer/"
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'.
@@ -42,12 +39,12 @@ Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest si
 If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216:
 ```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --det_limit_type=max --det_limit_side_len=1216
 ```
 If you want to use the CPU for prediction, execute the command as follows
 ```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --use_gpu=False
 ```
@@ -62,9 +59,10 @@ For lightweight Chinese recognition model inference, you can execute the followi
 ```
 # download CRNN text recognition inference model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
-tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer"
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
+tar xf ch_PP-OCRv2_rec_infer.tar
+# run inference
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./ch_PP-OCRv2_rec_infer/"
 ```
 ![](../imgs_words_en/word_10.png)
@@ -78,10 +76,12 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
 ### 2. Multilingual Model Inference
-If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
+If you need to predict [other language models](./models_list_en.md#Multilingual), when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
 You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
 ```
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar
+
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
 ```
 ![](../imgs_words/korean/1.jpg)
@@ -120,13 +120,13 @@ When performing prediction, you need to specify the path of a single image or a
 ```shell
 # use direction classifier
-python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=true
 # not use direction classifier
-python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false
 # use multi-process
-python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false --use_mp=True --total_process_num=6
 ```
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md
index 3b9b5518701f052079af1398a4fa3e3770eb12a1..dbb4860279cd25a888a6bfaf64dbe1952ef54470 100644
--- a/doc/doc_en/models_list_en.md
+++ b/doc/doc_en/models_list_en.md
@@ -1,4 +1,4 @@
-## OCR model list(V2.1, updated on 2021.9.6)
+# OCR Model List(V2.1, updated on 2021.9.6)
 > **Note**
 > 1. Compared with the model v2.0, the 2.1 version of the detection model has an improvement in accuracy, and the 2.1 version of the recognition model is optimized in accuracy and CPU speed.
 > 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
@@ -6,9 +6,9 @@
 - [1. Text Detection Model](#Detection)
 - [2. Text Recognition Model](#Recognition)
-    - [Chinese Recognition Model](#Chinese)
-    - [English Recognition Model](#English)
-    - [Multilingual Recognition Model](#Multilingual)
+    - [2.1 Chinese Recognition Model](#Chinese)
+    - [2.2 English Recognition Model](#English)
+    - [2.3 Multilingual Recognition Model](#Multilingual)
 - [3. Text Angle Classification Model](#Angle)
 - [4. Paddle-Lite Model](#Paddle-Lite)
@@ -25,26 +25,26 @@ Relationship of the above models is as follows.
 ![](../imgs_en/model_prod_flow_en.png)
-### 1. Text Detection Model
+## 1. Text Detection Model
 |model name|description|config|model size|download|
 | --- | --- | --- | --- | --- |
-|ch_PP-OCRv2_det_slim|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
-|ch_PP-OCRv2_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
+|ch_PP-OCRv2_det_slim|[New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
+|ch_PP-OCRv2_det|[New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
 |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
 |ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
 |ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
-### 2. Text Recognition Model
+## 2. Text Recognition Model
-#### Chinese Recognition Model
+### 2.1 Chinese Recognition Model
 |model name|description|config|model size|download|
 | --- | --- | --- | --- | --- |
-|ch_PP-OCRv2_rec_slim|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
-|ch_PP-OCRv2_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
+|ch_PP-OCRv2_rec_slim|[New] Slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
+|ch_PP-OCRv2_rec|[New] Original lightweight model, supporting Chinese, English, multilingual text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
 |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
 |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
 |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
@@ -53,7 +53,7 @@
 **Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthesized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthesized data, which is more suitable for finetune on your own dataset.
-#### English Recognition Model
+### 2.2 English Recognition Model
 |model name|description|config|model size|download|
 | --- | --- | --- | --- | --- |
@@ -61,7 +61,7 @@ Relationship of the above models is as follows.
 |en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
-#### Multilingual Recognition Model(Updating...)
+### 2.3 Multilingual Recognition Model(Updating...)
 |model name| dict file | description|config|model size|download|
 | --- | --- | --- |--- | --- | --- |
@@ -82,7 +82,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
-### 3. Text Angle Classification Model
+## 3. Text Angle Classification Model
 |model name|description|config|model size|download|
 | --- | --- | --- | --- | --- |
@@ -90,7 +90,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
 |ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
-### 4. Paddle-Lite Model
+## 4. Paddle-Lite Model
 |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
 |---|---|---|---|---|---|---|
 |PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9|
diff --git a/doc/doc_en/pgnet_en.md b/doc/doc_en/pgnet_en.md
index 9c46802b5b5273110a80cd5da8916f43e5f4883f..d2c6b30248ebad920c41ca53ee38cce828dddb8c 100644
--- a/doc/doc_en/pgnet_en.md
+++ b/doc/doc_en/pgnet_en.md
@@ -24,9 +24,9 @@ The results of detection and recognition are as follows:
 ![](../imgs_results/e2e_res_img293_pgnet.png) ![](../imgs_results/e2e_res_img295_pgnet.png)
 ### Performance
-####Test set: Total Text
+#### Test set: Total Text
-####Test environment: NVIDIA Tesla V100-SXM2-16GB
+#### Test environment: NVIDIA Tesla V100-SXM2-16GB
 |PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|download|
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 |Paper|85.30|86.80|86.1|-|-|61.7|38.20 (size=640)|-|
@@ -36,7 +36,7 @@ The results of detection and recognition are as follows:
 ## 2. Environment Configuration
-Please refer to [Quick Installation](./installation_en.md) Configure the PaddleOCR running environment.
+Please refer to [Operation Environment Preparation](./environment_en.md) to configure the PaddleOCR operating environment first, then refer to [PaddleOCR Overview and Project Clone](./paddleOCR_overview_en.md) to clone the project.
 ## 3. Quick Use
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md
index 0055d8f7a89d0d218d001ea94fd4c620de5d037f..9ed83aceb9f562ac3099f22eaf264b966c0d48c7 100644
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
@@ -53,10 +53,10 @@ If you do not use the provided test image, you can replace the following `--imag
 #### 2.1.1 Chinese and English Model
-* Detection, direction classification and recognition: set the direction classifier parameter`--use_angle_cls true` to recognize vertical text.
+* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the GPU device.
    ```bash
-    paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en
+    paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
    ```
 Output will be a list, each item contains bounding box, text and recognition confidence
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 51857ba16b7773ef38452fad6aa070f2117a9086..0d42f3a768da7bf39e0e1512ce61f9d1965da6fe 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -91,8 +91,6 @@ Similar to the training set, the test set also needs to be provided a folder con
 If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here), download the lmdb format dataset required for benchmark
-If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR.
-
 PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
 ```
@@ -161,7 +159,7 @@ The current multi-language model is still in the demo stage and will continue to
 If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
-To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` .
+To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
 - Custom dictionary
@@ -172,6 +170,8 @@ If you need to customize dic file, please add character_dict_path field in confi
 If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
+**Note: use_space_char only takes effect when character_type=ch**
+
 ## 2.Training
@@ -235,8 +235,6 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att |
 | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
 | rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer decoder |
-| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
-
 For training Chinese data, it is recommended to use [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
@@ -248,6 +246,7 @@ Global:
   # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
   character_dict_path: ppocr/utils/ppocr_keys_v1.txt
   # Modify character type
+  character_type: ch
   ...
   # Whether to recognize spaces
   use_space_char: True
@@ -309,18 +308,18 @@ Eval:
 Currently, the multi-language algorithms supported by PaddleOCR are:
-| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
-| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
-| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional |
-| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) |
-| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
-| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
-| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
-| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
-| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
-| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic |
-| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic |
-| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari |
+| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
+| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
+| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht|
+| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN |
+| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french |
+| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german |
+| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan |
+| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean |
+| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin |
+| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar |
+| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic |
+| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | devanagari |
 For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)
@@ -468,3 +467,6 @@ inference/det_db/
 ```
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path"
 ```
+
+
+
diff --git a/doc/doc_en/training_en.md b/doc/doc_en/training_en.md
index aa5500ac88fef97829b4f19c5421e36f18ae1812..cb7996ee0c17a4e964727ccfd193ba65c1415ba3 100644
--- a/doc/doc_en/training_en.md
+++ b/doc/doc_en/training_en.md
@@ -25,11 +25,10 @@ The PaddleOCR model uses configuration files to manage network training and eval
 For the complete configuration file description, please refer to [Configuration File](./config_en.md)
-# 1. Basic concepts
 ## 2. Basic Concepts
-The following parameters need to be paid attention to when tuning the model:
+During model training, some hyperparameters need to be adjusted manually so that the model can reach the best metric with the lowest loss. Different amounts of data may require different hyperparameters. When you want to fine-tune on your own data or tune the model performance, the following parameter-adjustment strategies are provided for reference:
@@ -53,7 +52,7 @@ and the learning rate is the same in each stage.
 warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
-## 1.2 Regularization
+### 2.2 Regularization
 Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods. L1 and L2 regularization are the most commonly used regularization methods.
@@ -125,7 +124,7 @@ There are several experiences for reference when constructing the data set:
-# 3. FAQ
+## 4. FAQ
 **Q**: How to choose a suitable network input shape when training CRNN recognition?
@@ -147,9 +146,3 @@ There are several experiences for reference when constructing the data set:
 A: It is normal for the acc to be 0 at the beginning of the recognition model training, and the indicator will come up after a longer training period.
-
-***
-Click the following links for detailed training tutorial:
-- [text detection model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/detection.md)
-- [text recognition model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/recognition.md)
-- [text direction classification model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/angle_class.md)