diff --git a/README.md b/README.md index f4556441bce2c3e0ed22e2ea26aa3e3f1ef2245a..f72fb0c99eacf0dea5b1889da8466a4d8b44b075 100644 --- a/README.md +++ b/README.md @@ -4,100 +4,42 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 **近期更新** +- 2020.7.9 添加支持空格的识别模型,[识别效果](#支持空格的中文OCR效果展示) +- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./doc/doc_ch/config.md) - 2020.6.8 添加[数据集](./doc/doc_ch/datasets.md),并保持持续更新 - 2020.6.5 支持 `attetnion` 模型导出 `inference_model` - 2020.6.5 支持单独预测识别时,输出结果得分 -- 2020.5.30 提供超轻量级中文OCR在线体验 -- 2020.5.30 模型预测、训练支持Windows系统 - [more](./doc/doc_ch/update.md) ## 特性 -- 超轻量级中文OCR,总模型仅8.6M +- 超轻量级中文OCR模型,总模型仅8.6M - 单模型支持中英文数字组合识别、竖排文本识别、长文本识别 - 检测模型DB(4.1M)+识别模型CRNN(4.5M) +- 实用通用中文OCR模型 +- 多种预测推理部署方案,包括服务部署和端侧部署 - 多种文本检测训练算法,EAST、DB - 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE +- 可运行于Linux、Windows、MacOS等多种系统 -### 支持的中文模型列表: - -|模型名称|模型简介|检测模型地址|识别模型地址| -|-|-|-|-| -|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| - -超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr - -**也可以按如下教程快速体验超轻量级中文OCR和通用中文OCR模型。** - -## **超轻量级中文OCR以及通用中文OCR体验** +## 快速体验 ![](doc/imgs_results/11.jpg) -上图是超轻量级中文OCR模型效果展示,更多效果图请见文末[超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示)和[通用中文OCR效果展示](#通用中文OCR效果展示)。 - -#### 1.环境配置 - -请先参考[快速安装](./doc/doc_ch/installation.md)配置PaddleOCR运行环境。 - -#### 2.inference模型下载 - -*windows 
环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下* - - -#### (1)超轻量级中文OCR模型下载 -``` -mkdir inference && cd inference -# 下载超轻量级中文OCR模型的检测模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar -# 下载超轻量级中文OCR模型的识别模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar -cd .. -``` -#### (2)通用中文OCR模型下载 -``` -mkdir inference && cd inference -# 下载通用中文OCR模型的检测模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar -# 下载通用中文OCR模型的识别模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar -cd .. -``` - -#### 3.单张图像或者图像集合预测 - -以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数det_model_dir指定检测inference模型的路径和参数rec_model_dir指定识别inference模型的路径。可视化识别结果默认保存到 ./inference_results 文件夹里面。 - -```bash - -# 预测image_dir指定的单张图像 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" - -# 预测image_dir指定的图像集合 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" +上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 -# 如果想使用CPU进行预测,需设置use_gpu参数为False -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False -``` +- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr -通用中文OCR模型的体验可以按照上述步骤下载相应的模型,并且更新相关的参数,示例如下: -``` -# 预测image_dir指定的单张图像 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" -``` +- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) 
-更多的文本检测、识别串联推理使用方式请参考文档教程中[基于预测引擎推理](./doc/doc_ch/inference.md)。 +## 中文OCR模型列表 -## 文档教程 -- [快速安装](./doc/doc_ch/installation.md) -- [文本检测模型训练/评估/预测](./doc/doc_ch/detection.md) -- [文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md) -- [基于预测引擎推理](./doc/doc_ch/inference.md) -- [数据集](./doc/doc_ch/datasets.md) -- [FAQ](#FAQ) -- [联系我们](#欢迎加入PaddleOCR技术交流群) -- [参考文献](#参考文献) +|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| +|-|-|-|-|-| +|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) -## 文本检测算法 +## 算法介绍 +### 1.文本检测算法 PaddleOCR开源的文本检测算法列表: - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) @@ -121,9 +63,9 @@ PaddleOCR开源的文本检测算法列表: * 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 -PaddleOCR文本检测算法的训练和使用请参考文档教程中[文本检测模型训练/评估/预测](./doc/doc_ch/detection.md)。 +PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。 -## 文本识别算法 +### 2.文本识别算法 PaddleOCR开源的文本识别算法列表: - [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) @@ -151,27 +93,49 @@ PaddleOCR开源的文本识别算法列表: 
|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| |通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| -PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md)。 +PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。 -## 端到端OCR算法 +### 3.端到端OCR算法 - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(百度自研, comming soon) +## 文档教程 +- [快速安装](./doc/doc_ch/installation.md) +- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) +- 模型训练/评估 + - [文本检测](./doc/doc_ch/detection.md) + - [文本识别](./doc/doc_ch/recognition.md) + - [yml参数配置文件介绍](./doc/doc_ch/config.md) +- 预测部署 + - [基于Python预测引擎推理](./doc/doc_ch/inference.md) + - 基于C++预测引擎推理(coming soon) + - [服务部署](./doc/doc_ch/serving.md) + - 端侧部署(coming soon) +- [数据集](./doc/doc_ch/datasets.md) +- [FAQ](#FAQ) +- 效果展示 + - [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示) + - [通用中文OCR效果展示](#通用中文OCR效果展示) + - [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示) +- [技术交流群](#欢迎加入PaddleOCR技术交流群) +- [参考文献](./doc/doc_ch/reference.md) +- [许可证书](#许可证书) +- [贡献代码](#贡献代码) + +## 效果展示 + -## 超轻量级中文OCR效果展示 -![](doc/imgs_results/1.jpg) +### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md) + ![](doc/imgs_results/7.jpg) -![](doc/imgs_results/12.jpg) -![](doc/imgs_results/4.jpg) -![](doc/imgs_results/6.jpg) -![](doc/imgs_results/9.jpg) -![](doc/imgs_results/16.png) -![](doc/imgs_results/22.jpg) -## 通用中文OCR效果展示 +### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md) ![](doc/imgs_results/chinese_db_crnn_server/11.jpg) -![](doc/imgs_results/chinese_db_crnn_server/2.jpg) -![](doc/imgs_results/chinese_db_crnn_server/8.jpg) + + +### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md) + +![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg) ## FAQ @@ -194,65 +158,11 @@ PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识 扫描二维码或者加微信:paddlehelp,备注OCR,小助手拉你进群~ - -## 参考文献 -``` -1. 
EAST: -@inproceedings{zhou2017east, - title={EAST: an efficient and accurate scene text detector}, - author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun}, - booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, - pages={5551--5560}, - year={2017} -} - -2. DB: -@article{liao2019real, - title={Real-time Scene Text Detection with Differentiable Binarization}, - author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang}, - journal={arXiv preprint arXiv:1911.08947}, - year={2019} -} - -3. DTRB: -@inproceedings{baek2019wrong, - title={What is wrong with scene text recognition model comparisons? dataset and model analysis}, - author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk}, - booktitle={Proceedings of the IEEE International Conference on Computer Vision}, - pages={4715--4723}, - year={2019} -} - -4. SAST: -@inproceedings{wang2019single, - title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning}, - author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming}, - booktitle={Proceedings of the 27th ACM International Conference on Multimedia}, - pages={1277--1285}, - year={2019} -} - -5. SRN: -@article{yu2020towards, - title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks}, - author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui}, - journal={arXiv preprint arXiv:2003.12294}, - year={2020} -} - -6. 
end2end-psl: -@inproceedings{sun2019chinese, - title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning}, - author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo}, - booktitle={Proceedings of the IEEE International Conference on Computer Vision}, - pages={9086--9095}, - year={2019} -} -``` - + ## 许可证书 本项目的发布受Apache 2.0 license许可认证。 + ## 贡献代码 我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。 diff --git a/README_en.md b/README_en.md index 610c25b7eb1ac7c043e39420f777bb3346c3bb08..24653eb5c929faa130cd90cba6a328ab8171bdcf 100644 --- a/README_en.md +++ b/README_en.md @@ -3,12 +3,12 @@ English | [简体中文](README.md) ## INTRODUCTION PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice. -**Recent updates** +**Recent updates** +- 2020.7.9 Add a recognition model that supports spaces, [recognition results](#Space-Chinese-OCR-results) +- 2020.7.9 Add data augmentation and learning rate decay strategies; please read [config](./doc/doc_en/config_en.md) - 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score -- 2020.5.30 Provide lightweight Chinese OCR online experience -- 2020.5.30 Model prediction and training supported on Windows system - [more](./doc/doc_en/update_en.md) ## FEATURES @@ -18,12 +18,13 @@ PaddleOCR aims to create a rich, leading, and practical OCR tools that help user - Various text detection algorithms: EAST, DB - Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE + ### Supported Chinese models list: -|Model Name|Description |Detection Model link|Recognition Model link| -|-|-|-|-| -|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained 
model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| +|Model Name|Description|Detection Model link|Recognition Model link|Space-supporting Recognition Model link| +|-|-|-|-|-| +|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| +|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| For testing our Chinese OCR 
online:https://www.paddlepaddle.org.cn/hub/scene/ocr @@ -34,7 +35,7 @@ For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/o ![](doc/imgs_results/11.jpg) -The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results). +The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article: [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results), [General Chinese OCR results](#General-Chinese-OCR-results) and [space-supporting Chinese OCR results](#Space-Chinese-OCR-results). #### 1. ENVIRONMENT CONFIGURATION @@ -45,22 +46,42 @@ Please see [Quick installation](./doc/doc_en/installation_en.md) #### (1) Download lightweight Chinese OCR models *If wget is not installed in the windows system, you can copy the link to the browser to download the model. After model downloaded, unzip it and place it in the corresponding directory* +Copy the URLs of the detection and recognition inference models from the [Chinese model list](#supported-chinese-models-list), then download and unpack them: + +``` +mkdir inference && cd inference +# Download the detection part of the Chinese OCR and decompress it +wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} +# Download the recognition part of the Chinese OCR and decompress it +wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} +cd .. 
+``` + +Taking the lightweight Chinese OCR model as an example: + ``` mkdir inference && cd inference # Download the detection part of the lightweight Chinese OCR and decompress it wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar # Download the recognition part of the lightweight Chinese OCR and decompress it wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar +# Download the space-supporting recognition part of the lightweight Chinese OCR and decompress it +wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar && tar xf ch_rec_mv3_crnn_enhance_infer.tar + cd .. ``` -#### (2) Download General Chinese OCR models + +After decompression, the directory structure should look like this: + ``` -mkdir inference && cd inference -# Download the detection part of the general Chinese OCR model and decompress it -wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar -# Download the recognition part of the generic Chinese OCR model and decompress it -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar -cd .. +|-inference + |-ch_rec_mv3_crnn + |- model + |- params + |-ch_det_mv3_db + |- model + |- params + ... ``` #### 3. SINGLE IMAGE AND BATCH PREDICTION @@ -85,6 +106,13 @@ To run inference of the Generic Chinese OCR model, follow these steps above to d python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" ``` +To run inference with the space-supporting general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters.
Examples are as follows: + +``` +# Prediction on a single image by specifying image path to image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" +``` + For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md) ## DOCUMENTATION @@ -92,7 +120,9 @@ For more text detection and recognition models, please refer to the document [In - [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) - [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) - [Inference](./doc/doc_en/inference_en.md) +- [Introduction to the yml configuration file](./doc/doc_en/config_en.md) - [Dataset](./doc/doc_en/datasets_en.md) +- [FAQ](#FAQ) ## TEXT DETECTION ALGORITHM @@ -145,15 +175,15 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w traning data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the Chinese model.
The related configuration and pre-trained models are as follows: -|Model|Backbone|Configuration file|Pre-trained model| -|-|-|-|-| -|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| +|Model|Backbone|Configuration file|Pre-trained model|Space-supporting model| +|-|-|-|-|-| +|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| +|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) ## END-TO-END OCR ALGORITHM - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon) - + ## LIGHTWEIGHT CHINESE OCR RESULTS ![](doc/imgs_results/1.jpg) ![](doc/imgs_results/7.jpg) @@ -164,12 +194,24 @@ Please refer to the document for training guide and use of PaddleOCR text recogn ![](doc/imgs_results/16.png) ![](doc/imgs_results/22.jpg) - + ## General Chinese OCR results ![](doc/imgs_results/chinese_db_crnn_server/11.jpg) ![](doc/imgs_results/chinese_db_crnn_server/2.jpg) ![](doc/imgs_results/chinese_db_crnn_server/8.jpg) + + +## Space Chinese OCR results + +### Lightweight Chinese OCR model + +![](doc/imgs_results/img_11.jpg) + +### General Chinese OCR model 
+![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg) + + ## FAQ 1. Error when using attention-based recognition model: KeyError: 'predict' diff --git a/configs/det/det_mv3_db.yml b/configs/det/det_mv3_db.yml index 8efa66a92d6e8c031efef48d738d9690bcc5554c..caa7bd4fa09752cff8b4d596e80b5729cce175bf 100755 --- a/configs/det/det_mv3_db.yml +++ b/configs/det/det_mv3_db.yml @@ -6,7 +6,8 @@ Global: print_batch_step: 2 save_model_dir: ./output/det_db/ save_epoch_step: 200 - eval_batch_step: 5000 + # evaluation is run every 5000 iterations after the 4000th iteration + eval_batch_step: [4000, 5000] train_batch_size_per_card: 16 test_batch_size_per_card: 16 image_shape: [3, 640, 640] @@ -50,4 +51,4 @@ PostProcess: thresh: 0.3 box_thresh: 0.7 max_candidates: 1000 - unclip_ratio: 2.0 \ No newline at end of file + unclip_ratio: 2.0 diff --git a/configs/det/det_mv3_east.yml b/configs/det/det_mv3_east.yml index b6f37256291912757cd1d5b98d1f745d08452fd6..67b82fffff8c47e5ee5866ad22f238ece3822776 100755 --- a/configs/det/det_mv3_east.yml +++ b/configs/det/det_mv3_east.yml @@ -6,7 +6,7 @@ Global: print_batch_step: 5 save_model_dir: ./output/det_east/ save_epoch_step: 200 - eval_batch_step: 5000 + eval_batch_step: [5000, 5000] train_batch_size_per_card: 16 test_batch_size_per_card: 16 image_shape: [3, 512, 512] diff --git a/configs/det/det_r50_vd_db.yml b/configs/det/det_r50_vd_db.yml index 6e3b3b9e264b29fcac2b2b9b20ee2f88d5c975f3..9a3b77e7cebce99f669d0b1be89ee56c84f41034 100755 --- a/configs/det/det_r50_vd_db.yml +++ b/configs/det/det_r50_vd_db.yml @@ -6,7 +6,7 @@ Global: print_batch_step: 2 save_model_dir: ./output/det_db/ save_epoch_step: 200 - eval_batch_step: 5000 + eval_batch_step: [5000, 5000] train_batch_size_per_card: 8 test_batch_size_per_card: 16 image_shape: [3, 640, 640] diff --git a/configs/det/det_r50_vd_east.yml b/configs/det/det_r50_vd_east.yml index bb16f9fa12424db293ba498e78b00f279f1a7ff6..8d86819937c902e47dded38ae0238fb8254d8ff0 100755 --- 
a/configs/det/det_r50_vd_east.yml +++ b/configs/det/det_r50_vd_east.yml @@ -6,7 +6,7 @@ Global: print_batch_step: 5 save_model_dir: ./output/det_east/ save_epoch_step: 200 - eval_batch_step: 5000 + eval_batch_step: [5000, 5000] train_batch_size_per_card: 8 test_batch_size_per_card: 16 image_shape: [3, 512, 512] diff --git a/configs/rec/rec_chinese_common_train.yml b/configs/rec/rec_chinese_common_train.yml index af56dca26911bfcf8bbc361c7d506cb6980618db..0d897459e0a631a4ac1fa10973f18e8640078c1b 100644 --- a/configs/rec/rec_chinese_common_train.yml +++ b/configs/rec/rec_chinese_common_train.yml @@ -14,6 +14,8 @@ Global: character_type: ch character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt loss_type: ctc + distort: false + use_space_char: false reader_yml: ./configs/rec/rec_chinese_reader.yml pretrain_weights: checkpoints: diff --git a/configs/rec/rec_chinese_lite_train.yml b/configs/rec/rec_chinese_lite_train.yml index b64313a1b8f24cf4bcb1c20c9491ae8b00250fdb..95a39a3b4d349973356594e15a23f951e27dc7c5 100755 --- a/configs/rec/rec_chinese_lite_train.yml +++ b/configs/rec/rec_chinese_lite_train.yml @@ -14,6 +14,8 @@ Global: character_type: ch character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt loss_type: ctc + distort: false + use_space_char: false reader_yml: ./configs/rec/rec_chinese_reader.yml pretrain_weights: checkpoints: diff --git a/configs/rec/rec_icdar15_train.yml b/configs/rec/rec_icdar15_train.yml index 8aa96160f4182f94b79cf0340ade7698d4bf7e55..d0b75628c58833447333de36490141847f1815e4 100755 --- a/configs/rec/rec_icdar15_train.yml +++ b/configs/rec/rec_icdar15_train.yml @@ -13,6 +13,7 @@ Global: max_text_length: 25 character_type: en loss_type: ctc + distort: true reader_yml: ./configs/rec/rec_icdar15_reader.yml pretrain_weights: ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy checkpoints: diff --git a/deploy/ocr_hubserving/ocr_det/__init__.py b/deploy/ocr_hubserving/ocr_det/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/deploy/ocr_hubserving/ocr_det/config.json b/deploy/ocr_hubserving/ocr_det/config.json new file mode 100644 index 0000000000000000000000000000000000000000..f995d0ed2bd4da3aaf39e654a5c7ab51e377e367 --- /dev/null +++ b/deploy/ocr_hubserving/ocr_det/config.json @@ -0,0 +1,14 @@ +{ + "modules_info": { + "ocr_det": { + "init_args": { + "version": "1.0.0", + "det_model_dir": "./inference/ch_det_mv3_db/", + "use_gpu": true + }, + "predict_args": { + "visualization": false + } + } + } +} diff --git a/deploy/ocr_hubserving/ocr_det/module.py b/deploy/ocr_hubserving/ocr_det/module.py new file mode 100644 index 0000000000000000000000000000000000000000..0ee32d38e5b6b4502592b62a3f129a0e11a8cd7a --- /dev/null +++ b/deploy/ocr_hubserving/ocr_det/module.py @@ -0,0 +1,160 @@ +# -*- coding:utf-8 -*- +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import argparse +import ast +import copy +import math +import os +import time + +from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor +from paddlehub.common.logger import logger +from paddlehub.module.module import moduleinfo, runnable, serving +from PIL import Image +import cv2 +import numpy as np +import paddle.fluid as fluid +import paddlehub as hub + +from tools.infer.utility import draw_boxes, base64_to_cv2 +from tools.infer.predict_det import TextDetector + +class Config(object): + pass + +@moduleinfo( + name="ocr_det", + version="1.0.0", + summary="ocr detection service", + author="paddle-dev", + author_email="paddle-dev@baidu.com", + type="cv/text_recognition") +class OCRDet(hub.Module): + def _initialize(self, + det_model_dir="", + det_algorithm="DB", + use_gpu=False + ): + """ + initialize with the necessary elements + """ + self.config = Config() + self.config.use_gpu = use_gpu + if use_gpu: + try: + _places = 
os.environ["CUDA_VISIBLE_DEVICES"] + int(_places[0]) + print("use gpu: ", use_gpu) + print("CUDA_VISIBLE_DEVICES: ", _places) + except (KeyError, IndexError, ValueError): + raise RuntimeError( + "Environment variable CUDA_VISIBLE_DEVICES is not set correctly. To use the GPU, please set it via: export CUDA_VISIBLE_DEVICES=cuda_device_id." + ) + self.config.ir_optim = True + self.config.gpu_mem = 8000 + + #params for text detector + self.config.det_algorithm = det_algorithm + self.config.det_model_dir = det_model_dir + # self.config.det_model_dir = "./inference/det/" + + #DB params + self.config.det_db_thresh = 0.3 + self.config.det_db_box_thresh = 0.5 + self.config.det_db_unclip_ratio = 2.0 + + #EAST params + self.config.det_east_score_thresh = 0.8 + self.config.det_east_cover_thresh = 0.1 + self.config.det_east_nms_thresh = 0.2 + + def read_images(self, paths=[]): + images = [] + for img_path in paths: + assert os.path.isfile( + img_path), "The {} isn't a valid file.".format(img_path) + img = cv2.imread(img_path) + if img is None: + logger.info("error in loading image:{}".format(img_path)) + continue + images.append(img) + return images + + def det_text(self, + images=[], + paths=[], + det_max_side_len=960, + draw_img_save='ocr_det_result', + visualization=False): + """ + Get the text boxes in the input images. + Args: + images (list(numpy.ndarray)): image data, each of shape [H, W, C]. Pass either images or paths, not both. + paths (list[str]): The paths of images. Pass either paths or images, not both. + det_max_side_len (int): The upper limit on the longer image side when resizing for detection. + draw_img_save (str): The directory in which visualized images are stored. + visualization (bool): Whether to save the visualized images. + Note: the DB box threshold (det_db_box_thresh) is fixed at 0.5 in _initialize. + Returns: + res (list): For each image, a dict holding the detected boxes ('data') and the path of the saved visualization ('save_path').
+ """ + + if images != [] and isinstance(images, list) and paths == []: + predicted_data = images + elif images == [] and isinstance(paths, list) and paths != []: + predicted_data = self.read_images(paths) + else: + raise TypeError("The input data is inconsistent with expectations.") + + assert predicted_data != [], "There are no images to predict. Please check the input data." + + self.config.det_max_side_len = det_max_side_len + text_detector = TextDetector(self.config) + all_results = [] + for img in predicted_data: + result = {'save_path': ''} + if img is None: + logger.info("error in loading image") + result['data'] = [] + all_results.append(result) + continue + dt_boxes, elapse = text_detector(img) + print("Predict time : ", elapse) + result['data'] = dt_boxes.astype(int).tolist() + + if visualization: + image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) + draw_img = draw_boxes(image, dt_boxes) + draw_img = np.array(draw_img) + if not os.path.exists(draw_img_save): + os.makedirs(draw_img_save) + saved_name = 'ndarray_{}.jpg'.format(time.time()) + save_file_path = os.path.join(draw_img_save, saved_name) + cv2.imwrite(save_file_path, draw_img[:, :, ::-1]) + print("The visualized image is saved in {}".format(save_file_path)) + result['save_path'] = save_file_path + + all_results.append(result) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.det_text(images_decode, **kwargs) + return results + + +if __name__ == '__main__': + ocr = OCRDet() + image_path = [ + './doc/imgs/11.jpg', + './doc/imgs/12.jpg', + ] + res = ocr.det_text(paths=image_path, visualization=True) + print(res) \ No newline at end of file diff --git a/deploy/ocr_hubserving/ocr_rec/__init__.py b/deploy/ocr_hubserving/ocr_rec/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/deploy/ocr_hubserving/ocr_rec/config.json b/deploy/ocr_hubserving/ocr_rec/config.json new file mode 100644 index 0000000000000000000000000000000000000000..2cfbc0b558d49d54c341506f0e8789578e1b42cd --- /dev/null +++ b/deploy/ocr_hubserving/ocr_rec/config.json @@ -0,0 +1,13 @@ +{ + "modules_info": { + "ocr_rec": { + "init_args": { + "version": "1.0.0", + "rec_model_dir": "./inference/ch_rec_mv3_crnn/", + "use_gpu": true + }, + "predict_args": { + } + } + } +} diff --git a/deploy/ocr_hubserving/ocr_rec/module.py b/deploy/ocr_hubserving/ocr_rec/module.py new file mode 100644 index 0000000000000000000000000000000000000000..b50016a37fc44291b5aa01bdf2b55bdab11c8fe5 --- /dev/null +++ b/deploy/ocr_hubserving/ocr_rec/module.py @@ -0,0 +1,136 @@ +# -*- coding:utf-8 -*- +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import argparse +import ast +import copy +import math +import os +import time + +from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor +from paddlehub.common.logger import logger +from paddlehub.module.module import moduleinfo, runnable, serving +from PIL import Image +import cv2 +import numpy as np +import paddle.fluid as fluid +import paddlehub as hub + +from tools.infer.utility import base64_to_cv2 +from tools.infer.predict_rec import TextRecognizer + +class Config(object): + pass + +@moduleinfo( 
name="ocr_rec", + version="1.0.0", + summary="ocr recognition service", + author="paddle-dev", + author_email="paddle-dev@baidu.com", + type="cv/text_recognition") +class OCRRec(hub.Module): + def _initialize(self, + rec_model_dir="", + rec_algorithm="CRNN", + rec_char_dict_path="./ppocr/utils/ppocr_keys_v1.txt", + rec_batch_num=30, + use_gpu=False + ): + """ + initialize with the necessary elements + """ + self.config = Config() + self.config.use_gpu = use_gpu + if use_gpu: + try: + _places = os.environ["CUDA_VISIBLE_DEVICES"] + int(_places[0]) + print("use gpu: ", use_gpu) + print("CUDA_VISIBLE_DEVICES: ", _places) + except (KeyError, IndexError, ValueError): + raise RuntimeError( + "Environment variable CUDA_VISIBLE_DEVICES is not set correctly. To use the GPU, please set it via: export CUDA_VISIBLE_DEVICES=cuda_device_id." + ) + self.config.ir_optim = True + self.config.gpu_mem = 8000 + + #params for text recognizer + self.config.rec_algorithm = rec_algorithm + self.config.rec_model_dir = rec_model_dir + # self.config.rec_model_dir = "./inference/rec/" + + self.config.rec_image_shape = "3, 32, 320" + self.config.rec_char_type = 'ch' + self.config.rec_batch_num = rec_batch_num + self.config.rec_char_dict_path = rec_char_dict_path + self.config.use_space_char = True + + def read_images(self, paths=[]): + images = [] + for img_path in paths: + assert os.path.isfile( + img_path), "The {} isn't a valid file.".format(img_path) + img = cv2.imread(img_path) + if img is None: + logger.info("error in loading image:{}".format(img_path)) + continue + images.append(img) + return images + + def rec_text(self, + images=[], + paths=[]): + """ + Recognize the text in the input images (expected to be cropped text-line images). + Args: + images (list(numpy.ndarray)): image data, each of shape [H, W, C]. Pass either images or paths, not both. + paths (list[str]): The paths of images. Pass either paths or images, not both. + Returns: + res (list): The recognition result (text and confidence score) for each image.
+ """ + + if images != [] and isinstance(images, list) and paths == []: + predicted_data = images + elif images == [] and isinstance(paths, list) and paths != []: + predicted_data = self.read_images(paths) + else: + raise TypeError("The input data is inconsistent with expectations.") + + assert predicted_data != [], "No image to predict. Please check the input data." + + text_recognizer = TextRecognizer(self.config) + img_list = [] + for img in predicted_data: + if img is None: + continue + img_list.append(img) + try: + rec_res, predict_time = text_recognizer(img_list) + except Exception as e: + print(e) + return [] + return rec_res + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.rec_text(images_decode, **kwargs) + return results + + +if __name__ == '__main__': + ocr = OCRRec() + image_path = [ + './doc/imgs_words/ch/word_1.jpg', + './doc/imgs_words/ch/word_2.jpg', + './doc/imgs_words/ch/word_3.jpg', + ] + res = ocr.rec_text(paths=image_path) + print(res) \ No newline at end of file diff --git a/deploy/ocr_hubserving/ocr_system/__init__.py b/deploy/ocr_hubserving/ocr_system/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/deploy/ocr_hubserving/ocr_system/config.json b/deploy/ocr_hubserving/ocr_system/config.json new file mode 100644 index 0000000000000000000000000000000000000000..364c7426a44a6b67576508c89cd993368624726f --- /dev/null +++ b/deploy/ocr_hubserving/ocr_system/config.json @@ -0,0 +1,16 @@ +{ + "modules_info": { + "ocr_system": { + "init_args": { + "version": "1.0.0", + "det_model_dir": "./inference/ch_det_mv3_db/", + "rec_model_dir": "./inference/ch_rec_mv3_crnn/", + "use_gpu": true + }, + "predict_args": { + "visualization": false + } + } + } +} + diff --git a/deploy/ocr_hubserving/ocr_system/module.py
b/deploy/ocr_hubserving/ocr_system/module.py new file mode 100644 index 0000000000000000000000000000000000000000..dc5ab211b937c114cb87e9bcc058af583f606d6b --- /dev/null +++ b/deploy/ocr_hubserving/ocr_system/module.py @@ -0,0 +1,201 @@ +# -*- coding:utf-8 -*- +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import argparse +import ast +import copy +import math +import os +import time + +from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor +from paddlehub.common.logger import logger +from paddlehub.module.module import moduleinfo, runnable, serving +from PIL import Image +import cv2 +import numpy as np +import paddle.fluid as fluid +import paddlehub as hub + +from tools.infer.utility import draw_ocr, base64_to_cv2 +from tools.infer.predict_system import TextSystem + + +class Config(object): + pass + +@moduleinfo( + name="ocr_system", + version="1.0.0", + summary="ocr system service", + author="paddle-dev", + author_email="paddle-dev@baidu.com", + type="cv/text_recognition") +class OCRSystem(hub.Module): + def _initialize(self, + det_model_dir="", + det_algorithm="DB", + rec_model_dir="", + rec_algorithm="CRNN", + rec_char_dict_path="./ppocr/utils/ppocr_keys_v1.txt", + rec_batch_num=30, + use_gpu=False + ): + """ + Initialize the module with the text detection and recognition config. + """ + self.config = Config() + self.config.use_gpu = use_gpu + if use_gpu: + try: + _places = os.environ["CUDA_VISIBLE_DEVICES"] + int(_places[0]) + print("use gpu: ", use_gpu) + print("CUDA_VISIBLE_DEVICES: ", _places) + except: + raise RuntimeError( + "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you want to use GPU, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
+ ) + self.config.ir_optim = True + self.config.gpu_mem = 8000 + + # params for text detector + self.config.det_algorithm = det_algorithm + self.config.det_model_dir = det_model_dir + # self.config.det_model_dir = "./inference/det/" + + # DB params + self.config.det_db_thresh = 0.3 + self.config.det_db_box_thresh = 0.5 + self.config.det_db_unclip_ratio = 2.0 + + # EAST params + self.config.det_east_score_thresh = 0.8 + self.config.det_east_cover_thresh = 0.1 + self.config.det_east_nms_thresh = 0.2 + + # params for text recognizer + self.config.rec_algorithm = rec_algorithm + self.config.rec_model_dir = rec_model_dir + # self.config.rec_model_dir = "./inference/rec/" + + self.config.rec_image_shape = "3, 32, 320" + self.config.rec_char_type = 'ch' + self.config.rec_batch_num = rec_batch_num + self.config.rec_char_dict_path = rec_char_dict_path + self.config.use_space_char = True + + def read_images(self, paths=[]): + images = [] + for img_path in paths: + assert os.path.isfile( + img_path), "The {} isn't a valid file.".format(img_path) + img = cv2.imread(img_path) + if img is None: + logger.info("error in loading image:{}".format(img_path)) + continue + images.append(img) + return images + + def recognize_text(self, + images=[], + paths=[], + det_max_side_len=960, + draw_img_save='ocr_result', + visualization=False, + text_thresh=0.5): + """ + Detect and recognize the Chinese text in the given images. + Args: + images (list(numpy.ndarray)): image data, each of shape [H, W, C]. Pass either images or paths, not both. + paths (list[str]): the paths of the images. Pass either paths or images, not both. + det_max_side_len (int): the maximum side length an image is resized to before text detection. + draw_img_save (str): the directory to store the visualized output images. + visualization (bool): whether to save the visualized image. + text_thresh (float): the confidence threshold for recognized text; results below it are dropped. + Returns: + res (list): for each image, the recognized text entries and the save path of the visualized image.
+ """ + + if images != [] and isinstance(images, list) and paths == []: + predicted_data = images + elif images == [] and isinstance(paths, list) and paths != []: + predicted_data = self.read_images(paths) + else: + raise TypeError("The input data is inconsistent with expectations.") + + assert predicted_data != [], "No image to predict. Please check the input data." + + self.config.det_max_side_len = det_max_side_len + text_sys = TextSystem(self.config) + cnt = 0 + all_results = [] + for img in predicted_data: + result = {'save_path': ''} + if img is None: + logger.info("error in loading image") + result['data'] = [] + all_results.append(result) + continue + starttime = time.time() + dt_boxes, rec_res = text_sys(img) + elapse = time.time() - starttime + cnt += 1 + print("Predict time of image %d: %.3fs" % (cnt, elapse)) + dt_num = len(dt_boxes) + rec_res_final = [] + for dno in range(dt_num): + text, score = rec_res[dno] + # if the recognized text confidence score is lower than text_thresh, then drop it + if score >= text_thresh: + # text_str = "%s, %.3f" % (text, score) + # print(text_str) + rec_res_final.append( + { + 'text': text, + 'confidence': float(score), + 'text_box_position': dt_boxes[dno].astype(int).tolist() + } + ) + result['data'] = rec_res_final + + if visualization: + image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) + boxes = dt_boxes + txts = [rec_res[i][0] for i in range(len(rec_res))] + scores = [rec_res[i][1] for i in range(len(rec_res))] + + draw_img = draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5) + if not os.path.exists(draw_img_save): + os.makedirs(draw_img_save) + saved_name = 'ndarray_{}.jpg'.format(time.time()) + save_file_path = os.path.join(draw_img_save, saved_name) + cv2.imwrite(save_file_path, draw_img[:, :, ::-1]) + print("The visualized image is saved in {}".format(save_file_path)) + result['save_path'] = save_file_path + + all_results.append(result) + return all_results + +
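The per-image loop in `recognize_text` drops every recognized line whose confidence is below `text_thresh` and converts each surviving box to a list of integer coordinates. The following is a minimal standalone sketch of that filtering step, assuming (as in the module code) that `dt_boxes` holds one NumPy array of corner points per detected box and `rec_res` holds one `(text, score)` pair per box. The helper name `filter_by_confidence` is hypothetical and not part of PaddleOCR.

```python
import numpy as np


def filter_by_confidence(dt_boxes, rec_res, text_thresh=0.5):
    """Keep only recognition results whose score is at least text_thresh.

    Hypothetical helper that mirrors the per-image loop in
    OCRSystem.recognize_text; it is not part of PaddleOCR.
    """
    results = []
    # dt_boxes[i] and rec_res[i] describe the same detected text line
    for box, (text, score) in zip(dt_boxes, rec_res):
        if score >= text_thresh:
            results.append({
                'text': text,
                'confidence': float(score),
                # astype(int) truncates toward zero; the np.int alias was removed in NumPy 1.24
                'text_box_position': box.astype(int).tolist(),
            })
    return results


if __name__ == '__main__':
    boxes = [
        np.array([[0.0, 0.0], [10.5, 0.0], [10.5, 5.2], [0.0, 5.2]]),
        np.array([[0.0, 8.0], [9.0, 8.0], [9.0, 12.0], [0.0, 12.0]]),
    ]
    texts = [('hello', 0.93), ('noise', 0.21)]
    # Only the first line passes the default 0.5 threshold
    print(filter_by_confidence(boxes, texts))
```

Using `zip` instead of indexing with `range(len(dt_boxes))` assumes both lists have the same length, which holds when every detected box yields exactly one recognition result.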
+ @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + +if __name__ == '__main__': + ocr = OCRSystem() + image_path = [ + './doc/imgs/11.jpg', + './doc/imgs/12.jpg', + ] + res = ocr.recognize_text(paths=image_path, visualization=True) + print(res) \ No newline at end of file diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index ae16263e5272641f95d5e8842da08ac65d7a0b12..bee2637094b1386210677788f0944d232f7ff82c 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -22,7 +22,7 @@ | print_batch_step | Set the log printing interval | 10 | \ | | save_model_dir | Set the model save path | output/{algorithm name} | \ | | save_epoch_step | Set the model save interval | 3 | \ | -| eval_batch_step | Set the model evaluation interval | 2000 | \ | +| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | 2000 means evaluating every 2000 iterations; [1000, 2000] means evaluating every 2000 iterations starting from iteration 1000 | |train_batch_size_per_card | Set the per-card batch size during training | 256 | \ | | test_batch_size_per_card | Set the per-card batch size during evaluation | 256 | \ | | image_shape | Set the input image size | [3, 32, 100] | \ | @@ -30,6 +30,8 @@ | character_type | Set the character type | ch | en/ch; en uses the default dict, ch uses a custom dict| | character_dict_path | Set the dictionary path | ./ppocr/utils/ic15_dict.txt | \ | | loss_type | Set the loss type | ctc | Two losses are supported: ctc / attention | +| distort | Whether to use data augmentation | false | When set to true, random distortions are applied during training; see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) for the supported operations | +| use_space_char | Whether to recognize spaces | false | Spaces are only supported when character_type=ch | | reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ | | pretrain_weights | Path of the pretrained model to load | ./pretrain_models/CRNN/best_accuracy | \ | | checkpoints | Path of the model parameters to load | None | Used to resume training after an interruption | diff --git a/doc/doc_ch/datasets.md b/doc/doc_ch/datasets.md index 23c6ee495996c74f8d33d7338988fd914a82ff2e..455b46b3d5e46f9e0cff96f4cc9d3ac10c6bc5ee 100644 --- a/doc/doc_ch/datasets.md +++ b/doc/doc_ch/datasets.md @@ -6,7 +6,7 @@
- [Chinese document text recognition](#中文文档文字识别) - [ICDAR2019-ArT](#ICDAR2019-ArT) -In addition to open-source data, users can also synthesize data themselves with synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator). +In addition to open-source data, users can also synthesize data themselves with synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [SynthText_Chinese_version](https://github.com/JarveeLee/SynthText_Chinese_version), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator). #### 1. ICDAR2019-LSVT diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index 68cd39385ce9744f76e6b6c4f274907f26c22387..c58caf24a1fa7cdb3b5f255bc0c38e569f38c70e 100644 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -1,5 +1,5 @@ -# Inference Based on the Prediction Engine +# Inference Based on the Python Prediction Engine The inference model (the model saved by fluid.io.save_inference_model) is generally a frozen model saved after training is complete, and is mostly used for deployment. diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md new file mode 100644 index 0000000000000000000000000000000000000000..efb04daa1edb2f50ddf492e8507b3cda074a91e4 --- /dev/null +++ b/doc/doc_ch/quickstart.md @@ -0,0 +1,86 @@ + +# Quick Start for the Chinese OCR Models + +## 1. Environment Setup + +Please first refer to [Quick Installation](./installation.md) to set up the PaddleOCR environment. + +## 2. Inference Model Download + +|Model name|Description|Detection model|Recognition model|Recognition model with space support| +|-|-|-|-|-| +|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| +|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) /
[pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| + +*On Windows, if wget is not installed, you can copy the link into a browser to download the model, then extract it into the corresponding directory* + +Copy the download URLs of the detection and recognition `inference models` from the table above, then extract the packages: + +``` +mkdir inference && cd inference +# Download and extract the detection model +wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} +# Download and extract the recognition model +wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} +cd .. +``` + +Taking the ultra-lightweight model as an example: + +``` +mkdir inference && cd inference +# Download and extract the detection model of the ultra-lightweight Chinese OCR model +wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar +# Download and extract the recognition model of the ultra-lightweight Chinese OCR model +wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar +cd .. +``` + +After extraction, the file structure should look like this: + +``` +|-inference + |-ch_rec_mv3_crnn + |- model + |- params + |-ch_det_mv3_db + |- model + |- params + ...
+``` + +## 3. Prediction on a Single Image or a Set of Images + +The following code runs text detection and recognition in series. When running prediction, use the parameter image_dir to specify the path of a single image or a folder of images, det_model_dir to specify the path of the detection inference model, and rec_model_dir to specify the path of the recognition inference model. The visualized recognition results are saved to the ./inference_results folder by default. + +```bash + +# Predict the single image specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" + +# Predict the set of images specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" + +# To predict on CPU, set the use_gpu parameter to False +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False +``` + +To try the general Chinese OCR model, download the corresponding models following the steps above and update the relevant parameters, for example: +``` +# Predict the single image specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" +``` + +To try the general Chinese OCR model with space support, download the corresponding models following the steps above and update the relevant parameters, for example: + +``` +# Predict the single image specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" +``` + +For more ways to run text detection and recognition in series, see [Inference Based on the Python Prediction Engine](./inference.md) in the documentation tutorials. + +The documentation tutorials also cover other deployment options for the Chinese OCR models: +- Inference based on the C++ prediction engine (coming soon) +- [Service deployment](./serving.md) +- On-device deployment (coming soon) diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index b5dc484fd18102cdd01512a72a2dce92ae945e04..8fe28fedfe4809cf811297ea1aed3d8688bdb6d9 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -94,7 +94,10 @@ word_dict.txt has one character per line, mapping each character to a numeric index, `ppocr/utils/ic15_dict.txt` is an English dictionary containing 36 characters, which you can use as needed. -To use a custom dict file, modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. +To use a custom dict file, add the `character_dict_path` field to `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. + +*To recognize the "space" class, set the `use_space_char` field in the yml file to `true`. `use_space_char` only takes effect when `character_type=ch`* + ### Start Training @@ -124,6 +127,18 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 tools/train.py -c configs/rec/rec_icdar15_train.yml ``` +- Data augmentation + +PaddleOCR provides several data augmentation methods. To add distortions during training, set `distort: true` in the configuration file. + +The default distortions are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, and color reversal (reverse). + +During training, each distortion is selected with a probability of 50%. For the implementation, see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) + +*Due to OpenCV compatibility issues, the distortion operations currently only support GPU* + +- Training + PaddleOCR supports alternating training and evaluation. You can set the evaluation frequency by modifying `eval_batch_step` in `configs/rec/rec_icdar15_train.yml`; by default, evaluation runs every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` by default. If the validation set is large, evaluation will be time-consuming, so it is recommended to reduce the number of evaluations or to evaluate after training. @@ -157,12 +172,26 @@ Global: character_type: ch # Add a custom dictionary; to change the dictionary, point the path to the new dictionary character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt + # Add data augmentation during training + distort: true + # Recognize spaces + use_space_char: true ... # Modify the reader type reader_yml: ./configs/rec/rec_chinese_reader.yml ... ... + +Optimizer: + ... + # Add a learning rate decay strategy + decay: + function: cosine_decay + # Number of iterations per epoch + step_each_epoch: 20 + # Total number of training epochs + total_epoch: 1000 ``` **Note: the configuration file used for prediction/evaluation must be consistent with the one used for training.** diff --git a/doc/doc_ch/reference.md b/doc/doc_ch/reference.md new file mode 100644 index 0000000000000000000000000000000000000000..9d9a6785b353ba8800ae0ff9db8cb40e9bf9caa9 --- /dev/null +++ b/doc/doc_ch/reference.md @@ -0,0 +1,55 @@ +# References + +``` +1. EAST: +@inproceedings{zhou2017east, + title={EAST: an efficient and accurate scene text detector}, + author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun}, + booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, + pages={5551--5560}, + year={2017} +} + +2.
DB: +@article{liao2019real, + title={Real-time Scene Text Detection with Differentiable Binarization}, + author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang}, + journal={arXiv preprint arXiv:1911.08947}, + year={2019} +} + +3. DTRB: +@inproceedings{baek2019wrong, + title={What is wrong with scene text recognition model comparisons? dataset and model analysis}, + author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={4715--4723}, + year={2019} +} + +4. SAST: +@inproceedings{wang2019single, + title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning}, + author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming}, + booktitle={Proceedings of the 27th ACM International Conference on Multimedia}, + pages={1277--1285}, + year={2019} +} + +5. SRN: +@article{yu2020towards, + title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks}, + author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui}, + journal={arXiv preprint arXiv:2003.12294}, + year={2020} +} + +6. 
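end2end-psl: +@inproceedings{sun2019chinese, + title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning}, + author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={9086--9095}, + year={2019} +} +``` \ No newline at end of file diff --git a/doc/doc_ch/serving.md b/doc/doc_ch/serving.md new file mode 100644 index 0000000000000000000000000000000000000000..da043921388ad59a5b6b9e60ebd6f1200454ff25 --- /dev/null +++ b/doc/doc_ch/serving.md @@ -0,0 +1,109 @@ +# Service Deployment + +PaddleOCR provides 2 service deployment methods: +- Deployment based on HubServing: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/ocr_hubserving)); follow this tutorial to use it; +- Deployment based on PaddleServing: see the PaddleServing [demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr); it will also be integrated into PaddleOCR later. + +The service deployment directory contains three service packages: detection, recognition, and the two-stage detection + recognition pipeline. Choose the package you need to install and start. The directory is as follows: +``` +deploy/ocr_hubserving/ + └─ ocr_det detection module service package + └─ ocr_rec recognition module service package + └─ ocr_system detection + recognition pipeline service package +``` + +Each service package contains 3 files. Taking the two-stage pipeline package as an example, the directory is as follows: +``` +deploy/ocr_hubserving/ocr_system/ + └─ __init__.py empty file + └─ config.json configuration file, passed as an argument when starting the service + └─ module.py main module, containing the complete service logic +``` + +## Start the Service +The following steps use the two-stage detection + recognition pipeline service as an example. If you only need the detection or recognition service, replace the corresponding file paths. +### 1. Install paddlehub +```pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple``` + +### 2. Install the service module +PaddleOCR provides 3 service modules; install the ones you need. For example: + +Install the detection service module: +```hub install deploy/ocr_hubserving/ocr_det/``` + +Or install the recognition service module: +```hub install deploy/ocr_hubserving/ocr_rec/``` + +Or install the detection + recognition pipeline service module: +```hub install deploy/ocr_hubserving/ocr_system/``` + +### 3.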
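Modify the configuration file +In config.json, specify parameters such as the model paths, whether to use GPU, and whether to visualize the results. For example, the configuration of the pipeline service ocr_system: +```json +{ + "modules_info": { + "ocr_system": { + "init_args": { + "version": "1.0.0", + "det_model_dir": "./inference/det/", + "rec_model_dir": "./inference/rec/", + "use_gpu": true + }, + "predict_args": { + "visualization": false + } + } + } +} +``` +The model paths must point to ```inference models```. + +### 4. Run the start command +```hub serving start -m ocr_system --config deploy/ocr_hubserving/ocr_system/config.json``` + +This completes the deployment of a service API; the default port is 8866. + +**NOTE:** If you predict on GPU (i.e., use_gpu is set to true in the config), set the CUDA_VISIBLE_DEVICES environment variable before starting the service, e.g. ```export CUDA_VISIBLE_DEVICES=0```; otherwise no setting is needed. + +## Send a Prediction Request +Once the server is configured, the following few lines of code send a prediction request and get the prediction result: + +```python +import requests +import json +import cv2 +import base64 + +def cv2_to_base64(image): + # encode the raw image bytes as a base64 string + return base64.b64encode(image).decode('utf8') + +# Send the HTTP request +data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]} +headers = {"Content-type": "application/json"} +# url = "http://127.0.0.1:8866/predict/ocr_det" +# url = "http://127.0.0.1:8866/predict/ocr_rec" +url = "http://127.0.0.1:8866/predict/ocr_system" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) + +# Print the prediction result +print(r.json()["results"]) +``` + +You may need to change the port number and service module name in the ```url``` string to match your setup. + +The code above is also included in a test script, which you can run directly: ```python tools/test_hubserving.py``` + +## Customize the Service Module +To modify the service logic, you generally need to take the following steps: + +1. Stop the service +```hub serving stop -m ocr_system``` + +2. Modify the code in the corresponding module.py file as needed + +3. Uninstall the old service package +```hub uninstall ocr_system``` + +4. Install the modified service package +```hub install deploy/ocr_hubserving/ocr_system/``` + diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md index 69961b98fb6ead0e615849e6dc06dee255dc79db..a67b8df9bcec20f5abd7f8b9d6d64e2d68406131 100644 --- a/doc/doc_ch/update.md +++ b/doc/doc_ch/update.md @@ -1,5 +1,7 @@ # Release Notes - +- 2020.7.9 Added a recognition model that supports spaces; see the recognition results +- 2020.7.9 Added data augmentation and learning rate decay strategies; see the configuration file for details +- 2020.6.8 Added datasets, with continuous updates - 2020.6.5 Support exporting `attention` models to `inference_model` - 2020.6.5 Output the result score when running recognition-only prediction - 2020.5.30 Provided an online demo of the ultra-lightweight Chinese OCR diff --git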
a/doc/doc_ch/visualization.md b/doc/doc_ch/visualization.md new file mode 100644 index 0000000000000000000000000000000000000000..4837c686bc9cc8843dcd91a0858e41281388c3a0 --- /dev/null +++ b/doc/doc_ch/visualization.md @@ -0,0 +1,31 @@ +# Visualization +- [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR) +- [General Chinese OCR results](#通用中文OCR) +- [Chinese OCR results with space support](#支持空格的中文OCR) + + +## Ultra-lightweight Chinese OCR Results + +![](../imgs_results/1.jpg) +![](../imgs_results/7.jpg) +![](../imgs_results/12.jpg) +![](../imgs_results/4.jpg) +![](../imgs_results/6.jpg) +![](../imgs_results/9.jpg) +![](../imgs_results/16.png) +![](../imgs_results/22.jpg) + + +## General Chinese OCR Results +![](../imgs_results/chinese_db_crnn_server/11.jpg) +![](../imgs_results/chinese_db_crnn_server/2.jpg) +![](../imgs_results/chinese_db_crnn_server/8.jpg) + + +## Chinese OCR Results with Space Support + +### Lightweight model +![](../imgs_results/img_11.jpg) + +### General model +![](../imgs_results/chinese_db_crnn_server/en_paper.jpg) diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index b9ad03947c545a4760331f835a9cc85be6ff67a7..ffead1ee335a3d2f10491792a23a69e0f22a1755 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -22,7 +22,7 @@ Take `rec_chinese_lite_train.yml` as an example | print_batch_step | Set print log interval | 10 | \ | | save_model_dir | Set model save path | output/{model_name} | \ | | save_epoch_step | Set model save interval | 3 | \ | -| eval_batch_step | Set the model evaluation interval | 2000 | \ | +| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | 2000 means running evaluation every 2000 iterations; [1000, 2000] means running evaluation every 2000 iterations starting from iteration 1000 | |train_batch_size_per_card | Set the batch size during training | 256 | \ | | test_batch_size_per_card | Set the batch size during testing | 256 | \ | | image_shape | Set input image size | [3, 32, 100] | \ | @@ -30,6 +30,8 @@ Take `rec_chinese_lite_train.yml` as an example | character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict
will be used for ch| | character_dict_path | Set dictionary path | ./ppocr/utils/ic15_dict.txt | \ | | loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention | +| distort | Whether to use data augmentation | false | When true, random distortions are applied during training; see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) for the supported distortion types | +| use_space_char | Whether to recognize spaces | false | Only supported when character_type=ch | | reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ | | pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ | | checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption | diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 9a862c7a67c6d2277bc6472b304534788e06921d..ac1bc2f335bd9470924c7e5934265a23b26342c6 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -158,9 +158,23 @@ Global: ... # Modify reader type reader_yml: ./configs/rec/rec_chinese_reader.yml + # Whether to use data augmentation + distort: true + # Whether to recognize spaces + use_space_char: true ... ... + +Optimizer: + ...
+ # Add a learning rate decay strategy + decay: + function: cosine_decay + # Number of iterations per epoch + step_each_epoch: 20 + # Total number of training epochs + total_epoch: 1000 ``` **Note that the configuration file for prediction/evaluation must be consistent with the one used for training.** diff --git a/doc/imgs_en/img_12.jpg b/doc/imgs_en/img_12.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b0d289538a5dd63fdf391e02466df2bbe5ea76c1 Binary files /dev/null and b/doc/imgs_en/img_12.jpg differ diff --git a/doc/imgs_results/chinese_db_crnn_server/en_paper.jpg b/doc/imgs_results/chinese_db_crnn_server/en_paper.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c051d3fdb54204e87873093359c486ee0aab8184 Binary files /dev/null and b/doc/imgs_results/chinese_db_crnn_server/en_paper.jpg differ diff --git a/doc/imgs_results/img_11.jpg b/doc/imgs_results/img_11.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cf942f9a59c35041e5a1885d14d7cf8aa582f54d Binary files /dev/null and b/doc/imgs_results/img_11.jpg differ diff --git a/ppocr/data/det/db_process.py b/ppocr/data/det/db_process.py index cdb8efc2bd9028ff5fc004f13b6505db5d74328c..b64b8c8d227f293aff0eff90d1d85dee8dd85fce 100644 --- a/ppocr/data/det/db_process.py +++ b/ppocr/data/det/db_process.py @@ -194,8 +194,12 @@ class DBProcessTest(object): img_std = [0.229, 0.224, 0.225] im = im.astype(np.float32, copy=False) im = im / 255 - im -= img_mean - im /= img_std + im[:, :, 0] -= img_mean[0] + im[:, :, 1] -= img_mean[1] + im[:, :, 2] -= img_mean[2] + im[:, :, 0] /= img_std[0] + im[:, :, 1] /= img_std[1] + im[:, :, 2] /= img_std[2] channel_swap = (2, 0, 1) im = im.transpose(channel_swap) return im diff --git a/ppocr/data/rec/dataset_traversal.py b/ppocr/data/rec/dataset_traversal.py index e57717d937c9dbbfb64007bdc2ed81458559ca6a..510a028451302a92ebc179792ecbcb1ff8649807 100755 --- a/ppocr/data/rec/dataset_traversal.py +++ b/ppocr/data/rec/dataset_traversal.py @@ -45,12
+45,20 @@ class LMDBReader(object): self.use_tps = False if "tps" in params: self.ues_tps = True + self.use_distort = False + if "distort" in params: + self.use_distort = params['distort'] and params['use_gpu'] + if not params['use_gpu']: + logger.info( + "Distort operation is only supported on GPU. Distort will be set to False." + ) if params['mode'] == 'train': self.batch_size = params['train_batch_size_per_card'] self.drop_last = True else: self.batch_size = params['test_batch_size_per_card'] self.drop_last = False + self.use_distort = False self.infer_img = params['infer_img'] def load_hierarchical_lmdb_dataset(self): @@ -142,7 +150,8 @@ class LMDBReader(object): label=label, char_ops=self.char_ops, loss_type=self.loss_type, - max_text_length=self.max_text_length) + max_text_length=self.max_text_length, + distort=self.use_distort) if outs is None: continue yield outs @@ -185,12 +194,20 @@ class SimpleReader(object): self.use_tps = False if "tps" in params: self.use_tps = True + self.use_distort = False + if "distort" in params: + self.use_distort = params['distort'] and params['use_gpu'] + if not params['use_gpu']: + logger.info( + "Distort operation is only supported on GPU. Distort will be set to False."
+ ) if params['mode'] == 'train': self.batch_size = params['train_batch_size_per_card'] self.drop_last = True else: self.batch_size = params['test_batch_size_per_card'] self.drop_last = False + self.use_distort = False def __call__(self, process_id): if self.mode != 'train': @@ -232,9 +249,14 @@ class SimpleReader(object): img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) label = substr[1] - outs = process_image(img, self.image_shape, label, - self.char_ops, self.loss_type, - self.max_text_length) + outs = process_image( + img=img, + image_shape=self.image_shape, + label=label, + char_ops=self.char_ops, + loss_type=self.loss_type, + max_text_length=self.max_text_length, + distort=self.use_distort) if outs is None: continue yield outs diff --git a/ppocr/data/rec/img_tools.py b/ppocr/data/rec/img_tools.py index 9c4bfa4d3cf0d86b8c70cb95f425970bb1d67da7..d41abd9ba5c867853984b4b3b4c9eda41ebeff7c 100755 --- a/ppocr/data/rec/img_tools.py +++ b/ppocr/data/rec/img_tools.py @@ -15,6 +15,7 @@ import math import cv2 import numpy as np +import random from ppocr.utils.utility import initial_logger logger = initial_logger() @@ -89,6 +90,254 @@ def get_img_data(value): return imgori +def flag(): + """ + Return 1 or -1 with (almost) equal probability, used to pick a random sign. + """ + return 1 if random.random() > 0.5000001 else -1 + + +def cvtColor(img): + """ + Scale the V (brightness) channel in HSV space by a tiny random factor. + """ + hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) + delta = 0.001 * random.random() * flag() + hsv[:, :, 2] = hsv[:, :, 2] * (1 + delta) + new_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR) + return new_img + + +def blur(img): + """ + Apply a 5x5 Gaussian blur to images larger than 10x10 pixels. + """ + h, w, _ = img.shape + if h > 10 and w > 10: + return cv2.GaussianBlur(img, (5, 5), 1) + else: + return img + + +def jitter(img): + """ + Shift the image diagonally by a few pixels to simulate jitter. + """ + # note: here w is the number of rows and h the number of columns + w, h, _ = img.shape + if h > 10 and w > 10: + thres = min(w, h) + s = int(random.random() * thres * 0.01) + src_img = img.copy() + for i in range(s): + img[i:, i:, :] = src_img[:w - i, :h - i, :] + return img + else: + return img + + +def add_gasuss_noise(image, mean=0, var=0.1): + """ + Add Gaussian noise with the given mean and variance, clip to [0, 255] and convert to uint8. + """ + + noise = np.random.normal(mean, var**0.5, image.shape) + out = image + 0.5 * noise + out = np.clip(out, 0, 255) + out = np.uint8(out) + return out + + +def get_crop(image): + """ + Randomly crop 1 to 8 rows from either the top or the bottom of the image. + """ + h, w, _ = image.shape + top_min = 1 + top_max = 8 + top_crop = int(random.randint(top_min, top_max)) + top_crop = min(top_crop, h - 1) + crop_img = image.copy() + ratio = random.randint(0, 1) + if ratio: + crop_img = crop_img[top_crop:h, :, :] + else: + crop_img = crop_img[0:h - top_crop, :, :] + return crop_img + + +class Config: + """ + Parameters of the random perspective/affine distortion. + """ + + def __init__(self, ): + self.anglex = random.random() * 30 + self.angley = random.random() * 15 + self.anglez = random.random() * 10 + self.fov = 42 + self.r = 0 + self.shearx = random.random() * 0.3 + self.sheary = random.random() * 0.05 + self.borderMode = cv2.BORDER_REPLICATE + + def make(self, w, h, ang): + """ + Sample random rotation angles (anglez is bounded by ang) and enable the distortion operations. + """ + self.anglex = random.random() * 5 * flag() + self.angley = random.random() * 5 * flag() + self.anglez = -1 * random.random() * int(ang) * flag() + self.fov = 42 + self.r = 0 + self.shearx = 0 + self.sheary = 0 + self.borderMode = cv2.BORDER_REPLICATE + self.w = w + self.h = h + + self.perspective = True + self.crop = True + self.affine = False + self.reverse = True + self.noise = True + self.jitter = True + self.blur = True + self.color = True + + +def rad(x): + """ + Convert degrees to radians. + """ + return x * np.pi / 180 + + +def get_warpR(config): + """ + Build a random perspective transform matrix from the rotation angles in config. + """ + anglex, angley, anglez, fov, w, h, r = \ + config.anglex, config.angley, config.anglez, config.fov, config.w, config.h, config.r + if w > 69 and w < 112: + anglex = anglex * 1.5 + + z = np.sqrt(w**2 + h**2) / 2 / np.tan(rad(fov / 2)) + # Homogeneous coordinate transformation matrix + rx = np.array([[1, 0, 0, 0], + [0, np.cos(rad(anglex)), -np.sin(rad(anglex)), 0], [ + 0, + -np.sin(rad(anglex)), + np.cos(rad(anglex)), + 0, + ], [0, 0, 0, 1]], np.float32) + ry = np.array([[np.cos(rad(angley)), 0, np.sin(rad(angley)), 0], + [0,
1, 0, 0], [ + -np.sin(rad(angley)), + 0, + np.cos(rad(angley)), + 0, + ], [0, 0, 0, 1]], np.float32) + rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0, 0], + [-np.sin(rad(anglez)), np.cos(rad(anglez)), 0, 0], + [0, 0, 1, 0], [0, 0, 0, 1]], np.float32) + r = rx.dot(ry).dot(rz) + # generate 4 points + pcenter = np.array([h / 2, w / 2, 0, 0], np.float32) + p1 = np.array([0, 0, 0, 0], np.float32) - pcenter + p2 = np.array([w, 0, 0, 0], np.float32) - pcenter + p3 = np.array([0, h, 0, 0], np.float32) - pcenter + p4 = np.array([w, h, 0, 0], np.float32) - pcenter + dst1 = r.dot(p1) + dst2 = r.dot(p2) + dst3 = r.dot(p3) + dst4 = r.dot(p4) + list_dst = np.array([dst1, dst2, dst3, dst4]) + org = np.array([[0, 0], [w, 0], [0, h], [w, h]], np.float32) + dst = np.zeros((4, 2), np.float32) + # Project onto the image plane + dst[:, 0] = list_dst[:, 0] * z / (z - list_dst[:, 2]) + pcenter[0] + dst[:, 1] = list_dst[:, 1] * z / (z - list_dst[:, 2]) + pcenter[1] + + warpR = cv2.getPerspectiveTransform(org, dst) + + dst1, dst2, dst3, dst4 = dst + r1 = int(min(dst1[1], dst2[1])) + r2 = int(max(dst3[1], dst4[1])) + c1 = int(min(dst1[0], dst3[0])) + c2 = int(max(dst2[0], dst4[0])) + + try: + ratio = min(1.0 * h / (r2 - r1), 1.0 * w / (c2 - c1)) + + dx = -c1 + dy = -r1 + T1 = np.float32([[1., 0, dx], [0, 1., dy], [0, 0, 1.0 / ratio]]) + ret = T1.dot(warpR) + except: + ratio = 1.0 + T1 = np.float32([[1., 0, 0], [0, 1., 0], [0, 0, 1.]]) + ret = T1 + return ret, (-r1, -c1), ratio, dst + + +def get_warpAffine(config): + """ + get_warpAffine + """ + anglez = config.anglez + rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0], + [-np.sin(rad(anglez)), np.cos(rad(anglez)), 0]], np.float32) + return rz + + +def warp(img, ang): + """ + warp + """ + h, w, _ = img.shape + config = Config() + config.make(w, h, ang) + new_img = img + + if config.perspective: + tp = random.randint(1, 100) + if tp >= 50: + warpR, (r1, c1), ratio, dst = get_warpR(config) + new_w = int(np.max(dst[:, 
0])) - int(np.min(dst[:, 0])) + new_img = cv2.warpPerspective( + new_img, + warpR, (int(new_w * ratio), h), + borderMode=config.borderMode) + if config.crop: + img_height, img_width = img.shape[0:2] + tp = random.randint(1, 100) + if tp >= 50 and img_height >= 20 and img_width >= 20: + new_img = get_crop(new_img) + if config.affine: + warpT = get_warpAffine(config) + new_img = cv2.warpAffine( + new_img, warpT, (w, h), borderMode=config.borderMode) + if config.blur: + tp = random.randint(1, 100) + if tp >= 50: + new_img = blur(new_img) + if config.color: + tp = random.randint(1, 100) + if tp >= 50: + new_img = cvtColor(new_img) + if config.jitter: + new_img = jitter(new_img) + if config.noise: + tp = random.randint(1, 100) + if tp >= 50: + new_img = add_gasuss_noise(new_img) + if config.reverse: + tp = random.randint(1, 100) + if tp >= 50: + new_img = 255 - new_img + return new_img + + def process_image(img, image_shape, label=None, @@ -96,7 +345,10 @@ def process_image(img, loss_type=None, max_text_length=None, tps=None, - infer_mode=False): + infer_mode=False, + distort=False): + if distort: + img = warp(img, 10) if infer_mode and char_ops.character_type == "ch" and not tps: norm_img = resize_norm_img_chinese(img, image_shape) else: @@ -108,7 +360,7 @@ def process_image(img, text = char_ops.encode(label) if len(text) == 0 or len(text) > max_text_length: logger.info( - "Warning in ppocr/data/rec/img_tools.py:line106: Wrong data type." + "Warning in ppocr/data/rec/img_tools.py:line362: Wrong data type." "Excepted string with length between 1 and {}, but " "got '{}'. 
Label is '{}'".format(max_text_length, len(text), label)) diff --git a/ppocr/utils/character.py b/ppocr/utils/character.py index 3cbc31a49b991cab7f2f8d8c56db4e0d611fbf55..9a3db8dd92454c65256d1cadf7f155b6882ee171 100755 --- a/ppocr/utils/character.py +++ b/ppocr/utils/character.py @@ -30,12 +30,17 @@ class CharacterOps(object): dict_character = list(self.character_str) elif self.character_type == "ch": character_dict_path = config['character_dict_path'] + add_space = False + if 'use_space_char' in config: + add_space = config['use_space_char'] self.character_str = "" with open(character_dict_path, "rb") as fin: lines = fin.readlines() for line in lines: line = line.decode('utf-8').strip("\n").strip("\r\n") self.character_str += line + if add_space: + self.character_str += " " dict_character = list(self.character_str) elif self.character_type == "en_sensitive": # same with ASTER setting (use 94 char). @@ -93,7 +98,7 @@ class CharacterOps(object): if is_remove_duplicate: if idx > 0 and text_index[idx - 1] == text_index[idx]: continue - char_list.append(self.character[text_index[idx]]) + char_list.append(self.character[int(text_index[idx])]) text = ''.join(char_list) return text diff --git a/tools/infer/predict_rec.py b/tools/infer/predict_rec.py index 9761ddbad9123372d706db158aba8008956f30e9..bd96548d827d7b47a059648e8cedc20086488801 100755 --- a/tools/infer/predict_rec.py +++ b/tools/infer/predict_rec.py @@ -39,7 +39,8 @@ class TextRecognizer(object): self.rec_algorithm = args.rec_algorithm char_ops_params = { "character_type": args.rec_char_type, - "character_dict_path": args.rec_char_dict_path + "character_dict_path": args.rec_char_dict_path, + "use_space_char": args.use_space_char } if self.rec_algorithm != "RARE": char_ops_params['loss_type'] = 'ctc' diff --git a/tools/infer/utility.py b/tools/infer/utility.py index 2859c97e5b3f2ded931488365f0aee89df836d3a..f4361a76b9758b970e6b751a7e9b15704e33b4f7 100755 --- a/tools/infer/utility.py +++ b/tools/infer/utility.py @@ 
-63,6 +63,7 @@ def parse_args(): "--rec_char_dict_path", type=str, default="./ppocr/utils/ppocr_keys_v1.txt") + parser.add_argument("--use_space_char", type=bool, default=True) return parser.parse_args() @@ -90,8 +91,9 @@ def create_predictor(args, mode): config.enable_use_gpu(args.gpu_mem, 0) else: config.disable_gpu() - - config.enable_memory_optim() + config.enable_mkldnn() + config.set_cpu_math_library_num_threads(4) + #config.enable_memory_optim() config.disable_glog_info() # use zero copy @@ -169,26 +171,35 @@ def draw_ocr_box_txt(image, boxes, txts): draw_left = ImageDraw.Draw(img_left) draw_right = ImageDraw.Draw(img_right) for (box, txt) in zip(boxes, txts): - color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) + color = (random.randint(0, 255), random.randint(0, 255), + random.randint(0, 255)) draw_left.polygon(box, fill=color) - draw_right.polygon([box[0][0], box[0][1], - box[1][0], box[1][1], - box[2][0], box[2][1], - box[3][0], box[3][1]], outline=color) - box_height = math.sqrt((box[0][0] - box[3][0]) ** 2 + (box[0][1] - box[3][1]) ** 2) - box_width = math.sqrt((box[0][0] - box[1][0]) ** 2 + (box[0][1] - box[1][1]) ** 2) + draw_right.polygon( + [ + box[0][0], box[0][1], box[1][0], box[1][1], box[2][0], + box[2][1], box[3][0], box[3][1] + ], + outline=color) + box_height = math.sqrt((box[0][0] - box[3][0])**2 + (box[0][1] - box[3][ + 1])**2) + box_width = math.sqrt((box[0][0] - box[1][0])**2 + (box[0][1] - box[1][ + 1])**2) if box_height > 2 * box_width: font_size = max(int(box_width * 0.9), 10) - font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8") + font = ImageFont.truetype( + "./doc/simfang.ttf", font_size, encoding="utf-8") cur_y = box[0][1] for c in txt: char_size = font.getsize(c) - draw_right.text((box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font) + draw_right.text( + (box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font) cur_y += char_size[1] else: font_size = max(int(box_height * 0.8), 10) 
- font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8") - draw_right.text([box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) + font = ImageFont.truetype( + "./doc/simfang.ttf", font_size, encoding="utf-8") + draw_right.text( + [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) img_left = Image.blend(image, img_left, 0.5) img_show = Image.new('RGB', (w * 2, h), (255, 255, 255)) img_show.paste(img_left, (0, 0, w, h)) @@ -292,6 +303,25 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): return np.array(blank_img) +def base64_to_cv2(b64str): + import base64 + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +def draw_boxes(image, boxes, scores=None, drop_score=0.5): + if scores is None: + scores = [1] * len(boxes) + for (box, score) in zip(boxes, scores): + if score < drop_score: + continue + box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) + image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) + return image + + if __name__ == '__main__': test_img = "./doc/test_v2" predict_txt = "./doc/predict.txt" diff --git a/tools/program.py b/tools/program.py index 3c71065a167fa18fc9d00535dace97737904b74d..870d27002f36bbed4b7a665f4ff9bc9cc420f0c1 100755 --- a/tools/program.py +++ b/tools/program.py @@ -219,6 +219,13 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict): epoch_num = config['Global']['epoch_num'] print_batch_step = config['Global']['print_batch_step'] eval_batch_step = config['Global']['eval_batch_step'] + start_eval_step = 0 + if type(eval_batch_step) == list and len(eval_batch_step) >= 2: + start_eval_step = eval_batch_step[0] + eval_batch_step = eval_batch_step[1] + logger.info( + "During the training process, after the {}th iteration, an evaluation is run every {} iterations". 
+ format(start_eval_step, eval_batch_step)) save_epoch_step = config['Global']['save_epoch_step'] save_model_dir = config['Global']['save_model_dir'] if not os.path.exists(save_model_dir): @@ -246,7 +253,7 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict): t2 = time.time() train_batch_elapse = t2 - t1 train_stats.update(stats) - if train_batch_id > 0 and train_batch_id \ + if train_batch_id > start_eval_step and (train_batch_id -start_eval_step) \ % print_batch_step == 0: logs = train_stats.log() strs = 'epoch: {}, iter: {}, {}, time: {:.3f}'.format( @@ -286,6 +293,13 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): epoch_num = config['Global']['epoch_num'] print_batch_step = config['Global']['print_batch_step'] eval_batch_step = config['Global']['eval_batch_step'] + start_eval_step = 0 + if type(eval_batch_step) == list and len(eval_batch_step) >= 2: + start_eval_step = eval_batch_step[0] + eval_batch_step = eval_batch_step[1] + logger.info( + "During the training process, after the {}th iteration, an evaluation is run every {} iterations". 
+ format(start_eval_step, eval_batch_step)) save_epoch_step = config['Global']['save_epoch_step'] save_model_dir = config['Global']['save_model_dir'] if not os.path.exists(save_model_dir): @@ -324,7 +338,7 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): train_batch_elapse = t2 - t1 stats = {'loss': loss, 'acc': acc} train_stats.update(stats) - if train_batch_id > 0 and train_batch_id \ + if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \ % print_batch_step == 0: logs = train_stats.log() strs = 'epoch: {}, iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format( diff --git a/tools/test_hubserving.py b/tools/test_hubserving.py new file mode 100644 index 0000000000000000000000000000000000000000..edf6ec8cb9fa0b415f932f63f8872dcdd39d2ee3 --- /dev/null +++ b/tools/test_hubserving.py @@ -0,0 +1,25 @@ +#!usr/bin/python +# -*- coding: utf-8 -*- + +import requests +import json +import cv2 +import base64 +import time + +def cv2_to_base64(image): + return base64.b64encode(image).decode('utf8') + +start = time.time() +# 发送HTTP请求 +data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]} +headers = {"Content-type": "application/json"} +# url = "http://127.0.0.1:8866/predict/ocr_det" +# url = "http://127.0.0.1:8866/predict/ocr_rec" +url = "http://127.0.0.1:8866/predict/ocr_system" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) +end = time.time() + +# 打印预测结果 +print(r.json()["results"]) +print("time cost: ", end - start)
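The `character.py` change appends a single space to the Chinese character set when `use_space_char` is set (so the space gets the highest index), and casts each predicted index with `int()` before the lookup, presumably because inference outputs can arrive as floats or NumPy scalars. The sketch below reproduces that behavior without PaddlePaddle. `build_charset` and `decode` are hypothetical helper names, not the `CharacterOps` API; the sketch takes already-decoded `str` lines instead of reading the dictionary file in binary mode, and it leaves out CTC blank handling.

```python
from typing import Iterable, List, Sequence


def build_charset(dict_lines: Iterable[str], use_space_char: bool = False) -> List[str]:
    """Hypothetical helper: concatenate one-entry-per-line dictionary text into a char list."""
    charset = ""
    for line in dict_lines:
        # Same stripping as the patch: remove "\n", then any remaining "\r"/"\n" at the ends.
        charset += line.strip("\n").strip("\r\n")
    if use_space_char:
        # Appended last, so the space maps to the highest index.
        charset += " "
    return list(charset)


def decode(indices: Sequence[float], charset: Sequence[str],
           remove_duplicate: bool = False) -> str:
    """Hypothetical helper: map predicted indices back to text, optionally collapsing repeats."""
    chars = []
    for i, idx in enumerate(indices):
        if remove_duplicate and i > 0 and indices[i - 1] == idx:
            continue
        # int() accepts floats or NumPy scalars, mirroring the patched decode().
        chars.append(charset[int(idx)])
    return "".join(chars)


charset = build_charset(["你\n", "好\n", "a\n"], use_space_char=True)
print(charset)
print(decode([0.0, 0.0, 3.0, 1.0], charset, remove_duplicate=True))
```

With the space in the dictionary, a predicted index of 3 decodes to `" "`, so the output is `你 好`. Without `use_space_char`, that index would fall outside the three-entry list.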