Commit f7ae9a5f authored by: N nblib

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into develop-nblib

......@@ -21,3 +21,7 @@ output/
*.log
.clang-format
.clang_format.hook
build/
dist/
paddleocr.egg-info/
\ No newline at end of file
include LICENSE.txt
include README.md
recursive-include ppocr/utils *.txt utility.py character.py check.py
recursive-include ppocr/data/det *.py
recursive-include ppocr/postprocess *.py
recursive-include ppocr/postprocess/lanms *.*
recursive-include tools/infer *.py
This diff is collapsed.
[English](README.md) | Simplified Chinese
## Introduction
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and put them into practice.
**Recent updates**
- 2020.9.22 Published the PP-OCR technical report: https://arxiv.org/abs/2009.09941
- 2020.9.19 Updated the ultra-lightweight compressed ppocr_mobile_slim series models, 3.5M in total (see [PP-OCR Pipeline](#PP-OCR)), suitable for deployment on mobile devices. [Model download](#模型下载)
- 2020.9.17 Updated the ultra-lightweight ppocr_mobile series and the general ppocr_server series Chinese and English OCR models, with accuracy comparable to commercial products. [Model download](#模型下载)
- 2020.8.26 Added 84 common OCR questions and answers; see the [FAQ](./doc/doc_ch/FAQ.md)
- 2020.8.24 PaddleOCR can now be installed and used via a whl package; see the [PaddleOCR package usage guide](./doc/doc_ch/whl.md)
- 2020.8.21 Uploaded the replay and slides of the August 18 Bilibili live course, Lesson 2: an easy-to-learn, easy-to-use OCR toolkit. [Get them here](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [More](./doc/doc_ch/update.md)
## Features
- High-quality PP-OCR series pre-trained models with accurate recognition
    - Ultra-lightweight ppocr_mobile series for mobile: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
    - General ppocr_server series: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
    - Ultra-lightweight compressed ppocr_mobile_slim series: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
- Recognition of mixed Chinese, English, and digits, vertical text, and long text
- Multilingual recognition: Korean, Japanese, German, French
- User-defined training with rich inference and deployment solutions
- Quick installation and use via pip
- Runs on Linux, Windows, macOS, and other systems
## Visualization
<div align="center">
<img src="doc/imgs_results/1101.jpg" width="800">
<img src="doc/imgs_results/1103.jpg" width="800">
</div>
The figures above show results of the general ppocr_server model. For more examples, see the [visualization page](./doc/doc_ch/visualization.md).
## Quick Experience
- PC: try the ultra-lightweight Chinese OCR online at https://www.paddlepaddle.org.cn/hub/scene/ocr
- Mobile: [demo installation package download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite, supports iOS and Android). On Android you can also scan the QR code below to install and try it.
<div align="center">
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>
- Code: start from [Quick installation](./doc/doc_ch/installation.md); a minimal whl usage sketch is shown below.
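The following is a minimal, non-authoritative sketch of the whl-package usage described in the [PaddleOCR package usage guide](./doc/doc_ch/whl.md). It assumes `pip install paddleocr` has been run; the exact constructor arguments and result layout may differ between releases.
```python
# Minimal sketch, assuming the paddleocr whl package is installed (pip install paddleocr).
# Argument names and result layout may vary between releases.
from paddleocr import PaddleOCR

ocr = PaddleOCR()                      # downloads/loads detection and recognition models on first use
result = ocr.ocr('doc/imgs/11.jpg')    # run detection + recognition on a single image
for box, (text, score) in result:      # each item: 4-point box, recognized text, confidence score
    print(box, text, score)
```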
<a name="模型下载"></a>
## PP-OCR 1.1 series model list (updated September 17)
| Model description | Model name | Recommended scenario | Detection model | Direction classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Chinese & English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Chinese & English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
| Chinese & English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
For more model downloads (including multilingual models), see the [PP-OCR v1.1 series model list](./doc/doc_ch/models_list.md).
## Tutorials
- [Quick installation](./doc/doc_ch/installation.md)
- [Quick start with Chinese OCR models](./doc/doc_ch/quickstart.md)
- [Code organization](./doc/doc_ch/tree.md)
- Algorithms
    - [Text detection](./doc/doc_ch/algorithm_overview.md)
    - [Text recognition](./doc/doc_ch/algorithm_overview.md)
    - [PP-OCR Pipeline](#PP-OCR)
- Model training/evaluation
    - [Text detection](./doc/doc_ch/detection.md)
    - [Text recognition](./doc/doc_ch/recognition.md)
    - [Direction classifier](./doc/doc_ch/angle_class.md)
    - [YAML configuration file reference](./doc/doc_ch/config.md)
- Inference and deployment
    - [Quick inference with the pip-installed whl package](./doc/doc_ch/whl.md)
    - [Python inference engine](./doc/doc_ch/inference.md)
    - [C++ inference engine](./deploy/cpp_infer/readme.md)
    - [Serving deployment](./deploy/hubserving/readme.md)
    - [Mobile deployment](./deploy/lite/readme.md)
    - [Model quantization](./deploy/slim/quantization/README.md)
    - [Model pruning](./deploy/slim/prune/README.md)
    - [Benchmark](./doc/doc_ch/benchmark.md)
- Datasets
    - [General Chinese & English OCR datasets](./doc/doc_ch/datasets.md)
    - [Handwritten Chinese OCR datasets](./doc/doc_ch/handwritten_datasets.md)
    - [Vertical-domain and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
    - [Common data annotation tools](./doc/doc_ch/data_annotation.md)
    - [Common data synthesis tools](./doc/doc_ch/data_synthesis.md)
- [Visualization](#效果展示)
- FAQ
    - [Selected: 10 selected OCR questions](./doc/doc_ch/FAQ.md)
    - [Theory: 21 general OCR questions](./doc/doc_ch/FAQ.md)
    - [Practice: 53 PaddleOCR practice questions](./doc/doc_ch/FAQ.md)
- [Technical discussion group](#欢迎加入PaddleOCR技术交流群)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
- [Contribution](#贡献代码)
<a name="PP-OCR"></a>
## PP-OCR Pipeline
<div align="center">
<img src="./doc/ppocr_framework.png" width="800">
</div>
PP-OCR is a practical ultra-lightweight OCR system, mainly composed of three parts: DB text detection, detection-box rectification, and CRNN text recognition. The system applies 19 effective strategies across 8 aspects (backbone network selection and tuning, prediction head design, data augmentation, learning rate scheduling, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization) to optimize and slim down the models of each module, resulting in an ultra-lightweight Chinese and English OCR system with an overall size of 3.5M and an English and digit OCR system of 2.8M. For more details, see the PP-OCR technical report: https://arxiv.org/abs/2009.09941 .
<a name="效果展示"></a>
## Visualization [more](./doc/doc_ch/visualization.md)
<div align="center">
<img src="./doc/imgs_results/1102.jpg" width="800">
<img src="./doc/imgs_results/1104.jpg" width="800">
<img src="./doc/imgs_results/1106.jpg" width="800">
<img src="./doc/imgs_results/1105.jpg" width="800">
<img src="./doc/imgs_results/1110.jpg" width="800">
<img src="./doc/imgs_results/1112.jpg" width="800">
</div>
<a name="欢迎加入PaddleOCR技术交流群"></a>
## Join the PaddleOCR technical discussion group
Scan the QR code below and complete the questionnaire to receive the group QR code and a collection of OCR tuning tips.
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="许可证书"></a>
## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
<a name="贡献代码"></a>
## Contribution
We warmly welcome contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing to and revising the English documentation.
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) ([Blog](https://blog.csdn.net/sdlypyzq)) for contributing a new visualization method, adding .gitignore, and handling the issue of manually setting the PYTHONPATH environment variable.
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing code to print the network structure.
- Many thanks to [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR dataset.
- Many thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and [xiadeye](https://github.com/xiadeye) for contributing the iOS demo code.
- Many thanks to [BeyondYourself](https://github.com/BeyondYourself) for many excellent suggestions and for simplifying parts of the PaddleOCR code style.
- Many thanks to [tangmq](https://gitee.com/tangmq) for adding Docker-based deployment to PaddleOCR, supporting quick release of a callable RESTful API service.
English | [Simplified Chinese](README.md)
## INTRODUCTION
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.7.9 Added a recognition model that supports spaces; see [recognition results](#Space-Chinese-OCR-results). For more information, see [Recognition](./doc/doc_ch/recognition.md) and [quickstart](./doc/doc_ch/quickstart.md)
- 2020.7.9 Added data augmentation and learning rate decay strategies; please read [config](./doc/doc_en/config_en.md)
- 2020.6.8 Added [datasets](./doc/doc_en/datasets_en.md) and will keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- [more](./doc/doc_en/update_en.md)
## FEATURES
- Lightweight Chinese OCR model, with a total model size of only 8.6M
- A single model supports recognition of mixed Chinese, English, and digits, vertical text, and long text
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
<a name="Supported-Chinese-model-list"></a>
### Supported Chinese model list:
|Model Name|Description|Detection Model Link|Recognition Model Link|Recognition Model Link (with space support)|
|-|-|-|-|-|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
To try our Chinese OCR online: https://www.paddlepaddle.org.cn/hub/scene/ocr
**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:**
## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE**
![](doc/imgs_results/11.jpg)
The picture above shows the result of our lightweight Chinese OCR model. For more results, please see [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results), [General Chinese OCR results](#General-Chinese-OCR-results), and the [space-supporting recognition model results](#Space-Chinese-OCR-results) at the end of this document.
#### 1. ENVIRONMENT CONFIGURATION
Please see [Quick installation](./doc/doc_en/installation_en.md)
#### 2. DOWNLOAD INFERENCE MODELS
#### (1) Download lightweight Chinese OCR models
*If wget is not installed on Windows, you can copy the link into a browser to download the model. After the model is downloaded, unzip it and place it in the corresponding directory.*
Copy the detection and recognition 'inference model' URLs from the [Chinese model list](#Supported-Chinese-model-list), then download and unpack them:
```
mkdir inference && cd inference
# Download the detection part of the Chinese OCR and decompress it
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition part of the Chinese OCR and decompress it
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
Take the lightweight Chinese OCR model as an example:
```
mkdir inference && cd inference
# Download the detection part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
# Download the space-recognized part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar && tar xf ch_rec_mv3_crnn_enhance_infer.tar
cd ..
```
After the decompression is completed, the file structure should be as follows:
```
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
#### 3. SINGLE IMAGE AND BATCH PREDICTION
The following code runs text detection and recognition inference in tandem. When performing prediction, specify the path of a single image or an image folder through the parameter `image_dir`, the detection model path through `det_model_dir`, and the recognition model path through `rec_model_dir`. The visualized results are saved to the `./inference_results` folder by default.
```bash
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Prediction on a batch of images by specifying image folder path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# If you want to use CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
To run inference with the general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. An example:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
To run inference with the space-supporting general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. An example:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/"
```
For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)
## DOCUMENTATION
- [Quick installation](./doc/doc_en/installation_en.md)
- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Introduction of yml file](./doc/doc_en/config_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)
- [FAQ](#FAQ)
## TEXT DETECTION ALGORITHM
PaddleOCR open-sources the following text detection algorithms:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498)) (Baidu self-research, coming soon)
On the ICDAR2015 dataset, the text detection results are as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
On the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, with a total of 30k training images, the related configurations and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For training and evaluation of the above DB models, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. When training on different datasets or with different models, these two parameters can be adjusted for better results.
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
## TEXT RECOGNITION ALGORITHM
PaddleOCR open-sources the following text recognition algorithms:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) (Baidu self-research, coming soon)
Following [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (trained on MJSynth and SynthText, evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) are as follows:
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset, cropping out 300k training images from the original photos using the position ground truth and applying some necessary corrections. In addition, 5 million synthetic images are generated from the LSVT corpus to train the Chinese models. The related configurations and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|Space-supporting (enhanced) model|
|-|-|-|-|-|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
For the training guide and use of PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
## END-TO-END OCR ALGORITHM
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808) (Baidu self-research, coming soon)
<a name="lightweight-Chinese-OCR-results"></a>
## LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)
<a name="General-Chinese-OCR-results"></a>
## General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
<a name="Space-Chinese-OCR-results"></a>
## space Chinese OCR results
### LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/img_11.jpg)
### General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg)
<a name="FAQ"></a>
## FAQ
1. Error when using attention-based recognition model: KeyError: 'predict'
Inference for the attention-loss-based recognition model is still being debugged. For Chinese text recognition, we recommend using the CTC-loss-based recognition model first; in practice, the attention-based model has also proven less effective than the CTC-based one.
2. About inference speed
When an image contains a lot of text, the prediction time increases. You can use `--rec_batch_num` to set a smaller recognition batch size; the default value is 30, which can be lowered to 10 or another value.
3. Service deployment and mobile deployment
It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates.
4. Release time of self-developed algorithm
Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
[more](./doc/doc_en/FAQ_en.md)
## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP
WeChat: paddlehelp. Add this account with the note "OCR", and our assistant will invite you into the group.
<img src="./doc/paddlehelp.jpg" width = "200" height = "200" />
## REFERENCES
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
## LICENSE
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
## CONTRIBUTION
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation.
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing a new visualization function, adding .gitignore, and removing the need to set PYTHONPATH manually.
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__all__ = ['PaddleOCR', 'draw_ocr']
from .paddleocr import PaddleOCR
from .tools.infer.utility import draw_ocr
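A hedged usage sketch of the two names exported above follows. The `draw_ocr` argument list (image, boxes, texts, scores) and the result layout are assumptions based on the whl package documentation and may differ in the version you have installed.
```python
# Hedged sketch of the public API exported by this __init__.py.
# The draw_ocr signature and result layout are assumptions; check doc/doc_ch/whl.md
# for the release you actually have installed.
from PIL import Image
from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR()
img_path = 'doc/imgs/11.jpg'
result = ocr.ocr(img_path)                 # assumed layout: list of [box, (text, score)]

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]       # detected quadrilateral boxes
txts = [line[1][0] for line in result]     # recognized strings
scores = [line[1][1] for line in result]   # confidence scores
vis = draw_ocr(image, boxes, txts, scores) # returns an annotated image (numpy array)
Image.fromarray(vis).save('ocr_result.jpg')
```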
Global:
algorithm: CLS
use_gpu: False
epoch_num: 100
log_smooth_window: 20
print_batch_step: 100
save_model_dir: output/cls_mv3
save_epoch_step: 3
eval_batch_step: 500
train_batch_size_per_card: 512
test_batch_size_per_card: 512
image_shape: [3, 48, 192]
label_list: ['0','180']
distort: True
reader_yml: ./configs/cls/cls_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.cls_model,ClsModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.35
model_name: small
Head:
function: ppocr.modeling.heads.cls_head,ClsHead
class_dim: 2
Loss:
function: ppocr.modeling.losses.cls_loss,ClsLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 1169
total_epoch: 100
\ No newline at end of file
TrainReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data/cls
label_file_path: ./train_data/cls/train.txt
EvalReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
img_set_dir: ./train_data/cls
label_file_path: ./train_data/cls/test.txt
TestReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
......@@ -49,6 +49,6 @@ Optimizer:
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.7
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 2.0
unclip_ratio: 1.5
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_db/
save_epoch_step: 200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
train_batch_size_per_card: 16
test_batch_size_per_card: 16
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
checkpoints:
save_res_path: ./output/det_db/predicts_db.txt
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: large
disable_se: true
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 96
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 16
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_r_18_vd_db/
save_epoch_step: 200
eval_batch_step: [3000, 2000]
train_batch_size_per_card: 8
test_batch_size_per_card: 1
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/ResNet18_vd_pretrained/
save_res_path: ./output/det_r18_vd_db/predicts_db.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd,ResNet
layers: 18
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 256
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 32
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Global:
algorithm: SAST
use_gpu: true
epoch_num: 2000
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_sast/
save_epoch_step: 20
eval_batch_step: 5000
train_batch_size_per_card: 8
test_batch_size_per_card: 8
image_shape: [3, 512, 512]
reader_yml: ./configs/det/det_sast_icdar15_reader.yml
pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
save_res_path: ./output/det_sast/predicts_sast.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.det_sast_head,SASTHead
model_name: large
only_fpn_up: False
# with_cab: False
with_cab: True
Loss:
function: ppocr.modeling.losses.det_sast_loss,SASTLoss
Optimizer:
function: ppocr.optimizer,RMSProp
base_lr: 0.001
decay:
function: piecewise_decay
boundaries: [30000, 50000, 80000, 100000, 150000]
decay_rate: 0.3
PostProcess:
function: ppocr.postprocess.sast_postprocess,SASTPostProcess
score_thresh: 0.5
sample_pts_num: 2
nms_thresh: 0.2
expand_scale: 1.0
shrink_ratio_of_width: 0.3
\ No newline at end of file
Global:
algorithm: SAST
use_gpu: true
epoch_num: 2000
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_sast/
save_epoch_step: 20
eval_batch_step: 5000
train_batch_size_per_card: 8
test_batch_size_per_card: 1
image_shape: [3, 512, 512]
reader_yml: ./configs/det/det_sast_totaltext_reader.yml
pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
save_res_path: ./output/det_sast/predicts_sast.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.det_sast_head,SASTHead
model_name: large
only_fpn_up: False
# with_cab: False
with_cab: True
Loss:
function: ppocr.modeling.losses.det_sast_loss,SASTLoss
Optimizer:
function: ppocr.optimizer,RMSProp
base_lr: 0.001
decay:
function: piecewise_decay
boundaries: [30000, 50000, 80000, 100000, 150000]
decay_rate: 0.3
PostProcess:
function: ppocr.postprocess.sast_postprocess,SASTPostProcess
score_thresh: 0.5
sample_pts_num: 6
nms_thresh: 0.2
expand_scale: 1.2
shrink_ratio_of_width: 0.2
\ No newline at end of file
TrainReader:
reader_function: ppocr.data.det.dataset_traversal,TrainReader
process_function: ppocr.data.det.sast_process,SASTProcessTrain
num_workers: 8
img_set_dir: ./train_data/
label_file_path: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
data_ratio_list: [0.1, 0.45, 0.3, 0.15]
min_crop_side_ratio: 0.3
min_crop_size: 24
min_text_size: 4
max_text_size: 512
EvalReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
img_set_dir: ./train_data/icdar2015/text_localization/
label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
max_side_len: 1536
TestReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
infer_img: ./train_data/icdar2015/text_localization/ch4_test_images/img_11.jpg
max_side_len: 1536
TrainReader:
reader_function: ppocr.data.det.dataset_traversal,TrainReader
process_function: ppocr.data.det.sast_process,SASTProcessTrain
num_workers: 8
img_set_dir: ./train_data/
label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
data_ratio_list: [0.5, 0.5]
min_crop_side_ratio: 0.3
min_crop_size: 24
min_text_size: 4
max_text_size: 512
EvalReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
img_set_dir: ./train_data/
label_file_path: ./train_data/total_text_icdar_14pt/test_label_json.txt
max_side_len: 768
TestReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
infer_img: ./train_data/afs/total_text/Images/Test/img623.jpg
max_side_len: 768
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 128
test_batch_size_per_card: 128
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
layers: 34
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00004
SeqRNN:
hidden_size: 256
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00004
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00001
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/en_number
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 30
character_type: ch
character_dict_path: ./ppocr/utils/ic15_dict.txt
loss_type: ctc
distort: false
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_en_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
warmup_minibatch: 1000
step_each_epoch: 6530
total_epoch: 500
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/en_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/en_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_french
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: french
character_dict_path: ./ppocr/utils/french_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/french_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/french_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_german
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: german
character_dict_path: ./ppocr/utils/german_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_ger_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/de_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/de_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_japan
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: japan
character_dict_path: ./ppocr/utils/japan_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_japan_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/japan_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/japan_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_korean
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: korean
character_dict_path: ./ppocr/utils/korean_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_korean_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/korean_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/korean_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
Global:
algorithm: SRN
use_gpu: true
epoch_num: 72
log_smooth_window: 20
print_batch_step: 10
save_model_dir: output/rec_pvam_withrotate
save_epoch_step: 1
eval_batch_step: 8000
train_batch_size_per_card: 64
test_batch_size_per_card: 1
image_shape: [1, 64, 256]
max_text_length: 25
character_type: en
loss_type: srn
num_heads: 8
average_window: 0.15
max_average_window: 15625
min_average_window: 10000
reader_yml: ./configs/rec/rec_benchmark_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_resnet_fpn,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.rec_srn_all_head,SRNPredict
encoder_type: rnn
num_encoder_TUs: 2
num_decoder_TUs: 4
hidden_dims: 512
SeqRNN:
hidden_size: 256
Loss:
function: ppocr.modeling.losses.rec_srn_loss,SRNLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0001
beta1: 0.9
beta2: 0.999
# How to test quickly
### 1. Install the latest version of Android Studio
It can be downloaded from https://developer.android.com/studio . This demo was written with Android Studio 4.0.
### 2. Install NDK 20 or above
The demo was tested with NDK 20b; any NDK version 20 or above can compile successfully.
......
......@@ -3,11 +3,11 @@ import java.security.MessageDigest
apply plugin: 'com.android.application'
android {
compileSdkVersion 28
compileSdkVersion 29
defaultConfig {
applicationId "com.baidu.paddle.lite.demo.ocr"
minSdkVersion 15
targetSdkVersion 28
minSdkVersion 23
targetSdkVersion 29
versionCode 1
versionName "1.0"
testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
......@@ -39,9 +39,8 @@ android {
dependencies {
implementation fileTree(include: ['*.jar'], dir: 'libs')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.android.support.constraint:constraint-layout:1.1.3'
implementation 'com.android.support:design:28.0.0'
implementation 'androidx.appcompat:appcompat:1.1.0'
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
testImplementation 'junit:junit:4.12'
androidTestImplementation 'com.android.support.test:runner:1.0.2'
androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'
......
......@@ -14,10 +14,10 @@
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/AppTheme">
<!-- to test MiniActivity, change this to com.baidu.paddle.lite.demo.ocr.MiniActivity -->
<activity android:name="com.baidu.paddle.lite.demo.ocr.MainActivity">
<intent-filter>
<action android:name="android.intent.action.MAIN"/>
<category android:name="android.intent.category.LAUNCHER"/>
</intent-filter>
</activity>
......@@ -25,6 +25,15 @@
android:name="com.baidu.paddle.lite.demo.ocr.SettingsActivity"
android:label="Settings">
</activity>
<provider
android:name="androidx.core.content.FileProvider"
android:authorities="com.baidu.paddle.lite.demo.ocr.fileprovider"
android:exported="false"
android:grantUriPermissions="true">
<meta-data
android:name="android.support.FILE_PROVIDER_PATHS"
android:resource="@xml/file_paths"></meta-data>
</provider>
</application>
</manifest>
\ No newline at end of file
......@@ -4,112 +4,111 @@
#include "native.h"
#include "ocr_ppredictor.h"
#include <string>
#include <algorithm>
#include <paddle_api.h>
#include <string>
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode);
extern "C"
JNIEXPORT jlong JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(JNIEnv *env, jobject thiz,
jstring j_det_model_path,
jstring j_rec_model_path,
jint j_thread_num,
jstring j_cpu_mode) {
std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path);
std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path);
int thread_num = j_thread_num;
std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode);
ppredictor::OCR_Config conf;
conf.thread_num = thread_num;
conf.mode = str_to_cpu_mode(cpu_mode);
ppredictor::OCR_PPredictor *orc_predictor = new ppredictor::OCR_PPredictor{conf};
orc_predictor->init_from_file(det_model_path, rec_model_path);
return reinterpret_cast<jlong>(orc_predictor);
extern "C" JNIEXPORT jlong JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(
JNIEnv *env, jobject thiz, jstring j_det_model_path,
jstring j_rec_model_path, jstring j_cls_model_path, jint j_thread_num,
jstring j_cpu_mode) {
std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path);
std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path);
std::string cls_model_path = jstring_to_cpp_string(env, j_cls_model_path);
int thread_num = j_thread_num;
std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode);
ppredictor::OCR_Config conf;
conf.thread_num = thread_num;
conf.mode = str_to_cpu_mode(cpu_mode);
ppredictor::OCR_PPredictor *orc_predictor =
new ppredictor::OCR_PPredictor{conf};
orc_predictor->init_from_file(det_model_path, rec_model_path, cls_model_path);
return reinterpret_cast<jlong>(orc_predictor);
}
/**
 * "LITE_POWER_HIGH" is converted to paddle::lite_api::LITE_POWER_HIGH
* @param cpu_mode
* @return
*/
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode) {
static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{
{"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL},
{"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND},
{"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH},
{"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}
};
std::string upper_key;
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), ::toupper);
auto index = cpu_mode_map.find(upper_key);
if (index == cpu_mode_map.end()) {
LOGE("cpu_mode not found %s", upper_key.c_str());
return paddle::lite_api::LITE_POWER_HIGH;
} else {
return index->second;
}
static paddle::lite_api::PowerMode
str_to_cpu_mode(const std::string &cpu_mode) {
static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{
{"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL},
{"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND},
{"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH},
{"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}};
std::string upper_key;
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(),
::toupper);
auto index = cpu_mode_map.find(upper_key);
if (index == cpu_mode_map.end()) {
LOGE("cpu_mode not found %s", upper_key.c_str());
return paddle::lite_api::LITE_POWER_HIGH;
} else {
return index->second;
}
}
extern "C"
JNIEXPORT jfloatArray JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(JNIEnv *env, jobject thiz,
jlong java_pointer, jfloatArray buf,
jfloatArray ddims,
jobject original_image) {
LOGI("begin to run native forward");
if (java_pointer == 0) {
LOGE("JAVA pointer is NULL");
return cpp_array_to_jfloatarray(env, nullptr, 0);
}
cv::Mat origin = bitmap_to_cv_mat(env, original_image);
if (origin.size == 0) {
LOGE("origin bitmap cannot convert to CV Mat");
return cpp_array_to_jfloatarray(env, nullptr, 0);
}
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer;
std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims);
std::vector<int64_t> dims_arr;
dims_arr.resize(dims_float_arr.size());
std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
extern "C" JNIEXPORT jfloatArray JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(
JNIEnv *env, jobject thiz, jlong java_pointer, jfloatArray buf,
jfloatArray ddims, jobject original_image) {
LOGI("begin to run native forward");
if (java_pointer == 0) {
LOGE("JAVA pointer is NULL");
return cpp_array_to_jfloatarray(env, nullptr, 0);
}
cv::Mat origin = bitmap_to_cv_mat(env, original_image);
if (origin.size == 0) {
LOGE("origin bitmap cannot convert to CV Mat");
return cpp_array_to_jfloatarray(env, nullptr, 0);
}
ppredictor::OCR_PPredictor *ppredictor =
(ppredictor::OCR_PPredictor *)java_pointer;
std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims);
std::vector<int64_t> dims_arr;
dims_arr.resize(dims_float_arr.size());
std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
// The buffer is large here, so jfloatarray_to_float_vector is not called
int64_t buf_len = (int64_t) env->GetArrayLength(buf);
jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE);
float *data = (jfloat *) buf_data;
std::vector<ppredictor::OCRPredictResult> results = ppredictor->infer_ocr(dims_arr, data,
buf_len,
NET_OCR, origin);
LOGI("infer_ocr finished with boxes %ld", results.size());
// Serialize std::vector<ppredictor::OCRPredictResult> into a float array, pass it to the Java layer, then deserialize it there
std::vector<float> float_arr;
for (const ppredictor::OCRPredictResult &r :results) {
float_arr.push_back(r.points.size());
float_arr.push_back(r.word_index.size());
float_arr.push_back(r.score);
for (const std::vector<int> &point : r.points) {
float_arr.push_back(point.at(0));
float_arr.push_back(point.at(1));
}
for (int index: r.word_index) {
float_arr.push_back(index);
}
// The buffer is large here, so jfloatarray_to_float_vector is not called
int64_t buf_len = (int64_t)env->GetArrayLength(buf);
jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE);
float *data = (jfloat *)buf_data;
std::vector<ppredictor::OCRPredictResult> results =
ppredictor->infer_ocr(dims_arr, data, buf_len, NET_OCR, origin);
LOGI("infer_ocr finished with boxes %ld", results.size());
// Serialize std::vector<ppredictor::OCRPredictResult> into a
// float array, pass it to the Java layer, then deserialize it there
std::vector<float> float_arr;
for (const ppredictor::OCRPredictResult &r : results) {
float_arr.push_back(r.points.size());
float_arr.push_back(r.word_index.size());
float_arr.push_back(r.score);
for (const std::vector<int> &point : r.points) {
float_arr.push_back(point.at(0));
float_arr.push_back(point.at(1));
}
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size());
for (int index : r.word_index) {
float_arr.push_back(index);
}
}
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size());
}
extern "C"
JNIEXPORT void JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(JNIEnv *env, jobject thiz,
jlong java_pointer){
if (java_pointer == 0) {
LOGE("JAVA pointer is NULL");
return;
}
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer;
delete ppredictor;
extern "C" JNIEXPORT void JNICALL
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(
JNIEnv *env, jobject thiz, jlong java_pointer) {
if (java_pointer == 0) {
LOGE("JAVA pointer is NULL");
return;
}
ppredictor::OCR_PPredictor *ppredictor =
(ppredictor::OCR_PPredictor *)java_pointer;
delete ppredictor;
}
\ No newline at end of file
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "ocr_cls_process.h"
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>
const std::vector<int> CLS_IMAGE_SHAPE = {3, 32, 100};
cv::Mat cls_resize_img(const cv::Mat &img) {
int imgC = CLS_IMAGE_SHAPE[0];
int imgW = CLS_IMAGE_SHAPE[2];
int imgH = CLS_IMAGE_SHAPE[1];
float ratio = float(img.cols) / float(img.rows);
int resize_w = 0;
if (ceilf(imgH * ratio) > imgW)
resize_w = imgW;
else
resize_w = int(ceilf(imgH * ratio));
cv::Mat resize_img;
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
cv::INTER_CUBIC);
if (resize_w < imgW) {
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, int(imgW - resize_w),
cv::BORDER_CONSTANT, {0, 0, 0});
}
return resize_img;
}
\ No newline at end of file
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "common.h"
#include <opencv2/opencv.hpp>
#include <vector>
extern const std::vector<int> CLS_IMAGE_SHAPE;
cv::Mat cls_resize_img(const cv::Mat &img);
\ No newline at end of file
......@@ -3,184 +3,237 @@
//
#include "ocr_ppredictor.h"
#include "preprocess.h"
#include "common.h"
#include "ocr_db_post_process.h"
#include "ocr_cls_process.h"
#include "ocr_crnn_process.h"
#include "ocr_db_post_process.h"
#include "preprocess.h"
namespace ppredictor {
OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {
OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {}
}
int OCR_PPredictor::init(const std::string &det_model_content,
const std::string &rec_model_content,
const std::string &cls_model_content) {
_det_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_nb(det_model_content);
int
OCR_PPredictor::init(const std::string &det_model_content, const std::string &rec_model_content) {
_det_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_nb(det_model_content);
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_nb(rec_model_content);
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_nb(rec_model_content);
return RETURN_OK;
_cls_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_cls_predictor->init_nb(cls_model_content);
return RETURN_OK;
}
int OCR_PPredictor::init_from_file(const std::string &det_model_path, const std::string &rec_model_path){
_det_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_from_file(det_model_path);
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_from_file(rec_model_path);
return RETURN_OK;
int OCR_PPredictor::init_from_file(const std::string &det_model_path,
const std::string &rec_model_path,
const std::string &cls_model_path) {
_det_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_from_file(det_model_path);
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_from_file(rec_model_path);
_cls_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_cls_predictor->init_from_file(cls_model_path);
return RETURN_OK;
}
/**
* For debugging: save the box results of the first (detection) step as an image
* @param filter_boxes
* @param boxes
* @param srcimg
*/
static void visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes,
const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &srcimg) {
// visualization
cv::Point rook_points[filter_boxes.size()][4];
for (int n = 0; n < filter_boxes.size(); n++) {
for (int m = 0; m < filter_boxes[0].size(); m++) {
rook_points[n][m] = cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1]));
}
static void
visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes,
const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &srcimg) {
// visualization
cv::Point rook_points[filter_boxes.size()][4];
for (int n = 0; n < filter_boxes.size(); n++) {
for (int m = 0; m < filter_boxes[0].size(); m++) {
rook_points[n][m] =
cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1]));
}
cv::Mat img_vis;
srcimg.copyTo(img_vis);
for (int n = 0; n < boxes.size(); n++) {
const cv::Point *ppt[1] = {rook_points[n]};
int npt[] = {4};
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
}
// For debugging only: replace this output path with one that suits your device
cv::imwrite("/sdcard/1/vis.png", img_vis);
}
cv::Mat img_vis;
srcimg.copyTo(img_vis);
for (int n = 0; n < boxes.size(); n++) {
const cv::Point *ppt[1] = {rook_points[n]};
int npt[] = {4};
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
}
// For debugging only: replace this output path with one that suits your device
cv::imwrite("/sdcard/1/vis.png", img_vis);
}
std::vector<OCRPredictResult>
OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims, const float *input_data, int input_len,
int net_flag, cv::Mat &origin) {
OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims,
const float *input_data, int input_len, int net_flag,
cv::Mat &origin) {
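// Detection stage: run the det model on the prepared input tensor, turn its
// output probability map into text boxes, then hand the boxes to infer_rec.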
PredictorInput input = _det_predictor->get_first_input();
input.set_dims(dims);
input.set_data(input_data, input_len);
std::vector<PredictorOutput> results = _det_predictor->infer();
PredictorOutput &res = results.at(0);
std::vector<std::vector<std::vector<int>>> filtered_box = calc_filtered_boxes(
res.get_float_data(), res.get_size(), (int)dims[2], (int)dims[3], origin);
LOGI("Filter_box size %ld", filtered_box.size());
return infer_rec(filtered_box, origin);
}
PredictorInput input = _det_predictor->get_first_input();
std::vector<OCRPredictResult> OCR_PPredictor::infer_rec(
const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &origin_img) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<OCRPredictResult> ocr_results;
PredictorInput input = _rec_predictor->get_first_input();
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
const std::vector<std::vector<int>> &box = *bp;
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
crop_img = infer_cls(crop_img);
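// The angle classifier may rotate the crop 180 degrees before recognition
// when it decides the text is upside down.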
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
dims[2] = input_image.rows;
dims[3] = input_image.cols;
input.set_dims(dims);
input.set_data(input_data, input_len);
std::vector<PredictorOutput> results = _det_predictor->infer();
PredictorOutput &res = results.at(0);
std::vector<std::vector<std::vector<int>>> filtered_box
= calc_filtered_boxes(res.get_float_data(), res.get_size(), (int) dims[2], (int) dims[3],
origin);
LOGI("Filter_box size %ld", filtered_box.size());
return infer_rec(filtered_box, origin);
}
std::vector<OCRPredictResult>
OCR_PPredictor::infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &origin_img) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<OCRPredictResult> ocr_results;
PredictorInput input = _rec_predictor->get_first_input();
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
const std::vector<std::vector<int>> &box = *bp;
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
dims[2] = input_image.rows;
dims[3] = input_image.cols;
input.set_dims(dims);
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, scale);
std::vector<PredictorOutput> results = _rec_predictor->infer();
OCRPredictResult res;
res.word_index = postprocess_rec_word_index(results.at(0));
if (res.word_index.empty()) {
continue;
}
res.score = postprocess_rec_score(results.at(1));
res.points = box;
ocr_results.emplace_back(std::move(res));
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
scale);
std::vector<PredictorOutput> results = _rec_predictor->infer();
OCRPredictResult res;
res.word_index = postprocess_rec_word_index(results.at(0));
if (res.word_index.empty()) {
continue;
}
LOGI("ocr_results finished %lu", ocr_results.size());
return ocr_results;
res.score = postprocess_rec_score(results.at(1));
res.points = box;
ocr_results.emplace_back(std::move(res));
}
LOGI("ocr_results finished %lu", ocr_results.size());
return ocr_results;
}
cv::Mat OCR_PPredictor::infer_cls(const cv::Mat &img, float thresh) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<OCRPredictResult> ocr_results;
PredictorInput input = _cls_predictor->get_first_input();
cv::Mat input_image = cls_resize_img(img);
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
dims[2] = input_image.rows;
dims[3] = input_image.cols;
input.set_dims(dims);
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
scale);
std::vector<PredictorOutput> results = _cls_predictor->infer();
const float *scores = results.at(0).get_float_data();
const int *labels = results.at(1).get_int_data();
for (int64_t i = 0; i < results.at(0).get_size(); i++) {
LOGI("output scores [%f]", scores[i]);
}
for (int64_t i = 0; i < results.at(1).get_size(); i++) {
LOGI("output label [%d]", labels[i]);
}
int label_idx = labels[0];
float score = scores[label_idx];
cv::Mat srcimg;
img.copyTo(srcimg);
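// An odd label index with a score above thresh is treated as an upside-down
// crop; rotate code 1 is cv::ROTATE_180.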
if (label_idx % 2 == 1 && score > thresh) {
cv::rotate(srcimg, srcimg, 1);
}
return srcimg;
}
std::vector<std::vector<std::vector<int>>>
OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size, int output_height,
int output_width, const cv::Mat &origin) {
const double threshold = 0.3;
const double maxvalue = 1;
cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F);
memcpy(pred_map.data, pred, pred_size * sizeof(float));
cv::Mat cbuf_map;
pred_map.convertTo(cbuf_map, CV_8UC1);
cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
std::vector<std::vector<std::vector<int>>> boxes = boxes_from_bitmap(pred_map, bit_map);
float ratio_h = output_height * 1.0f / origin.rows;
float ratio_w = output_width * 1.0f / origin.cols;
std::vector<std::vector<std::vector<int>>> filter_boxes = filter_tag_det_res(boxes, ratio_h,
ratio_w, origin);
return filter_boxes;
OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size,
int output_height, int output_width,
const cv::Mat &origin) {
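// Binarize the DB probability map, trace candidate boxes from the bitmap,
// then rescale and filter them back to the original image coordinates.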
const double threshold = 0.3;
const double maxvalue = 1;
cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F);
memcpy(pred_map.data, pred, pred_size * sizeof(float));
cv::Mat cbuf_map;
pred_map.convertTo(cbuf_map, CV_8UC1);
cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
std::vector<std::vector<std::vector<int>>> boxes =
boxes_from_bitmap(pred_map, bit_map);
float ratio_h = output_height * 1.0f / origin.rows;
float ratio_w = output_width * 1.0f / origin.cols;
std::vector<std::vector<std::vector<int>>> filter_boxes =
filter_tag_det_res(boxes, ratio_h, ratio_w, origin);
return filter_boxes;
}
std::vector<int> OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) {
const int *rec_idx = res.get_int_data();
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod();
std::vector<int>
OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) {
const int *rec_idx = res.get_int_data();
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod();
std::vector<int> pred_idx;
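// The output interleaves two values per step; walk the first LoD segment
// with a stride of 2 and keep the predicted character indices.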
for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) {
pred_idx.emplace_back(rec_idx[n]);
}
return pred_idx;
std::vector<int> pred_idx;
for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) {
pred_idx.emplace_back(rec_idx[n]);
}
return pred_idx;
}
float OCR_PPredictor::postprocess_rec_score(const PredictorOutput &res) {
const float *predict_batch = res.get_float_data();
const std::vector<int64_t> predict_shape = res.get_shape();
const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod();
int blank = predict_shape[1];
float score = 0.f;
int count = 0;
for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) {
int argmax_idx = argmax(predict_batch + n * predict_shape[1],
predict_batch + (n + 1) * predict_shape[1]);
float max_value = predict_batch[n * predict_shape[1] + argmax_idx];
if (blank - 1 - argmax_idx > 1e-5) {
score += max_value;
count += 1;
}
}
if (count == 0) {
LOGE("calc score count 0");
} else {
score /= count;
const float *predict_batch = res.get_float_data();
const std::vector<int64_t> predict_shape = res.get_shape();
const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod();
int blank = predict_shape[1];
float score = 0.f;
int count = 0;
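// Average the per-step argmax probabilities, skipping steps whose argmax is
// the last channel (treated here as the CTC blank).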
for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) {
int argmax_idx = argmax(predict_batch + n * predict_shape[1],
predict_batch + (n + 1) * predict_shape[1]);
float max_value = predict_batch[n * predict_shape[1] + argmax_idx];
if (blank - 1 - argmax_idx > 1e-5) {
score += max_value;
count += 1;
}
LOGI("calc score: %f", score);
return score;
}
if (count == 0) {
LOGE("calc score count 0");
} else {
score /= count;
}
LOGI("calc score: %f", score);
return score;
}
NET_TYPE OCR_PPredictor::get_net_flag() const {
return NET_OCR;
}
NET_TYPE OCR_PPredictor::get_net_flag() const { return NET_OCR; }
}
\ No newline at end of file
......@@ -4,109 +4,119 @@
#pragma once
#include <string>
#include "ppredictor.h"
#include <opencv2/opencv.hpp>
#include <paddle_api.h>
#include "ppredictor.h"
#include <string>
namespace ppredictor {
/**
* 配置
* Config
*/
struct OCR_Config {
int thread_num = 4; // 线程数
paddle::lite_api::PowerMode mode = paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode
int thread_num = 4; // Thread num
paddle::lite_api::PowerMode mode =
paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode
};
/**
* Recognition result for the image inside one detected quadrilateral (polygon result)
*/
struct OCRPredictResult {
std::vector<int> word_index; //
std::vector<std::vector<int>> points;
float score;
std::vector<int> word_index;
std::vector<std::vector<int>> points;
float score;
};
/**
* The OCR pipeline runs two main models, plus an optional angle classifier:
* 1. The first model (det) detects quadrilateral boxes around the text regions
* 2. The crops cut from the original image along these boxes are fed to the second model (rec) to recognize the text
* 3. The angle classifier (cls) may rotate each crop 180 degrees before recognition
*/
class OCR_PPredictor : public PPredictor_Interface {
public:
OCR_PPredictor(const OCR_Config &config);
virtual ~OCR_PPredictor() {
}
/**
* 初始化二个模型的Predictor
* @param det_model_content
* @param rec_model_content
* @return
*/
int init(const std::string &det_model_content, const std::string &rec_model_content);
int init_from_file(const std::string &det_model_path, const std::string &rec_model_path);
/**
* 返回OCR结果
* @param dims
* @param input_data
* @param input_len
* @param net_flag
* @param origin
* @return
*/
virtual std::vector<OCRPredictResult>
infer_ocr(const std::vector<int64_t> &dims, const float *input_data, int input_len,
int net_flag, cv::Mat &origin);
virtual NET_TYPE get_net_flag() const;
OCR_PPredictor(const OCR_Config &config);
virtual ~OCR_PPredictor() {}
/**
* Initialize the Predictors for the det, rec and cls models
* @param det_model_content
* @param rec_model_content
* @return
*/
int init(const std::string &det_model_content,
const std::string &rec_model_content,
const std::string &cls_model_content);
int init_from_file(const std::string &det_model_path,
const std::string &rec_model_path,
const std::string &cls_model_path);
/**
* Return OCR result
* @param dims
* @param input_data
* @param input_len
* @param net_flag
* @param origin
* @return
*/
virtual std::vector<OCRPredictResult>
infer_ocr(const std::vector<int64_t> &dims, const float *input_data,
int input_len, int net_flag, cv::Mat &origin);
virtual NET_TYPE get_net_flag() const;
private:
/**
* 从第一个模型的结果中计算有文字的四边形
* @param pred
* @param output_height
* @param output_width
* @param origin
* @return
*/
std::vector<std::vector<std::vector<int>>>
calc_filtered_boxes(const float *pred, int pred_size, int output_height, int output_width,
const cv::Mat &origin);
/**
* 第二个模型的推理
*
* @param boxes
* @param origin
* @return
*/
std::vector<OCRPredictResult>
infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes, const cv::Mat &origin);
/**
* 第二个模型提取文字的后处理
* @param res
* @return
*/
std::vector<int> postprocess_rec_word_index(const PredictorOutput &res);
/**
* 计算第二个模型的文字的置信度
* @param res
* @return
*/
float postprocess_rec_score(const PredictorOutput &res);
std::unique_ptr<PPredictor> _det_predictor;
std::unique_ptr<PPredictor> _rec_predictor;
OCR_Config _config;
/**
* Calculate the text polygons from the output map of the first (detection) model
* @param pred
* @param output_height
* @param output_width
* @param origin
* @return
*/
std::vector<std::vector<std::vector<int>>>
calc_filtered_boxes(const float *pred, int pred_size, int output_height,
int output_width, const cv::Mat &origin);
/**
* infer for second model
*
* @param boxes
* @param origin
* @return
*/
std::vector<OCRPredictResult>
infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &origin);
/**
* infer for cls model
*
* @param boxes
* @param origin
* @return
*/
cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.5);
/**
* Post-process the second (recognition) model output to extract the text
* @param res
* @return
*/
std::vector<int> postprocess_rec_word_index(const PredictorOutput &res);
/**
* calculate confidence of second model text result
* @param res
* @return
*/
float postprocess_rec_score(const PredictorOutput &res);
std::unique_ptr<PPredictor> _det_predictor;
std::unique_ptr<PPredictor> _rec_predictor;
std::unique_ptr<PPredictor> _cls_predictor;
OCR_Config _config;
};
}
......@@ -7,7 +7,7 @@
namespace ppredictor {
/**
* PaddleLite Preditor 通用接口
* PaddleLite Predictor common interface
*/
class PPredictor_Interface {
public:
......@@ -21,7 +21,7 @@ public:
};
/**
* 通用推理
* Common Predictor
*/
class PPredictor : public PPredictor_Interface {
public:
......@@ -33,9 +33,9 @@ public:
}
/**
* 初始化paddlitelite的opt模型,nb格式,与init_paddle二选一
* Initialize from a PaddleLite opt model (.nb format); use either this or init_paddle
* @param model_content
* @return 0 目前是固定值0, 之后其他值表示失败
* @return 0
*/
virtual int init_nb(const std::string &model_content);
......
......@@ -21,10 +21,10 @@ public:
const std::vector<std::vector<uint64_t>> get_lod() const;
const std::vector<int64_t> get_shape() const;
std::vector<float> data; // 通常是float返回,与下面的data_int二选一
std::vector<int> data_int; // 少数层是int返回,与 data二选一
std::vector<int64_t> shape; // PaddleLite输出层的shape
std::vector<std::vector<uint64_t>> lod; // PaddleLite输出层的lod
std::vector<float> data; // most layers return float data; mutually exclusive with data_int
std::vector<int> data_int; // a few layers return int data; mutually exclusive with data
std::vector<int64_t> shape; // PaddleLite output shape
std::vector<std::vector<uint64_t>> lod; // PaddleLite output lod
private:
std::unique_ptr<const paddle::lite_api::Tensor> _tensor;
......
......@@ -19,15 +19,16 @@ package com.baidu.paddle.lite.demo.ocr;
import android.content.res.Configuration;
import android.os.Bundle;
import android.preference.PreferenceActivity;
import android.support.annotation.LayoutRes;
import android.support.annotation.Nullable;
import android.support.v7.app.ActionBar;
import android.support.v7.app.AppCompatDelegate;
import android.support.v7.widget.Toolbar;
import android.view.MenuInflater;
import android.view.View;
import android.view.ViewGroup;
import androidx.annotation.LayoutRes;
import androidx.annotation.Nullable;
import androidx.appcompat.app.ActionBar;
import androidx.appcompat.app.AppCompatDelegate;
import androidx.appcompat.widget.Toolbar;
/**
* A {@link PreferenceActivity} which implements and proxies the necessary calls
* to be used with AppCompat.
......
......@@ -3,23 +3,22 @@ package com.baidu.paddle.lite.demo.ocr;
import android.Manifest;
import android.app.ProgressDialog;
import android.content.ContentResolver;
import android.content.Context;
import android.content.Intent;
import android.content.SharedPreferences;
import android.content.pm.PackageManager;
import android.database.Cursor;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.media.ExifInterface;
import android.net.Uri;
import android.os.Bundle;
import android.os.Environment;
import android.os.Handler;
import android.os.HandlerThread;
import android.os.Message;
import android.preference.PreferenceManager;
import android.provider.MediaStore;
import android.support.annotation.NonNull;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;
import android.support.v7.app.AppCompatActivity;
import android.text.method.ScrollingMovementMethod;
import android.util.Log;
import android.view.Menu;
......@@ -29,9 +28,17 @@ import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;
import androidx.annotation.NonNull;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;
import androidx.core.content.FileProvider;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
public class MainActivity extends AppCompatActivity {
private static final String TAG = MainActivity.class.getSimpleName();
......@@ -69,6 +76,7 @@ public class MainActivity extends AppCompatActivity {
protected float[] inputMean = new float[]{};
protected float[] inputStd = new float[]{};
protected float scoreThreshold = 0.1f;
private String currentPhotoPath;
protected Predictor predictor = new Predictor();
......@@ -368,18 +376,56 @@ public class MainActivity extends AppCompatActivity {
}
private void takePhoto() {
Intent takePhotoIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
if (takePhotoIntent.resolveActivity(getPackageManager()) != null) {
startActivityForResult(takePhotoIntent, TAKE_PHOTO_REQUEST_CODE);
Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
// Ensure that there's a camera activity to handle the intent
if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
// Create the File where the photo should go
File photoFile = null;
try {
photoFile = createImageFile();
} catch (IOException ex) {
Log.e("MainActitity", ex.getMessage(), ex);
Toast.makeText(MainActivity.this,
"Create Camera temp file failed: " + ex.getMessage(), Toast.LENGTH_SHORT).show();
}
// Continue only if the File was successfully created
if (photoFile != null) {
Log.i(TAG, "FILEPATH " + getExternalFilesDir("Pictures").getAbsolutePath());
Uri photoURI = FileProvider.getUriForFile(this,
"com.baidu.paddle.lite.demo.ocr.fileprovider",
photoFile);
currentPhotoPath = photoFile.getAbsolutePath();
takePictureIntent.putExtra(MediaStore.EXTRA_OUTPUT, photoURI);
startActivityForResult(takePictureIntent, TAKE_PHOTO_REQUEST_CODE);
Log.i(TAG, "startActivityForResult finished");
}
}
}
private File createImageFile() throws IOException {
// Create an image file name
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
String imageFileName = "JPEG_" + timeStamp + "_";
File storageDir = getExternalFilesDir(Environment.DIRECTORY_PICTURES);
File image = File.createTempFile(
imageFileName, /* prefix */
".bmp", /* suffix */
storageDir /* directory */
);
return image;
}
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (resultCode == RESULT_OK && data != null) {
if (resultCode == RESULT_OK) {
switch (requestCode) {
case OPEN_GALLERY_REQUEST_CODE:
if (data == null) {
break;
}
try {
ContentResolver resolver = getContentResolver();
Uri uri = data.getData();
......@@ -393,9 +439,22 @@ public class MainActivity extends AppCompatActivity {
}
break;
case TAKE_PHOTO_REQUEST_CODE:
Bundle extras = data.getExtras();
Bitmap image = (Bitmap) extras.get("data");
onImageChanged(image);
if (currentPhotoPath != null) {
ExifInterface exif = null;
try {
exif = new ExifInterface(currentPhotoPath);
} catch (IOException e) {
e.printStackTrace();
}
int orientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION,
ExifInterface.ORIENTATION_UNDEFINED);
Log.i(TAG, "rotation " + orientation);
Bitmap image = BitmapFactory.decodeFile(currentPhotoPath);
image = Utils.rotateBitmap(image, orientation);
onImageChanged(image);
} else {
Log.e(TAG, "currentPhotoPath is null");
}
break;
default:
break;
......
package com.baidu.paddle.lite.demo.ocr;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Build;
import android.os.Bundle;
import android.os.Handler;
import android.os.HandlerThread;
import android.os.Message;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;
import androidx.appcompat.app.AppCompatActivity;
import java.io.IOException;
import java.io.InputStream;
public class MiniActivity extends AppCompatActivity {
public static final int REQUEST_LOAD_MODEL = 0;
public static final int REQUEST_RUN_MODEL = 1;
public static final int REQUEST_UNLOAD_MODEL = 2;
public static final int RESPONSE_LOAD_MODEL_SUCCESSED = 0;
public static final int RESPONSE_LOAD_MODEL_FAILED = 1;
public static final int RESPONSE_RUN_MODEL_SUCCESSED = 2;
public static final int RESPONSE_RUN_MODEL_FAILED = 3;
private static final String TAG = "MiniActivity";
protected Handler receiver = null; // Receive messages from worker thread
protected Handler sender = null; // Send command to worker thread
protected HandlerThread worker = null; // Worker thread to load&run model
protected volatile Predictor predictor = null;
private String assetModelDirPath = "models/ocr_v1_for_cpu";
private String assetlabelFilePath = "labels/ppocr_keys_v1.txt";
private Button button;
private ImageView imageView; // image result
private TextView textView; // text result
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_mini);
Log.i(TAG, "SHOW in Logcat");
// Prepare the worker thread for model loading and inference
worker = new HandlerThread("Predictor Worker");
worker.start();
sender = new Handler(worker.getLooper()) {
public void handleMessage(Message msg) {
switch (msg.what) {
case REQUEST_LOAD_MODEL:
// Load model and reload test image
if (!onLoadModel()) {
runOnUiThread(new Runnable() {
@Override
public void run() {
Toast.makeText(MiniActivity.this, "Load model failed!", Toast.LENGTH_SHORT).show();
}
});
}
break;
case REQUEST_RUN_MODEL:
// Run model if model is loaded
final boolean isSuccessed = onRunModel();
runOnUiThread(new Runnable() {
@Override
public void run() {
if (isSuccessed){
onRunModelSuccessed();
}else{
Toast.makeText(MiniActivity.this, "Run model failed!", Toast.LENGTH_SHORT).show();
}
}
});
break;
}
}
};
sender.sendEmptyMessage(REQUEST_LOAD_MODEL); // corresponding to REQUEST_LOAD_MODEL, to call onLoadModel()
imageView = findViewById(R.id.imageView);
textView = findViewById(R.id.sample_text);
button = findViewById(R.id.button);
button.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
sender.sendEmptyMessage(REQUEST_RUN_MODEL);
}
});
}
@Override
protected void onDestroy() {
onUnloadModel();
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN_MR2) {
worker.quitSafely();
} else {
worker.quit();
}
super.onDestroy();
}
/**
* call in onCreate, model init
*
* @return
*/
private boolean onLoadModel() {
if (predictor == null) {
predictor = new Predictor();
}
return predictor.init(this, assetModelDirPath, assetlabelFilePath);
}
/**
* run the model on the bundled test image
* called from the worker thread when REQUEST_RUN_MODEL is received
*
* @return
*/
private boolean onRunModel() {
try {
String assetImagePath = "images/5.jpg";
InputStream imageStream = getAssets().open(assetImagePath);
Bitmap image = BitmapFactory.decodeStream(imageStream);
// Input is Bitmap
predictor.setInputImage(image);
return predictor.isLoaded() && predictor.runModel();
} catch (IOException e) {
e.printStackTrace();
return false;
}
}
private void onRunModelSuccessed() {
Log.i(TAG, "onRunModelSuccessed");
textView.setText(predictor.outputResult);
imageView.setImageBitmap(predictor.outputImage);
}
private void onUnloadModel() {
if (predictor != null) {
predictor.releaseModel();
}
}
}
......@@ -29,16 +29,16 @@ public class OCRPredictorNative {
public OCRPredictorNative(Config config) {
this.config = config;
loadLibrary();
nativePointer = init(config.detModelFilename, config.recModelFilename,
nativePointer = init(config.detModelFilename, config.recModelFilename,config.clsModelFilename,
config.cpuThreadNum, config.cpuPower);
Log.i("OCRPredictorNative", "load success " + nativePointer);
}
public void release(){
if (nativePointer != 0){
public void release() {
if (nativePointer != 0) {
nativePointer = 0;
destory(nativePointer);
// destory(nativePointer);
}
}
......@@ -55,10 +55,11 @@ public class OCRPredictorNative {
public String cpuPower;
public String detModelFilename;
public String recModelFilename;
public String clsModelFilename;
}
protected native long init(String detModelPath, String recModelPath, int threadNum, String cpuMode);
protected native long init(String detModelPath, String recModelPath,String clsModelPath, int threadNum, String cpuMode);
protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage);
......
......@@ -38,7 +38,7 @@ public class Predictor {
protected float scoreThreshold = 0.1f;
protected Bitmap inputImage = null;
protected Bitmap outputImage = null;
protected String outputResult = "";
protected volatile String outputResult = "";
protected float preprocessTime = 0;
protected float postprocessTime = 0;
......@@ -46,6 +46,16 @@ public class Predictor {
public Predictor() {
}
public boolean init(Context appCtx, String modelPath, String labelPath) {
isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode);
if (!isLoaded) {
return false;
}
isLoaded = loadLabel(appCtx, labelPath);
return isLoaded;
}
public boolean init(Context appCtx, String modelPath, String labelPath, int cpuThreadNum, String cpuPowerMode,
String inputColorFormat,
long[] inputShape, float[] inputMean,
......@@ -76,11 +86,7 @@ public class Predictor {
Log.e(TAG, "Only BGR color format is supported.");
return false;
}
isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode);
if (!isLoaded) {
return false;
}
isLoaded = loadLabel(appCtx, labelPath);
boolean isLoaded = init(appCtx, modelPath, labelPath);
if (!isLoaded) {
return false;
}
......@@ -115,7 +121,8 @@ public class Predictor {
config.cpuThreadNum = cpuThreadNum;
config.detModelFilename = realPath + File.separator + "ch_det_mv3_db_opt.nb";
config.recModelFilename = realPath + File.separator + "ch_rec_mv3_crnn_opt.nb";
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename);
config.clsModelFilename = realPath + File.separator + "cls_opt_arm.nb";
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename + ";" + config.clsModelFilename);
config.cpuPower = cpuPowerMode;
paddlePredictor = new OCRPredictorNative(config);
......@@ -127,12 +134,12 @@ public class Predictor {
}
public void releaseModel() {
if (paddlePredictor != null){
if (paddlePredictor != null) {
paddlePredictor.release();
paddlePredictor = null;
}
isLoaded = false;
cpuThreadNum = 4;
cpuThreadNum = 1;
cpuPowerMode = "LITE_POWER_HIGH";
modelPath = "";
modelName = "";
......@@ -222,7 +229,7 @@ public class Predictor {
for (int i = 0; i < warmupIterNum; i++) {
paddlePredictor.runImage(inputData, width, height, channels, inputImage);
}
warmupIterNum = 0; // 之后不要再warm了
warmupIterNum = 0; // no need to warm up again
// Run inference
start = new Date();
ArrayList<OcrResultModel> results = paddlePredictor.runImage(inputData, width, height, channels, inputImage);
......@@ -287,9 +294,7 @@ public class Predictor {
if (image == null) {
return;
}
// Scale image to the size of input tensor
Bitmap rgbaImage = image.copy(Bitmap.Config.ARGB_8888, true);
this.inputImage = rgbaImage;
this.inputImage = image.copy(Bitmap.Config.ARGB_8888, true);
}
private ArrayList<OcrResultModel> postprocess(ArrayList<OcrResultModel> results) {
......@@ -310,7 +315,7 @@ public class Predictor {
private void drawResults(ArrayList<OcrResultModel> results) {
StringBuffer outputResultSb = new StringBuffer("");
for (int i=0;i<results.size();i++) {
for (int i = 0; i < results.size(); i++) {
OcrResultModel result = results.get(i);
StringBuilder sb = new StringBuilder("");
sb.append(result.getLabel());
......@@ -319,8 +324,8 @@ public class Predictor {
for (Point p : result.getPoints()) {
sb.append("(").append(p.x).append(",").append(p.y).append(") ");
}
Log.i(TAG, sb.toString());
outputResultSb.append(i+1).append(": ").append(result.getLabel()).append("\n");
Log.i(TAG, sb.toString()); // show LOG in Logcat panel
outputResultSb.append(i + 1).append(": ").append(result.getLabel()).append("\n");
}
outputResult = outputResultSb.toString();
outputImage = inputImage;
......
......@@ -5,7 +5,8 @@ import android.os.Bundle;
import android.preference.CheckBoxPreference;
import android.preference.EditTextPreference;
import android.preference.ListPreference;
import android.support.v7.app.ActionBar;
import androidx.appcompat.app.ActionBar;
import java.util.ArrayList;
import java.util.List;
......
......@@ -2,6 +2,8 @@ package com.baidu.paddle.lite.demo.ocr;
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.Matrix;
import android.media.ExifInterface;
import android.os.Environment;
import java.io.*;
......@@ -110,4 +112,48 @@ public class Utils {
}
return Bitmap.createScaledBitmap(bitmap, newWidth, newHeight, true);
}
public static Bitmap rotateBitmap(Bitmap bitmap, int orientation) {
Matrix matrix = new Matrix();
switch (orientation) {
case ExifInterface.ORIENTATION_NORMAL:
return bitmap;
case ExifInterface.ORIENTATION_FLIP_HORIZONTAL:
matrix.setScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_180:
matrix.setRotate(180);
break;
case ExifInterface.ORIENTATION_FLIP_VERTICAL:
matrix.setRotate(180);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_TRANSPOSE:
matrix.setRotate(90);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_90:
matrix.setRotate(90);
break;
case ExifInterface.ORIENTATION_TRANSVERSE:
matrix.setRotate(-90);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_270:
matrix.setRotate(-90);
break;
default:
return bitmap;
}
try {
Bitmap bmRotated = Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), matrix, true);
bitmap.recycle();
return bmRotated;
}
catch (OutOfMemoryError e) {
e.printStackTrace();
return null;
}
}
}
<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
......@@ -96,4 +96,4 @@
</RelativeLayout>
</android.support.constraint.ConstraintLayout>
\ No newline at end of file
</androidx.constraintlayout.widget.ConstraintLayout>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!-- for MiniActivity Use Only -->
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintLeft_toRightOf="parent"
tools:context=".MainActivity">
<TextView
android:id="@+id/sample_text"
android:layout_width="0dp"
android:layout_height="wrap_content"
android:text="Hello World!"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
app:layout_constraintTop_toBottomOf="@id/imageView"
android:scrollbars="vertical"
/>
<ImageView
android:id="@+id/imageView"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:paddingTop="20dp"
android:paddingBottom="20dp"
app:layout_constraintBottom_toTopOf="@id/imageView"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
app:layout_constraintTop_toTopOf="parent"
tools:srcCompat="@tools:sample/avatars" />
<Button
android:id="@+id/button"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginBottom="4dp"
android:text="Button"
app:layout_constraintBottom_toBottomOf="parent"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
tools:layout_editor_absoluteX="161dp" />
</androidx.constraintlayout.widget.ConstraintLayout>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<paths xmlns:android="http://schemas.android.com/apk/res/android">
<external-files-path name="my_images" path="Pictures" />
</paths>
\ No newline at end of file
#Thu Aug 22 15:05:37 CST 2019
#Wed Jul 22 23:48:44 CST 2020
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
......
project(ocr_system CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_LIB "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
SET(TENSORRT_DIR "" CACHE PATH "Compile demo with TensorRT")
set(DEMO_NAME "ocr_system")
macro(safe_set_static_flag)
......@@ -15,24 +24,60 @@ macro(safe_set_static_flag)
endforeach(flag_var)
endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g -fpermissive")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("flags" ${CMAKE_CXX_FLAGS})
set(CMAKE_CXX_FLAGS_RELEASE "-O3")
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
if(NOT DEFINED OPENCV_DIR)
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
set(OPENCV_DIR ${OPENCV_DIR})
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
if (WIN32)
include_directories("${PADDLE_LIB}/paddle/fluid/inference")
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/fluid/inference")
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
else ()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/lib")
endif ()
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories("${PADDLE_LIB}/paddle/include")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O3 -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
message("flags" ${CMAKE_CXX_FLAGS})
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
......@@ -43,10 +88,12 @@ include_directories("${PADDLE_LIB}/third_party/eigen3")
include_directories("${CMAKE_SOURCE_DIR}/")
if (USE_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")
endif()
if (NOT WIN32)
if (WITH_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_DIR}/include")
link_directories("${TENSORRT_DIR}/lib")
endif()
endif(NOT WIN32)
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
......@@ -57,17 +104,24 @@ link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
if (WIN32)
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.lib
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
execute_process(COMMAND cp -r ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
endif ()
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
......@@ -82,24 +136,66 @@ else()
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread -lm")
if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
)
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash)
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB} ${OpenCV_LIBS})
if(WITH_GPU)
if (USE_TENSORRT)
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
if(NOT WIN32)
if (WITH_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
set(DEPS ${DEPS} ${EXTERNAL_LIB})
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
target_link_libraries(${DEMO_NAME} ${DEPS})
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()
\ No newline at end of file
# Visual Studio 2019 Community CMake Compilation Guide
PaddleOCR has been tested on Windows with `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform projects directly since `Visual Studio 2017`, but stable and complete support only arrived with the `2019` release, so if you want to build the project with CMake we recommend doing so in a `Visual Studio 2019` environment.
## Prerequisites
* Visual Studio 2019
* CUDA 9.0 / CUDA 10.0, cudnn 7+ (only required when using the GPU version of the inference library)
* CMake 3.0+
Please make sure the software above is installed; we use the Community edition of `VS2019`.
**All examples below use `D:\projects` as the working directory.**
### Step1: Download the PaddlePaddle C++ inference library fluid_inference
The PaddlePaddle C++ inference library provides prebuilt packages for different `CPU` and `CUDA` versions; please download the one that matches your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/windows_cpp_inference.html)
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and header files
|
├── third_party # third-party dependency libraries and header files
|
└── version.txt # version and build information
```
### Step2: Install and configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the OpenCV official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
- My Computer -> Properties -> Advanced system settings -> Environment Variables
- Find `Path` among the system variables (create it if it does not exist) and double-click to edit it
- Add a new entry with the opencv bin path and save, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step3: Build the CMake project directly with Visual Studio 2019
1. Open Visual Studio 2019 Community and click `Continue without code`
![step2](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)
2. Click `File` -> `Open` -> `CMake`
![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)
Select the directory containing the project code and open `CMakeList.txt`
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
3. Click `Project` -> `CMake settings for cpp_inference_demo`
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
4. Click `Browse` and set the build options to point to the `CUDA`, `CUDNN_LIB`, `OpenCV` and `Paddle inference library` paths
The build parameters are described below (`*` marks parameters that are only needed when using the **GPU version** of the inference library; keep the CUDA library versions aligned: **use CUDA 9.0 or 10.0, not 9.2, 10.1 or similar versions**):
| Parameter | Meaning |
| ---- | ---- |
| *CUDA_LIB | CUDA library path |
| *CUDNN_LIB | CUDNN library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_LIB | Paddle inference library path |
**Note:**
1. When using the `CPU` version of the inference library, uncheck `WITH_GPU`
2. When using the `openblas` version, uncheck `WITH_MKL`
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
**After the settings are done**, click `Save and generate CMake cache to load variables` in the figure above
5. Click `Build` -> `Build All`
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
### Step4: Inference and visualization
The executable produced by the `Visual Studio 2019` build above is located in the `out\build\x64-Release` directory. Open `cmd` and switch to that directory:
```
cd D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release
```
The executable `ocr_system.exe` is the sample inference program; its basic usage is as follows
```shell
# predict the image `D:\projects\PaddleOCR\doc\imgs\10.jpg`
.\ocr_system.exe D:\projects\PaddleOCR\deploy\cpp_infer\tools\config.txt D:\projects\PaddleOCR\doc\imgs\10.jpg
```
The first argument is the path of the config file and the second argument is the path of the image to predict.
### Note
* When running the exe in a Windows terminal, the output may appear garbled. In that case, enter `CHCP 65001` in the terminal to switch its encoding from GBK (the default) to UTF-8. A more detailed explanation can be found in this blog post: [https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359)
......@@ -41,13 +41,15 @@ public:
this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"]));
this->use_zero_copy_run = bool(stoi(config_map_["use_zero_copy_run"]));
this->max_side_len = stoi(config_map_["max_side_len"]);
this->det_db_thresh = stod(config_map_["det_db_thresh"]);
this->det_db_box_thresh = stod(config_map_["det_db_box_thresh"]);
this->det_db_box_thresh = stod(config_map_["det_db_box_thresh"]);
this->det_db_unclip_ratio = stod(config_map_["det_db_unclip_ratio"]);
this->det_model_dir.assign(config_map_["det_model_dir"]);
......@@ -55,6 +57,12 @@ public:
this->char_list_file.assign(config_map_["char_list_file"]);
this->use_angle_cls = bool(stoi(config_map_["use_angle_cls"]));
this->cls_model_dir.assign(config_map_["cls_model_dir"]);
this->cls_thresh = stod(config_map_["cls_thresh"]);
this->visualize = bool(stoi(config_map_["visualize"]));
}
......@@ -68,6 +76,8 @@ public:
bool use_mkldnn = false;
bool use_zero_copy_run = false;
int max_side_len = 960;
double det_db_thresh = 0.3;
......@@ -80,8 +90,14 @@ public:
std::string rec_model_dir;
bool use_angle_cls;
std::string char_list_file;
std::string cls_model_dir;
double cls_thresh;
bool visualize = true;
void PrintConfigInfo();
......
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"
#include "paddle_inference_api.h"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
#include <include/preprocess_op.h>
#include <include/utility.h>
namespace PaddleOCR {
class Classifier {
public:
explicit Classifier(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run,
const double &cls_thresh) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->cls_thresh = cls_thresh;
LoadModel(model_dir);
}
// Load Paddle inference model
void LoadModel(const std::string &model_dir);
cv::Mat Run(cv::Mat &img);
private:
std::shared_ptr<PaddlePredictor> predictor_;
bool use_gpu_ = false;
int gpu_id_ = 0;
int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
double cls_thresh = 0.5;
std::vector<float> mean_ = {0.5f, 0.5f, 0.5f};
std::vector<float> scale_ = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
bool is_scale_ = true;
// pre-process
ClsResizeImg resize_op_;
Normalize normalize_op_;
Permute permute_op_;
}; // class Classifier
} // namespace PaddleOCR
......@@ -39,8 +39,8 @@ public:
explicit DBDetector(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const int &max_side_len,
const double &det_db_thresh,
const bool &use_mkldnn, const bool &use_zero_copy_run,
const int &max_side_len, const double &det_db_thresh,
const double &det_db_box_thresh,
const double &det_db_unclip_ratio,
const bool &visualize) {
......@@ -49,6 +49,7 @@ public:
this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->max_side_len_ = max_side_len;
......@@ -75,6 +76,7 @@ private:
int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
int max_side_len_ = 960;
......
......@@ -27,6 +27,7 @@
#include <fstream>
#include <numeric>
#include <include/ocr_cls.h>
#include <include/postprocess_op.h>
#include <include/preprocess_op.h>
#include <include/utility.h>
......@@ -38,14 +39,17 @@ public:
explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const string &label_path) {
const bool &use_mkldnn, const bool &use_zero_copy_run,
const string &label_path) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->label_list_ = Utility::ReadDict(label_path);
this->label_list_.push_back(" ");
LoadModel(model_dir);
}
......@@ -53,7 +57,8 @@ public:
// Load Paddle inference model
void LoadModel(const std::string &model_dir);
void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img);
void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img,
Classifier *cls);
private:
std::shared_ptr<PaddlePredictor> predictor_;
......@@ -63,6 +68,7 @@ private:
int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
std::vector<std::string> label_list_;
......@@ -83,4 +89,4 @@ private:
}; // class CrnnRecognizer
} // namespace PaddleOCR
\ No newline at end of file
} // namespace PaddleOCR
......@@ -56,4 +56,10 @@ public:
const std::vector<int> &rec_image_shape = {3, 32, 320});
};
class ClsResizeImg {
public:
virtual void Run(const cv::Mat &img, cv::Mat &resize_img,
const std::vector<int> &rec_image_shape = {3, 32, 320});
};
} // namespace PaddleOCR
\ No newline at end of file
......@@ -7,6 +7,9 @@
### Preparation for running
- Linux environment; docker is recommended.
- Windows environment; building with `Visual Studio 2019 Community` is currently supported.
* This document mainly describes the PaddleOCR C++ inference workflow on Linux. For C++ inference based on the inference library under Windows, please refer to the [Windows compilation tutorial](./docs/windows_vs2019_build.md)
### 1.1 Compile the opencv library
......@@ -184,12 +187,15 @@ make -j
### Run the demo
* Run the following command to perform OCR detection and recognition on one image, and finally output
* Run the following command to perform OCR detection and recognition on one image
```shell
sh tools/run.sh
```
* To use the text direction classifier, set the `use_angle_cls` parameter in `tools/config.txt` to 1 to enable its prediction.
The detection results will finally be printed to the screen as follows.
<div align="center">
......
# Server-side C++ inference
In this tutorial, we will introduce the detailed steps of deploying PaddleOCR ultra-lightweight Chinese detection and recognition models on the server side.
## 1. Prepare the environment
### Environment
- Linux, docker is recommended.
### 1.1 Compile opencv
* First, download the opencv source code package from the opencv official website and compile it under Linux. Taking opencv 3.4.7 as an example, the download command is as follows.
```
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xf 3.4.7.tar.gz
```
Finally, you can see the folder of `opencv-3.4.7/` in the current directory.
* To compile opencv, first set the opencv source path (`root_path`) and installation path (`install_path`) yourself, then enter the opencv source directory and build as follows.
```shell
root_path=your_opencv_root_path
install_path=${root_path}/opencv3
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
Among them, `root_path` is the downloaded opencv source code path, and `install_path` is the installation path of opencv. After `make install` is completed, the opencv header file and library file will be generated in this folder for later OCR source code compilation.
The final file structure under the opencv installation path is as follows.
```
opencv3/
|-- bin
|-- include
|-- lib
|-- lib64
|-- share
```
### 1.2 Compile or download the Paddle inference library
* There are 2 ways to obtain the Paddle inference library, described in detail below.
#### 1.2.1 Compile from the source code
* If you want to get the latest Paddle inference library features, you can download the latest code from Paddle github repository and compile the inference library from the source code.
* You can refer to the [Paddle inference library](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) documentation to get the Paddle source code from github and then compile it to generate the latest inference library. Use git to clone the code as follows.
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
```
* After entering the Paddle directory, the compilation method is as follows.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
* After the compilation process, you can see the following files in the folder of `build/fluid_inference_install_dir/`.
```
build/fluid_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
#### 1.2.2 Direct download and installation
* Different cuda versions of the Linux inference library (based on GCC 4.8.2) are provided on the
[Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html). You can view and select the appropriate version of the inference library on the official website.
* After downloading, use the following method to uncompress.
```
tar -xf fluid_inference.tgz
```
Finally you can see the following files in the folder of `fluid_inference/`, with the same `paddle`, `third_party` and `version.txt` layout as the compiled library described above.
## 2. Compile and run the demo
### 2.1 Export the inference model
* You can refer to [Model inference](../../doc/doc_ch/inference.md) to export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
```
inference/
|-- det_db
| |--model
| |--params
|-- rec_rcnn
| |--model
| |--params
```
### 2.2 Compile PaddleOCR C++ inference demo
* The compilation commands are as follows. The paths of the Paddle C++ inference library, opencv and other dependencies need to be replaced with the actual paths on your own machine.
```shell
sh tools/build.sh
```
Specifically, the content in `tools/build.sh` is as follows.
```shell
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=ocr_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR} \
make -j
```
`OPENCV_DIR` is the opencv installation path; `LIB_DIR` is the downloaded (`fluid_inference` folder) or self-compiled Paddle inference library path (`build/fluid_inference_install_dir` folder); `CUDA_LIB_DIR` is the cuda library file path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cudnn library file path, which in docker is `/usr/lib/x86_64-linux-gnu/`.
* After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder.
### Run the demo
* Execute the following command to complete the OCR recognition and detection of an image.
```shell
sh tools/run.sh
```
* If you want the orientation classifier to correct the detected boxes, set `use_angle_cls` in the file `tools/config.txt` to 1 to enable it.
The detection results will be shown on the screen as follows.
<div align="center">
<img src="../imgs/cpp_infer_pred_12.png" width="600">
</div>
### 2.3 Note
* `MKLDNN` is disabled by default for C++ inference (`use_mkldnn` in `tools/config.txt` is set to 0). If you need MKLDNN to accelerate inference, modify `use_mkldnn` to 1 and compile the inference library with the latest Paddle source code. When using MKLDNN for CPU prediction and predicting multiple images in a row, there is a memory leak (the problem does not occur when MKLDNN is disabled). The problem is currently being fixed; the temporary workaround is to re-initialize the recognition class (`CRNNRecognizer`) and the detection class (`DBDetector`) roughly every 30 images when predicting multiple images.
......@@ -44,7 +44,7 @@ Config::LoadConfig(const std::string &config_path) {
std::map<std::string, std::string> dict;
for (int i = 0; i < config.size(); i++) {
// pass for empty line or comment
if (config[i].size() <= 1 or config[i][0] == '#') {
if (config[i].size() <= 1 || config[i][0] == '#') {
continue;
}
std::vector<std::string> res = split(config[i], " ");
......
......@@ -48,20 +48,30 @@ int main(int argc, char **argv) {
cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR);
DBDetector det(config.det_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.max_side_len, config.det_db_thresh,
config.det_db_box_thresh, config.det_db_unclip_ratio,
config.visualize);
DBDetector det(
config.det_model_dir, config.use_gpu, config.gpu_id, config.gpu_mem,
config.cpu_math_library_num_threads, config.use_mkldnn,
config.use_zero_copy_run, config.max_side_len, config.det_db_thresh,
config.det_db_box_thresh, config.det_db_unclip_ratio, config.visualize);
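// The angle classifier is optional: it is only constructed below when
// use_angle_cls is enabled in tools/config.txt.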
Classifier *cls = nullptr;
if (config.use_angle_cls == true) {
cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.use_zero_copy_run,
config.cls_thresh);
}
CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.char_list_file);
config.use_mkldnn, config.use_zero_copy_run,
config.char_list_file);
auto start = std::chrono::system_clock::now();
std::vector<std::vector<std::vector<int>>> boxes;
det.Run(srcimg, boxes);
rec.Run(boxes, srcimg);
rec.Run(boxes, srcimg, cls);
auto end = std::chrono::system_clock::now();
auto duration =
......
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <include/ocr_cls.h>
namespace PaddleOCR {
cv::Mat Classifier::Run(cv::Mat &img) {
cv::Mat src_img;
img.copyTo(src_img);
cv::Mat resize_img;
std::vector<int> rec_image_shape = {3, 32, 100};
int index = 0;
float wh_ratio = float(img.cols) / float(img.rows);
this->resize_op_.Run(img, resize_img, rec_image_shape);
this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
this->is_scale_);
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data());
// Inference.
if (this->use_zero_copy_run_) {
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<float> softmax_out;
std::vector<int64_t> label_out;
auto output_names = this->predictor_->GetOutputNames();
auto softmax_out_t = this->predictor_->GetOutputTensor(output_names[0]);
auto label_out_t = this->predictor_->GetOutputTensor(output_names[1]);
auto softmax_shape_out = softmax_out_t->shape();
auto label_shape_out = label_out_t->shape();
int softmax_out_num =
std::accumulate(softmax_shape_out.begin(), softmax_shape_out.end(), 1,
std::multiplies<int>());
int label_out_num =
std::accumulate(label_shape_out.begin(), label_shape_out.end(), 1,
std::multiplies<int>());
softmax_out.resize(softmax_out_num);
label_out.resize(label_out_num);
softmax_out_t->copy_to_cpu(softmax_out.data());
label_out_t->copy_to_cpu(label_out.data());
int label = label_out[0];
float score = softmax_out[label];
// std::cout << "\nlabel "<<label<<" score: "<<score;
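// An odd label means the text line is upside down; rotate it back by 180
// degrees (flag 1 == cv::ROTATE_180) when the classifier score is high enough.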
if (label % 2 == 1 && score > this->cls_thresh) {
cv::rotate(src_img, src_img, 1);
}
return src_img;
}
void Classifier::LoadModel(const std::string &model_dir) {
AnalysisConfig config;
config.SetModel(model_dir + "/model", model_dir + "/params");
if (this->use_gpu_) {
config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
} else {
config.DisableGpu();
if (this->use_mkldnn_) {
config.EnableMKLDNN();
}
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
}
// false for zero copy tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input
config.SwitchSpecifyInputNames(true);
config.SwitchIrOptim(true);
config.EnableMemoryOptim();
config.DisableGlogInfo();
this->predictor_ = CreatePaddlePredictor(config);
}
} // namespace PaddleOCR
......@@ -26,12 +26,15 @@ void DBDetector::LoadModel(const std::string &model_dir) {
config.DisableGpu();
if (this->use_mkldnn_) {
config.EnableMKLDNN();
// cache 10 different shapes for mkldnn to avoid memory leak
config.SetMkldnnCacheCapacity(10);
}
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
}
// false for zero copy tensor
config.SwitchUseFeedFetchOps(false);
// true for commom tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input
config.SwitchSpecifyInputNames(true);
......@@ -59,12 +62,22 @@ void DBDetector::Run(cv::Mat &img,
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data());
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
// Inference.
if (this->use_zero_copy_run_) {
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<float> out_data;
auto output_names = this->predictor_->GetOutputNames();
......@@ -95,9 +108,11 @@ void DBDetector::Run(cv::Mat &img,
const double maxvalue = 255;
cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
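// Slightly dilate the binarized map with a 2x2 kernel so that neighbouring
// text pixels are connected before boxes are extracted from it.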
cv::Mat dilation_map;
cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2,2));
cv::dilate(bit_map, dilation_map, dila_ele);
boxes = post_processor_.BoxesFromBitmap(
pred_map, bit_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_);
pred_map, dilation_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_);
boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg);
......
......@@ -17,7 +17,7 @@
namespace PaddleOCR {
void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
cv::Mat &img) {
cv::Mat &img, Classifier *cls) {
cv::Mat srcimg;
img.copyTo(srcimg);
cv::Mat crop_img;
......@@ -27,6 +27,9 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
int index = 0;
for (int i = boxes.size() - 1; i >= 0; i--) {
crop_img = GetRotateCropImage(srcimg, boxes[i]);
if (cls != nullptr) {
crop_img = cls->Run(crop_img);
}
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
......@@ -39,18 +42,29 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
this->permute_op_.Run(&resize_img, input.data());
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
// Inference.
if (this->use_zero_copy_run_) {
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<int64_t> rec_idx;
auto output_names = this->predictor_->GetOutputNames();
auto output_t = this->predictor_->GetOutputTensor(output_names[0]);
auto rec_idx_lod = output_t->lod();
auto shape_out = output_t->shape();
int out_num = std::accumulate(shape_out.begin(), shape_out.end(), 1,
std::multiplies<int>());
......@@ -115,12 +129,15 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
config.DisableGpu();
if (this->use_mkldnn_) {
config.EnableMKLDNN();
// cache 10 different shapes for mkldnn to avoid memory leak
config.SetMkldnnCacheCapacity(10);
}
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
}
// false for zero copy tensor
config.SwitchUseFeedFetchOps(false);
// true for commom tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input
config.SwitchSpecifyInputNames(true);
......
......@@ -219,7 +219,7 @@ PostProcessor::BoxesFromBitmap(const cv::Mat pred, const cv::Mat bitmap,
std::vector<std::vector<std::vector<int>>> boxes;
for (int _i = 0; _i < num_contours; _i++) {
if (contours[_i].size() <= 0) {
if (contours[_i].size() <= 2) {
continue;
}
float ssid;
......@@ -294,7 +294,7 @@ PostProcessor::FilterTagDetRes(std::vector<std::vector<std::vector<int>>> boxes,
pow(boxes[n][0][1] - boxes[n][1][1], 2)));
rect_height = int(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) +
pow(boxes[n][0][1] - boxes[n][3][1], 2)));
if (rect_width <= 10 || rect_height <= 10)
if (rect_width <= 4 || rect_height <= 4)
continue;
root_points.push_back(boxes[n]);
}
......
......@@ -116,4 +116,26 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio,
cv::INTER_LINEAR);
}
void ClsResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
const std::vector<int> &rec_image_shape) {
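// Resize the crop to the classifier input height while keeping the aspect
// ratio, then pad the right side with black pixels up to the target width.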
int imgC, imgH, imgW;
imgC = rec_image_shape[0];
imgH = rec_image_shape[1];
imgW = rec_image_shape[2];
float ratio = float(img.cols) / float(img.rows);
int resize_w, resize_h;
if (ceilf(imgH * ratio) > imgW)
resize_w = imgW;
else
resize_w = int(ceilf(imgH * ratio));
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
cv::INTER_LINEAR);
if (resize_w < imgW) {
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, imgW - resize_w,
cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
}
}
} // namespace PaddleOCR
\ No newline at end of file
......@@ -39,22 +39,21 @@ std::vector<std::string> Utility::ReadDict(const std::string &path) {
void Utility::VisualizeBboxes(
const cv::Mat &srcimg,
const std::vector<std::vector<std::vector<int>>> &boxes) {
cv::Point rook_points[boxes.size()][4];
for (int n = 0; n < boxes.size(); n++) {
for (int m = 0; m < boxes[0].size(); m++) {
rook_points[n][m] = cv::Point(int(boxes[n][m][0]), int(boxes[n][m][1]));
}
}
cv::Mat img_vis;
srcimg.copyTo(img_vis);
for (int n = 0; n < boxes.size(); n++) {
const cv::Point *ppt[1] = {rook_points[n]};
cv::Point rook_points[4];
for (int m = 0; m < boxes[n].size(); m++) {
rook_points[m] = cv::Point(int(boxes[n][m][0]), int(boxes[n][m][1]));
}
const cv::Point *ppt[1] = {rook_points};
int npt[] = {4};
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
}
cv::imwrite("./ocr_vis.png", img_vis);
std::cout << "The detection visualized image saved in ./ocr_vis.png.pn"
std::cout << "The detection visualized image saved in ./ocr_vis.png"
<< std::endl;
}
......
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=/your_cudnn_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
......@@ -11,7 +10,6 @@ cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=ocr_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
......
......@@ -3,20 +3,25 @@ use_gpu 0
gpu_id 0
gpu_mem 4000
cpu_math_library_num_threads 10
use_mkldnn 0
use_mkldnn 1
use_zero_copy_run 1
# det config
max_side_len 960
det_db_thresh 0.3
det_db_box_thresh 0.5
det_db_unclip_ratio 2.0
det_db_unclip_ratio 1.6
det_model_dir ./inference/det_db
# cls config
use_angle_cls 0
cls_model_dir ./inference/cls
cls_thresh 0.9
# rec config
rec_model_dir ./inference/rec_crnn
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt
img_path ../../doc/imgs/11.jpg
# show the detection results
visualize 0
visualize 1
English | [简体中文](README_cn.md)
## Introduction
Many users hope to package the PaddleOCR service into a docker image so that it can be quickly released and used in a docker or k8s environment.
This page provides some standardized code to achieve this goal. You can quickly publish the PaddleOCR project as a callable Restful API service through the following steps. (At present, only deployment based on the HubServing mode is implemented; the author plans to add deployment based on the PaddleServing mode in the future.)
## 1. Prerequisites
You need to install the following basic components first:
a. Docker
b. Graphics driver and CUDA 10.0+(GPU)
c. NVIDIA Container Toolkit(GPU,Docker 19.03+ can skip this)
d. cuDNN 7.6+(GPU)
## 2. Build Image
a. Download the PaddleOCR source code
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
```
b. Go to the Dockerfile directory (note: the cpu and gpu versions are distinguished; the following takes cpu as an example, for the gpu version simply replace the keyword)
```
cd deploy/docker/cpu
```
c. Build image
```
docker build -t paddleocr:cpu .
```
## 3. Start container
a. CPU version
```
sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
```
b. GPU version (based on NVIDIA Container Toolkit)
```
sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
```
c. GPU version (Docker 19.03+)
```
sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
```
d. Check the service status (if you see messages like `Successfully installed ocr_system` and `Running on http://0.0.0.0:8866/`, the service has started successfully)
```
docker logs -f paddle_ocr
```
## 4. Test
a. Calculate the Base64 encoding of the image to be recognized (for a quick test you can use a free online tool such as https://freeonlinetools24.com/base64-image/, or the short Python sketch at the end of this section)
b. Post a service request (a sample request can be found in sample_request.txt)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"Input image Base64 encode(need to delete the code 'data:image/jpg;base64,')\"]}" http://localhost:8866/predict/ocr_system
```
c. Get the response (if the call is successful, the following result will be returned)
```
{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
```
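As an alternative to the curl command above, the request can also be assembled in Python. The following is only a minimal sketch: it assumes the container from step 3 is listening on localhost:8866, that the `requests` package is installed, and it uses `doc/imgs/11.jpg` from the repository as the test image.
```python
# Minimal sketch: send one image to the dockerized ocr_system service.
import base64
import requests

with open("doc/imgs/11.jpg", "rb") as f:
    # The service expects plain Base64 without the "data:image/jpg;base64," prefix.
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8866/predict/ocr_system",
    json={"images": [img_b64]},
)
# On success the body looks like {"msg": "", "results": [[...]], "status": "0"}.
print(resp.json())
```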
[English](README.md) | 简体中文
## Docker化部署服务
在日常项目应用中,相信大家一般都会希望能通过Docker技术,把PaddleOCR服务打包成一个镜像,以便在Docker或k8s环境里,快速发布上线使用。
本文将提供一些标准化的代码来实现这样的目标。大家通过如下步骤可以把PaddleOCR项目快速发布成可调用的Restful API服务。(目前暂时先实现了基于HubServing模式的部署,后续作者计划增加PaddleServing模式的部署)
## 1.实施前提准备
需要先完成如下基本组件的安装:
a. Docker环境
b. 显卡驱动和CUDA 10.0+(GPU)
c. NVIDIA Container Toolkit(GPU,Docker 19.03以上版本可以跳过此步)
d. cuDNN 7.6+(GPU)
## 2.制作镜像
a.下载PaddleOCR项目代码
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
```
b.切换至Dockerfile目录(注:需要区分cpu或gpu版本,下文以cpu为例,gpu版本需要替换一下关键字即可)
```
cd deploy/docker/cpu
```
c.生成镜像
```
docker build -t paddleocr:cpu .
```
## 3.启动Docker容器
a. CPU 版本
```
sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
```
b. GPU 版本 (通过NVIDIA Container Toolkit)
```
sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
```
c. GPU 版本 (Docker 19.03以上版本,可以直接用如下命令)
```
sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
```
d. 检查服务运行情况(出现:Successfully installed ocr_system和Running on http://0.0.0.0:8866/等信息,表示运行成功)
```
docker logs -f paddle_ocr
```
## 4.测试服务
a. 计算待识别图片的Base64编码(如果只是测试一下效果,可以通过免费的在线工具实现,如:http://tool.chinaz.com/tools/imgtobase/)
b. 发送服务请求(可参见sample_request.txt中的值)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"填入图片Base64编码(需要删除'data:image/jpg;base64,')\"]}" http://localhost:8866/predict/ocr_system
```
c. 返回结果(如果调用成功,会返回如下结果)
```
{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
```
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev
# PaddleOCR base on Python3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN python3.7 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR
WORKDIR /PaddleOCR
RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN mkdir -p /PaddleOCR/inference
# Download the ocr detection model (light version). If you want to use the normal version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also remember to change det_model_dir in deploy/hubserving/ocr_system/params.py accordingly.
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
# Download the ocr recognition model (light version). If you want to use the normal version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also remember to change rec_model_dir in deploy/hubserving/ocr_system/params.py accordingly.
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
EXPOSE 8866
CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
\ No newline at end of file
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev
# PaddleOCR base on Python3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN python3.7 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR
WORKDIR /home/PaddleOCR
RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN mkdir -p /PaddleOCR/inference
# Download the ocr detection model (light version). If you want to use the normal version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also remember to change det_model_dir in deploy/hubserving/ocr_system/params.py accordingly.
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
# Download the ocr recognition model (light version). If you want to use the normal version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also remember to change rec_model_dir in deploy/hubserving/ocr_system/params.py accordingly.
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
EXPOSE 8866
CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
\ No newline at end of file
......@@ -31,7 +31,7 @@ from tools.infer.predict_det import TextDetector
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRDet(hub.Module):
def _initialize(self, use_gpu=False):
def _initialize(self, use_gpu=False, enable_mkldnn=False):
"""
initialize with the necessary elements
"""
......@@ -51,6 +51,7 @@ class OCRDet(hub.Module):
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim = True
cfg.enable_mkldnn = enable_mkldnn
self.text_detector = TextDetector(cfg)
......
......@@ -13,7 +13,7 @@ def read_params():
#params for text detector
cfg.det_algorithm = "DB"
cfg.det_model_dir = "./inference/ch_det_mv3_db/"
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
cfg.det_max_side_len = 960
#DB parmas
......@@ -36,4 +36,6 @@ def read_params():
# cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
# cfg.use_space_char = True
return cfg
\ No newline at end of file
cfg.use_zero_copy_run = False
return cfg
......@@ -31,7 +31,7 @@ from tools.infer.predict_rec import TextRecognizer
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRRec(hub.Module):
def _initialize(self, use_gpu=False):
def _initialize(self, use_gpu=False, enable_mkldnn=False):
"""
initialize with the necessary elements
"""
......@@ -51,6 +51,7 @@ class OCRRec(hub.Module):
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim = True
cfg.enable_mkldnn = enable_mkldnn
self.text_recognizer = TextRecognizer(cfg)
......
......@@ -28,12 +28,24 @@ def read_params():
#params for text recognizer
cfg.rec_algorithm = "CRNN"
cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
cfg.use_space_char = True
return cfg
\ No newline at end of file
#params for text classifier
cfg.use_angle_cls = True
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
cfg.use_zero_copy_run = False
return cfg
......@@ -31,7 +31,7 @@ from tools.infer.predict_system import TextSystem
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRSystem(hub.Module):
def _initialize(self, use_gpu=False):
def _initialize(self, use_gpu=False, enable_mkldnn=False):
"""
initialize with the necessary elements
"""
......@@ -51,7 +51,8 @@ class OCRSystem(hub.Module):
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim = True
cfg.enable_mkldnn = enable_mkldnn
self.text_sys = TextSystem(cfg)
def read_images(self, paths=[]):
......
......@@ -10,10 +10,10 @@ class Config(object):
def read_params():
cfg = Config()
#params for text detector
cfg.det_algorithm = "DB"
cfg.det_model_dir = "./inference/ch_det_mv3_db/"
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
cfg.det_max_side_len = 960
#DB parmas
......@@ -28,12 +28,24 @@ def read_params():
#params for text recognizer
cfg.rec_algorithm = "CRNN"
cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
cfg.use_space_char = True
return cfg
\ No newline at end of file
#params for text classifier
cfg.use_angle_cls = True
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
cfg.use_zero_copy_run = False
return cfg
# 服务部署
[English](readme_en.md) | 简体中文
PaddleOCR提供2种服务部署方式:
- 基于HubServing的部署:已集成到PaddleOCR中([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)),按照本教程使用;
- 基于PaddleServing的部署:详见PaddleServing官网[demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr),后续也将集成到PaddleOCR。
- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用;
- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../pdserving/readme.md)
服务部署目录下包括检测、识别、2阶段串联三种服务包,根据需求选择相应的服务包进行安装和启动。目录如下:
# 基于PaddleHub Serving的服务部署
hubserving服务部署目录下包括检测、识别、2阶段串联三种服务包,请根据需求选择相应的服务包进行安装和启动。目录结构如下:
```
deploy/hubserving/
└─ ocr_det 检测模块服务包
......@@ -28,23 +30,51 @@ deploy/hubserving/ocr_system/
# 安装paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# 设置环境变量
# 在Linux下设置环境变量
export PYTHONPATH=.
```
### 2. 安装服务模块
PaddleOCR提供3种服务模块,根据需要安装所需模块。如:
# 或者,在Windows下设置环境变量
SET PYTHONPATH=.
```
安装检测服务模块:
```hub install deploy/hubserving/ocr_det/```
### 2. 下载推理模型
安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v1.1版的超轻量模型,默认模型路径为:
```
检测模型:./inference/ch_ppocr_mobile_v1.1_det_infer/
识别模型:./inference/ch_ppocr_mobile_v1.1_rec_infer/
方向分类器:./inference/ch_ppocr_mobile_v1.1_cls_infer/
```
或,安装识别服务模块:
```hub install deploy/hubserving/ocr_rec/```
**模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的[模型库](../../doc/doc_ch/models_list.md)下载,也可以替换成自己训练转换好的模型。
或,安装检测+识别串联服务模块:
```hub install deploy/hubserving/ocr_system/```
### 3. 安装服务模块
PaddleOCR提供3种服务模块,根据需要安装所需模块。
### 3. 启动服务
* 在Linux环境下,安装示例如下:
```shell
# 安装检测服务模块:
hub install deploy/hubserving/ocr_det/
# 或,安装识别服务模块:
hub install deploy/hubserving/ocr_rec/
# 或,安装检测+识别串联服务模块:
hub install deploy/hubserving/ocr_system/
```
* 在Windows环境下(文件夹的分隔符为`\`),安装示例如下:
```shell
# 安装检测服务模块:
hub install deploy\hubserving\ocr_det\
# 或,安装识别服务模块:
hub install deploy\hubserving\ocr_rec\
# 或,安装检测+识别串联服务模块:
hub install deploy\hubserving\ocr_system\
```
### 4. 启动服务
#### 方式1. 命令行命令启动(仅支持CPU)
**启动命令:**
```shell
......@@ -69,9 +99,9 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
#### 方式2. 配置文件启动(支持CPU、GPU)
**启动命令:**
```hub serving start --config/-c config.json```
```hub serving start -c config.json```
其中,`config.json`格式如下:
其中,`config.json`格式如下:
```python
{
"modules_info": {
......@@ -96,6 +126,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
**注意:**
- 使用配置文件启动服务时,其他参数会被忽略。
- 如果使用GPU预测(即,`use_gpu`置为`true`),则需要在启动服务之前,设置CUDA_VISIBLE_DEVICES环境变量,如:```export CUDA_VISIBLE_DEVICES=0```,否则不用设置。
- **`use_gpu`不可与`use_multiprocess`同时为`true`**。
如,使用GPU 3号卡启动串联服务:
```shell
......@@ -120,6 +151,25 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
访问示例:
```python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/```
## 返回结果格式说明
返回结果为列表(list),列表中的每一项为词典(dict),词典一共可能包含3种字段,信息如下:
|字段名称|数据类型|意义|
|-|-|-|
|text|str|文本内容|
|confidence|float| 文本识别置信度|
|text_region|list|文本位置坐标|
不同模块返回的字段不同,如,文本识别服务模块返回结果不含`text_region`字段,具体信息如下:
|字段名/模块名|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
**说明:** 如果需要增加、删除、修改返回字段,可在相应模块的`module.py`文件中进行修改,完整流程参考下一节自定义修改服务模块。
## 自定义修改服务模块
如果需要修改服务逻辑,你一般需要操作以下步骤(以修改`ocr_system`为例):
......@@ -127,7 +177,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
```hub serving stop --port/-p XXXX```
- 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。
例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。
例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,如果需要关闭文本方向分类器,则将参数`use_angle_cls`置为`False`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。**
- 3、 卸载旧服务包
```hub uninstall ocr_system```
......@@ -137,4 +187,3 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
- 5、重新启动服务
```hub serving start -m ocr_system```
English | [简体中文](readme.md)
PaddleOCR provides 2 service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../pdserving/readme_en.md) for usage.
# Service deployment based on PaddleHub Serving
The hubserving service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Please select the corresponding service package to install and start service according to your needs. The directory is as follows:
```
deploy/hubserving/
└─ ocr_det detection module service package
└─ ocr_rec recognition module service package
└─ ocr_system two-stage series connection service package
```
Each service pack contains 3 files. Take the 2-stage series connection service package as an example, the directory is as follows:
```
deploy/hubserving/ocr_system/
└─ __init__.py Empty file, required
└─ config.json Configuration file, optional, passed in as a parameter when using configuration to start the service
└─ module.py Main module file, required, contains the complete logic of the service
└─ params.py Parameter file, required, including parameters such as model path, pre- and post-processing parameters
```
## Quick start service
The following steps take the 2-stage series service as an example. If only the detection service or recognition service is needed, replace the corresponding file path.
### 1. Prepare the environment
```shell
# Install paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# Set environment variables on Linux
export PYTHONPATH=.
# Set environment variables on Windows
SET PYTHONPATH=.
```
### 2. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
```
**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
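For orientation, the model-related part of `deploy/hubserving/ocr_system/params.py` after this change looks roughly like the abridged sketch below (reconstructed from the diff earlier in this change; other fields are omitted). Point the paths at your own converted models if you are not using the default v1.1 ultra-lightweight models.
```python
# Abridged sketch of deploy/hubserving/ocr_system/params.py (model-related fields only).
class Config(object):
    pass

def read_params():
    cfg = Config()
    # Detection and recognition model directories (default: v1.1 ultra-lightweight models).
    cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
    cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
    # Text direction classifier; set use_angle_cls to False to disable it.
    cfg.use_angle_cls = True
    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
    cfg.cls_thresh = 0.9
    return cfg
```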
### 3. Install Service Module
PaddleOCR provides 3 kinds of service modules, install the required modules according to your needs.
* On Linux platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy/hubserving/ocr_det/
# Or, install the recognition service module:
hub install deploy/hubserving/ocr_rec/
# Or, install the 2-stage series service module:
hub install deploy/hubserving/ocr_system/
```
* On Windows platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy\hubserving\ocr_det\
# Or, install the recognition service module:
hub install deploy\hubserving\ocr_rec\
# Or, install the 2-stage series service module:
hub install deploy\hubserving\ocr_system\
```
### 4. Start service
#### Way 1. Start with command line parameters (CPU only)
**start command:**
```shell
$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
--port XXXX \
--use_multiprocess \
--workers \
```
**parameters:**
|parameters|usage|
|-|-|
|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port, default is 8866|
|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For example, start the 2-stage series service:
```shell
hub serving start -m ocr_system
```
This completes the deployment of a service API, using the default port number 8866.
#### Way 2. Start with configuration file (CPU, GPU)
**start command:**
```shell
hub serving start --config/-c config.json
```
The format of `config.json` is as follows:
```python
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"use_gpu": true
},
"predict_args": {
}
}
},
"port": 8868,
"use_multiprocess": false,
"workers": 2
}
```
- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, it means that the GPU is used to start the service**.
- The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
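Concretely, after this change the `init_args` keys accepted by the `ocr_system` module are the keyword arguments of its `_initialize` method, i.e. `use_gpu` and `enable_mkldnn` (abridged excerpt below, taken from the `module.py` diff in this change):
```python
# Signature of the service module's _initialize after this change (abridged from
# deploy/hubserving/ocr_system/module.py); config.json "init_args" keys map onto
# these keyword arguments.
def _initialize(self, use_gpu=False, enable_mkldnn=False):
    """initialize with the necessary elements"""
```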
**Note:**
- When using the configuration file to start the service, other parameters will be ignored.
- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: ```export CUDA_VISIBLE_DEVICES=0```, otherwise you do not need to set it.
- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
For example, use GPU card No. 3 to start the 2-stage series service:
```shell
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/ocr_system/config.json
```
## Send prediction requests
After the service starts, you can use the following command to send a prediction request to obtain the prediction result:
```shell
python tools/test_hubserving.py server_url image_path
```
Two parameters need to be passed to the script:
- **server_url**: service address, the format of which is
`http://[ip_address]:[port]/predict/[module_name]`
For example, if the detection, recognition and 2-stage serial services are started with provided configuration files, the respective `server_url` would be:
`http://127.0.0.1:8866/predict/ocr_det`
`http://127.0.0.1:8867/predict/ocr_rec`
`http://127.0.0.1:8868/predict/ocr_system`
- **image_path**: test image path, which can be a single image path or an image directory path
**Eg.**
```shell
python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/
```
## Returned result format
The returned result is a list. Each item in the list is a dict. The dict may contain three fields. The information is as follows:
|field name|data type|description|
|-|-|-|
|text|str|text content|
|confidence|float|text recognition confidence|
|text_region|list|text location coordinates|
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
|field name/module name|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
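For reference, the sketch below shows how a client might walk the structure described above; the literal values are taken from the sample response in the Docker deployment section and are illustrative only.
```python
# Minimal sketch of consuming an ocr_system response (values are illustrative).
response_json = {
    "msg": "",
    "status": "0",
    "results": [[
        {"text": "约定", "confidence": 0.84,
         "text_region": [[345, 377], [641, 390], [634, 540], [339, 528]]},
    ]],
}

# "results" holds one entry per input image; each entry is a list of dicts.
for image_result in response_json["results"]:
    for item in image_result:
        text = item.get("text")              # not returned by ocr_det
        confidence = item.get("confidence")  # not returned by ocr_det
        region = item.get("text_region")     # not returned by ocr_rec
        print(text, confidence, region)
```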
## User defined service module modification
If you need to modify the service logic, the following steps are generally required (take the modification of `ocr_system` for example):
- 1. Stop service
```shell
hub serving stop --port/-p XXXX
```
- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. If you want to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test.
- 3. Uninstall old service module
```shell
hub uninstall ocr_system
```
- 4. Install modified service module
```shell
hub install deploy/hubserving/ocr_system/
```
- 5. Restart service
```shell
hub serving start -m ocr_system
```
#!/bin/bash
set -e
OCR_MODEL_URL="https://paddleocr.bj.bcebos.com/deploy/lite/ocr_v1_for_cpu.tar.gz"
PADDLE_LITE_LIB_URL="https://paddlelite-demo.bj.bcebos.com/libs/ios/paddle_lite_libs_v2_6_0.tar.gz"
OPENCV3_FRAMEWORK_URL="https://paddlelite-demo.bj.bcebos.com/libs/ios/opencv3.framework.tar.gz"
download_and_extract() {
local url="$1"
local dst_dir="$2"
local tempdir=$(mktemp -d)
echo "Downloading ${url} ..."
curl -L ${url} > ${tempdir}/temp.tar.gz
echo "Download ${url} done "
if [ ! -d ${dst_dir} ];then
mkdir -p ${dst_dir}
fi
echo "Extracting ..."
tar -zxvf ${tempdir}/temp.tar.gz -C ${dst_dir}
echo "Extract done "
rm -rf ${tempdir}
}
echo -e "[Download ios ocr demo denpendancy]\n"
download_and_extract "${OCR_MODEL_URL}" "./ocr_demo/models"
download_and_extract "${PADDLE_LITE_LIB_URL}" "./ocr_demo"
download_and_extract "${OPENCV3_FRAMEWORK_URL}" "./ocr_demo"
echo -e "[done]\n"
//
// AppDelegate.h
// seg_demo
//
// Created by Li,Xiaoyang(SYS) on 2018/11/13.
// Copyright © 2018年 Li,Xiaoyang(SYS). All rights reserved.
//
#import <UIKit/UIKit.h>
@interface AppDelegate : UIResponder <UIApplicationDelegate>
@property (strong, nonatomic) UIWindow *window;
@end