提交 7e4aee47 编写于 作者: W WenmuZhou

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into fix_cls_shape

...@@ -4,10 +4,11 @@ English | [简体中文](README_ch.md) ...@@ -4,10 +4,11 @@ English | [简体中文](README_ch.md)
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice. PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice.
**Recent updates** **Recent updates**
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](#PP-OCR)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list) - 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipeline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list) - 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list)
- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](./doc/doc_en/whl_en.md) - 2020.9.17 update [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
- 2020.8.24 Support the use of PaddleOCR through whl package installation,please refer [PaddleOCR Package](./doc/doc_en/whl_en.md)
- 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519) - 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [more](./doc/doc_en/update_en.md) - [more](./doc/doc_en/update_en.md)
...@@ -29,7 +30,16 @@ PaddleOCR aims to create rich, leading, and practical OCR tools that help users ...@@ -29,7 +30,16 @@ PaddleOCR aims to create rich, leading, and practical OCR tools that help users
<img src="doc/imgs_results/1103.jpg" width="800"> <img src="doc/imgs_results/1103.jpg" width="800">
</div> </div>
The above picture is the effect display of the general ppocr_server model. For more effect pictures, please see [More visualization](./doc/doc_en/visualization_en.md). The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).
<a name="Community"></a>
## Community
- Scan the QR code below with your Wechat, you can access to official technical exchange group. Look forward to your participation.
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
## Quick Experience ## Quick Experience
...@@ -49,11 +59,11 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr ...@@ -49,11 +59,11 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
## PP-OCR 1.1 series model list(Update on Sep 17) ## PP-OCR 1.1 series model list(Update on Sep 17)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | | | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ---- | | ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | | | Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | | | Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | | | Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md) For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md)
...@@ -65,10 +75,11 @@ For more model downloads (including multiple languages), please refer to [PP-OCR ...@@ -65,10 +75,11 @@ For more model downloads (including multiple languages), please refer to [PP-OCR
- Algorithm introduction - Algorithm introduction
- [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md) - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md) - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [PP-OCR Pipline](#PP-OCR-Pipline) - [PP-OCR Pipeline](#PP-OCR-Pipeline)
- Model training/evaluation - Model training/evaluation
- [Text Detection](./doc/doc_en/detection_en.md) - [Text Detection](./doc/doc_en/detection_en.md)
- [Text Recognition](./doc/doc_en/recognition_en.md) - [Text Recognition](./doc/doc_en/recognition_en.md)
- [Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md) - [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment - Inference and Deployment
- [Quick inference based on pip](./doc/doc_en/whl_en.md) - [Quick inference based on pip](./doc/doc_en/whl_en.md)
...@@ -92,35 +103,40 @@ For more model downloads (including multiple languages), please refer to [PP-OCR ...@@ -92,35 +103,40 @@ For more model downloads (including multiple languages), please refer to [PP-OCR
- [License](#LICENSE) - [License](#LICENSE)
- [Contribution](#CONTRIBUTION) - [Contribution](#CONTRIBUTION)
<a name="PP-OCR-Pipline"></a> <a name="PP-OCR-Pipeline"></a>
## PP-OCR Pipline ## PP-OCR Pipeline
<div align="center"> <div align="center">
<img src="./doc/ppocr_framework.png" width="800"> <img src="./doc/ppocr_framework.png" width="800">
</div> </div>
PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (Arxiv article link is being generated). PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, The implementation of the FPGM Pruner and PACT quantization is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
## Visualization [more](./doc/doc_en/visualization_en.md)
## Visualization [more](./doc/doc_en/visualization_en.md)
- Chinese OCR model
<div align="center"> <div align="center">
<img src="./doc/imgs_results/1102.jpg" width="800"> <img src="./doc/imgs_results/1102.jpg" width="800">
<img src="./doc/imgs_results/1104.jpg" width="800"> <img src="./doc/imgs_results/1104.jpg" width="800">
<img src="./doc/imgs_results/1106.jpg" width="800"> <img src="./doc/imgs_results/1106.jpg" width="800">
<img src="./doc/imgs_results/1105.jpg" width="800"> <img src="./doc/imgs_results/1105.jpg" width="800">
<img src="./doc/imgs_results/1110.jpg" width="800">
<img src="./doc/imgs_results/1112.jpg" width="800">
</div> </div>
<a name="Community"></a> - English OCR model
## Community <div align="center">
Scan the QR code below with your Wechat and completing the questionnaire, you can access to offical technical exchange group. <img src="./doc/imgs_results/img_12.jpg" width="800">
</div>
- Multilingual OCR model
<div align="center"> <div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" /> <img src="./doc/imgs_results/1110.jpg" width="800">
<img src="./doc/imgs_results/1112.jpg" width="800">
</div> </div>
<a name="LICENSE"></a> <a name="LICENSE"></a>
## License ## License
This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a> This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
......
...@@ -4,9 +4,11 @@ ...@@ -4,9 +4,11 @@
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。
**近期更新** **近期更新**
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载) - 2020.10.26 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,共计94个常见问题及解答,并且计划以后每周一都会更新,欢迎大家持续关注。
- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipeline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载)
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) - 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载)
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) - 2020.9.17 更新[英文识别模型](./doc/doc_ch/models_list.md#英文识别模型)[多语言识别模型](doc/doc_ch/models_list.md#多语言识别模型),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md) - 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md)
- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) - 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [More](./doc/doc_ch/update.md) - [More](./doc/doc_ch/update.md)
...@@ -14,7 +16,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -14,7 +16,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 特性 ## 特性
- PPOCR系列高质量预训练模型,媲美商业效果 - PPOCR系列高质量预训练模型,准确的识别效果
- 超轻量ppocr_mobile移动端系列:检测(2.6M)+方向分类器(0.9M)+ 识别(4.6M)= 8.1M - 超轻量ppocr_mobile移动端系列:检测(2.6M)+方向分类器(0.9M)+ 识别(4.6M)= 8.1M
- 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M - 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M
- 超轻量压缩ppocr_mobile_slim系列:检测(1.4M)+方向分类器(0.5M)+ 识别(1.6M)= 3.5M - 超轻量压缩ppocr_mobile_slim系列:检测(1.4M)+方向分类器(0.5M)+ 识别(1.6M)= 3.5M
...@@ -33,6 +35,15 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -33,6 +35,15 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md) 上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)
<a name="欢迎加入PaddleOCR技术交流群"></a>
## 欢迎加入PaddleOCR技术交流群
- 微信扫描二维码加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
## 快速体验 ## 快速体验
- PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr - PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
...@@ -48,11 +59,11 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -48,11 +59,11 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<a name="模型下载"></a> <a name="模型下载"></a>
## PP-OCR 1.1系列模型列表(9月17日更新) ## PP-OCR 1.1系列模型列表(9月17日更新)
| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | | | 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| ------------ | --------------- | ----------------|---- | ---------- | -------- | ---- | | ------------ | --------------- | ----------------|---- | ---------- | -------- |
| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | | | 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | | | 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
| 中英文超轻量压缩OCR模型(3.5M) | ch_ppocr_mobile_slim_v1.1_xx| 移动端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb)| | || | 中英文超轻量压缩OCR模型(3.5M) | ch_ppocr_mobile_slim_v1.1_xx| 移动端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb)| [推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|
更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](./doc/doc_ch/models_list.md) 更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](./doc/doc_ch/models_list.md)
...@@ -63,19 +74,20 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -63,19 +74,20 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- 算法介绍 - 算法介绍
- [文本检测](./doc/doc_ch/algorithm_overview.md) - [文本检测](./doc/doc_ch/algorithm_overview.md)
- [文本识别](./doc/doc_ch/algorithm_overview.md) - [文本识别](./doc/doc_ch/algorithm_overview.md)
- [PP-OCR Pipline](#PP-OCR) - [PP-OCR Pipeline](#PP-OCR)
- 模型训练/评估 - 模型训练/评估
- [文本检测](./doc/doc_ch/detection.md) - [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md) - [文本识别](./doc/doc_ch/recognition.md)
- [方向分类器](./doc/doc_ch/angle_class.md)
- [yml参数配置文件介绍](./doc/doc_ch/config.md) - [yml参数配置文件介绍](./doc/doc_ch/config.md)
- 预测部署 - 预测部署
- [基于pip安装whl包快速推理](./doc/doc_ch/whl.md) - [基于pip安装whl包快速推理](./doc/doc_ch/whl.md)
- [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md) - [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md)
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/hubserving/readme.md) - [服务化部署](./doc/doc_ch/serving_inference.md)
- [端侧部署](./deploy/lite/readme.md) - [端侧部署](./deploy/lite/readme.md)
- [模型量化](./deploy/slim/quantization/README.md) - [模型量化](./deploy/slim/quantization/README.md)
- [模型裁剪](./deploy/slim/prune/README_ch.md) - [模型裁剪](./deploy/slim/prune/README.md)
- [Benchmark](./doc/doc_ch/benchmark.md) - [Benchmark](./doc/doc_ch/benchmark.md)
- 数据集 - 数据集
- [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - [通用中英文OCR数据集](./doc/doc_ch/datasets.md)
...@@ -86,43 +98,44 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -86,43 +98,44 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- [效果展示](#效果展示) - [效果展示](#效果展示)
- FAQ - FAQ
- [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md) - [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md)
- [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md) - [【理论篇】OCR通用23个问题](./doc/doc_ch/FAQ.md)
- [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md) - [【实战篇】PaddleOCR实战61个问题](./doc/doc_ch/FAQ.md)
- [技术交流群](#欢迎加入PaddleOCR技术交流群) - [技术交流群](#欢迎加入PaddleOCR技术交流群)
- [参考文献](./doc/doc_ch/reference.md) - [参考文献](./doc/doc_ch/reference.md)
- [许可证书](#许可证书) - [许可证书](#许可证书)
- [贡献代码](#贡献代码) - [贡献代码](#贡献代码)
<a name="PP-OCR"></a> <a name="PP-OCR"></a>
## PP-OCR Pipline ## PP-OCR Pipeline
<div align="center"> <div align="center">
<img src="./doc/ppocr_framework.png" width="800"> <img src="./doc/ppocr_framework.png" width="800">
</div> </div>
PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身,最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术文章(Arxiv文章链接生成中)。 PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身,最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941 。其中FPGM裁剪器和PACT量化的实现可以参考[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
<a name="效果展示"></a> <a name="效果展示"></a>
## 效果展示 [more](./doc/doc_ch/visualization.md) ## 效果展示 [more](./doc/doc_ch/visualization.md)
- 中文模型
<div align="center"> <div align="center">
<img src="./doc/imgs_results/1102.jpg" width="800"> <img src="./doc/imgs_results/1102.jpg" width="800">
<img src="./doc/imgs_results/1104.jpg" width="800"> <img src="./doc/imgs_results/1104.jpg" width="800">
<img src="./doc/imgs_results/1106.jpg" width="800"> <img src="./doc/imgs_results/1106.jpg" width="800">
<img src="./doc/imgs_results/1105.jpg" width="800"> <img src="./doc/imgs_results/1105.jpg" width="800">
<img src="./doc/imgs_results/1110.jpg" width="800">
<img src="./doc/imgs_results/1112.jpg" width="800">
</div> </div>
- 英文模型
<div align="center">
<img src="./doc/imgs_results/img_12.jpg" width="800">
</div>
<a name="欢迎加入PaddleOCR技术交流群"></a> - 其他语言模型
## 欢迎加入PaddleOCR技术交流群
请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍
<div align="center"> <div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" /> <img src="./doc/imgs_results/1110.jpg" width="800">
<img src="./doc/imgs_results/1112.jpg" width="800">
</div> </div>
<a name="许可证书"></a> <a name="许可证书"></a>
## 许可证书 ## 许可证书
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>许可认证。 本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
......
...@@ -18,5 +18,4 @@ TestReader: ...@@ -18,5 +18,4 @@ TestReader:
infer_img: infer_img:
img_set_dir: ./train_data/icdar2015/text_localization/ img_set_dir: ./train_data/icdar2015/text_localization/
label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
test_image_shape: [736, 1280]
do_eval: True do_eval: True
...@@ -24,7 +24,6 @@ Backbone: ...@@ -24,7 +24,6 @@ Backbone:
function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3 function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3
scale: 0.5 scale: 0.5
model_name: large model_name: large
disable_se: true
Head: Head:
function: ppocr.modeling.heads.det_db_head,DBHead function: ppocr.modeling.heads.det_db_head,DBHead
......
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_db/
save_epoch_step: 200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
train_batch_size_per_card: 16
test_batch_size_per_card: 16
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
checkpoints:
save_res_path: ./output/det_db/predicts_db.txt
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: large
disable_se: true
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 96
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 16
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_r_18_vd_db/
save_epoch_step: 200
eval_batch_step: [3000, 2000]
train_batch_size_per_card: 8
test_batch_size_per_card: 1
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/ResNet18_vd_pretrained/
save_res_path: ./output/det_r18_vd_db/predicts_db.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd,ResNet
layers: 18
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 256
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 32
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.5
max_candidates: 1000
unclip_ratio: 1.6
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 128
test_batch_size_per_card: 128
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
layers: 34
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00004
SeqRNN:
hidden_size: 256
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00004
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00001
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: french character_type: french
character_dict_path: ./ppocr/utils/french_dict.txt character_dict_path: ./ppocr/utils/dict/french_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: german character_type: german
character_dict_path: ./ppocr/utils/german_dict.txt character_dict_path: ./ppocr/utils/dict/german_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: japan character_type: japan
character_dict_path: ./ppocr/utils/japan_dict.txt character_dict_path: ./ppocr/utils/dict/japan_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: korean character_type: korean
character_dict_path: ./ppocr/utils/korean_dict.txt character_dict_path: ./ppocr/utils/dict/korean_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
### 1. 安装最新版本的Android Studio ### 1. 安装最新版本的Android Studio
可以从https://developer.android.com/studio 下载。本Demo使用是4.0版本Android Studio编写。 可以从https://developer.android.com/studio 下载。本Demo使用是4.0版本Android Studio编写。
### 2. 按照NDK 20 以上版本 ### 2. 按照NDK 20 以上版本
Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编译成功。 Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编译成功。
如果您是初学者,可以用以下方式安装和测试NDK编译环境。 如果您是初学者,可以用以下方式安装和测试NDK编译环境。
...@@ -17,3 +17,10 @@ Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编 ...@@ -17,3 +17,10 @@ Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编
- Demo APP:可使用手机扫码安装,方便手机端快速体验文字识别 - Demo APP:可使用手机扫码安装,方便手机端快速体验文字识别
- SDK:模型被封装为适配不同芯片硬件和操作系统SDK,包括完善的接口,方便进行二次开发 - SDK:模型被封装为适配不同芯片硬件和操作系统SDK,包括完善的接口,方便进行二次开发
# FAQ:
Q1: 更新1.1版本的模型后,demo报错?
A1. 如果要更换V1.1 版本的模型,请更新模型的同时,更新预测库文件,建议使用[PaddleLite 2.6.3](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.6.3)版本的预测库文件,OCR移动端部署参考[教程](../lite/readme.md)
...@@ -98,7 +98,7 @@ private: ...@@ -98,7 +98,7 @@ private:
* @param origin * @param origin
* @return * @return
*/ */
cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.5); cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.9);
/** /**
* Postprocess or sencod model to extract text * Postprocess or sencod model to extract text
......
...@@ -35,12 +35,6 @@ public class OCRPredictorNative { ...@@ -35,12 +35,6 @@ public class OCRPredictorNative {
} }
public void release() {
if (nativePointer != 0) {
nativePointer = 0;
// destory(nativePointer);
}
}
public ArrayList<OcrResultModel> runImage(float[] inputData, int width, int height, int channels, Bitmap originalImage) { public ArrayList<OcrResultModel> runImage(float[] inputData, int width, int height, int channels, Bitmap originalImage) {
Log.i("OCRPredictorNative", "begin to run image " + inputData.length + " " + width + " " + height); Log.i("OCRPredictorNative", "begin to run image " + inputData.length + " " + width + " " + height);
...@@ -63,7 +57,7 @@ public class OCRPredictorNative { ...@@ -63,7 +57,7 @@ public class OCRPredictorNative {
protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage); protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage);
protected native void destory(long pointer); public native void release(long pointer);
private ArrayList<OcrResultModel> postprocess(float[] raw) { private ArrayList<OcrResultModel> postprocess(float[] raw) {
ArrayList<OcrResultModel> results = new ArrayList<OcrResultModel>(); ArrayList<OcrResultModel> results = new ArrayList<OcrResultModel>();
......
...@@ -52,20 +52,29 @@ include_directories(${OpenCV_INCLUDE_DIRS}) ...@@ -52,20 +52,29 @@ include_directories(${OpenCV_INCLUDE_DIRS})
if (WIN32) if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=") add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd") if(WITH_MKL)
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT") set(FLAG_OPENMP "/openmp")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd") endif()
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT") set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
if (WITH_STATIC_LIB) if (WITH_STATIC_LIB)
safe_set_static_flag() safe_set_static_flag()
add_definitions(-DSTATIC_LIB) add_definitions(-DSTATIC_LIB)
endif() endif()
message("cmake c debug flags " ${CMAKE_C_FLAGS_DEBUG})
message("cmake c release flags " ${CMAKE_C_FLAGS_RELEASE})
message("cmake cxx debug flags " ${CMAKE_CXX_FLAGS_DEBUG})
message("cmake cxx release flags " ${CMAKE_CXX_FLAGS_RELEASE})
else() else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o3 -std=c++11") if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o3 ${FLAG_OPENMP} -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "") set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("cmake cxx flags" ${CMAKE_CXX_FLAGS})
endif() endif()
message("flags" ${CMAKE_CXX_FLAGS})
if (WITH_GPU) if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "") if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
...@@ -198,4 +207,4 @@ if (WIN32 AND WITH_MKL) ...@@ -198,4 +207,4 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
) )
endif() endif()
\ No newline at end of file
...@@ -194,7 +194,37 @@ sh tools/run.sh ...@@ -194,7 +194,37 @@ sh tools/run.sh
``` ```
* 若需要使用方向分类器,则需要将`tools/config.txt`中的`use_angle_cls`参数修改为1,表示开启方向分类器的预测。 * 若需要使用方向分类器,则需要将`tools/config.txt`中的`use_angle_cls`参数修改为1,表示开启方向分类器的预测。
* 更多地,tools/config.txt中的参数及解释如下。
```
use_gpu 0 # 是否使用GPU,1表示使用,0表示不使用
gpu_id 0 # GPU id,使用GPU时有效
gpu_mem 4000 # 申请的GPU内存
cpu_math_library_num_threads 10 # CPU预测时的线程数,在机器核数充足的情况下,该值越大,预测速度越快
use_mkldnn 1 # 是否使用mkldnn库
use_zero_copy_run 1 # 是否使用use_zero_copy_run进行预测
# det config
max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960
det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显
det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小
det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本
det_model_dir ./inference/det_db # 检测模型inference model地址
# cls config
use_angle_cls 0 # 是否使用方向分类器,0表示不使用,1表示使用
cls_model_dir ./inference/cls # 方向分类器inference model地址
cls_thresh 0.9 # 方向分类器的得分阈值
# rec config
rec_model_dir ./inference/rec_crnn # 识别模型inference model地址
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt # 字典文件
# show the detection results
visualize 1 # 是否对结果进行可视化,为1时,会在当前文件夹下保存文件名为`ocr_vis.png`的预测结果。
```
* PaddleOCR也支持多语言的预测,更多细节可以参考[识别文档](../../doc/doc_ch/recognition.md)中的多语言字典与模型部分。
最终屏幕上会输出检测结果如下。 最终屏幕上会输出检测结果如下。
...@@ -205,4 +235,4 @@ sh tools/run.sh ...@@ -205,4 +235,4 @@ sh tools/run.sh
### 2.3 注意 ### 2.3 注意
* C++预测默认未开启MKLDNN(`tools/config.txt`中的`use_mkldnn`设置为0),如果需要使用MKLDNN进行预测加速,则需要将`use_mkldnn`修改为1,同时使用最新版本的Paddle源码编译预测库。在使用MKLDNN进行CPU预测时,如果同时预测多张图像,则会出现内存泄露的问题(不打开MKLDNN则没有该问题),目前该问题正在修复中,临时解决方案为:预测多张图片时,每隔30张图片左右对识别(`CRNNRecognizer`)和检测类(`DBDetector`)重新初始化一次 * 在使用Paddle预测库时,推荐使用2.0.0-beta0版本的预测库
...@@ -202,6 +202,38 @@ sh tools/run.sh ...@@ -202,6 +202,38 @@ sh tools/run.sh
``` ```
* If you want to orientation classifier to correct the detected boxes, you can set `use_angle_cls` in the file `tools/config.txt` as 1 to enable the function. * If you want to orientation classifier to correct the detected boxes, you can set `use_angle_cls` in the file `tools/config.txt` as 1 to enable the function.
* What's more, Parameters and their meanings in `tools/config.txt` are as follows.
```
use_gpu 0 # Whether to use GPU, 0 means not to use, 1 means to use
gpu_id 0 # GPU id when use_gpu is 1
gpu_mem 4000 # GPU memory requested
cpu_math_library_num_threads 10 # Number of threads when using CPU inference. When machine cores is enough, the large the value, the faster the inference speed
use_mkldnn 1 # Whether to use mkdlnn library
use_zero_copy_run 1 # Whether to use use_zero_copy_run for inference
max_side_len 960 # Limit the maximum image height and width to 960
det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
det_db_box_thresh 0.5 # DDB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate
det_db_unclip_ratio 1.6 # Indicates the compactness of the text box, the smaller the value, the closer the text box to the text
det_model_dir ./inference/det_db # Address of detection inference model
# cls config
use_angle_cls 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use
cls_model_dir ./inference/cls # Address of direction classifier inference model
cls_thresh 0.9 # Score threshold of the direction classifier
# rec config
rec_model_dir ./inference/rec_crnn # Address of recognition inference model
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt # dictionary file
# show the detection results
visualize 1 # Whether to visualize the results,when it is set as 1, The prediction result will be save in the image file `./ocr_vis.png`.
```
* Multi-language inference is also supported in PaddleOCR, for more details, please refer to part of multi-language dictionaries and models in [recognition tutorial](../../doc/doc_en/recognition_en.md).
The detection results will be shown on the screen, which is as follows. The detection results will be shown on the screen, which is as follows.
...@@ -210,6 +242,6 @@ The detection results will be shown on the screen, which is as follows. ...@@ -210,6 +242,6 @@ The detection results will be shown on the screen, which is as follows.
</div> </div>
### 2.3 Note ### 2.3 Notes
* `MKLDNN` is disabled by default for C++ inference (`use_mkldnn` in `tools/config.txt` is set to 0), if you need to use MKLDNN for inference acceleration, you need to modify `use_mkldnn` to 1, and use the latest version of the Paddle source code to compile the inference library. When using MKLDNN for CPU prediction, if multiple images are predicted at the same time, there will be a memory leak problem (the problem is not present if MKLDNN is disabled). The problem is currently being fixed, and the temporary solution is: when predicting multiple pictures, Re-initialize the recognition (`CRNNRecognizer`) and detection class (`DBDetector`) every 30 pictures or so. * Paddle2.0.0-beta0 inference model library is recommanded for this tuturial.
...@@ -12,6 +12,8 @@ ...@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
#include "glog/logging.h"
#include "omp.h"
#include "opencv2/core.hpp" #include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp" #include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp" #include "opencv2/imgproc.hpp"
...@@ -67,6 +69,19 @@ int main(int argc, char **argv) { ...@@ -67,6 +69,19 @@ int main(int argc, char **argv) {
config.use_mkldnn, config.use_zero_copy_run, config.use_mkldnn, config.use_zero_copy_run,
config.char_list_file); config.char_list_file);
#ifdef USE_MKL
#pragma omp parallel
for (auto i = 0; i < 10; i++) {
LOG_IF(WARNING,
config.cpu_math_library_num_threads != omp_get_num_threads())
<< "WARNING! MKL is running on " << omp_get_num_threads()
<< " threads while cpu_math_library_num_threads is set to "
<< config.cpu_math_library_num_threads
<< ". Possible reason could be 1. You have set omp_set_num_threads() "
"somewhere; 2. MKL is not linked properly";
}
#endif
auto start = std::chrono::system_clock::now(); auto start = std::chrono::system_clock::now();
std::vector<std::vector<std::vector<int>>> boxes; std::vector<std::vector<std::vector<int>>> boxes;
det.Run(srcimg, boxes); det.Run(srcimg, boxes);
......
...@@ -85,7 +85,7 @@ void ResizeImgType0::Run(const cv::Mat &img, cv::Mat &resize_img, ...@@ -85,7 +85,7 @@ void ResizeImgType0::Run(const cv::Mat &img, cv::Mat &resize_img,
if (resize_w % 32 == 0) if (resize_w % 32 == 0)
resize_w = resize_w; resize_w = resize_w;
else if (resize_w / 32 < 1) else if (resize_w / 32 < 1 + 1e-5)
resize_w = 32; resize_w = 32;
else else
resize_w = (resize_w / 32 - 1) * 32; resize_w = (resize_w / 32 - 1) * 32;
...@@ -138,4 +138,4 @@ void ClsResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, ...@@ -138,4 +138,4 @@ void ClsResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
} }
} }
} // namespace PaddleOCR } // namespace PaddleOCR
\ No newline at end of file
...@@ -3,11 +3,14 @@ from __future__ import absolute_import ...@@ -3,11 +3,14 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import os
import sys
sys.path.insert(0, ".")
import argparse import argparse
import ast import ast
import copy import copy
import math import math
import os
import time import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
...@@ -67,9 +70,7 @@ class OCRDet(hub.Module): ...@@ -67,9 +70,7 @@ class OCRDet(hub.Module):
images.append(img) images.append(img)
return images return images
def predict(self, def predict(self, images=[], paths=[]):
images=[],
paths=[]):
""" """
Get the text box in the predicted images. Get the text box in the predicted images.
Args: Args:
...@@ -87,7 +88,7 @@ class OCRDet(hub.Module): ...@@ -87,7 +88,7 @@ class OCRDet(hub.Module):
raise TypeError("The input data is inconsistent with expectations.") raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data." assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
all_results = [] all_results = []
for img in predicted_data: for img in predicted_data:
if img is None: if img is None:
...@@ -99,11 +100,9 @@ class OCRDet(hub.Module): ...@@ -99,11 +100,9 @@ class OCRDet(hub.Module):
rec_res_final = [] rec_res_final = []
for dno in range(len(dt_boxes)): for dno in range(len(dt_boxes)):
rec_res_final.append( rec_res_final.append({
{ 'text_region': dt_boxes[dno].astype(np.int).tolist()
'text_region': dt_boxes[dno].astype(np.int).tolist() })
}
)
all_results.append(rec_res_final) all_results.append(rec_res_final)
return all_results return all_results
...@@ -116,7 +115,7 @@ class OCRDet(hub.Module): ...@@ -116,7 +115,7 @@ class OCRDet(hub.Module):
results = self.predict(images_decode, **kwargs) results = self.predict(images_decode, **kwargs)
return results return results
if __name__ == '__main__': if __name__ == '__main__':
ocr = OCRDet() ocr = OCRDet()
image_path = [ image_path = [
...@@ -124,4 +123,4 @@ if __name__ == '__main__': ...@@ -124,4 +123,4 @@ if __name__ == '__main__':
'./doc/imgs/12.jpg', './doc/imgs/12.jpg',
] ]
res = ocr.predict(paths=image_path) res = ocr.predict(paths=image_path)
print(res) print(res)
\ No newline at end of file
...@@ -37,5 +37,6 @@ def read_params(): ...@@ -37,5 +37,6 @@ def read_params():
# cfg.use_space_char = True # cfg.use_space_char = True
cfg.use_zero_copy_run = False cfg.use_zero_copy_run = False
cfg.use_pdserving = False
return cfg return cfg
...@@ -3,11 +3,14 @@ from __future__ import absolute_import ...@@ -3,11 +3,14 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import os
import sys
sys.path.insert(0, ".")
import argparse import argparse
import ast import ast
import copy import copy
import math import math
import os
import time import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
...@@ -67,9 +70,7 @@ class OCRRec(hub.Module): ...@@ -67,9 +70,7 @@ class OCRRec(hub.Module):
images.append(img) images.append(img)
return images return images
def predict(self, def predict(self, images=[], paths=[]):
images=[],
paths=[]):
""" """
Get the text box in the predicted images. Get the text box in the predicted images.
Args: Args:
...@@ -87,31 +88,28 @@ class OCRRec(hub.Module): ...@@ -87,31 +88,28 @@ class OCRRec(hub.Module):
raise TypeError("The input data is inconsistent with expectations.") raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data." assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
img_list = [] img_list = []
for img in predicted_data: for img in predicted_data:
if img is None: if img is None:
continue continue
img_list.append(img) img_list.append(img)
rec_res_final = [] rec_res_final = []
try: try:
rec_res, predict_time = self.text_recognizer(img_list) rec_res, predict_time = self.text_recognizer(img_list)
for dno in range(len(rec_res)): for dno in range(len(rec_res)):
text, score = rec_res[dno] text, score = rec_res[dno]
rec_res_final.append( rec_res_final.append({
{ 'text': text,
'text': text, 'confidence': float(score),
'confidence': float(score), })
}
)
except Exception as e: except Exception as e:
print(e) print(e)
return [[]] return [[]]
return [rec_res_final] return [rec_res_final]
@serving @serving
def serving_method(self, images, **kwargs): def serving_method(self, images, **kwargs):
""" """
...@@ -121,7 +119,7 @@ class OCRRec(hub.Module): ...@@ -121,7 +119,7 @@ class OCRRec(hub.Module):
results = self.predict(images_decode, **kwargs) results = self.predict(images_decode, **kwargs)
return results return results
if __name__ == '__main__': if __name__ == '__main__':
ocr = OCRRec() ocr = OCRRec()
image_path = [ image_path = [
...@@ -130,4 +128,4 @@ if __name__ == '__main__': ...@@ -130,4 +128,4 @@ if __name__ == '__main__':
'./doc/imgs_words/ch/word_3.jpg', './doc/imgs_words/ch/word_3.jpg',
] ]
res = ocr.predict(paths=image_path) res = ocr.predict(paths=image_path)
print(res) print(res)
\ No newline at end of file
...@@ -38,6 +38,15 @@ def read_params(): ...@@ -38,6 +38,15 @@ def read_params():
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt" cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
cfg.use_space_char = True cfg.use_space_char = True
#params for text classifier
cfg.use_angle_cls = True
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
cfg.use_zero_copy_run = False cfg.use_zero_copy_run = False
cfg.use_pdserving = False
return cfg return cfg
...@@ -3,11 +3,14 @@ from __future__ import absolute_import ...@@ -3,11 +3,14 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import os
import sys
sys.path.insert(0, ".")
import argparse import argparse
import ast import ast
import copy import copy
import math import math
import os
import time import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
...@@ -52,7 +55,7 @@ class OCRSystem(hub.Module): ...@@ -52,7 +55,7 @@ class OCRSystem(hub.Module):
) )
cfg.ir_optim = True cfg.ir_optim = True
cfg.enable_mkldnn = enable_mkldnn cfg.enable_mkldnn = enable_mkldnn
self.text_sys = TextSystem(cfg) self.text_sys = TextSystem(cfg)
def read_images(self, paths=[]): def read_images(self, paths=[]):
...@@ -67,9 +70,7 @@ class OCRSystem(hub.Module): ...@@ -67,9 +70,7 @@ class OCRSystem(hub.Module):
images.append(img) images.append(img)
return images return images
def predict(self, def predict(self, images=[], paths=[]):
images=[],
paths=[]):
""" """
Get the chinese texts in the predicted images. Get the chinese texts in the predicted images.
Args: Args:
...@@ -104,13 +105,11 @@ class OCRSystem(hub.Module): ...@@ -104,13 +105,11 @@ class OCRSystem(hub.Module):
for dno in range(dt_num): for dno in range(dt_num):
text, score = rec_res[dno] text, score = rec_res[dno]
rec_res_final.append( rec_res_final.append({
{ 'text': text,
'text': text, 'confidence': float(score),
'confidence': float(score), 'text_region': dt_boxes[dno].astype(np.int).tolist()
'text_region': dt_boxes[dno].astype(np.int).tolist() })
}
)
all_results.append(rec_res_final) all_results.append(rec_res_final)
return all_results return all_results
...@@ -123,7 +122,7 @@ class OCRSystem(hub.Module): ...@@ -123,7 +122,7 @@ class OCRSystem(hub.Module):
results = self.predict(images_decode, **kwargs) results = self.predict(images_decode, **kwargs)
return results return results
if __name__ == '__main__': if __name__ == '__main__':
ocr = OCRSystem() ocr = OCRSystem()
image_path = [ image_path = [
...@@ -131,4 +130,4 @@ if __name__ == '__main__': ...@@ -131,4 +130,4 @@ if __name__ == '__main__':
'./doc/imgs/12.jpg', './doc/imgs/12.jpg',
] ]
res = ocr.predict(paths=image_path) res = ocr.predict(paths=image_path)
print(res) print(res)
\ No newline at end of file
...@@ -39,12 +39,14 @@ def read_params(): ...@@ -39,12 +39,14 @@ def read_params():
cfg.use_space_char = True cfg.use_space_char = True
#params for text classifier #params for text classifier
cfg.use_angle_cls = False cfg.use_angle_cls = True
cfg.cls_model_dir = "./inference/ch_ppocr_mobile-v1.1.cls_infer/" cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
cfg.cls_image_shape = "3, 48, 192" cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180'] cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30 cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
cfg.use_zero_copy_run = False cfg.use_zero_copy_run = False
cfg.use_pdserving = False
return cfg return cfg
[English](readme_en.md) | 简体中文 [English](readme_en.md) | 简体中文
PaddleOCR提供2种服务部署方式: PaddleOCR提供2种服务部署方式:
- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用; - 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用;
- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../pdserving/readme.md) - 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../../doc/doc_ch/serving_inference.md)
# 基于PaddleHub Serving的服务部署 # 基于PaddleHub Serving的服务部署
...@@ -29,17 +29,15 @@ deploy/hubserving/ocr_system/ ...@@ -29,17 +29,15 @@ deploy/hubserving/ocr_system/
```shell ```shell
# 安装paddlehub # 安装paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# 在Linux下设置环境变量
export PYTHONPATH=.
# 或者,在Windows下设置环境变量
SET PYTHONPATH=.
``` ```
### 2. 下载推理模型 ### 2. 下载推理模型
安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v1.1版的超轻量模型,默认检测模型路径为: 安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v1.1版的超轻量模型,默认模型路径为:
`./inference/ch_ppocr_mobile_v1.1_det_infer/`,识别模型路径为:`./inference/ch_ppocr_mobile_v1.1_rec_infer/` ```
检测模型:./inference/ch_ppocr_mobile_v1.1_det_infer/
识别模型:./inference/ch_ppocr_mobile_v1.1_rec_infer/
方向分类器:./inference/ch_ppocr_mobile_v1.1_cls_infer/
```
**模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的[模型库](../../doc/doc_ch/models_list.md)下载,也可以替换成自己训练转换好的模型。 **模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的[模型库](../../doc/doc_ch/models_list.md)下载,也可以替换成自己训练转换好的模型。
...@@ -173,7 +171,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json ...@@ -173,7 +171,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
```hub serving stop --port/-p XXXX``` ```hub serving stop --port/-p XXXX```
- 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。 - 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。
例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。** 例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,如果需要关闭文本方向分类器,则将参数`use_angle_cls`置为`False`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。**
- 3、 卸载旧服务包 - 3、 卸载旧服务包
```hub uninstall ocr_system``` ```hub uninstall ocr_system```
......
English | [简体中文](readme.md) English | [简体中文](readme.md)
PaddleOCR provides 2 service deployment methods: PaddleOCR provides 2 service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial. - Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../pdserving/readme_en.md) for usage. - Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../doc/doc_ch/serving_inference.md) for usage.
# Service deployment based on PaddleHub Serving # Service deployment based on PaddleHub Serving
...@@ -30,16 +30,15 @@ The following steps take the 2-stage series service as an example. If only the d ...@@ -30,16 +30,15 @@ The following steps take the 2-stage series service as an example. If only the d
```shell ```shell
# Install paddlehub # Install paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# Set environment variables on Linux
export PYTHONPATH=.
# Set environment variables on Windows
SET PYTHONPATH=.
``` ```
### 2. Download inference model ### 2. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default detection model path is: `./inference/ch_ppocr_mobile_v1.1_det_infer/`, the default recognition model path is: `./inference/ch_ppocr_mobile_v1.1_rec_infer/`. Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
```
**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself. **The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
...@@ -180,7 +179,7 @@ If you need to modify the service logic, the following steps are generally requi ...@@ -180,7 +179,7 @@ If you need to modify the service logic, the following steps are generally requi
hub serving stop --port/-p XXXX hub serving stop --port/-p XXXX
``` ```
- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs. - 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test. For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. If you want to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test.
- 3. Uninstall old service module - 3. Uninstall old service module
```shell ```shell
hub uninstall ocr_system hub uninstall ocr_system
......
max_side_len 960 max_side_len 960
det_db_thresh 0.3 det_db_thresh 0.3
det_db_box_thresh 0.5 det_db_box_thresh 0.5
det_db_unclip_ratio 1.6 det_db_unclip_ratio 1.6
\ No newline at end of file use_direction_classify 0
\ No newline at end of file
...@@ -112,7 +112,7 @@ OrderPointsClockwise(std::vector<std::vector<int>> pts) { ...@@ -112,7 +112,7 @@ OrderPointsClockwise(std::vector<std::vector<int>> pts) {
} }
std::vector<std::vector<float>> GetMiniBoxes(cv::RotatedRect box, float &ssid) { std::vector<std::vector<float>> GetMiniBoxes(cv::RotatedRect box, float &ssid) {
ssid = std::max(box.size.width, box.size.height); ssid = std::min(box.size.width, box.size.height);
cv::Mat points; cv::Mat points;
cv::boxPoints(box, points); cv::boxPoints(box, points);
......
...@@ -107,13 +107,14 @@ cv::Mat DetResizeImg(const cv::Mat img, int max_size_len, ...@@ -107,13 +107,14 @@ cv::Mat DetResizeImg(const cv::Mat img, int max_size_len,
} }
cv::Mat RunClsModel(cv::Mat img, std::shared_ptr<PaddlePredictor> predictor_cls, cv::Mat RunClsModel(cv::Mat img, std::shared_ptr<PaddlePredictor> predictor_cls,
const float thresh = 0.5) { const float thresh = 0.9) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f}; std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
cv::Mat srcimg; cv::Mat srcimg;
img.copyTo(srcimg); img.copyTo(srcimg);
cv::Mat crop_img; cv::Mat crop_img;
img.copyTo(crop_img);
cv::Mat resize_img; cv::Mat resize_img;
int index = 0; int index = 0;
...@@ -154,7 +155,8 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img, ...@@ -154,7 +155,8 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
std::vector<std::string> &rec_text, std::vector<std::string> &rec_text,
std::vector<float> &rec_text_score, std::vector<float> &rec_text_score,
std::vector<std::string> charactor_dict, std::vector<std::string> charactor_dict,
std::shared_ptr<PaddlePredictor> predictor_cls) { std::shared_ptr<PaddlePredictor> predictor_cls,
int use_direction_classify) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f}; std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
...@@ -166,7 +168,9 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img, ...@@ -166,7 +168,9 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
int index = 0; int index = 0;
for (int i = boxes.size() - 1; i >= 0; i--) { for (int i = boxes.size() - 1; i >= 0; i--) {
crop_img = GetRotateCropImage(srcimg, boxes[i]); crop_img = GetRotateCropImage(srcimg, boxes[i]);
crop_img = RunClsModel(crop_img, predictor_cls); if (use_direction_classify >= 1) {
crop_img = RunClsModel(crop_img, predictor_cls);
}
float wh_ratio = float wh_ratio =
static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows); static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows);
...@@ -290,7 +294,7 @@ RunDetModel(std::shared_ptr<PaddlePredictor> predictor, cv::Mat img, ...@@ -290,7 +294,7 @@ RunDetModel(std::shared_ptr<PaddlePredictor> predictor, cv::Mat img,
cv::Mat bit_map; cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
cv::Mat dilation_map; cv::Mat dilation_map;
cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2,2)); cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2, 2));
cv::dilate(bit_map, dilation_map, dila_ele); cv::dilate(bit_map, dilation_map, dila_ele);
auto boxes = BoxesFromBitmap(pred_map, dilation_map, Config); auto boxes = BoxesFromBitmap(pred_map, dilation_map, Config);
...@@ -366,7 +370,8 @@ std::map<std::string, double> LoadConfigTxt(std::string config_path) { ...@@ -366,7 +370,8 @@ std::map<std::string, double> LoadConfigTxt(std::string config_path) {
int main(int argc, char **argv) { int main(int argc, char **argv) {
if (argc < 5) { if (argc < 5) {
std::cerr << "[ERROR] usage: " << argv[0] std::cerr << "[ERROR] usage: " << argv[0]
<< " det_model_file rec_model_file image_path\n"; << " det_model_file cls_model_file rec_model_file image_path "
"charactor_dict\n";
exit(1); exit(1);
} }
std::string det_model_file = argv[1]; std::string det_model_file = argv[1];
...@@ -377,6 +382,7 @@ int main(int argc, char **argv) { ...@@ -377,6 +382,7 @@ int main(int argc, char **argv) {
//// load config from txt file //// load config from txt file
auto Config = LoadConfigTxt("./config.txt"); auto Config = LoadConfigTxt("./config.txt");
int use_direction_classify = int(Config["use_direction_classify"]);
auto start = std::chrono::system_clock::now(); auto start = std::chrono::system_clock::now();
...@@ -392,8 +398,9 @@ int main(int argc, char **argv) { ...@@ -392,8 +398,9 @@ int main(int argc, char **argv) {
std::vector<std::string> rec_text; std::vector<std::string> rec_text;
std::vector<float> rec_text_score; std::vector<float> rec_text_score;
RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score, RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score,
charactor_dict, cls_predictor); charactor_dict, cls_predictor, use_direction_classify);
auto end = std::chrono::system_clock::now(); auto end = std::chrono::system_clock::now();
auto duration = auto duration =
......
...@@ -15,10 +15,9 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 ...@@ -15,10 +15,9 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理
交叉编译环境用于编译 Paddle Lite 和 PaddleOCR 的C++ demo。 交叉编译环境用于编译 Paddle Lite 和 PaddleOCR 的C++ demo。
支持多种开发环境,不同开发环境的编译流程请参考对应文档。 支持多种开发环境,不同开发环境的编译流程请参考对应文档。
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#docker) 1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#android) 2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#id13) 3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
4. [Windows](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/x86.html#id4)
### 1.2 准备预测库 ### 1.2 准备预测库
...@@ -26,17 +25,16 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 ...@@ -26,17 +25,16 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理
- 1. 直接下载,预测库下载链接如下: - 1. 直接下载,预测库下载链接如下:
|平台|预测库下载链接| |平台|预测库下载链接|
|-|-| |-|-|
|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)| |Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
|IOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)| |IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
|x86(Linux)|[预测库](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/X86/Linux/inference_lite_lib.x86.linux.tar.gz)|
注:如果是从下Paddle-Lite[官网文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/release_lib.html#android-toolchain-gcc)下载的预测库, 注:1. 上述预测库为PaddleLite 2.6.3分支编译得到,有关PaddleLite 2.6.3 详细信息可参考[链接](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.6.3)。
注意选择`with_extra=ON,with_cv=ON`的下载链接。
- 2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: - 2. [推荐]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
``` ```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite cd Paddle-Lite
# 切换到Paddle-Lite develop稳定分支
git checkout develop git checkout develop
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON ./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
``` ```
...@@ -80,11 +78,17 @@ inference_lite_lib.android.armv8/ ...@@ -80,11 +78,17 @@ inference_lite_lib.android.armv8/
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-lite的opt工具可以自动 Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-lite的opt工具可以自动
对inference模型进行优化,优化后的模型更轻量,模型运行速度更快。 对inference模型进行优化,优化后的模型更轻量,模型运行速度更快。
下述表格中提供了优化好的超轻量中文模型: 如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。
|模型简介|检测模型|识别模型|Paddle-Lite版本| 下述表格中也提供了一系列中文移动端模型:
|-|-|-|-|
|超轻量级中文OCR opt优化模型|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_det_mv3_db_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_rec_mv3_crnn_opt.nb)|develop| |模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本|
|-|-|-|-|-|-|-|
|V1.1|超轻量中文OCR 移动端模型|8.1M|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_opt.nb)|develop|
|【slim】V1.1|超轻量中文OCR 移动端模型|3.5M|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|develop|
|V1.0|轻量级中文OCR 移动端模型|8.6M|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.0_det_opt.nb)|---|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.0_rec_opt.nb)|develop|
注意:V1.1 3.0M 轻量模型是使用PaddleSlim优化后的,需要配合Paddle-Lite最新预测库使用。
如果直接使用上述表格中的模型进行部署,可略过下述步骤,直接阅读 [2.2节](#2.2与手机联调) 如果直接使用上述表格中的模型进行部署,可略过下述步骤,直接阅读 [2.2节](#2.2与手机联调)
...@@ -96,6 +100,8 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括 ...@@ -96,6 +100,8 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括
git clone https://github.com/PaddlePaddle/Paddle-Lite.git git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite cd Paddle-Lite
git checkout develop git checkout develop
# 切换到固定的commit
git reset --hard 55c53482bcdd2868373d024dd1144e4c5ec0e6b8
# 启动编译 # 启动编译
./lite/tools/build.sh build_optimize_tool ./lite/tools/build.sh build_optimize_tool
``` ```
...@@ -121,20 +127,27 @@ cd build.opt/lite/api/ ...@@ -121,20 +127,27 @@ cd build.opt/lite/api/
下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。 下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。
``` ```
# 下载PaddleOCR的超轻量文inference模型,并解压 # 【推荐】 下载PaddleOCR V1.1版本的中英文 inference模型,V1.1比1.0效果更好,模型更小
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_prune_infer.tar
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_quant_infer.tar
# 转换V1.1检测模型
./opt --model_file=./ch_ppocr_mobile_v1.1_det_prune_infer/model --param_file=./ch_ppocr_mobile_v1.1_det_prune_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_det_prune_opt --valid_targets=arm
# 转换V1.1识别模型
./opt --model_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/model --param_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_rec_quant_opt --valid_targets=arm
# 或下载使用PaddleOCR的V1.0超轻量中英文 inference模型,解压并转换为移动端支持的模型
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
# 转换V1.0检测模型
# 转换检测模型
./opt --model_file=./ch_det_mv3_db/model --param_file=./ch_det_mv3_db/params --optimize_out_type=naive_buffer --optimize_out=./ch_det_mv3_db_opt --valid_targets=arm ./opt --model_file=./ch_det_mv3_db/model --param_file=./ch_det_mv3_db/params --optimize_out_type=naive_buffer --optimize_out=./ch_det_mv3_db_opt --valid_targets=arm
# 转换V1.0识别模型
# 转换识别模型
./opt --model_file=./ch_rec_mv3_crnn/model --param_file=./ch_rec_mv3_crnn/params --optimize_out_type=naive_buffer --optimize_out=./ch_rec_mv3_crnn_opt --valid_targets=arm ./opt --model_file=./ch_rec_mv3_crnn/model --param_file=./ch_rec_mv3_crnn/params --optimize_out_type=naive_buffer --optimize_out=./ch_rec_mv3_crnn_opt --valid_targets=arm
``` ```
转换成功后,当前目录下会多出`ch_det_mv3_db_opt.nb`, `ch_rec_mv3_crnn_opt.nb`结尾的文件,即是转换成功的模型文件。 转换成功后,当前目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。
注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 转换的输入模型是paddle保存的inference模型 注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 工具的输入模型是paddle保存的inference模型
<a name="2.2与手机联调"></a> <a name="2.2与手机联调"></a>
### 2.2 与手机联调 ### 2.2 与手机联调
...@@ -182,17 +195,18 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar ...@@ -182,17 +195,18 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar
``` ```
准备测试图像,以`PaddleOCR/doc/imgs/11.jpg`为例,将测试的图像复制到`demo/cxx/ocr/debug/`文件夹下。 准备测试图像,以`PaddleOCR/doc/imgs/11.jpg`为例,将测试的图像复制到`demo/cxx/ocr/debug/`文件夹下。
准备lite opt工具优化后的模型文件,`ch_det_mv3_db_opt.nb,ch_rec_mv3_crnn_opt.nb`放置在`demo/cxx/ocr/debug/`文件夹下。 准备lite opt工具优化后的模型文件,比如使用`ch_ppocr_mobile_v1.1_det_prune_opt.nb,ch_ppocr_mobile_v1.1_rec_quant_opt.nb, ch_ppocr_mobile_cls_quant_opt.nb`,模型文件放置在`demo/cxx/ocr/debug/`文件夹下。
执行完成后,ocr文件夹下将有如下文件格式: 执行完成后,ocr文件夹下将有如下文件格式:
``` ```
demo/cxx/ocr/ demo/cxx/ocr/
|-- debug/ |-- debug/
| |--ch_det_mv3_db_opt.nb 优化后的检测模型文件 | |--ch_ppocr_mobile_v1.1_det_prune_opt.nb 优化后的检测模型文件
| |--ch_rec_mv3_crnn_opt.nb 优化后的识别模型文件 | |--ch_ppocr_mobile_v1.1_rec_quant_opt.nb 优化后的识别模型文件
| |--ch_ppocr_mobile_cls_quant_opt.nb 优化后的文字方向分类器模型文件
| |--11.jpg 待测试图像 | |--11.jpg 待测试图像
| |--ppocr_keys_v1.txt 字典文件 | |--ppocr_keys_v1.txt 中文字典文件
| |--libpaddle_light_api_shared.so C++预测库文件 | |--libpaddle_light_api_shared.so C++预测库文件
| |--config.txt DB-CRNN超参数配置 | |--config.txt DB-CRNN超参数配置
|-- config.txt DB-CRNN超参数配置 |-- config.txt DB-CRNN超参数配置
...@@ -202,7 +216,27 @@ demo/cxx/ocr/ ...@@ -202,7 +216,27 @@ demo/cxx/ocr/
|-- db_post_process.h |-- db_post_process.h
|-- Makefile 编译文件 |-- Makefile 编译文件
|-- ocr_db_crnn.cc C++预测源文件 |-- ocr_db_crnn.cc C++预测源文件
```
#### 注意:
1. ppocr_keys_v1.txt是中文字典文件,如果使用的 nb 模型是英文数字或其他语言的模型,需要更换为对应语言的字典。
PaddleOCR 在ppocr/utils/下存放了多种字典,包括:
```
dict/french_dict.txt # 法语字典
dict/german_dict.txt # 德语字典
ic15_dict.txt # 英文字典
dict/japan_dict.txt # 日语字典
dict/korean_dict.txt # 韩语字典
ppocr_keys_v1.txt # 中文字典
```
2. `config.txt` 包含了检测器、分类器的超参数,如下:
```
max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960
det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显
det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小
det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本
use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1表示使用
``` ```
5. 启动调试 5. 启动调试
...@@ -212,7 +246,7 @@ demo/cxx/ocr/ ...@@ -212,7 +246,7 @@ demo/cxx/ocr/
``` ```
# 执行编译,得到可执行文件ocr_db_crnn # 执行编译,得到可执行文件ocr_db_crnn
# ocr_db_crnn可执行文件的使用方式为: # ocr_db_crnn可执行文件的使用方式为:
# ./ocr_db_crnn 检测模型文件 识别模型文件 测试图像路径 # ./ocr_db_crnn 检测模型文件 方向分类器模型文件 识别模型文件 测试图像路径 字典文件路径
make -j make -j
# 将编译的可执行文件移动到debug文件夹中 # 将编译的可执行文件移动到debug文件夹中
mv ocr_db_crnn ./debug/ mv ocr_db_crnn ./debug/
...@@ -220,8 +254,8 @@ demo/cxx/ocr/ ...@@ -220,8 +254,8 @@ demo/cxx/ocr/
adb push debug /data/local/tmp/ adb push debug /data/local/tmp/
adb shell adb shell
cd /data/local/tmp/debug cd /data/local/tmp/debug
export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
./ocr_db_crnn ch_det_mv3_db_opt.nb ch_rec_mv3_crnn_opt.nb ./11.jpg ppocr_keys_v1.txt ./ocr_db_crnn ch_ppocr_mobile_v1.1_det_prune_opt.nb ch_ppocr_mobile_v1.1_rec_quant_opt.nb ch_ppocr_mobile_cls_quant_opt.nb ./11.jpg ppocr_keys_v1.txt
``` ```
如果对代码做了修改,则需要重新编译并push到手机上。 如果对代码做了修改,则需要重新编译并push到手机上。
...@@ -231,3 +265,14 @@ demo/cxx/ocr/ ...@@ -231,3 +265,14 @@ demo/cxx/ocr/
<div align="center"> <div align="center">
<img src="../imgs/demo.png" width="600"> <img src="../imgs/demo.png" width="600">
</div> </div>
## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 .nb 模型文件即可,同时要注意字典更新
Q2:换一个图测试怎么做?
A2:替换debug下的.jpg测试图像为你想要测试的图像,adb push 到手机上即可
Q3:如何封装到手机APP中?
A3:此demo旨在提供能在手机上运行OCR的核心算法部分,PaddleOCR/deploy/android_demo是将这个demo封装到手机app的示例,供参考
...@@ -14,18 +14,18 @@ deployment solutions for end-side deployment issues. ...@@ -14,18 +14,18 @@ deployment solutions for end-side deployment issues.
- Mobile phone (arm7 or arm8) - Mobile phone (arm7 or arm8)
## 2. Build PaddleLite library ## 2. Build PaddleLite library
[build for Docker](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#docker) 1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
[build for Linux](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#android) 2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
[build for MAC OS](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#id13) 3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
[build for windows](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/x86.html#id4)
## 3. Download prebuild library for android and ios ## 3. Download prebuild library for android and ios
|Platform|Prebuild library Download Link| |Platform|Prebuild library Download Link|
|-|-| |-|-|
|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)| |Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
|IOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)| |IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.6.3/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
|x86(Linux)|[预测库](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/X86/Linux/inference_lite_lib.x86.linux.tar.gz)|
note: The above pre-build inference library is compiled from the PaddleLite `release/2.6.3` branch. For more information about PaddleLite 2.6.3, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.6.3).
The structure of the prediction library is as follows: The structure of the prediction library is as follows:
...@@ -56,17 +56,20 @@ inference_lite_lib.android.armv8/ ...@@ -56,17 +56,20 @@ inference_lite_lib.android.armv8/
``` ```
## 4. Inference Model Optimization ## 4. Inference Model Optimization
Paddle Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. In order to make the optimization process more convenient and easy to use, Paddle Lite provide opt tools to automatically complete the optimization steps and output a lightweight, optimal executable model. Paddle Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. In order to make the optimization process more convenient and easy to use, Paddle Lite provide opt tools to automatically complete the optimization steps and output a lightweight, optimal executable model.
If you use PaddleOCR 8.6M OCR model to deploy, you can directly download the optimized model. If you have prepared the model file ending in `.nb`, you can skip this step.
The following table also provides a series of models that can be deployed on mobile phones to recognize Chinese.
You can directly download the optimized model.
|Introduction|Detection model|Recognition model|Paddle Lite branch | | Version | Introduction | Model size | Detection model | Text Direction model | Recognition model | Paddle Lite branch |
|-|-|-|-| | - | - | - | - | - | - | - |
|lightweight Chinese OCR optimized model|[Download](https://paddleocr.bj.bcebos.com/deploy/lite/ch_det_mv3_db_opt.nb)|[Download](https://paddleocr.bj.bcebos.com/deploy/lite/ch_rec_mv3_crnn_opt.nb)|develop| | V1.1 | extra-lightweight chinese OCR optimized model | 8.1M | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_opt.nb) | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_opt.nb) | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_opt.nb) | develop |
| [slim] V1.1 | extra-lightweight chinese OCR optimized model | 3.5M | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | develop |
| V1.0 | lightweight Chinese OCR optimized model | 8.6M | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.0_det_opt.nb) | - | [Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.0_rec_opt.nb) | develop |
If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model. If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
...@@ -74,6 +77,8 @@ If the model to be deployed is not in the above table, you need to follow the st ...@@ -74,6 +77,8 @@ If the model to be deployed is not in the above table, you need to follow the st
git clone https://github.com/PaddlePaddle/Paddle-Lite.git git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite cd Paddle-Lite
git checkout develop git checkout develop
# switch to the specified commit
git reset --hard 55c53482bcdd2868373d024dd1144e4c5ec0e6b8
./lite/tools/build.sh build_optimize_tool ./lite/tools/build.sh build_optimize_tool
``` ```
...@@ -85,6 +90,14 @@ The `opt` can optimize the inference model saved by paddle.io.save_inference_mod ...@@ -85,6 +90,14 @@ The `opt` can optimize the inference model saved by paddle.io.save_inference_mod
The usage of opt is as follows: The usage of opt is as follows:
``` ```
# 【Recommend】V1.1 is better than V1.0. steps for convert V1.1 model to nb file are as follows
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_prune_infer.tar
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_quant_infer.tar
./opt --model_file=./ch_ppocr_mobile_v1.1_det_prune_infer/model --param_file=./ch_ppocr_mobile_v1.1_det_prune_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_det_prune_opt --valid_targets=arm
./opt --model_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/model --param_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_rec_quant_opt --valid_targets=arm
# or use V1.0 model
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
...@@ -93,8 +106,7 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar ...@@ -93,8 +106,7 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar
``` ```
When the above code command is completed, there will be two more files `ch_det_mv3_db_opt.nb`, When the above code command is completed, there will be two more files `.nb` in the current directory, which is the converted model file.
`ch_rec_mv3_crnn_opt.nb` in the current directory, which is the converted model file.
## 5. Run optimized model on Phone ## 5. Run optimized model on Phone
...@@ -153,8 +165,9 @@ The structure of the OCR demo is as follows after the above command is executed: ...@@ -153,8 +165,9 @@ The structure of the OCR demo is as follows after the above command is executed:
``` ```
demo/cxx/ocr/ demo/cxx/ocr/
|-- debug/ |-- debug/
| |--ch_det_mv3_db_opt.nb Detection model | |--ch_ppocr_mobile_v1.1_det_prune_opt.nb Detection model
| |--ch_rec_mv3_crnn_opt.nb Recognition model | |--ch_ppocr_mobile_v1.1_rec_quant_opt.nb Recognition model
| |--ch_ppocr_mobile_cls_quant_opt.nb Text direction classification model
| |--11.jpg Image for OCR | |--11.jpg Image for OCR
| |--ppocr_keys_v1.txt Dictionary file | |--ppocr_keys_v1.txt Dictionary file
| |--libpaddle_light_api_shared.so C++ .so file | |--libpaddle_light_api_shared.so C++ .so file
...@@ -169,6 +182,28 @@ demo/cxx/ocr/ ...@@ -169,6 +182,28 @@ demo/cxx/ocr/
``` ```
#### Note:
1. ppocr_keys_v1.txt is a Chinese dictionary file.
If the nb model is used for English recognition or other language recognition, dictionary file should be replaced with a dictionary of the corresponding language.
PaddleOCR provides a variety of dictionaries under ppocr/utils/, including:
```
dict/french_dict.txt # french
dict/german_dict.txt # german
ic15_dict.txt # english
dict/japan_dict.txt # japan
dict/korean_dict.txt # korean
ppocr_keys_v1.txt # chinese
```
2. `config.txt` of the detector and classifier, as shown below:
```
max_side_len 960 # Limit the maximum image height and width to 960
det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
det_db_box_thresh 0.5 # DDB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate
det_db_unclip_ratio 1.6 # Indicates the compactness of the text box, the smaller the value, the closer the text box to the text
use_direction_classify 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use
```
5. Run Model on phone 5. Run Model on phone
``` ```
...@@ -180,7 +215,7 @@ adb shell ...@@ -180,7 +215,7 @@ adb shell
cd /data/local/tmp/debug cd /data/local/tmp/debug
export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH
# run model # run model
./ocr_db_crnn ch_det_mv3_db_opt.nb ch_rec_mv3_crnn_opt.nb ./11.jpg ppocr_keys_v1.txt ./ocr_db_crnn ch_ppocr_mobile_v1.1_det_prune_opt.nb ch_ppocr_mobile_v1.1_rec_quant_opt.nb ch_ppocr_mobile_cls_quant_opt.nb ./11.jpg ppocr_keys_v1.txt
``` ```
The outputs are as follows: The outputs are as follows:
...@@ -188,3 +223,15 @@ The outputs are as follows: ...@@ -188,3 +223,15 @@ The outputs are as follows:
<div align="center"> <div align="center">
<img src="../imgs/demo.png" width="600"> <img src="../imgs/demo.png" width="600">
</div> </div>
## FAQ
Q1: What if I want to change the model, do I need to run it again according to the process?
A1: If you have performed the above steps, you only need to replace the .nb model file to complete the model replacement.
Q2: How to test with another picture?
A2: Replace the .jpg test image under `./debug` with the image you want to test, and run `adb push` to push new image to the phone.
Q3: How to package it into the mobile APP?
A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further,
PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app for reference.
#!/bin/sh
# ----------------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# ----------------------------------------------------------------------------
# ----------------------------------------------------------------------------
# Maven Start Up Batch script
#
# Required ENV vars:
# ------------------
# JAVA_HOME - location of a JDK home dir
#
# Optional ENV vars
# -----------------
# M2_HOME - location of maven2's installed home dir
# MAVEN_OPTS - parameters passed to the Java VM when running Maven
# e.g. to debug Maven itself, use
# set MAVEN_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000
# MAVEN_SKIP_RC - flag to disable loading of mavenrc files
# ----------------------------------------------------------------------------
if [ -z "$MAVEN_SKIP_RC" ] ; then
if [ -f /etc/mavenrc ] ; then
. /etc/mavenrc
fi
if [ -f "$HOME/.mavenrc" ] ; then
. "$HOME/.mavenrc"
fi
fi
# OS specific support. $var _must_ be set to either true or false.
cygwin=false;
darwin=false;
mingw=false
case "`uname`" in
CYGWIN*) cygwin=true ;;
MINGW*) mingw=true;;
Darwin*) darwin=true
# Use /usr/libexec/java_home if available, otherwise fall back to /Library/Java/Home
# See https://developer.apple.com/library/mac/qa/qa1170/_index.html
if [ -z "$JAVA_HOME" ]; then
if [ -x "/usr/libexec/java_home" ]; then
export JAVA_HOME="`/usr/libexec/java_home`"
else
export JAVA_HOME="/Library/Java/Home"
fi
fi
;;
esac
if [ -z "$JAVA_HOME" ] ; then
if [ -r /etc/gentoo-release ] ; then
JAVA_HOME=`java-config --jre-home`
fi
fi
if [ -z "$M2_HOME" ] ; then
## resolve links - $0 may be a link to maven's home
PRG="$0"
# need this for relative symlinks
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG="`dirname "$PRG"`/$link"
fi
done
saveddir=`pwd`
M2_HOME=`dirname "$PRG"`/..
# make it fully qualified
M2_HOME=`cd "$M2_HOME" && pwd`
cd "$saveddir"
# echo Using m2 at $M2_HOME
fi
# For Cygwin, ensure paths are in UNIX format before anything is touched
if $cygwin ; then
[ -n "$M2_HOME" ] &&
M2_HOME=`cygpath --unix "$M2_HOME"`
[ -n "$JAVA_HOME" ] &&
JAVA_HOME=`cygpath --unix "$JAVA_HOME"`
[ -n "$CLASSPATH" ] &&
CLASSPATH=`cygpath --path --unix "$CLASSPATH"`
fi
# For Mingw, ensure paths are in UNIX format before anything is touched
if $mingw ; then
[ -n "$M2_HOME" ] &&
M2_HOME="`(cd "$M2_HOME"; pwd)`"
[ -n "$JAVA_HOME" ] &&
JAVA_HOME="`(cd "$JAVA_HOME"; pwd)`"
fi
if [ -z "$JAVA_HOME" ]; then
javaExecutable="`which javac`"
if [ -n "$javaExecutable" ] && ! [ "`expr \"$javaExecutable\" : '\([^ ]*\)'`" = "no" ]; then
# readlink(1) is not available as standard on Solaris 10.
readLink=`which readlink`
if [ ! `expr "$readLink" : '\([^ ]*\)'` = "no" ]; then
if $darwin ; then
javaHome="`dirname \"$javaExecutable\"`"
javaExecutable="`cd \"$javaHome\" && pwd -P`/javac"
else
javaExecutable="`readlink -f \"$javaExecutable\"`"
fi
javaHome="`dirname \"$javaExecutable\"`"
javaHome=`expr "$javaHome" : '\(.*\)/bin'`
JAVA_HOME="$javaHome"
export JAVA_HOME
fi
fi
fi
if [ -z "$JAVACMD" ] ; then
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
else
JAVACMD="`which java`"
fi
fi
if [ ! -x "$JAVACMD" ] ; then
echo "Error: JAVA_HOME is not defined correctly." >&2
echo " We cannot execute $JAVACMD" >&2
exit 1
fi
if [ -z "$JAVA_HOME" ] ; then
echo "Warning: JAVA_HOME environment variable is not set."
fi
CLASSWORLDS_LAUNCHER=org.codehaus.plexus.classworlds.launcher.Launcher
# traverses directory structure from process work directory to filesystem root
# first directory with .mvn subdirectory is considered project base directory
find_maven_basedir() {
if [ -z "$1" ]
then
echo "Path not specified to find_maven_basedir"
return 1
fi
basedir="$1"
wdir="$1"
while [ "$wdir" != '/' ] ; do
if [ -d "$wdir"/.mvn ] ; then
basedir=$wdir
break
fi
# workaround for JBEAP-8937 (on Solaris 10/Sparc)
if [ -d "${wdir}" ]; then
wdir=`cd "$wdir/.."; pwd`
fi
# end of workaround
done
echo "${basedir}"
}
# concatenates all lines of a file
concat_lines() {
if [ -f "$1" ]; then
echo "$(tr -s '\n' ' ' < "$1")"
fi
}
BASE_DIR=`find_maven_basedir "$(pwd)"`
if [ -z "$BASE_DIR" ]; then
exit 1;
fi
##########################################################################################
# Extension to allow automatically downloading the maven-wrapper.jar from Maven-central
# This allows using the maven wrapper in projects that prohibit checking in binary data.
##########################################################################################
if [ -r "$BASE_DIR/.mvn/wrapper/maven-wrapper.jar" ]; then
if [ "$MVNW_VERBOSE" = true ]; then
echo "Found .mvn/wrapper/maven-wrapper.jar"
fi
else
if [ "$MVNW_VERBOSE" = true ]; then
echo "Couldn't find .mvn/wrapper/maven-wrapper.jar, downloading it ..."
fi
if [ -n "$MVNW_REPOURL" ]; then
jarUrl="$MVNW_REPOURL/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar"
else
jarUrl="https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar"
fi
while IFS="=" read key value; do
case "$key" in (wrapperUrl) jarUrl="$value"; break ;;
esac
done < "$BASE_DIR/.mvn/wrapper/maven-wrapper.properties"
if [ "$MVNW_VERBOSE" = true ]; then
echo "Downloading from: $jarUrl"
fi
wrapperJarPath="$BASE_DIR/.mvn/wrapper/maven-wrapper.jar"
if $cygwin; then
wrapperJarPath=`cygpath --path --windows "$wrapperJarPath"`
fi
if command -v wget > /dev/null; then
if [ "$MVNW_VERBOSE" = true ]; then
echo "Found wget ... using wget"
fi
if [ -z "$MVNW_USERNAME" ] || [ -z "$MVNW_PASSWORD" ]; then
wget "$jarUrl" -O "$wrapperJarPath"
else
wget --http-user=$MVNW_USERNAME --http-password=$MVNW_PASSWORD "$jarUrl" -O "$wrapperJarPath"
fi
elif command -v curl > /dev/null; then
if [ "$MVNW_VERBOSE" = true ]; then
echo "Found curl ... using curl"
fi
if [ -z "$MVNW_USERNAME" ] || [ -z "$MVNW_PASSWORD" ]; then
curl -o "$wrapperJarPath" "$jarUrl" -f
else
curl --user $MVNW_USERNAME:$MVNW_PASSWORD -o "$wrapperJarPath" "$jarUrl" -f
fi
else
if [ "$MVNW_VERBOSE" = true ]; then
echo "Falling back to using Java to download"
fi
javaClass="$BASE_DIR/.mvn/wrapper/MavenWrapperDownloader.java"
# For Cygwin, switch paths to Windows format before running javac
if $cygwin; then
javaClass=`cygpath --path --windows "$javaClass"`
fi
if [ -e "$javaClass" ]; then
if [ ! -e "$BASE_DIR/.mvn/wrapper/MavenWrapperDownloader.class" ]; then
if [ "$MVNW_VERBOSE" = true ]; then
echo " - Compiling MavenWrapperDownloader.java ..."
fi
# Compiling the Java class
("$JAVA_HOME/bin/javac" "$javaClass")
fi
if [ -e "$BASE_DIR/.mvn/wrapper/MavenWrapperDownloader.class" ]; then
# Running the downloader
if [ "$MVNW_VERBOSE" = true ]; then
echo " - Running MavenWrapperDownloader.java ..."
fi
("$JAVA_HOME/bin/java" -cp .mvn/wrapper MavenWrapperDownloader "$MAVEN_PROJECTBASEDIR")
fi
fi
fi
fi
##########################################################################################
# End of extension
##########################################################################################
export MAVEN_PROJECTBASEDIR=${MAVEN_BASEDIR:-"$BASE_DIR"}
if [ "$MVNW_VERBOSE" = true ]; then
echo $MAVEN_PROJECTBASEDIR
fi
MAVEN_OPTS="$(concat_lines "$MAVEN_PROJECTBASEDIR/.mvn/jvm.config") $MAVEN_OPTS"
# For Cygwin, switch paths to Windows format before running java
if $cygwin; then
[ -n "$M2_HOME" ] &&
M2_HOME=`cygpath --path --windows "$M2_HOME"`
[ -n "$JAVA_HOME" ] &&
JAVA_HOME=`cygpath --path --windows "$JAVA_HOME"`
[ -n "$CLASSPATH" ] &&
CLASSPATH=`cygpath --path --windows "$CLASSPATH"`
[ -n "$MAVEN_PROJECTBASEDIR" ] &&
MAVEN_PROJECTBASEDIR=`cygpath --path --windows "$MAVEN_PROJECTBASEDIR"`
fi
# Provide a "standardized" way to retrieve the CLI args that will
# work with both Windows and non-Windows executions.
MAVEN_CMD_LINE_ARGS="$MAVEN_CONFIG $@"
export MAVEN_CMD_LINE_ARGS
WRAPPER_LAUNCHER=org.apache.maven.wrapper.MavenWrapperMain
exec "$JAVACMD" \
$MAVEN_OPTS \
-classpath "$MAVEN_PROJECTBASEDIR/.mvn/wrapper/maven-wrapper.jar" \
"-Dmaven.home=${M2_HOME}" "-Dmaven.multiModuleProjectDirectory=${MAVEN_PROJECTBASEDIR}" \
${WRAPPER_LAUNCHER} $MAVEN_CONFIG "$@"
@REM ----------------------------------------------------------------------------
@REM Licensed to the Apache Software Foundation (ASF) under one
@REM or more contributor license agreements. See the NOTICE file
@REM distributed with this work for additional information
@REM regarding copyright ownership. The ASF licenses this file
@REM to you under the Apache License, Version 2.0 (the
@REM "License"); you may not use this file except in compliance
@REM with the License. You may obtain a copy of the License at
@REM
@REM https://www.apache.org/licenses/LICENSE-2.0
@REM
@REM Unless required by applicable law or agreed to in writing,
@REM software distributed under the License is distributed on an
@REM "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@REM KIND, either express or implied. See the License for the
@REM specific language governing permissions and limitations
@REM under the License.
@REM ----------------------------------------------------------------------------
@REM ----------------------------------------------------------------------------
@REM Maven Start Up Batch script
@REM
@REM Required ENV vars:
@REM JAVA_HOME - location of a JDK home dir
@REM
@REM Optional ENV vars
@REM M2_HOME - location of maven2's installed home dir
@REM MAVEN_BATCH_ECHO - set to 'on' to enable the echoing of the batch commands
@REM MAVEN_BATCH_PAUSE - set to 'on' to wait for a keystroke before ending
@REM MAVEN_OPTS - parameters passed to the Java VM when running Maven
@REM e.g. to debug Maven itself, use
@REM set MAVEN_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000
@REM MAVEN_SKIP_RC - flag to disable loading of mavenrc files
@REM ----------------------------------------------------------------------------
@REM Begin all REM lines with '@' in case MAVEN_BATCH_ECHO is 'on'
@echo off
@REM set title of command window
title %0
@REM enable echoing by setting MAVEN_BATCH_ECHO to 'on'
@if "%MAVEN_BATCH_ECHO%" == "on" echo %MAVEN_BATCH_ECHO%
@REM set %HOME% to equivalent of $HOME
if "%HOME%" == "" (set "HOME=%HOMEDRIVE%%HOMEPATH%")
@REM Execute a user defined script before this one
if not "%MAVEN_SKIP_RC%" == "" goto skipRcPre
@REM check for pre script, once with legacy .bat ending and once with .cmd ending
if exist "%HOME%\mavenrc_pre.bat" call "%HOME%\mavenrc_pre.bat"
if exist "%HOME%\mavenrc_pre.cmd" call "%HOME%\mavenrc_pre.cmd"
:skipRcPre
@setlocal
set ERROR_CODE=0
@REM To isolate internal variables from possible post scripts, we use another setlocal
@setlocal
@REM ==== START VALIDATION ====
if not "%JAVA_HOME%" == "" goto OkJHome
echo.
echo Error: JAVA_HOME not found in your environment. >&2
echo Please set the JAVA_HOME variable in your environment to match the >&2
echo location of your Java installation. >&2
echo.
goto error
:OkJHome
if exist "%JAVA_HOME%\bin\java.exe" goto init
echo.
echo Error: JAVA_HOME is set to an invalid directory. >&2
echo JAVA_HOME = "%JAVA_HOME%" >&2
echo Please set the JAVA_HOME variable in your environment to match the >&2
echo location of your Java installation. >&2
echo.
goto error
@REM ==== END VALIDATION ====
:init
@REM Find the project base dir, i.e. the directory that contains the folder ".mvn".
@REM Fallback to current working directory if not found.
set MAVEN_PROJECTBASEDIR=%MAVEN_BASEDIR%
IF NOT "%MAVEN_PROJECTBASEDIR%"=="" goto endDetectBaseDir
set EXEC_DIR=%CD%
set WDIR=%EXEC_DIR%
:findBaseDir
IF EXIST "%WDIR%"\.mvn goto baseDirFound
cd ..
IF "%WDIR%"=="%CD%" goto baseDirNotFound
set WDIR=%CD%
goto findBaseDir
:baseDirFound
set MAVEN_PROJECTBASEDIR=%WDIR%
cd "%EXEC_DIR%"
goto endDetectBaseDir
:baseDirNotFound
set MAVEN_PROJECTBASEDIR=%EXEC_DIR%
cd "%EXEC_DIR%"
:endDetectBaseDir
IF NOT EXIST "%MAVEN_PROJECTBASEDIR%\.mvn\jvm.config" goto endReadAdditionalConfig
@setlocal EnableExtensions EnableDelayedExpansion
for /F "usebackq delims=" %%a in ("%MAVEN_PROJECTBASEDIR%\.mvn\jvm.config") do set JVM_CONFIG_MAVEN_PROPS=!JVM_CONFIG_MAVEN_PROPS! %%a
@endlocal & set JVM_CONFIG_MAVEN_PROPS=%JVM_CONFIG_MAVEN_PROPS%
:endReadAdditionalConfig
SET MAVEN_JAVA_EXE="%JAVA_HOME%\bin\java.exe"
set WRAPPER_JAR="%MAVEN_PROJECTBASEDIR%\.mvn\wrapper\maven-wrapper.jar"
set WRAPPER_LAUNCHER=org.apache.maven.wrapper.MavenWrapperMain
set DOWNLOAD_URL="https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar"
FOR /F "tokens=1,2 delims==" %%A IN ("%MAVEN_PROJECTBASEDIR%\.mvn\wrapper\maven-wrapper.properties") DO (
IF "%%A"=="wrapperUrl" SET DOWNLOAD_URL=%%B
)
@REM Extension to allow automatically downloading the maven-wrapper.jar from Maven-central
@REM This allows using the maven wrapper in projects that prohibit checking in binary data.
if exist %WRAPPER_JAR% (
if "%MVNW_VERBOSE%" == "true" (
echo Found %WRAPPER_JAR%
)
) else (
if not "%MVNW_REPOURL%" == "" (
SET DOWNLOAD_URL="%MVNW_REPOURL%/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar"
)
if "%MVNW_VERBOSE%" == "true" (
echo Couldn't find %WRAPPER_JAR%, downloading it ...
echo Downloading from: %DOWNLOAD_URL%
)
powershell -Command "&{"^
"$webclient = new-object System.Net.WebClient;"^
"if (-not ([string]::IsNullOrEmpty('%MVNW_USERNAME%') -and [string]::IsNullOrEmpty('%MVNW_PASSWORD%'))) {"^
"$webclient.Credentials = new-object System.Net.NetworkCredential('%MVNW_USERNAME%', '%MVNW_PASSWORD%');"^
"}"^
"[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; $webclient.DownloadFile('%DOWNLOAD_URL%', '%WRAPPER_JAR%')"^
"}"
if "%MVNW_VERBOSE%" == "true" (
echo Finished downloading %WRAPPER_JAR%
)
)
@REM End of extension
@REM Provide a "standardized" way to retrieve the CLI args that will
@REM work with both Windows and non-Windows executions.
set MAVEN_CMD_LINE_ARGS=%*
%MAVEN_JAVA_EXE% %JVM_CONFIG_MAVEN_PROPS% %MAVEN_OPTS% %MAVEN_DEBUG_OPTS% -classpath %WRAPPER_JAR% "-Dmaven.multiModuleProjectDirectory=%MAVEN_PROJECTBASEDIR%" %WRAPPER_LAUNCHER% %MAVEN_CONFIG% %*
if ERRORLEVEL 1 goto error
goto end
:error
set ERROR_CODE=1
:end
@endlocal & set ERROR_CODE=%ERROR_CODE%
if not "%MAVEN_SKIP_RC%" == "" goto skipRcPost
@REM check for post script, once with legacy .bat ending and once with .cmd ending
if exist "%HOME%\mavenrc_post.bat" call "%HOME%\mavenrc_post.bat"
if exist "%HOME%\mavenrc_post.cmd" call "%HOME%\mavenrc_post.cmd"
:skipRcPost
@REM pause the script if MAVEN_BATCH_PAUSE is set to 'on'
if "%MAVEN_BATCH_PAUSE%" == "on" pause
if "%MAVEN_TERMINATE_CMD%" == "on" exit %ERROR_CODE%
exit /B %ERROR_CODE%
<?xml version="1.0" encoding="UTF-8"?>
<module org.jetbrains.idea.maven.project.MavenProjectsManager.isMavenModule="true" type="JAVA_MODULE" version="4">
<component name="FacetManager">
<facet type="Spring" name="Spring">
<configuration />
</facet>
<facet type="web" name="Web">
<configuration>
<webroots />
<sourceRoots>
<root url="file://$MODULE_DIR$/src/main/java" />
<root url="file://$MODULE_DIR$/src/main/resources" />
</sourceRoots>
</configuration>
</facet>
</component>
<component name="NewModuleRootManager" LANGUAGE_LEVEL="JDK_1_8">
<output url="file://$MODULE_DIR$/target/classes" />
<output-test url="file://$MODULE_DIR$/target/test-classes" />
<content url="file://$MODULE_DIR$">
<sourceFolder url="file://$MODULE_DIR$/src/main/java" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/src/main/resources" type="java-resource" />
<sourceFolder url="file://$MODULE_DIR$/src/test/java" isTestSource="true" />
<excludeFolder url="file://$MODULE_DIR$/target" />
</content>
<orderEntry type="jdk" jdkName="1.8" jdkType="JavaSDK" />
<orderEntry type="sourceFolder" forTests="false" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-starter-web:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-starter:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-autoconfigure:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-starter-logging:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: ch.qos.logback:logback-classic:1.2.3" level="project" />
<orderEntry type="library" name="Maven: ch.qos.logback:logback-core:1.2.3" level="project" />
<orderEntry type="library" name="Maven: org.apache.logging.log4j:log4j-to-slf4j:2.13.3" level="project" />
<orderEntry type="library" name="Maven: org.apache.logging.log4j:log4j-api:2.13.3" level="project" />
<orderEntry type="library" name="Maven: org.slf4j:jul-to-slf4j:1.7.30" level="project" />
<orderEntry type="library" name="Maven: jakarta.annotation:jakarta.annotation-api:1.3.5" level="project" />
<orderEntry type="library" name="Maven: org.yaml:snakeyaml:1.26" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-starter-json:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.core:jackson-databind:2.11.2" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.core:jackson-annotations:2.11.2" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.core:jackson-core:2.11.2" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.datatype:jackson-datatype-jdk8:2.11.2" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.11.2" level="project" />
<orderEntry type="library" name="Maven: com.fasterxml.jackson.module:jackson-module-parameter-names:2.11.2" level="project" />
<orderEntry type="library" name="Maven: org.springframework.boot:spring-boot-starter-tomcat:2.3.4.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.apache.tomcat.embed:tomcat-embed-core:9.0.38" level="project" />
<orderEntry type="library" name="Maven: org.glassfish:jakarta.el:3.0.3" level="project" />
<orderEntry type="library" name="Maven: org.apache.tomcat.embed:tomcat-embed-websocket:9.0.38" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-web:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-beans:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-webmvc:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-aop:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-context:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-expression:5.2.9.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.springframework.boot:spring-boot-starter-test:2.3.4.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.springframework.boot:spring-boot-test:2.3.4.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.springframework.boot:spring-boot-test-autoconfigure:2.3.4.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: com.jayway.jsonpath:json-path:2.4.0" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: net.minidev:json-smart:2.3" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: net.minidev:accessors-smart:1.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.ow2.asm:asm:5.0.4" level="project" />
<orderEntry type="library" name="Maven: org.slf4j:slf4j-api:1.7.30" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: jakarta.xml.bind:jakarta.xml.bind-api:2.3.3" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: jakarta.activation:jakarta.activation-api:1.2.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.assertj:assertj-core:3.16.1" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.hamcrest:hamcrest:2.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.jupiter:junit-jupiter:5.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.jupiter:junit-jupiter-api:5.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.apiguardian:apiguardian-api:1.1.0" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.opentest4j:opentest4j:1.2.0" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.platform:junit-platform-commons:1.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.jupiter:junit-jupiter-params:5.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.jupiter:junit-jupiter-engine:5.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.junit.platform:junit-platform-engine:1.6.2" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.mockito:mockito-core:3.3.3" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: net.bytebuddy:byte-buddy:1.10.14" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: net.bytebuddy:byte-buddy-agent:1.10.14" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.objenesis:objenesis:2.6" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.mockito:mockito-junit-jupiter:3.3.3" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.skyscreamer:jsonassert:1.5.0" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-core:5.2.9.RELEASE" level="project" />
<orderEntry type="library" name="Maven: org.springframework:spring-jcl:5.2.9.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.springframework:spring-test:5.2.9.RELEASE" level="project" />
<orderEntry type="library" scope="TEST" name="Maven: org.xmlunit:xmlunit-core:2.7.0" level="project" />
<orderEntry type="library" name="Maven: org.apache.httpcomponents:httpclient:4.5.12" level="project" />
<orderEntry type="library" name="Maven: org.apache.httpcomponents:httpcore:4.4.13" level="project" />
<orderEntry type="library" name="Maven: commons-codec:commons-codec:1.14" level="project" />
<orderEntry type="library" name="Maven: com.vaadin.external.google:android-json:0.0.20131108.vaadin1" level="project" />
<orderEntry type="library" name="Maven: com.google.code.gson:gson:2.8.6" level="project" />
</component>
</module>
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.3.4.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.paddelOcr_springBoot</groupId>
<artifactId>demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>demo</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<!--引入RestTemplate-->
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
<dependency>
<groupId>com.vaadin.external.google</groupId>
<artifactId>android-json</artifactId>
<version>0.0.20131108.vaadin1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.6</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
简体中文
- 使用本教程前请先基于PaddleHub Serving的部署.
# 基于PaddleHub Serving的Java SpringBoot调用
paddleOcrSpringBoot服务部署目录下包括全部SpringBoot代码。目录结构如下:
```
deploy/paddleOcrSpringBoot/
└─ src 代码文件
└─ main 主函数代码
└─ java\com\paddelocr_springboot\demo
└─ DemoApplication.java SpringBoot启动代码
└─ Controller
└─ OCR.java 控制器代码
└─ test 测试代码
```
- Hub Serving启动后的APi端口如下:
`http://[ip_address]:[port]/predict/[module_name]`
## 返回结果格式说明
返回结果为列表(list),列表中的每一项为词典(dict),词典一共可能包含3种字段,信息如下:
|字段名称|数据类型|意义|
|-|-|-|
|text|str|文本内容|
|confidence|float| 文本识别置信度|
|text_region|list|文本位置坐标|
不同模块返回的字段不同,如,文本识别服务模块返回结果不含`text_region`字段,具体信息如下:
|字段名/模块名|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
\ No newline at end of file
package com.paddelocr_springboot.demo.Controller;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.json.JSONObject;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.ResourceUtils;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.http.*;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;
import sun.misc.BASE64Encoder;
import java.util.Objects;
@RestController
public class OCR {
@RequestMapping("/")
public ResponseEntity<String> hi(){
//创建请求头
HttpHeaders headers = new HttpHeaders();
//设置请求头格式
headers.setContentType(MediaType.APPLICATION_JSON);
//读入静态资源文件1.png
InputStream imagePath = this.getClass().getResourceAsStream("/1.png");
//构建请求参数
MultiValueMap<String, String> map= new LinkedMultiValueMap<String, String>();
//添加请求参数images,并将Base64编码的图片传入
map.add("images", ImageToBase64(imagePath));
//构建请求
HttpEntity<MultiValueMap<String, String>> request = new HttpEntity<MultiValueMap<String, String>>(map, headers);
RestTemplate restTemplate = new RestTemplate();
//发送请求
ResponseEntity<String> response = restTemplate.postForEntity("http://127.0.0.1:8866/predict/ocr_system", request, String.class);
//打印请求返回值
return response;
}
private String ImageToBase64(InputStream imgPath) {
byte[] data = null;
// 读取图片字节数组
try {
InputStream in = imgPath;
System.out.println(imgPath);
data = new byte[in.available()];
in.read(data);
in.close();
} catch (IOException e) {
e.printStackTrace();
}
// 对字节数组Base64编码
BASE64Encoder encoder = new BASE64Encoder();
// 返回Base64编码过的字节数组字符串
//System.out.println("图片转换Base64:" + encoder.encode(Objects.requireNonNull(data)));
return encoder.encode(Objects.requireNonNull(data));
}
}
package com.paddelocr_springboot.demo;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class DemoApplication {
public static void main(String[] args) {
SpringApplication.run(DemoApplication.class, args);
}
}
server.port=8081
http_pool.max_total: 200
http_pool.default_max_per_route: 100
http_pool.connect_timeout: 5000
http_pool.connection_request_timeout: 1000
http_pool.socket_timeout: 65000
http_pool.validate_after_inactivity: 2000
\ No newline at end of file
package com.paddelocr_springboot.demo;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
@SpringBootTest
class DemoApplicationTests {
@Test
void contextLoads() {
}
}
...@@ -13,65 +13,115 @@ ...@@ -13,65 +13,115 @@
# limitations under the License. # limitations under the License.
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2 import cv2
import sys import sys
import numpy as np import numpy as np
import os import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
import time import time
import re import re
import base64 import base64
from tools.infer.predict_cls import TextClassifier
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextClassifierHelper(TextClassifier):
def __init__(self, args):
self.cls_image_shape = [int(v) for v in args.cls_image_shape.split(",")]
self.cls_batch_num = args.rec_batch_num
self.label_list = args.label_list
self.cls_thresh = args.cls_thresh
self.fetch = [
"save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"
]
def preprocess(self, img_list):
args = {}
img_num = len(img_list)
args["img_list"] = img_list
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
# Sorting can speed up the cls process
indices = np.argsort(np.array(width_list))
args["indices"] = indices
cls_res = [['', 0.0]] * img_num
batch_num = self.cls_batch_num
predict_time = 0
beg_img_no, end_img_no = 0, img_num
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img(img_list[indices[ino]])
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
feed = {"image": norm_img_batch.copy()}
return feed, self.fetch, args
def postprocess(self, outputs, args):
prob_out = outputs[0]
label_out = outputs[1]
indices = args["indices"]
img_list = args["img_list"]
cls_res = [['', 0.0]] * len(label_out)
if len(label_out.shape) != 1:
prob_out, label_out = label_out, prob_out
for rno in range(len(label_out)):
label_idx = label_out[rno]
score = prob_out[rno][label_idx]
label = self.label_list[label_idx]
cls_res[indices[rno]] = [label, score]
if '180' in label and score > self.cls_thresh:
img_list[indices[rno]] = cv2.rotate(img_list[indices[rno]], 1)
return img_list, cls_res
class OCRService(WebService): class OCRService(WebService):
def init_rec(self): def init_rec(self):
self.ocr_reader = OCRReader() self.text_classifier = TextClassifierHelper(global_args)
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
img_list = [] img_list = []
for feed_data in feed: for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8')) data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im) img_list.append(im)
feed_list = [] feed, fetch, self.tmp_args = self.text_classifier.preprocess(img_list)
max_wh_ratio = 0 return feed, fetch
for i, boximg in enumerate(img_list):
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed_list, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) outputs = [fetch_map[x] for x in self.text_classifier.fetch]
res_lst = [] for x in fetch_map.keys():
for res in rec_res: if ".lod" in x:
res_lst.append(res[0]) self.tmp_args[x] = fetch_map[x]
res = {"res": res_lst} _, rec_res = self.text_classifier.postprocess(outputs, self.tmp_args)
res = {
"pred_text": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res return res
ocr_service = OCRService(name="ocr") if __name__ == "__main__":
ocr_service.load_model_config("ocr_rec_model") ocr_service = OCRService(name="ocr")
ocr_service.init_rec() ocr_service.load_model_config(global_args.cls_model_dir)
if sys.argv[1] == 'gpu': ocr_service.init_rec()
ocr_service.set_gpus("0") if global_args.use_gpu:
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) ocr_service.prepare_server(
elif sys.argv[1] == 'cpu': workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") else:
ocr_service.run_rpc_service() ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_web_service() ocr_service.run_debugger_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
import time
import re
import base64
from tools.infer.predict_cls import TextClassifier
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextClassifierHelper(TextClassifier):
def __init__(self, args):
self.cls_image_shape = [int(v) for v in args.cls_image_shape.split(",")]
self.cls_batch_num = args.rec_batch_num
self.label_list = args.label_list
self.cls_thresh = args.cls_thresh
self.fetch = [
"save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"
]
def preprocess(self, img_list):
args = {}
img_num = len(img_list)
args["img_list"] = img_list
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
# Sorting can speed up the cls process
indices = np.argsort(np.array(width_list))
args["indices"] = indices
cls_res = [['', 0.0]] * img_num
batch_num = self.cls_batch_num
predict_time = 0
beg_img_no, end_img_no = 0, img_num
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img(img_list[indices[ino]])
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
if img_num > 1:
feed = [{
"image": norm_img_batch[x]
} for x in range(norm_img_batch.shape[0])]
else:
feed = {"image": norm_img_batch[0]}
return feed, self.fetch, args
def postprocess(self, outputs, args):
prob_out = outputs[0]
label_out = outputs[1]
indices = args["indices"]
img_list = args["img_list"]
cls_res = [['', 0.0]] * len(label_out)
if len(label_out.shape) != 1:
prob_out, label_out = label_out, prob_out
for rno in range(len(label_out)):
label_idx = label_out[rno]
score = prob_out[rno][label_idx]
label = self.label_list[label_idx]
cls_res[indices[rno]] = [label, score]
if '180' in label and score > self.cls_thresh:
img_list[indices[rno]] = cv2.rotate(img_list[indices[rno]], 1)
return img_list, cls_res
class OCRService(WebService):
def init_rec(self):
self.text_classifier = TextClassifierHelper(global_args)
def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
img_list = []
for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im)
feed, fetch, self.tmp_args = self.text_classifier.preprocess(img_list)
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
outputs = [fetch_map[x] for x in self.text_classifier.fetch]
for x in fetch_map.keys():
if ".lod" in x:
self.tmp_args[x] = fetch_map[x]
_, rec_res = self.text_classifier.postprocess(outputs, self.tmp_args)
res = {
"direction": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res
if __name__ == "__main__":
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config(global_args.cls_model_dir)
ocr_service.init_rec()
if global_args.use_gpu:
ocr_service.prepare_server(
workdir="workdir", port=9292, device="gpu", gpuid=0)
else:
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import os, sys
import time
def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "../../doc/imgs_words/ch/"
for img_file in os.listdir(test_img_dir):
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read()
image = cv2_to_base64(image_data1)
data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
...@@ -17,63 +17,91 @@ import cv2 ...@@ -17,63 +17,91 @@ import cv2
import sys import sys
import numpy as np import numpy as np
import os import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
import time import time
import re import re
import base64 import base64
from tools.infer.predict_det import TextDetector
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextDetectorHelper(TextDetector):
def __init__(self, args):
super(TextDetectorHelper, self).__init__(args)
if self.det_algorithm == "SAST":
self.fetch = [
"bn_f_border4.output.tmp_2", "bn_f_tco4.output.tmp_2",
"bn_f_tvo4.output.tmp_2", "sigmoid_0.tmp_0"
]
elif self.det_algorithm == "EAST":
self.fetch = ["sigmoid_0.tmp_0", "tmp_2"]
elif self.det_algorithm == "DB":
self.fetch = ["save_infer_model/scale_0.tmp_0"]
def preprocess(self, img):
img = img.copy()
im, ratio_list = self.preprocess_op(img)
if im is None:
return None, 0
return {
"image": im.copy()
}, self.fetch, {
"ratio_list": [ratio_list],
"ori_im": img
}
def postprocess(self, outputs, args):
outs_dict = {}
if self.det_algorithm == "EAST":
outs_dict['f_geo'] = outputs[0]
outs_dict['f_score'] = outputs[1]
elif self.det_algorithm == 'SAST':
outs_dict['f_border'] = outputs[0]
outs_dict['f_score'] = outputs[1]
outs_dict['f_tco'] = outputs[2]
outs_dict['f_tvo'] = outputs[3]
else:
outs_dict['maps'] = outputs[0]
dt_boxes_list = self.postprocess_op(outs_dict, args["ratio_list"])
dt_boxes = dt_boxes_list[0]
if self.det_algorithm == "SAST" and self.det_sast_polygon:
dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes,
args["ori_im"].shape)
else:
dt_boxes = self.filter_tag_det_res(dt_boxes, args["ori_im"].shape)
return dt_boxes
class OCRService(WebService): class DetService(WebService):
def init_det(self): def init_det(self):
self.det_preprocess = Sequential([ self.text_detector = TextDetectorHelper(global_args)
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
data = base64.b64decode(feed[0]["image"].encode('utf8')) data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = im.shape feed, fetch, self.tmp_args = self.text_detector.preprocess(im)
det_img = self.det_preprocess(im) return feed, fetch
_, self.new_h, self.new_w = det_img.shape
return {"image": det_img[np.newaxis, :].copy()}, ["concat_1.tmp_0"]
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed={}, fetch=[], fetch_map=None):
det_out = fetch_map["concat_1.tmp_0"] outputs = [fetch_map[x] for x in fetch]
ratio_list = [ res = self.text_detector.postprocess(outputs, self.tmp_args)
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w return {"boxes": res.tolist()}
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
return {"dt_boxes": dt_boxes.tolist()}
ocr_service = OCRService(name="ocr") if __name__ == "__main__":
ocr_service.load_model_config("ocr_det_model") ocr_service = DetService(name="ocr")
ocr_service.init_det() ocr_service.load_model_config(global_args.det_model_dir)
if sys.argv[1] == 'gpu': ocr_service.init_det()
ocr_service.set_gpus("0") if global_args.use_gpu:
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) ocr_service.prepare_server(
ocr_service.run_debugger_service(gpu=True) workdir="workdir", port=9292, device="gpu", gpuid=0)
elif sys.argv[1] == 'cpu': else:
ocr_service.prepare_server(workdir="workdir", port=9292) ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_debugger_service() ocr_service.run_debugger_service()
ocr_service.init_det() ocr_service.run_web_service()
ocr_service.run_web_service()
...@@ -17,62 +17,90 @@ import cv2 ...@@ -17,62 +17,90 @@ import cv2
import sys import sys
import numpy as np import numpy as np
import os import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
import time import time
import re import re
import base64 import base64
from tools.infer.predict_det import TextDetector
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextDetectorHelper(TextDetector):
def __init__(self, args):
super(TextDetectorHelper, self).__init__(args)
if self.det_algorithm == "SAST":
self.fetch = [
"bn_f_border4.output.tmp_2", "bn_f_tco4.output.tmp_2",
"bn_f_tvo4.output.tmp_2", "sigmoid_0.tmp_0"
]
elif self.det_algorithm == "EAST":
self.fetch = ["sigmoid_0.tmp_0", "tmp_2"]
elif self.det_algorithm == "DB":
self.fetch = ["save_infer_model/scale_0.tmp_0"]
def preprocess(self, img):
im, ratio_list = self.preprocess_op(img)
if im is None:
return None, 0
return {
"image": im[0]
}, self.fetch, {
"ratio_list": [ratio_list],
"ori_im": img
}
def postprocess(self, outputs, args):
outs_dict = {}
if self.det_algorithm == "EAST":
outs_dict['f_geo'] = outputs[0]
outs_dict['f_score'] = outputs[1]
elif self.det_algorithm == 'SAST':
outs_dict['f_border'] = outputs[0]
outs_dict['f_score'] = outputs[1]
outs_dict['f_tco'] = outputs[2]
outs_dict['f_tvo'] = outputs[3]
else:
outs_dict['maps'] = outputs[0]
dt_boxes_list = self.postprocess_op(outs_dict, args["ratio_list"])
dt_boxes = dt_boxes_list[0]
if self.det_algorithm == "SAST" and self.det_sast_polygon:
dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes,
args["ori_im"].shape)
else:
dt_boxes = self.filter_tag_det_res(dt_boxes, args["ori_im"].shape)
return dt_boxes
class OCRService(WebService): class DetService(WebService):
def init_det(self): def init_det(self):
self.det_preprocess = Sequential([ self.text_detector = TextDetectorHelper(global_args)
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
data = base64.b64decode(feed[0]["image"].encode('utf8')) data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = im.shape feed, fetch, self.tmp_args = self.text_detector.preprocess(im)
det_img = self.det_preprocess(im) return feed, fetch
_, self.new_h, self.new_w = det_img.shape
print(det_img)
return {"image": det_img}, ["concat_1.tmp_0"]
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed={}, fetch=[], fetch_map=None):
det_out = fetch_map["concat_1.tmp_0"] outputs = [fetch_map[x] for x in fetch]
ratio_list = [ res = self.text_detector.postprocess(outputs, self.tmp_args)
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w return {"boxes": res.tolist()}
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
return {"dt_boxes": dt_boxes.tolist()}
ocr_service = OCRService(name="ocr") if __name__ == "__main__":
ocr_service.load_model_config("ocr_det_model") ocr_service = DetService(name="ocr")
if sys.argv[1] == 'gpu': ocr_service.load_model_config(global_args.det_model_dir)
ocr_service.set_gpus("0") ocr_service.init_det()
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) if global_args.use_gpu:
elif sys.argv[1] == 'cpu': ocr_service.prepare_server(
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.init_det() else:
ocr_service.run_rpc_service() ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_web_service() ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import os, sys
import time
def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "../../doc/imgs/"
for img_file in os.listdir(test_img_dir):
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read()
image = cv2_to_base64(image_data1)
data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
rjson = r.json()
print(rjson)
...@@ -13,102 +13,108 @@ ...@@ -13,102 +13,108 @@
# limitations under the License. # limitations under the License.
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2 import cv2
import sys import sys
import numpy as np import numpy as np
import os import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
from paddle_serving_app.local_predict import Debugger
import time import time
import re import re
import base64 import base64
from clas_local_server import TextClassifierHelper
from det_local_server import TextDetectorHelper
from rec_local_server import TextRecognizerHelper
from tools.infer.predict_system import TextSystem, sorted_boxes
from paddle_serving_app.local_predict import Debugger
import copy
from params import read_params
global_args = read_params()
class OCRService(WebService): if global_args.use_gpu:
def init_det_debugger(self, det_model_config): from paddle_serving_server_gpu.web_service import WebService
self.det_preprocess = Sequential([ else:
ResizeByFactor(32, 960), Div(255), from paddle_serving_server.web_service import WebService
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
]) class TextSystemHelper(TextSystem):
def __init__(self, args):
self.text_detector = TextDetectorHelper(args)
self.text_recognizer = TextRecognizerHelper(args)
self.use_angle_cls = args.use_angle_cls
if self.use_angle_cls:
self.clas_client = Debugger()
self.clas_client.load_model_config(
global_args.cls_model_dir, gpu=True, profile=False)
self.text_classifier = TextClassifierHelper(args)
self.det_client = Debugger() self.det_client = Debugger()
if sys.argv[1] == 'gpu': self.det_client.load_model_config(
self.det_client.load_model_config( global_args.det_model_dir, gpu=True, profile=False)
det_model_config, gpu=True, profile=False) self.fetch = ["save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"]
elif sys.argv[1] == 'cpu':
self.det_client.load_model_config( def preprocess(self, img):
det_model_config, gpu=False, profile=False) feed, fetch, self.tmp_args = self.text_detector.preprocess(img)
self.ocr_reader = OCRReader() fetch_map = self.det_client.predict(feed, fetch)
outputs = [fetch_map[x] for x in fetch]
dt_boxes = self.text_detector.postprocess(outputs, self.tmp_args)
if dt_boxes is None:
return None, None
img_crop_list = []
dt_boxes = sorted_boxes(dt_boxes)
for bno in range(len(dt_boxes)):
tmp_box = copy.deepcopy(dt_boxes[bno])
img_crop = self.get_rotate_crop_image(img, tmp_box)
img_crop_list.append(img_crop)
if self.use_angle_cls:
feed, fetch, self.tmp_args = self.text_classifier.preprocess(
img_crop_list)
fetch_map = self.clas_client.predict(feed, fetch)
outputs = [fetch_map[x] for x in self.text_classifier.fetch]
for x in fetch_map.keys():
if ".lod" in x:
self.tmp_args[x] = fetch_map[x]
img_crop_list, _ = self.text_classifier.postprocess(outputs,
self.tmp_args)
feed, fetch, self.tmp_args = self.text_recognizer.preprocess(
img_crop_list)
return feed, self.fetch, self.tmp_args
def postprocess(self, outputs, args):
return self.text_recognizer.postprocess(outputs, args)
class OCRService(WebService):
def init_rec(self):
self.text_system = TextSystemHelper(global_args)
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
data = base64.b64decode(feed[0]["image"].encode('utf8')) data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
ori_h, ori_w, _ = im.shape feed, fetch, self.tmp_args = self.text_system.preprocess(im)
det_img = self.det_preprocess(im)
_, new_h, new_w = det_img.shape
det_img = det_img[np.newaxis, :]
det_img = det_img.copy()
det_out = self.det_client.predict(
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
filter_func = FilterBoxes(10, 10)
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
sorted_boxes = SortedBoxes()
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
dt_boxes = sorted_boxes(dt_boxes)
get_rotate_crop_image = GetRotateCropImage()
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
if len(img_list) == 0:
return [], []
_, w, h = self.ocr_reader.resize_norm_img(img_list[0],
max_wh_ratio).shape
imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
for id, img in enumerate(img_list):
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
imgs[id] = norm_img
feed = {"image": imgs.copy()}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed, fetch return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) outputs = [fetch_map[x] for x in self.text_system.fetch]
res_lst = [] for x in fetch_map.keys():
for res in rec_res: if ".lod" in x:
res_lst.append(res[0]) self.tmp_args[x] = fetch_map[x]
res = {"res": res_lst} rec_res = self.text_system.postprocess(outputs, self.tmp_args)
res = {
"pred_text": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res return res
ocr_service = OCRService(name="ocr") if __name__ == "__main__":
ocr_service.load_model_config("ocr_rec_model") ocr_service = OCRService(name="ocr")
ocr_service.init_det_debugger(det_model_config="ocr_det_model") ocr_service.load_model_config(global_args.rec_model_dir)
if sys.argv[1] == 'gpu': ocr_service.init_rec()
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) if global_args.use_gpu:
ocr_service.run_debugger_service(gpu=True) ocr_service.prepare_server(
elif sys.argv[1] == 'cpu': workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") else:
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_debugger_service() ocr_service.run_debugger_service()
ocr_service.run_web_service() ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
import time
import re
import base64
from clas_rpc_server import TextClassifierHelper
from det_rpc_server import TextDetectorHelper
from rec_rpc_server import TextRecognizerHelper
from tools.infer.predict_system import TextSystem, sorted_boxes
import copy
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextSystemHelper(TextSystem):
def __init__(self, args):
self.text_detector = TextDetectorHelper(args)
self.text_recognizer = TextRecognizerHelper(args)
self.use_angle_cls = args.use_angle_cls
if self.use_angle_cls:
self.clas_client = Client()
self.clas_client.load_client_config(
"cls_infer_client/serving_client_conf.prototxt")
self.clas_client.connect(["127.0.0.1:9294"])
self.text_classifier = TextClassifierHelper(args)
self.det_client = Client()
self.det_client.load_client_config(
"det_infer_client/serving_client_conf.prototxt")
self.det_client.connect(["127.0.0.1:9293"])
self.fetch = ["save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"]
def preprocess(self, img):
feed, fetch, self.tmp_args = self.text_detector.preprocess(img)
fetch_map = self.det_client.predict(feed, fetch)
outputs = [fetch_map[x] for x in fetch]
dt_boxes = self.text_detector.postprocess(outputs, self.tmp_args)
print(dt_boxes)
if dt_boxes is None:
return None, None
img_crop_list = []
dt_boxes = sorted_boxes(dt_boxes)
for bno in range(len(dt_boxes)):
tmp_box = copy.deepcopy(dt_boxes[bno])
img_crop = self.get_rotate_crop_image(img, tmp_box)
img_crop_list.append(img_crop)
if self.use_angle_cls:
feed, fetch, self.tmp_args = self.text_classifier.preprocess(
img_crop_list)
fetch_map = self.clas_client.predict(feed, fetch)
print(fetch_map)
outputs = [fetch_map[x] for x in self.text_classifier.fetch]
for x in fetch_map.keys():
if ".lod" in x:
self.tmp_args[x] = fetch_map[x]
img_crop_list, _ = self.text_classifier.postprocess(outputs,
self.tmp_args)
feed, fetch, self.tmp_args = self.text_recognizer.preprocess(
img_crop_list)
return feed, self.fetch, self.tmp_args
def postprocess(self, outputs, args):
return self.text_recognizer.postprocess(outputs, args)
class OCRService(WebService):
def init_rec(self):
self.text_system = TextSystemHelper(global_args)
def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
feed, fetch, self.tmp_args = self.text_system.preprocess(im)
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
outputs = [fetch_map[x] for x in self.text_system.fetch]
for x in fetch_map.keys():
if ".lod" in x:
self.tmp_args[x] = fetch_map[x]
rec_res = self.text_system.postprocess(outputs, self.tmp_args)
res = {
"pred_text": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res
if __name__ == "__main__":
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config(global_args.rec_model_dir)
ocr_service.init_rec()
if global_args.use_gpu:
ocr_service.prepare_server(
workdir="workdir", port=9292, device="gpu", gpuid=0)
else:
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_rpc_service()
ocr_service.run_web_service()
...@@ -20,11 +20,13 @@ import base64 ...@@ -20,11 +20,13 @@ import base64
import os, sys import os, sys
import time import time
def cv2_to_base64(image): def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1] #data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode( return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8') 'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"} headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction" url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "../../doc/imgs/" test_img_dir = "../../doc/imgs/"
...@@ -34,4 +36,5 @@ for img_file in os.listdir(test_img_dir): ...@@ -34,4 +36,5 @@ for img_file in os.listdir(test_img_dir):
image = cv2_to_base64(image_data1) image = cv2_to_base64(image_data1)
data = {"feed": [{"image": image}], "fetch": ["res"]} data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data)) r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json()) rjson = r.json()
print(rjson)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_det_client(self, det_port, det_client_config):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.det_client = Client()
self.det_client.load_client_config(det_client_config)
self.det_client.connect(["127.0.0.1:{}".format(det_port)])
self.ocr_reader = OCRReader()
def preprocess(self, feed=[], fetch=[]):
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
ori_h, ori_w, _ = im.shape
det_img = self.det_preprocess(im)
det_out = self.det_client.predict(
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
_, new_h, new_w = det_img.shape
filter_func = FilterBoxes(10, 10)
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
sorted_boxes = SortedBoxes()
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
dt_boxes = sorted_boxes(dt_boxes)
get_rotate_crop_image = GetRotateCropImage()
feed_list = []
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed_list, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": res_lst}
return res
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_rec_model")
if sys.argv[1] == 'gpu':
ocr_service.set_gpus("0")
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
elif sys.argv[1] == 'cpu':
ocr_service.prepare_server(workdir="workdir", port=9292)
ocr_service.init_det_client(
det_port=9293,
det_client_config="ocr_det_client/serving_client_conf.prototxt")
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
class Config(object):
pass
def read_params():
cfg = Config()
#use gpu
cfg.use_gpu = False
cfg.use_pdserving = True
#params for text detector
cfg.det_algorithm = "DB"
cfg.det_model_dir = "./det_infer_server/"
cfg.det_max_side_len = 960
#DB parmas
cfg.det_db_thresh =0.3
cfg.det_db_box_thresh =0.5
cfg.det_db_unclip_ratio =2.0
#EAST parmas
cfg.det_east_score_thresh = 0.8
cfg.det_east_cover_thresh = 0.1
cfg.det_east_nms_thresh = 0.2
#params for text recognizer
cfg.rec_algorithm = "CRNN"
cfg.rec_model_dir = "./rec_infer_server/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr_keys_v1.txt"
cfg.use_space_char = True
#params for text classifier
cfg.use_angle_cls = True
cfg.cls_model_dir = "./cls_infer_server/"
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
return cfg
[English](readme_en.md) | 简体中文
PaddleOCR提供2种服务部署方式:
- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../hubserving/readme.md)
- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。
# Paddle Serving 服务部署
本教程将介绍基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)部署PaddleOCR在线预测服务的详细步骤。
## 快速启动服务
### 1. 准备环境
我们先安装Paddle Serving相关组件
我们推荐用户使用GPU来做Paddle Serving的OCR服务部署
**CUDA版本:9.0**
**CUDNN版本:7.0**
**操作系统版本:CentOS 6以上**
**Python版本: 2.7/3.6/3.7**
**Python操作指南:**
```
#CPU/GPU版本选择一个
#GPU版本服务端
python -m pip install paddle_serving_server_gpu
#CPU版本服务端
python -m pip install paddle_serving_server
#客户端和App包使用以下链接(CPU,GPU通用)
python -m pip install paddle_serving_app paddle_serving_client
```
### 2. 模型转换
可以使用`paddle_serving_app`提供的模型,执行下列命令
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
python -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
执行上述命令会下载`db_crnn_mobile`的模型,如果想要下载规模更大的`db_crnn_server`模型,可以在下载预测模型并解压之后。参考[如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md)
我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在:
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
tar xf ch_rec_r34_vd_crnn_infer.tar
```
因此我们按照Serving模型转换教程,运行下列python文件。
```
from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "ch_rec_r34_vd_crnn"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
```
最终会在`serving_client_dir``serving_server_dir`生成客户端和服务端的模型配置。
### 3. 启动服务
启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表:
|版本|特点|适用场景|
|-|-|-|
|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况|
|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景|
#### 方式1. 启动标准版服务
```
# cpu,gpu启动二选一,以下是cpu启动
python -m paddle_serving_server.serve --model ocr_det_model --port 9293
python ocr_web_server.py cpu
# gpu启动
python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
python ocr_web_server.py gpu
```
#### 方式2. 启动快速版服务
```
# cpu,gpu启动二选一,以下是cpu启动
python ocr_local_server.py cpu
# gpu启动
python ocr_local_server.py gpu
```
## 发送预测请求
```
python ocr_web_client.py
```
## 返回结果格式说明
返回结果是json格式
```
{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}}
```
我们也可以打印结果json串中`res`字段的每一句话
```
土地整治与土壤修复研究中心
华南农业大学1素图
```
## 自定义修改服务逻辑
`ocr_web_server.py`或是`ocr_local_server.py`当中的`preprocess`函数里面做了检测服务和识别服务的前处理,`postprocess`函数里面做了识别的后处理服务,可以在相应的函数中做修改。调用了`paddle_serving_app`库提供的常见CV模型的前处理/后处理库。
如果想要单独启动Paddle Serving的检测服务和识别服务,参见下列表格, 执行对应的脚本即可,并且在命令行参数注明用的CPU或是GPU来提供服务。
| 模型 | 标准版 | 快速版 |
| ---- | ----------------- | ------------------- |
| 检测 | det_web_server.py | det_local_server.py |
| 识别 | rec_web_server.py | rec_local_server.py |
更多信息参见[Paddle Serving](https://github.com/PaddlePaddle/Serving)
English | [简体中文](readme.md)
PaddleOCR provides 2 service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../hubserving/readme_en.md) for usage.
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please follow this tutorial.
# Service deployment based on Paddle Serving
This tutorial will introduce the detail steps of deploying PaddleOCR online prediction service based on [Paddle Serving](https://github.com/PaddlePaddle/Serving).
## Quick start service
### 1. Prepare the environment
Let's first install the relevant components of Paddle Serving. GPU is recommended for service deployment with Paddle Serving.
**Requirements:**
- **CUDA version: 9.0**
- **CUDNN version: 7.0**
- **Operating system version: >= CentOS 6**
- **Python version: 2.7/3.6/3.7**
**Installation:**
```
# install GPU server
python -m pip install paddle_serving_server_gpu
# or, install CPU server
python -m pip install paddle_serving_server
# install client and App package (CPU/GPU)
python -m pip install paddle_serving_app paddle_serving_client
```
### 2. Model transformation
You can directly use converted model provided by `paddle_serving_app` for convenience. Execute the following command to obtain:
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
python -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
Executing the above command will download the `db_crnn_mobile` model, which is in different format with inference model. If you want to use other models for deployment, you can refer to the [tutorial](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md) to convert your inference model to a model which is deployable for Paddle Serving.
We take `ch_rec_r34_vd_crnn` model as example. Download the inference model by executing the following command:
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
tar xf ch_rec_r34_vd_crnn_infer.tar
```
Convert the downloaded model by executing the following python script:
```
from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "ch_rec_r34_vd_crnn"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
```
Finally, model configuration of client and server will be generated in `serving_client_dir` and `serving_server_dir`.
### 3. Start service
Start the standard version or the fast version service according to your actual needs. The comparison of the two versions is shown in the table below:
|version|characteristics|recommended scenarios|
|-|-|-|
|standard version|High stability, suitable for distributed deployment|Large throughput and cross regional deployment|
|fast version|Easy to deploy and fast to predict|Suitable for scenarios which requires high prediction speed and fast iteration speed|
#### Mode 1. Start the standard mode service
```
# start with CPU
python -m paddle_serving_server.serve --model ocr_det_model --port 9293
python ocr_web_server.py cpu
# or, with GPU
python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
python ocr_web_server.py gpu
```
#### Mode 2. Start the fast mode service
```
# start with CPU
python ocr_local_server.py cpu
# or, with GPU
python ocr_local_server.py gpu
```
## Send prediction requests
```
python ocr_web_client.py
```
## Returned result format
The returned result is a JSON string, eg.
```
{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}}
```
You can also print the readable result in `res`:
```
土地整治与土壤修复研究中心
华南农业大学1素图
```
## User defined service module modification
The pre-processing and post-processing process, can be found in the `preprocess` and `postprocess` function in `ocr_web_server.py` or `ocr_local_server.py`. The pre-processing/post-processing library for common CV models provided by `paddle_serving_app` is called.
You can modify the corresponding code as actual needs.
If you only want to start the detection service or the recognition service, execute the corresponding script reffering to the following table. Indicate the CPU or GPU is used in the start command parameters.
| task | standard | fast |
| ---- | ----------------- | ------------------- |
| detection | det_web_server.py | det_local_server.py |
| recognition | rec_web_server.py | rec_local_server.py |
More info can be found in [Paddle Serving](https://github.com/PaddlePaddle/Serving).
...@@ -13,67 +13,161 @@ ...@@ -13,67 +13,161 @@
# limitations under the License. # limitations under the License.
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2 import cv2
import sys import sys
import numpy as np import numpy as np
import os import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
if sys.argv[1] == 'gpu':
from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
from paddle_serving_server.web_service import WebService
import time import time
import re import re
import base64 import base64
from tools.infer.predict_rec import TextRecognizer
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextRecognizerHelper(TextRecognizer):
def __init__(self, args):
super(TextRecognizerHelper, self).__init__(args)
if self.loss_type == "ctc":
self.fetch = ["save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"]
def preprocess(self, img_list):
img_num = len(img_list)
args = {}
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
indices = np.argsort(np.array(width_list))
args["indices"] = indices
predict_time = 0
beg_img_no = 0
end_img_no = img_num
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
if self.loss_type != "srn":
norm_img = self.resize_norm_img(img_list[indices[ino]],
max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
else:
norm_img = self.process_image_srn(img_list[indices[ino]],
self.rec_image_shape, 8, 25,
self.char_ops)
encoder_word_pos_list = []
gsrm_word_pos_list = []
gsrm_slf_attn_bias1_list = []
gsrm_slf_attn_bias2_list = []
encoder_word_pos_list.append(norm_img[1])
gsrm_word_pos_list.append(norm_img[2])
gsrm_slf_attn_bias1_list.append(norm_img[3])
gsrm_slf_attn_bias2_list.append(norm_img[4])
norm_img_batch.append(norm_img[0])
norm_img_batch = np.concatenate(norm_img_batch, axis=0).copy()
feed = {"image": norm_img_batch.copy()}
return feed, self.fetch, args
def postprocess(self, outputs, args):
if self.loss_type == "ctc":
rec_idx_batch = outputs[0]
predict_batch = outputs[1]
rec_idx_lod = args["save_infer_model/scale_0.tmp_0.lod"]
predict_lod = args["save_infer_model/scale_1.tmp_0.lod"]
indices = args["indices"]
rec_res = [['', 0.0]] * (len(rec_idx_lod) - 1)
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = predict_batch[beg:end, :]
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
if len(valid_ind) == 0:
continue
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res[indices[rno]] = [preds_text, score]
elif self.loss_type == 'srn':
char_num = self.char_ops.get_char_num()
preds = rec_idx_batch.reshape(-1)
elapse = time.time() - starttime
predict_time += elapse
total_preds = preds.copy()
for ino in range(int(len(rec_idx_batch) / self.text_len)):
preds = total_preds[ino * self.text_len:(ino + 1) *
self.text_len]
ind = np.argmax(probs, axis=1)
valid_ind = np.where(preds != int(char_num - 1))[0]
if len(valid_ind) == 0:
continue
score = np.mean(probs[valid_ind, ind[valid_ind]])
preds = preds[:valid_ind[-1] + 1]
preds_text = self.char_ops.decode(preds)
rec_res[indices[ino]] = [preds_text, score]
else:
for rno in range(len(rec_idx_batch)):
end_pos = np.where(rec_idx_batch[rno, :] == 1)[0]
if len(end_pos) <= 1:
preds = rec_idx_batch[rno, 1:]
score = np.mean(predict_batch[rno, 1:])
else:
preds = rec_idx_batch[rno, 1:end_pos[1]]
score = np.mean(predict_batch[rno, 1:end_pos[1]])
preds_text = self.char_ops.decode(preds)
rec_res[indices[rno]] = [preds_text, score]
return rec_res
class OCRService(WebService): class OCRService(WebService):
def init_rec(self): def init_rec(self):
self.ocr_reader = OCRReader() self.text_recognizer = TextRecognizerHelper(global_args)
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
img_list = [] img_list = []
for feed_data in feed: for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8')) data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR) im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im) img_list.append(im)
max_wh_ratio = 0 feed, fetch, self.tmp_args = self.text_recognizer.preprocess(img_list)
for i, boximg in enumerate(img_list):
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
_, w, h = self.ocr_reader.resize_norm_img(img_list[0],
max_wh_ratio).shape
imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
for i, img in enumerate(img_list):
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
imgs[i] = norm_img
feed = {"image": imgs.copy()}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed, fetch return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) outputs = [fetch_map[x] for x in self.text_recognizer.fetch]
res_lst = [] for x in fetch_map.keys():
for res in rec_res: if ".lod" in x:
res_lst.append(res[0]) self.tmp_args[x] = fetch_map[x]
res = {"res": res_lst} rec_res = self.text_recognizer.postprocess(outputs, self.tmp_args)
res = {
"pred_text": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res return res
ocr_service = OCRService(name="ocr") if __name__ == "__main__":
ocr_service.load_model_config("ocr_rec_model") ocr_service = OCRService(name="ocr")
ocr_service.init_rec() ocr_service.load_model_config(global_args.rec_model_dir)
if sys.argv[1] == 'gpu': ocr_service.init_rec()
ocr_service.set_gpus("0") if global_args.use_gpu:
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) ocr_service.prepare_server(
ocr_service.run_debugger_service(gpu=True) workdir="workdir", port=9292, device="gpu", gpuid=0)
elif sys.argv[1] == 'cpu': else:
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_debugger_service() ocr_service.run_debugger_service()
ocr_service.run_web_service() ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
import time
import re
import base64
from tools.infer.predict_rec import TextRecognizer
from params import read_params
global_args = read_params()
if global_args.use_gpu:
from paddle_serving_server_gpu.web_service import WebService
else:
from paddle_serving_server.web_service import WebService
class TextRecognizerHelper(TextRecognizer):
def __init__(self, args):
super(TextRecognizerHelper, self).__init__(args)
if self.loss_type == "ctc":
self.fetch = ["save_infer_model/scale_0.tmp_0", "save_infer_model/scale_1.tmp_0"]
def preprocess(self, img_list):
img_num = len(img_list)
args = {}
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
indices = np.argsort(np.array(width_list))
args["indices"] = indices
predict_time = 0
beg_img_no = 0
end_img_no = img_num
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
if self.loss_type != "srn":
norm_img = self.resize_norm_img(img_list[indices[ino]],
max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
else:
norm_img = self.process_image_srn(img_list[indices[ino]],
self.rec_image_shape, 8, 25,
self.char_ops)
encoder_word_pos_list = []
gsrm_word_pos_list = []
gsrm_slf_attn_bias1_list = []
gsrm_slf_attn_bias2_list = []
encoder_word_pos_list.append(norm_img[1])
gsrm_word_pos_list.append(norm_img[2])
gsrm_slf_attn_bias1_list.append(norm_img[3])
gsrm_slf_attn_bias2_list.append(norm_img[4])
norm_img_batch.append(norm_img[0])
norm_img_batch = np.concatenate(norm_img_batch, axis=0)
if img_num > 1:
feed = [{
"image": norm_img_batch[x]
} for x in range(norm_img_batch.shape[0])]
else:
feed = {"image": norm_img_batch[0]}
return feed, self.fetch, args
def postprocess(self, outputs, args):
if self.loss_type == "ctc":
rec_idx_batch = outputs[0]
predict_batch = outputs[1]
rec_idx_lod = args["save_infer_model/scale_0.tmp_0.lod"]
predict_lod = args["save_infer_model/scale_1.tmp_0.lod"]
indices = args["indices"]
rec_res = [['', 0.0]] * (len(rec_idx_lod) - 1)
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = predict_batch[beg:end, :]
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
if len(valid_ind) == 0:
continue
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res[indices[rno]] = [preds_text, score]
elif self.loss_type == 'srn':
char_num = self.char_ops.get_char_num()
preds = rec_idx_batch.reshape(-1)
elapse = time.time() - starttime
predict_time += elapse
total_preds = preds.copy()
for ino in range(int(len(rec_idx_batch) / self.text_len)):
preds = total_preds[ino * self.text_len:(ino + 1) *
self.text_len]
ind = np.argmax(probs, axis=1)
valid_ind = np.where(preds != int(char_num - 1))[0]
if len(valid_ind) == 0:
continue
score = np.mean(probs[valid_ind, ind[valid_ind]])
preds = preds[:valid_ind[-1] + 1]
preds_text = self.char_ops.decode(preds)
rec_res[indices[ino]] = [preds_text, score]
else:
for rno in range(len(rec_idx_batch)):
end_pos = np.where(rec_idx_batch[rno, :] == 1)[0]
if len(end_pos) <= 1:
preds = rec_idx_batch[rno, 1:]
score = np.mean(predict_batch[rno, 1:])
else:
preds = rec_idx_batch[rno, 1:end_pos[1]]
score = np.mean(predict_batch[rno, 1:end_pos[1]])
preds_text = self.char_ops.decode(preds)
rec_res[indices[rno]] = [preds_text, score]
return rec_res
class OCRService(WebService):
def init_rec(self):
self.text_recognizer = TextRecognizerHelper(global_args)
def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
img_list = []
for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im)
feed, fetch, self.tmp_args = self.text_recognizer.preprocess(img_list)
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
outputs = [fetch_map[x] for x in self.text_recognizer.fetch]
for x in fetch_map.keys():
if ".lod" in x:
self.tmp_args[x] = fetch_map[x]
rec_res = self.text_recognizer.postprocess(outputs, self.tmp_args)
res = {
"pred_text": [x[0] for x in rec_res],
"score": [str(x[1]) for x in rec_res]
}
return res
if __name__ == "__main__":
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config(global_args.rec_model_dir)
ocr_service.init_rec()
if global_args.use_gpu:
ocr_service.prepare_server(
workdir="workdir", port=9292, device="gpu", gpuid=0)
else:
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import os, sys
import time
def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "../../doc/imgs_words/ch/"
for img_file in os.listdir(test_img_dir):
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read()
image = cv2_to_base64(image_data1)
data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
## 介绍
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。
在开始本教程之前,建议先了解:
1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)
2. [分类模型裁剪教程](https://paddlepaddle.github.io/PaddleSlim/tutorials/pruning_tutorial/)
3. [PaddleSlim 裁剪压缩API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/)
## 快速开始
模型裁剪主要包括五个步骤:
1. 安装 PaddleSlim
2. 准备训练好的模型
3. 敏感度分析、训练
4. 模型裁剪训练
5. 导出模型、预测部署
### 1. 安装PaddleSlim
```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim
python setup.py install
```
### 2. 获取预训练模型
模型裁剪需要加载事先训练好的模型,PaddleOCR也提供了一系列模型[../../../doc/doc_ch/models_list.md],开发者可根据需要自行选择模型或使用自己的模型。
### 3. 敏感度分析训练
加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sensitivities_0.data,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。
敏感度分析的具体细节见:[敏感度分析](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
敏感度文件内容格式:
sensitivities_0.data(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
例子:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练:
```bash
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights="your trained model" Global.test_batch_size_per_card=1
```
### 4. 模型裁剪训练
裁剪时通过之前的敏感度分析文件决定每个网络层的裁剪比例。在具体实现时,为了尽可能多的保留从图像中提取的低阶特征,我们跳过了backbone中靠近输入的4个卷积层。同样,为了减少由于裁剪导致的模型性能损失,我们通过之前敏感度分析所获得的敏感度表,人工挑选出了一些冗余较少,对裁剪较为敏感的[网络层](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41)(指在较低的裁剪比例下就导致很高性能损失的网络层),并在之后的裁剪过程中选择避开这些网络层。裁剪过后finetune的过程沿用OCR检测模型原始的训练策略。
```bash
python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
```
通过对比可以发现,经过裁剪训练保存的模型更小。
### 5. 导出模型、预测部署
在得到裁剪训练保存的模型后,我们可以将其导出为inference_model:
```bash
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
```
inference model的预测和部署参考:
1. [inference model python端预测](../../../doc/doc_ch/inference.md)
2. [inference model C++预测](../../cpp_infer/readme.md)
3. [inference model在移动端部署](../../lite/readme.md)
\> 运行示例前请先安装develop版本PaddleSlim
# 模型裁剪压缩教程
压缩结果:
<table>
<thead>
<tr>
<th>序号</th>
<th>任务</th>
<th>模型</th>
<th>压缩策略<sup><a href="#quant">[3]</a><a href="#prune">[4]</a><sup></th>
<th>精度(自建中文数据集)</th>
<th>耗时<sup><a href="#latency">[1]</a></sup>(ms)</th>
<th>整体耗时<sup><a href="#rec">[2]</a></sup>(ms)</th>
<th>加速比</th>
<th>整体模型大小(M)</th>
<th>压缩比例</th>
<th>下载链接</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">0</td>
<td>检测</td>
<td>MobileNetV3_DB</td>
<td></td>
<td>61.7</td>
<td>224</td>
<td rowspan="2">375</td>
<td rowspan="2">-</td>
<td rowspan="2">8.6</td>
<td rowspan="2">-</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>MobileNetV3_CRNN</td>
<td></td>
<td>62.0</td>
<td>9.52</td>
<td></td>
</tr>
<tr>
<td rowspan="2">1</td>
<td>检测</td>
<td>SlimTextDet</td>
<td>PACT量化训练</td>
<td>62.1</td>
<td>195</td>
<td rowspan="2">348</td>
<td rowspan="2">8%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">2</td>
<td>检测</td>
<td>SlimTextDet_quat_pruning</td>
<td>剪裁+PACT量化训练</td>
<td>60.86</td>
<td>142</td>
<td rowspan="2">288</td>
<td rowspan="2">30%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">3</td>
<td>检测</td>
<td>SlimTextDet_pruning</td>
<td>剪裁</td>
<td>61.57</td>
<td>138</td>
<td rowspan="2">295</td>
<td rowspan="2">27%</td>
<td rowspan="2">2.9</td>
<td rowspan="2">66.28%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
</tbody>
</table>
## 概述
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
该示例使用PaddleSlim提供的[裁剪压缩API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/)对OCR模型进行压缩。
在阅读该示例前,建议您先了解以下内容:
\- [OCR模型的常规训练方法](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
\- [PaddleSlim使用文档](https://paddlepaddle.github.io/PaddleSlim/)
## 安装PaddleSlim
```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim
python setup.py install
```
## 获取预训练模型
[检测预训练模型下载地址]()
## 敏感度分析训练
加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,了解各网络层冗余度,从而决定每个网络层的裁剪比例。敏感度分析的具体细节见:[敏感度分析](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析:
```bash
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
```
## 裁剪模型与fine-tune
裁剪时通过之前的敏感度分析文件决定每个网络层的裁剪比例。在具体实现时,为了尽可能多的保留从图像中提取的低阶特征,我们跳过了backbone中靠近输入的4个卷积层。同样,为了减少由于裁剪导致的模型性能损失,我们通过之前敏感度分析所获得的敏感度表,挑选出了一些冗余较少,对裁剪较为敏感的[网络层](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41),并在之后的裁剪过程中选择避开这些网络层。裁剪过后finetune的过程沿用OCR检测模型原始的训练策略。
```bash
python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
```
## 导出模型
在得到裁剪训练保存的模型后,我们可以将其导出为inference_model,用于预测部署:
```bash
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
```
\> PaddleSlim develop version should be installed before runing this example.
## Introduction
# Model compress tutorial (Pruning)
Compress results:
<table>
<thead>
<tr>
<th>ID</th>
<th>Task</th>
<th>Model</th>
<th>Compress Strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a><sup></th>
<th>Criterion(Chinese dataset)</th>
<th>Inference Time<sup><a href="#latency">[1]</a></sup>(ms)</th>
<th>Inference Time(Total model)<sup><a href="#rec">[2]</a></sup>(ms)</th>
<th>Acceleration Ratio</th>
<th>Model Size(MB)</th>
<th>Commpress Ratio</th>
<th>Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">0</td>
<td>Detection</td>
<td>MobileNetV3_DB</td>
<td>None</td>
<td>61.7</td>
<td>224</td>
<td rowspan="2">375</td>
<td rowspan="2">-</td>
<td rowspan="2">8.6</td>
<td rowspan="2">-</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>MobileNetV3_CRNN</td>
<td>None</td>
<td>62.0</td>
<td>9.52</td>
<td></td>
</tr>
<tr>
<td rowspan="2">1</td>
<td>Detection</td>
<td>SlimTextDet</td>
<td>PACT Quant Aware Training</td>
<td>62.1</td>
<td>195</td>
<td rowspan="2">348</td>
<td rowspan="2">8%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">2</td>
<td>Detection</td>
<td>SlimTextDet_quat_pruning</td>
<td>Pruning+PACT Quant Aware Training</td>
<td>60.86</td>
<td>142</td>
<td rowspan="2">288</td>
<td rowspan="2">30%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PPACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">3</td>
<td>Detection</td>
<td>SlimTextDet_pruning</td>
<td>Pruning</td>
<td>61.57</td>
<td>138</td>
<td rowspan="2">295</td>
<td rowspan="2">27%</td>
<td rowspan="2">2.9</td>
<td rowspan="2">66.28%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
</tbody>
</table>
## Overview
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance. Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model. This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
It is recommended that you could understand following pages before reading this example,: It is recommended that you could understand following pages before reading this example:
1. [PaddleOCR training methods](../../../doc/doc_ch/quickstart.md)
2. [The demo of prune](https://paddlepaddle.github.io/PaddleSlim/tutorials/pruning_tutorial/)
3. [PaddleSlim prune API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/)
\- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
\- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/)
## Quick start
Five steps for OCR model prune:
1. Install PaddleSlim
2. Prepare the trained model
3. Sensitivity analysis and training
4. Model tailoring training
5. Export model, predict deployment
## Install PaddleSlim ### 1. Install PaddleSlim
```bash ```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim cd Paddleslim
python setup.py install python setup.py install
``` ```
## Download Pretrain Model ### 2. Download Pretrain Model
Model prune needs to load pre-trained models.
PaddleOCR also provides a series of (models)[../../../doc/doc_en/models_list_en.md]. Developers can choose their own models or use their own models according to their needs.
[Download link of Detection pretrain model]()
### 3. Pruning sensitivity analysis
## Pruning sensitivity analysis After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sensitivities_0.data. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
The data format of sensitivity file:
sensitivities_0.data(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
example:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of correspoding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, thereby determining the pruning ratio of each network layer. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command: Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command:
```bash ```bash
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1 python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
``` ```
## Model pruning and Fine-tune ### 4. Model pruning and Fine-tune
When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model. When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model.
```bash ```bash
python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1 python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
``` ```
### 5. Export inference model and deploy it
We can export the pruned model as inference_model for deployment:
## Export inference model
After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment:
```bash ```bash
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
``` ```
Reference for prediction and deployment of inference model:
1. [inference model python prediction](../../../doc/doc_en/inference_en.md)
2. [inference model C++ prediction](../../cpp_infer/readme_en.md)
3. [Deployment of inference model on mobile](../../lite/readme_en.md)
...@@ -92,7 +92,8 @@ def main(): ...@@ -92,7 +92,8 @@ def main():
sen = load_sensitivities("sensitivities_0.data") sen = load_sensitivities("sensitivities_0.data")
for i in skip_list: for i in skip_list:
sen.pop(i) if i in sen.keys():
sen.pop(i)
back_bone_list = ['conv' + str(x) for x in range(1, 5)] back_bone_list = ['conv' + str(x) for x in range(1, 5)]
for i in back_bone_list: for i in back_bone_list:
for key in list(sen.keys()): for key in list(sen.keys()):
...@@ -134,7 +135,7 @@ def main(): ...@@ -134,7 +135,7 @@ def main():
if alg in ['EAST', 'DB']: if alg in ['EAST', 'DB']:
program.train_eval_det_run( program.train_eval_det_run(
config, exe, train_info_dict, eval_info_dict, is_pruning=True) config, exe, train_info_dict, eval_info_dict, is_slim="prune")
else: else:
program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict) program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict)
......
> 运行示例前请先安装1.2.0或更高版本PaddleSlim
# 模型量化压缩教程
压缩结果:
<table>
<thead>
<tr>
<th>序号</th>
<th>任务</th>
<th>模型</th>
<th>压缩策略</th>
<th>精度(自建中文数据集)</th>
<th>耗时(ms)</th>
<th>整体耗时(ms)</th>
<th>加速比</th>
<th>整体模型大小(M)</th>
<th>压缩比例</th>
<th>下载链接</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">0</td>
<td>检测</td>
<td>MobileNetV3_DB</td>
<td></td>
<td>61.7</td>
<td>224</td>
<td rowspan="2">375</td>
<td rowspan="2">-</td>
<td rowspan="2">8.6</td>
<td rowspan="2">-</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>MobileNetV3_CRNN</td>
<td></td>
<td>62.0</td>
<td>9.52</td>
<td></td>
</tr>
<tr>
<td rowspan="2">1</td>
<td>检测</td>
<td>SlimTextDet</td>
<td>PACT量化训练</td>
<td>62.1</td>
<td>195</td>
<td rowspan="2">348</td>
<td rowspan="2">8%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">2</td>
<td>检测</td>
<td>SlimTextDet_quat_pruning</td>
<td>剪裁+PACT量化训练</td>
<td>60.86</td>
<td>142</td>
<td rowspan="2">288</td>
<td rowspan="2">30%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">3</td>
<td>检测</td>
<td>SlimTextDet_pruning</td>
<td>剪裁</td>
<td>61.57</td>
<td>138</td>
<td rowspan="2">295</td>
<td rowspan="2">27%</td>
<td rowspan="2">2.9</td>
<td rowspan="2">66.28%</td>
<td></td>
</tr>
<tr>
<td>识别</td>
<td>SlimTextRec</td>
<td>PACT量化训练</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
</tbody>
</table>
## 概述
## 介绍
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。
该示例使用PaddleSlim提供的[量化压缩API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)对OCR模型进行压缩。 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
在阅读该示例前,建议您先了解以下内容: [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。
- [OCR模型的常规训练方法](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md) 在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
- [PaddleSlim使用文档](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
## 快速开始
量化多适用于轻量模型在移动端的部署,当训练出一个模型后,如果希望进一步的压缩模型大小并加速预测,可使用量化的方法压缩模型。
## 安装PaddleSlim 模型量化主要包括五个步骤:
1. 安装 PaddleSlim
2. 准备训练好的模型
3. 量化训练
4. 导出量化推理模型
5. 量化模型预测部署
### 1. 安装PaddleSlim
```bash ```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim cd Paddleslim
python setup.py install python setup.py install
``` ```
### 2. 准备训练好的模型
PaddleOCR提供了一系列训练好的[模型](../../../doc/doc_ch/models_list.md),如果待量化的模型不在列表中,需要按照[常规训练](../../../doc/doc_ch/quickstart.md)方法得到训练好的模型。
## 获取预训练模型 ### 3. 量化训练
量化训练包括离线量化训练和在线量化训练,在线量化训练效果更好,需加载预训练模型,在定义好量化策略后即可对模型进行量化。
[识别预训练模型下载地址]()
[检测预训练模型下载地址]()
## 量化训练
加载预训练模型后,在定义好量化策略后即可对模型进行量化。量化相关功能的使用具体细节见:[模型量化](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html)
进入PaddleOCR根目录,通过以下命令对模型进行量化:
量化训练的代码位于slim/quantization/quant/py 中,比如训练检测模型,训练指令如下:
```bash ```bash
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=det_mv3_db/best_accuracy Global.save_model_dir=./output/quant_model python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model
```
# 比如下载提供的训练模型
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
tar xf ch_ppocr_mobile_v1.1_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_model_dir=./output/quant_model
```
如果要训练识别模型的量化,修改配置文件和加载的模型参数即可。
## 导出模型 ### 4. 导出模型
在得到量化训练保存的模型后,我们可以将其导出为inference_model,用于预测部署: 在得到量化训练保存的模型后,我们可以将其导出为inference_model,用于预测部署:
```bash ```bash
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
``` ```
### 5. 量化模型部署
上述步骤导出的量化模型,参数精度仍然是FP32,但是参数的数值范围是int8,导出的模型可以通过PaddleLite的opt模型转换工具完成模型转换。
量化模型部署的可参考 [移动端模型部署](../lite/readme.md)
\> PaddleSlim 1.2.0 or higher version should be installed before runing this example.
# Model compress tutorial (Quantization)
Compress results:
<table>
<thead>
<tr>
<th>ID</th>
<th>Task</th>
<th>Model</th>
<th>Compress Strategy</th>
<th>Criterion(Chinese dataset)</th>
<th>Inference Time(ms)</th>
<th>Inference Time(Total model)(ms)</th>
<th>Acceleration Ratio</th>
<th>Model Size(MB)</th>
<th>Commpress Ratio</th>
<th>Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">0</td>
<td>Detection</td>
<td>MobileNetV3_DB</td>
<td>None</td>
<td>61.7</td>
<td>224</td>
<td rowspan="2">375</td>
<td rowspan="2">-</td>
<td rowspan="2">8.6</td>
<td rowspan="2">-</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>MobileNetV3_CRNN</td>
<td>None</td>
<td>62.0</td>
<td>9.52</td>
<td></td>
</tr>
<tr>
<td rowspan="2">1</td>
<td>Detection</td>
<td>SlimTextDet</td>
<td>PACT Quant Aware Training</td>
<td>62.1</td>
<td>195</td>
<td rowspan="2">348</td>
<td rowspan="2">8%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">2</td>
<td>Detection</td>
<td>SlimTextDet_quat_pruning</td>
<td>Pruning+PACT Quant Aware Training</td>
<td>60.86</td>
<td>142</td>
<td rowspan="2">288</td>
<td rowspan="2">30%</td>
<td rowspan="2">2.8</td>
<td rowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PPACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<td rowspan="2">3</td>
<td>Detection</td>
<td>SlimTextDet_pruning</td>
<td>Pruning</td>
<td>61.57</td>
<td>138</td>
<td rowspan="2">295</td>
<td rowspan="2">27%</td>
<td rowspan="2">2.9</td>
<td rowspan="2">66.28%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
</tbody>
</table>
## Overview
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Quantization is a technique that reduces this redundancyby reducing the full precision data to a fixed number, so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model. ## Introduction
It is recommended that you could understand following pages before reading this example,:
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model.
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number,
so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model.
- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md) It is recommended that you could understand following pages before reading this example:
- [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) - [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)
## Quick Start
Quantization is mostly suitable for the deployment of lightweight models on mobile terminals.
After training, if you want to further compress the model size and accelerate the prediction, you can use quantization methods to compress the model according to the following steps.
1. Install PaddleSlim
2. Prepare trained model
3. Quantization-Aware Training
4. Export inference model
5. Deploy quantization inference model
## Install PaddleSlim
### 1. Install PaddleSlim
```bash ```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim cd Paddleslim
python setup.py install python setup.py install
``` ```
## Download Pretrain Model ### 2. Download Pretrain Model
PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md).
If the model to be quantified is not in the list, you need to follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to get the trained model.
[Download link of Detection pretrain model]()
[Download link of recognization pretrain model]() ### 3. Quant-Aware Training
Quantization training includes offline quantization training and online quantization training.
Online quantization training is more effective. It is necessary to load the pre-training model.
After the quantization strategy is defined, the model can be quantified.
The code for quantization training is located in `slim/quantization/quant/py`. For example, to train a detection model, the training instructions are as follows:
```bash
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model
## Quan-Aware Training # download provided model
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
After loading the pre training model, the model can be quantified after defining the quantization strategy. For specific details of quantization method, see:[Model Quantization](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html) tar xf ch_ppocr_mobile_v1.1_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_model_dir=./output/quant_model
Enter the PaddleOCR root directory,perform model quantization with the following command:
```bash
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
``` ```
### 4. Export inference model
## Export inference model
After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment: After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment:
```bash ```bash
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
``` ```
### 5. Deploy
The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8.
The derived model can be converted through the `opt tool` of PaddleLite.
For quantitative model deployment, please refer to [Mobile terminal model deployment](../lite/readme_en.md)
...@@ -51,6 +51,7 @@ from paddleslim.quant import quant_aware, convert ...@@ -51,6 +51,7 @@ from paddleslim.quant import quant_aware, convert
from paddle.fluid.layer_helper import LayerHelper from paddle.fluid.layer_helper import LayerHelper
from eval_utils.eval_det_utils import eval_det_run from eval_utils.eval_det_utils import eval_det_run
from eval_utils.eval_rec_utils import eval_rec_run from eval_utils.eval_rec_utils import eval_rec_run
from eval_utils.eval_cls_utils import eval_cls_run
def main(): def main():
...@@ -105,6 +106,8 @@ def main(): ...@@ -105,6 +106,8 @@ def main():
if alg_type == 'det': if alg_type == 'det':
final_metrics = eval_det_run(exe, config, quant_info_dict, "eval") final_metrics = eval_det_run(exe, config, quant_info_dict, "eval")
elif alg_type == 'cls':
final_metrics = eval_cls_run(exe, quant_info_dict)
else: else:
final_metrics = eval_rec_run(exe, config, quant_info_dict, "eval") final_metrics = eval_rec_run(exe, config, quant_info_dict, "eval")
print(final_metrics) print(final_metrics)
......
...@@ -155,14 +155,13 @@ def main(): ...@@ -155,14 +155,13 @@ def main():
act_preprocess_func=act_preprocess_func, act_preprocess_func=act_preprocess_func,
optimizer_func=optimizer_func, optimizer_func=optimizer_func,
executor=executor, executor=executor,
for_test=False, for_test=False)
return_program=True)
# compile program for multi-devices # compile program for multi-devices
train_compile_program = program.create_multi_devices_program( train_compile_program = program.create_multi_devices_program(
quant_train_program, train_opt_loss_name, for_quant=True) quant_train_program, train_opt_loss_name, for_quant=True)
init_model(config, quant_train_program, exe) init_model(config, train_program, exe)
train_info_dict = {'compile_program':train_compile_program,\ train_info_dict = {'compile_program':train_compile_program,\
'train_program':quant_train_program,\ 'train_program':quant_train_program,\
...@@ -177,9 +176,14 @@ def main(): ...@@ -177,9 +176,14 @@ def main():
'fetch_varname_list':eval_fetch_varname_list} 'fetch_varname_list':eval_fetch_varname_list}
if train_alg_type == 'det': if train_alg_type == 'det':
program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict) program.train_eval_det_run(
config, exe, train_info_dict, eval_info_dict, is_slim="quant")
elif train_alg_type == 'rec':
program.train_eval_rec_run(
config, exe, train_info_dict, eval_info_dict, is_slim="quant")
else: else:
program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict) program.train_eval_cls_run(
config, exe, train_info_dict, eval_info_dict, is_slim="quant")
if __name__ == '__main__': if __name__ == '__main__':
......
文件已添加
...@@ -9,19 +9,45 @@ ...@@ -9,19 +9,45 @@
## PaddleOCR常见问题汇总(持续更新) ## PaddleOCR常见问题汇总(持续更新)
* [近期更新(2020.10.26)](#近期更新)
* [【精选】OCR精选10个问题](#OCR精选10个问题) * [【精选】OCR精选10个问题](#OCR精选10个问题)
* [【理论篇】OCR通用21个问题](#OCR通用问题) * [【理论篇】OCR通用23个问题](#OCR通用问题)
* [基础知识3](#基础知识) * [基础知识5](#基础知识)
* [数据集4题](#数据集) * [数据集4题](#数据集)
* [模型训练调优6题](#模型训练调优) * [模型训练调优6题](#模型训练调优)
* [预测部署8题](#预测部署) * [预测部署8题](#预测部署)
* [【实战篇】PaddleOCR实战53个问题](#PaddleOCR实战问题) * [【实战篇】PaddleOCR实战61个问题](#PaddleOCR实战问题)
* [使用咨询17](#使用咨询) * [使用咨询20](#使用咨询)
* [数据集9](#数据集) * [数据集10](#数据集)
* [模型训练调优13](#模型训练调优) * [模型训练调优15](#模型训练调优)
* [预测部署14](#预测部署) * [预测部署16](#预测部署)
<a name="近期更新"></a>
## 近期更新(2020.10.26)
#### Q2.1.4 印章如何识别
**A**: 1. 使用带tps的识别网络或abcnet,2.使用极坐标变换将图片拉平之后使用crnn
#### Q2.1.5 多语言的字典里是混合了不同的语种,这个是有什么讲究吗?统一到一个字典里会对精度造成多大的损失?
**A**:统一到一个字典里,会造成最后一层FC过大,增加模型大小。如果有特殊需求的话,可以把需要的几种语言合并字典训练模型,合并字典之后如果引入过多的形近字,可能会造成精度损失,字符平衡的问题可能也需要考虑一下。在PaddleOCR里暂时将语言字典分开。
#### Q3.3.16: 如何对检测模型finetune,比如冻结前面的层或某些层使用小的学习率学习?
**A**:如果是冻结某些层,可以将变量的stop_gradient属性设置为True,这样计算这个变量之前的所有参数都不会更新了,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/faq/train_cn.html#id4
如果对某些层使用更小的学习率学习,静态图里还不是很方便,一个方法是在参数初始化的时候,给权重的属性设置固定的学习率,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/fluid/param_attr/ParamAttr_cn.html#paramattr
实际上我们实验发现,直接加载模型去fine-tune,不设置某些层不同学习率,效果也都不错
#### Q3.3.17: 使用通用中文模型作为预训练模型,更改了字典文件,出现ctc_fc_b not used的错误
**A**:修改了字典之后,识别模型的最后一层FC纬度发生了改变,没有办法加载参数。这里是一个警告,可以忽略,正常训练即可。
#### Q3.1.18:如何加入自己的检测算法?
**A**:1. 在ppocr/modeling对应目录下分别选择backbone,head。如果没有可用的可以新建文件并添加
2. 在ppocr/data下选择对应的数据处理处理方式,如果没有可用的可以新建文件并添加
3. 在ppocr/losses下新建文件并编写loss
4. 在ppocr/postprocess下新建文件并编写后处理算法
5. 将上面四个步骤里新添加的类或函数参照yml文件写到配置中
<a name="OCR精选10个问题"></a> <a name="OCR精选10个问题"></a>
## 【精选】OCR精选10个问题 ## 【精选】OCR精选10个问题
...@@ -127,6 +153,11 @@ ...@@ -127,6 +153,11 @@
**A**:端到端在文字分布密集的业务场景,效率会比较有保证,精度的话看自己业务数据积累情况,如果行级别的识别数据积累比较多的话two-stage会比较好。百度的落地场景,比如工业仪表识别、车牌识别都用到端到端解决方案。 **A**:端到端在文字分布密集的业务场景,效率会比较有保证,精度的话看自己业务数据积累情况,如果行级别的识别数据积累比较多的话two-stage会比较好。百度的落地场景,比如工业仪表识别、车牌识别都用到端到端解决方案。
#### Q2.1.4 印章如何识别
**A**: 1. 使用带tps的识别网络或abcnet,2.使用极坐标变换将图片拉平之后使用crnn
#### Q2.1.5 多语言的字典里是混合了不同的语种,这个是有什么讲究吗?统一到一个字典里会对精度造成多大的损失?
**A**:统一到一个字典里,会造成最后一层FC过大,增加模型大小。如果有特殊需求的话,可以把需要的几种语言合并字典训练模型,合并字典之后如果引入过多的形近字,可能会造成精度损失,字符平衡的问题可能也需要考虑一下。在PaddleOCR里暂时将语言字典分开。
### 数据集 ### 数据集
...@@ -305,6 +336,13 @@ ...@@ -305,6 +336,13 @@
|8.6M超轻量中文OCR模型|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml| |8.6M超轻量中文OCR模型|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
|通用中文OCR模型|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml| |通用中文OCR模型|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|
#### 3.1.18:如何加入自己的检测算法?
**A**:1. 在ppocr/modeling对应目录下分别选择backbone,head。如果没有可用的可以新建文件并添加
2. 在ppocr/data下选择对应的数据处理处理方式,如果没有可用的可以新建文件并添加
3. 在ppocr/losses下新建文件并编写loss
4. 在ppocr/postprocess下新建文件并编写后处理算法
5. 将上面四个步骤里新添加的类或函数参照yml文件写到配置中
### 数据集 ### 数据集
...@@ -359,6 +397,12 @@ ...@@ -359,6 +397,12 @@
**A**:可以主要参考可视化效果,通用模型更倾向于检测一整行文字,轻量级可能会有一行文字被分成两段检测的情况,不是数量越多,效果就越好。 **A**:可以主要参考可视化效果,通用模型更倾向于检测一整行文字,轻量级可能会有一行文字被分成两段检测的情况,不是数量越多,效果就越好。
#### Q3.2.10: crnn+ctc模型训练所用的垂直文本(旋转至水平方向)是如何生成的?
**A**:方法与合成水平方向文字一致,只是将字体替换成了垂直字体。
### 模型训练调优 ### 模型训练调优
#### Q3.3.1:文本长度超过25,应该怎么处理? #### Q3.3.1:文本长度超过25,应该怎么处理?
...@@ -406,7 +450,7 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320) ...@@ -406,7 +450,7 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320)
#### Q3.3.8:如何进行模型微调? #### Q3.3.8:如何进行模型微调?
**A**:注意配置好匹配的数据集合适,然后在finetune训练时,可以加载我们提供的预训练模型,设置配置文件中Global.pretrain_weights 参数为要加载的预训练模型路径。 **A**:注意配置好合适的数据集,对齐数据格式,然后在finetune训练时,可以加载我们提供的预训练模型,设置配置文件中Global.pretrain_weights 参数为要加载的预训练模型路径。
#### Q3.3.9:文本检测换成自己的数据没法训练,有一些”###”是什么意思? #### Q3.3.9:文本检测换成自己的数据没法训练,有一些”###”是什么意思?
...@@ -418,7 +462,7 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320) ...@@ -418,7 +462,7 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320)
#### Q3.3.11:自己训练出来的未inference转换的模型 可以当作预训练模型吗? #### Q3.3.11:自己训练出来的未inference转换的模型 可以当作预训练模型吗?
**A**:可以的,但是如果训练数据量少的话,可能会过拟合到少量数据上,泛化性能不佳。 **A**:可以的,但是如果训练数据量少的话,可能会过拟合到少量数据上,泛化性能不佳。
#### Q3.3.12:使用带TPS的识别模型预测报错 #### Q3.3.12:使用带TPS的识别模型预测报错
...@@ -428,6 +472,26 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320) ...@@ -428,6 +472,26 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320)
**A**:TPS模块暂时无法支持变长的输入,请设置 ``--rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape`` **A**:TPS模块暂时无法支持变长的输入,请设置 ``--rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape``
#### Q3.3.14:使用之前版本的代码加载最新1.1版的通用检测预训练模型,提示在模型文件.pdparams中找不到bn4e_branch2a_variance是什么情况?是网络结构发生了变化吗?
**A**:1.1版的轻量检测模型去掉了mv3结构中的se模块,可以对比下这两个配置文件:[det_mv3_db.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/configs/det/det_mv3_db.yml)[det_mv3_db_v1.1.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/configs/det/det_mv3_db_v1.1.yml)
#### Q3.3.15: 训练中使用的字典需要与加载的预训练模型使用的字典一样吗?
**A**:是的,训练的字典与你使用该模型进行预测的字典需要保持一致的。
#### Q3.3.16: 如何对检测模型finetune,比如冻结前面的层或某些层使用小的学习率学习?
**A**
**A**:如果是冻结某些层,可以将变量的stop_gradient属性设置为True,这样计算这个变量之前的所有参数都不会更新了,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/faq/train_cn.html#id4
如果对某些层使用更小的学习率学习,静态图里还不是很方便,一个方法是在参数初始化的时候,给权重的属性设置固定的学习率,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/fluid/param_attr/ParamAttr_cn.html#paramattr
实际上我们实验发现,直接加载模型去fine-tune,不设置某些层不同学习率,效果也都不错
#### Q3.3.17: 使用通用中文模型作为预训练模型,更改了字典文件,出现ctc_fc_b not used的错误
**A**:修改了字典之后,识别模型的最后一层FC纬度发生了改变,没有办法加载参数。这里是一个警告,可以忽略,正常训练即可。
### 预测部署 ### 预测部署
#### Q3.4.1:如何pip安装opt模型转换工具? #### Q3.4.1:如何pip安装opt模型转换工具?
...@@ -493,3 +557,11 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320) ...@@ -493,3 +557,11 @@ return paddle.reader.multiprocess_reader(readers, False, queue_size=320)
#### Q3.4.14:PaddleOCR模型部署方式有哪几种? #### Q3.4.14:PaddleOCR模型部署方式有哪几种?
**A**:目前有Inference部署,serving部署和手机端Paddle Lite部署,可根据不同场景做灵活的选择:Inference部署适用于本地离线部署,serving部署适用于云端部署,Paddle Lite部署适用于手机端集成。 **A**:目前有Inference部署,serving部署和手机端Paddle Lite部署,可根据不同场景做灵活的选择:Inference部署适用于本地离线部署,serving部署适用于云端部署,Paddle Lite部署适用于手机端集成。
#### Q3.4.15: hubserving、pdserving这两种部署方式区别是什么?
**A**:hubserving原本是paddlehub的配套服务部署工具,可以很方便的将paddlehub内置的模型部署为服务,paddleocr使用了这个功能,并将模型路径等参数暴露出来方便用户自定义修改。paddle serving是面向所有paddle模型的部署工具,文档中可以看到我们提供了快速版和标准版,其中快速版和hubserving的本质是一样的,而标准版基于rpc,更稳定,更适合分布式部署。
#### Q3.4.16: hub serving部署服务时如何多gpu同时利用起来,export CUDA_VISIBLE_DEVICES=0,1 方式吗?
**A**:hubserving的部署方式目前暂不支持多卡预测,除非手动启动多个serving,不同端口对应不同卡。或者可以使用paddleserving进行部署,部署工具已经发布:https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/pdserving ,在启动服务时--gpu_id 0,1 这样就可以
<a name="算法介绍"></a> <a name="算法介绍"></a>
## 算法介绍 ## 算法介绍
本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)
- [1.文本检测算法](#文本检测算法) - [1.文本检测算法](#文本检测算法)
- [2.文本识别算法](#文本识别算法) - [2.文本识别算法](#文本识别算法)
...@@ -29,18 +31,9 @@ PaddleOCR开源的文本检测算法列表: ...@@ -29,18 +31,9 @@ PaddleOCR开源的文本检测算法列表:
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) **说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下:
|模型|骨干网络|配置文件|预训练模型|
|-|-|-|-|
|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化
PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./detection.md) PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./detection.md)
<a name="文本识别算法"></a> <a name="文本识别算法"></a>
### 2.文本识别算法 ### 2.文本识别算法
...@@ -68,11 +61,4 @@ PaddleOCR开源的文本识别算法列表: ...@@ -68,11 +61,4 @@ PaddleOCR开源的文本识别算法列表:
**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 **说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。
原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。
使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下:
|模型|骨干网络|配置文件|预训练模型|
|-|-|-|-|
|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md) PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)
...@@ -10,7 +10,7 @@ ...@@ -10,7 +10,7 @@
## 配置文件 Global 参数介绍 ## 配置文件 Global 参数介绍
`rec_chinese_lite_train.yml` 为例 `rec_chinese_lite_train_v1.1.yml ` 为例
| 字段 | 用途 | 默认值 | 备注 | | 字段 | 用途 | 默认值 | 备注 |
...@@ -32,6 +32,7 @@ ...@@ -32,6 +32,7 @@
| loss_type | 设置 loss 类型 | ctc | 支持两种loss: ctc / attention | | loss_type | 设置 loss 类型 | ctc | 支持两种loss: ctc / attention |
| distort | 设置是否使用数据增强 | false | 设置为true时,将在训练时随机进行扰动,支持的扰动操作可阅读[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) | | distort | 设置是否使用数据增强 | false | 设置为true时,将在训练时随机进行扰动,支持的扰动操作可阅读[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | 设置是否识别空格 | false | 仅在 character_type=ch 时支持空格 | | use_space_char | 设置是否识别空格 | false | 仅在 character_type=ch 时支持空格 |
| label_list | 设置方向分类器支持的角度 | ['0','180'] | 仅在方向分类器中生效 |
| average_window | ModelAverage优化器中的窗口长度计算比例 | 0.15 | 目前仅应用与SRN | | average_window | ModelAverage优化器中的窗口长度计算比例 | 0.15 | 目前仅应用与SRN |
| max_average_window | 平均值计算窗口长度的最大值 | 15625 | 推荐设置为一轮训练中mini-batchs的数目| | max_average_window | 平均值计算窗口长度的最大值 | 15625 | 推荐设置为一轮训练中mini-batchs的数目|
| min_average_window | 平均值计算窗口长度的最小值 | 10000 | \ | | min_average_window | 平均值计算窗口长度的最小值 | 10000 | \ |
......
...@@ -19,3 +19,9 @@ ...@@ -19,3 +19,9 @@
- 工具地址:https://github.com/wkentaro/labelme - 工具地址:https://github.com/wkentaro/labelme
- 示意图: - 示意图:
![](../datasets/labelme.jpg) ![](../datasets/labelme.jpg)
### 4. Vott
- 工具描述:支持矩形,多边形等图片标注.支持视频标注.方便使用的快捷键以及比较好看的界面.同时支持导出多种标签格式.
- 工具地址:https://github.com/microsoft/VoTT
- 示意图:
![](../datasets/VoTT.jpg)
...@@ -44,13 +44,15 @@ json.dumps编码前的图像标注信息是包含多个字典的list,字典中 ...@@ -44,13 +44,15 @@ json.dumps编码前的图像标注信息是包含多个字典的list,字典中
## 快速启动训练 ## 快速启动训练
首先下载模型backbone的pretrain model,PaddleOCR的检测模型目前支持两种backbone,分别是MobileNetV3、ResNet50_vd 首先下载模型backbone的pretrain model,PaddleOCR的检测模型目前支持两种backbone,分别是MobileNetV3、ResNet_vd系列
您可以根据需求使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures)中的模型更换backbone。 您可以根据需求使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures)中的模型更换backbone。
```shell ```shell
cd PaddleOCR/ cd PaddleOCR/
# 下载MobileNetV3的预训练模型 # 下载MobileNetV3的预训练模型
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# 下载ResNet50的预训练模型 # 或,下载ResNet18_vd的预训练模型
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
# 或,下载ResNet50_vd的预训练模型
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
# 解压预训练模型文件,以MobileNetV3为例 # 解压预训练模型文件,以MobileNetV3为例
...@@ -72,24 +74,24 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model ...@@ -72,24 +74,24 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
```shell ```shell
# 训练 mv3_db 模型,并将训练日志保存为 tain_det.log # 训练 mv3_db 模型,并将训练日志保存为 tain_det.log
python3 tools/train.py -c configs/det/det_mv3_db.yml \ python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml \
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ \ -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ \
2>&1 | tee train_det.log 2>&1 | tee train_det.log
``` ```
上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。 上述指令中,通过-c 选择训练使用configs/det/det_db_mv3_v1.1.yml配置文件。
有关配置文件的详细解释,请参考[链接](./config.md) 有关配置文件的详细解释,请参考[链接](./config.md)
您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001 您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001
``` ```
#### 断点训练 #### 断点训练
如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径: 如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径:
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model
``` ```
**注意**`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。 **注意**`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。
...@@ -98,17 +100,17 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you ...@@ -98,17 +100,17 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。 PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。
运行如下代码,根据配置文件`det_db_mv3.yml``save_res_path`指定的测试集检测结果文件,计算评估指标。 运行如下代码,根据配置文件`det_db_mv3_v1.1.yml``save_res_path`指定的测试集检测结果文件,计算评估指标。
评估时设置后处理参数`box_thresh=0.6``unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化 评估时设置后处理参数`box_thresh=0.6``unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。 训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。
比如: 比如:
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
* 注:`box_thresh``unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置 * 注:`box_thresh``unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置
...@@ -117,16 +119,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou ...@@ -117,16 +119,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou
测试单张图像的检测效果 测试单张图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
测试DB模型时,调整后处理阈值, 测试DB模型时,调整后处理阈值,
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
测试文件夹下所有图像的检测效果 测试文件夹下所有图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
inference 模型(`fluid.io.save_inference_model`保存的模型) inference 模型(`fluid.io.save_inference_model`保存的模型)
一般是模型训练完成后保存的固化模型,多用于预测部署。训练过程中保存的模型是checkpoints模型,保存的是模型的参数,多用于恢复训练等。 一般是模型训练完成后保存的固化模型,多用于预测部署。训练过程中保存的模型是checkpoints模型,保存的是模型的参数,多用于恢复训练等。
与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合与实际系统集成。更详细的介绍请参考文档[分类预测框架](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合与实际系统集成。更详细的介绍请参考文档[分类预测框架](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/extension/paddle_inference.md).
接下来首先介绍如何将训练的模型转换成inference模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。 接下来首先介绍如何将训练的模型转换成inference模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。
...@@ -23,8 +23,9 @@ inference 模型(`fluid.io.save_inference_model`保存的模型) ...@@ -23,8 +23,9 @@ inference 模型(`fluid.io.save_inference_model`保存的模型)
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理) - [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理) - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
- [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理) - [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理) - [4. 基于SRN损失的识别模型推理](#基于SRN损失的识别模型推理)
- [5. 多语言模型的推理](#多语言模型的推理) - [5. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
- [6. 多语言模型的推理](#多语言模型的推理)
- [四、方向分类模型推理](#方向识别模型推理) - [四、方向分类模型推理](#方向识别模型推理)
- [1. 方向分类模型推理](#方向分类模型推理) - [1. 方向分类模型推理](#方向分类模型推理)
...@@ -41,7 +42,7 @@ inference 模型(`fluid.io.save_inference_model`保存的模型) ...@@ -41,7 +42,7 @@ inference 模型(`fluid.io.save_inference_model`保存的模型)
下载超轻量级中文检测模型: 下载超轻量级中文检测模型:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/
``` ```
上述模型是以MobileNetV3为backbone训练的DB算法,将训练好的模型转换成inference模型只需要运行如下命令: 上述模型是以MobileNetV3为backbone训练的DB算法,将训练好的模型转换成inference模型只需要运行如下命令:
``` ```
...@@ -50,7 +51,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar & ...@@ -50,7 +51,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar &
# Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 # Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。
# Global.save_inference_dir参数设置转换的模型将保存的地址。 # Global.save_inference_dir参数设置转换的模型将保存的地址。
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/ python3 tools/export_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/
``` ```
转inference模型时,使用的配置文件和训练时使用的配置文件相同。另外,还需要设置配置文件中的`Global.checkpoints``Global.save_inference_dir`参数。 转inference模型时,使用的配置文件和训练时使用的配置文件相同。另外,还需要设置配置文件中的`Global.checkpoints``Global.save_inference_dir`参数。
其中`Global.checkpoints`指向训练中保存的模型参数文件,`Global.save_inference_dir`是生成的inference模型要保存的目录。 其中`Global.checkpoints`指向训练中保存的模型参数文件,`Global.save_inference_dir`是生成的inference模型要保存的目录。
...@@ -66,7 +67,7 @@ inference/det_db/ ...@@ -66,7 +67,7 @@ inference/det_db/
下载超轻量中文识别模型: 下载超轻量中文识别模型:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/
``` ```
识别模型转inference模型与检测的方式相同,如下: 识别模型转inference模型与检测的方式相同,如下:
...@@ -76,7 +77,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar ...@@ -76,7 +77,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar
# Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 # Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。
# Global.save_inference_dir参数设置转换的模型将保存的地址。 # Global.save_inference_dir参数设置转换的模型将保存的地址。
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \ python3 tools/export_model.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy \
Global.save_inference_dir=./inference/rec_crnn/ Global.save_inference_dir=./inference/rec_crnn/
``` ```
...@@ -94,7 +95,7 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa ...@@ -94,7 +95,7 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa
下载方向分类模型: 下载方向分类模型:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar -C ./ch_lite/
``` ```
方向分类模型转inference模型与检测的方式相同,如下: 方向分类模型转inference模型与检测的方式相同,如下:
...@@ -104,7 +105,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile- ...@@ -104,7 +105,7 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-
# Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 # Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。
# Global.save_inference_dir参数设置转换的模型将保存的地址。 # Global.save_inference_dir参数设置转换的模型将保存的地址。
python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy \
Global.save_inference_dir=./inference/cls/ Global.save_inference_dir=./inference/cls/
``` ```
...@@ -297,9 +298,21 @@ Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555] ...@@ -297,9 +298,21 @@ Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str) dict_character = list(self.character_str)
``` ```
<a name="基于SRN损失的识别模型推理"></a>
### 4. 基于SRN损失的识别模型推理
基于SRN损失的识别模型,需要额外设置识别算法参数 --rec_algorithm="SRN"。 同时需要保证预测shape与训练时一致,如: --rec_image_shape="1, 64, 256"
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
--rec_model_dir="./inference/srn/" \
--rec_image_shape="1, 64, 256" \
--rec_char_type="en" \
--rec_algorithm="SRN"
```
<a name="自定义文本识别字典的推理"></a> <a name="自定义文本识别字典的推理"></a>
### 4. 自定义文本识别字典的推理 ### 5. 自定义文本识别字典的推理
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径 如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径
``` ```
...@@ -307,12 +320,12 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ...@@ -307,12 +320,12 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
``` ```
<a name="多语言模型的推理"></a> <a name="多语言模型的推理"></a>
### 5. 多语言模型的推理 ### 6. 多语言模型的推理
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, 如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别: 需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
...@@ -351,9 +364,17 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963] ...@@ -351,9 +364,17 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963]
在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir``rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。可视化识别结果默认保存到 ./inference_results 文件夹里面。 在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir``rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。可视化识别结果默认保存到 ./inference_results 文件夹里面。
``` ```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls true # 使用方向分类器
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true
# 不使用方向分类器
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false
``` ```
执行命令后,识别结果图像如下: 执行命令后,识别结果图像如下:
![](../imgs_results/2.jpg) ![](../imgs_results/2.jpg)
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23 经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23
PaddleOCR 工作环境 PaddleOCR 工作环境
- PaddlePaddle 1.7+ - PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0.0.beta
- python3.7 - python3.7
- glibc 2.23 - glibc 2.23
- cuDNN 7.6+ (GPU) - cuDNN 7.6+ (GPU)
...@@ -47,19 +47,16 @@ docker images ...@@ -47,19 +47,16 @@ docker images
hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829
``` ```
**2. 安装PaddlePaddle Fluid v1.7** **2. 安装PaddlePaddle Fluid v2.0**
``` ```
pip3 install --upgrade pip pip3 install --upgrade pip
如果您的机器安装的是CUDA9,请运行以下命令安装 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
如果您的机器安装的是CUDA10,请运行以下命令安装
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
如果您的机器是CPU,请运行以下命令安装 如果您的机器是CPU,请运行以下命令安装
python3 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
``` ```
......
...@@ -18,11 +18,11 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 ...@@ -18,11 +18,11 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
<a name="文本检测模型"></a> <a name="文本检测模型"></a>
### 一、文本检测模型 ### 一、文本检测模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|配置文件|推理模型大小|下载地址|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|1.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)| |ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|1.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
|ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| |ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
|ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|47.2M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| |ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[det_r18_vd_db_v1.1.yml](../../configs/det/det_r18_vd_db_v1.1.yml)|47.2M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
<a name="文本识别模型"></a> <a name="文本识别模型"></a>
...@@ -30,42 +30,42 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 ...@@ -30,42 +30,42 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
<a name="中文识别模型"></a> <a name="中文识别模型"></a>
#### 1. 中文识别模型 #### 1. 中文识别模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|配置文件|推理模型大小|下载地址|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|1.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)| |ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|1.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|4.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)| |ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|4.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
|ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|105M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)| |ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml)|105M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
**说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。 **说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。
<a name="英文识别模型"></a> <a name="英文识别模型"></a>
#### 2. 英文识别模型 #### 2. 英文识别模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|配置文件|推理模型大小|下载地址|
|-|-|-|-| |-|-|-|-|-|
|en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|0.9M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)| |en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|0.9M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|2.0M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)| |en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|2.0M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar) |
<a name="多语言识别模型"></a> <a name="多语言识别模型"></a>
#### 3. 多语言识别模型(更多语言持续更新中...) #### 3. 多语言识别模型(更多语言持续更新中...)
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|配置文件|推理模型大小|下载地址|
|-|-|-|-| |-|-|-|-|-|
| french_ppocr_mobile_v1.1_rec |法文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)| | french_ppocr_mobile_v1.1_rec |法文识别|[rec_french_lite_train.yml](../../configs/rec/multi_languages/rec_french_lite_train.yml)|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar) |
| german_ppocr_mobile_v1.1_rec |德文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)| | german_ppocr_mobile_v1.1_rec |德文识别|[rec_ger_lite_train.yml](../../configs/rec/multi_languages/rec_ger_lite_train.yml)|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar) |
| korean_ppocr_mobile_v1.1_rec |韩文识别|3.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)| | korean_ppocr_mobile_v1.1_rec |韩文识别|[rec_korean_lite_train.yml](../../configs/rec/multi_languages/rec_korean_lite_train.yml)|3.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar) |
| japan_ppocr_mobile_v1.1_rec |日文识别|3.7M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)| | japan_ppocr_mobile_v1.1_rec |日文识别|[rec_japan_lite_train.yml](../../configs/rec/multi_languages/rec_japan_lite_train.yml)|3.7M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar) |
<a name="文本方向分类模型"></a> <a name="文本方向分类模型"></a>
### 三、文本方向分类模型 ### 三、文本方向分类模型
|模型名称|模型简介|推理模型大小|下载地址| |模型名称|模型简介|配置文件|推理模型大小|下载地址|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|0.5M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型]()| |ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|0.5M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_cls|原始模型|850kb|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)| |ch_ppocr_mobile_v1.1_cls|原始模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|850kb|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |
## OCR模型列表(V1.0,7月16日更新) ## OCR模型列表(V1.0,7月16日更新)
|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| |模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址|
|-|-|-|-|-| |-|-|-|-|-|
|chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) |chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|通用中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) |chinese_db_crnn_server|通用中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
...@@ -9,12 +9,15 @@ ...@@ -9,12 +9,15 @@
## 2.inference模型下载 ## 2.inference模型下载
|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| * 移动端和服务器端的检测与识别模型如下,更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](../doc_ch/models_list.md)
|-|-|-|-|-|
|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
*windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下* | 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
* windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下
复制上表中的检测和识别的`inference模型`下载地址,并解压 复制上表中的检测和识别的`inference模型`下载地址,并解压
...@@ -24,6 +27,8 @@ mkdir inference && cd inference ...@@ -24,6 +27,8 @@ mkdir inference && cd inference
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# 下载识别模型并解压 # 下载识别模型并解压
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
# 下载方向分类器模型并解压
wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package}
cd .. cd ..
``` ```
...@@ -32,9 +37,11 @@ cd .. ...@@ -32,9 +37,11 @@ cd ..
``` ```
mkdir inference && cd inference mkdir inference && cd inference
# 下载超轻量级中文OCR模型的检测模型并解压 # 下载超轻量级中文OCR模型的检测模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar
# 下载超轻量级中文OCR模型的识别模型并解压 # 下载超轻量级中文OCR模型的识别模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar
# 下载超轻量级中文OCR模型的文本方向分类器模型并解压
wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar
cd .. cd ..
``` ```
...@@ -42,10 +49,13 @@ cd .. ...@@ -42,10 +49,13 @@ cd ..
``` ```
|-inference |-inference
|-ch_rec_mv3_crnn |-ch_ppocr_mobile_v1.1_det_infer
|- model
|- params
|-ch_ppocr_mobile_v1.1_rec_infer
|- model |- model
|- params |- params
|-ch_det_mv3_db |-ch_ppocr_mobile-v1.1_cls_infer
|- model |- model
|- params |- params
... ...
...@@ -53,42 +63,37 @@ cd .. ...@@ -53,42 +63,37 @@ cd ..
## 3.单张图像或者图像集合预测 ## 3.单张图像或者图像集合预测
以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数det_model_dir指定检测inference模型的路径和参数rec_model_dir指定识别inference模型的路径。可视化识别结果默认保存到 ./inference_results 文件夹里面。 以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数`det_model_dir`指定检测inference模型的路径、参数`rec_model_dir`指定识别inference模型的路径、参数`use_angle_cls`指定是否使用方向分类器、参数`cls_model_dir`指定方向分类器inference模型的路径、参数`use_space_char`指定是否预测空格字符。可视化识别结果默认保存到`./inference_results`文件夹里面。
```bash ```bash
# 预测image_dir指定的单张图像 # 预测image_dir指定的单张图像
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# 预测image_dir指定的图像集合 # 预测image_dir指定的图像集合
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# 如果想使用CPU进行预测,需设置use_gpu参数为False # 如果想使用CPU进行预测,需设置use_gpu参数为False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False
``` ```
- 通用中文OCR模型 - 通用中文OCR模型
请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下: 请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下:
```
```bash
# 预测image_dir指定的单张图像 # 预测image_dir指定的单张图像
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
``` ```
- 支持空格的通用中文OCR模型 * 注意:
- 如果希望使用不支持空格的识别模型,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_space_char=False`
- 如果不希望使用方向分类器,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_angle_cls=False`
请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下:
*注意:请将代码更新到最新版本,并添加参数 `--use_space_char=True` *
```
# 预测image_dir指定的单张图像
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" --use_space_char=True
```
更多的文本检测、识别串联推理使用方式请参考文档教程中[基于Python预测引擎推理](./inference.md) 更多的文本检测、识别串联推理使用方式请参考文档教程中[基于Python预测引擎推理](./inference.md)
此外,文档教程中也提供了中文OCR模型的其他预测部署方式: 此外,文档教程中也提供了中文OCR模型的其他预测部署方式:
- [基于C++预测引擎推理](../../deploy/cpp_infer/readme.md) - [基于C++预测引擎推理](../../deploy/cpp_infer/readme.md)
- [服务部署](./serving.md) - [服务部署](./serving_inference.md)
- [端侧部署](../../deploy/lite/readme.md) - [端侧部署](../../deploy/lite/readme.md)
## 文字识别 ## 文字识别
- [一、数据准备](#数据准备)
- [数据下载](#数据下载)
- [自定义数据集](#自定义数据集)
- [字典](#字典)
- [支持空格](#支持空格)
- [二、启动训练](#文本检测模型推理)
- [1. 数据增强](#数据增强)
- [2. 训练](#训练)
- [3. 小语种](#小语种)
- [三、评估](#评估)
- [四、预测](#预测)
- [1. 训练引擎预测](#训练引擎预测)
<a name="数据准备"></a>
### 数据准备 ### 数据准备
...@@ -13,16 +32,18 @@ PaddleOCR 支持两种数据格式: `lmdb` 用于训练公开数据,调试算 ...@@ -13,16 +32,18 @@ PaddleOCR 支持两种数据格式: `lmdb` 用于训练公开数据,调试算
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
``` ```
<a name="数据下载"></a>
* 数据下载 * 数据下载
若您本地没有数据集,可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),下载 benchmark 所需的lmdb格式数据集。 若您本地没有数据集,可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),下载 benchmark 所需的lmdb格式数据集。
如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。 如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。
* 使用自己数据集: <a name="自定义数据集"></a>
* 使用自己数据集
若您希望使用自己的数据进行训练,请参考下文组织您的数据。 若您希望使用自己的数据进行训练,请参考下文组织您的数据。
- 训练集 - 训练集
首先请将训练图片放入同一个文件夹(train_images),并用一个txt文件(rec_gt_train.txt)记录图片路径和标签。 首先请将训练图片放入同一个文件夹(train_images),并用一个txt文件(rec_gt_train.txt)记录图片路径和标签。
...@@ -77,7 +98,7 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_ ...@@ -77,7 +98,7 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_
|- word_003.jpg |- word_003.jpg
| ... | ...
``` ```
<a name="字典"></a>
- 字典 - 字典
最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。 最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。
...@@ -96,21 +117,36 @@ n ...@@ -96,21 +117,36 @@ n
word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,“and” 将被映射成 [2 5 1] word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,“and” 将被映射成 [2 5 1]
`ppocr/utils/ppocr_keys_v1.txt` 是一个包含6623个字符的中文字典, `ppocr/utils/ppocr_keys_v1.txt` 是一个包含6623个字符的中文字典,
`ppocr/utils/ic15_dict.txt` 是一个包含36个字符的英文字典, `ppocr/utils/ic15_dict.txt` 是一个包含36个字符的英文字典,
`ppocr/utils/dict/french_dict.txt` 是一个包含118个字符的法文字典
`ppocr/utils/dict/japan_dict.txt` 是一个包含4399个字符的法文字典
`ppocr/utils/dict/korean_dict.txt` 是一个包含3636个字符的法文字典
`ppocr/utils/dict/german_dict.txt` 是一个包含131个字符的法文字典
您可以按需使用。 您可以按需使用。
目前的多语言模型仍处在demo阶段,会持续优化模型并补充语种,**非常欢迎您为我们提供其他语言的字典和字体**
如您愿意可将字典文件提交至 [dict](../../ppocr/utils/dict) 将语料文件提交至[corpus](../../ppocr/utils/corpus),我们会在Repo中感谢您。
- 自定义字典 - 自定义字典
如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。 如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。
并将 `character_type` 设置为 `ch` 并将 `character_type` 设置为 `ch`
<a name="支持空格"></a>
- 添加空格类别 - 添加空格类别
如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `true` 如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `true`
**注意:`use_space_char` 仅在 `character_type=ch` 时生效** **注意:`use_space_char` 仅在 `character_type=ch` 时生效**
<a name="启动训练"></a>
### 启动训练 ### 启动训练
PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例: PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例:
...@@ -131,14 +167,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar ...@@ -131,14 +167,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
``` ```
# 设置PYTHONPATH路径
export PYTHONPATH=$PYTHONPATH:.
# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号 # GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
# 训练icdar15英文数据 并将训练日志保存为 tain_rec.log # 训练icdar15英文数据 并将训练日志保存为 tain_rec.log
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
``` ```
<a name="数据增强"></a>
- 数据增强 - 数据增强
PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入扰动,请在配置文件中设置 `distort: true` PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入扰动,请在配置文件中设置 `distort: true`
...@@ -149,6 +183,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入 ...@@ -149,6 +183,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入
*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux* *由于OpenCV的兼容性问题,扰动操作暂时只支持Linux*
<a name="训练"></a>
- 训练 - 训练
PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy` PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy`
...@@ -160,7 +195,10 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t ...@@ -160,7 +195,10 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
| 配置文件 | 算法名称 | backbone | trans | seq | pred | | 配置文件 | 算法名称 | backbone | trans | seq | pred |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
...@@ -172,7 +210,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t ...@@ -172,7 +210,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
| rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn | | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
训练中文数据,推荐使用`rec_chinese_lite_train.yml`,如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: 训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
`rec_mv3_none_none_ctc.yml` 为例: `rec_mv3_none_none_ctc.yml` 为例:
``` ```
...@@ -208,20 +246,54 @@ Optimizer: ...@@ -208,20 +246,54 @@ Optimizer:
``` ```
**注意,预测/评估时的配置文件请务必与训练一致。** **注意,预测/评估时的配置文件请务必与训练一致。**
<a name="小语种"></a>
- 小语种 - 小语种
PaddleOCR也提供了多语言的, `configs/rec/multi_languages` 路径下的提供了多语言的配置文件,目前PaddleOCR支持的多语言算法有: PaddleOCR也提供了多语言的, `configs/rec/multi_languages` 路径下的提供了多语言的配置文件,目前PaddleOCR支持的多语言算法有:
| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | | 配置文件 | 算法名称 | backbone | trans | seq | pred | language |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 | | rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | | rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | | rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | | rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 |
多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以在 [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) 上下载,提取码:frgi。
多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体和测试数据可以在[百度网盘]()上下载。 如您希望在现有模型效果的基础上调优,请参考下列说明修改配置文件:
`rec_french_lite_train` 为例:
```
Global:
...
# 添加自定义字典,如修改字典请将路径指向新字典
character_dict_path: ./ppocr/utils/dict/french_dict.txt
# 训练时添加数据增强
distort: true
# 识别空格
use_space_char: true
...
# 修改reader类型
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
...
...
```
同时需要修改数据读取文件 `rec_french_reader.yml`
```
TrainReader:
...
# 修改训练数据存放的目录名
img_set_dir: ./train_data
# 修改 label 文件名称
label_file_path: ./train_data/french_train.txt
...
```
<a name="评估"></a>
### 评估 ### 评估
评估数据集可以通过 `configs/rec/rec_icdar15_reader.yml` 修改EvalReader中的 `label_file_path` 设置。 评估数据集可以通过 `configs/rec/rec_icdar15_reader.yml` 修改EvalReader中的 `label_file_path` 设置。
...@@ -233,8 +305,10 @@ export CUDA_VISIBLE_DEVICES=0 ...@@ -233,8 +305,10 @@ export CUDA_VISIBLE_DEVICES=0
python3 tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy python3 tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
``` ```
<a name="预测"></a>
### 预测 ### 预测
<a name="训练引擎预测"></a>
* 训练引擎的预测 * 训练引擎的预测
使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。 使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。
...@@ -258,12 +332,12 @@ infer_img: doc/imgs_words/en/word_1.png ...@@ -258,12 +332,12 @@ infer_img: doc/imgs_words/en/word_1.png
word : joint word : joint
``` ```
预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml` 完成了中文模型的训练, 预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml` 完成了中文模型的训练,
您可以使用如下命令进行中文模型预测。 您可以使用如下命令进行中文模型预测。
``` ```
# 预测中文结果 # 预测中文结果
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg
``` ```
预测图片: 预测图片:
......
PaddleOCR提供2种服务部署方式:
- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。。
- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../../deploy/hubserving/readme.md)
# 使用Paddle Serving预测推理
阅读本文档之前,请先阅读文档 [基于Python预测引擎推理](./inference.md)
同本地执行预测一样,我们需要保存一份可以用于Paddle Serving的模型。
接下来首先介绍如何将训练的模型转换成Paddle Serving模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。
### 一、 准备环境
我们先安装Paddle Serving相关组件
我们推荐用户使用GPU来做Paddle Serving的OCR服务部署
**CUDA版本:9.X/10.X**
**CUDNN版本:7.X**
**操作系统版本:Linux/Windows**
**Python版本: 2.7/3.5/3.6/3.7**
**Python操作指南:**
目前Serving用于OCR的部分功能还在测试当中,因此在这里我们给出[Servnig latest package](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)
大家根据自己的环境选择需要安装的whl包即可,例如以Python 3.5为例,执行下列命令
```
#CPU/GPU版本选择一个
#GPU版本服务端
#CUDA 9
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py3-none-any.whl
#CUDA 10
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py3-none-any.whl
#CPU版本服务端
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
#客户端和App包使用以下链接(CPU,GPU通用)
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp36-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py3-none-any.whl
```
## 二、训练模型转Serving模型
在前序文档 [基于Python预测引擎推理](./inference.md) 中,我们提供了如何把训练的checkpoint转换成Paddle模型。Paddle模型通常由一个文件夹构成,内含模型结构描述文件`model`和模型参数文件`params`。Serving模型由两个文件夹构成,用于存放客户端和服务端的配置。
我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在:
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
tar xf ch_rec_r34_vd_crnn_infer.tar
```
因此我们按照Serving模型转换教程,运行下列python文件。
```
python tools/inference_to_serving.py --model_dir ch_rec_r34_vd_crnn
```
最终会在`serving_client_dir``serving_server_dir`生成客户端和服务端的模型配置。其中`serving_server_dir``serving_client_dir`的名字可以自定义。最终文件结构如下
```
/ch_rec_r34_vd_crnn/
├── serving_client_dir # 客户端配置文件夹
└── serving_server_dir # 服务端配置文件夹
```
## 三、文本检测模型Serving推理
启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表:
|版本|特点|适用场景|
|-|-|-|
|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况|
|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景,Windows用户只能选择快速版|
接下来的命令中,我们会指定快速版和标准版的命令。需要说明的是,标准版只能用Linux平台,快速版可以支持Linux/Windows。
文本检测模型推理,默认使用DB模型的配置参数,识别默认为CRNN。
配置文件在`params.py`中,我们贴出配置部分,如果需要做改动,也在这个文件内部进行修改。
```
def read_params():
cfg = Config()
#use gpu
cfg.use_gpu = False # 是否使用GPU
cfg.use_pdserving = True # 是否使用paddleserving,必须为True
#params for text detector
cfg.det_algorithm = "DB" # 检测算法, DB/EAST等
cfg.det_model_dir = "./det_mv_server/" # 检测算法模型路径
cfg.det_max_side_len = 960
#DB params
cfg.det_db_thresh =0.3
cfg.det_db_box_thresh =0.5
cfg.det_db_unclip_ratio =2.0
#EAST params
cfg.det_east_score_thresh = 0.8
cfg.det_east_cover_thresh = 0.1
cfg.det_east_nms_thresh = 0.2
#params for text recognizer
cfg.rec_algorithm = "CRNN" # 识别算法, CRNN/RARE等
cfg.rec_model_dir = "./ocr_rec_server/" # 识别算法模型路径
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr_keys_v1.txt" # 识别算法字典文件
cfg.use_space_char = True
#params for text classifier
cfg.use_angle_cls = True # 是否启用分类算法
cfg.cls_model_dir = "./ocr_clas_server/" # 分类算法模型路径
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
return cfg
```
与本地预测不同的是,Serving预测需要一个客户端和一个服务端,因此接下来的教程都是两行代码。
在正式执行服务端启动命令之前,先export PYTHONPATH到工程主目录下。
```
export PYTHONPATH=$PWD:$PYTHONPATH
cd deploy/pdserving
```
为了方便用户复现Demo程序,我们提供了Chinese and English ultra-lightweight OCR model (8.1M)版本的Serving模型
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/deploy/pdserving/ocr_pdserving_suite.tar.gz
tar xf ocr_pdserving_suite.tar.gz
```
### 1. 超轻量中文检测模型推理
超轻量中文检测模型推理,可以执行如下命令启动服务端:
```
#根据环境只需要启动其中一个就可以
python det_rpc_server.py #标准版,Linux用户
python det_local_server.py #快速版,Windows/Linux用户
```
客户端
```
python det_web_client.py
```
Serving的推测和本地预测不同点在于,客户端发送请求到服务端,服务端需要检测到文字框之后返回框的坐标,此处没有后处理的图片,只能看到坐标值。
## 四、文本识别模型Serving推理
下面将介绍超轻量中文识别模型推理、基于CTC损失的识别模型推理和基于Attention损失的识别模型推理。对于中文文本识别,建议优先选择基于CTC损失的识别模型,实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。此外,如果训练时修改了文本的字典,请参考下面的自定义文本识别字典的推理。
### 1. 超轻量中文识别模型推理
超轻量中文识别模型推理,可以执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#根据环境只需要启动其中一个就可以
python rec_rpc_server.py #标准版,Linux用户
python rec_local_server.py #快速版,Windows/Linux用户
```
如果需要使用CPU版本,还需增加 `--use_gpu False`
客户端
```
python rec_web_client.py
```
![](../imgs_words/ch/word_4.jpg)
执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:
```
{u'result': {u'score': [u'0.89547354'], u'pred_text': ['实力活力']}}
```
## 五、方向分类模型推理
下面将介绍方向分类模型推理。
### 1. 方向分类模型推理
方向分类模型推理, 可以执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#根据环境只需要启动其中一个就可以
python clas_rpc_server.py #标准版,Linux用户
python clas_local_server.py #快速版,Windows/Linux用户
```
客户端
```
python rec_web_client.py
```
![](../imgs_words/ch/word_4.jpg)
执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下:
```
{u'result': {u'direction': [u'0'], u'score': [u'0.9999963']}}
```
## 六、文本检测、方向分类和文字识别串联Serving推理
### 1. 超轻量中文OCR模型推理
在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir``rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。与本地预测不同的是,为了减少网络传输耗时,可视化识别结果目前不做处理,用户收到的是推理得到的文字字段。
执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#标准版,Linux用户
#GPU用户
python -m paddle_serving_server_gpu.serve --model det_infer_server --port 9293 --gpu_id 0
python -m paddle_serving_server_gpu.serve --model cls_infer_server --port 9294 --gpu_id 0
python ocr_rpc_server.py
#CPU用户
python -m paddle_serving_server.serve --model det_infer_server --port 9293
python -m paddle_serving_server.serve --model cls_infer_server --port 9294
python ocr_rpc_server.py
#快速版,Windows/Linux用户
python ocr_local_server.py
```
客户端
```
python rec_web_client.py
```
# 更新 # 更新
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载) - 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) - 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](../../README_ch.md#PP-OCR)),适合在移动端部署使用。[模型下载](../../README_ch.md#模型下载)
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) - 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](../../README_ch.md#模型下载)
- 2020.9.17 更新[英文识别模型](./models_list.md#english-recognition-model)[多语种识别模型](./models_list.md#english-recognition-model),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./FAQ.md)
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) - 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)
- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) - 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294) - 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294)
...@@ -10,8 +12,8 @@ ...@@ -10,8 +12,8 @@
- 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark - 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark
- 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具 - 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具
- 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档 - 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./doc/doc_ch/config.md) - 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./config.md)
- 2020.6.8 添加[数据集](./doc/doc_ch/datasets.md),并保持持续更新 - 2020.6.8 添加[数据集](./datasets.md),并保持持续更新
- 2020.6.5 支持 `attetnion` 模型导出 `inference_model` - 2020.6.5 支持 `attetnion` 模型导出 `inference_model`
- 2020.6.5 支持单独预测识别时,输出结果得分 - 2020.6.5 支持单独预测识别时,输出结果得分
- 2020.5.30 提供超轻量级中文OCR在线体验 - 2020.5.30 提供超轻量级中文OCR在线体验
......
# 效果展示 # 效果展示
- PP-OCR 1.1系列模型效果
- [通用ppocr_server_1.1效果展示](#通用ppocr_server_1.1效果展示)
- [通用ppocr_mobile_1.1效果展示(待补充)]()
- PP-OCR 1.0系列模型效果
- [超轻量ppocr_mobile_1.0效果展示](#超轻量ppocr_mobile_1.0效果展示)
- [通用ppocr_server_1.0效果展示](#通用ppocr_server_1.0效果展示)
<a name="通用ppocr_server_1.1效果展示"></a> <a name="通用ppocr_server_1.1效果展示"></a>
## 通用ppocr_server_1.1效果展示 ## 通用ppocr_server_1.1效果展示
<div align="center"> <div align="center">
<img src="../imgs_results/1101.jpg" width="800">
<img src="../imgs_results/1102.jpg" width="800"> <img src="../imgs_results/1102.jpg" width="800">
<img src="../imgs_results/1103.jpg" width="800"> <img src="../imgs_results/1103.jpg" width="800">
<img src="../imgs_results/1104.jpg" width="800"> <img src="../imgs_results/1104.jpg" width="800">
<img src="../imgs_results/1105.jpg" width="800"> <img src="../imgs_results/1105.jpg" width="800">
<img src="../imgs_results/1106.jpg" width="800">
</div>
<a name="英文识别模型效果展示"></a>
## 英文识别模型效果展示
<div align="center">
<img src="../imgs_results/img_12.jpg" width="800">
</div>
<a name="多语言识别模型效果展示"></a>
## 多语言识别模型效果展示
<div align="center">
<img src="../imgs_results/1110.jpg" width="800"> <img src="../imgs_results/1110.jpg" width="800">
<img src="../imgs_results/1112.jpg" width="800"> <img src="../imgs_results/1112.jpg" width="800">
</div> </div>
<a name="超轻量ppocr_mobile_1.0效果展示"></a> <a name="超轻量ppocr_mobile_1.0效果展示"></a>
## 超轻量ppocr_mobile_1.0效果展示 ## 超轻量ppocr_mobile_1.0效果展示
<div align="center"> <div align="center">
<img src="../imgs_results/1.jpg" width="800"> <img src="../imgs_results/1.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/7.jpg" width="800"> <img src="../imgs_results/7.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/6.jpg" width="800"> <img src="../imgs_results/6.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/16.png" width="800"> <img src="../imgs_results/16.png" width="800">
</div> </div>
...@@ -45,12 +44,6 @@ ...@@ -45,12 +44,6 @@
<div align="center"> <div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
</div> </div>
...@@ -11,8 +11,8 @@ pip install paddleocr ...@@ -11,8 +11,8 @@ pip install paddleocr
本地构建并安装 本地构建并安装
```bash ```bash
python setup.py bdist_wheel python3 setup.py bdist_wheel
pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本号 pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本号
``` ```
### 1. 代码使用 ### 1. 代码使用
...@@ -20,7 +20,7 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本 ...@@ -20,7 +20,7 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本
```python ```python
from paddleocr import PaddleOCR, draw_ocr from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换 # Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
# 参数依次为`zh`, `en`, `french`, `german`, `korean`, `japan`。 # 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg' img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True) result = ocr.ocr(img_path, cls=True)
...@@ -280,7 +280,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_ ...@@ -280,7 +280,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
| rec_algorithm | 使用的识别算法类型 | CRNN | | rec_algorithm | 使用的识别算法类型 | CRNN |
| rec_model_dir | 识别模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | | rec_model_dir | 识别模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None |
| rec_image_shape | 识别算法的输入图片尺寸 | "3,32,320" | | rec_image_shape | 识别算法的输入图片尺寸 | "3,32,320" |
| rec_char_type | 识别算法的字符类型,中文(ch)或英文(en) | ch | | rec_char_type | 识别算法的字符类型,中英文(ch)、英文(en)、法语(french)、德语(german)、韩语(korean)、日语(japan) | ch |
| rec_batch_num | 进行识别时,同时前向的图片数 | 30 | | rec_batch_num | 进行识别时,同时前向的图片数 | 30 |
| max_text_length | 识别算法能识别的最大文字长度 | 25 | | max_text_length | 识别算法能识别的最大文字长度 | 25 |
| rec_char_dict_path | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径 | ./ppocr/utils/ppocr_keys_v1.txt | | rec_char_dict_path | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径 | ./ppocr/utils/ppocr_keys_v1.txt |
......
<a name="Algorithm_introduction"></a>
## Algorithm introduction ## Algorithm introduction
[TOC] This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v1.1 models list](./models_list_en.md).
<a name="TEXTDETECTIONALGORITHM"></a>
- [1. Text Detection Algorithm](#TEXTDETECTIONALGORITHM)
- [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM)
<a name="TEXTDETECTIONALGORITHM"></a>
### 1. Text Detection Algorithm ### 1. Text Detection Algorithm
PaddleOCR open source text detection algorithms list: PaddleOCR open source text detection algorithms list:
...@@ -29,14 +33,6 @@ On Total-Text dataset, the text detection result is as follows: ...@@ -29,14 +33,6 @@ On Total-Text dataset, the text detection result is as follows:
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 3w training data,the related configuration and pre-trained models for text detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|ultra-lightweight OCR model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result.
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
<a name="TEXTRECOGNITIONALGORITHM"></a> <a name="TEXTRECOGNITIONALGORITHM"></a>
...@@ -67,11 +63,4 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r ...@@ -67,11 +63,4 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar). The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).
We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w training data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|General OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
...@@ -110,7 +110,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci ...@@ -110,7 +110,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci
``` ```
# Predict English results # Predict English results
python3 tools/infer_rec.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
``` ```
Input image: Input image:
......
...@@ -10,7 +10,7 @@ The following list can be viewed via `--help` ...@@ -10,7 +10,7 @@ The following list can be viewed via `--help`
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE ## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
Take `rec_chinese_lite_train.yml` as an example Take `rec_chinese_lite_train_v1.1.yml` as an example
| Parameter | Use | Default | Note | | Parameter | Use | Default | Note |
...@@ -32,6 +32,7 @@ Take `rec_chinese_lite_train.yml` as an example ...@@ -32,6 +32,7 @@ Take `rec_chinese_lite_train.yml` as an example
| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention | | loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
| distort | Set use distort | false | Support distort type ,read [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) | | distort | Set use distort | false | Support distort type ,read [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Wether to recognize space | false | Only support in character_type=ch mode | | use_space_char | Wether to recognize space | false | Only support in character_type=ch mode |
label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in the direction classifier |
| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ | | reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ | | pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption | | checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
......
...@@ -38,12 +38,14 @@ If you want to train PaddleOCR on other datasets, please build the annotation fi ...@@ -38,12 +38,14 @@ If you want to train PaddleOCR on other datasets, please build the annotation fi
## TRAINING ## TRAINING
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs. First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
```shell ```shell
cd PaddleOCR/ cd PaddleOCR/
# Download the pre-trained model of MobileNetV3 # Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50 # or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
# decompressing the pre-training model file, take MobileNetV3 as an example # decompressing the pre-training model file, take MobileNetV3 as an example
...@@ -62,7 +64,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model ...@@ -62,7 +64,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
#### START TRAINING #### START TRAINING
*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.* *If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee train_det.log python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml 2>&1 | tee train_det.log
``` ```
In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file. In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file.
...@@ -70,7 +72,7 @@ For a detailed explanation of the configuration file, please refer to [config](. ...@@ -70,7 +72,7 @@ For a detailed explanation of the configuration file, please refer to [config](.
You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001
``` ```
#### load trained model and continue training #### load trained model and continue training
...@@ -78,7 +80,7 @@ If you expect to load trained model and continue the training again, you can spe ...@@ -78,7 +80,7 @@ If you expect to load trained model and continue the training again, you can spe
For example: For example:
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model
``` ```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded. **Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
...@@ -88,18 +90,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you ...@@ -88,18 +90,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml` Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3_v1.1.yml`
When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result. When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result.
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
Such as: Such as:
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. * Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model.
...@@ -108,16 +110,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou ...@@ -108,16 +110,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou
Test the detection result on a single image: Test the detection result on a single image:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
When testing the DB model, adjust the post-processing threshold: When testing the DB model, adjust the post-processing threshold:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
Test the detection result on all images in the folder: Test the detection result on all images in the folder:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
...@@ -5,7 +5,7 @@ The inference model (the model saved by `fluid.io.save_inference_model`) is gene ...@@ -5,7 +5,7 @@ The inference model (the model saved by `fluid.io.save_inference_model`) is gene
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training. The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/extension/paddle_inference.md).
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model. Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model.
...@@ -26,7 +26,9 @@ Next, we first introduce how to convert a trained model into an inference model, ...@@ -26,7 +26,9 @@ Next, we first introduce how to convert a trained model into an inference model,
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION) - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION) - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION) - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS) - [4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
- [5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- [6. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) - [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
...@@ -42,8 +44,9 @@ Next, we first introduce how to convert a trained model into an inference model, ...@@ -42,8 +44,9 @@ Next, we first introduce how to convert a trained model into an inference model,
Download the lightweight Chinese detection model: Download the lightweight Chinese detection model:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/
``` ```
The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command: The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
``` ```
# -c Set the training algorithm yml configuration file # -c Set the training algorithm yml configuration file
...@@ -51,9 +54,9 @@ The above model is a DB algorithm trained with MobileNetV3 as the backbone. To c ...@@ -51,9 +54,9 @@ The above model is a DB algorithm trained with MobileNetV3 as the backbone. To c
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. # Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved. # Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy \ python3 tools/export_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/
Global.save_inference_dir=./inference/det_db/
``` ```
When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file. When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file.
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved. `Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
After the conversion is successful, there are two files in the `save_inference_dir` directory: After the conversion is successful, there are two files in the `save_inference_dir` directory:
...@@ -68,7 +71,7 @@ inference/det_db/ ...@@ -68,7 +71,7 @@ inference/det_db/
Download the lightweight Chinese recognition model: Download the lightweight Chinese recognition model:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/
``` ```
The recognition model is converted to the inference model in the same way as the detection, as follows: The recognition model is converted to the inference model in the same way as the detection, as follows:
...@@ -78,8 +81,7 @@ The recognition model is converted to the inference model in the same way as the ...@@ -78,8 +81,7 @@ The recognition model is converted to the inference model in the same way as the
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. # Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved. # Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \ python3 tools/export_model.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy \
Global.save_inference_dir=./inference/rec_crnn/
``` ```
If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path. If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path.
...@@ -96,7 +98,7 @@ After the conversion is successful, there are two files in the directory: ...@@ -96,7 +98,7 @@ After the conversion is successful, there are two files in the directory:
Download the angle classification model: Download the angle classification model:
``` ```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar -C ./ch_lite/
``` ```
The angle classification model is converted to the inference model in the same way as the detection, as follows: The angle classification model is converted to the inference model in the same way as the detection, as follows:
...@@ -106,7 +108,7 @@ The angle classification model is converted to the inference model in the same w ...@@ -106,7 +108,7 @@ The angle classification model is converted to the inference model in the same w
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. # Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved. # Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy \
Global.save_inference_dir=./inference/cls/ Global.save_inference_dir=./inference/cls/
``` ```
...@@ -299,31 +301,45 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" ...@@ -299,31 +301,45 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str) dict_character = list(self.character_str)
``` ```
<a name="SRN-BASED_RECOGNITION"></a>
### 4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE
The recognition model based on SRN requires additional setting of the recognition algorithm parameter --rec_algorithm="SRN".
At the same time, it is necessary to ensure that the predicted shape is consistent with the training, such as: --rec_image_shape="1, 64, 256"
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
--rec_model_dir="./inference/srn/" \
--rec_image_shape="1, 64, 256" \
--rec_char_type="en" \
--rec_algorithm="SRN"
```
<a name="USING_CUSTOM_CHARACTERS"></a> <a name="USING_CUSTOM_CHARACTERS"></a>
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY ### 5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
``` ```
<a name="Multilingual model inference"></a> <a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 6. MULTILINGAUL MODEL INFERENCE
### 5. Multilingual Model Reasoning
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition: You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/ utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
After executing the command, the prediction result of the above figure is: After executing the command, the prediction result of the above figure is:
``` text ``` text
2020-09-19 16:15:05,076-INFO: index: [205 206 38 39] 2020-09-19 16:15:05,076-INFO: index: [205 206 38 39]
2020-09-19 16:15:05,077-INFO: word : 바탕으로 2020-09-19 16:15:05,077-INFO: word : 바탕으로
2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535 2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535
``` ```
<a name="ANGLE_CLASSIFICATION_MODEL_INFERENCE"></a> <a name="ANGLE_CLASSIFICATION_MODEL_INFERENCE"></a>
...@@ -357,7 +373,11 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963] ...@@ -357,7 +373,11 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963]
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model.The visualized recognition results are saved to the `./inference_results` folder by default. When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model.The visualized recognition results are saved to the `./inference_results` folder by default.
``` ```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls true # use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true
# not use use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
``` ```
After executing the command, the recognition result image is as follows: After executing the command, the recognition result image is as follows:
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility.
PaddleOCR working environment: PaddleOCR working environment:
- PaddlePaddle1.7 - PaddlePaddle1.8+, Recommend PaddlePaddle 2.0.0.beta
- python3.7 - python3.7
- glibc 2.23 - glibc 2.23
...@@ -49,18 +49,15 @@ docker images ...@@ -49,18 +49,15 @@ docker images
hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829
``` ```
**2. Install PaddlePaddle Fluid v1.7 (the higher version is not supported yet, the adaptation work is in progress)** **2. Install PaddlePaddle Fluid v2.0**
``` ```
pip3 install --upgrade pip pip3 install --upgrade pip
# If you have cuda9 installed on your machine, please run the following command to install # If you have cuda9 or cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
# If you have cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
# If you only have cpu on your machine, please run the following command to install # If you only have cpu on your machine, please run the following command to install
python3 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
``` ```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
......
...@@ -18,11 +18,11 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine ...@@ -18,11 +18,11 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
<a name="Detection"></a> <a name="Detection"></a>
### 1. Text Detection Model ### 1. Text Detection Model
|model name|description|model size|download| |model name|description|config|model size|download|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)| |ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
|ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| |ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
|ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model, but achieved better performance|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| |ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model, but achieved better performance|[det_r18_vd_db_v1.1.yml](../../configs/det/det_r18_vd_db_v1.1.yml)|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
<a name="Recognition"></a> <a name="Recognition"></a>
...@@ -30,41 +30,41 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine ...@@ -30,41 +30,41 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
<a name="Chinese"></a> <a name="Chinese"></a>
#### Chinese Recognition Model #### Chinese Recognition Model
|model name|description|model size|download| |model name|description|config|model size|download|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)| |ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)| |ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
|ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)| |ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml)|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
**Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset. **Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset.
<a name="English"></a> <a name="English"></a>
#### English Recognition Model #### English Recognition Model
|model name|description|model size|download| |model name|description|config|model size|download|
|-|-|-|-| |-|-|-|-|-|
|en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)| |en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)| |en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar) |
<a name="Multilingual"></a> <a name="Multilingual"></a>
#### Multilingual Recognition Model(Updating...) #### Multilingual Recognition Model(Updating...)
|model name|description|model size|download| |model name|description|config|model size|download|
|-|-|-|-| |-|-|-|-|-|
| french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)| | french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_languages/rec_french_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar) |
| german_ppocr_mobile_v1.1_rec |German model for French recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)| | german_ppocr_mobile_v1.1_rec |German model for French recognition|[rec_ger_lite_train.yml](../../configs/rec/multi_languages/rec_ger_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar) |
| korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)| | korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|[rec_korean_lite_train.yml](../../configs/rec/multi_languages/rec_korean_lite_train.yml)|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar) |
| japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)| | japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|[rec_japan_lite_train.yml](../../configs/rec/multi_languages/rec_japan_lite_train.yml)|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar) |
<a name="Angle"></a> <a name="Angle"></a>
### 3. Text Angle Classification Model ### 3. Text Angle Classification Model
|model name|description|model size|download| |model name|description|config|model size|download|
|-|-|-|-| |-|-|-|-|-|
|ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model]()| |ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_cls|Original model|850kb|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)| |ch_ppocr_mobile_v1.1_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|850kb|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |
## OCR model list(V1.0, updated on 7.16) ## OCR model list(V1.0, updated on 7.16)
|model name|description|detection model|recognition model|recognition model supporting space recognition| |model name|description|detection model|recognition model|recognition model supporting space recognition|
|-|-|-|-|-| |-|-|-|-|-|
|chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) |chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) |chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
...@@ -5,16 +5,19 @@ ...@@ -5,16 +5,19 @@
Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment. Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment.
*Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)。* * Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md).
## 2.inference models ## 2.inference models
| Name | Introduction | Detection model | Recognition model | Recognition model with space support | The detection and recognition models on the mobile and server sides are as follows. For more models (including multiple languages), please refer to [PP-OCR v1.1 series model list](../doc_ch/models_list.md)
|-|-|-|-|-|
|chinese_db_crnn_mobile| Ultra-lightweight Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server| Universal Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
* If wget is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, and uncompress it and place it in the corresponding directory.
| Model introduction | Model name | Recommended scene | Detection model | Direction Classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Ultra-lightweight Chinese OCR model(8.1M) | ch_ppocr_mobile_v1.1_xx |Mobile-side/Server-side|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Universal Chinese OCR model(155.1M) |ch_ppocr_server_v1.1_xx|Server-side |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
* If `wget` is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, then uncompress it and place it in the corresponding directory.
Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them. Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them.
...@@ -24,6 +27,8 @@ mkdir inference && cd inference ...@@ -24,6 +27,8 @@ mkdir inference && cd inference
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and unzip # Download the recognition model and unzip
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
# Download the direction classifier model and unzip
wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package}
cd .. cd ..
``` ```
...@@ -32,9 +37,11 @@ Take the ultra-lightweight model as an example: ...@@ -32,9 +37,11 @@ Take the ultra-lightweight model as an example:
``` ```
mkdir inference && cd inference mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it # Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it # Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar
# Download the direction classifier model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar
cd .. cd ..
``` ```
...@@ -42,10 +49,13 @@ After decompression, the file structure should be as follows: ...@@ -42,10 +49,13 @@ After decompression, the file structure should be as follows:
``` ```
|-inference |-inference
|-ch_rec_mv3_crnn |-ch_ppocr_mobile_v1.1_det_infer
|- model
|- params
|-ch_ppocr_mobile_v1.1_rec_infer
|- model |- model
|- params |- params
|-ch_det_mv3_db |-ch_ppocr_mobile_v1.1_cls_infer
|- model |- model
|- params |- params
... ...
...@@ -53,20 +63,20 @@ After decompression, the file structure should be as follows: ...@@ -53,20 +63,20 @@ After decompression, the file structure should be as follows:
## 3. Single image or image set prediction ## 3. Single image or image set prediction
* The following code implements text detection and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual results are saved to the `./inference_results` folder by default. * The following code implements text detection and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `rec_model_dir` specifies the path to identify the inference model, the parameter `use_angle_cls` specifies whether to use the direction classifier, the parameter `cls_model_dir` specifies the path to identify the direction classifier model, the parameter `use_space_char` specifies whether to predict the space char. The visual results are saved to the `./inference_results` folder by default.
```bash ```bash
# Predict a single image specified by image_dir # Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# Predict imageset specified by image_dir # Predict imageset specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False # If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False
``` ```
- Universal Chinese OCR model - Universal Chinese OCR model
...@@ -75,25 +85,18 @@ Please follow the above steps to download the corresponding models and update th ...@@ -75,25 +85,18 @@ Please follow the above steps to download the corresponding models and update th
``` ```
# Predict a single image specified by image_dir # Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
``` ```
- Universal Chinese OCR model with the support of space * Note
- If you want to use the recognition model which does not support space char recognition, please update the source code to the latest version and add parameters `--use_space_char=False`.
Please follow the above steps to download the corresponding models and update the relevant parameters, The example is as follows. - If you do not want to use direction classifier, please update the source code to the latest version and add parameters `--use_angle_cls=False`.
* Note: Please update the source code to the latest version and add parameters `--use_space_char=True`
```
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" --use_space_char=True
```
For more text detection and recognition tandem reasoning, please refer to the document tutorial For more text detection and recognition tandem reasoning, please refer to the document tutorial
: [Inference with Python inference engine](./inference_en.md) : [Inference with Python inference engine](./inference_en.md)
In addition, the tutorial also provides other deployment methods for the Chinese OCR model: In addition, the tutorial also provides other deployment methods for the Chinese OCR model:
- [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md) - [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md)
- [Service deployment](./serving_en.md) - [Service deployment](../../deploy/hubserving/readme_en.md)
- [End-to-end deployment](../../deploy/lite/readme_en.md) - [End-to-end deployment](../../deploy/lite/readme_en.md)
## TEXT RECOGNITION ## TEXT RECOGNITION
- [DATA PREPARATION](#DATA_PREPARATION)
- [Dataset Download](#Dataset_download)
- [Costom Dataset](#Costom_Dataset)
- [Dictionary](#Dictionary)
- [Add Space Category](#Add_space_category)
- [TRAINING](#TRAINING)
- [Data Augmentation](#Data_Augmentation)
- [Training](#Training)
- [Multi-language](#Multi_language)
- [EVALUATION](#EVALUATION)
- [PREDICTION](#PREDICTION)
- [Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a>
### DATA PREPARATION ### DATA PREPARATION
...@@ -13,13 +30,14 @@ The default storage path for training data is `PaddleOCR/train_data`, if you alr ...@@ -13,13 +30,14 @@ The default storage path for training data is `PaddleOCR/train_data`, if you alr
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
``` ```
<a name="Dataset_download"></a>
* Dataset download * Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path. If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
<a name="Costom_Dataset"></a>
* Use your own dataset: * Use your own dataset:
If you want to use your own data for training, please refer to the following to organize your data. If you want to use your own data for training, please refer to the following to organize your data.
...@@ -72,7 +90,7 @@ Similar to the training set, the test set also needs to be provided a folder con ...@@ -72,7 +90,7 @@ Similar to the training set, the test set also needs to be provided a folder con
|- word_003.jpg |- word_003.jpg
| ... | ...
``` ```
<a name="Dictionary"></a>
- Dictionary - Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...@@ -92,9 +110,21 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a ...@@ -92,9 +110,21 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters. `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters. `ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
`ppocr/utils/dict/french_dict.txt` is a French dictionary with 118 characters
`ppocr/utils/dict/japan_dict.txt` is a French dictionary with 4399 characters
`ppocr/utils/dict/korean_dict.txt` is a French dictionary with 3636 characters
`ppocr/utils/dict/german_dict.txt` is a French dictionary with 131 characters
You can use it on demand.
The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) or corpus file to [corpus](../../ppocr/utils/corpus) and we will thank you in the Repo.
You can use them if needed.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
...@@ -102,12 +132,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co ...@@ -102,12 +132,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch. If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.
<a name="Add_space_category"></a>
- Add space category - Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`. If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`.
**Note: use_space_char only takes effect when character_type=ch** **Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a>
### TRAINING ### TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
...@@ -126,14 +158,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar ...@@ -126,14 +158,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
Start training: Start training:
``` ```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES # GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
# Training icdar15 English data and saving the log as train_rec.log # Training icdar15 English data and saving the log as train_rec.log
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
``` ```
<a name="Data_Augmentation"></a>
- Data Augmentation - Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file. PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
...@@ -142,7 +172,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand ...@@ -142,7 +172,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
<a name="Training"></a>
- Training - Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process. PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
...@@ -154,7 +184,10 @@ If the evaluation set is large, the test will be time-consuming. It is recommend ...@@ -154,7 +184,10 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
| Configuration file | Algorithm | backbone | trans | seq | pred | | Configuration file | Algorithm | backbone | trans | seq | pred |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
...@@ -165,7 +198,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend ...@@ -165,7 +198,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | | rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file: For training Chinese data, it is recommended to use
训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
co co
Take `rec_mv3_none_none_ctc.yml` as an example: Take `rec_mv3_none_none_ctc.yml` as an example:
``` ```
...@@ -201,7 +235,8 @@ Optimizer: ...@@ -201,7 +235,8 @@ Optimizer:
``` ```
**Note that the configuration file for prediction/evaluation must be consistent with the training.** **Note that the configuration file for prediction/evaluation must be consistent with the training.**
-Minor language <a name="Multi_language"></a>
- Multi-language
PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are: PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are:
...@@ -213,8 +248,30 @@ PaddleOCR also provides multi-language. The configuration file in `configs/rec/m ...@@ -213,8 +248,30 @@ PaddleOCR also provides multi-language. The configuration file in `configs/rec/m
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | | rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded on [Baidu Netdisk](). The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded on [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi.
If you want to finetune on the basis of the existing model effect, please refer to the following instructions to modify the configuration file:
Take `rec_french_lite_train` as an example:
```
Global:
...
# Add a custom dictionary, if you modify the dictionary
# please point the path to the new dictionary
character_dict_path: ./ppocr/utils/dict/french_dict.txt
# Add data augmentation during training
distort: true
# Identify spaces
use_space_char: true
...
# Modify reader type
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
...
...
```
<a name="EVALUATION"></a>
### EVALUATION ### EVALUATION
The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader. The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader.
...@@ -222,11 +279,13 @@ The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` ...@@ -222,11 +279,13 @@ The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml`
``` ```
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
# GPU evaluation, Global.checkpoints is the weight to be tested # GPU evaluation, Global.checkpoints is the weight to be tested
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy python3 tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy
``` ```
<a name="PREDICTION"></a>
### PREDICTION ### PREDICTION
<a name="Training_engine_prediction"></a>
* Training engine prediction * Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
...@@ -235,7 +294,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci ...@@ -235,7 +294,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci
``` ```
# Predict English results # Predict English results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
``` ```
Input image: Input image:
...@@ -250,11 +309,11 @@ infer_img: doc/imgs_words/en/word_1.png ...@@ -250,11 +309,11 @@ infer_img: doc/imgs_words/en/word_1.png
word : joint word : joint
``` ```
The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to predict the Chinese model: The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml`, you can use the following command to predict the Chinese model:
``` ```
# Predict Chinese results # Predict Chinese results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
``` ```
Input image: Input image:
......
# RECENT UPDATES # RECENT UPDATES
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](../../README.md#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 update [English recognition model](./models_list_en.md#english-recognition-model) and [Multilingual recognition model](./models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md) - 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294) - 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519) - 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
...@@ -7,7 +11,7 @@ ...@@ -7,7 +11,7 @@
- 2020.7.15, Add several related datasets, data annotation and synthesis tools. - 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model to support recognize the character "space". - 2020.7.9 Add a new model to support recognize the character "space".
- 2020.7.9 Add the data augument and learning rate decay strategies during training. - 2020.7.9 Add the data augument and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](./doc/doc_en/datasets_en.md) and keep updating - 2020.6.8 Add [datasets](./datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score - 2020.6.5 Support separate prediction and recognition, output result score
- 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support exporting `attention` model to `inference_model`
......
# Visualization # Visualization
- [Chinese/English OCR Visualization (Space_support )](#Space_support) <a name="ppocr_server_1.1"></a>
- [Ultra-lightweight Chinese/English OCR Visualization](#Ultra-lightweight) ## ch_ppocr_server_1.1
- [General Chinese/English OCR Visualization](#General)
<a name="Space_support"></a>
## Chinese/English OCR Visualization (Space_support )
### Ultra-lightweight Model
<div align="center">
<img src="../imgs_results/img_11.jpg" width="800">
</div>
### General OCR Model
<div align="center"> <div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800"> <img src="../imgs_results/1101.jpg" width="800">
<img src="../imgs_results/1102.jpg" width="800">
<img src="../imgs_results/1103.jpg" width="800">
<img src="../imgs_results/1104.jpg" width="800">
<img src="../imgs_results/1105.jpg" width="800">
<img src="../imgs_results/1106.jpg" width="800">
</div> </div>
<a name="Ultra-lightweight"></a>
## Ultra-lightweight Chinese/English OCR Visualization
<a name="en_ppocr_mobile_1.1"></a>
## en_ppocr_mobile_1.1
<div align="center"> <div align="center">
<img src="../imgs_results/1.jpg" width="800"> <img src="../imgs_results/img_12.jpg" width="800">
</div> </div>
<div align="center">
<img src="../imgs_results/7.jpg" width="800">
</div>
<a name="multilingual"></a>
## (multilingual)_ppocr_mobile_1.1
<div align="center"> <div align="center">
<img src="../imgs_results/12.jpg" width="800"> <img src="../imgs_results/1110.jpg" width="800">
<img src="../imgs_results/1112.jpg" width="800">
</div> </div>
<div align="center">
<img src="../imgs_results/4.jpg" width="800">
</div>
<div align="center"> <a name="ppocr_mobile_1.0"></a>
<img src="../imgs_results/6.jpg" width="800"> ## ppocr_mobile_1.0
</div>
<div align="center">
<img src="../imgs_results/9.jpg" width="800">
</div>
<div align="center"> <div align="center">
<img src="../imgs_results/1.jpg" width="800">
<img src="../imgs_results/7.jpg" width="800">
<img src="../imgs_results/6.jpg" width="800">
<img src="../imgs_results/16.png" width="800"> <img src="../imgs_results/16.png" width="800">
</div> </div>
<div align="center">
<img src="../imgs_results/22.jpg" width="800">
</div>
<a name="General"></a> <a name="ppocr_server_1.0"></a>
## General Chinese/English OCR Visualization ## ppocr_server_1.0
<div align="center"> <div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800"> <img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
</div> </div>
...@@ -9,8 +9,8 @@ pip install paddleocr ...@@ -9,8 +9,8 @@ pip install paddleocr
build own whl package and install build own whl package and install
```bash ```bash
python setup.py bdist_wheel python3 setup.py bdist_wheel
pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
``` ```
### 1. Use by code ### 1. Use by code
...@@ -18,7 +18,7 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of padd ...@@ -18,7 +18,7 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of padd
```python ```python
from paddleocr import PaddleOCR,draw_ocr from paddleocr import PaddleOCR,draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese. # Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `zh`, `en`, `french`, `german`, `korean`, `japan` # You can set the parameter `lang` as `ch`, `en`, `french`, `german`, `korean`, `japan`
# to switch the language model in order. # to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
...@@ -302,7 +302,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_ ...@@ -302,7 +302,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
| cls_batch_num | When performing classification, the batchsize of forward images | 30 | | cls_batch_num | When performing classification, the batchsize of forward images | 30 |
| enable_mkldnn | Whether to enable mkldnn | FALSE | | enable_mkldnn | Whether to enable mkldnn | FALSE |
| use_zero_copy_run | Whether to forward by zero_copy_run | FALSE | | use_zero_copy_run | Whether to forward by zero_copy_run | FALSE |
| lang | The support language, now only chinese(ch) and english(en) are supported | ch | | lang | The support language, now only Chinese(ch)、English(en)、French(french)、German(german)、Korean(korean)、Japanese(japan) are supported | ch |
| det | Enable detction when `ppocr.ocr` func exec | TRUE | | det | Enable detction when `ppocr.ocr` func exec | TRUE |
| rec | Enable recognition when `ppocr.ocr` func exec | TRUE | | rec | Enable recognition when `ppocr.ocr` func exec | TRUE |
| cls | Enable classification when `ppocr.ocr` func exec | FALSE | | cls | Enable classification when `ppocr.ocr` func exec | FALSE |
doc/joinus.PNG

15.7 KB | W: | H:

doc/joinus.PNG

405.4 KB | W: | H:

doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
  • 2-up
  • Swipe
  • Onion skin
...@@ -165,13 +165,13 @@ class DBProcessTest(object): ...@@ -165,13 +165,13 @@ class DBProcessTest(object):
elif resize_h // 32 <= 1: elif resize_h // 32 <= 1:
resize_h = 32 resize_h = 32
else: else:
resize_h = (resize_h // 32 - 1) * 32 resize_h = (resize_h // 32) * 32
if resize_w % 32 == 0: if resize_w % 32 == 0:
resize_w = resize_w resize_w = resize_w
elif resize_w // 32 <= 1: elif resize_w // 32 <= 1:
resize_w = 32 resize_w = 32
else: else:
resize_w = (resize_w // 32 - 1) * 32 resize_w = (resize_w // 32) * 32
try: try:
if int(resize_w) <= 0 or int(resize_h) <= 0: if int(resize_w) <= 0 or int(resize_h) <= 0:
return None, (None, None) return None, (None, None)
......
...@@ -19,6 +19,7 @@ import json ...@@ -19,6 +19,7 @@ import json
import sys import sys
import os import os
class EASTProcessTrain(object): class EASTProcessTrain(object):
def __init__(self, params): def __init__(self, params):
self.img_set_dir = params['img_set_dir'] self.img_set_dir = params['img_set_dir']
...@@ -495,13 +496,13 @@ class EASTProcessTest(object): ...@@ -495,13 +496,13 @@ class EASTProcessTest(object):
elif resize_h // 32 <= 1: elif resize_h // 32 <= 1:
resize_h = 32 resize_h = 32
else: else:
resize_h = (resize_h // 32 - 1) * 32 resize_h = (resize_h // 32) * 32
if resize_w % 32 == 0: if resize_w % 32 == 0:
resize_w = resize_w resize_w = resize_w
elif resize_w // 32 <= 1: elif resize_w // 32 <= 1:
resize_w = 32 resize_w = 32
else: else:
resize_w = (resize_w // 32 - 1) * 32 resize_w = (resize_w // 32) * 32
try: try:
if int(resize_w) <= 0 or int(resize_h) <= 0: if int(resize_w) <= 0 or int(resize_h) <= 0:
return None, (None, None) return None, (None, None)
......
...@@ -599,7 +599,7 @@ class SASTProcessTrain(object): ...@@ -599,7 +599,7 @@ class SASTProcessTrain(object):
""" """
text_polys, txt_tags, txts = [], [], [] text_polys, txt_tags, txts = [], [], []
with open(poly_txt_path) as f: with open(poly_txt_path, 'rb') as f:
for line in f.readlines(): for line in f.readlines():
poly_str, txt = line.strip().split('\t') poly_str, txt = line.strip().split('\t')
poly = map(float, poly_str.split(',')) poly = map(float, poly_str.split(','))
......
...@@ -127,6 +127,7 @@ class LMDBReader(object): ...@@ -127,6 +127,7 @@ class LMDBReader(object):
img=img, img=img,
image_shape=self.image_shape, image_shape=self.image_shape,
num_heads=self.num_heads, num_heads=self.num_heads,
char_ops=self.char_ops,
max_text_length=self.max_text_length) max_text_length=self.max_text_length)
else: else:
norm_img = process_image( norm_img = process_image(
......
...@@ -19,6 +19,8 @@ import random ...@@ -19,6 +19,8 @@ import random
from ppocr.utils.utility import initial_logger from ppocr.utils.utility import initial_logger
logger = initial_logger() logger = initial_logger()
from .text_image_aug.augment import tia_distort, tia_stretch, tia_perspective
def get_bounding_box_rect(pos): def get_bounding_box_rect(pos):
left = min(pos[0]) left = min(pos[0])
...@@ -54,7 +56,7 @@ def resize_norm_img(img, image_shape): ...@@ -54,7 +56,7 @@ def resize_norm_img(img, image_shape):
def resize_norm_img_chinese(img, image_shape): def resize_norm_img_chinese(img, image_shape):
imgC, imgH, imgW = image_shape imgC, imgH, imgW = image_shape
# todo: change to 0 and modified image shape # todo: change to 0 and modified image shape
max_wh_ratio = 0 max_wh_ratio = imgW * 1.0 / imgH
h, w = img.shape[0], img.shape[1] h, w = img.shape[0], img.shape[1]
ratio = w * 1.0 / h ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, ratio) max_wh_ratio = max(max_wh_ratio, ratio)
...@@ -196,6 +198,9 @@ class Config: ...@@ -196,6 +198,9 @@ class Config:
self.h = h self.h = h
self.perspective = True self.perspective = True
self.stretch = True
self.distort = True
self.crop = True self.crop = True
self.affine = False self.affine = False
self.reverse = True self.reverse = True
...@@ -299,41 +304,52 @@ def warp(img, ang): ...@@ -299,41 +304,52 @@ def warp(img, ang):
config.make(w, h, ang) config.make(w, h, ang)
new_img = img new_img = img
prob = 0.4
if config.distort:
img_height, img_width = img.shape[0:2]
if random.random() <= prob and img_height >= 20 and img_width >= 20:
try:
new_img = tia_distort(new_img, random.randint(3, 6))
except:
logger.warning(
"Exception occured during tia_distort, pass it...")
if config.stretch:
img_height, img_width = img.shape[0:2]
if random.random() <= prob and img_height >= 20 and img_width >= 20:
try:
new_img = tia_stretch(new_img, random.randint(3, 6))
except:
logger.warning(
"Exception occured during tia_stretch, pass it...")
if config.perspective: if config.perspective:
tp = random.randint(1, 100) if random.random() <= prob:
if tp >= 50: try:
warpR, (r1, c1), ratio, dst = get_warpR(config) new_img = tia_perspective(new_img)
new_w = int(np.max(dst[:, 0])) - int(np.min(dst[:, 0])) except:
new_img = cv2.warpPerspective( logger.warning(
new_img, "Exception occured during tia_perspective, pass it...")
warpR, (int(new_w * ratio), h),
borderMode=config.borderMode)
if config.crop: if config.crop:
img_height, img_width = img.shape[0:2] img_height, img_width = img.shape[0:2]
tp = random.randint(1, 100) if random.random() <= prob and img_height >= 20 and img_width >= 20:
if tp >= 50 and img_height >= 20 and img_width >= 20:
new_img = get_crop(new_img) new_img = get_crop(new_img)
if config.affine:
warpT = get_warpAffine(config)
new_img = cv2.warpAffine(
new_img, warpT, (w, h), borderMode=config.borderMode)
if config.blur: if config.blur:
tp = random.randint(1, 100) if random.random() <= prob:
if tp >= 50:
new_img = blur(new_img) new_img = blur(new_img)
if config.color: if config.color:
tp = random.randint(1, 100) if random.random() <= prob:
if tp >= 50:
new_img = cvtColor(new_img) new_img = cvtColor(new_img)
if config.jitter: if config.jitter:
new_img = jitter(new_img) new_img = jitter(new_img)
if config.noise: if config.noise:
tp = random.randint(1, 100) if random.random() <= prob:
if tp >= 50:
new_img = add_gasuss_noise(new_img) new_img = add_gasuss_noise(new_img)
if config.reverse: if config.reverse:
tp = random.randint(1, 100) if random.random() <= prob:
if tp >= 50:
new_img = 255 - new_img new_img = 255 - new_img
return new_img return new_img
...@@ -382,6 +398,7 @@ def process_image(img, ...@@ -382,6 +398,7 @@ def process_image(img,
% loss_type % loss_type
return (norm_img) return (norm_img)
def resize_norm_img_srn(img, image_shape): def resize_norm_img_srn(img, image_shape):
imgC, imgH, imgW = image_shape imgC, imgH, imgW = image_shape
...@@ -408,30 +425,39 @@ def resize_norm_img_srn(img, image_shape): ...@@ -408,30 +425,39 @@ def resize_norm_img_srn(img, image_shape):
return np.reshape(img_black, (c, row, col)).astype(np.float32) return np.reshape(img_black, (c, row, col)).astype(np.float32)
def srn_other_inputs(image_shape,
num_heads, def srn_other_inputs(image_shape, num_heads, max_text_length, char_num):
max_text_length,
char_num):
imgC, imgH, imgW = image_shape imgC, imgH, imgW = image_shape
feature_dim = int((imgH / 8) * (imgW / 8)) feature_dim = int((imgH / 8) * (imgW / 8))
encoder_word_pos = np.array(range(0, feature_dim)).reshape((feature_dim, 1)).astype('int64') encoder_word_pos = np.array(range(0, feature_dim)).reshape(
gsrm_word_pos = np.array(range(0, max_text_length)).reshape((max_text_length, 1)).astype('int64') (feature_dim, 1)).astype('int64')
gsrm_word_pos = np.array(range(0, max_text_length)).reshape(
(max_text_length, 1)).astype('int64')
lbl_weight = np.array([int(char_num-1)] * max_text_length).reshape((-1,1)).astype('int64') lbl_weight = np.array([int(char_num - 1)] * max_text_length).reshape(
(-1, 1)).astype('int64')
gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length)) gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length))
gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape([-1, 1, max_text_length, max_text_length]) gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape(
gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1, [1, num_heads, 1, 1]) * [-1e9] [-1, 1, max_text_length, max_text_length])
gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1,
[1, num_heads, 1, 1]) * [-1e9]
gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape([-1, 1, max_text_length, max_text_length]) gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape(
gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2, [1, num_heads, 1, 1]) * [-1e9] [-1, 1, max_text_length, max_text_length])
gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2,
[1, num_heads, 1, 1]) * [-1e9]
encoder_word_pos = encoder_word_pos[np.newaxis, :] encoder_word_pos = encoder_word_pos[np.newaxis, :]
gsrm_word_pos = gsrm_word_pos[np.newaxis, :] gsrm_word_pos = gsrm_word_pos[np.newaxis, :]
return [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2] return [
lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1,
gsrm_slf_attn_bias2
]
def process_image_srn(img, def process_image_srn(img,
image_shape, image_shape,
...@@ -453,14 +479,16 @@ def process_image_srn(img, ...@@ -453,14 +479,16 @@ def process_image_srn(img,
return None return None
else: else:
if loss_type == "srn": if loss_type == "srn":
text_padded = [int(char_num-1)] * max_text_length text_padded = [int(char_num - 1)] * max_text_length
for i in range(len(text)): for i in range(len(text)):
text_padded[i] = text[i] text_padded[i] = text[i]
lbl_weight[i] = [1.0] lbl_weight[i] = [1.0]
text_padded = np.array(text_padded) text_padded = np.array(text_padded)
text = text_padded.reshape(-1, 1) text = text_padded.reshape(-1, 1)
return (norm_img, text,encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2,lbl_weight) return (norm_img, text, encoder_word_pos, gsrm_word_pos,
gsrm_slf_attn_bias1, gsrm_slf_attn_bias2, lbl_weight)
else: else:
assert False, "Unsupport loss_type %s in process_image"\ assert False, "Unsupport loss_type %s in process_image"\
% loss_type % loss_type
return (norm_img, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2) return (norm_img, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1,
gsrm_slf_attn_bias2)
# -*- coding:utf-8 -*-
# Author: RubanSeven
# Reference: https://github.com/RubanSeven/Text-Image-Augmentation-python
# import cv2
import numpy as np
from .warp_mls import WarpMLS
def tia_distort(src, segment=4):
img_h, img_w = src.shape[:2]
cut = img_w // segment
thresh = cut // 3
src_pts = list()
dst_pts = list()
src_pts.append([0, 0])
src_pts.append([img_w, 0])
src_pts.append([img_w, img_h])
src_pts.append([0, img_h])
dst_pts.append([np.random.randint(thresh), np.random.randint(thresh)])
dst_pts.append(
[img_w - np.random.randint(thresh), np.random.randint(thresh)])
dst_pts.append(
[img_w - np.random.randint(thresh), img_h - np.random.randint(thresh)])
dst_pts.append(
[np.random.randint(thresh), img_h - np.random.randint(thresh)])
half_thresh = thresh * 0.5
for cut_idx in np.arange(1, segment, 1):
src_pts.append([cut * cut_idx, 0])
src_pts.append([cut * cut_idx, img_h])
dst_pts.append([
cut * cut_idx + np.random.randint(thresh) - half_thresh,
np.random.randint(thresh) - half_thresh
])
dst_pts.append([
cut * cut_idx + np.random.randint(thresh) - half_thresh,
img_h + np.random.randint(thresh) - half_thresh
])
trans = WarpMLS(src, src_pts, dst_pts, img_w, img_h)
dst = trans.generate()
return dst
def tia_stretch(src, segment=4):
img_h, img_w = src.shape[:2]
cut = img_w // segment
thresh = cut * 4 // 5
src_pts = list()
dst_pts = list()
src_pts.append([0, 0])
src_pts.append([img_w, 0])
src_pts.append([img_w, img_h])
src_pts.append([0, img_h])
dst_pts.append([0, 0])
dst_pts.append([img_w, 0])
dst_pts.append([img_w, img_h])
dst_pts.append([0, img_h])
half_thresh = thresh * 0.5
for cut_idx in np.arange(1, segment, 1):
move = np.random.randint(thresh) - half_thresh
src_pts.append([cut * cut_idx, 0])
src_pts.append([cut * cut_idx, img_h])
dst_pts.append([cut * cut_idx + move, 0])
dst_pts.append([cut * cut_idx + move, img_h])
trans = WarpMLS(src, src_pts, dst_pts, img_w, img_h)
dst = trans.generate()
return dst
def tia_perspective(src):
img_h, img_w = src.shape[:2]
thresh = img_h // 2
src_pts = list()
dst_pts = list()
src_pts.append([0, 0])
src_pts.append([img_w, 0])
src_pts.append([img_w, img_h])
src_pts.append([0, img_h])
dst_pts.append([0, np.random.randint(thresh)])
dst_pts.append([img_w, np.random.randint(thresh)])
dst_pts.append([img_w, img_h - np.random.randint(thresh)])
dst_pts.append([0, img_h - np.random.randint(thresh)])
trans = WarpMLS(src, src_pts, dst_pts, img_w, img_h)
dst = trans.generate()
return dst
# -*- coding:utf-8 -*-
# Author: RubanSeven
# Reference: https://github.com/RubanSeven/Text-Image-Augmentation-python
import math
import numpy as np
class WarpMLS:
def __init__(self, src, src_pts, dst_pts, dst_w, dst_h, trans_ratio=1.):
self.src = src
self.src_pts = src_pts
self.dst_pts = dst_pts
self.pt_count = len(self.dst_pts)
self.dst_w = dst_w
self.dst_h = dst_h
self.trans_ratio = trans_ratio
self.grid_size = 100
self.rdx = np.zeros((self.dst_h, self.dst_w))
self.rdy = np.zeros((self.dst_h, self.dst_w))
@staticmethod
def __bilinear_interp(x, y, v11, v12, v21, v22):
return (v11 * (1 - y) + v12 * y) * (1 - x) + (v21 *
(1 - y) + v22 * y) * x
def generate(self):
self.calc_delta()
return self.gen_img()
def calc_delta(self):
w = np.zeros(self.pt_count, dtype=np.float32)
if self.pt_count < 2:
return
i = 0
while 1:
if self.dst_w <= i < self.dst_w + self.grid_size - 1:
i = self.dst_w - 1
elif i >= self.dst_w:
break
j = 0
while 1:
if self.dst_h <= j < self.dst_h + self.grid_size - 1:
j = self.dst_h - 1
elif j >= self.dst_h:
break
sw = 0
swp = np.zeros(2, dtype=np.float32)
swq = np.zeros(2, dtype=np.float32)
new_pt = np.zeros(2, dtype=np.float32)
cur_pt = np.array([i, j], dtype=np.float32)
k = 0
for k in range(self.pt_count):
if i == self.dst_pts[k][0] and j == self.dst_pts[k][1]:
break
w[k] = 1. / (
(i - self.dst_pts[k][0]) * (i - self.dst_pts[k][0]) +
(j - self.dst_pts[k][1]) * (j - self.dst_pts[k][1]))
sw += w[k]
swp = swp + w[k] * np.array(self.dst_pts[k])
swq = swq + w[k] * np.array(self.src_pts[k])
if k == self.pt_count - 1:
pstar = 1 / sw * swp
qstar = 1 / sw * swq
miu_s = 0
for k in range(self.pt_count):
if i == self.dst_pts[k][0] and j == self.dst_pts[k][1]:
continue
pt_i = self.dst_pts[k] - pstar
miu_s += w[k] * np.sum(pt_i * pt_i)
cur_pt -= pstar
cur_pt_j = np.array([-cur_pt[1], cur_pt[0]])
for k in range(self.pt_count):
if i == self.dst_pts[k][0] and j == self.dst_pts[k][1]:
continue
pt_i = self.dst_pts[k] - pstar
pt_j = np.array([-pt_i[1], pt_i[0]])
tmp_pt = np.zeros(2, dtype=np.float32)
tmp_pt[0] = np.sum(pt_i * cur_pt) * self.src_pts[k][0] - \
np.sum(pt_j * cur_pt) * self.src_pts[k][1]
tmp_pt[1] = -np.sum(pt_i * cur_pt_j) * self.src_pts[k][0] + \
np.sum(pt_j * cur_pt_j) * self.src_pts[k][1]
tmp_pt *= (w[k] / miu_s)
new_pt += tmp_pt
new_pt += qstar
else:
new_pt = self.src_pts[k]
self.rdx[j, i] = new_pt[0] - i
self.rdy[j, i] = new_pt[1] - j
j += self.grid_size
i += self.grid_size
def gen_img(self):
src_h, src_w = self.src.shape[:2]
dst = np.zeros_like(self.src, dtype=np.float32)
for i in np.arange(0, self.dst_h, self.grid_size):
for j in np.arange(0, self.dst_w, self.grid_size):
ni = i + self.grid_size
nj = j + self.grid_size
w = h = self.grid_size
if ni >= self.dst_h:
ni = self.dst_h - 1
h = ni - i + 1
if nj >= self.dst_w:
nj = self.dst_w - 1
w = nj - j + 1
di = np.reshape(np.arange(h), (-1, 1))
dj = np.reshape(np.arange(w), (1, -1))
delta_x = self.__bilinear_interp(
di / h, dj / w, self.rdx[i, j], self.rdx[i, nj],
self.rdx[ni, j], self.rdx[ni, nj])
delta_y = self.__bilinear_interp(
di / h, dj / w, self.rdy[i, j], self.rdy[i, nj],
self.rdy[ni, j], self.rdy[ni, nj])
nx = j + dj + delta_x * self.trans_ratio
ny = i + di + delta_y * self.trans_ratio
nx = np.clip(nx, 0, src_w - 1)
ny = np.clip(ny, 0, src_h - 1)
nxi = np.array(np.floor(nx), dtype=np.int32)
nyi = np.array(np.floor(ny), dtype=np.int32)
nxi1 = np.array(np.ceil(nx), dtype=np.int32)
nyi1 = np.array(np.ceil(ny), dtype=np.int32)
if len(self.src.shape) == 3:
x = np.tile(np.expand_dims(ny - nyi, axis=-1), (1, 1, 3))
y = np.tile(np.expand_dims(nx - nxi, axis=-1), (1, 1, 3))
else:
x = ny - nyi
y = nx - nxi
dst[i:i + h, j:j + w] = self.__bilinear_interp(
x, y, self.src[nyi, nxi], self.src[nyi, nxi1],
self.src[nyi1, nxi], self.src[nyi1, nxi1])
dst = np.clip(dst, 0, 255)
dst = np.array(dst, dtype=np.uint8)
return dst
...@@ -65,6 +65,7 @@ class ClsModel(object): ...@@ -65,6 +65,7 @@ class ClsModel(object):
labels = None labels = None
loader = None loader = None
image = fluid.data(name='image', shape=image_shape, dtype='float32') image = fluid.data(name='image', shape=image_shape, dtype='float32')
image.stop_gradient = False
return image, labels, loader return image, labels, loader
def __call__(self, mode): def __call__(self, mode):
......
...@@ -16,6 +16,8 @@ from __future__ import absolute_import ...@@ -16,6 +16,8 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from collections import OrderedDict
from paddle import fluid from paddle import fluid
from ppocr.utils.utility import create_module from ppocr.utils.utility import create_module
...@@ -25,6 +27,12 @@ from copy import deepcopy ...@@ -25,6 +27,12 @@ from copy import deepcopy
class RecModel(object): class RecModel(object):
"""
Rec model architecture
Args:
params(object): Params from yaml file and settings from command line
"""
def __init__(self, params): def __init__(self, params):
super(RecModel, self).__init__() super(RecModel, self).__init__()
global_params = params['Global'] global_params = params['Global']
...@@ -64,6 +72,12 @@ class RecModel(object): ...@@ -64,6 +72,12 @@ class RecModel(object):
self.num_heads = None self.num_heads = None
def create_feed(self, mode): def create_feed(self, mode):
"""
Create feed dict and DataLoader object
Args:
mode(str): runtime mode, can be "train", "eval" or "test"
Return: image, labels, loader
"""
image_shape = deepcopy(self.image_shape) image_shape = deepcopy(self.image_shape)
image_shape.insert(0, -1) image_shape.insert(0, -1)
if mode == "train": if mode == "train":
...@@ -189,9 +203,12 @@ class RecModel(object): ...@@ -189,9 +203,12 @@ class RecModel(object):
inputs = image inputs = image
else: else:
inputs = self.tps(image) inputs = self.tps(image)
# backbone
conv_feas = self.backbone(inputs) conv_feas = self.backbone(inputs)
# predict
predicts = self.head(conv_feas, labels, mode) predicts = self.head(conv_feas, labels, mode)
decoded_out = predicts['decoded_out'] decoded_out = predicts['decoded_out']
# loss
if mode == "train": if mode == "train":
loss = self.loss(predicts, labels) loss = self.loss(predicts, labels)
if self.loss_type == "attention": if self.loss_type == "attention":
...@@ -200,33 +217,32 @@ class RecModel(object): ...@@ -200,33 +217,32 @@ class RecModel(object):
label = labels['label'] label = labels['label']
if self.loss_type == 'srn': if self.loss_type == 'srn':
total_loss, img_loss, word_loss = self.loss(predicts, labels) total_loss, img_loss, word_loss = self.loss(predicts, labels)
outputs = { outputs = OrderedDict([('total_loss', total_loss),
'total_loss': total_loss, ('img_loss', img_loss),
'img_loss': img_loss, ('word_loss', word_loss),
'word_loss': word_loss, ('decoded_out', decoded_out),
'decoded_out': decoded_out, ('label', label)])
'label': label
}
else: else:
outputs = {'total_loss':loss, 'decoded_out':\ outputs = OrderedDict([('total_loss', loss),
decoded_out, 'label':label} ('decoded_out', decoded_out),
('label', label)])
return loader, outputs return loader, outputs
# export_model
elif mode == "export": elif mode == "export":
predict = predicts['predict'] predict = predicts['predict']
if self.loss_type == "ctc": if self.loss_type == "ctc":
predict = fluid.layers.softmax(predict) predict = fluid.layers.softmax(predict)
if self.loss_type == "srn": if self.loss_type == "srn":
return [ return [
image, labels, { image, labels, OrderedDict([('decoded_out', decoded_out),
'decoded_out': decoded_out, ('predicts', predict)])]
'predicts': predict
}
]
return [image, {'decoded_out': decoded_out, 'predicts': predict}] return [image, OrderedDict([('decoded_out', decoded_out),
('predicts', predict)])]
# eval or test
else: else:
predict = predicts['predict'] predict = predicts['predict']
if self.loss_type == "ctc": if self.loss_type == "ctc":
predict = fluid.layers.softmax(predict) predict = fluid.layers.softmax(predict)
return loader, {'decoded_out': decoded_out, 'predicts': predict} return loader, OrderedDict([('decoded_out', decoded_out),
('predicts', predict)])
\ No newline at end of file
...@@ -100,11 +100,10 @@ class ResNet(): ...@@ -100,11 +100,10 @@ class ResNet():
conv = self.basic_block( conv = self.basic_block(
input=conv, input=conv,
num_filters=num_filters[block], num_filters=num_filters[block],
stride=stride, stride=stride_list[block] if i == 0 else 1,
if_first=block == i == 0, is_first=block == i == 0,
name=conv_name) name=conv_name)
F.append(conv) F.append(conv)
base = F[-1] base = F[-1]
for i in [-2, -3]: for i in [-2, -3]:
b, c, w, h = F[i].shape b, c, w, h = F[i].shape
......
...@@ -27,6 +27,12 @@ import numpy as np ...@@ -27,6 +27,12 @@ import numpy as np
class CTCPredict(object): class CTCPredict(object):
"""
CTC predict
Args:
params(object): Params from yaml file and settings from command line
"""
def __init__(self, params): def __init__(self, params):
super(CTCPredict, self).__init__() super(CTCPredict, self).__init__()
self.char_num = params['char_num'] self.char_num = params['char_num']
......
...@@ -33,6 +33,7 @@ class AttentionLoss(object): ...@@ -33,6 +33,7 @@ class AttentionLoss(object):
predict = predicts['predict'] predict = predicts['predict']
label_out = labels['label_out'] label_out = labels['label_out']
label_out = fluid.layers.cast(x=label_out, dtype='int64') label_out = fluid.layers.cast(x=label_out, dtype='int64')
# calculate attention loss
cost = fluid.layers.cross_entropy(input=predict, label=label_out) cost = fluid.layers.cross_entropy(input=predict, label=label_out)
sum_cost = fluid.layers.reduce_sum(cost) sum_cost = fluid.layers.reduce_sum(cost)
return sum_cost return sum_cost
...@@ -30,6 +30,7 @@ class CTCLoss(object): ...@@ -30,6 +30,7 @@ class CTCLoss(object):
def __call__(self, predicts, labels): def __call__(self, predicts, labels):
predict = predicts['predict'] predict = predicts['predict']
label = labels['label'] label = labels['label']
# calculate ctc loss
cost = fluid.layers.warpctc( cost = fluid.layers.warpctc(
input=predict, label=label, blank=self.char_num, norm_by_times=True) input=predict, label=label, blank=self.char_num, norm_by_times=True)
sum_cost = fluid.layers.reduce_sum(cost) sum_cost = fluid.layers.reduce_sum(cost)
......
...@@ -20,15 +20,21 @@ import sys ...@@ -20,15 +20,21 @@ import sys
class CharacterOps(object): class CharacterOps(object):
""" Convert between text-label and text-index """ """
Convert between text-label and text-index
Args:
config: config from yaml file
"""
def __init__(self, config): def __init__(self, config):
self.character_type = config['character_type'] self.character_type = config['character_type']
self.loss_type = config['loss_type'] self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length'] self.max_text_len = config['max_text_length']
# use the default dictionary(36 char)
if self.character_type == "en": if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str) dict_character = list(self.character_str)
# use the custom dictionary
elif self.character_type in [ elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german' "ch", 'japan', 'korean', 'french', 'german'
]: ]:
...@@ -55,25 +61,27 @@ class CharacterOps(object): ...@@ -55,25 +61,27 @@ class CharacterOps(object):
"Nonsupport type of the character: {}".format(self.character_str) "Nonsupport type of the character: {}".format(self.character_str)
self.beg_str = "sos" self.beg_str = "sos"
self.end_str = "eos" self.end_str = "eos"
# add start and end str for attention
if self.loss_type == "attention": if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn": elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str] dict_character = dict_character + [self.beg_str, self.end_str]
# create char dict
self.dict = {} self.dict = {}
for i, char in enumerate(dict_character): for i, char in enumerate(dict_character):
self.dict[char] = i self.dict[char] = i
self.character = dict_character self.character = dict_character
def encode(self, text): def encode(self, text):
"""convert text-label into text-index. """
input: convert text-label into text-index.
Args:
text: text labels of each image. [batch_size] text: text labels of each image. [batch_size]
Return:
output:
text: concatenated text index for CTCLoss. text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)] [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
""" """
# Ignore capital
if self.character_type == "en": if self.character_type == "en":
text = text.lower() text = text.lower()
...@@ -86,7 +94,15 @@ class CharacterOps(object): ...@@ -86,7 +94,15 @@ class CharacterOps(object):
return text return text
def decode(self, text_index, is_remove_duplicate=False): def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """ """
convert text-index into text-label.
Args:
text_index: text index for each image
is_remove_duplicate: Whether to remove duplicate characters,
The default is False
Return:
text: text label
"""
char_list = [] char_list = []
char_num = self.get_char_num() char_num = self.get_char_num()
...@@ -108,6 +124,9 @@ class CharacterOps(object): ...@@ -108,6 +124,9 @@ class CharacterOps(object):
return text return text
def get_char_num(self): def get_char_num(self):
"""
Get character num
"""
return len(self.character) return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end): def get_beg_end_flag_idx(self, beg_or_end):
...@@ -132,6 +151,21 @@ def cal_predicts_accuracy(char_ops, ...@@ -132,6 +151,21 @@ def cal_predicts_accuracy(char_ops,
labels, labels,
labels_lod, labels_lod,
is_remove_duplicate=False): is_remove_duplicate=False):
"""
Calculate prediction accuracy
Args:
char_ops: CharacterOps
preds: preds result,text index
preds_lod: lod tensor of preds
labels: label of input image, text index
labels_lod: lod tensor of label
is_remove_duplicate: Whether to remove duplicate characters,
The default is False
Return:
acc: The accuracy of test set
acc_num: The correct number of samples predicted
img_num: The total sample number of the test set
"""
acc_num = 0 acc_num = 0
img_num = 0 img_num = 0
for ino in range(len(labels_lod) - 1): for ino in range(len(labels_lod) - 1):
...@@ -189,6 +223,14 @@ def cal_predicts_accuracy_srn(char_ops, ...@@ -189,6 +223,14 @@ def cal_predicts_accuracy_srn(char_ops,
def convert_rec_attention_infer_res(preds): def convert_rec_attention_infer_res(preds):
"""
Convert recognition attention predict result with lod information
Args:
preds: the output of the model
Return:
convert_ids: A 1-D Tensor represents all the predicted results.
target_lod: The lod information of the predicted results
"""
img_num = preds.shape[0] img_num = preds.shape[0]
target_lod = [0] target_lod = [0]
convert_ids = [] convert_ids = []
......
因为 它太大了无法显示 source diff 。你可以改为 查看blob
# Waiting for your contribution
PaddleOCR welcomes you to provide multilingual corpus for us to synthesize more data to optimize the model.
If you are interested, you can submit the corpus text to this directory and name it with {language}_corpus.txt.
PaddleOCR thanks for your contribution.
\ No newline at end of file
# 欢迎贡献语料
PaddleOCR非常欢迎你提供多语言的语料,以供我们合成更多数据来优化模型。
如你感兴趣,可将语料文本提交到此目录,并以 {语言}_corpus.txt 命名,PaddleOCR团队感谢你的贡献。
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
©
°
²
´
½
Á
Ä
Å
Ç
È
É
Í
Ó
Ö
×
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ð
ñ
ò
ó
ô
õ
ö
ø
ú
û
ü
ý
ā
ă
ą
ć
Č
č
đ
ē
ė
ę
ğ
ī
ı
Ł
ł
ń
ň
ō
ř
Ş
ş
Š
š
ţ
ū
ż
Ž
ž
Ș
ș
ț
Δ
α
λ
μ
φ
Г
О
а
в
л
о
р
с
т
я
 
鹿
使
調
宿
便
退
簿
姿
沿
寿
稿
湿
1
2
西
尿
禿
貿

"
0
`
'
9
6
8
3
-
5
7
:
á
ň
4
ó
%
,
/
ž
&
_
ă
ş
å
殿
!
 
é
$
綿
Ș
Č
č
;
ș
駿
í
è
ø
=
#
ä
õ
ë
ñ
ö
ß
ü
ę
ř
â
Ł
ą
ł
μ
+
ć
û
ā
à
Ž
đ
Ä
š
漿
×
Ü
Å
*
ı
ê
ç
ō
ţ
О
с
т
р
о
в
Г
а
л
я
ń
ī
Š
É
Á
ã
ğ
½
α
Ö
φ
ô
Ó
λ
Δ
ż
ò
´
ū
©
Ç
ú
È
Ş
ė
<
æ
²
Í
ð
ț
>
ý
ē
°
C
O
E
A
B
L
G
T
M
S
u
(
)
a
.
W
i
V
b
c
f
e
N
K
R
U
D
g
P
F
Z
I
H
Q
y
o
t
J
Y
X
s
p
n
m
j
麿
w
輿
k
v
h
r
d
l
橿
竿
廿
椿
婿
谿
滿
耀
z
[
]
丿 丿
使
便
姿
婿
宿
寿
尿
廿
忿 忿
椿
槿 槿
橿
殿
沿
湿
滿
漿
禿
稿
竿
簿
綿
耀
西
調
谿
貿
輿
退
駿
鹿
麿
?
!
"
#
: $
%
&
'
< *
+
-
/
0
1
2 2
3
4
5 5
-
6 6
3
9
'
7 7
1
8 8
9
:
{
0
}
4
+
|
[
]
"
*
=
#
; ;
<
=
?
_
> >
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
\ \
]
^
_
!
` `
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
|
}
&
~ ~
©
°
²
½
Á
Ä
Å
Ç
É
Í
Î
Ó
Ö
×
Ü
ß
à
á
믿 â
ã
ä
å
æ
ç
è
é
ê
ë
ì
í
î
ï
ð
ñ
ò
ó
ô
õ
ö
ø
ú
û
ü
ý
ā
ă
ą
ć
Č
č
đ
ē
ė
ę
ě
ğ
ī
İ
ı
Ł
ł
ń
ň
ō
ř
Ş
ş
Š
š
ţ
ū
ź
ż
Ž
ž
Ș
ș
Α
Δ
α
λ
φ
Г
О
а
в
л
о
р
с
т
я
%
使
便
滿
西
尿
彿
殿
沿
/
滿
릿
굿
西
調
便
輿
鹿
輿
調
彿
$
沿
殿
鹿
꼿
굿
꼿
릿
믿
使
尿
^
á
ň
ó
ž
ç
ü
í
é
ã
ä
ć
ă
ş
ö
Š
ě
ñ
퀀 퀀
å
ř
ý
è
ê
ō
ø
î
Č
č
Ș
ș
â
ë
É
Ö
ß
ę
Ł
ź
ą
ł
Α
û
ā
à
Ž
đ
Ä
š
×
Ü
Å
ì
ı
ţ
İ
О
с
т
р
о
в
Г
а
л
я
ė
ń
Á
ī
ğ
½
Ç
φ
ż
ô
Ó
λ
Δ
ò
ū
α
©
ï
ú
Ş
æ
²
õ
°
Í
ð
Î
ē
!
"
%
&
'
(
)
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
?
[
]
«
³
µ
º
»
A
Á
À
B
C
Ç
D
E
É
È
F
G
H
I
Í
Ï
J
K
L
M
N
O
Ó
Ò
P
Q
R
S
T
U
V
W
X
Y
Z
a
á
à
b
c
d
e
é
è
f
g
h
i
í
ï
j
k
l
m
n
o
ó
ò
p
q
r
s
t
u
ú
ü
v
w
x
y
z
ç
æ
Æ
ê
Ê
ë
Ë
ñ
Ñ
ô
Ô
œ
Œ
ù
Ù
...@@ -90,3 +90,15 @@ def check_and_read_gif(img_path): ...@@ -90,3 +90,15 @@ def check_and_read_gif(img_path):
return imgvalue, True return imgvalue, True
return None, False return None, False
def create_multi_devices_program(program, loss_var_name):
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = True
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1
compile_program = fluid.CompiledProgram(program).with_data_parallel(
loss_name=loss_var_name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
return compile_program
...@@ -3,4 +3,5 @@ imgaug ...@@ -3,4 +3,5 @@ imgaug
pyclipper pyclipper
lmdb lmdb
tqdm tqdm
numpy numpy
\ No newline at end of file opencv-python
...@@ -122,7 +122,9 @@ def eval_rec_run(exe, config, eval_info_dict, mode): ...@@ -122,7 +122,9 @@ def eval_rec_run(exe, config, eval_info_dict, mode):
def test_rec_benchmark(exe, config, eval_info_dict): def test_rec_benchmark(exe, config, eval_info_dict):
" Evaluate lmdb dataset " """
eval rec benchmark
"""
eval_data_list = ['IIIT5k_3000', 'SVT', 'IC03_860', 'IC03_867', \ eval_data_list = ['IIIT5k_3000', 'SVT', 'IC03_860', 'IC03_867', \
'IC13_857', 'IC13_1015', 'IC15_1811', 'IC15_2077', 'SVTP', 'CUTE80'] 'IC13_857', 'IC13_1015', 'IC15_1811', 'IC15_2077', 'SVTP', 'CUTE80']
eval_data_dir = config['TestReader']['lmdb_sets_dir'] eval_data_dir = config['TestReader']['lmdb_sets_dir']
......
...@@ -33,12 +33,13 @@ from paddle import fluid ...@@ -33,12 +33,13 @@ from paddle import fluid
class TextClassifier(object): class TextClassifier(object):
def __init__(self, args): def __init__(self, args):
self.predictor, self.input_tensor, self.output_tensors = \ if args.use_pdserving is False:
utility.create_predictor(args, mode="cls") self.predictor, self.input_tensor, self.output_tensors = \
utility.create_predictor(args, mode="cls")
self.use_zero_copy_run = args.use_zero_copy_run
self.cls_image_shape = [int(v) for v in args.cls_image_shape.split(",")] self.cls_image_shape = [int(v) for v in args.cls_image_shape.split(",")]
self.cls_batch_num = args.rec_batch_num self.cls_batch_num = args.rec_batch_num
self.label_list = args.label_list self.label_list = args.label_list
self.use_zero_copy_run = args.use_zero_copy_run
self.cls_thresh = args.cls_thresh self.cls_thresh = args.cls_thresh
def resize_norm_img(self, img): def resize_norm_img(self, img):
...@@ -103,7 +104,6 @@ class TextClassifier(object): ...@@ -103,7 +104,6 @@ class TextClassifier(object):
label_out = self.output_tensors[1].copy_to_cpu() label_out = self.output_tensors[1].copy_to_cpu()
if len(label_out.shape) != 1: if len(label_out.shape) != 1:
prob_out, label_out = label_out, prob_out prob_out, label_out = label_out, prob_out
elapse = time.time() - starttime elapse = time.time() - starttime
predict_time += elapse predict_time += elapse
for rno in range(len(label_out)): for rno in range(len(label_out)):
......
...@@ -42,7 +42,6 @@ class TextDetector(object): ...@@ -42,7 +42,6 @@ class TextDetector(object):
def __init__(self, args): def __init__(self, args):
max_side_len = args.det_max_side_len max_side_len = args.det_max_side_len
self.det_algorithm = args.det_algorithm self.det_algorithm = args.det_algorithm
self.use_zero_copy_run = args.use_zero_copy_run
preprocess_params = {'max_side_len': max_side_len} preprocess_params = {'max_side_len': max_side_len}
postprocess_params = {} postprocess_params = {}
if self.det_algorithm == "DB": if self.det_algorithm == "DB":
...@@ -75,9 +74,10 @@ class TextDetector(object): ...@@ -75,9 +74,10 @@ class TextDetector(object):
else: else:
logger.info("unknown det_algorithm:{}".format(self.det_algorithm)) logger.info("unknown det_algorithm:{}".format(self.det_algorithm))
sys.exit(0) sys.exit(0)
if args.use_pdserving is False:
self.predictor, self.input_tensor, self.output_tensors =\ self.use_zero_copy_run = args.use_zero_copy_run
utility.create_predictor(args, mode="det") self.predictor, self.input_tensor, self.output_tensors =\
utility.create_predictor(args, mode="det")
def order_points_clockwise(self, pts): def order_points_clockwise(self, pts):
""" """
...@@ -117,7 +117,7 @@ class TextDetector(object): ...@@ -117,7 +117,7 @@ class TextDetector(object):
box = self.clip_det_res(box, img_height, img_width) box = self.clip_det_res(box, img_height, img_width)
rect_width = int(np.linalg.norm(box[0] - box[1])) rect_width = int(np.linalg.norm(box[0] - box[1]))
rect_height = int(np.linalg.norm(box[0] - box[3])) rect_height = int(np.linalg.norm(box[0] - box[3]))
if rect_width <= 10 or rect_height <= 10: if rect_width <= 3 or rect_height <= 3:
continue continue
dt_boxes_new.append(box) dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new) dt_boxes = np.array(dt_boxes_new)
......
...@@ -34,14 +34,15 @@ from ppocr.utils.character import CharacterOps ...@@ -34,14 +34,15 @@ from ppocr.utils.character import CharacterOps
class TextRecognizer(object): class TextRecognizer(object):
def __init__(self, args): def __init__(self, args):
self.predictor, self.input_tensor, self.output_tensors =\ if args.use_pdserving is False:
utility.create_predictor(args, mode="rec") self.predictor, self.input_tensor, self.output_tensors =\
utility.create_predictor(args, mode="rec")
self.use_zero_copy_run = args.use_zero_copy_run
self.rec_image_shape = [int(v) for v in args.rec_image_shape.split(",")] self.rec_image_shape = [int(v) for v in args.rec_image_shape.split(",")]
self.character_type = args.rec_char_type self.character_type = args.rec_char_type
self.rec_batch_num = args.rec_batch_num self.rec_batch_num = args.rec_batch_num
self.rec_algorithm = args.rec_algorithm self.rec_algorithm = args.rec_algorithm
self.text_len = args.max_text_length self.text_len = args.max_text_length
self.use_zero_copy_run = args.use_zero_copy_run
char_ops_params = { char_ops_params = {
"character_type": args.rec_char_type, "character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path, "character_dict_path": args.rec_char_dict_path,
...@@ -62,8 +63,9 @@ class TextRecognizer(object): ...@@ -62,8 +63,9 @@ class TextRecognizer(object):
def resize_norm_img(self, img, max_wh_ratio): def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape imgC, imgH, imgW = self.rec_image_shape
assert imgC == img.shape[2] assert imgC == img.shape[2]
wh_ratio = max(max_wh_ratio, imgW * 1.0 / imgH)
if self.character_type == "ch": if self.character_type == "ch":
imgW = int((32 * max_wh_ratio)) imgW = int((32 * wh_ratio))
h, w = img.shape[:2] h, w = img.shape[:2]
ratio = w / float(h) ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW: if math.ceil(imgH * ratio) > imgW:
...@@ -320,7 +322,7 @@ def main(args): ...@@ -320,7 +322,7 @@ def main(args):
print(e) print(e)
logger.info( logger.info(
"ERROR!!!! \n" "ERROR!!!! \n"
"Please read the FAQhttps://github.com/PaddlePaddle/PaddleOCR#faq \n" "Please read the FAQ: https://github.com/PaddlePaddle/PaddleOCR#faq \n"
"If your model has tps module: " "If your model has tps module: "
"TPS does not support variable shape.\n" "TPS does not support variable shape.\n"
"Please set --rec_image_shape='3,32,100' and --rec_char_type='en' ") "Please set --rec_image_shape='3,32,100' and --rec_char_type='en' ")
......
...@@ -160,8 +160,13 @@ def main(args): ...@@ -160,8 +160,13 @@ def main(args):
txts = [rec_res[i][0] for i in range(len(rec_res))] txts = [rec_res[i][0] for i in range(len(rec_res))]
scores = [rec_res[i][1] for i in range(len(rec_res))] scores = [rec_res[i][1] for i in range(len(rec_res))]
draw_img = draw_ocr( draw_img = draw_ocr_box_txt(
image, boxes, txts, scores, drop_score=drop_score, font_path=font_path) image,
boxes,
txts,
scores,
drop_score=drop_score,
font_path=font_path)
draw_img_save = "./inference_results/" draw_img_save = "./inference_results/"
if not os.path.exists(draw_img_save): if not os.path.exists(draw_img_save):
os.makedirs(draw_img_save) os.makedirs(draw_img_save)
......
...@@ -72,9 +72,7 @@ def parse_args(): ...@@ -72,9 +72,7 @@ def parse_args():
default="./ppocr/utils/ppocr_keys_v1.txt") default="./ppocr/utils/ppocr_keys_v1.txt")
parser.add_argument("--use_space_char", type=str2bool, default=True) parser.add_argument("--use_space_char", type=str2bool, default=True)
parser.add_argument( parser.add_argument(
"--vis_font_path", "--vis_font_path", type=str, default="./doc/simfang.ttf")
type=str,
default="./doc/simfang.ttf")
# params for text classifier # params for text classifier
parser.add_argument("--use_angle_cls", type=str2bool, default=False) parser.add_argument("--use_angle_cls", type=str2bool, default=False)
...@@ -86,6 +84,9 @@ def parse_args(): ...@@ -86,6 +84,9 @@ def parse_args():
parser.add_argument("--enable_mkldnn", type=str2bool, default=False) parser.add_argument("--enable_mkldnn", type=str2bool, default=False)
parser.add_argument("--use_zero_copy_run", type=str2bool, default=False) parser.add_argument("--use_zero_copy_run", type=str2bool, default=False)
parser.add_argument("--use_pdserving", type=str2bool, default=False)
return parser.parse_args() return parser.parse_args()
...@@ -203,7 +204,12 @@ def draw_ocr(image, ...@@ -203,7 +204,12 @@ def draw_ocr(image,
return image return image
def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"): def draw_ocr_box_txt(image,
boxes,
txts,
scores=None,
drop_score=0.5,
font_path="./doc/simfang.ttf"):
h, w = image.height, image.width h, w = image.height, image.width
img_left = image.copy() img_left = image.copy()
img_right = Image.new('RGB', (w, h), (255, 255, 255)) img_right = Image.new('RGB', (w, h), (255, 255, 255))
...@@ -213,7 +219,9 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"): ...@@ -213,7 +219,9 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"):
random.seed(0) random.seed(0)
draw_left = ImageDraw.Draw(img_left) draw_left = ImageDraw.Draw(img_left)
draw_right = ImageDraw.Draw(img_right) draw_right = ImageDraw.Draw(img_right)
for (box, txt) in zip(boxes, txts): for idx, (box, txt) in enumerate(zip(boxes, txts)):
if scores is not None and scores[idx] < drop_score:
continue
color = (random.randint(0, 255), random.randint(0, 255), color = (random.randint(0, 255), random.randint(0, 255),
random.randint(0, 255)) random.randint(0, 255))
draw_left.polygon(box, fill=color) draw_left.polygon(box, fill=color)
...@@ -229,8 +237,7 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"): ...@@ -229,8 +237,7 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"):
1])**2) 1])**2)
if box_height > 2 * box_width: if box_height > 2 * box_width:
font_size = max(int(box_width * 0.9), 10) font_size = max(int(box_width * 0.9), 10)
font = ImageFont.truetype( font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
font_path, font_size, encoding="utf-8")
cur_y = box[0][1] cur_y = box[0][1]
for c in txt: for c in txt:
char_size = font.getsize(c) char_size = font.getsize(c)
...@@ -239,8 +246,7 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"): ...@@ -239,8 +246,7 @@ def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"):
cur_y += char_size[1] cur_y += char_size[1]
else: else:
font_size = max(int(box_height * 0.8), 10) font_size = max(int(box_height * 0.8), 10)
font = ImageFont.truetype( font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
font_path, font_size, encoding="utf-8")
draw_right.text( draw_right.text(
[box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
img_left = Image.blend(image, img_left, 0.5) img_left = Image.blend(image, img_left, 0.5)
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
from paddle_serving_client.io import inference_model_to_serving
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument("--model_dir", type=str)
parser.add_argument("--server_dir", type=str, default="serving_server_dir")
parser.add_argument("--client_dir", type=str, default="serving_client_dir")
return parser.parse_args()
args = parse_args()
inference_model_dir = args.model_dir
serving_client_dir = args.server_dir
serving_server_dir = args.client_dir
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
...@@ -90,13 +90,13 @@ def load_config(file_path): ...@@ -90,13 +90,13 @@ def load_config(file_path):
merge_config(default_config) merge_config(default_config)
_, ext = os.path.splitext(file_path) _, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for now" assert ext in ['.yml', '.yaml'], "only support yaml files for now"
merge_config(yaml.load(open(file_path), Loader=yaml.Loader)) merge_config(yaml.load(open(file_path, 'rb'), Loader=yaml.Loader))
assert "reader_yml" in global_config['Global'],\ assert "reader_yml" in global_config['Global'],\
"absence reader_yml in global" "absence reader_yml in global"
reader_file_path = global_config['Global']['reader_yml'] reader_file_path = global_config['Global']['reader_yml']
_, ext = os.path.splitext(reader_file_path) _, ext = os.path.splitext(reader_file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for reader" assert ext in ['.yml', '.yaml'], "only support yaml files for reader"
merge_config(yaml.load(open(reader_file_path), Loader=yaml.Loader)) merge_config(yaml.load(open(reader_file_path, 'rb'), Loader=yaml.Loader))
return global_config return global_config
...@@ -150,19 +150,20 @@ def check_gpu(use_gpu): ...@@ -150,19 +150,20 @@ def check_gpu(use_gpu):
def build(config, main_prog, startup_prog, mode): def build(config, main_prog, startup_prog, mode):
""" """
Build a program using a model and an optimizer Build a program using a model and an optimizer
1. create feeds 1. create a dataloader
2. create a dataloader 2. create a model
3. create a model 3. create fetches
4. create fetchs 4. create an optimizer
5. create an optimizer
Args: Args:
config(dict): config config(dict): config
main_prog(): main program main_prog(): main program
startup_prog(): startup program startup_prog(): startup program
is_train(bool): train or valid mode(str): train or valid
Returns: Returns:
dataloader(): a bridge between the model and the data dataloader(): a bridge between the model and the data
fetchs(dict): dict of model outputs(included loss and measures) fetch_name_list(dict): dict of model outputs(included loss and measures)
fetch_varname_list(list): list of outputs' varname
opt_loss_name(str): name of loss
""" """
with fluid.program_guard(main_prog, startup_prog): with fluid.program_guard(main_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
...@@ -207,8 +208,8 @@ def build_export(config, main_prog, startup_prog): ...@@ -207,8 +208,8 @@ def build_export(config, main_prog, startup_prog):
Build input and output for exporting a checkpoints model to an inference model Build input and output for exporting a checkpoints model to an inference model
Args: Args:
config(dict): config config(dict): config
main_prog(): main program main_prog: main program
startup_prog(): startup program startup_prog: startup program
Returns: Returns:
feeded_var_names(list[str]): var names of input for exported inference model feeded_var_names(list[str]): var names of input for exported inference model
target_vars(list[Variable]): output vars for exported inference model target_vars(list[Variable]): output vars for exported inference model
...@@ -241,9 +242,11 @@ def create_multi_devices_program(program, loss_var_name, for_quant=False): ...@@ -241,9 +242,11 @@ def create_multi_devices_program(program, loss_var_name, for_quant=False):
build_strategy.enable_inplace = True build_strategy.enable_inplace = True
if for_quant: if for_quant:
build_strategy.fuse_all_reduce_ops = False build_strategy.fuse_all_reduce_ops = False
else:
program = fluid.CompiledProgram(program)
exec_strategy = fluid.ExecutionStrategy() exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1 exec_strategy.num_iteration_per_drop_scope = 1
compile_program = fluid.CompiledProgram(program).with_data_parallel( compile_program = program.with_data_parallel(
loss_name=loss_var_name, loss_name=loss_var_name,
build_strategy=build_strategy, build_strategy=build_strategy,
exec_strategy=exec_strategy) exec_strategy=exec_strategy)
...@@ -254,10 +257,15 @@ def train_eval_det_run(config, ...@@ -254,10 +257,15 @@ def train_eval_det_run(config,
exe, exe,
train_info_dict, train_info_dict,
eval_info_dict, eval_info_dict,
is_pruning=False): is_slim=None):
''' """
main program of evaluation for detection Feed data to the model and fetch the measures and loss for detection
''' Args:
config: config
exe:
train_info_dict: information dict for training
eval_info_dict: information dict for evaluation
"""
train_batch_id = 0 train_batch_id = 0
log_smooth_window = config['Global']['log_smooth_window'] log_smooth_window = config['Global']['log_smooth_window']
epoch_num = config['Global']['epoch_num'] epoch_num = config['Global']['epoch_num']
...@@ -313,14 +321,21 @@ def train_eval_det_run(config, ...@@ -313,14 +321,21 @@ def train_eval_det_run(config,
best_batch_id = train_batch_id best_batch_id = train_batch_id
best_epoch = epoch best_epoch = epoch
save_path = save_model_dir + "/best_accuracy" save_path = save_model_dir + "/best_accuracy"
if is_pruning: if is_slim is None:
import paddleslim as slim
slim.prune.save_model(
exe, train_info_dict['train_program'],
save_path)
else:
save_model(train_info_dict['train_program'], save_model(train_info_dict['train_program'],
save_path) save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(
exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
strs = 'Test iter: {}, metrics:{}, best_hmean:{:.6f}, best_epoch:{}, best_batch_id:{}'.format( strs = 'Test iter: {}, metrics:{}, best_hmean:{:.6f}, best_epoch:{}, best_batch_id:{}'.format(
train_batch_id, metrics, best_eval_hmean, best_epoch, train_batch_id, metrics, best_eval_hmean, best_epoch,
best_batch_id) best_batch_id)
...@@ -331,27 +346,50 @@ def train_eval_det_run(config, ...@@ -331,27 +346,50 @@ def train_eval_det_run(config,
train_loader.reset() train_loader.reset()
if epoch == 0 and save_epoch_step == 1: if epoch == 0 and save_epoch_step == 1:
save_path = save_model_dir + "/iter_epoch_0" save_path = save_model_dir + "/iter_epoch_0"
if is_pruning: if is_slim is None:
import paddleslim as slim
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
else:
save_model(train_info_dict['train_program'], save_path) save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
if epoch > 0 and epoch % save_epoch_step == 0: if epoch > 0 and epoch % save_epoch_step == 0:
save_path = save_model_dir + "/iter_epoch_%d" % (epoch) save_path = save_model_dir + "/iter_epoch_%d" % (epoch)
if is_pruning: if is_slim is None:
import paddleslim as slim
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
else:
save_model(train_info_dict['train_program'], save_path) save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
return return
def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): def train_eval_rec_run(config,
''' exe,
main program of evaluation for recognition train_info_dict,
''' eval_info_dict,
is_slim=None):
"""
Feed data to the model and fetch the measures and loss for recognition
Args:
config: config
exe:
train_info_dict: information dict for training
eval_info_dict: information dict for evaluation
"""
train_batch_id = 0 train_batch_id = 0
log_smooth_window = config['Global']['log_smooth_window'] log_smooth_window = config['Global']['log_smooth_window']
epoch_num = config['Global']['epoch_num'] epoch_num = config['Global']['epoch_num']
...@@ -428,7 +466,21 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): ...@@ -428,7 +466,21 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
best_batch_id = train_batch_id best_batch_id = train_batch_id
best_epoch = epoch best_epoch = epoch
save_path = save_model_dir + "/best_accuracy" save_path = save_model_dir + "/best_accuracy"
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'],
save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(
exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
strs = 'Test iter: {}, acc:{:.6f}, best_acc:{:.6f}, best_epoch:{}, best_batch_id:{}, eval_sample_num:{}'.format( strs = 'Test iter: {}, acc:{:.6f}, best_acc:{:.6f}, best_epoch:{}, best_batch_id:{}, eval_sample_num:{}'.format(
train_batch_id, eval_acc, best_eval_acc, best_epoch, train_batch_id, eval_acc, best_eval_acc, best_epoch,
best_batch_id, eval_sample_num) best_batch_id, eval_sample_num)
...@@ -439,14 +491,42 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): ...@@ -439,14 +491,42 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
train_loader.reset() train_loader.reset()
if epoch == 0 and save_epoch_step == 1: if epoch == 0 and save_epoch_step == 1:
save_path = save_model_dir + "/iter_epoch_0" save_path = save_model_dir + "/iter_epoch_0"
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
if epoch > 0 and epoch % save_epoch_step == 0: if epoch > 0 and epoch % save_epoch_step == 0:
save_path = save_model_dir + "/iter_epoch_%d" % (epoch) save_path = save_model_dir + "/iter_epoch_%d" % (epoch)
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
return return
def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict): def train_eval_cls_run(config,
exe,
train_info_dict,
eval_info_dict,
is_slim=None):
train_batch_id = 0 train_batch_id = 0
log_smooth_window = config['Global']['log_smooth_window'] log_smooth_window = config['Global']['log_smooth_window']
epoch_num = config['Global']['epoch_num'] epoch_num = config['Global']['epoch_num']
...@@ -509,7 +589,21 @@ def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict): ...@@ -509,7 +589,21 @@ def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict):
best_batch_id = train_batch_id best_batch_id = train_batch_id
best_epoch = epoch best_epoch = epoch
save_path = save_model_dir + "/best_accuracy" save_path = save_model_dir + "/best_accuracy"
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'],
save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(
exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
strs = 'Test iter: {}, acc:{:.6f}, best_acc:{:.6f}, best_epoch:{}, best_batch_id:{}, eval_sample_num:{}'.format( strs = 'Test iter: {}, acc:{:.6f}, best_acc:{:.6f}, best_epoch:{}, best_batch_id:{}, eval_sample_num:{}'.format(
train_batch_id, eval_acc, best_eval_acc, best_epoch, train_batch_id, eval_acc, best_eval_acc, best_epoch,
best_batch_id, eval_sample_num) best_batch_id, eval_sample_num)
...@@ -520,10 +614,34 @@ def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict): ...@@ -520,10 +614,34 @@ def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict):
train_loader.reset() train_loader.reset()
if epoch == 0 and save_epoch_step == 1: if epoch == 0 and save_epoch_step == 1:
save_path = save_model_dir + "/iter_epoch_0" save_path = save_model_dir + "/iter_epoch_0"
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
if epoch > 0 and epoch % save_epoch_step == 0: if epoch > 0 and epoch % save_epoch_step == 0:
save_path = save_model_dir + "/iter_epoch_%d" % (epoch) save_path = save_model_dir + "/iter_epoch_%d" % (epoch)
save_model(train_info_dict['train_program'], save_path) if is_slim is None:
save_model(train_info_dict['train_program'], save_path)
else:
import paddleslim as slim
if is_slim == "prune":
slim.prune.save_model(exe, train_info_dict['train_program'],
save_path)
elif is_slim == "quant":
save_model(eval_info_dict['program'], save_path)
else:
raise ValueError(
"Only quant and prune are supported currently. But received {}".
format(is_slim))
return return
......
...@@ -63,8 +63,7 @@ def draw_server_result(image_file, res): ...@@ -63,8 +63,7 @@ def draw_server_result(image_file, res):
scores.append(res[dno]['confidence']) scores.append(res[dno]['confidence'])
boxes = np.array(boxes) boxes = np.array(boxes)
scores = np.array(scores) scores = np.array(scores)
draw_img = draw_ocr( draw_img = draw_ocr(image, boxes, texts, scores, drop_score=0.5)
image, boxes, texts, scores, draw_txt=True, drop_score=0.5)
return draw_img return draw_img
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
#limitations under the License. #limitations under the License.
import os import os
import argparse import argparse
import json
def gen_rec_label(input_path, out_label): def gen_rec_label(input_path, out_label):
...@@ -32,15 +33,19 @@ def gen_det_label(root_path, input_dir, out_label): ...@@ -32,15 +33,19 @@ def gen_det_label(root_path, input_dir, out_label):
label = [] label = []
with open(os.path.join(input_dir, label_file), 'r') as f: with open(os.path.join(input_dir, label_file), 'r') as f:
for line in f.readlines(): for line in f.readlines():
tmp = line.strip("\n\r").replace("\xef\xbb\xbf", "").split(',') tmp = line.strip("\n\r").replace("\xef\xbb\xbf",
points = tmp[:-2] "").split(',')
points = tmp[:8]
s = [] s = []
for i in range(0, len(points), 2): for i in range(0, len(points), 2):
b = points[i:i + 2] b = points[i:i + 2]
b = [int(t) for t in b]
s.append(b) s.append(b)
result = {"transcription": tmp[-1], "points": s} result = {"transcription": tmp[8], "points": s}
label.append(result) label.append(result)
out_file.write(img_path + '\t' + str(label) + '\n')
out_file.write(img_path + '\t' + json.dumps(
label, ensure_ascii=False) + '\n')
if __name__ == "__main__": if __name__ == "__main__":
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册