diff --git a/.gitignore b/.gitignore index 1a2dd675e961f1804fa58e2e2e49118536b84ce9..9eecb4f1056fc040d4c9579d593bee2cc4013837 100644 --- a/.gitignore +++ b/.gitignore @@ -21,3 +21,7 @@ output/ *.log .clang-format .clang_format.hook + +build/ +dist/ +paddleocr.egg-info/ \ No newline at end of file diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..388882df0c3701780dd6371bc91887356a7bca40 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,8 @@ +include LICENSE.txt +include README.md + +recursive-include ppocr/utils *.txt utility.py character.py check.py +recursive-include ppocr/data/det *.py +recursive-include ppocr/postprocess *.py +recursive-include ppocr/postprocess/lanms *.* +recursive-include tools/infer *.py diff --git a/README.md b/README.md index b49d18eecc5abf56d5251a6d1aa5baa5be1b1898..867e3cbcbffe1a7981409d26ef58793acea28522 100644 --- a/README.md +++ b/README.md @@ -1,209 +1,139 @@ -[English](README_en.md) | 简体中文 - -## 简介 -PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 - -**近期更新** -- 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统 -- 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark -- 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具 -- 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档 -- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./doc/doc_ch/config.md) -- [more](./doc/doc_ch/update.md) - -## 特性 -- 超轻量级中文OCR模型,总模型仅8.6M - - 单模型支持中英文数字组合识别、竖排文本识别、长文本识别 - - 检测模型DB(4.1M)+识别模型CRNN(4.5M) -- 实用通用中文OCR模型 -- 多种预测推理部署方案,包括服务部署和端侧部署 -- 多种文本检测训练算法,EAST、DB -- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE -- 可运行于Linux、Windows、MacOS等多种系统 - -## 快速体验 +English | [简体中文](README_ch.md) + +## Introduction +PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice. + +**Recent updates** +- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941 +- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list) +- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list) +- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](./doc/doc_en/whl_en.md) +- 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519) +- [more](./doc/doc_en/update_en.md) + +## Features +- PPOCR series of high-quality pre-trained models, comparable to commercial effects + - Ultra lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M + - General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M + - Ultra lightweight compression ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M +- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition +- Support multi-language recognition: Korean, Japanese, German, French +- Support user-defined training, provides rich predictive inference deployment solutions +- Support PIP installation, easy to use +- Support Linux, Windows, MacOS and other systems + +## Visualization
- + +
-上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 +The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md). -- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr -- 移动端DEMO体验(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统):[安装包二维码获取地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) +## Quick Experience - Android手机也可以扫描下面二维码安装体验。 +You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) -
- -
+Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) -- [**中文OCR模型快速使用**](./doc/doc_ch/quickstart.md) - - -## 中文OCR模型列表 - -|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| -|-|-|-|-|-| -|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) - -## 文档教程 -- [快速安装](./doc/doc_ch/installation.md) -- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) -- 算法介绍 - - [文本检测](#文本检测算法) - - [文本识别](#文本识别算法) - - [端到端OCR](#端到端OCR算法) -- 模型训练/评估 - - [文本检测](./doc/doc_ch/detection.md) - - [文本识别](./doc/doc_ch/recognition.md) - - [yml参数配置文件介绍](./doc/doc_ch/config.md) - - 中文OCR训练预测技巧 -- 预测部署 - - [基于Python预测引擎推理](./doc/doc_ch/inference.md) - - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - - [服务化部署](./doc/doc_ch/serving.md) - - [端侧部署](./deploy/lite/readme.md) - - 模型量化压缩 - - [Benchmark](./doc/doc_ch/benchmark.md) -- 数据集 - - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) - - 垂类多语言OCR数据集 - - [常用数据标注工具](./doc/doc_ch/data_annotation.md) - - [常用数据合成工具](./doc/doc_ch/data_synthesis.md) -- [FAQ](#FAQ) -- 效果展示 - - [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示) - - [通用中文OCR效果展示](#通用中文OCR效果展示) - - [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示) -- [技术交流群](#欢迎加入PaddleOCR技术交流群) -- [参考文献](./doc/doc_ch/reference.md) -- [许可证书](#许可证书) -- [贡献代码](#贡献代码) - - -## 算法介绍 - -### 1.文本检测算法 - -PaddleOCR开源的文本检测算法列表: -- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) -- [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(百度自研, comming soon) - -在ICDAR2015文本检测公开数据集上,算法效果如下: - -|模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| - -使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下: -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| -|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| - -* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 - -PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。 - - -### 2.文本识别算法 - -PaddleOCR开源的文本识别算法列表: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) -- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) -- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(百度自研, comming soon) - -参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: - -|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| -|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| -|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| - -使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下: - -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| - -PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。 - - -### 3.端到端OCR算法 -- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(百度自研, comming soon) - -## 效果展示 - - -### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md) + Also, you can scan the QR code below to install the App (**Android support only**)
- +
- -### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md) +- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md) + + + +## PP-OCR 1.1 series model list(Update on Sep 17) + +| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | +| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | +| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | +| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | + +For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md) + + +## Tutorials +- [Installation](./doc/doc_en/installation_en.md) +- [Quick Start](./doc/doc_en/quickstart_en.md) +- [Code Structure](./doc/doc_en/tree_en.md) +- Algorithm introduction + - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md) + - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md) + - [PP-OCR Pipline](#PP-OCR-Pipline) +- Model training/evaluation + - [Text Detection](./doc/doc_en/detection_en.md) + - [Text Recognition](./doc/doc_en/recognition_en.md) + - [Direction Classification](./doc/doc_en/angle_class_en.md) + - [Yml Configuration](./doc/doc_en/config_en.md) +- Inference and Deployment + - [Quick inference based on pip](./doc/doc_en/whl_en.md) + - [Python Inference](./doc/doc_en/inference_en.md) + - [C++ Inference](./deploy/cpp_infer/readme_en.md) + - [Serving](./deploy/hubserving/readme_en.md) + - [Mobile](./deploy/lite/readme_en.md) + - [Model Quantization](./deploy/slim/quantization/README_en.md) + - [Model Compression](./deploy/slim/prune/README_en.md) + - [Benchmark](./doc/doc_en/benchmark_en.md) +- Datasets + - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md) + - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md) + - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md) + - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md) + - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md) +- [Visualization](#Visualization) +- [FAQ](./doc/doc_en/FAQ_en.md) +- [Community](#Community) +- [References](./doc/doc_en/reference_en.md) +- [License](#LICENSE) +- [Contribution](#CONTRIBUTION) + + + +## PP-OCR Pipline
- +
- -### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md) +PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). + +## Visualization [more](./doc/doc_en/visualization_en.md)
- + + + + + +
- -## FAQ -1. **转换attention识别模型时报错:KeyError: 'predict'** -问题已解,请更新到最新代码。 - -2. **关于推理速度** -图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num,默认值为30,可以改为10或其他数值。 - -3. **服务部署与移动端部署** -预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案,欢迎持续关注。 - -4. **自研算法发布时间** -自研算法SAST、SRN、End2End-PSL都将在7-8月陆续发布,敬请期待。 - -[more](./doc/doc_ch/FAQ.md) - - -## 欢迎加入PaddleOCR技术交流群 -请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍 + +## Community +Scan the QR code below with your Wechat and completing the questionnaire, you can access to offical technical exchange group.
- +
- -## 许可证书 -本项目的发布受Apache 2.0 license许可认证。 - - -## 贡献代码 -我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。 - -- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 贡献了英文文档。 -- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题 -- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码 -- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集 + +## License +This project is released under Apache 2.0 license + + +## Contribution +We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. + +- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation. +- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualize function、add .gitgnore and discard set PYTHONPATH manually. +- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure. +- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets. +- Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively. +- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style. +- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services. diff --git a/README_ch.md b/README_ch.md new file mode 100644 index 0000000000000000000000000000000000000000..58b1acb982d2b880a949a4592026f6923a6bd343 --- /dev/null +++ b/README_ch.md @@ -0,0 +1,141 @@ +[English](README.md) | 简体中文 + +## 简介 +PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 + +**近期更新** +- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941 +- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载) +- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) +- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) +- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md) +- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) +- [More](./doc/doc_ch/update.md) + + +## 特性 + +- PPOCR系列高质量预训练模型,准确的识别效果 + - 超轻量ppocr_mobile移动端系列:检测(2.6M)+方向分类器(0.9M)+ 识别(4.6M)= 8.1M + - 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M + - 超轻量压缩ppocr_mobile_slim系列:检测(1.4M)+方向分类器(0.5M)+ 识别(1.6M)= 3.5M +- 支持中英文数字组合识别、竖排文本识别、长文本识别 +- 支持多语言识别:韩语、日语、德语、法语 +- 支持用户自定义训练,提供丰富的预测推理部署方案 +- 支持PIP快速安装使用 +- 可运行于Linux、Windows、MacOS等多种系统 + +## 效果展示 + +
+ + +
+ +上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 + +## 快速体验 +- PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr + +- 移动端:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统),Android手机也可以直接扫描下面二维码安装体验。 + + +
+ +
+ +- 代码体验:从[快速安装](./doc/doc_ch/installation.md) 开始 + + +## PP-OCR 1.1系列模型列表(9月17日更新) + +| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | +| ------------ | --------------- | ----------------|---- | ---------- | -------- | +| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | +| 中英文超轻量压缩OCR模型(3.5M) | ch_ppocr_mobile_slim_v1.1_xx| 移动端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb)| [推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)| + +更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](./doc/doc_ch/models_list.md) + +## 文档教程 +- [快速安装](./doc/doc_ch/installation.md) +- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) +- [代码组织结构](./doc/doc_ch/tree.md) +- 算法介绍 + - [文本检测](./doc/doc_ch/algorithm_overview.md) + - [文本识别](./doc/doc_ch/algorithm_overview.md) + - [PP-OCR Pipline](#PP-OCR) +- 模型训练/评估 + - [文本检测](./doc/doc_ch/detection.md) + - [文本识别](./doc/doc_ch/recognition.md) + - [方向分类器](./doc/doc_ch/angle_class.md) + - [yml参数配置文件介绍](./doc/doc_ch/config.md) +- 预测部署 + - [基于pip安装whl包快速推理](./doc/doc_ch/whl.md) + - [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md) + - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) + - [服务化部署](./deploy/hubserving/readme.md) + - [端侧部署](./deploy/lite/readme.md) + - [模型量化](./deploy/slim/quantization/README.md) + - [模型裁剪](./deploy/slim/prune/README.md) + - [Benchmark](./doc/doc_ch/benchmark.md) +- 数据集 + - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) + - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) + - [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md) + - [常用数据标注工具](./doc/doc_ch/data_annotation.md) + - [常用数据合成工具](./doc/doc_ch/data_synthesis.md) +- [效果展示](#效果展示) +- FAQ + - [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md) + - [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md) + - [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md) +- [技术交流群](#欢迎加入PaddleOCR技术交流群) +- [参考文献](./doc/doc_ch/reference.md) +- [许可证书](#许可证书) +- [贡献代码](#贡献代码) + + +## PP-OCR Pipline +
+ +
+ +PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身,最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941 。 + + +## 效果展示 [more](./doc/doc_ch/visualization.md) + +
+ + + + + + +
+ + + +## 欢迎加入PaddleOCR技术交流群 +请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍 + +
+ +
+ + +## 许可证书 +本项目的发布受Apache 2.0 license许可认证。 + + +## 贡献代码 +我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。 + +- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 和 [Karl Horky](https://github.com/karlhorky) 贡献修改英文文档 +- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题 +- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码 +- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集 +- 非常感谢 [authorfu](https://github.com/authorfu) 贡献Android和[xiadeye](https://github.com/xiadeye) 贡献IOS的demo代码 +- 非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。 +- 非常感谢 [tangmq](https://gitee.com/tangmq) 给PaddleOCR增加Docker化部署服务,支持快速发布可调用的Restful API服务。 diff --git a/README_en.md b/README_en.md deleted file mode 100644 index 38bda392072087f08ecca69bb2d16493e9bd2ffd..0000000000000000000000000000000000000000 --- a/README_en.md +++ /dev/null @@ -1,302 +0,0 @@ -English | [简体中文](README.md) - -## INTRODUCTION -PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice. - -**Recent updates**、 -- 2020.7.9 Add recognition model to support space, [recognition result](#space Chinese OCR results). For more information: [Recognition](./doc/doc_ch/recognition.md) and [quickstart](./doc/doc_ch/quickstart.md) -- 2020.7.9 Add data auguments and learning rate decay strategies,please read [config](./doc/doc_en/config_en.md) -- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating -- 2020.6.5 Support exporting `attention` model to `inference_model` -- 2020.6.5 Support separate prediction and recognition, output result score -- [more](./doc/doc_en/update_en.md) - -## FEATURES -- Lightweight Chinese OCR model, total model size is only 8.6M - - Single model supports Chinese and English numbers combination recognition, vertical text recognition, long text recognition - - Detection model DB (4.1M) + recognition model CRNN (4.5M) -- Various text detection algorithms: EAST, DB -- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE - - -### Supported Chinese models list: - -|Model Name|Description |Detection Model link|Recognition Model link| Support for space Recognition Model link| -|-|-|-|-|-| -|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) - - -For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/ocr - -**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:** - -## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE** - -![](doc/imgs_results/11.jpg) - -The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) , [General Chinese OCR results](#General-Chinese-OCR-results) and [Support for space Recognition Model](#Space-Chinese-OCR-results). - -#### 1. ENVIRONMENT CONFIGURATION - -Please see [Quick installation](./doc/doc_en/installation_en.md) - -#### 2. DOWNLOAD INFERENCE MODELS - -#### (1) Download lightweight Chinese OCR models -*If wget is not installed in the windows system, you can copy the link to the browser to download the model. After model downloaded, unzip it and place it in the corresponding directory* - -Copy the detection and recognition 'inference model' address in [Chinese model List](#Supported-Chinese-model-list), download and unpack: - -``` -mkdir inference && cd inference -# Download the detection part of the Chinese OCR and decompress it -wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} -# Download the recognition part of the Chinese OCR and decompress it -wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} -cd .. -``` - -Take lightweight Chinese OCR model as an example: - -``` -mkdir inference && cd inference -# Download the detection part of the lightweight Chinese OCR and decompress it -wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar -# Download the recognition part of the lightweight Chinese OCR and decompress it -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar -# Download the space-recognized part of the lightweight Chinese OCR and decompress it -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar && tar xf ch_rec_mv3_crnn_enhance_infer.tar - -cd .. -``` - -After the decompression is completed, the file structure should be as follows: - -``` -|-inference - |-ch_rec_mv3_crnn - |- model - |- params - |-ch_det_mv3_db - |- model - |- params - ... -``` - -#### 3. SINGLE IMAGE AND BATCH PREDICTION - -The following code implements text detection and recognition inference tandemly. When performing prediction, you need to specify the path of a single image or image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visual prediction results are saved to the `./inference_results` folder by default. - -```bash - -# Prediction on a single image by specifying image path to image_dir -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" - -# Prediction on a batch of images by specifying image folder path to image_dir -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" - -# If you want to use CPU for prediction, you need to set the use_gpu parameter to False -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False -``` - -To run inference of the Generic Chinese OCR model, follow these steps above to download the corresponding models and update the relevant parameters. Examples are as follows: -``` -# Prediction on a single image by specifying image path to image_dir -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" -``` - -To run inference of the space-Generic Chinese OCR model, follow these steps above to download the corresponding models and update the relevant parameters. Examples are as follows: - -``` -# Prediction on a single image by specifying image path to image_dir -python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" -``` - -For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md) - -## DOCUMENTATION -- [Quick installation](./doc/doc_en/installation_en.md) -- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) -- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) -- [Inference](./doc/doc_en/inference_en.md) -- [Introduction of yml file](./doc/doc_en/config_en.md) -- [Dataset](./doc/doc_en/datasets_en.md) -- [FAQ]((#FAQ) - -## TEXT DETECTION ALGORITHM - -PaddleOCR open source text detection algorithms list: -- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) -- [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, comming soon) - -On the ICDAR2015 dataset, the text detection result is as follows: - -|Model|Backbone|precision|recall|Hmean|Download link| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| - -For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 3w training data,the related configuration and pre-trained models for Chinese detection task are as follows: -|Model|Backbone|Configuration file|Pre-trained model| -|-|-|-|-| -|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| -|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| - -* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result. - -For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) - -## TEXT RECOGNITION ALGORITHM - -PaddleOCR open-source text recognition algorithms list: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) -- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) -- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research, comming soon) - -Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: - -|Model|Backbone|Avg Accuracy|Module combination|Download link| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| -|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| -|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| - -We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w traning data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the Chinese model. The related configuration and pre-trained models are as follows: -|Model|Backbone|Configuration file|Pre-trained model| -|-|-|-|-| -|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| -|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| - -Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) - -## END-TO-END OCR ALGORITHM -- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon) - - -## LIGHTWEIGHT CHINESE OCR RESULTS -![](doc/imgs_results/1.jpg) -![](doc/imgs_results/7.jpg) -![](doc/imgs_results/12.jpg) -![](doc/imgs_results/4.jpg) -![](doc/imgs_results/6.jpg) -![](doc/imgs_results/9.jpg) -![](doc/imgs_results/16.png) -![](doc/imgs_results/22.jpg) - - -## General Chinese OCR results -![](doc/imgs_results/chinese_db_crnn_server/11.jpg) -![](doc/imgs_results/chinese_db_crnn_server/2.jpg) -![](doc/imgs_results/chinese_db_crnn_server/8.jpg) - - - -## space Chinese OCR results - -### LIGHTWEIGHT CHINESE OCR RESULTS - -![](doc/imgs_results/img_11.jpg) - -### General Chinese OCR results -![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg) - - -## FAQ -1. Error when using attention-based recognition model: KeyError: 'predict' - - The inference of recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as the one based on CTC loss. - -2. About inference speed - - When there are a lot of texts in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or other values. - -3. Service deployment and mobile deployment - - It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates. - -4. Release time of self-developed algorithm - - Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient. - -[more](./doc/doc_en/FAQ_en.md) - -## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP -WeChat: paddlehelp, note OCR, our assistant will get you into the group~ - - - -## REFERENCES -``` -1. EAST: -@inproceedings{zhou2017east, - title={EAST: an efficient and accurate scene text detector}, - author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun}, - booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, - pages={5551--5560}, - year={2017} -} - -2. DB: -@article{liao2019real, - title={Real-time Scene Text Detection with Differentiable Binarization}, - author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang}, - journal={arXiv preprint arXiv:1911.08947}, - year={2019} -} - -3. DTRB: -@inproceedings{baek2019wrong, - title={What is wrong with scene text recognition model comparisons? dataset and model analysis}, - author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk}, - booktitle={Proceedings of the IEEE International Conference on Computer Vision}, - pages={4715--4723}, - year={2019} -} - -4. SAST: -@inproceedings{wang2019single, - title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning}, - author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming}, - booktitle={Proceedings of the 27th ACM International Conference on Multimedia}, - pages={1277--1285}, - year={2019} -} - -5. SRN: -@article{yu2020towards, - title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks}, - author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui}, - journal={arXiv preprint arXiv:2003.12294}, - year={2020} -} - -6. end2end-psl: -@inproceedings{sun2019chinese, - title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning}, - author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo}, - booktitle={Proceedings of the IEEE International Conference on Computer Vision}, - pages={9086--9095}, - year={2019} -} -``` - -## LICENSE -This project is released under Apache 2.0 license - -## CONTRIBUTION -We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. - -- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation. -- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualize function、add .gitgnore and discard set PYTHONPATH manually. -- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure. diff --git a/__init__.py b/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7d94f66be072067172d56da13d8bb27d9aeac431 --- /dev/null +++ b/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +__all__ = ['PaddleOCR', 'draw_ocr'] +from .paddleocr import PaddleOCR +from .tools.infer.utility import draw_ocr diff --git a/configs/cls/cls_mv3.yml b/configs/cls/cls_mv3.yml new file mode 100755 index 0000000000000000000000000000000000000000..57afab507c03c2a32f1665f908170de05d91143a --- /dev/null +++ b/configs/cls/cls_mv3.yml @@ -0,0 +1,44 @@ +Global: + algorithm: CLS + use_gpu: False + epoch_num: 100 + log_smooth_window: 20 + print_batch_step: 100 + save_model_dir: output/cls_mv3 + save_epoch_step: 3 + eval_batch_step: 500 + train_batch_size_per_card: 512 + test_batch_size_per_card: 512 + image_shape: [3, 48, 192] + label_list: ['0','180'] + distort: True + reader_yml: ./configs/cls/cls_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.cls_model,ClsModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.35 + model_name: small + +Head: + function: ppocr.modeling.heads.cls_head,ClsHead + class_dim: 2 + +Loss: + function: ppocr.modeling.losses.cls_loss,ClsLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 1169 + total_epoch: 100 \ No newline at end of file diff --git a/configs/cls/cls_reader.yml b/configs/cls/cls_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..2b1d4c4e75217998f2c489bcd3bfbbb8b8b7f415 --- /dev/null +++ b/configs/cls/cls_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.cls.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data/cls + label_file_path: ./train_data/cls/train.txt + +EvalReader: + reader_function: ppocr.data.cls.dataset_traversal,SimpleReader + img_set_dir: ./train_data/cls + label_file_path: ./train_data/cls/test.txt + +TestReader: + reader_function: ppocr.data.cls.dataset_traversal,SimpleReader diff --git a/configs/det/det_mv3_db.yml b/configs/det/det_mv3_db.yml index caa7bd4fa09752cff8b4d596e80b5729cce175bf..91a8e86f8bba440df83c1d9f7da0e6523d5907bb 100755 --- a/configs/det/det_mv3_db.yml +++ b/configs/det/det_mv3_db.yml @@ -49,6 +49,6 @@ Optimizer: PostProcess: function: ppocr.postprocess.db_postprocess,DBPostProcess thresh: 0.3 - box_thresh: 0.7 + box_thresh: 0.6 max_candidates: 1000 - unclip_ratio: 2.0 + unclip_ratio: 1.5 diff --git a/configs/det/det_mv3_db_v1.1.yml b/configs/det/det_mv3_db_v1.1.yml new file mode 100755 index 0000000000000000000000000000000000000000..afc11aa01dc329d095abac6d61a48cf604ee2aa2 --- /dev/null +++ b/configs/det/det_mv3_db_v1.1.yml @@ -0,0 +1,59 @@ +Global: + algorithm: DB + use_gpu: true + epoch_num: 1200 + log_smooth_window: 20 + print_batch_step: 2 + save_model_dir: ./output/det_db/ + save_epoch_step: 200 + # evaluation is run every 5000 iterations after the 4000th iteration + eval_batch_step: [4000, 5000] + train_batch_size_per_card: 16 + test_batch_size_per_card: 16 + image_shape: [3, 640, 640] + reader_yml: ./configs/det/det_db_icdar15_reader.yml + pretrain_weights: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/ + checkpoints: + save_res_path: ./output/det_db/predicts_db.txt + save_inference_dir: + +Architecture: + function: ppocr.modeling.architectures.det_model,DetModel + +Backbone: + function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: large + disable_se: true + +Head: + function: ppocr.modeling.heads.det_db_head,DBHead + model_name: large + k: 50 + inner_channels: 96 + out_channels: 2 + +Loss: + function: ppocr.modeling.losses.det_db_loss,DBLoss + balance_loss: true + main_loss_type: DiceLoss + alpha: 5 + beta: 10 + ohem_ratio: 3 + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + step_each_epoch: 16 + total_epoch: 1200 + +PostProcess: + function: ppocr.postprocess.db_postprocess,DBPostProcess + thresh: 0.3 + box_thresh: 0.6 + max_candidates: 1000 + unclip_ratio: 1.5 diff --git a/configs/det/det_r18_vd_db_v1.1.yml b/configs/det/det_r18_vd_db_v1.1.yml new file mode 100755 index 0000000000000000000000000000000000000000..aa6dc0ee01c7e218ac6b3815c7ebacf886507e14 --- /dev/null +++ b/configs/det/det_r18_vd_db_v1.1.yml @@ -0,0 +1,57 @@ +Global: + algorithm: DB + use_gpu: true + epoch_num: 1200 + log_smooth_window: 20 + print_batch_step: 2 + save_model_dir: ./output/det_r_18_vd_db/ + save_epoch_step: 200 + eval_batch_step: [3000, 2000] + train_batch_size_per_card: 8 + test_batch_size_per_card: 1 + image_shape: [3, 640, 640] + reader_yml: ./configs/det/det_db_icdar15_reader.yml + pretrain_weights: ./pretrain_models/ResNet18_vd_pretrained/ + save_res_path: ./output/det_r18_vd_db/predicts_db.txt + checkpoints: + save_inference_dir: + +Architecture: + function: ppocr.modeling.architectures.det_model,DetModel + +Backbone: + function: ppocr.modeling.backbones.det_resnet_vd,ResNet + layers: 18 + +Head: + function: ppocr.modeling.heads.det_db_head,DBHead + model_name: large + k: 50 + inner_channels: 256 + out_channels: 2 + +Loss: + function: ppocr.modeling.losses.det_db_loss,DBLoss + balance_loss: true + main_loss_type: DiceLoss + alpha: 5 + beta: 10 + ohem_ratio: 3 + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + step_each_epoch: 32 + total_epoch: 1200 + +PostProcess: + function: ppocr.postprocess.db_postprocess,DBPostProcess + thresh: 0.3 + box_thresh: 0.6 + max_candidates: 1000 + unclip_ratio: 1.5 + diff --git a/configs/det/det_r50_vd_sast_icdar15.yml b/configs/det/det_r50_vd_sast_icdar15.yml new file mode 100644 index 0000000000000000000000000000000000000000..f1ecd61dc8ccb14fde98c2fc55cb2c9e630b5c44 --- /dev/null +++ b/configs/det/det_r50_vd_sast_icdar15.yml @@ -0,0 +1,50 @@ +Global: + algorithm: SAST + use_gpu: true + epoch_num: 2000 + log_smooth_window: 20 + print_batch_step: 2 + save_model_dir: ./output/det_sast/ + save_epoch_step: 20 + eval_batch_step: 5000 + train_batch_size_per_card: 8 + test_batch_size_per_card: 8 + image_shape: [3, 512, 512] + reader_yml: ./configs/det/det_sast_icdar15_reader.yml + pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/ + save_res_path: ./output/det_sast/predicts_sast.txt + checkpoints: + save_inference_dir: + +Architecture: + function: ppocr.modeling.architectures.det_model,DetModel + +Backbone: + function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet + layers: 50 + +Head: + function: ppocr.modeling.heads.det_sast_head,SASTHead + model_name: large + only_fpn_up: False +# with_cab: False + with_cab: True + +Loss: + function: ppocr.modeling.losses.det_sast_loss,SASTLoss + +Optimizer: + function: ppocr.optimizer,RMSProp + base_lr: 0.001 + decay: + function: piecewise_decay + boundaries: [30000, 50000, 80000, 100000, 150000] + decay_rate: 0.3 + +PostProcess: + function: ppocr.postprocess.sast_postprocess,SASTPostProcess + score_thresh: 0.5 + sample_pts_num: 2 + nms_thresh: 0.2 + expand_scale: 1.0 + shrink_ratio_of_width: 0.3 \ No newline at end of file diff --git a/configs/det/det_r50_vd_sast_totaltext.yml b/configs/det/det_r50_vd_sast_totaltext.yml new file mode 100644 index 0000000000000000000000000000000000000000..ec42ce6d4bafd0c5d4360a255f35d07e83f90787 --- /dev/null +++ b/configs/det/det_r50_vd_sast_totaltext.yml @@ -0,0 +1,50 @@ +Global: + algorithm: SAST + use_gpu: true + epoch_num: 2000 + log_smooth_window: 20 + print_batch_step: 2 + save_model_dir: ./output/det_sast/ + save_epoch_step: 20 + eval_batch_step: 5000 + train_batch_size_per_card: 8 + test_batch_size_per_card: 1 + image_shape: [3, 512, 512] + reader_yml: ./configs/det/det_sast_totaltext_reader.yml + pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/ + save_res_path: ./output/det_sast/predicts_sast.txt + checkpoints: + save_inference_dir: + +Architecture: + function: ppocr.modeling.architectures.det_model,DetModel + +Backbone: + function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet + layers: 50 + +Head: + function: ppocr.modeling.heads.det_sast_head,SASTHead + model_name: large + only_fpn_up: False + # with_cab: False + with_cab: True + +Loss: + function: ppocr.modeling.losses.det_sast_loss,SASTLoss + +Optimizer: + function: ppocr.optimizer,RMSProp + base_lr: 0.001 + decay: + function: piecewise_decay + boundaries: [30000, 50000, 80000, 100000, 150000] + decay_rate: 0.3 + +PostProcess: + function: ppocr.postprocess.sast_postprocess,SASTPostProcess + score_thresh: 0.5 + sample_pts_num: 6 + nms_thresh: 0.2 + expand_scale: 1.2 + shrink_ratio_of_width: 0.2 \ No newline at end of file diff --git a/configs/det/det_sast_icdar15_reader.yml b/configs/det/det_sast_icdar15_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..ee45a85da7452e2069b0d7467b1ccfc44dd656b7 --- /dev/null +++ b/configs/det/det_sast_icdar15_reader.yml @@ -0,0 +1,24 @@ +TrainReader: + reader_function: ppocr.data.det.dataset_traversal,TrainReader + process_function: ppocr.data.det.sast_process,SASTProcessTrain + num_workers: 8 + img_set_dir: ./train_data/ + label_file_path: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt] + data_ratio_list: [0.1, 0.45, 0.3, 0.15] + min_crop_side_ratio: 0.3 + min_crop_size: 24 + min_text_size: 4 + max_text_size: 512 + +EvalReader: + reader_function: ppocr.data.det.dataset_traversal,EvalTestReader + process_function: ppocr.data.det.sast_process,SASTProcessTest + img_set_dir: ./train_data/icdar2015/text_localization/ + label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt + max_side_len: 1536 + +TestReader: + reader_function: ppocr.data.det.dataset_traversal,EvalTestReader + process_function: ppocr.data.det.sast_process,SASTProcessTest + infer_img: ./train_data/icdar2015/text_localization/ch4_test_images/img_11.jpg + max_side_len: 1536 diff --git a/configs/det/det_sast_totaltext_reader.yml b/configs/det/det_sast_totaltext_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..92503d9f0e2b57f0d22b15591c5400185daf2afa --- /dev/null +++ b/configs/det/det_sast_totaltext_reader.yml @@ -0,0 +1,24 @@ +TrainReader: + reader_function: ppocr.data.det.dataset_traversal,TrainReader + process_function: ppocr.data.det.sast_process,SASTProcessTrain + num_workers: 8 + img_set_dir: ./train_data/ + label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] + data_ratio_list: [0.5, 0.5] + min_crop_side_ratio: 0.3 + min_crop_size: 24 + min_text_size: 4 + max_text_size: 512 + +EvalReader: + reader_function: ppocr.data.det.dataset_traversal,EvalTestReader + process_function: ppocr.data.det.sast_process,SASTProcessTest + img_set_dir: ./train_data/ + label_file_path: ./train_data/total_text_icdar_14pt/test_label_json.txt + max_side_len: 768 + +TestReader: + reader_function: ppocr.data.det.dataset_traversal,EvalTestReader + process_function: ppocr.data.det.sast_process,SASTProcessTest + infer_img: ./train_data/afs/total_text/Images/Test/img623.jpg + max_side_len: 768 diff --git a/configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml b/configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml new file mode 100644 index 0000000000000000000000000000000000000000..8a84c635d32cce44daf405a8d48bcb0547b13acd --- /dev/null +++ b/configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_CRNN + save_epoch_step: 3 + eval_batch_step: 2000 + train_batch_size_per_card: 128 + test_batch_size_per_card: 128 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: ch + character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt + loss_type: ctc + distort: true + use_space_char: true + reader_yml: ./configs/rec/rec_chinese_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_resnet_vd,ResNet + layers: 34 + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + fc_decay: 0.00004 + SeqRNN: + hidden_size: 256 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.0005 + l2_decay: 0.00004 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + step_each_epoch: 254 + total_epoch: 500 + warmup_minibatch: 1000 diff --git a/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml b/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml new file mode 100755 index 0000000000000000000000000000000000000000..89333f89ad9a6af4dd744daa8972ce35f805113a --- /dev/null +++ b/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml @@ -0,0 +1,54 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_CRNN + save_epoch_step: 3 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: ch + character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt + loss_type: ctc + distort: true + use_space_char: true + reader_yml: ./configs/rec/rec_chinese_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + fc_decay: 0.00001 + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.0005 + l2_decay: 0.00001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + step_each_epoch: 254 + total_epoch: 500 + warmup_minibatch: 1000 diff --git a/configs/rec/multi_languages/rec_en_lite_train.yml b/configs/rec/multi_languages/rec_en_lite_train.yml new file mode 100644 index 0000000000000000000000000000000000000000..128424b4d3a5631f8237f6cd596c901990ff2277 --- /dev/null +++ b/configs/rec/multi_languages/rec_en_lite_train.yml @@ -0,0 +1,53 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/en_number + save_epoch_step: 3 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 30 + character_type: ch + character_dict_path: ./ppocr/utils/ic15_dict.txt + loss_type: ctc + distort: false + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_en_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + warmup_minibatch: 1000 + step_each_epoch: 6530 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_en_reader.yml b/configs/rec/multi_languages/rec_en_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..558e2c9b653642f919b5a1e15211b934dc39ad13 --- /dev/null +++ b/configs/rec/multi_languages/rec_en_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/en_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/en_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_french_lite_train.yml b/configs/rec/multi_languages/rec_french_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..2cf54c427eb6a7c64f4b54b021c44013a1dc1d6a --- /dev/null +++ b/configs/rec/multi_languages/rec_french_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_french + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: french + character_dict_path: ./ppocr/utils/french_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_french_reader.yml b/configs/rec/multi_languages/rec_french_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..e456de1dc8800822cc9af496e825c45cdbebe081 --- /dev/null +++ b/configs/rec/multi_languages/rec_french_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/french_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/french_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_ger_lite_train.yml b/configs/rec/multi_languages/rec_ger_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..beb1755b105fea9cbade9f35ceac15d380651f37 --- /dev/null +++ b/configs/rec/multi_languages/rec_ger_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_german + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: german + character_dict_path: ./ppocr/utils/german_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_ger_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_ger_reader.yml b/configs/rec/multi_languages/rec_ger_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..edd78d4f115dc7e1376556ee0c93f655ac891e47 --- /dev/null +++ b/configs/rec/multi_languages/rec_ger_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/de_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/de_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_japan_lite_train.yml b/configs/rec/multi_languages/rec_japan_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..fbbab33eadd2901d9eac93f49e737e92d9441270 --- /dev/null +++ b/configs/rec/multi_languages/rec_japan_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_japan + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: japan + character_dict_path: ./ppocr/utils/japan_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_japan_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_japan_reader.yml b/configs/rec/multi_languages/rec_japan_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..348590920a131843a6ab7d8c76498a486d4ed709 --- /dev/null +++ b/configs/rec/multi_languages/rec_japan_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/japan_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/japan_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_korean_lite_train.yml b/configs/rec/multi_languages/rec_korean_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..29cc08aaefb017c690551e030a57e85ebb21e2dd --- /dev/null +++ b/configs/rec/multi_languages/rec_korean_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_korean + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: korean + character_dict_path: ./ppocr/utils/korean_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_korean_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_korean_reader.yml b/configs/rec/multi_languages/rec_korean_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..58ebf6cf8d340a06c0b3e2883be8839112980123 --- /dev/null +++ b/configs/rec/multi_languages/rec_korean_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/korean_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/korean_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/rec_r50fpn_vd_none_srn.yml b/configs/rec/rec_r50fpn_vd_none_srn.yml new file mode 100755 index 0000000000000000000000000000000000000000..30709e479f8da56b6bd7fe9ebf817a27bff9cc38 --- /dev/null +++ b/configs/rec/rec_r50fpn_vd_none_srn.yml @@ -0,0 +1,49 @@ +Global: + algorithm: SRN + use_gpu: true + epoch_num: 72 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: output/rec_pvam_withrotate + save_epoch_step: 1 + eval_batch_step: 8000 + train_batch_size_per_card: 64 + test_batch_size_per_card: 1 + image_shape: [1, 64, 256] + max_text_length: 25 + character_type: en + loss_type: srn + num_heads: 8 + average_window: 0.15 + max_average_window: 15625 + min_average_window: 10000 + reader_yml: ./configs/rec/rec_benchmark_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_resnet_fpn,ResNet + layers: 50 + +Head: + function: ppocr.modeling.heads.rec_srn_all_head,SRNPredict + encoder_type: rnn + num_encoder_TUs: 2 + num_decoder_TUs: 4 + hidden_dims: 512 + SeqRNN: + hidden_size: 256 + +Loss: + function: ppocr.modeling.losses.rec_srn_loss,SRNLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + base_lr: 0.0001 + beta1: 0.9 + beta2: 0.999 diff --git a/deploy/android_demo/README.md b/deploy/android_demo/README.md index 4d85dee99ab3616594b4ff3a17acb97a6267b12d..e35e757914aa355c97293662652b1e02676e32eb 100644 --- a/deploy/android_demo/README.md +++ b/deploy/android_demo/README.md @@ -1,6 +1,6 @@ # 如何快速测试 ### 1. 安装最新版本的Android Studio -可以从https://developer.android.com/studio下载。本Demo使用是4.0版本Android Studio编写。 +可以从https://developer.android.com/studio 下载。本Demo使用是4.0版本Android Studio编写。 ### 2. 按照NDK 20 以上版本 Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编译成功。 diff --git a/deploy/android_demo/app/build.gradle b/deploy/android_demo/app/build.gradle index adf3968b40960b50bc62a7ba669ce28346afa362..5ecb11692c2a66f941dc41425761519607bad39e 100644 --- a/deploy/android_demo/app/build.gradle +++ b/deploy/android_demo/app/build.gradle @@ -3,11 +3,11 @@ import java.security.MessageDigest apply plugin: 'com.android.application' android { - compileSdkVersion 28 + compileSdkVersion 29 defaultConfig { applicationId "com.baidu.paddle.lite.demo.ocr" - minSdkVersion 15 - targetSdkVersion 28 + minSdkVersion 23 + targetSdkVersion 29 versionCode 1 versionName "1.0" testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner" @@ -39,9 +39,8 @@ android { dependencies { implementation fileTree(include: ['*.jar'], dir: 'libs') - implementation 'com.android.support:appcompat-v7:28.0.0' - implementation 'com.android.support.constraint:constraint-layout:1.1.3' - implementation 'com.android.support:design:28.0.0' + implementation 'androidx.appcompat:appcompat:1.1.0' + implementation 'androidx.constraintlayout:constraintlayout:1.1.3' testImplementation 'junit:junit:4.12' androidTestImplementation 'com.android.support.test:runner:1.0.2' androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2' diff --git a/deploy/android_demo/app/src/main/AndroidManifest.xml b/deploy/android_demo/app/src/main/AndroidManifest.xml index ff1900d637a827998c4da52b9a2dda51b8ae89c8..54482b1dcc9de66021d0109e5683302c8445ba6a 100644 --- a/deploy/android_demo/app/src/main/AndroidManifest.xml +++ b/deploy/android_demo/app/src/main/AndroidManifest.xml @@ -14,10 +14,10 @@ android:roundIcon="@mipmap/ic_launcher_round" android:supportsRtl="true" android:theme="@style/AppTheme"> + - @@ -25,6 +25,15 @@ android:name="com.baidu.paddle.lite.demo.ocr.SettingsActivity" android:label="Settings"> + + + \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/assets/images/180.jpg b/deploy/android_demo/app/src/main/assets/images/180.jpg new file mode 100644 index 0000000000000000000000000000000000000000..84cf4c79ef14769d01b0b0e9667387bd16b3e6e7 Binary files /dev/null and b/deploy/android_demo/app/src/main/assets/images/180.jpg differ diff --git a/deploy/android_demo/app/src/main/assets/images/270.jpg b/deploy/android_demo/app/src/main/assets/images/270.jpg new file mode 100644 index 0000000000000000000000000000000000000000..568739043b7779425b0abeb4459dbb485caed847 Binary files /dev/null and b/deploy/android_demo/app/src/main/assets/images/270.jpg differ diff --git a/deploy/android_demo/app/src/main/assets/images/90.jpg b/deploy/android_demo/app/src/main/assets/images/90.jpg new file mode 100644 index 0000000000000000000000000000000000000000..49e949aa9cc14e3afc507c5806c87d9894c2dcb9 Binary files /dev/null and b/deploy/android_demo/app/src/main/assets/images/90.jpg differ diff --git a/deploy/android_demo/app/src/main/cpp/native.cpp b/deploy/android_demo/app/src/main/cpp/native.cpp index 33233e5372e307a892786c6bea779691e1f6781a..963c5246d5b7b50720f92705d288526ae2cc6a73 100644 --- a/deploy/android_demo/app/src/main/cpp/native.cpp +++ b/deploy/android_demo/app/src/main/cpp/native.cpp @@ -4,112 +4,111 @@ #include "native.h" #include "ocr_ppredictor.h" -#include #include #include +#include static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode); -extern "C" -JNIEXPORT jlong JNICALL -Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(JNIEnv *env, jobject thiz, - jstring j_det_model_path, - jstring j_rec_model_path, - jint j_thread_num, - jstring j_cpu_mode) { - std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path); - std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path); - int thread_num = j_thread_num; - std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode); - ppredictor::OCR_Config conf; - conf.thread_num = thread_num; - conf.mode = str_to_cpu_mode(cpu_mode); - ppredictor::OCR_PPredictor *orc_predictor = new ppredictor::OCR_PPredictor{conf}; - orc_predictor->init_from_file(det_model_path, rec_model_path); - return reinterpret_cast(orc_predictor); +extern "C" JNIEXPORT jlong JNICALL +Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init( + JNIEnv *env, jobject thiz, jstring j_det_model_path, + jstring j_rec_model_path, jstring j_cls_model_path, jint j_thread_num, + jstring j_cpu_mode) { + std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path); + std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path); + std::string cls_model_path = jstring_to_cpp_string(env, j_cls_model_path); + int thread_num = j_thread_num; + std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode); + ppredictor::OCR_Config conf; + conf.thread_num = thread_num; + conf.mode = str_to_cpu_mode(cpu_mode); + ppredictor::OCR_PPredictor *orc_predictor = + new ppredictor::OCR_PPredictor{conf}; + orc_predictor->init_from_file(det_model_path, rec_model_path, cls_model_path); + return reinterpret_cast(orc_predictor); } /** - * "LITE_POWER_HIGH" 转为 paddle::lite_api::LITE_POWER_HIGH + * "LITE_POWER_HIGH" convert to paddle::lite_api::LITE_POWER_HIGH * @param cpu_mode * @return */ -static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode) { - static std::map cpu_mode_map{ - {"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH}, - {"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH}, - {"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL}, - {"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND}, - {"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH}, - {"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW} - }; - std::string upper_key; - std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), ::toupper); - auto index = cpu_mode_map.find(upper_key); - if (index == cpu_mode_map.end()) { - LOGE("cpu_mode not found %s", upper_key.c_str()); - return paddle::lite_api::LITE_POWER_HIGH; - } else { - return index->second; - } - +static paddle::lite_api::PowerMode +str_to_cpu_mode(const std::string &cpu_mode) { + static std::map cpu_mode_map{ + {"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH}, + {"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH}, + {"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL}, + {"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND}, + {"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH}, + {"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}}; + std::string upper_key; + std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), + ::toupper); + auto index = cpu_mode_map.find(upper_key); + if (index == cpu_mode_map.end()) { + LOGE("cpu_mode not found %s", upper_key.c_str()); + return paddle::lite_api::LITE_POWER_HIGH; + } else { + return index->second; + } } -extern "C" -JNIEXPORT jfloatArray JNICALL -Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(JNIEnv *env, jobject thiz, - jlong java_pointer, jfloatArray buf, - jfloatArray ddims, - jobject original_image) { - LOGI("begin to run native forward"); - if (java_pointer == 0) { - LOGE("JAVA pointer is NULL"); - return cpp_array_to_jfloatarray(env, nullptr, 0); - } - cv::Mat origin = bitmap_to_cv_mat(env, original_image); - if (origin.size == 0) { - LOGE("origin bitmap cannot convert to CV Mat"); - return cpp_array_to_jfloatarray(env, nullptr, 0); - } - ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer; - std::vector dims_float_arr = jfloatarray_to_float_vector(env, ddims); - std::vector dims_arr; - dims_arr.resize(dims_float_arr.size()); - std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin()); +extern "C" JNIEXPORT jfloatArray JNICALL +Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward( + JNIEnv *env, jobject thiz, jlong java_pointer, jfloatArray buf, + jfloatArray ddims, jobject original_image) { + LOGI("begin to run native forward"); + if (java_pointer == 0) { + LOGE("JAVA pointer is NULL"); + return cpp_array_to_jfloatarray(env, nullptr, 0); + } + cv::Mat origin = bitmap_to_cv_mat(env, original_image); + if (origin.size == 0) { + LOGE("origin bitmap cannot convert to CV Mat"); + return cpp_array_to_jfloatarray(env, nullptr, 0); + } + ppredictor::OCR_PPredictor *ppredictor = + (ppredictor::OCR_PPredictor *)java_pointer; + std::vector dims_float_arr = jfloatarray_to_float_vector(env, ddims); + std::vector dims_arr; + dims_arr.resize(dims_float_arr.size()); + std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin()); - // 这里值有点大,就不调用jfloatarray_to_float_vector了 - int64_t buf_len = (int64_t) env->GetArrayLength(buf); - jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE); - float *data = (jfloat *) buf_data; - std::vector results = ppredictor->infer_ocr(dims_arr, data, - buf_len, - NET_OCR, origin); - LOGI("infer_ocr finished with boxes %ld", results.size()); - // 这里将std::vector 序列化成 float数组,传输到java层再反序列化 - std::vector float_arr; - for (const ppredictor::OCRPredictResult &r :results) { - float_arr.push_back(r.points.size()); - float_arr.push_back(r.word_index.size()); - float_arr.push_back(r.score); - for (const std::vector &point : r.points) { - float_arr.push_back(point.at(0)); - float_arr.push_back(point.at(1)); - } - for (int index: r.word_index) { - float_arr.push_back(index); - } + // 这里值有点大,就不调用jfloatarray_to_float_vector了 + int64_t buf_len = (int64_t)env->GetArrayLength(buf); + jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE); + float *data = (jfloat *)buf_data; + std::vector results = + ppredictor->infer_ocr(dims_arr, data, buf_len, NET_OCR, origin); + LOGI("infer_ocr finished with boxes %ld", results.size()); + // 这里将std::vector 序列化成 + // float数组,传输到java层再反序列化 + std::vector float_arr; + for (const ppredictor::OCRPredictResult &r : results) { + float_arr.push_back(r.points.size()); + float_arr.push_back(r.word_index.size()); + float_arr.push_back(r.score); + for (const std::vector &point : r.points) { + float_arr.push_back(point.at(0)); + float_arr.push_back(point.at(1)); } - return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size()); + for (int index : r.word_index) { + float_arr.push_back(index); + } + } + return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size()); } -extern "C" -JNIEXPORT void JNICALL -Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(JNIEnv *env, jobject thiz, - jlong java_pointer){ - if (java_pointer == 0) { - LOGE("JAVA pointer is NULL"); - return; - } - ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer; - delete ppredictor; +extern "C" JNIEXPORT void JNICALL +Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release( + JNIEnv *env, jobject thiz, jlong java_pointer) { + if (java_pointer == 0) { + LOGE("JAVA pointer is NULL"); + return; + } + ppredictor::OCR_PPredictor *ppredictor = + (ppredictor::OCR_PPredictor *)java_pointer; + delete ppredictor; } \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/cpp/ocr_cls_process.cpp b/deploy/android_demo/app/src/main/cpp/ocr_cls_process.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d720066667b60ee87bc1a1227ad720074254074e --- /dev/null +++ b/deploy/android_demo/app/src/main/cpp/ocr_cls_process.cpp @@ -0,0 +1,46 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "ocr_cls_process.h" +#include +#include +#include +#include +#include +#include + +const std::vector CLS_IMAGE_SHAPE = {3, 32, 100}; + +cv::Mat cls_resize_img(const cv::Mat &img) { + int imgC = CLS_IMAGE_SHAPE[0]; + int imgW = CLS_IMAGE_SHAPE[2]; + int imgH = CLS_IMAGE_SHAPE[1]; + + float ratio = float(img.cols) / float(img.rows); + int resize_w = 0; + if (ceilf(imgH * ratio) > imgW) + resize_w = imgW; + else + resize_w = int(ceilf(imgH * ratio)); + + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f, + cv::INTER_CUBIC); + + if (resize_w < imgW) { + cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, int(imgW - resize_w), + cv::BORDER_CONSTANT, {0, 0, 0}); + } + return resize_img; +} \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/cpp/ocr_cls_process.h b/deploy/android_demo/app/src/main/cpp/ocr_cls_process.h new file mode 100644 index 0000000000000000000000000000000000000000..1c30ee1071e647ce1ab7050ac0641d0eff7c62ad --- /dev/null +++ b/deploy/android_demo/app/src/main/cpp/ocr_cls_process.h @@ -0,0 +1,23 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include "common.h" +#include +#include + +extern const std::vector CLS_IMAGE_SHAPE; + +cv::Mat cls_resize_img(const cv::Mat &img); \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.cpp b/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.cpp index 6548157b7ecac09ca3802ee6e226d555bfcd9099..f0d855e83f010ef762cb4b01086e41a0f64fb4cb 100644 --- a/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.cpp +++ b/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.cpp @@ -3,184 +3,237 @@ // #include "ocr_ppredictor.h" -#include "preprocess.h" #include "common.h" -#include "ocr_db_post_process.h" +#include "ocr_cls_process.h" #include "ocr_crnn_process.h" +#include "ocr_db_post_process.h" +#include "preprocess.h" namespace ppredictor { -OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) { +OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {} -} +int OCR_PPredictor::init(const std::string &det_model_content, + const std::string &rec_model_content, + const std::string &cls_model_content) { + _det_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR, _config.mode}); + _det_predictor->init_nb(det_model_content); -int -OCR_PPredictor::init(const std::string &det_model_content, const std::string &rec_model_content) { - _det_predictor = std::unique_ptr( - new PPredictor{_config.thread_num, NET_OCR, _config.mode}); - _det_predictor->init_nb(det_model_content); + _rec_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); + _rec_predictor->init_nb(rec_model_content); - _rec_predictor = std::unique_ptr( - new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); - _rec_predictor->init_nb(rec_model_content); - return RETURN_OK; + _cls_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); + _cls_predictor->init_nb(cls_model_content); + return RETURN_OK; } -int OCR_PPredictor::init_from_file(const std::string &det_model_path, const std::string &rec_model_path){ - _det_predictor = std::unique_ptr( - new PPredictor{_config.thread_num, NET_OCR, _config.mode}); - _det_predictor->init_from_file(det_model_path); - - _rec_predictor = std::unique_ptr( - new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); - _rec_predictor->init_from_file(rec_model_path); - return RETURN_OK; +int OCR_PPredictor::init_from_file(const std::string &det_model_path, + const std::string &rec_model_path, + const std::string &cls_model_path) { + _det_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR, _config.mode}); + _det_predictor->init_from_file(det_model_path); + + _rec_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); + _rec_predictor->init_from_file(rec_model_path); + + _cls_predictor = std::unique_ptr( + new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); + _cls_predictor->init_from_file(cls_model_path); + return RETURN_OK; } /** - * 调试用,保存第一步的框选结果 + * for debug use, show result of First Step * @param filter_boxes * @param boxes * @param srcimg */ -static void visual_img(const std::vector>> &filter_boxes, - const std::vector>> &boxes, - const cv::Mat &srcimg) { - // visualization - cv::Point rook_points[filter_boxes.size()][4]; - for (int n = 0; n < filter_boxes.size(); n++) { - for (int m = 0; m < filter_boxes[0].size(); m++) { - rook_points[n][m] = cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1])); - } +static void +visual_img(const std::vector>> &filter_boxes, + const std::vector>> &boxes, + const cv::Mat &srcimg) { + // visualization + cv::Point rook_points[filter_boxes.size()][4]; + for (int n = 0; n < filter_boxes.size(); n++) { + for (int m = 0; m < filter_boxes[0].size(); m++) { + rook_points[n][m] = + cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1])); } - - cv::Mat img_vis; - srcimg.copyTo(img_vis); - for (int n = 0; n < boxes.size(); n++) { - const cv::Point *ppt[1] = {rook_points[n]}; - int npt[] = {4}; - cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0); - } - // 调试用,自行替换需要修改的路径 - cv::imwrite("/sdcard/1/vis.png", img_vis); + } + + cv::Mat img_vis; + srcimg.copyTo(img_vis); + for (int n = 0; n < boxes.size(); n++) { + const cv::Point *ppt[1] = {rook_points[n]}; + int npt[] = {4}; + cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0); + } + // 调试用,自行替换需要修改的路径 + cv::imwrite("/sdcard/1/vis.png", img_vis); } std::vector -OCR_PPredictor::infer_ocr(const std::vector &dims, const float *input_data, int input_len, - int net_flag, cv::Mat &origin) { +OCR_PPredictor::infer_ocr(const std::vector &dims, + const float *input_data, int input_len, int net_flag, + cv::Mat &origin) { + PredictorInput input = _det_predictor->get_first_input(); + input.set_dims(dims); + input.set_data(input_data, input_len); + std::vector results = _det_predictor->infer(); + PredictorOutput &res = results.at(0); + std::vector>> filtered_box = calc_filtered_boxes( + res.get_float_data(), res.get_size(), (int)dims[2], (int)dims[3], origin); + LOGI("Filter_box size %ld", filtered_box.size()); + return infer_rec(filtered_box, origin); +} - PredictorInput input = _det_predictor->get_first_input(); +std::vector OCR_PPredictor::infer_rec( + const std::vector>> &boxes, + const cv::Mat &origin_img) { + std::vector mean = {0.5f, 0.5f, 0.5f}; + std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; + std::vector dims = {1, 3, 0, 0}; + std::vector ocr_results; + + PredictorInput input = _rec_predictor->get_first_input(); + for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) { + const std::vector> &box = *bp; + cv::Mat crop_img = get_rotate_crop_image(origin_img, box); + crop_img = infer_cls(crop_img); + + float wh_ratio = float(crop_img.cols) / float(crop_img.rows); + cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio); + input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f); + const float *dimg = reinterpret_cast(input_image.data); + int input_size = input_image.rows * input_image.cols; + + dims[2] = input_image.rows; + dims[3] = input_image.cols; input.set_dims(dims); - input.set_data(input_data, input_len); - std::vector results = _det_predictor->infer(); - PredictorOutput &res = results.at(0); - std::vector>> filtered_box - = calc_filtered_boxes(res.get_float_data(), res.get_size(), (int) dims[2], (int) dims[3], - origin); - LOGI("Filter_box size %ld", filtered_box.size()); - return infer_rec(filtered_box, origin); -} -std::vector -OCR_PPredictor::infer_rec(const std::vector>> &boxes, - const cv::Mat &origin_img) { - std::vector mean = {0.5f, 0.5f, 0.5f}; - std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; - std::vector dims = {1, 3, 0, 0}; - std::vector ocr_results; - - PredictorInput input = _rec_predictor->get_first_input(); - for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) { - const std::vector> &box = *bp; - cv::Mat crop_img = get_rotate_crop_image(origin_img, box); - float wh_ratio = float(crop_img.cols) / float(crop_img.rows); - cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio); - input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f); - const float *dimg = reinterpret_cast(input_image.data); - int input_size = input_image.rows * input_image.cols; - - dims[2] = input_image.rows; - dims[3] = input_image.cols; - input.set_dims(dims); - - neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, scale); - - std::vector results = _rec_predictor->infer(); - - OCRPredictResult res; - res.word_index = postprocess_rec_word_index(results.at(0)); - if (res.word_index.empty()) { - continue; - } - res.score = postprocess_rec_score(results.at(1)); - res.points = box; - ocr_results.emplace_back(std::move(res)); + neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, + scale); + + std::vector results = _rec_predictor->infer(); + + OCRPredictResult res; + res.word_index = postprocess_rec_word_index(results.at(0)); + if (res.word_index.empty()) { + continue; } - LOGI("ocr_results finished %lu", ocr_results.size()); - return ocr_results; + res.score = postprocess_rec_score(results.at(1)); + res.points = box; + ocr_results.emplace_back(std::move(res)); + } + LOGI("ocr_results finished %lu", ocr_results.size()); + return ocr_results; +} + +cv::Mat OCR_PPredictor::infer_cls(const cv::Mat &img, float thresh) { + std::vector mean = {0.5f, 0.5f, 0.5f}; + std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; + std::vector dims = {1, 3, 0, 0}; + std::vector ocr_results; + + PredictorInput input = _cls_predictor->get_first_input(); + + cv::Mat input_image = cls_resize_img(img); + input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f); + const float *dimg = reinterpret_cast(input_image.data); + int input_size = input_image.rows * input_image.cols; + + dims[2] = input_image.rows; + dims[3] = input_image.cols; + input.set_dims(dims); + + neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, + scale); + + std::vector results = _cls_predictor->infer(); + + const float *scores = results.at(0).get_float_data(); + const int *labels = results.at(1).get_int_data(); + for (int64_t i = 0; i < results.at(0).get_size(); i++) { + LOGI("output scores [%f]", scores[i]); + } + for (int64_t i = 0; i < results.at(1).get_size(); i++) { + LOGI("output label [%d]", labels[i]); + } + int label_idx = labels[0]; + float score = scores[label_idx]; + + cv::Mat srcimg; + img.copyTo(srcimg); + if (label_idx % 2 == 1 && score > thresh) { + cv::rotate(srcimg, srcimg, 1); + } + return srcimg; } std::vector>> -OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size, int output_height, - int output_width, const cv::Mat &origin) { - const double threshold = 0.3; - const double maxvalue = 1; - - cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F); - memcpy(pred_map.data, pred, pred_size * sizeof(float)); - cv::Mat cbuf_map; - pred_map.convertTo(cbuf_map, CV_8UC1); - - cv::Mat bit_map; - cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); - - std::vector>> boxes = boxes_from_bitmap(pred_map, bit_map); - float ratio_h = output_height * 1.0f / origin.rows; - float ratio_w = output_width * 1.0f / origin.cols; - std::vector>> filter_boxes = filter_tag_det_res(boxes, ratio_h, - ratio_w, origin); - return filter_boxes; +OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size, + int output_height, int output_width, + const cv::Mat &origin) { + const double threshold = 0.3; + const double maxvalue = 1; + + cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F); + memcpy(pred_map.data, pred, pred_size * sizeof(float)); + cv::Mat cbuf_map; + pred_map.convertTo(cbuf_map, CV_8UC1); + + cv::Mat bit_map; + cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); + + std::vector>> boxes = + boxes_from_bitmap(pred_map, bit_map); + float ratio_h = output_height * 1.0f / origin.rows; + float ratio_w = output_width * 1.0f / origin.cols; + std::vector>> filter_boxes = + filter_tag_det_res(boxes, ratio_h, ratio_w, origin); + return filter_boxes; } -std::vector OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) { - const int *rec_idx = res.get_int_data(); - const std::vector> rec_idx_lod = res.get_lod(); +std::vector +OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) { + const int *rec_idx = res.get_int_data(); + const std::vector> rec_idx_lod = res.get_lod(); - std::vector pred_idx; - for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) { - pred_idx.emplace_back(rec_idx[n]); - } - return pred_idx; + std::vector pred_idx; + for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) { + pred_idx.emplace_back(rec_idx[n]); + } + return pred_idx; } float OCR_PPredictor::postprocess_rec_score(const PredictorOutput &res) { - const float *predict_batch = res.get_float_data(); - const std::vector predict_shape = res.get_shape(); - const std::vector> predict_lod = res.get_lod(); - int blank = predict_shape[1]; - float score = 0.f; - int count = 0; - for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) { - int argmax_idx = argmax(predict_batch + n * predict_shape[1], - predict_batch + (n + 1) * predict_shape[1]); - float max_value = predict_batch[n * predict_shape[1] + argmax_idx]; - if (blank - 1 - argmax_idx > 1e-5) { - score += max_value; - count += 1; - } - - } - if (count == 0) { - LOGE("calc score count 0"); - } else { - score /= count; + const float *predict_batch = res.get_float_data(); + const std::vector predict_shape = res.get_shape(); + const std::vector> predict_lod = res.get_lod(); + int blank = predict_shape[1]; + float score = 0.f; + int count = 0; + for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) { + int argmax_idx = argmax(predict_batch + n * predict_shape[1], + predict_batch + (n + 1) * predict_shape[1]); + float max_value = predict_batch[n * predict_shape[1] + argmax_idx]; + if (blank - 1 - argmax_idx > 1e-5) { + score += max_value; + count += 1; } - LOGI("calc score: %f", score); - return score; - + } + if (count == 0) { + LOGE("calc score count 0"); + } else { + score /= count; + } + LOGI("calc score: %f", score); + return score; } - -NET_TYPE OCR_PPredictor::get_net_flag() const { - return NET_OCR; -} +NET_TYPE OCR_PPredictor::get_net_flag() const { return NET_OCR; } } \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.h b/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.h index 9adbf1e35214d4c83230ecf89b650aa1c1125a8f..0ec458a4952cbc605e9979ce7850bdeab36c4629 100644 --- a/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.h +++ b/deploy/android_demo/app/src/main/cpp/ocr_ppredictor.h @@ -4,109 +4,119 @@ #pragma once -#include +#include "ppredictor.h" #include #include -#include "ppredictor.h" +#include namespace ppredictor { /** - * 配置 + * Config */ struct OCR_Config { - int thread_num = 4; // 线程数 - paddle::lite_api::PowerMode mode = paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode + int thread_num = 4; // Thread num + paddle::lite_api::PowerMode mode = + paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode }; /** - * 一个四边形内图片的推理结果, + * PolyGone Result */ struct OCRPredictResult { - std::vector word_index; // - std::vector> points; - float score; + std::vector word_index; + std::vector> points; + float score; }; /** - * OCR 一共有2个模型进行推理, - * 1. 使用第一个模型(det),框选出多个四边形 - * 2. 从原图从抠出这些多边形,使用第二个模型(rec),获取文本 + * OCR there are 2 models + * 1. First model(det),select polygones to show where are the texts + * 2. crop from the origin images, use these polygones to infer */ class OCR_PPredictor : public PPredictor_Interface { public: - OCR_PPredictor(const OCR_Config &config); - - virtual ~OCR_PPredictor() { - - } - - /** - * 初始化二个模型的Predictor - * @param det_model_content - * @param rec_model_content - * @return - */ - int init(const std::string &det_model_content, const std::string &rec_model_content); - int init_from_file(const std::string &det_model_path, const std::string &rec_model_path); - /** - * 返回OCR结果 - * @param dims - * @param input_data - * @param input_len - * @param net_flag - * @param origin - * @return - */ - virtual std::vector - infer_ocr(const std::vector &dims, const float *input_data, int input_len, - int net_flag, cv::Mat &origin); - - - virtual NET_TYPE get_net_flag() const; - + OCR_PPredictor(const OCR_Config &config); + + virtual ~OCR_PPredictor() {} + + /** + * 初始化二个模型的Predictor + * @param det_model_content + * @param rec_model_content + * @return + */ + int init(const std::string &det_model_content, + const std::string &rec_model_content, + const std::string &cls_model_content); + int init_from_file(const std::string &det_model_path, + const std::string &rec_model_path, + const std::string &cls_model_path); + /** + * Return OCR result + * @param dims + * @param input_data + * @param input_len + * @param net_flag + * @param origin + * @return + */ + virtual std::vector + infer_ocr(const std::vector &dims, const float *input_data, + int input_len, int net_flag, cv::Mat &origin); + + virtual NET_TYPE get_net_flag() const; private: - - /** - * 从第一个模型的结果中计算有文字的四边形 - * @param pred - * @param output_height - * @param output_width - * @param origin - * @return - */ - std::vector>> - calc_filtered_boxes(const float *pred, int pred_size, int output_height, int output_width, - const cv::Mat &origin); - - /** - * 第二个模型的推理 - * - * @param boxes - * @param origin - * @return - */ - std::vector - infer_rec(const std::vector>> &boxes, const cv::Mat &origin); - - /** - * 第二个模型提取文字的后处理 - * @param res - * @return - */ - std::vector postprocess_rec_word_index(const PredictorOutput &res); - - /** - * 计算第二个模型的文字的置信度 - * @param res - * @return - */ - float postprocess_rec_score(const PredictorOutput &res); - - std::unique_ptr _det_predictor; - std::unique_ptr _rec_predictor; - OCR_Config _config; - + /** + * calcul Polygone from the result image of first model + * @param pred + * @param output_height + * @param output_width + * @param origin + * @return + */ + std::vector>> + calc_filtered_boxes(const float *pred, int pred_size, int output_height, + int output_width, const cv::Mat &origin); + + /** + * infer for second model + * + * @param boxes + * @param origin + * @return + */ + std::vector + infer_rec(const std::vector>> &boxes, + const cv::Mat &origin); + + /** + * infer for cls model + * + * @param boxes + * @param origin + * @return + */ + cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.5); + + /** + * Postprocess or sencod model to extract text + * @param res + * @return + */ + std::vector postprocess_rec_word_index(const PredictorOutput &res); + + /** + * calculate confidence of second model text result + * @param res + * @return + */ + float postprocess_rec_score(const PredictorOutput &res); + + std::unique_ptr _det_predictor; + std::unique_ptr _rec_predictor; + std::unique_ptr _cls_predictor; + OCR_Config _config; }; } diff --git a/deploy/android_demo/app/src/main/cpp/ppredictor.h b/deploy/android_demo/app/src/main/cpp/ppredictor.h index 9cdf3a88170ca2fbae9a2b1d8353fc99ebdfb971..1391109f9197b5e53796c940857c9d01b30a1125 100644 --- a/deploy/android_demo/app/src/main/cpp/ppredictor.h +++ b/deploy/android_demo/app/src/main/cpp/ppredictor.h @@ -7,7 +7,7 @@ namespace ppredictor { /** - * PaddleLite Preditor 通用接口 + * PaddleLite Preditor Common Interface */ class PPredictor_Interface { public: @@ -21,7 +21,7 @@ public: }; /** - * 通用推理 + * Common Predictor */ class PPredictor : public PPredictor_Interface { public: @@ -33,9 +33,9 @@ public: } /** - * 初始化paddlitelite的opt模型,nb格式,与init_paddle二选一 + * init paddlitelite opt model,nb format ,or use ini_paddle * @param model_content - * @return 0 目前是固定值0, 之后其他值表示失败 + * @return 0 */ virtual int init_nb(const std::string &model_content); diff --git a/deploy/android_demo/app/src/main/cpp/predictor_output.h b/deploy/android_demo/app/src/main/cpp/predictor_output.h index c56e2d9a4e9890faae89d6d183b81773a9c9a228..ec7086c62f0d5ca555ec17b38b27b6eea824fdb5 100644 --- a/deploy/android_demo/app/src/main/cpp/predictor_output.h +++ b/deploy/android_demo/app/src/main/cpp/predictor_output.h @@ -21,10 +21,10 @@ public: const std::vector> get_lod() const; const std::vector get_shape() const; - std::vector data; // 通常是float返回,与下面的data_int二选一 - std::vector data_int; // 少数层是int返回,与 data二选一 - std::vector shape; // PaddleLite输出层的shape - std::vector> lod; // PaddleLite输出层的lod + std::vector data; // return float, or use data_int + std::vector data_int; // several layers return int ,or use data + std::vector shape; // PaddleLite output shape + std::vector> lod; // PaddleLite output lod private: std::unique_ptr _tensor; diff --git a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/AppCompatPreferenceActivity.java b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/AppCompatPreferenceActivity.java index 397e4e39fe35a541fd534634ac509b94dd4b2b86..49af0afea425561d65d435a8fe67e96e98912680 100644 --- a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/AppCompatPreferenceActivity.java +++ b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/AppCompatPreferenceActivity.java @@ -19,15 +19,16 @@ package com.baidu.paddle.lite.demo.ocr; import android.content.res.Configuration; import android.os.Bundle; import android.preference.PreferenceActivity; -import android.support.annotation.LayoutRes; -import android.support.annotation.Nullable; -import android.support.v7.app.ActionBar; -import android.support.v7.app.AppCompatDelegate; -import android.support.v7.widget.Toolbar; import android.view.MenuInflater; import android.view.View; import android.view.ViewGroup; +import androidx.annotation.LayoutRes; +import androidx.annotation.Nullable; +import androidx.appcompat.app.ActionBar; +import androidx.appcompat.app.AppCompatDelegate; +import androidx.appcompat.widget.Toolbar; + /** * A {@link PreferenceActivity} which implements and proxies the necessary calls * to be used with AppCompat. diff --git a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MainActivity.java b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MainActivity.java index b72d72df47a3c6d769559230185c50823276fe85..afb261dcf2afb2a2eaebf58c8c1f30f89200c902 100644 --- a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MainActivity.java +++ b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MainActivity.java @@ -3,23 +3,22 @@ package com.baidu.paddle.lite.demo.ocr; import android.Manifest; import android.app.ProgressDialog; import android.content.ContentResolver; +import android.content.Context; import android.content.Intent; import android.content.SharedPreferences; import android.content.pm.PackageManager; import android.database.Cursor; import android.graphics.Bitmap; import android.graphics.BitmapFactory; +import android.media.ExifInterface; import android.net.Uri; import android.os.Bundle; +import android.os.Environment; import android.os.Handler; import android.os.HandlerThread; import android.os.Message; import android.preference.PreferenceManager; import android.provider.MediaStore; -import android.support.annotation.NonNull; -import android.support.v4.app.ActivityCompat; -import android.support.v4.content.ContextCompat; -import android.support.v7.app.AppCompatActivity; import android.text.method.ScrollingMovementMethod; import android.util.Log; import android.view.Menu; @@ -29,9 +28,17 @@ import android.widget.ImageView; import android.widget.TextView; import android.widget.Toast; +import androidx.annotation.NonNull; +import androidx.appcompat.app.AppCompatActivity; +import androidx.core.app.ActivityCompat; +import androidx.core.content.ContextCompat; +import androidx.core.content.FileProvider; + import java.io.File; import java.io.IOException; import java.io.InputStream; +import java.text.SimpleDateFormat; +import java.util.Date; public class MainActivity extends AppCompatActivity { private static final String TAG = MainActivity.class.getSimpleName(); @@ -69,6 +76,7 @@ public class MainActivity extends AppCompatActivity { protected float[] inputMean = new float[]{}; protected float[] inputStd = new float[]{}; protected float scoreThreshold = 0.1f; + private String currentPhotoPath; protected Predictor predictor = new Predictor(); @@ -368,18 +376,56 @@ public class MainActivity extends AppCompatActivity { } private void takePhoto() { - Intent takePhotoIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE); - if (takePhotoIntent.resolveActivity(getPackageManager()) != null) { - startActivityForResult(takePhotoIntent, TAKE_PHOTO_REQUEST_CODE); + Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE); + // Ensure that there's a camera activity to handle the intent + if (takePictureIntent.resolveActivity(getPackageManager()) != null) { + // Create the File where the photo should go + File photoFile = null; + try { + photoFile = createImageFile(); + } catch (IOException ex) { + Log.e("MainActitity", ex.getMessage(), ex); + Toast.makeText(MainActivity.this, + "Create Camera temp file failed: " + ex.getMessage(), Toast.LENGTH_SHORT).show(); + } + // Continue only if the File was successfully created + if (photoFile != null) { + Log.i(TAG, "FILEPATH " + getExternalFilesDir("Pictures").getAbsolutePath()); + Uri photoURI = FileProvider.getUriForFile(this, + "com.baidu.paddle.lite.demo.ocr.fileprovider", + photoFile); + currentPhotoPath = photoFile.getAbsolutePath(); + takePictureIntent.putExtra(MediaStore.EXTRA_OUTPUT, photoURI); + startActivityForResult(takePictureIntent, TAKE_PHOTO_REQUEST_CODE); + Log.i(TAG, "startActivityForResult finished"); + } } + + } + + private File createImageFile() throws IOException { + // Create an image file name + String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date()); + String imageFileName = "JPEG_" + timeStamp + "_"; + File storageDir = getExternalFilesDir(Environment.DIRECTORY_PICTURES); + File image = File.createTempFile( + imageFileName, /* prefix */ + ".bmp", /* suffix */ + storageDir /* directory */ + ); + + return image; } @Override protected void onActivityResult(int requestCode, int resultCode, Intent data) { super.onActivityResult(requestCode, resultCode, data); - if (resultCode == RESULT_OK && data != null) { + if (resultCode == RESULT_OK) { switch (requestCode) { case OPEN_GALLERY_REQUEST_CODE: + if (data == null) { + break; + } try { ContentResolver resolver = getContentResolver(); Uri uri = data.getData(); @@ -393,9 +439,22 @@ public class MainActivity extends AppCompatActivity { } break; case TAKE_PHOTO_REQUEST_CODE: - Bundle extras = data.getExtras(); - Bitmap image = (Bitmap) extras.get("data"); - onImageChanged(image); + if (currentPhotoPath != null) { + ExifInterface exif = null; + try { + exif = new ExifInterface(currentPhotoPath); + } catch (IOException e) { + e.printStackTrace(); + } + int orientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, + ExifInterface.ORIENTATION_UNDEFINED); + Log.i(TAG, "rotation " + orientation); + Bitmap image = BitmapFactory.decodeFile(currentPhotoPath); + image = Utils.rotateBitmap(image, orientation); + onImageChanged(image); + } else { + Log.e(TAG, "currentPhotoPath is null"); + } break; default: break; diff --git a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MiniActivity.java b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MiniActivity.java new file mode 100644 index 0000000000000000000000000000000000000000..d5608911db2e043657eb01b9e8e92fe9b79c99b7 --- /dev/null +++ b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/MiniActivity.java @@ -0,0 +1,157 @@ +package com.baidu.paddle.lite.demo.ocr; + +import android.graphics.Bitmap; +import android.graphics.BitmapFactory; +import android.os.Build; +import android.os.Bundle; +import android.os.Handler; +import android.os.HandlerThread; +import android.os.Message; +import android.util.Log; +import android.view.View; +import android.widget.Button; +import android.widget.ImageView; +import android.widget.TextView; +import android.widget.Toast; + +import androidx.appcompat.app.AppCompatActivity; + +import java.io.IOException; +import java.io.InputStream; + +public class MiniActivity extends AppCompatActivity { + + + public static final int REQUEST_LOAD_MODEL = 0; + public static final int REQUEST_RUN_MODEL = 1; + public static final int REQUEST_UNLOAD_MODEL = 2; + public static final int RESPONSE_LOAD_MODEL_SUCCESSED = 0; + public static final int RESPONSE_LOAD_MODEL_FAILED = 1; + public static final int RESPONSE_RUN_MODEL_SUCCESSED = 2; + public static final int RESPONSE_RUN_MODEL_FAILED = 3; + + private static final String TAG = "MiniActivity"; + + protected Handler receiver = null; // Receive messages from worker thread + protected Handler sender = null; // Send command to worker thread + protected HandlerThread worker = null; // Worker thread to load&run model + protected volatile Predictor predictor = null; + + private String assetModelDirPath = "models/ocr_v1_for_cpu"; + private String assetlabelFilePath = "labels/ppocr_keys_v1.txt"; + + private Button button; + private ImageView imageView; // image result + private TextView textView; // text result + + @Override + protected void onCreate(Bundle savedInstanceState) { + super.onCreate(savedInstanceState); + setContentView(R.layout.activity_mini); + + Log.i(TAG, "SHOW in Logcat"); + + // Prepare the worker thread for mode loading and inference + worker = new HandlerThread("Predictor Worker"); + worker.start(); + sender = new Handler(worker.getLooper()) { + public void handleMessage(Message msg) { + switch (msg.what) { + case REQUEST_LOAD_MODEL: + // Load model and reload test image + if (!onLoadModel()) { + runOnUiThread(new Runnable() { + @Override + public void run() { + Toast.makeText(MiniActivity.this, "Load model failed!", Toast.LENGTH_SHORT).show(); + } + }); + } + break; + case REQUEST_RUN_MODEL: + // Run model if model is loaded + final boolean isSuccessed = onRunModel(); + runOnUiThread(new Runnable() { + @Override + public void run() { + if (isSuccessed){ + onRunModelSuccessed(); + }else{ + Toast.makeText(MiniActivity.this, "Run model failed!", Toast.LENGTH_SHORT).show(); + } + } + }); + break; + } + } + }; + sender.sendEmptyMessage(REQUEST_LOAD_MODEL); // corresponding to REQUEST_LOAD_MODEL, to call onLoadModel() + + imageView = findViewById(R.id.imageView); + textView = findViewById(R.id.sample_text); + button = findViewById(R.id.button); + button.setOnClickListener(new View.OnClickListener() { + @Override + public void onClick(View v) { + sender.sendEmptyMessage(REQUEST_RUN_MODEL); + } + }); + + + } + + @Override + protected void onDestroy() { + onUnloadModel(); + if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN_MR2) { + worker.quitSafely(); + } else { + worker.quit(); + } + super.onDestroy(); + } + + /** + * call in onCreate, model init + * + * @return + */ + private boolean onLoadModel() { + if (predictor == null) { + predictor = new Predictor(); + } + return predictor.init(this, assetModelDirPath, assetlabelFilePath); + } + + /** + * init engine + * call in onCreate + * + * @return + */ + private boolean onRunModel() { + try { + String assetImagePath = "images/5.jpg"; + InputStream imageStream = getAssets().open(assetImagePath); + Bitmap image = BitmapFactory.decodeStream(imageStream); + // Input is Bitmap + predictor.setInputImage(image); + return predictor.isLoaded() && predictor.runModel(); + } catch (IOException e) { + e.printStackTrace(); + return false; + } + } + + private void onRunModelSuccessed() { + Log.i(TAG, "onRunModelSuccessed"); + textView.setText(predictor.outputResult); + imageView.setImageBitmap(predictor.outputImage); + } + + private void onUnloadModel() { + if (predictor != null) { + predictor.releaseModel(); + } + } +} diff --git a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/OCRPredictorNative.java b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/OCRPredictorNative.java index 103d5d37aec3ddc026d48a202df17b140e3e4533..7499d4b92689645c0b1009256884733d392ff68d 100644 --- a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/OCRPredictorNative.java +++ b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/OCRPredictorNative.java @@ -29,16 +29,16 @@ public class OCRPredictorNative { public OCRPredictorNative(Config config) { this.config = config; loadLibrary(); - nativePointer = init(config.detModelFilename, config.recModelFilename, + nativePointer = init(config.detModelFilename, config.recModelFilename,config.clsModelFilename, config.cpuThreadNum, config.cpuPower); Log.i("OCRPredictorNative", "load success " + nativePointer); } - public void release(){ - if (nativePointer != 0){ + public void release() { + if (nativePointer != 0) { nativePointer = 0; - destory(nativePointer); +// destory(nativePointer); } } @@ -55,10 +55,11 @@ public class OCRPredictorNative { public String cpuPower; public String detModelFilename; public String recModelFilename; + public String clsModelFilename; } - protected native long init(String detModelPath, String recModelPath, int threadNum, String cpuMode); + protected native long init(String detModelPath, String recModelPath,String clsModelPath, int threadNum, String cpuMode); protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage); diff --git a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/Predictor.java b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/Predictor.java index d491481e7b61bca8043c34a65ecb3bbf6a72487d..ddf69ab481618696189a7d0d45264791267e5631 100644 --- a/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/Predictor.java +++ b/deploy/android_demo/app/src/main/java/com/baidu/paddle/lite/demo/ocr/Predictor.java @@ -38,7 +38,7 @@ public class Predictor { protected float scoreThreshold = 0.1f; protected Bitmap inputImage = null; protected Bitmap outputImage = null; - protected String outputResult = ""; + protected volatile String outputResult = ""; protected float preprocessTime = 0; protected float postprocessTime = 0; @@ -46,6 +46,16 @@ public class Predictor { public Predictor() { } + public boolean init(Context appCtx, String modelPath, String labelPath) { + isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode); + if (!isLoaded) { + return false; + } + isLoaded = loadLabel(appCtx, labelPath); + return isLoaded; + } + + public boolean init(Context appCtx, String modelPath, String labelPath, int cpuThreadNum, String cpuPowerMode, String inputColorFormat, long[] inputShape, float[] inputMean, @@ -76,11 +86,7 @@ public class Predictor { Log.e(TAG, "Only BGR color format is supported."); return false; } - isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode); - if (!isLoaded) { - return false; - } - isLoaded = loadLabel(appCtx, labelPath); + boolean isLoaded = init(appCtx, modelPath, labelPath); if (!isLoaded) { return false; } @@ -115,7 +121,8 @@ public class Predictor { config.cpuThreadNum = cpuThreadNum; config.detModelFilename = realPath + File.separator + "ch_det_mv3_db_opt.nb"; config.recModelFilename = realPath + File.separator + "ch_rec_mv3_crnn_opt.nb"; - Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename); + config.clsModelFilename = realPath + File.separator + "cls_opt_arm.nb"; + Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename + ";" + config.clsModelFilename); config.cpuPower = cpuPowerMode; paddlePredictor = new OCRPredictorNative(config); @@ -127,12 +134,12 @@ public class Predictor { } public void releaseModel() { - if (paddlePredictor != null){ + if (paddlePredictor != null) { paddlePredictor.release(); paddlePredictor = null; } isLoaded = false; - cpuThreadNum = 4; + cpuThreadNum = 1; cpuPowerMode = "LITE_POWER_HIGH"; modelPath = ""; modelName = ""; @@ -222,7 +229,7 @@ public class Predictor { for (int i = 0; i < warmupIterNum; i++) { paddlePredictor.runImage(inputData, width, height, channels, inputImage); } - warmupIterNum = 0; // 之后不要再warm了 + warmupIterNum = 0; // do not need warm // Run inference start = new Date(); ArrayList results = paddlePredictor.runImage(inputData, width, height, channels, inputImage); @@ -287,9 +294,7 @@ public class Predictor { if (image == null) { return; } - // Scale image to the size of input tensor - Bitmap rgbaImage = image.copy(Bitmap.Config.ARGB_8888, true); - this.inputImage = rgbaImage; + this.inputImage = image.copy(Bitmap.Config.ARGB_8888, true); } private ArrayList postprocess(ArrayList results) { @@ -310,7 +315,7 @@ public class Predictor { private void drawResults(ArrayList results) { StringBuffer outputResultSb = new StringBuffer(""); - for (int i=0;i - - \ No newline at end of file + \ No newline at end of file diff --git a/deploy/android_demo/app/src/main/res/layout/activity_mini.xml b/deploy/android_demo/app/src/main/res/layout/activity_mini.xml new file mode 100644 index 0000000000000000000000000000000000000000..ec4622ae5c21334769a0ef7f084f73a3ac6a05ab --- /dev/null +++ b/deploy/android_demo/app/src/main/res/layout/activity_mini.xml @@ -0,0 +1,46 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/deploy/ios_demo/ocr_demo/BoxLayer.h b/deploy/ios_demo/ocr_demo/BoxLayer.h new file mode 100644 index 0000000000000000000000000000000000000000..9a3cc10ed43ed86319a92d50f1cbb5b5906e9c81 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/BoxLayer.h @@ -0,0 +1,20 @@ +// +// Created by chenxiaoyu on 2018/5/5. +// Copyright (c) 2018 baidu. All rights reserved. +// + +#import +#import +#import "OcrData.h" + + +@interface BoxLayer : CAShapeLayer + +/** + * 绘制OCR的结果 + */ +-(void) renderOcrPolygon: (OcrData *)data withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth withLabel:(bool) withLabel; + + + +@end diff --git a/deploy/ios_demo/ocr_demo/BoxLayer.m b/deploy/ios_demo/ocr_demo/BoxLayer.m new file mode 100644 index 0000000000000000000000000000000000000000..9ad0c4ee004ccaa7fedb70ad6acdf17f08f464df --- /dev/null +++ b/deploy/ios_demo/ocr_demo/BoxLayer.m @@ -0,0 +1,80 @@ +// +// Created by chenxiaoyu on 2018/5/5. +// Copyright (c) 2018 baidu. All rights reserved. +// + +#include "BoxLayer.h" +#import "Helpers.h" + +@implementation BoxLayer { + +} + +#define MAIN_COLOR UIColorFromRGB(0x3B85F5) +- (void)renderOcrPolygon:(OcrData *)d withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth withLabel:(bool)withLabel { + + if ([d.polygonPoints count] != 4) { + NSLog(@"poloygonPoints size is not 4"); + return; + } + + CGPoint startPoint = [d.polygonPoints[0] CGPointValue]; + NSString *text = d.label; + + CGFloat x = startPoint.x * originWidth; + CGFloat y = startPoint.y * originHeight; + CGFloat width = originWidth - x; + CGFloat height = originHeight - y; + + + UIFont *font = [UIFont systemFontOfSize:16]; + NSDictionary *attrs = @{ +// NSStrokeColorAttributeName: [UIColor blackColor], + NSForegroundColorAttributeName: [UIColor whiteColor], +// NSStrokeWidthAttributeName : @((float) -6.0), + NSFontAttributeName: font + }; + + + if (withLabel) { + NSAttributedString *displayStr = [[NSAttributedString alloc] initWithString:text attributes:attrs]; + CATextLayer *textLayer = [[CATextLayer alloc] init]; + textLayer.wrapped = YES; + textLayer.string = displayStr; + textLayer.frame = CGRectMake(x + 2, y + 2, width, height); + textLayer.contentsScale = [[UIScreen mainScreen] scale]; + + // 加阴影显得有点乱 +// textLayer.shadowColor = [MAIN_COLOR CGColor]; +// textLayer.shadowOffset = CGSizeMake(2.0, 2.0); +// textLayer.shadowOpacity = 0.8; +// textLayer.shadowRadius = 0.0; + + [self addSublayer:textLayer]; + } + + + UIBezierPath *path = [UIBezierPath new]; + + + [path moveToPoint:CGPointMake(startPoint.x * originWidth, startPoint.y * originHeight)]; + for (NSValue *val in d.polygonPoints) { + CGPoint p = [val CGPointValue]; + [path addLineToPoint:CGPointMake(p.x * originWidth, p.y * originHeight)]; + } + [path closePath]; + + self.path = path.CGPath; + self.strokeColor = MAIN_COLOR.CGColor; + self.lineWidth = 2.0; + self.fillColor = [MAIN_COLOR colorWithAlphaComponent:0.2].CGColor; + self.lineJoin = kCALineJoinBevel; + +} + +- (void)renderSingleBox:(OcrData *)data withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth { + [self renderOcrPolygon:data withHeight:originHeight withWidth:originWidth withLabel:YES]; +} + + +@end diff --git a/deploy/ios_demo/ocr_demo/Helpers.h b/deploy/ios_demo/ocr_demo/Helpers.h new file mode 100644 index 0000000000000000000000000000000000000000..2437472f5f63a8e3b6d91683d89f1ea164b28ca5 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/Helpers.h @@ -0,0 +1,31 @@ +// +// Helpers.h +// EasyDLDemo +// +// Created by chenxiaoyu on 2018/5/14. +// Copyright © 2018年 baidu. All rights reserved. +// + +#import +#import + +#define UIColorFromRGB(rgbValue) \ +[UIColor colorWithRed:((float)((rgbValue & 0xFF0000) >> 16))/255.0 \ +green:((float)((rgbValue & 0x00FF00) >> 8))/255.0 \ +blue:((float)((rgbValue & 0x0000FF) >> 0))/255.0 \ +alpha:1.0] + +#define SCREEN_HEIGHT [UIScreen mainScreen].bounds.size.height +#define SCREEN_WIDTH [UIScreen mainScreen].bounds.size.width + +#define HIGHLIGHT_COLOR UIColorFromRGB(0xF5A623) + +//#define BTN_HIGHTLIGH_TEXT_COLOR UIColorFromRGB(0xF5A623) + + +@interface Helpers : NSObject { + + +} + +@end diff --git a/deploy/ios_demo/ocr_demo/Helpers.m b/deploy/ios_demo/ocr_demo/Helpers.m new file mode 100644 index 0000000000000000000000000000000000000000..13191d4fb07ec166377237a099bf4db43b0d88d4 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/Helpers.m @@ -0,0 +1,17 @@ +// +// Helpers.m +// EasyDLDemo +// +// Created by chenxiaoyu on 2018/5/14. +// Copyright © 2018年 baidu. All rights reserved. +// + +#import "Helpers.h" + + + +@implementation Helpers + + + +@end diff --git a/deploy/ios_demo/ocr_demo/Info.plist b/deploy/ios_demo/ocr_demo/Info.plist new file mode 100644 index 0000000000000000000000000000000000000000..bcdffa3ce899276654fd83d9fea9aee3d57eb737 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/Info.plist @@ -0,0 +1,49 @@ + + + + + CFBundleDevelopmentRegion + $(DEVELOPMENT_LANGUAGE) + CFBundleDisplayName + $(PRODUCT_NAME) + CFBundleExecutable + $(EXECUTABLE_NAME) + CFBundleIdentifier + $(PRODUCT_BUNDLE_IDENTIFIER) + CFBundleInfoDictionaryVersion + 6.0 + CFBundleName + $(PRODUCT_NAME) + CFBundlePackageType + APPL + CFBundleShortVersionString + 0.1 + CFBundleVersion + 1 + LSRequiresIPhoneOS + + NSCameraUsageDescription + for test + UILaunchStoryboardName + LaunchScreen + UIMainStoryboardFile + Main + UIRequiredDeviceCapabilities + + armv7 + + UISupportedInterfaceOrientations + + UIInterfaceOrientationPortrait + UIInterfaceOrientationLandscapeLeft + UIInterfaceOrientationLandscapeRight + + UISupportedInterfaceOrientations~ipad + + UIInterfaceOrientationPortrait + UIInterfaceOrientationPortraitUpsideDown + UIInterfaceOrientationLandscapeLeft + UIInterfaceOrientationLandscapeRight + + + diff --git a/deploy/ios_demo/ocr_demo/OcrData.h b/deploy/ios_demo/ocr_demo/OcrData.h new file mode 100644 index 0000000000000000000000000000000000000000..b6f397b9426974c6c1b9c1289de77abd8da52132 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/OcrData.h @@ -0,0 +1,15 @@ +// +// Created by Lv,Xiangxiang on 2020/7/11. +// Copyright (c) 2020 Li,Xiaoyang(SYS). All rights reserved. +// + +#import + + +@interface OcrData : NSObject +@property(nonatomic, copy) NSString *label; +@property(nonatomic) int category; +@property(nonatomic) float accuracy; +@property(nonatomic) NSArray *polygonPoints; + +@end \ No newline at end of file diff --git a/deploy/ios_demo/ocr_demo/OcrData.m b/deploy/ios_demo/ocr_demo/OcrData.m new file mode 100644 index 0000000000000000000000000000000000000000..d5850dd411fad58a87a05ffd845c5344ecf82aa3 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/OcrData.m @@ -0,0 +1,12 @@ +// +// Created by Lv,Xiangxiang on 2020/7/11. +// Copyright (c) 2020 Li,Xiaoyang(SYS). All rights reserved. +// + +#import "OcrData.h" + + +@implementation OcrData { + +} +@end \ No newline at end of file diff --git a/deploy/ios_demo/ocr_demo/ViewController.h b/deploy/ios_demo/ocr_demo/ViewController.h new file mode 100644 index 0000000000000000000000000000000000000000..f51b6a684d99c69d768a4bacfca90d495abc1305 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/ViewController.h @@ -0,0 +1,16 @@ +// +// ViewController.h +// seg_demo +// +// Created by Li,Xiaoyang(SYS) on 2018/11/13. +// Copyright © 2018年 Li,Xiaoyang(SYS). All rights reserved. +// + +#import + + +@interface ViewController : UIViewController + + +@end + diff --git a/deploy/ios_demo/ocr_demo/ViewController.mm b/deploy/ios_demo/ocr_demo/ViewController.mm new file mode 100644 index 0000000000000000000000000000000000000000..175d0192c96104c7aea4d45b6d560a67928955c8 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/ViewController.mm @@ -0,0 +1,536 @@ +// +// Created by lvxiangxiang on 2020/7/10. +// Copyright (c) 2020 baidu. All rights reserved. +// + +#import +#import +#import +//#import +#import "ViewController.h" +#import "BoxLayer.h" + +#include "include/paddle_api.h" +#include "timer.h" +#import "pdocr/ocr_db_post_process.h" +#import "pdocr/ocr_crnn_process.h" + +using namespace paddle::lite_api; +using namespace cv; + +struct Object { + int batch_id; + cv::Rect rec; + int class_id; + float prob; +}; + +std::mutex mtx; +std::shared_ptr net_ocr1; +std::shared_ptr net_ocr2; +Timer tic; +long long count = 0; + +double tensor_mean(const Tensor &tin) { + auto shape = tin.shape(); + int64_t size = 1; + for (int i = 0; i < shape.size(); i++) { + size *= shape[i]; + } + double mean = 0.; + auto ptr = tin.data(); + for (int i = 0; i < size; i++) { + mean += ptr[i]; + } + return mean / size; +} + +cv::Mat resize_img_type0(const cv::Mat &img, int max_size_len, float *ratio_h, float *ratio_w) { + int w = img.cols; + int h = img.rows; + + float ratio = 1.f; + int max_wh = w >= h ? w : h; + if (max_wh > max_size_len) { + if (h > w) { + ratio = float(max_size_len) / float(h); + } else { + ratio = float(max_size_len) / float(w); + } + } + + int resize_h = int(float(h) * ratio); + int resize_w = int(float(w) * ratio); + if (resize_h % 32 == 0) + resize_h = resize_h; + else if (resize_h / 32 < 1) + resize_h = 32; + else + resize_h = (resize_h / 32 - 1) * 32; + + if (resize_w % 32 == 0) + resize_w = resize_w; + else if (resize_w / 32 < 1) + resize_w = 32; + else + resize_w = (resize_w / 32 - 1) * 32; + + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, resize_h)); + + *ratio_h = float(resize_h) / float(h); + *ratio_w = float(resize_w) / float(w); + return resize_img; +} + +void neon_mean_scale(const float *din, float *dout, int size, std::vector mean, std::vector scale) { + float32x4_t vmean0 = vdupq_n_f32(mean[0]); + float32x4_t vmean1 = vdupq_n_f32(mean[1]); + float32x4_t vmean2 = vdupq_n_f32(mean[2]); + float32x4_t vscale0 = vdupq_n_f32(1.f / scale[0]); + float32x4_t vscale1 = vdupq_n_f32(1.f / scale[1]); + float32x4_t vscale2 = vdupq_n_f32(1.f / scale[2]); + + float *dout_c0 = dout; + float *dout_c1 = dout + size; + float *dout_c2 = dout + size * 2; + + int i = 0; + for (; i < size - 3; i += 4) { + float32x4x3_t vin3 = vld3q_f32(din); + float32x4_t vsub0 = vsubq_f32(vin3.val[0], vmean0); + float32x4_t vsub1 = vsubq_f32(vin3.val[1], vmean1); + float32x4_t vsub2 = vsubq_f32(vin3.val[2], vmean2); + float32x4_t vs0 = vmulq_f32(vsub0, vscale0); + float32x4_t vs1 = vmulq_f32(vsub1, vscale1); + float32x4_t vs2 = vmulq_f32(vsub2, vscale2); + vst1q_f32(dout_c0, vs0); + vst1q_f32(dout_c1, vs1); + vst1q_f32(dout_c2, vs2); + + din += 12; + dout_c0 += 4; + dout_c1 += 4; + dout_c2 += 4; + } + for (; i < size; i++) { + *(dout_c0++) = (*(din++) - mean[0]) / scale[0]; + *(dout_c1++) = (*(din++) - mean[1]) / scale[1]; + *(dout_c2++) = (*(din++) - mean[2]) / scale[2]; + } +} + + +// fill tensor with mean and scale, neon speed up +void fill_tensor_with_cvmat(const Mat &img_in, Tensor &tout, int width, int height, + std::vector mean, std::vector scale, bool is_scale) { + if (img_in.channels() == 4) { + cv::cvtColor(img_in, img_in, CV_RGBA2RGB); + } + cv::Mat im; + cv::resize(img_in, im, cv::Size(width, height), 0.f, 0.f); + cv::Mat imgf; + float scale_factor = is_scale ? 1 / 255.f : 1.f; + im.convertTo(imgf, CV_32FC3, scale_factor); + const float *dimg = reinterpret_cast(imgf.data); + float *dout = tout.mutable_data(); + neon_mean_scale(dimg, dout, width * height, mean, scale); +} + +std::vector detect_object(const float *data, + int count, + const std::vector> &lod, + const float thresh, + Mat &image) { + std::vector rect_out; + const float *dout = data; + for (int iw = 0; iw < count; iw++) { + int oriw = image.cols; + int orih = image.rows; + if (dout[1] > thresh && static_cast(dout[0]) > 0) { + Object obj; + int x = static_cast(dout[2] * oriw); + int y = static_cast(dout[3] * orih); + int w = static_cast(dout[4] * oriw) - x; + int h = static_cast(dout[5] * orih) - y; + cv::Rect rec_clip = cv::Rect(x, y, w, h) & cv::Rect(0, 0, image.cols, image.rows); + obj.batch_id = 0; + obj.class_id = static_cast(dout[0]); + obj.prob = dout[1]; + obj.rec = rec_clip; + if (w > 0 && h > 0 && obj.prob <= 1) { + rect_out.push_back(obj); + cv::rectangle(image, rec_clip, cv::Scalar(255, 0, 0)); + } + } + dout += 6; + } + return rect_out; +} + +@interface ViewController () +@property(weak, nonatomic) IBOutlet UIImageView *imageView; +@property(weak, nonatomic) IBOutlet UISwitch *flag_process; +@property(weak, nonatomic) IBOutlet UISwitch *flag_video; +@property(weak, nonatomic) IBOutlet UIImageView *preView; +@property(weak, nonatomic) IBOutlet UISwitch *flag_back_cam; +@property(weak, nonatomic) IBOutlet UILabel *result; +@property(nonatomic, strong) CvVideoCamera *videoCamera; +@property(nonatomic, strong) UIImage *image; +@property(nonatomic) bool flag_init; +@property(nonatomic) bool flag_cap_photo; +@property(nonatomic) std::vector scale; +@property(nonatomic) std::vector mean; +@property(nonatomic) NSArray *labels; +@property(nonatomic) cv::Mat cvimg; +@property(nonatomic, strong) UIImage *ui_img_test; +@property(strong, nonatomic) CALayer *boxLayer; + +@end + +@implementation ViewController +@synthesize imageView; + +- (OcrData *)paddleOcrRec:(cv::Mat)image { + + OcrData *result = [OcrData new]; + + std::vector mean = {0.5f, 0.5f, 0.5f}; + std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; + + cv::Mat crop_img; + image.copyTo(crop_img); + cv::Mat resize_img; + + + float wh_ratio = float(crop_img.cols) / float(crop_img.rows); + + resize_img = crnn_resize_img(crop_img, wh_ratio); + resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f); + + const float *dimg = reinterpret_cast(resize_img.data); + + std::unique_ptr input_tensor0(std::move(net_ocr2->GetInput(0))); + input_tensor0->Resize({1, 3, resize_img.rows, resize_img.cols}); + auto *data0 = input_tensor0->mutable_data(); + + neon_mean_scale(dimg, data0, resize_img.rows * resize_img.cols, mean, scale); + + //// Run CRNN predictor + net_ocr2->Run(); + + // Get output and run postprocess + std::unique_ptr output_tensor0(std::move(net_ocr2->GetOutput(0))); + auto *rec_idx = output_tensor0->data(); + + auto rec_idx_lod = output_tensor0->lod(); + auto shape_out = output_tensor0->shape(); + NSMutableString *text = [[NSMutableString alloc] init]; + for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) { + if (rec_idx[n] >= self.labels.count) { + std::cout << "Index " << rec_idx[n] << " out of text dict range!" << std::endl; + continue; + } + [text appendString:self.labels[rec_idx[n]]]; + } + + result.label = text; + // get score + std::unique_ptr output_tensor1(std::move(net_ocr2->GetOutput(1))); + auto *predict_batch = output_tensor1->data(); + auto predict_shape = output_tensor1->shape(); + + auto predict_lod = output_tensor1->lod(); + + int argmax_idx; + int blank = predict_shape[1]; + float score = 0.f; + int count = 0; + float max_value = 0.0f; + + for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) { + argmax_idx = int(argmax(&predict_batch[n * predict_shape[1]], &predict_batch[(n + 1) * predict_shape[1]])); + max_value = float(*std::max_element(&predict_batch[n * predict_shape[1]], &predict_batch[(n + 1) * predict_shape[1]])); + + if (blank - 1 - argmax_idx > 1e-5) { + score += max_value; + count += 1; + } + + } + score /= count; + result.accuracy = score; + return result; +} +- (NSArray *) ocr_infer:(cv::Mat) originImage{ + int max_side_len = 960; + float ratio_h{}; + float ratio_w{}; + cv::Mat image; + cv::cvtColor(originImage, image, cv::COLOR_RGB2BGR); + + cv::Mat img; + image.copyTo(img); + + img = resize_img_type0(img, max_side_len, &ratio_h, &ratio_w); + cv::Mat img_fp; + img.convertTo(img_fp, CV_32FC3, 1.0 / 255.f); + + std::unique_ptr input_tensor(net_ocr1->GetInput(0)); + input_tensor->Resize({1, 3, img_fp.rows, img_fp.cols}); + auto *data0 = input_tensor->mutable_data(); + const float *dimg = reinterpret_cast(img_fp.data); + neon_mean_scale(dimg, data0, img_fp.rows * img_fp.cols, self.mean, self.scale); + tic.clear(); + tic.start(); + net_ocr1->Run(); + std::unique_ptr output_tensor(std::move(net_ocr1->GetOutput(0))); + auto *outptr = output_tensor->data(); + auto shape_out = output_tensor->shape(); + + int64_t out_numl = 1; + double sum = 0; + for (auto i : shape_out) { + out_numl *= i; + } + + int s2 = int(shape_out[2]); + int s3 = int(shape_out[3]); + + cv::Mat pred_map = cv::Mat::zeros(s2, s3, CV_32F); + memcpy(pred_map.data, outptr, s2 * s3 * sizeof(float)); + cv::Mat cbuf_map; + pred_map.convertTo(cbuf_map, CV_8UC1, 255.0f); + + const double threshold = 0.1 * 255; + const double maxvalue = 255; + cv::Mat bit_map; + cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); + + auto boxes = boxes_from_bitmap(pred_map, bit_map); + + std::vector>> filter_boxes = filter_tag_det_res(boxes, ratio_h, ratio_w, image); + + + cv::Point rook_points[filter_boxes.size()][4]; + + for (int n = 0; n < filter_boxes.size(); n++) { + for (int m = 0; m < filter_boxes[0].size(); m++) { + rook_points[n][m] = cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1])); + } + } + + NSMutableArray *result = [[NSMutableArray alloc] init]; + + for (int i = 0; i < filter_boxes.size(); i++) { + cv::Mat crop_img; + crop_img = get_rotate_crop_image(image, filter_boxes[i]); + OcrData *r = [self paddleOcrRec:crop_img ]; + NSMutableArray *points = [NSMutableArray new]; + for (int jj = 0; jj < 4; ++jj) { + NSValue *v = [NSValue valueWithCGPoint:CGPointMake( + rook_points[i][jj].x / CGFloat(originImage.cols), + rook_points[i][jj].y / CGFloat(originImage.rows))]; + [points addObject:v]; + } + r.polygonPoints = points; + [result addObject:r]; + } + NSArray* rec_out =[[result reverseObjectEnumerator] allObjects]; + tic.end(); + std::cout<<"infer time: "<(config); + MobileConfig config2; + config2.set_model_from_file(model2_path_str); + net_ocr2 = CreatePaddlePredictor(config2); + + + cv::Mat originImage; + UIImageToMat(self.image, originImage); + NSArray *rec_out = [self ocr_infer:originImage]; + [_boxLayer.sublayers makeObjectsPerformSelector:@selector(removeFromSuperlayer)]; + std::cout<_cvimg, CV_RGBA2RGB); + } + auto rec_out =[self ocr_infer:self->_cvimg]; + std::ostringstream result; + NSInteger cnt = 0; + [_boxLayer.sublayers makeObjectsPerformSelector:@selector(removeFromSuperlayer)]; + + CGFloat h = _boxLayer.frame.size.height; + CGFloat w = _boxLayer.frame.size.width; + for (id obj in rec_out) { + OcrData *data = obj; + BoxLayer *singleBox = [[BoxLayer alloc] init]; + [singleBox renderOcrPolygon:data withHeight:h withWidth:w withLabel:YES]; + [_boxLayer addSublayer:singleBox]; + result<<[data.label UTF8String] <<","<_cvimg, self->_cvimg, CV_RGB2BGR); + self.imageView.image = MatToUIImage(self->_cvimg); + } + } + } + }); +} + +- (void)didReceiveMemoryWarning { + [super didReceiveMemoryWarning]; + // Dispose of any resources that can be recreated. +} + +@end diff --git a/deploy/ios_demo/ocr_demo/label_list.txt b/deploy/ios_demo/ocr_demo/label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..84b885d8352226e49b1d5d791b8f43a663e246aa --- /dev/null +++ b/deploy/ios_demo/ocr_demo/label_list.txt @@ -0,0 +1,6623 @@ +' +疗 +绚 +诚 +娇 +溜 +题 +贿 +者 +廖 +更 +纳 +加 +奉 +公 +一 +就 +汴 +计 +与 +路 +房 +原 +妇 +2 +0 +8 +- +7 +其 +> +: +] +, +, +骑 +刈 +全 +消 +昏 +傈 +安 +久 +钟 +嗅 +不 +影 +处 +驽 +蜿 +资 +关 +椤 +地 +瘸 +专 +问 +忖 +票 +嫉 +炎 +韵 +要 +月 +田 +节 +陂 +鄙 +捌 +备 +拳 +伺 +眼 +网 +盎 +大 +傍 +心 +东 +愉 +汇 +蹿 +科 +每 +业 +里 +航 +晏 +字 +平 +录 +先 +1 +3 +彤 +鲶 +产 +稍 +督 +腴 +有 +象 +岳 +注 +绍 +在 +泺 +文 +定 +核 +名 +水 +过 +理 +让 +偷 +率 +等 +这 +发 +” +为 +含 +肥 +酉 +相 +鄱 +七 +编 +猥 +锛 +日 +镀 +蒂 +掰 +倒 +辆 +栾 +栗 +综 +涩 +州 +雌 +滑 +馀 +了 +机 +块 +司 +宰 +甙 +兴 +矽 +抚 +保 +用 +沧 +秩 +如 +收 +息 +滥 +页 +疑 +埠 +! +! +姥 +异 +橹 +钇 +向 +下 +跄 +的 +椴 +沫 +国 +绥 +獠 +报 +开 +民 +蜇 +何 +分 +凇 +长 +讥 +藏 +掏 +施 +羽 +中 +讲 +派 +嘟 +人 +提 +浼 +间 +世 +而 +古 +多 +倪 +唇 +饯 +控 +庚 +首 +赛 +蜓 +味 +断 +制 +觉 +技 +替 +艰 +溢 +潮 +夕 +钺 +外 +摘 +枋 +动 +双 +单 +啮 +户 +枇 +确 +锦 +曜 +杜 +或 +能 +效 +霜 +盒 +然 +侗 +电 +晁 +放 +步 +鹃 +新 +杖 +蜂 +吒 +濂 +瞬 +评 +总 +隍 +对 +独 +合 +也 +是 +府 +青 +天 +诲 +墙 +组 +滴 +级 +邀 +帘 +示 +已 +时 +骸 +仄 +泅 +和 +遨 +店 +雇 +疫 +持 +巍 +踮 +境 +只 +亨 +目 +鉴 +崤 +闲 +体 +泄 +杂 +作 +般 +轰 +化 +解 +迂 +诿 +蛭 +璀 +腾 +告 +版 +服 +省 +师 +小 +规 +程 +线 +海 +办 +引 +二 +桧 +牌 +砺 +洄 +裴 +修 +图 +痫 +胡 +许 +犊 +事 +郛 +基 +柴 +呼 +食 +研 +奶 +律 +蛋 +因 +葆 +察 +戏 +褒 +戒 +再 +李 +骁 +工 +貂 +油 +鹅 +章 +啄 +休 +场 +给 +睡 +纷 +豆 +器 +捎 +说 +敏 +学 +会 +浒 +设 +诊 +格 +廓 +查 +来 +霓 +室 +溆 +¢ +诡 +寥 +焕 +舜 +柒 +狐 +回 +戟 +砾 +厄 +实 +翩 +尿 +五 +入 +径 +惭 +喹 +股 +宇 +篝 +| +; +美 +期 +云 +九 +祺 +扮 +靠 +锝 +槌 +系 +企 +酰 +阊 +暂 +蚕 +忻 +豁 +本 +羹 +执 +条 +钦 +H +獒 +限 +进 +季 +楦 +于 +芘 +玖 +铋 +茯 +未 +答 +粘 +括 +样 +精 +欠 +矢 +甥 +帷 +嵩 +扣 +令 +仔 +风 +皈 +行 +支 +部 +蓉 +刮 +站 +蜡 +救 +钊 +汗 +松 +嫌 +成 +可 +. +鹤 +院 +从 +交 +政 +怕 +活 +调 +球 +局 +验 +髌 +第 +韫 +谗 +串 +到 +圆 +年 +米 +/ +* +友 +忿 +检 +区 +看 +自 +敢 +刃 +个 +兹 +弄 +流 +留 +同 +没 +齿 +星 +聆 +轼 +湖 +什 +三 +建 +蛔 +儿 +椋 +汕 +震 +颧 +鲤 +跟 +力 +情 +璺 +铨 +陪 +务 +指 +族 +训 +滦 +鄣 +濮 +扒 +商 +箱 +十 +召 +慷 +辗 +所 +莞 +管 +护 +臭 +横 +硒 +嗓 +接 +侦 +六 +露 +党 +馋 +驾 +剖 +高 +侬 +妪 +幂 +猗 +绺 +骐 +央 +酐 +孝 +筝 +课 +徇 +缰 +门 +男 +西 +项 +句 +谙 +瞒 +秃 +篇 +教 +碲 +罚 +声 +呐 +景 +前 +富 +嘴 +鳌 +稀 +免 +朋 +啬 +睐 +去 +赈 +鱼 +住 +肩 +愕 +速 +旁 +波 +厅 +健 +茼 +厥 +鲟 +谅 +投 +攸 +炔 +数 +方 +击 +呋 +谈 +绩 +别 +愫 +僚 +躬 +鹧 +胪 +炳 +招 +喇 +膨 +泵 +蹦 +毛 +结 +5 +4 +谱 +识 +陕 +粽 +婚 +拟 +构 +且 +搜 +任 +潘 +比 +郢 +妨 +醪 +陀 +桔 +碘 +扎 +选 +哈 +骷 +楷 +亿 +明 +缆 +脯 +监 +睫 +逻 +婵 +共 +赴 +淝 +凡 +惦 +及 +达 +揖 +谩 +澹 +减 +焰 +蛹 +番 +祁 +柏 +员 +禄 +怡 +峤 +龙 +白 +叽 +生 +闯 +起 +细 +装 +谕 +竟 +聚 +钙 +上 +导 +渊 +按 +艾 +辘 +挡 +耒 +盹 +饪 +臀 +记 +邮 +蕙 +受 +各 +医 +搂 +普 +滇 +朗 +茸 +带 +翻 +酚 +( +光 +堤 +墟 +蔷 +万 +幻 +〓 +瑙 +辈 +昧 +盏 +亘 +蛀 +吉 +铰 +请 +子 +假 +闻 +税 +井 +诩 +哨 +嫂 +好 +面 +琐 +校 +馊 +鬣 +缂 +营 +访 +炖 +占 +农 +缀 +否 +经 +钚 +棵 +趟 +张 +亟 +吏 +茶 +谨 +捻 +论 +迸 +堂 +玉 +信 +吧 +瞠 +乡 +姬 +寺 +咬 +溏 +苄 +皿 +意 +赉 +宝 +尔 +钰 +艺 +特 +唳 +踉 +都 +荣 +倚 +登 +荐 +丧 +奇 +涵 +批 +炭 +近 +符 +傩 +感 +道 +着 +菊 +虹 +仲 +众 +懈 +濯 +颞 +眺 +南 +释 +北 +缝 +标 +既 +茗 +整 +撼 +迤 +贲 +挎 +耱 +拒 +某 +妍 +卫 +哇 +英 +矶 +藩 +治 +他 +元 +领 +膜 +遮 +穗 +蛾 +飞 +荒 +棺 +劫 +么 +市 +火 +温 +拈 +棚 +洼 +转 +果 +奕 +卸 +迪 +伸 +泳 +斗 +邡 +侄 +涨 +屯 +萋 +胭 +氡 +崮 +枞 +惧 +冒 +彩 +斜 +手 +豚 +随 +旭 +淑 +妞 +形 +菌 +吲 +沱 +争 +驯 +歹 +挟 +兆 +柱 +传 +至 +包 +内 +响 +临 +红 +功 +弩 +衡 +寂 +禁 +老 +棍 +耆 +渍 +织 +害 +氵 +渑 +布 +载 +靥 +嗬 +虽 +苹 +咨 +娄 +库 +雉 +榜 +帜 +嘲 +套 +瑚 +亲 +簸 +欧 +边 +6 +腿 +旮 +抛 +吹 +瞳 +得 +镓 +梗 +厨 +继 +漾 +愣 +憨 +士 +策 +窑 +抑 +躯 +襟 +脏 +参 +贸 +言 +干 +绸 +鳄 +穷 +藜 +音 +折 +详 +) +举 +悍 +甸 +癌 +黎 +谴 +死 +罩 +迁 +寒 +驷 +袖 +媒 +蒋 +掘 +模 +纠 +恣 +观 +祖 +蛆 +碍 +位 +稿 +主 +澧 +跌 +筏 +京 +锏 +帝 +贴 +证 +糠 +才 +黄 +鲸 +略 +炯 +饱 +四 +出 +园 +犀 +牧 +容 +汉 +杆 +浈 +汰 +瑷 +造 +虫 +瘩 +怪 +驴 +济 +应 +花 +沣 +谔 +夙 +旅 +价 +矿 +以 +考 +s +u +呦 +晒 +巡 +茅 +准 +肟 +瓴 +詹 +仟 +褂 +译 +桌 +混 +宁 +怦 +郑 +抿 +些 +余 +鄂 +饴 +攒 +珑 +群 +阖 +岔 +琨 +藓 +预 +环 +洮 +岌 +宀 +杲 +瀵 +最 +常 +囡 +周 +踊 +女 +鼓 +袭 +喉 +简 +范 +薯 +遐 +疏 +粱 +黜 +禧 +法 +箔 +斤 +遥 +汝 +奥 +直 +贞 +撑 +置 +绱 +集 +她 +馅 +逗 +钧 +橱 +魉 +[ +恙 +躁 +唤 +9 +旺 +膘 +待 +脾 +惫 +购 +吗 +依 +盲 +度 +瘿 +蠖 +俾 +之 +镗 +拇 +鲵 +厝 +簧 +续 +款 +展 +啃 +表 +剔 +品 +钻 +腭 +损 +清 +锶 +统 +涌 +寸 +滨 +贪 +链 +吠 +冈 +伎 +迥 +咏 +吁 +览 +防 +迅 +失 +汾 +阔 +逵 +绀 +蔑 +列 +川 +凭 +努 +熨 +揪 +利 +俱 +绉 +抢 +鸨 +我 +即 +责 +膦 +易 +毓 +鹊 +刹 +玷 +岿 +空 +嘞 +绊 +排 +术 +估 +锷 +违 +们 +苟 +铜 +播 +肘 +件 +烫 +审 +鲂 +广 +像 +铌 +惰 +铟 +巳 +胍 +鲍 +康 +憧 +色 +恢 +想 +拷 +尤 +疳 +知 +S +Y +F +D +A +峄 +裕 +帮 +握 +搔 +氐 +氘 +难 +墒 +沮 +雨 +叁 +缥 +悴 +藐 +湫 +娟 +苑 +稠 +颛 +簇 +后 +阕 +闭 +蕤 +缚 +怎 +佞 +码 +嘤 +蔡 +痊 +舱 +螯 +帕 +赫 +昵 +升 +烬 +岫 +、 +疵 +蜻 +髁 +蕨 +隶 +烛 +械 +丑 +盂 +梁 +强 +鲛 +由 +拘 +揉 +劭 +龟 +撤 +钩 +呕 +孛 +费 +妻 +漂 +求 +阑 +崖 +秤 +甘 +通 +深 +补 +赃 +坎 +床 +啪 +承 +吼 +量 +暇 +钼 +烨 +阂 +擎 +脱 +逮 +称 +P +神 +属 +矗 +华 +届 +狍 +葑 +汹 +育 +患 +窒 +蛰 +佼 +静 +槎 +运 +鳗 +庆 +逝 +曼 +疱 +克 +代 +官 +此 +麸 +耧 +蚌 +晟 +例 +础 +榛 +副 +测 +唰 +缢 +迹 +灬 +霁 +身 +岁 +赭 +扛 +又 +菡 +乜 +雾 +板 +读 +陷 +徉 +贯 +郁 +虑 +变 +钓 +菜 +圾 +现 +琢 +式 +乐 +维 +渔 +浜 +左 +吾 +脑 +钡 +警 +T +啵 +拴 +偌 +漱 +湿 +硕 +止 +骼 +魄 +积 +燥 +联 +踢 +玛 +则 +窿 +见 +振 +畿 +送 +班 +钽 +您 +赵 +刨 +印 +讨 +踝 +籍 +谡 +舌 +崧 +汽 +蔽 +沪 +酥 +绒 +怖 +财 +帖 +肱 +私 +莎 +勋 +羔 +霸 +励 +哼 +帐 +将 +帅 +渠 +纪 +婴 +娩 +岭 +厘 +滕 +吻 +伤 +坝 +冠 +戊 +隆 +瘁 +介 +涧 +物 +黍 +并 +姗 +奢 +蹑 +掣 +垸 +锴 +命 +箍 +捉 +病 +辖 +琰 +眭 +迩 +艘 +绌 +繁 +寅 +若 +毋 +思 +诉 +类 +诈 +燮 +轲 +酮 +狂 +重 +反 +职 +筱 +县 +委 +磕 +绣 +奖 +晋 +濉 +志 +徽 +肠 +呈 +獐 +坻 +口 +片 +碰 +几 +村 +柿 +劳 +料 +获 +亩 +惕 +晕 +厌 +号 +罢 +池 +正 +鏖 +煨 +家 +棕 +复 +尝 +懋 +蜥 +锅 +岛 +扰 +队 +坠 +瘾 +钬 +@ +卧 +疣 +镇 +譬 +冰 +彷 +频 +黯 +据 +垄 +采 +八 +缪 +瘫 +型 +熹 +砰 +楠 +襁 +箐 +但 +嘶 +绳 +啤 +拍 +盥 +穆 +傲 +洗 +盯 +塘 +怔 +筛 +丿 +台 +恒 +喂 +葛 +永 +¥ +烟 +酒 +桦 +书 +砂 +蚝 +缉 +态 +瀚 +袄 +圳 +轻 +蛛 +超 +榧 +遛 +姒 +奘 +铮 +右 +荽 +望 +偻 +卡 +丶 +氰 +附 +做 +革 +索 +戚 +坨 +桷 +唁 +垅 +榻 +岐 +偎 +坛 +莨 +山 +殊 +微 +骇 +陈 +爨 +推 +嗝 +驹 +澡 +藁 +呤 +卤 +嘻 +糅 +逛 +侵 +郓 +酌 +德 +摇 +※ +鬃 +被 +慨 +殡 +羸 +昌 +泡 +戛 +鞋 +河 +宪 +沿 +玲 +鲨 +翅 +哽 +源 +铅 +语 +照 +邯 +址 +荃 +佬 +顺 +鸳 +町 +霭 +睾 +瓢 +夸 +椁 +晓 +酿 +痈 +咔 +侏 +券 +噎 +湍 +签 +嚷 +离 +午 +尚 +社 +锤 +背 +孟 +使 +浪 +缦 +潍 +鞅 +军 +姹 +驶 +笑 +鳟 +鲁 +》 +孽 +钜 +绿 +洱 +礴 +焯 +椰 +颖 +囔 +乌 +孔 +巴 +互 +性 +椽 +哞 +聘 +昨 +早 +暮 +胶 +炀 +隧 +低 +彗 +昝 +铁 +呓 +氽 +藉 +喔 +癖 +瑗 +姨 +权 +胱 +韦 +堑 +蜜 +酋 +楝 +砝 +毁 +靓 +歙 +锲 +究 +屋 +喳 +骨 +辨 +碑 +武 +鸠 +宫 +辜 +烊 +适 +坡 +殃 +培 +佩 +供 +走 +蜈 +迟 +翼 +况 +姣 +凛 +浔 +吃 +飘 +债 +犟 +金 +促 +苛 +崇 +坂 +莳 +畔 +绂 +兵 +蠕 +斋 +根 +砍 +亢 +欢 +恬 +崔 +剁 +餐 +榫 +快 +扶 +‖ +濒 +缠 +鳜 +当 +彭 +驭 +浦 +篮 +昀 +锆 +秸 +钳 +弋 +娣 +瞑 +夷 +龛 +苫 +拱 +致 +% +嵊 +障 +隐 +弑 +初 +娓 +抉 +汩 +累 +蓖 +" +唬 +助 +苓 +昙 +押 +毙 +破 +城 +郧 +逢 +嚏 +獭 +瞻 +溱 +婿 +赊 +跨 +恼 +璧 +萃 +姻 +貉 +灵 +炉 +密 +氛 +陶 +砸 +谬 +衔 +点 +琛 +沛 +枳 +层 +岱 +诺 +脍 +榈 +埂 +征 +冷 +裁 +打 +蹴 +素 +瘘 +逞 +蛐 +聊 +激 +腱 +萘 +踵 +飒 +蓟 +吆 +取 +咙 +簋 +涓 +矩 +曝 +挺 +揣 +座 +你 +史 +舵 +焱 +尘 +苏 +笈 +脚 +溉 +榨 +诵 +樊 +邓 +焊 +义 +庶 +儋 +蟋 +蒲 +赦 +呷 +杞 +诠 +豪 +还 +试 +颓 +茉 +太 +除 +紫 +逃 +痴 +草 +充 +鳕 +珉 +祗 +墨 +渭 +烩 +蘸 +慕 +璇 +镶 +穴 +嵘 +恶 +骂 +险 +绋 +幕 +碉 +肺 +戳 +刘 +潞 +秣 +纾 +潜 +銮 +洛 +须 +罘 +销 +瘪 +汞 +兮 +屉 +r +林 +厕 +质 +探 +划 +狸 +殚 +善 +煊 +烹 +〒 +锈 +逯 +宸 +辍 +泱 +柚 +袍 +远 +蹋 +嶙 +绝 +峥 +娥 +缍 +雀 +徵 +认 +镱 +谷 += +贩 +勉 +撩 +鄯 +斐 +洋 +非 +祚 +泾 +诒 +饿 +撬 +威 +晷 +搭 +芍 +锥 +笺 +蓦 +候 +琊 +档 +礁 +沼 +卵 +荠 +忑 +朝 +凹 +瑞 +头 +仪 +弧 +孵 +畏 +铆 +突 +衲 +车 +浩 +气 +茂 +悖 +厢 +枕 +酝 +戴 +湾 +邹 +飚 +攘 +锂 +写 +宵 +翁 +岷 +无 +喜 +丈 +挑 +嗟 +绛 +殉 +议 +槽 +具 +醇 +淞 +笃 +郴 +阅 +饼 +底 +壕 +砚 +弈 +询 +缕 +庹 +翟 +零 +筷 +暨 +舟 +闺 +甯 +撞 +麂 +茌 +蔼 +很 +珲 +捕 +棠 +角 +阉 +媛 +娲 +诽 +剿 +尉 +爵 +睬 +韩 +诰 +匣 +危 +糍 +镯 +立 +浏 +阳 +少 +盆 +舔 +擘 +匪 +申 +尬 +铣 +旯 +抖 +赘 +瓯 +居 +ˇ +哮 +游 +锭 +茏 +歌 +坏 +甚 +秒 +舞 +沙 +仗 +劲 +潺 +阿 +燧 +郭 +嗖 +霏 +忠 +材 +奂 +耐 +跺 +砀 +输 +岖 +媳 +氟 +极 +摆 +灿 +今 +扔 +腻 +枝 +奎 +药 +熄 +吨 +话 +q +额 +慑 +嘌 +协 +喀 +壳 +埭 +视 +著 +於 +愧 +陲 +翌 +峁 +颅 +佛 +腹 +聋 +侯 +咎 +叟 +秀 +颇 +存 +较 +罪 +哄 +岗 +扫 +栏 +钾 +羌 +己 +璨 +枭 +霉 +煌 +涸 +衿 +键 +镝 +益 +岢 +奏 +连 +夯 +睿 +冥 +均 +糖 +狞 +蹊 +稻 +爸 +刿 +胥 +煜 +丽 +肿 +璃 +掸 +跚 +灾 +垂 +樾 +濑 +乎 +莲 +窄 +犹 +撮 +战 +馄 +软 +络 +显 +鸢 +胸 +宾 +妲 +恕 +埔 +蝌 +份 +遇 +巧 +瞟 +粒 +恰 +剥 +桡 +博 +讯 +凯 +堇 +阶 +滤 +卖 +斌 +骚 +彬 +兑 +磺 +樱 +舷 +两 +娱 +福 +仃 +差 +找 +桁 +÷ +净 +把 +阴 +污 +戬 +雷 +碓 +蕲 +楚 +罡 +焖 +抽 +妫 +咒 +仑 +闱 +尽 +邑 +菁 +爱 +贷 +沥 +鞑 +牡 +嗉 +崴 +骤 +塌 +嗦 +订 +拮 +滓 +捡 +锻 +次 +坪 +杩 +臃 +箬 +融 +珂 +鹗 +宗 +枚 +降 +鸬 +妯 +阄 +堰 +盐 +毅 +必 +杨 +崃 +俺 +甬 +状 +莘 +货 +耸 +菱 +腼 +铸 +唏 +痤 +孚 +澳 +懒 +溅 +翘 +疙 +杷 +淼 +缙 +骰 +喊 +悉 +砻 +坷 +艇 +赁 +界 +谤 +纣 +宴 +晃 +茹 +归 +饭 +梢 +铡 +街 +抄 +肼 +鬟 +苯 +颂 +撷 +戈 +炒 +咆 +茭 +瘙 +负 +仰 +客 +琉 +铢 +封 +卑 +珥 +椿 +镧 +窨 +鬲 +寿 +御 +袤 +铃 +萎 +砖 +餮 +脒 +裳 +肪 +孕 +嫣 +馗 +嵇 +恳 +氯 +江 +石 +褶 +冢 +祸 +阻 +狈 +羞 +银 +靳 +透 +咳 +叼 +敷 +芷 +啥 +它 +瓤 +兰 +痘 +懊 +逑 +肌 +往 +捺 +坊 +甩 +呻 +〃 +沦 +忘 +膻 +祟 +菅 +剧 +崆 +智 +坯 +臧 +霍 +墅 +攻 +眯 +倘 +拢 +骠 +铐 +庭 +岙 +瓠 +′ +缺 +泥 +迢 +捶 +? +? +郏 +喙 +掷 +沌 +纯 +秘 +种 +听 +绘 +固 +螨 +团 +香 +盗 +妒 +埚 +蓝 +拖 +旱 +荞 +铀 +血 +遏 +汲 +辰 +叩 +拽 +幅 +硬 +惶 +桀 +漠 +措 +泼 +唑 +齐 +肾 +念 +酱 +虚 +屁 +耶 +旗 +砦 +闵 +婉 +馆 +拭 +绅 +韧 +忏 +窝 +醋 +葺 +顾 +辞 +倜 +堆 +辋 +逆 +玟 +贱 +疾 +董 +惘 +倌 +锕 +淘 +嘀 +莽 +俭 +笏 +绑 +鲷 +杈 +择 +蟀 +粥 +嗯 +驰 +逾 +案 +谪 +褓 +胫 +哩 +昕 +颚 +鲢 +绠 +躺 +鹄 +崂 +儒 +俨 +丝 +尕 +泌 +啊 +萸 +彰 +幺 +吟 +骄 +苣 +弦 +脊 +瑰 +〈 +诛 +镁 +析 +闪 +剪 +侧 +哟 +框 +螃 +守 +嬗 +燕 +狭 +铈 +缮 +概 +迳 +痧 +鲲 +俯 +售 +笼 +痣 +扉 +挖 +满 +咋 +援 +邱 +扇 +歪 +便 +玑 +绦 +峡 +蛇 +叨 +〖 +泽 +胃 +斓 +喋 +怂 +坟 +猪 +该 +蚬 +炕 +弥 +赞 +棣 +晔 +娠 +挲 +狡 +创 +疖 +铕 +镭 +稷 +挫 +弭 +啾 +翔 +粉 +履 +苘 +哦 +楼 +秕 +铂 +土 +锣 +瘟 +挣 +栉 +习 +享 +桢 +袅 +磨 +桂 +谦 +延 +坚 +蔚 +噗 +署 +谟 +猬 +钎 +恐 +嬉 +雒 +倦 +衅 +亏 +璩 +睹 +刻 +殿 +王 +算 +雕 +麻 +丘 +柯 +骆 +丸 +塍 +谚 +添 +鲈 +垓 +桎 +蚯 +芥 +予 +飕 +镦 +谌 +窗 +醚 +菀 +亮 +搪 +莺 +蒿 +羁 +足 +J +真 +轶 +悬 +衷 +靛 +翊 +掩 +哒 +炅 +掐 +冼 +妮 +l +谐 +稚 +荆 +擒 +犯 +陵 +虏 +浓 +崽 +刍 +陌 +傻 +孜 +千 +靖 +演 +矜 +钕 +煽 +杰 +酗 +渗 +伞 +栋 +俗 +泫 +戍 +罕 +沾 +疽 +灏 +煦 +芬 +磴 +叱 +阱 +榉 +湃 +蜀 +叉 +醒 +彪 +租 +郡 +篷 +屎 +良 +垢 +隗 +弱 +陨 +峪 +砷 +掴 +颁 +胎 +雯 +绵 +贬 +沐 +撵 +隘 +篙 +暖 +曹 +陡 +栓 +填 +臼 +彦 +瓶 +琪 +潼 +哪 +鸡 +摩 +啦 +俟 +锋 +域 +耻 +蔫 +疯 +纹 +撇 +毒 +绶 +痛 +酯 +忍 +爪 +赳 +歆 +嘹 +辕 +烈 +册 +朴 +钱 +吮 +毯 +癜 +娃 +谀 +邵 +厮 +炽 +璞 +邃 +丐 +追 +词 +瓒 +忆 +轧 +芫 +谯 +喷 +弟 +半 +冕 +裙 +掖 +墉 +绮 +寝 +苔 +势 +顷 +褥 +切 +衮 +君 +佳 +嫒 +蚩 +霞 +佚 +洙 +逊 +镖 +暹 +唛 +& +殒 +顶 +碗 +獗 +轭 +铺 +蛊 +废 +恹 +汨 +崩 +珍 +那 +杵 +曲 +纺 +夏 +薰 +傀 +闳 +淬 +姘 +舀 +拧 +卷 +楂 +恍 +讪 +厩 +寮 +篪 +赓 +乘 +灭 +盅 +鞣 +沟 +慎 +挂 +饺 +鼾 +杳 +树 +缨 +丛 +絮 +娌 +臻 +嗳 +篡 +侩 +述 +衰 +矛 +圈 +蚜 +匕 +筹 +匿 +濞 +晨 +叶 +骋 +郝 +挚 +蚴 +滞 +增 +侍 +描 +瓣 +吖 +嫦 +蟒 +匾 +圣 +赌 +毡 +癞 +恺 +百 +曳 +需 +篓 +肮 +庖 +帏 +卿 +驿 +遗 +蹬 +鬓 +骡 +歉 +芎 +胳 +屐 +禽 +烦 +晌 +寄 +媾 +狄 +翡 +苒 +船 +廉 +终 +痞 +殇 +々 +畦 +饶 +改 +拆 +悻 +萄 +£ +瓿 +乃 +訾 +桅 +匮 +溧 +拥 +纱 +铍 +骗 +蕃 +龋 +缬 +父 +佐 +疚 +栎 +醍 +掳 +蓄 +x +惆 +颜 +鲆 +榆 +〔 +猎 +敌 +暴 +谥 +鲫 +贾 +罗 +玻 +缄 +扦 +芪 +癣 +落 +徒 +臾 +恿 +猩 +托 +邴 +肄 +牵 +春 +陛 +耀 +刊 +拓 +蓓 +邳 +堕 +寇 +枉 +淌 +啡 +湄 +兽 +酷 +萼 +碚 +濠 +萤 +夹 +旬 +戮 +梭 +琥 +椭 +昔 +勺 +蜊 +绐 +晚 +孺 +僵 +宣 +摄 +冽 +旨 +萌 +忙 +蚤 +眉 +噼 +蟑 +付 +契 +瓜 +悼 +颡 +壁 +曾 +窕 +颢 +澎 +仿 +俑 +浑 +嵌 +浣 +乍 +碌 +褪 +乱 +蔟 +隙 +玩 +剐 +葫 +箫 +纲 +围 +伐 +决 +伙 +漩 +瑟 +刑 +肓 +镳 +缓 +蹭 +氨 +皓 +典 +畲 +坍 +铑 +檐 +塑 +洞 +倬 +储 +胴 +淳 +戾 +吐 +灼 +惺 +妙 +毕 +珐 +缈 +虱 +盖 +羰 +鸿 +磅 +谓 +髅 +娴 +苴 +唷 +蚣 +霹 +抨 +贤 +唠 +犬 +誓 +逍 +庠 +逼 +麓 +籼 +釉 +呜 +碧 +秧 +氩 +摔 +霄 +穸 +纨 +辟 +妈 +映 +完 +牛 +缴 +嗷 +炊 +恩 +荔 +茆 +掉 +紊 +慌 +莓 +羟 +阙 +萁 +磐 +另 +蕹 +辱 +鳐 +湮 +吡 +吩 +唐 +睦 +垠 +舒 +圜 +冗 +瞿 +溺 +芾 +囱 +匠 +僳 +汐 +菩 +饬 +漓 +黑 +霰 +浸 +濡 +窥 +毂 +蒡 +兢 +驻 +鹉 +芮 +诙 +迫 +雳 +厂 +忐 +臆 +猴 +鸣 +蚪 +栈 +箕 +羡 +渐 +莆 +捍 +眈 +哓 +趴 +蹼 +埕 +嚣 +骛 +宏 +淄 +斑 +噜 +严 +瑛 +垃 +椎 +诱 +压 +庾 +绞 +焘 +廿 +抡 +迄 +棘 +夫 +纬 +锹 +眨 +瞌 +侠 +脐 +竞 +瀑 +孳 +骧 +遁 +姜 +颦 +荪 +滚 +萦 +伪 +逸 +粳 +爬 +锁 +矣 +役 +趣 +洒 +颔 +诏 +逐 +奸 +甭 +惠 +攀 +蹄 +泛 +尼 +拼 +阮 +鹰 +亚 +颈 +惑 +勒 +〉 +际 +肛 +爷 +刚 +钨 +丰 +养 +冶 +鲽 +辉 +蔻 +画 +覆 +皴 +妊 +麦 +返 +醉 +皂 +擀 +〗 +酶 +凑 +粹 +悟 +诀 +硖 +港 +卜 +z +杀 +涕 +± +舍 +铠 +抵 +弛 +段 +敝 +镐 +奠 +拂 +轴 +跛 +袱 +e +t +沉 +菇 +俎 +薪 +峦 +秭 +蟹 +历 +盟 +菠 +寡 +液 +肢 +喻 +染 +裱 +悱 +抱 +氙 +赤 +捅 +猛 +跑 +氮 +谣 +仁 +尺 +辊 +窍 +烙 +衍 +架 +擦 +倏 +璐 +瑁 +币 +楞 +胖 +夔 +趸 +邛 +惴 +饕 +虔 +蝎 +§ +哉 +贝 +宽 +辫 +炮 +扩 +饲 +籽 +魏 +菟 +锰 +伍 +猝 +末 +琳 +哚 +蛎 +邂 +呀 +姿 +鄞 +却 +歧 +仙 +恸 +椐 +森 +牒 +寤 +袒 +婆 +虢 +雅 +钉 +朵 +贼 +欲 +苞 +寰 +故 +龚 +坭 +嘘 +咫 +礼 +硷 +兀 +睢 +汶 +’ +铲 +烧 +绕 +诃 +浃 +钿 +哺 +柜 +讼 +颊 +璁 +腔 +洽 +咐 +脲 +簌 +筠 +镣 +玮 +鞠 +谁 +兼 +姆 +挥 +梯 +蝴 +谘 +漕 +刷 +躏 +宦 +弼 +b +垌 +劈 +麟 +莉 +揭 +笙 +渎 +仕 +嗤 +仓 +配 +怏 +抬 +错 +泯 +镊 +孰 +猿 +邪 +仍 +秋 +鼬 +壹 +歇 +吵 +炼 +< +尧 +射 +柬 +廷 +胧 +霾 +凳 +隋 +肚 +浮 +梦 +祥 +株 +堵 +退 +L +鹫 +跎 +凶 +毽 +荟 +炫 +栩 +玳 +甜 +沂 +鹿 +顽 +伯 +爹 +赔 +蛴 +徐 +匡 +欣 +狰 +缸 +雹 +蟆 +疤 +默 +沤 +啜 +痂 +衣 +禅 +w +i +h +辽 +葳 +黝 +钗 +停 +沽 +棒 +馨 +颌 +肉 +吴 +硫 +悯 +劾 +娈 +马 +啧 +吊 +悌 +镑 +峭 +帆 +瀣 +涉 +咸 +疸 +滋 +泣 +翦 +拙 +癸 +钥 +蜒 ++ +尾 +庄 +凝 +泉 +婢 +渴 +谊 +乞 +陆 +锉 +糊 +鸦 +淮 +I +B +N +晦 +弗 +乔 +庥 +葡 +尻 +席 +橡 +傣 +渣 +拿 +惩 +麋 +斛 +缃 +矮 +蛏 +岘 +鸽 +姐 +膏 +催 +奔 +镒 +喱 +蠡 +摧 +钯 +胤 +柠 +拐 +璋 +鸥 +卢 +荡 +倾 +^ +_ +珀 +逄 +萧 +塾 +掇 +贮 +笆 +聂 +圃 +冲 +嵬 +M +滔 +笕 +值 +炙 +偶 +蜱 +搐 +梆 +汪 +蔬 +腑 +鸯 +蹇 +敞 +绯 +仨 +祯 +谆 +梧 +糗 +鑫 +啸 +豺 +囹 +猾 +巢 +柄 +瀛 +筑 +踌 +沭 +暗 +苁 +鱿 +蹉 +脂 +蘖 +牢 +热 +木 +吸 +溃 +宠 +序 +泞 +偿 +拜 +檩 +厚 +朐 +毗 +螳 +吞 +媚 +朽 +担 +蝗 +橘 +畴 +祈 +糟 +盱 +隼 +郜 +惜 +珠 +裨 +铵 +焙 +琚 +唯 +咚 +噪 +骊 +丫 +滢 +勤 +棉 +呸 +咣 +淀 +隔 +蕾 +窈 +饨 +挨 +煅 +短 +匙 +粕 +镜 +赣 +撕 +墩 +酬 +馁 +豌 +颐 +抗 +酣 +氓 +佑 +搁 +哭 +递 +耷 +涡 +桃 +贻 +碣 +截 +瘦 +昭 +镌 +蔓 +氚 +甲 +猕 +蕴 +蓬 +散 +拾 +纛 +狼 +猷 +铎 +埋 +旖 +矾 +讳 +囊 +糜 +迈 +粟 +蚂 +紧 +鲳 +瘢 +栽 +稼 +羊 +锄 +斟 +睁 +桥 +瓮 +蹙 +祉 +醺 +鼻 +昱 +剃 +跳 +篱 +跷 +蒜 +翎 +宅 +晖 +嗑 +壑 +峻 +癫 +屏 +狠 +陋 +袜 +途 +憎 +祀 +莹 +滟 +佶 +溥 +臣 +约 +盛 +峰 +磁 +慵 +婪 +拦 +莅 +朕 +鹦 +粲 +裤 +哎 +疡 +嫖 +琵 +窟 +堪 +谛 +嘉 +儡 +鳝 +斩 +郾 +驸 +酊 +妄 +胜 +贺 +徙 +傅 +噌 +钢 +栅 +庇 +恋 +匝 +巯 +邈 +尸 +锚 +粗 +佟 +蛟 +薹 +纵 +蚊 +郅 +绢 +锐 +苗 +俞 +篆 +淆 +膀 +鲜 +煎 +诶 +秽 +寻 +涮 +刺 +怀 +噶 +巨 +褰 +魅 +灶 +灌 +桉 +藕 +谜 +舸 +薄 +搀 +恽 +借 +牯 +痉 +渥 +愿 +亓 +耘 +杠 +柩 +锔 +蚶 +钣 +珈 +喘 +蹒 +幽 +赐 +稗 +晤 +莱 +泔 +扯 +肯 +菪 +裆 +腩 +豉 +疆 +骜 +腐 +倭 +珏 +唔 +粮 +亡 +润 +慰 +伽 +橄 +玄 +誉 +醐 +胆 +龊 +粼 +塬 +陇 +彼 +削 +嗣 +绾 +芽 +妗 +垭 +瘴 +爽 +薏 +寨 +龈 +泠 +弹 +赢 +漪 +猫 +嘧 +涂 +恤 +圭 +茧 +烽 +屑 +痕 +巾 +赖 +荸 +凰 +腮 +畈 +亵 +蹲 +偃 +苇 +澜 +艮 +换 +骺 +烘 +苕 +梓 +颉 +肇 +哗 +悄 +氤 +涠 +葬 +屠 +鹭 +植 +竺 +佯 +诣 +鲇 +瘀 +鲅 +邦 +移 +滁 +冯 +耕 +癔 +戌 +茬 +沁 +巩 +悠 +湘 +洪 +痹 +锟 +循 +谋 +腕 +鳃 +钠 +捞 +焉 +迎 +碱 +伫 +急 +榷 +奈 +邝 +卯 +辄 +皲 +卟 +醛 +畹 +忧 +稳 +雄 +昼 +缩 +阈 +睑 +扌 +耗 +曦 +涅 +捏 +瞧 +邕 +淖 +漉 +铝 +耦 +禹 +湛 +喽 +莼 +琅 +诸 +苎 +纂 +硅 +始 +嗨 +傥 +燃 +臂 +赅 +嘈 +呆 +贵 +屹 +壮 +肋 +亍 +蚀 +卅 +豹 +腆 +邬 +迭 +浊 +} +童 +螂 +捐 +圩 +勐 +触 +寞 +汊 +壤 +荫 +膺 +渌 +芳 +懿 +遴 +螈 +泰 +蓼 +蛤 +茜 +舅 +枫 +朔 +膝 +眙 +避 +梅 +判 +鹜 +璜 +牍 +缅 +垫 +藻 +黔 +侥 +惚 +懂 +踩 +腰 +腈 +札 +丞 +唾 +慈 +顿 +摹 +荻 +琬 +~ +斧 +沈 +滂 +胁 +胀 +幄 +莜 +Z +匀 +鄄 +掌 +绰 +茎 +焚 +赋 +萱 +谑 +汁 +铒 +瞎 +夺 +蜗 +野 +娆 +冀 +弯 +篁 +懵 +灞 +隽 +芡 +脘 +俐 +辩 +芯 +掺 +喏 +膈 +蝈 +觐 +悚 +踹 +蔗 +熠 +鼠 +呵 +抓 +橼 +峨 +畜 +缔 +禾 +崭 +弃 +熊 +摒 +凸 +拗 +穹 +蒙 +抒 +祛 +劝 +闫 +扳 +阵 +醌 +踪 +喵 +侣 +搬 +仅 +荧 +赎 +蝾 +琦 +买 +婧 +瞄 +寓 +皎 +冻 +赝 +箩 +莫 +瞰 +郊 +笫 +姝 +筒 +枪 +遣 +煸 +袋 +舆 +痱 +涛 +母 +〇 +启 +践 +耙 +绲 +盘 +遂 +昊 +搞 +槿 +诬 +纰 +泓 +惨 +檬 +亻 +越 +C +o +憩 +熵 +祷 +钒 +暧 +塔 +阗 +胰 +咄 +娶 +魔 +琶 +钞 +邻 +扬 +杉 +殴 +咽 +弓 +〆 +髻 +】 +吭 +揽 +霆 +拄 +殖 +脆 +彻 +岩 +芝 +勃 +辣 +剌 +钝 +嘎 +甄 +佘 +皖 +伦 +授 +徕 +憔 +挪 +皇 +庞 +稔 +芜 +踏 +溴 +兖 +卒 +擢 +饥 +鳞 +煲 +‰ +账 +颗 +叻 +斯 +捧 +鳍 +琮 +讹 +蛙 +纽 +谭 +酸 +兔 +莒 +睇 +伟 +觑 +羲 +嗜 +宜 +褐 +旎 +辛 +卦 +诘 +筋 +鎏 +溪 +挛 +熔 +阜 +晰 +鳅 +丢 +奚 +灸 +呱 +献 +陉 +黛 +鸪 +甾 +萨 +疮 +拯 +洲 +疹 +辑 +叙 +恻 +谒 +允 +柔 +烂 +氏 +逅 +漆 +拎 +惋 +扈 +湟 +纭 +啕 +掬 +擞 +哥 +忽 +涤 +鸵 +靡 +郗 +瓷 +扁 +廊 +怨 +雏 +钮 +敦 +E +懦 +憋 +汀 +拚 +啉 +腌 +岸 +f +痼 +瞅 +尊 +咀 +眩 +飙 +忌 +仝 +迦 +熬 +毫 +胯 +篑 +茄 +腺 +凄 +舛 +碴 +锵 +诧 +羯 +後 +漏 +汤 +宓 +仞 +蚁 +壶 +谰 +皑 +铄 +棰 +罔 +辅 +晶 +苦 +牟 +闽 +\ +烃 +饮 +聿 +丙 +蛳 +朱 +煤 +涔 +鳖 +犁 +罐 +荼 +砒 +淦 +妤 +黏 +戎 +孑 +婕 +瑾 +戢 +钵 +枣 +捋 +砥 +衩 +狙 +桠 +稣 +阎 +肃 +梏 +诫 +孪 +昶 +婊 +衫 +嗔 +侃 +塞 +蜃 +樵 +峒 +貌 +屿 +欺 +缫 +阐 +栖 +诟 +珞 +荭 +吝 +萍 +嗽 +恂 +啻 +蜴 +磬 +峋 +俸 +豫 +谎 +徊 +镍 +韬 +魇 +晴 +U +囟 +猜 +蛮 +坐 +囿 +伴 +亭 +肝 +佗 +蝠 +妃 +胞 +滩 +榴 +氖 +垩 +苋 +砣 +扪 +馏 +姓 +轩 +厉 +夥 +侈 +禀 +垒 +岑 +赏 +钛 +辐 +痔 +披 +纸 +碳 +“ +坞 +蠓 +挤 +荥 +沅 +悔 +铧 +帼 +蒌 +蝇 +a +p +y +n +g +哀 +浆 +瑶 +凿 +桶 +馈 +皮 +奴 +苜 +佤 +伶 +晗 +铱 +炬 +优 +弊 +氢 +恃 +甫 +攥 +端 +锌 +灰 +稹 +炝 +曙 +邋 +亥 +眶 +碾 +拉 +萝 +绔 +捷 +浍 +腋 +姑 +菖 +凌 +涞 +麽 +锢 +桨 +潢 +绎 +镰 +殆 +锑 +渝 +铬 +困 +绽 +觎 +匈 +糙 +暑 +裹 +鸟 +盔 +肽 +迷 +綦 +『 +亳 +佝 +俘 +钴 +觇 +骥 +仆 +疝 +跪 +婶 +郯 +瀹 +唉 +脖 +踞 +针 +晾 +忒 +扼 +瞩 +叛 +椒 +疟 +嗡 +邗 +肆 +跆 +玫 +忡 +捣 +咧 +唆 +艄 +蘑 +潦 +笛 +阚 +沸 +泻 +掊 +菽 +贫 +斥 +髂 +孢 +镂 +赂 +麝 +鸾 +屡 +衬 +苷 +恪 +叠 +希 +粤 +爻 +喝 +茫 +惬 +郸 +绻 +庸 +撅 +碟 +宄 +妹 +膛 +叮 +饵 +崛 +嗲 +椅 +冤 +搅 +咕 +敛 +尹 +垦 +闷 +蝉 +霎 +勰 +败 +蓑 +泸 +肤 +鹌 +幌 +焦 +浠 +鞍 +刁 +舰 +乙 +竿 +裔 +。 +茵 +函 +伊 +兄 +丨 +娜 +匍 +謇 +莪 +宥 +似 +蝽 +翳 +酪 +翠 +粑 +薇 +祢 +骏 +赠 +叫 +Q +噤 +噻 +竖 +芗 +莠 +潭 +俊 +羿 +耜 +O +郫 +趁 +嗪 +囚 +蹶 +芒 +洁 +笋 +鹑 +敲 +硝 +啶 +堡 +渲 +揩 +』 +携 +宿 +遒 +颍 +扭 +棱 +割 +萜 +蔸 +葵 +琴 +捂 +饰 +衙 +耿 +掠 +募 +岂 +窖 +涟 +蔺 +瘤 +柞 +瞪 +怜 +匹 +距 +楔 +炜 +哆 +秦 +缎 +幼 +茁 +绪 +痨 +恨 +楸 +娅 +瓦 +桩 +雪 +嬴 +伏 +榔 +妥 +铿 +拌 +眠 +雍 +缇 +‘ +卓 +搓 +哌 +觞 +噩 +屈 +哧 +髓 +咦 +巅 +娑 +侑 +淫 +膳 +祝 +勾 +姊 +莴 +胄 +疃 +薛 +蜷 +胛 +巷 +芙 +芋 +熙 +闰 +勿 +窃 +狱 +剩 +钏 +幢 +陟 +铛 +慧 +靴 +耍 +k +浙 +浇 +飨 +惟 +绗 +祜 +澈 +啼 +咪 +磷 +摞 +诅 +郦 +抹 +跃 +壬 +吕 +肖 +琏 +颤 +尴 +剡 +抠 +凋 +赚 +泊 +津 +宕 +殷 +倔 +氲 +漫 +邺 +涎 +怠 +$ +垮 +荬 +遵 +俏 +叹 +噢 +饽 +蜘 +孙 +筵 +疼 +鞭 +羧 +牦 +箭 +潴 +c +眸 +祭 +髯 +啖 +坳 +愁 +芩 +驮 +倡 +巽 +穰 +沃 +胚 +怒 +凤 +槛 +剂 +趵 +嫁 +v +邢 +灯 +鄢 +桐 +睽 +檗 +锯 +槟 +婷 +嵋 +圻 +诗 +蕈 +颠 +遭 +痢 +芸 +怯 +馥 +竭 +锗 +徜 +恭 +遍 +籁 +剑 +嘱 +苡 +龄 +僧 +桑 +潸 +弘 +澶 +楹 +悲 +讫 +愤 +腥 +悸 +谍 +椹 +呢 +桓 +葭 +攫 +阀 +翰 +躲 +敖 +柑 +郎 +笨 +橇 +呃 +魁 +燎 +脓 +葩 +磋 +垛 +玺 +狮 +沓 +砜 +蕊 +锺 +罹 +蕉 +翱 +虐 +闾 +巫 +旦 +茱 +嬷 +枯 +鹏 +贡 +芹 +汛 +矫 +绁 +拣 +禺 +佃 +讣 +舫 +惯 +乳 +趋 +疲 +挽 +岚 +虾 +衾 +蠹 +蹂 +飓 +氦 +铖 +孩 +稞 +瑜 +壅 +掀 +勘 +妓 +畅 +髋 +W +庐 +牲 +蓿 +榕 +练 +垣 +唱 +邸 +菲 +昆 +婺 +穿 +绡 +麒 +蚱 +掂 +愚 +泷 +涪 +漳 +妩 +娉 +榄 +讷 +觅 +旧 +藤 +煮 +呛 +柳 +腓 +叭 +庵 +烷 +阡 +罂 +蜕 +擂 +猖 +咿 +媲 +脉 +【 +沏 +貅 +黠 +熏 +哲 +烁 +坦 +酵 +兜 +× +潇 +撒 +剽 +珩 +圹 +乾 +摸 +樟 +帽 +嗒 +襄 +魂 +轿 +憬 +锡 +〕 +喃 +皆 +咖 +隅 +脸 +残 +泮 +袂 +鹂 +珊 +囤 +捆 +咤 +误 +徨 +闹 +淙 +芊 +淋 +怆 +囗 +拨 +梳 +渤 +R +G +绨 +蚓 +婀 +幡 +狩 +麾 +谢 +唢 +裸 +旌 +伉 +纶 +裂 +驳 +砼 +咛 +澄 +樨 +蹈 +宙 +澍 +倍 +貔 +操 +勇 +蟠 +摈 +砧 +虬 +够 +缁 +悦 +藿 +撸 +艹 +摁 +淹 +豇 +虎 +榭 +ˉ +吱 +d +° +喧 +荀 +踱 +侮 +奋 +偕 +饷 +犍 +惮 +坑 +璎 +徘 +宛 +妆 +袈 +倩 +窦 +昂 +荏 +乖 +K +怅 +撰 +鳙 +牙 +袁 +酞 +X +痿 +琼 +闸 +雁 +趾 +荚 +虻 +涝 +《 +杏 +韭 +偈 +烤 +绫 +鞘 +卉 +症 +遢 +蓥 +诋 +杭 +荨 +匆 +竣 +簪 +辙 +敕 +虞 +丹 +缭 +咩 +黟 +m +淤 +瑕 +咂 +铉 +硼 +茨 +嶂 +痒 +畸 +敬 +涿 +粪 +窘 +熟 +叔 +嫔 +盾 +忱 +裘 +憾 +梵 +赡 +珙 +咯 +娘 +庙 +溯 +胺 +葱 +痪 +摊 +荷 +卞 +乒 +髦 +寐 +铭 +坩 +胗 +枷 +爆 +溟 +嚼 +羚 +砬 +轨 +惊 +挠 +罄 +竽 +菏 +氧 +浅 +楣 +盼 +枢 +炸 +阆 +杯 +谏 +噬 +淇 +渺 +俪 +秆 +墓 +泪 +跻 +砌 +痰 +垡 +渡 +耽 +釜 +讶 +鳎 +煞 +呗 +韶 +舶 +绷 +鹳 +缜 +旷 +铊 +皱 +龌 +檀 +霖 +奄 +槐 +艳 +蝶 +旋 +哝 +赶 +骞 +蚧 +腊 +盈 +丁 +` +蜚 +矸 +蝙 +睨 +嚓 +僻 +鬼 +醴 +夜 +彝 +磊 +笔 +拔 +栀 +糕 +厦 +邰 +纫 +逭 +纤 +眦 +膊 +馍 +躇 +烯 +蘼 +冬 +诤 +暄 +骶 +哑 +瘠 +」 +臊 +丕 +愈 +咱 +螺 +擅 +跋 +搏 +硪 +谄 +笠 +淡 +嘿 +骅 +谧 +鼎 +皋 +姚 +歼 +蠢 +驼 +耳 +胬 +挝 +涯 +狗 +蒽 +孓 +犷 +凉 +芦 +箴 +铤 +孤 +嘛 +坤 +V +茴 +朦 +挞 +尖 +橙 +诞 +搴 +碇 +洵 +浚 +帚 +蜍 +漯 +柘 +嚎 +讽 +芭 +荤 +咻 +祠 +秉 +跖 +埃 +吓 +糯 +眷 +馒 +惹 +娼 +鲑 +嫩 +讴 +轮 +瞥 +靶 +褚 +乏 +缤 +宋 +帧 +删 +驱 +碎 +扑 +俩 +俄 +偏 +涣 +竹 +噱 +皙 +佰 +渚 +唧 +斡 +# +镉 +刀 +崎 +筐 +佣 +夭 +贰 +肴 +峙 +哔 +艿 +匐 +牺 +镛 +缘 +仡 +嫡 +劣 +枸 +堀 +梨 +簿 +鸭 +蒸 +亦 +稽 +浴 +{ +衢 +束 +槲 +j +阁 +揍 +疥 +棋 +潋 +聪 +窜 +乓 +睛 +插 +冉 +阪 +苍 +搽 +「 +蟾 +螟 +幸 +仇 +樽 +撂 +慢 +跤 +幔 +俚 +淅 +覃 +觊 +溶 +妖 +帛 +侨 +曰 +妾 +泗 +· +: +瀘 +風 +Ë +( +) +∶ +紅 +紗 +瑭 +雲 +頭 +鶏 +財 +許 +• +¥ +樂 +焗 +麗 +— +; +滙 +東 +榮 +繪 +興 +… +門 +業 +π +楊 +國 +顧 +é +盤 +寳 +Λ +龍 +鳳 +島 +誌 +緣 +結 +銭 +萬 +勝 +祎 +璟 +優 +歡 +臨 +時 +購 += +★ +藍 +昇 +鐵 +觀 +勅 +農 +聲 +畫 +兿 +術 +發 +劉 +記 +專 +耑 +園 +書 +壴 +種 +Ο +● +褀 +號 +銀 +匯 +敟 +锘 +葉 +橪 +廣 +進 +蒄 +鑽 +阝 +祙 +貢 +鍋 +豊 +夬 +喆 +團 +閣 +開 +燁 +賓 +館 +酡 +沔 +順 ++ +硚 +劵 +饸 +陽 +車 +湓 +復 +萊 +氣 +軒 +華 +堃 +迮 +纟 +戶 +馬 +學 +裡 +電 +嶽 +獨 +マ +シ +サ +ジ +燘 +袪 +環 +❤ +臺 +灣 +専 +賣 +孖 +聖 +攝 +線 +▪ +α +傢 +俬 +夢 +達 +莊 +喬 +貝 +薩 +劍 +羅 +壓 +棛 +饦 +尃 +璈 +囍 +醫 +G +I +A +# +N +鷄 +髙 +嬰 +啓 +約 +隹 +潔 +賴 +藝 +~ +寶 +籣 +麺 +  +嶺 +√ +義 +網 +峩 +長 +∧ +魚 +機 +構 +② +鳯 +偉 +L +B +㙟 +畵 +鴿 +' +詩 +溝 +嚞 +屌 +藔 +佧 +玥 +蘭 +織 +1 +3 +9 +0 +7 +點 +砭 +鴨 +鋪 +銘 +廳 +弍 +‧ +創 +湯 +坶 +℃ +卩 +骝 +& +烜 +荘 +當 +潤 +扞 +係 +懷 +碶 +钅 +蚨 +讠 +☆ +叢 +爲 +埗 +涫 +塗 +→ +楽 +現 +鯨 +愛 +瑪 +鈺 +忄 +悶 +藥 +飾 +樓 +視 +孬 +ㆍ +燚 +苪 +師 +① +丼 +锽 +│ +韓 +標 +è +兒 +閏 +匋 +張 +漢 +Ü +髪 +會 +閑 +檔 +習 +裝 +の +峯 +菘 +輝 +И +雞 +釣 +億 +浐 +K +O +R +8 +H +E +P +T +W +D +S +C +M +F +姌 +饹 +» +晞 +廰 +ä +嵯 +鷹 +負 +飲 +絲 +冚 +楗 +澤 +綫 +區 +❋ +← +質 +靑 +揚 +③ +滬 +統 +産 +協 +﹑ +乸 +畐 +經 +運 +際 +洺 +岽 +為 +粵 +諾 +崋 +豐 +碁 +ɔ +V +2 +6 +齋 +誠 +訂 +´ +勑 +雙 +陳 +無 +í +泩 +媄 +夌 +刂 +i +c +t +o +r +a +嘢 +耄 +燴 +暃 +壽 +媽 +靈 +抻 +體 +唻 +É +冮 +甹 +鎮 +錦 +ʌ +蜛 +蠄 +尓 +駕 +戀 +飬 +逹 +倫 +貴 +極 +Я +Й +寬 +磚 +嶪 +郎 +職 +| +間 +n +d +剎 +伈 +課 +飛 +橋 +瘊 +№ +譜 +骓 +圗 +滘 +縣 +粿 +咅 +養 +濤 +彳 +® +% +Ⅱ +啰 +㴪 +見 +矞 +薬 +糁 +邨 +鲮 +顔 +罱 +З +選 +話 +贏 +氪 +俵 +競 +瑩 +繡 +枱 +β +綉 +á +獅 +爾 +™ +麵 +戋 +淩 +徳 +個 +劇 +場 +務 +簡 +寵 +h +實 +膠 +轱 +圖 +築 +嘣 +樹 +㸃 +營 +耵 +孫 +饃 +鄺 +飯 +麯 +遠 +輸 +坫 +孃 +乚 +閃 +鏢 +㎡ +題 +廠 +關 +↑ +爺 +將 +軍 +連 +篦 +覌 +參 +箸 +- +窠 +棽 +寕 +夀 +爰 +歐 +呙 +閥 +頡 +熱 +雎 +垟 +裟 +凬 +勁 +帑 +馕 +夆 +疌 +枼 +馮 +貨 +蒤 +樸 +彧 +旸 +靜 +龢 +暢 +㐱 +鳥 +珺 +鏡 +灡 +爭 +堷 +廚 +Ó +騰 +診 +┅ +蘇 +褔 +凱 +頂 +豕 +亞 +帥 +嘬 +⊥ +仺 +桖 +複 +饣 +絡 +穂 +顏 +棟 +納 +▏ +濟 +親 +設 +計 +攵 +埌 +烺 +ò +頤 +燦 +蓮 +撻 +節 +講 +濱 +濃 +娽 +洳 +朿 +燈 +鈴 +護 +膚 +铔 +過 +補 +Z +U +5 +4 +坋 +闿 +䖝 +餘 +缐 +铞 +貿 +铪 +桼 +趙 +鍊 +[ +㐂 +垚 +菓 +揸 +捲 +鐘 +滏 +𣇉 +爍 +輪 +燜 +鴻 +鮮 +動 +鹞 +鷗 +丄 +慶 +鉌 +翥 +飮 +腸 +⇋ +漁 +覺 +來 +熘 +昴 +翏 +鲱 +圧 +鄉 +萭 +頔 +爐 +嫚 +г +貭 +類 +聯 +幛 +輕 +訓 +鑒 +夋 +锨 +芃 +珣 +䝉 +扙 +嵐 +銷 +處 +ㄱ +語 +誘 +苝 +歸 +儀 +燒 +楿 +內 +粢 +葒 +奧 +麥 +礻 +滿 +蠔 +穵 +瞭 +態 +鱬 +榞 +硂 +鄭 +黃 +煙 +祐 +奓 +逺 +* +瑄 +獲 +聞 +薦 +讀 +這 +樣 +決 +問 +啟 +們 +執 +説 +轉 +單 +隨 +唘 +帶 +倉 +庫 +還 +贈 +尙 +皺 +■ +餅 +產 +○ +∈ +報 +狀 +楓 +賠 +琯 +嗮 +禮 +` +傳 +> +≤ +嗞 +Φ +≥ +換 +咭 +∣ +↓ +曬 +ε +応 +寫 +″ +終 +様 +純 +費 +療 +聨 +凍 +壐 +郵 +ü +黒 +∫ +製 +塊 +調 +軽 +確 +撃 +級 +馴 +Ⅲ +涇 +繹 +數 +碼 +證 +狒 +処 +劑 +< +晧 +賀 +衆 +] +櫥 +兩 +陰 +絶 +對 +鯉 +憶 +◎ +p +e +Y +蕒 +煖 +頓 +測 +試 +鼽 +僑 +碩 +妝 +帯 +≈ +鐡 +舖 +權 +喫 +倆 +ˋ +該 +悅 +ā +俫 +. +f +s +b +m +k +g +u +j +貼 +淨 +濕 +針 +適 +備 +l +/ +給 +謢 +強 +觸 +衛 +與 +⊙ +$ +緯 +變 +⑴ +⑵ +⑶ +㎏ +殺 +∩ +幚 +─ +價 +▲ +離 +ú +ó +飄 +烏 +関 +閟 +﹝ +﹞ +邏 +輯 +鍵 +驗 +訣 +導 +歷 +屆 +層 +▼ +儱 +錄 +熳 +ē +艦 +吋 +錶 +辧 +飼 +顯 +④ +禦 +販 +気 +対 +枰 +閩 +紀 +幹 +瞓 +貊 +淚 +△ +眞 +墊 +Ω +獻 +褲 +縫 +緑 +亜 +鉅 +餠 +{ +} +◆ +蘆 +薈 +█ +◇ +溫 +彈 +晳 +粧 +犸 +穩 +訊 +崬 +凖 +熥 +П +舊 +條 +紋 +圍 +Ⅳ +筆 +尷 +難 +雜 +錯 +綁 +識 +頰 +鎖 +艶 +□ +殁 +殼 +⑧ +├ +▕ +鵬 +ǐ +ō +ǒ +糝 +綱 +▎ +μ +盜 +饅 +醬 +籤 +蓋 +釀 +鹽 +據 +à +ɡ +辦 +◥ +彐 +┌ +婦 +獸 +鲩 +伱 +ī +蒟 +蒻 +齊 +袆 +腦 +寧 +凈 +妳 +煥 +詢 +偽 +謹 +啫 +鯽 +騷 +鱸 +損 +傷 +鎻 +髮 +買 +冏 +儥 +両 +﹢ +∞ +載 +喰 +z +羙 +悵 +燙 +曉 +員 +組 +徹 +艷 +痠 +鋼 +鼙 +縮 +細 +嚒 +爯 +≠ +維 +" +鱻 +壇 +厍 +帰 +浥 +犇 +薡 +軎 +² +應 +醜 +刪 +緻 +鶴 +賜 +噁 +軌 +尨 +镔 +鷺 +槗 +彌 +葚 +濛 +請 +溇 +緹 +賢 +訪 +獴 +瑅 +資 +縤 +陣 +蕟 +栢 +韻 +祼 +恁 +伢 +謝 +劃 +涑 +總 +衖 +踺 +砋 +凉 +籃 +駿 +苼 +瘋 +昽 +紡 +驊 +腎 +﹗ +響 +杋 +剛 +嚴 +禪 +歓 +槍 +傘 +檸 +檫 +炣 +勢 +鏜 +鎢 +銑 +尐 +減 +奪 +惡 +θ +僮 +婭 +臘 +ū +ì +殻 +鉄 +∑ +蛲 +焼 +緖 +續 +紹 +懮 \ No newline at end of file diff --git a/deploy/ios_demo/ocr_demo/main.m b/deploy/ios_demo/ocr_demo/main.m new file mode 100644 index 0000000000000000000000000000000000000000..22813be2a168759f88d48d833c99ec154c9904d6 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/main.m @@ -0,0 +1,16 @@ +// +// main.m +// seg_demo +// +// Created by Li,Xiaoyang(SYS) on 2018/11/13. +// Copyright © 2018年 Li,Xiaoyang(SYS). All rights reserved. +// + +#import +#import "AppDelegate.h" + +int main(int argc, char * argv[]) { + @autoreleasepool { + return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class])); + } +} diff --git a/deploy/ios_demo/ocr_demo/ocr.png b/deploy/ios_demo/ocr_demo/ocr.png new file mode 100644 index 0000000000000000000000000000000000000000..f628458708df7fdfd731816767ad9373f307c966 Binary files /dev/null and b/deploy/ios_demo/ocr_demo/ocr.png differ diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.cpp b/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d5db34720246818d644ebe1a59b1b8845a5c1c15 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.cpp @@ -0,0 +1,4629 @@ +/******************************************************************************* +* * +* Author : Angus Johnson * +* Version : 6.4.2 * +* Date : 27 February 2017 * +* Website : http://www.angusj.com * +* Copyright : Angus Johnson 2010-2017 * +* * +* License: * +* Use, modification & distribution is subject to Boost Software License Ver 1. * +* http://www.boost.org/LICENSE_1_0.txt * +* * +* Attributions: * +* The code in this library is an extension of Bala Vatti's clipping algorithm: * +* "A generic solution to polygon clipping" * +* Communications of the ACM, Vol 35, Issue 7 (July 1992) pp 56-63. * +* http://portal.acm.org/citation.cfm?id=129906 * +* * +* Computer graphics and geometric modeling: implementation and algorithms * +* By Max K. Agoston * +* Springer; 1 edition (January 4, 2005) * +* http://books.google.com/books?q=vatti+clipping+agoston * +* * +* See also: * +* "Polygon Offsetting by Computing Winding Numbers" * +* Paper no. DETC2005-85513 pp. 565-575 * +* ASME 2005 International Design Engineering Technical Conferences * +* and Computers and Information in Engineering Conference (IDETC/CIE2005) * +* September 24-28, 2005 , Long Beach, California, USA * +* http://www.me.berkeley.edu/~mcmains/pubs/DAC05OffsetPolygon.pdf * +* * +*******************************************************************************/ + +/******************************************************************************* +* * +* This is a translation of the Delphi Clipper library and the naming style * +* used has retained a Delphi flavour. * +* * +*******************************************************************************/ + +#include "ocr_clipper.hpp" +#include +#include +#include +#include +#include +#include +#include +#include + +namespace ClipperLib { + +static double const pi = 3.141592653589793238; +static double const two_pi = pi *2; +static double const def_arc_tolerance = 0.25; + +enum Direction { dRightToLeft, dLeftToRight }; + +static int const Unassigned = -1; //edge not currently 'owning' a solution +static int const Skip = -2; //edge that would otherwise close a path + +#define HORIZONTAL (-1.0E+40) +#define TOLERANCE (1.0e-20) +#define NEAR_ZERO(val) (((val) > -TOLERANCE) && ((val) < TOLERANCE)) + +struct TEdge { + IntPoint Bot; + IntPoint Curr; //current (updated for every new scanbeam) + IntPoint Top; + double Dx; + PolyType PolyTyp; + EdgeSide Side; //side only refers to current side of solution poly + int WindDelta; //1 or -1 depending on winding direction + int WindCnt; + int WindCnt2; //winding count of the opposite polytype + int OutIdx; + TEdge *Next; + TEdge *Prev; + TEdge *NextInLML; + TEdge *NextInAEL; + TEdge *PrevInAEL; + TEdge *NextInSEL; + TEdge *PrevInSEL; +}; + +struct IntersectNode { + TEdge *Edge1; + TEdge *Edge2; + IntPoint Pt; +}; + +struct LocalMinimum { + cInt Y; + TEdge *LeftBound; + TEdge *RightBound; +}; + +struct OutPt; + +//OutRec: contains a path in the clipping solution. Edges in the AEL will +//carry a pointer to an OutRec when they are part of the clipping solution. +struct OutRec { + int Idx; + bool IsHole; + bool IsOpen; + OutRec *FirstLeft; //see comments in clipper.pas + PolyNode *PolyNd; + OutPt *Pts; + OutPt *BottomPt; +}; + +struct OutPt { + int Idx; + IntPoint Pt; + OutPt *Next; + OutPt *Prev; +}; + +struct Join { + OutPt *OutPt1; + OutPt *OutPt2; + IntPoint OffPt; +}; + +struct LocMinSorter +{ + inline bool operator()(const LocalMinimum& locMin1, const LocalMinimum& locMin2) + { + return locMin2.Y < locMin1.Y; + } +}; + +//------------------------------------------------------------------------------ +//------------------------------------------------------------------------------ + +inline cInt Round(double val) +{ + if ((val < 0)) return static_cast(val - 0.5); + else return static_cast(val + 0.5); +} +//------------------------------------------------------------------------------ + +inline cInt Abs(cInt val) +{ + return val < 0 ? -val : val; +} + +//------------------------------------------------------------------------------ +// PolyTree methods ... +//------------------------------------------------------------------------------ + +void PolyTree::Clear() +{ + for (PolyNodes::size_type i = 0; i < AllNodes.size(); ++i) + delete AllNodes[i]; + AllNodes.resize(0); + Childs.resize(0); +} +//------------------------------------------------------------------------------ + +PolyNode* PolyTree::GetFirst() const +{ + if (!Childs.empty()) + return Childs[0]; + else + return 0; +} +//------------------------------------------------------------------------------ + +int PolyTree::Total() const +{ + int result = (int)AllNodes.size(); + //with negative offsets, ignore the hidden outer polygon ... + if (result > 0 && Childs[0] != AllNodes[0]) result--; + return result; +} + +//------------------------------------------------------------------------------ +// PolyNode methods ... +//------------------------------------------------------------------------------ + +PolyNode::PolyNode(): Parent(0), Index(0), m_IsOpen(false) +{ +} +//------------------------------------------------------------------------------ + +int PolyNode::ChildCount() const +{ + return (int)Childs.size(); +} +//------------------------------------------------------------------------------ + +void PolyNode::AddChild(PolyNode& child) +{ + unsigned cnt = (unsigned)Childs.size(); + Childs.push_back(&child); + child.Parent = this; + child.Index = cnt; +} +//------------------------------------------------------------------------------ + +PolyNode* PolyNode::GetNext() const +{ + if (!Childs.empty()) + return Childs[0]; + else + return GetNextSiblingUp(); +} +//------------------------------------------------------------------------------ + +PolyNode* PolyNode::GetNextSiblingUp() const +{ + if (!Parent) //protects against PolyTree.GetNextSiblingUp() + return 0; + else if (Index == Parent->Childs.size() - 1) + return Parent->GetNextSiblingUp(); + else + return Parent->Childs[Index + 1]; +} +//------------------------------------------------------------------------------ + +bool PolyNode::IsHole() const +{ + bool result = true; + PolyNode* node = Parent; + while (node) + { + result = !result; + node = node->Parent; + } + return result; +} +//------------------------------------------------------------------------------ + +bool PolyNode::IsOpen() const +{ + return m_IsOpen; +} +//------------------------------------------------------------------------------ + +#ifndef use_int32 + +//------------------------------------------------------------------------------ +// Int128 class (enables safe math on signed 64bit integers) +// eg Int128 val1((long64)9223372036854775807); //ie 2^63 -1 +// Int128 val2((long64)9223372036854775807); +// Int128 val3 = val1 * val2; +// val3.AsString => "85070591730234615847396907784232501249" (8.5e+37) +//------------------------------------------------------------------------------ + +class Int128 +{ + public: + ulong64 lo; + long64 hi; + + Int128(long64 _lo = 0) + { + lo = (ulong64)_lo; + if (_lo < 0) hi = -1; else hi = 0; + } + + + Int128(const Int128 &val): lo(val.lo), hi(val.hi){} + + Int128(const long64& _hi, const ulong64& _lo): lo(_lo), hi(_hi){} + + Int128& operator = (const long64 &val) + { + lo = (ulong64)val; + if (val < 0) hi = -1; else hi = 0; + return *this; + } + + bool operator == (const Int128 &val) const + {return (hi == val.hi && lo == val.lo);} + + bool operator != (const Int128 &val) const + { return !(*this == val);} + + bool operator > (const Int128 &val) const + { + if (hi != val.hi) + return hi > val.hi; + else + return lo > val.lo; + } + + bool operator < (const Int128 &val) const + { + if (hi != val.hi) + return hi < val.hi; + else + return lo < val.lo; + } + + bool operator >= (const Int128 &val) const + { return !(*this < val);} + + bool operator <= (const Int128 &val) const + { return !(*this > val);} + + Int128& operator += (const Int128 &rhs) + { + hi += rhs.hi; + lo += rhs.lo; + if (lo < rhs.lo) hi++; + return *this; + } + + Int128 operator + (const Int128 &rhs) const + { + Int128 result(*this); + result+= rhs; + return result; + } + + Int128& operator -= (const Int128 &rhs) + { + *this += -rhs; + return *this; + } + + Int128 operator - (const Int128 &rhs) const + { + Int128 result(*this); + result -= rhs; + return result; + } + + Int128 operator-() const //unary negation + { + if (lo == 0) + return Int128(-hi, 0); + else + return Int128(~hi, ~lo + 1); + } + + operator double() const + { + const double shift64 = 18446744073709551616.0; //2^64 + if (hi < 0) + { + if (lo == 0) return (double)hi * shift64; + else return -(double)(~lo + ~hi * shift64); + } + else + return (double)(lo + hi * shift64); + } + +}; +//------------------------------------------------------------------------------ + +Int128 Int128Mul (long64 lhs, long64 rhs) +{ + bool negate = (lhs < 0) != (rhs < 0); + + if (lhs < 0) lhs = -lhs; + ulong64 int1Hi = ulong64(lhs) >> 32; + ulong64 int1Lo = ulong64(lhs & 0xFFFFFFFF); + + if (rhs < 0) rhs = -rhs; + ulong64 int2Hi = ulong64(rhs) >> 32; + ulong64 int2Lo = ulong64(rhs & 0xFFFFFFFF); + + //nb: see comments in clipper.pas + ulong64 a = int1Hi * int2Hi; + ulong64 b = int1Lo * int2Lo; + ulong64 c = int1Hi * int2Lo + int1Lo * int2Hi; + + Int128 tmp; + tmp.hi = long64(a + (c >> 32)); + tmp.lo = long64(c << 32); + tmp.lo += long64(b); + if (tmp.lo < b) tmp.hi++; + if (negate) tmp = -tmp; + return tmp; +}; +#endif + +//------------------------------------------------------------------------------ +// Miscellaneous global functions +//------------------------------------------------------------------------------ + +bool Orientation(const Path &poly) +{ + return Area(poly) >= 0; +} +//------------------------------------------------------------------------------ + +double Area(const Path &poly) +{ + int size = (int)poly.size(); + if (size < 3) return 0; + + double a = 0; + for (int i = 0, j = size -1; i < size; ++i) + { + a += ((double)poly[j].X + poly[i].X) * ((double)poly[j].Y - poly[i].Y); + j = i; + } + return -a * 0.5; +} +//------------------------------------------------------------------------------ + +double Area(const OutPt *op) +{ + const OutPt *startOp = op; + if (!op) return 0; + double a = 0; + do { + a += (double)(op->Prev->Pt.X + op->Pt.X) * (double)(op->Prev->Pt.Y - op->Pt.Y); + op = op->Next; + } while (op != startOp); + return a * 0.5; +} +//------------------------------------------------------------------------------ + +double Area(const OutRec &outRec) +{ + return Area(outRec.Pts); +} +//------------------------------------------------------------------------------ + +bool PointIsVertex(const IntPoint &Pt, OutPt *pp) +{ + OutPt *pp2 = pp; + do + { + if (pp2->Pt == Pt) return true; + pp2 = pp2->Next; + } + while (pp2 != pp); + return false; +} +//------------------------------------------------------------------------------ + +//See "The Point in Polygon Problem for Arbitrary Polygons" by Hormann & Agathos +//http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5498&rep=rep1&type=pdf +int PointInPolygon(const IntPoint &pt, const Path &path) +{ + //returns 0 if false, +1 if true, -1 if pt ON polygon boundary + int result = 0; + size_t cnt = path.size(); + if (cnt < 3) return 0; + IntPoint ip = path[0]; + for(size_t i = 1; i <= cnt; ++i) + { + IntPoint ipNext = (i == cnt ? path[0] : path[i]); + if (ipNext.Y == pt.Y) + { + if ((ipNext.X == pt.X) || (ip.Y == pt.Y && + ((ipNext.X > pt.X) == (ip.X < pt.X)))) return -1; + } + if ((ip.Y < pt.Y) != (ipNext.Y < pt.Y)) + { + if (ip.X >= pt.X) + { + if (ipNext.X > pt.X) result = 1 - result; + else + { + double d = (double)(ip.X - pt.X) * (ipNext.Y - pt.Y) - + (double)(ipNext.X - pt.X) * (ip.Y - pt.Y); + if (!d) return -1; + if ((d > 0) == (ipNext.Y > ip.Y)) result = 1 - result; + } + } else + { + if (ipNext.X > pt.X) + { + double d = (double)(ip.X - pt.X) * (ipNext.Y - pt.Y) - + (double)(ipNext.X - pt.X) * (ip.Y - pt.Y); + if (!d) return -1; + if ((d > 0) == (ipNext.Y > ip.Y)) result = 1 - result; + } + } + } + ip = ipNext; + } + return result; +} +//------------------------------------------------------------------------------ + +int PointInPolygon (const IntPoint &pt, OutPt *op) +{ + //returns 0 if false, +1 if true, -1 if pt ON polygon boundary + int result = 0; + OutPt* startOp = op; + for(;;) + { + if (op->Next->Pt.Y == pt.Y) + { + if ((op->Next->Pt.X == pt.X) || (op->Pt.Y == pt.Y && + ((op->Next->Pt.X > pt.X) == (op->Pt.X < pt.X)))) return -1; + } + if ((op->Pt.Y < pt.Y) != (op->Next->Pt.Y < pt.Y)) + { + if (op->Pt.X >= pt.X) + { + if (op->Next->Pt.X > pt.X) result = 1 - result; + else + { + double d = (double)(op->Pt.X - pt.X) * (op->Next->Pt.Y - pt.Y) - + (double)(op->Next->Pt.X - pt.X) * (op->Pt.Y - pt.Y); + if (!d) return -1; + if ((d > 0) == (op->Next->Pt.Y > op->Pt.Y)) result = 1 - result; + } + } else + { + if (op->Next->Pt.X > pt.X) + { + double d = (double)(op->Pt.X - pt.X) * (op->Next->Pt.Y - pt.Y) - + (double)(op->Next->Pt.X - pt.X) * (op->Pt.Y - pt.Y); + if (!d) return -1; + if ((d > 0) == (op->Next->Pt.Y > op->Pt.Y)) result = 1 - result; + } + } + } + op = op->Next; + if (startOp == op) break; + } + return result; +} +//------------------------------------------------------------------------------ + +bool Poly2ContainsPoly1(OutPt *OutPt1, OutPt *OutPt2) +{ + OutPt* op = OutPt1; + do + { + //nb: PointInPolygon returns 0 if false, +1 if true, -1 if pt on polygon + int res = PointInPolygon(op->Pt, OutPt2); + if (res >= 0) return res > 0; + op = op->Next; + } + while (op != OutPt1); + return true; +} +//---------------------------------------------------------------------- + +bool SlopesEqual(const TEdge &e1, const TEdge &e2, bool UseFullInt64Range) +{ +#ifndef use_int32 + if (UseFullInt64Range) + return Int128Mul(e1.Top.Y - e1.Bot.Y, e2.Top.X - e2.Bot.X) == + Int128Mul(e1.Top.X - e1.Bot.X, e2.Top.Y - e2.Bot.Y); + else +#endif + return (e1.Top.Y - e1.Bot.Y) * (e2.Top.X - e2.Bot.X) == + (e1.Top.X - e1.Bot.X) * (e2.Top.Y - e2.Bot.Y); +} +//------------------------------------------------------------------------------ + +bool SlopesEqual(const IntPoint pt1, const IntPoint pt2, + const IntPoint pt3, bool UseFullInt64Range) +{ +#ifndef use_int32 + if (UseFullInt64Range) + return Int128Mul(pt1.Y-pt2.Y, pt2.X-pt3.X) == Int128Mul(pt1.X-pt2.X, pt2.Y-pt3.Y); + else +#endif + return (pt1.Y-pt2.Y)*(pt2.X-pt3.X) == (pt1.X-pt2.X)*(pt2.Y-pt3.Y); +} +//------------------------------------------------------------------------------ + +bool SlopesEqual(const IntPoint pt1, const IntPoint pt2, + const IntPoint pt3, const IntPoint pt4, bool UseFullInt64Range) +{ +#ifndef use_int32 + if (UseFullInt64Range) + return Int128Mul(pt1.Y-pt2.Y, pt3.X-pt4.X) == Int128Mul(pt1.X-pt2.X, pt3.Y-pt4.Y); + else +#endif + return (pt1.Y-pt2.Y)*(pt3.X-pt4.X) == (pt1.X-pt2.X)*(pt3.Y-pt4.Y); +} +//------------------------------------------------------------------------------ + +inline bool IsHorizontal(TEdge &e) +{ + return e.Dx == HORIZONTAL; +} +//------------------------------------------------------------------------------ + +inline double GetDx(const IntPoint pt1, const IntPoint pt2) +{ + return (pt1.Y == pt2.Y) ? + HORIZONTAL : (double)(pt2.X - pt1.X) / (pt2.Y - pt1.Y); +} +//--------------------------------------------------------------------------- + +inline void SetDx(TEdge &e) +{ + cInt dy = (e.Top.Y - e.Bot.Y); + if (dy == 0) e.Dx = HORIZONTAL; + else e.Dx = (double)(e.Top.X - e.Bot.X) / dy; +} +//--------------------------------------------------------------------------- + +inline void SwapSides(TEdge &Edge1, TEdge &Edge2) +{ + EdgeSide Side = Edge1.Side; + Edge1.Side = Edge2.Side; + Edge2.Side = Side; +} +//------------------------------------------------------------------------------ + +inline void SwapPolyIndexes(TEdge &Edge1, TEdge &Edge2) +{ + int OutIdx = Edge1.OutIdx; + Edge1.OutIdx = Edge2.OutIdx; + Edge2.OutIdx = OutIdx; +} +//------------------------------------------------------------------------------ + +inline cInt TopX(TEdge &edge, const cInt currentY) +{ + return ( currentY == edge.Top.Y ) ? + edge.Top.X : edge.Bot.X + Round(edge.Dx *(currentY - edge.Bot.Y)); +} +//------------------------------------------------------------------------------ + +void IntersectPoint(TEdge &Edge1, TEdge &Edge2, IntPoint &ip) +{ +#ifdef use_xyz + ip.Z = 0; +#endif + + double b1, b2; + if (Edge1.Dx == Edge2.Dx) + { + ip.Y = Edge1.Curr.Y; + ip.X = TopX(Edge1, ip.Y); + return; + } + else if (Edge1.Dx == 0) + { + ip.X = Edge1.Bot.X; + if (IsHorizontal(Edge2)) + ip.Y = Edge2.Bot.Y; + else + { + b2 = Edge2.Bot.Y - (Edge2.Bot.X / Edge2.Dx); + ip.Y = Round(ip.X / Edge2.Dx + b2); + } + } + else if (Edge2.Dx == 0) + { + ip.X = Edge2.Bot.X; + if (IsHorizontal(Edge1)) + ip.Y = Edge1.Bot.Y; + else + { + b1 = Edge1.Bot.Y - (Edge1.Bot.X / Edge1.Dx); + ip.Y = Round(ip.X / Edge1.Dx + b1); + } + } + else + { + b1 = Edge1.Bot.X - Edge1.Bot.Y * Edge1.Dx; + b2 = Edge2.Bot.X - Edge2.Bot.Y * Edge2.Dx; + double q = (b2-b1) / (Edge1.Dx - Edge2.Dx); + ip.Y = Round(q); + if (std::fabs(Edge1.Dx) < std::fabs(Edge2.Dx)) + ip.X = Round(Edge1.Dx * q + b1); + else + ip.X = Round(Edge2.Dx * q + b2); + } + + if (ip.Y < Edge1.Top.Y || ip.Y < Edge2.Top.Y) + { + if (Edge1.Top.Y > Edge2.Top.Y) + ip.Y = Edge1.Top.Y; + else + ip.Y = Edge2.Top.Y; + if (std::fabs(Edge1.Dx) < std::fabs(Edge2.Dx)) + ip.X = TopX(Edge1, ip.Y); + else + ip.X = TopX(Edge2, ip.Y); + } + //finally, don't allow 'ip' to be BELOW curr.Y (ie bottom of scanbeam) ... + if (ip.Y > Edge1.Curr.Y) + { + ip.Y = Edge1.Curr.Y; + //use the more vertical edge to derive X ... + if (std::fabs(Edge1.Dx) > std::fabs(Edge2.Dx)) + ip.X = TopX(Edge2, ip.Y); else + ip.X = TopX(Edge1, ip.Y); + } +} +//------------------------------------------------------------------------------ + +void ReversePolyPtLinks(OutPt *pp) +{ + if (!pp) return; + OutPt *pp1, *pp2; + pp1 = pp; + do { + pp2 = pp1->Next; + pp1->Next = pp1->Prev; + pp1->Prev = pp2; + pp1 = pp2; + } while( pp1 != pp ); +} +//------------------------------------------------------------------------------ + +void DisposeOutPts(OutPt*& pp) +{ + if (pp == 0) return; + pp->Prev->Next = 0; + while( pp ) + { + OutPt *tmpPp = pp; + pp = pp->Next; + delete tmpPp; + } +} +//------------------------------------------------------------------------------ + +inline void InitEdge(TEdge* e, TEdge* eNext, TEdge* ePrev, const IntPoint& Pt) +{ + std::memset(e, 0, sizeof(TEdge)); + e->Next = eNext; + e->Prev = ePrev; + e->Curr = Pt; + e->OutIdx = Unassigned; +} +//------------------------------------------------------------------------------ + +void InitEdge2(TEdge& e, PolyType Pt) +{ + if (e.Curr.Y >= e.Next->Curr.Y) + { + e.Bot = e.Curr; + e.Top = e.Next->Curr; + } else + { + e.Top = e.Curr; + e.Bot = e.Next->Curr; + } + SetDx(e); + e.PolyTyp = Pt; +} +//------------------------------------------------------------------------------ + +TEdge* RemoveEdge(TEdge* e) +{ + //removes e from double_linked_list (but without removing from memory) + e->Prev->Next = e->Next; + e->Next->Prev = e->Prev; + TEdge* result = e->Next; + e->Prev = 0; //flag as removed (see ClipperBase.Clear) + return result; +} +//------------------------------------------------------------------------------ + +inline void ReverseHorizontal(TEdge &e) +{ + //swap horizontal edges' Top and Bottom x's so they follow the natural + //progression of the bounds - ie so their xbots will align with the + //adjoining lower edge. [Helpful in the ProcessHorizontal() method.] + std::swap(e.Top.X, e.Bot.X); +#ifdef use_xyz + std::swap(e.Top.Z, e.Bot.Z); +#endif +} +//------------------------------------------------------------------------------ + +void SwapPoints(IntPoint &pt1, IntPoint &pt2) +{ + IntPoint tmp = pt1; + pt1 = pt2; + pt2 = tmp; +} +//------------------------------------------------------------------------------ + +bool GetOverlapSegment(IntPoint pt1a, IntPoint pt1b, IntPoint pt2a, + IntPoint pt2b, IntPoint &pt1, IntPoint &pt2) +{ + //precondition: segments are Collinear. + if (Abs(pt1a.X - pt1b.X) > Abs(pt1a.Y - pt1b.Y)) + { + if (pt1a.X > pt1b.X) SwapPoints(pt1a, pt1b); + if (pt2a.X > pt2b.X) SwapPoints(pt2a, pt2b); + if (pt1a.X > pt2a.X) pt1 = pt1a; else pt1 = pt2a; + if (pt1b.X < pt2b.X) pt2 = pt1b; else pt2 = pt2b; + return pt1.X < pt2.X; + } else + { + if (pt1a.Y < pt1b.Y) SwapPoints(pt1a, pt1b); + if (pt2a.Y < pt2b.Y) SwapPoints(pt2a, pt2b); + if (pt1a.Y < pt2a.Y) pt1 = pt1a; else pt1 = pt2a; + if (pt1b.Y > pt2b.Y) pt2 = pt1b; else pt2 = pt2b; + return pt1.Y > pt2.Y; + } +} +//------------------------------------------------------------------------------ + +bool FirstIsBottomPt(const OutPt* btmPt1, const OutPt* btmPt2) +{ + OutPt *p = btmPt1->Prev; + while ((p->Pt == btmPt1->Pt) && (p != btmPt1)) p = p->Prev; + double dx1p = std::fabs(GetDx(btmPt1->Pt, p->Pt)); + p = btmPt1->Next; + while ((p->Pt == btmPt1->Pt) && (p != btmPt1)) p = p->Next; + double dx1n = std::fabs(GetDx(btmPt1->Pt, p->Pt)); + + p = btmPt2->Prev; + while ((p->Pt == btmPt2->Pt) && (p != btmPt2)) p = p->Prev; + double dx2p = std::fabs(GetDx(btmPt2->Pt, p->Pt)); + p = btmPt2->Next; + while ((p->Pt == btmPt2->Pt) && (p != btmPt2)) p = p->Next; + double dx2n = std::fabs(GetDx(btmPt2->Pt, p->Pt)); + + if (std::max(dx1p, dx1n) == std::max(dx2p, dx2n) && + std::min(dx1p, dx1n) == std::min(dx2p, dx2n)) + return Area(btmPt1) > 0; //if otherwise identical use orientation + else + return (dx1p >= dx2p && dx1p >= dx2n) || (dx1n >= dx2p && dx1n >= dx2n); +} +//------------------------------------------------------------------------------ + +OutPt* GetBottomPt(OutPt *pp) +{ + OutPt* dups = 0; + OutPt* p = pp->Next; + while (p != pp) + { + if (p->Pt.Y > pp->Pt.Y) + { + pp = p; + dups = 0; + } + else if (p->Pt.Y == pp->Pt.Y && p->Pt.X <= pp->Pt.X) + { + if (p->Pt.X < pp->Pt.X) + { + dups = 0; + pp = p; + } else + { + if (p->Next != pp && p->Prev != pp) dups = p; + } + } + p = p->Next; + } + if (dups) + { + //there appears to be at least 2 vertices at BottomPt so ... + while (dups != p) + { + if (!FirstIsBottomPt(p, dups)) pp = dups; + dups = dups->Next; + while (dups->Pt != pp->Pt) dups = dups->Next; + } + } + return pp; +} +//------------------------------------------------------------------------------ + +bool Pt2IsBetweenPt1AndPt3(const IntPoint pt1, + const IntPoint pt2, const IntPoint pt3) +{ + if ((pt1 == pt3) || (pt1 == pt2) || (pt3 == pt2)) + return false; + else if (pt1.X != pt3.X) + return (pt2.X > pt1.X) == (pt2.X < pt3.X); + else + return (pt2.Y > pt1.Y) == (pt2.Y < pt3.Y); +} +//------------------------------------------------------------------------------ + +bool HorzSegmentsOverlap(cInt seg1a, cInt seg1b, cInt seg2a, cInt seg2b) +{ + if (seg1a > seg1b) std::swap(seg1a, seg1b); + if (seg2a > seg2b) std::swap(seg2a, seg2b); + return (seg1a < seg2b) && (seg2a < seg1b); +} + +//------------------------------------------------------------------------------ +// ClipperBase class methods ... +//------------------------------------------------------------------------------ + +ClipperBase::ClipperBase() //constructor +{ + m_CurrentLM = m_MinimaList.begin(); //begin() == end() here + m_UseFullRange = false; +} +//------------------------------------------------------------------------------ + +ClipperBase::~ClipperBase() //destructor +{ + Clear(); +} +//------------------------------------------------------------------------------ + +void RangeTest(const IntPoint& Pt, bool& useFullRange) +{ + if (useFullRange) + { + if (Pt.X > hiRange || Pt.Y > hiRange || -Pt.X > hiRange || -Pt.Y > hiRange) + throw clipperException("Coordinate outside allowed range"); + } + else if (Pt.X > loRange|| Pt.Y > loRange || -Pt.X > loRange || -Pt.Y > loRange) + { + useFullRange = true; + RangeTest(Pt, useFullRange); + } +} +//------------------------------------------------------------------------------ + +TEdge* FindNextLocMin(TEdge* E) +{ + for (;;) + { + while (E->Bot != E->Prev->Bot || E->Curr == E->Top) E = E->Next; + if (!IsHorizontal(*E) && !IsHorizontal(*E->Prev)) break; + while (IsHorizontal(*E->Prev)) E = E->Prev; + TEdge* E2 = E; + while (IsHorizontal(*E)) E = E->Next; + if (E->Top.Y == E->Prev->Bot.Y) continue; //ie just an intermediate horz. + if (E2->Prev->Bot.X < E->Bot.X) E = E2; + break; + } + return E; +} +//------------------------------------------------------------------------------ + +TEdge* ClipperBase::ProcessBound(TEdge* E, bool NextIsForward) +{ + TEdge *Result = E; + TEdge *Horz = 0; + + if (E->OutIdx == Skip) + { + //if edges still remain in the current bound beyond the skip edge then + //create another LocMin and call ProcessBound once more + if (NextIsForward) + { + while (E->Top.Y == E->Next->Bot.Y) E = E->Next; + //don't include top horizontals when parsing a bound a second time, + //they will be contained in the opposite bound ... + while (E != Result && IsHorizontal(*E)) E = E->Prev; + } + else + { + while (E->Top.Y == E->Prev->Bot.Y) E = E->Prev; + while (E != Result && IsHorizontal(*E)) E = E->Next; + } + + if (E == Result) + { + if (NextIsForward) Result = E->Next; + else Result = E->Prev; + } + else + { + //there are more edges in the bound beyond result starting with E + if (NextIsForward) + E = Result->Next; + else + E = Result->Prev; + MinimaList::value_type locMin; + locMin.Y = E->Bot.Y; + locMin.LeftBound = 0; + locMin.RightBound = E; + E->WindDelta = 0; + Result = ProcessBound(E, NextIsForward); + m_MinimaList.push_back(locMin); + } + return Result; + } + + TEdge *EStart; + + if (IsHorizontal(*E)) + { + //We need to be careful with open paths because this may not be a + //true local minima (ie E may be following a skip edge). + //Also, consecutive horz. edges may start heading left before going right. + if (NextIsForward) + EStart = E->Prev; + else + EStart = E->Next; + if (IsHorizontal(*EStart)) //ie an adjoining horizontal skip edge + { + if (EStart->Bot.X != E->Bot.X && EStart->Top.X != E->Bot.X) + ReverseHorizontal(*E); + } + else if (EStart->Bot.X != E->Bot.X) + ReverseHorizontal(*E); + } + + EStart = E; + if (NextIsForward) + { + while (Result->Top.Y == Result->Next->Bot.Y && Result->Next->OutIdx != Skip) + Result = Result->Next; + if (IsHorizontal(*Result) && Result->Next->OutIdx != Skip) + { + //nb: at the top of a bound, horizontals are added to the bound + //only when the preceding edge attaches to the horizontal's left vertex + //unless a Skip edge is encountered when that becomes the top divide + Horz = Result; + while (IsHorizontal(*Horz->Prev)) Horz = Horz->Prev; + if (Horz->Prev->Top.X > Result->Next->Top.X) Result = Horz->Prev; + } + while (E != Result) + { + E->NextInLML = E->Next; + if (IsHorizontal(*E) && E != EStart && + E->Bot.X != E->Prev->Top.X) ReverseHorizontal(*E); + E = E->Next; + } + if (IsHorizontal(*E) && E != EStart && E->Bot.X != E->Prev->Top.X) + ReverseHorizontal(*E); + Result = Result->Next; //move to the edge just beyond current bound + } else + { + while (Result->Top.Y == Result->Prev->Bot.Y && Result->Prev->OutIdx != Skip) + Result = Result->Prev; + if (IsHorizontal(*Result) && Result->Prev->OutIdx != Skip) + { + Horz = Result; + while (IsHorizontal(*Horz->Next)) Horz = Horz->Next; + if (Horz->Next->Top.X == Result->Prev->Top.X || + Horz->Next->Top.X > Result->Prev->Top.X) Result = Horz->Next; + } + + while (E != Result) + { + E->NextInLML = E->Prev; + if (IsHorizontal(*E) && E != EStart && E->Bot.X != E->Next->Top.X) + ReverseHorizontal(*E); + E = E->Prev; + } + if (IsHorizontal(*E) && E != EStart && E->Bot.X != E->Next->Top.X) + ReverseHorizontal(*E); + Result = Result->Prev; //move to the edge just beyond current bound + } + + return Result; +} +//------------------------------------------------------------------------------ + +bool ClipperBase::AddPath(const Path &pg, PolyType PolyTyp, bool Closed) +{ +#ifdef use_lines + if (!Closed && PolyTyp == ptClip) + throw clipperException("AddPath: Open paths must be subject."); +#else + if (!Closed) + throw clipperException("AddPath: Open paths have been disabled."); +#endif + + int highI = (int)pg.size() -1; + if (Closed) while (highI > 0 && (pg[highI] == pg[0])) --highI; + while (highI > 0 && (pg[highI] == pg[highI -1])) --highI; + if ((Closed && highI < 2) || (!Closed && highI < 1)) return false; + + //create a new edge array ... + TEdge *edges = new TEdge [highI +1]; + + bool IsFlat = true; + //1. Basic (first) edge initialization ... + try + { + edges[1].Curr = pg[1]; + RangeTest(pg[0], m_UseFullRange); + RangeTest(pg[highI], m_UseFullRange); + InitEdge(&edges[0], &edges[1], &edges[highI], pg[0]); + InitEdge(&edges[highI], &edges[0], &edges[highI-1], pg[highI]); + for (int i = highI - 1; i >= 1; --i) + { + RangeTest(pg[i], m_UseFullRange); + InitEdge(&edges[i], &edges[i+1], &edges[i-1], pg[i]); + } + } + catch(...) + { + delete [] edges; + throw; //range test fails + } + TEdge *eStart = &edges[0]; + + //2. Remove duplicate vertices, and (when closed) collinear edges ... + TEdge *E = eStart, *eLoopStop = eStart; + for (;;) + { + //nb: allows matching start and end points when not Closed ... + if (E->Curr == E->Next->Curr && (Closed || E->Next != eStart)) + { + if (E == E->Next) break; + if (E == eStart) eStart = E->Next; + E = RemoveEdge(E); + eLoopStop = E; + continue; + } + if (E->Prev == E->Next) + break; //only two vertices + else if (Closed && + SlopesEqual(E->Prev->Curr, E->Curr, E->Next->Curr, m_UseFullRange) && + (!m_PreserveCollinear || + !Pt2IsBetweenPt1AndPt3(E->Prev->Curr, E->Curr, E->Next->Curr))) + { + //Collinear edges are allowed for open paths but in closed paths + //the default is to merge adjacent collinear edges into a single edge. + //However, if the PreserveCollinear property is enabled, only overlapping + //collinear edges (ie spikes) will be removed from closed paths. + if (E == eStart) eStart = E->Next; + E = RemoveEdge(E); + E = E->Prev; + eLoopStop = E; + continue; + } + E = E->Next; + if ((E == eLoopStop) || (!Closed && E->Next == eStart)) break; + } + + if ((!Closed && (E == E->Next)) || (Closed && (E->Prev == E->Next))) + { + delete [] edges; + return false; + } + + if (!Closed) + { + m_HasOpenPaths = true; + eStart->Prev->OutIdx = Skip; + } + + //3. Do second stage of edge initialization ... + E = eStart; + do + { + InitEdge2(*E, PolyTyp); + E = E->Next; + if (IsFlat && E->Curr.Y != eStart->Curr.Y) IsFlat = false; + } + while (E != eStart); + + //4. Finally, add edge bounds to LocalMinima list ... + + //Totally flat paths must be handled differently when adding them + //to LocalMinima list to avoid endless loops etc ... + if (IsFlat) + { + if (Closed) + { + delete [] edges; + return false; + } + E->Prev->OutIdx = Skip; + MinimaList::value_type locMin; + locMin.Y = E->Bot.Y; + locMin.LeftBound = 0; + locMin.RightBound = E; + locMin.RightBound->Side = esRight; + locMin.RightBound->WindDelta = 0; + for (;;) + { + if (E->Bot.X != E->Prev->Top.X) ReverseHorizontal(*E); + if (E->Next->OutIdx == Skip) break; + E->NextInLML = E->Next; + E = E->Next; + } + m_MinimaList.push_back(locMin); + m_edges.push_back(edges); + return true; + } + + m_edges.push_back(edges); + bool leftBoundIsForward; + TEdge* EMin = 0; + + //workaround to avoid an endless loop in the while loop below when + //open paths have matching start and end points ... + if (E->Prev->Bot == E->Prev->Top) E = E->Next; + + for (;;) + { + E = FindNextLocMin(E); + if (E == EMin) break; + else if (!EMin) EMin = E; + + //E and E.Prev now share a local minima (left aligned if horizontal). + //Compare their slopes to find which starts which bound ... + MinimaList::value_type locMin; + locMin.Y = E->Bot.Y; + if (E->Dx < E->Prev->Dx) + { + locMin.LeftBound = E->Prev; + locMin.RightBound = E; + leftBoundIsForward = false; //Q.nextInLML = Q.prev + } else + { + locMin.LeftBound = E; + locMin.RightBound = E->Prev; + leftBoundIsForward = true; //Q.nextInLML = Q.next + } + + if (!Closed) locMin.LeftBound->WindDelta = 0; + else if (locMin.LeftBound->Next == locMin.RightBound) + locMin.LeftBound->WindDelta = -1; + else locMin.LeftBound->WindDelta = 1; + locMin.RightBound->WindDelta = -locMin.LeftBound->WindDelta; + + E = ProcessBound(locMin.LeftBound, leftBoundIsForward); + if (E->OutIdx == Skip) E = ProcessBound(E, leftBoundIsForward); + + TEdge* E2 = ProcessBound(locMin.RightBound, !leftBoundIsForward); + if (E2->OutIdx == Skip) E2 = ProcessBound(E2, !leftBoundIsForward); + + if (locMin.LeftBound->OutIdx == Skip) + locMin.LeftBound = 0; + else if (locMin.RightBound->OutIdx == Skip) + locMin.RightBound = 0; + m_MinimaList.push_back(locMin); + if (!leftBoundIsForward) E = E2; + } + return true; +} +//------------------------------------------------------------------------------ + +bool ClipperBase::AddPaths(const Paths &ppg, PolyType PolyTyp, bool Closed) +{ + bool result = false; + for (Paths::size_type i = 0; i < ppg.size(); ++i) + if (AddPath(ppg[i], PolyTyp, Closed)) result = true; + return result; +} +//------------------------------------------------------------------------------ + +void ClipperBase::Clear() +{ + DisposeLocalMinimaList(); + for (EdgeList::size_type i = 0; i < m_edges.size(); ++i) + { + TEdge* edges = m_edges[i]; + delete [] edges; + } + m_edges.clear(); + m_UseFullRange = false; + m_HasOpenPaths = false; +} +//------------------------------------------------------------------------------ + +void ClipperBase::Reset() +{ + m_CurrentLM = m_MinimaList.begin(); + if (m_CurrentLM == m_MinimaList.end()) return; //ie nothing to process + std::sort(m_MinimaList.begin(), m_MinimaList.end(), LocMinSorter()); + + m_Scanbeam = ScanbeamList(); //clears/resets priority_queue + //reset all edges ... + for (MinimaList::iterator lm = m_MinimaList.begin(); lm != m_MinimaList.end(); ++lm) + { + InsertScanbeam(lm->Y); + TEdge* e = lm->LeftBound; + if (e) + { + e->Curr = e->Bot; + e->Side = esLeft; + e->OutIdx = Unassigned; + } + + e = lm->RightBound; + if (e) + { + e->Curr = e->Bot; + e->Side = esRight; + e->OutIdx = Unassigned; + } + } + m_ActiveEdges = 0; + m_CurrentLM = m_MinimaList.begin(); +} +//------------------------------------------------------------------------------ + +void ClipperBase::DisposeLocalMinimaList() +{ + m_MinimaList.clear(); + m_CurrentLM = m_MinimaList.begin(); +} +//------------------------------------------------------------------------------ + +bool ClipperBase::PopLocalMinima(cInt Y, const LocalMinimum *&locMin) +{ + if (m_CurrentLM == m_MinimaList.end() || (*m_CurrentLM).Y != Y) return false; + locMin = &(*m_CurrentLM); + ++m_CurrentLM; + return true; +} +//------------------------------------------------------------------------------ + +IntRect ClipperBase::GetBounds() +{ + IntRect result; + MinimaList::iterator lm = m_MinimaList.begin(); + if (lm == m_MinimaList.end()) + { + result.left = result.top = result.right = result.bottom = 0; + return result; + } + result.left = lm->LeftBound->Bot.X; + result.top = lm->LeftBound->Bot.Y; + result.right = lm->LeftBound->Bot.X; + result.bottom = lm->LeftBound->Bot.Y; + while (lm != m_MinimaList.end()) + { + //todo - needs fixing for open paths + result.bottom = std::max(result.bottom, lm->LeftBound->Bot.Y); + TEdge* e = lm->LeftBound; + for (;;) { + TEdge* bottomE = e; + while (e->NextInLML) + { + if (e->Bot.X < result.left) result.left = e->Bot.X; + if (e->Bot.X > result.right) result.right = e->Bot.X; + e = e->NextInLML; + } + result.left = std::min(result.left, e->Bot.X); + result.right = std::max(result.right, e->Bot.X); + result.left = std::min(result.left, e->Top.X); + result.right = std::max(result.right, e->Top.X); + result.top = std::min(result.top, e->Top.Y); + if (bottomE == lm->LeftBound) e = lm->RightBound; + else break; + } + ++lm; + } + return result; +} +//------------------------------------------------------------------------------ + +void ClipperBase::InsertScanbeam(const cInt Y) +{ + m_Scanbeam.push(Y); +} +//------------------------------------------------------------------------------ + +bool ClipperBase::PopScanbeam(cInt &Y) +{ + if (m_Scanbeam.empty()) return false; + Y = m_Scanbeam.top(); + m_Scanbeam.pop(); + while (!m_Scanbeam.empty() && Y == m_Scanbeam.top()) { m_Scanbeam.pop(); } // Pop duplicates. + return true; +} +//------------------------------------------------------------------------------ + +void ClipperBase::DisposeAllOutRecs(){ + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + DisposeOutRec(i); + m_PolyOuts.clear(); +} +//------------------------------------------------------------------------------ + +void ClipperBase::DisposeOutRec(PolyOutList::size_type index) +{ + OutRec *outRec = m_PolyOuts[index]; + if (outRec->Pts) DisposeOutPts(outRec->Pts); + delete outRec; + m_PolyOuts[index] = 0; +} +//------------------------------------------------------------------------------ + +void ClipperBase::DeleteFromAEL(TEdge *e) +{ + TEdge* AelPrev = e->PrevInAEL; + TEdge* AelNext = e->NextInAEL; + if (!AelPrev && !AelNext && (e != m_ActiveEdges)) return; //already deleted + if (AelPrev) AelPrev->NextInAEL = AelNext; + else m_ActiveEdges = AelNext; + if (AelNext) AelNext->PrevInAEL = AelPrev; + e->NextInAEL = 0; + e->PrevInAEL = 0; +} +//------------------------------------------------------------------------------ + +OutRec* ClipperBase::CreateOutRec() +{ + OutRec* result = new OutRec; + result->IsHole = false; + result->IsOpen = false; + result->FirstLeft = 0; + result->Pts = 0; + result->BottomPt = 0; + result->PolyNd = 0; + m_PolyOuts.push_back(result); + result->Idx = (int)m_PolyOuts.size() - 1; + return result; +} +//------------------------------------------------------------------------------ + +void ClipperBase::SwapPositionsInAEL(TEdge *Edge1, TEdge *Edge2) +{ + //check that one or other edge hasn't already been removed from AEL ... + if (Edge1->NextInAEL == Edge1->PrevInAEL || + Edge2->NextInAEL == Edge2->PrevInAEL) return; + + if (Edge1->NextInAEL == Edge2) + { + TEdge* Next = Edge2->NextInAEL; + if (Next) Next->PrevInAEL = Edge1; + TEdge* Prev = Edge1->PrevInAEL; + if (Prev) Prev->NextInAEL = Edge2; + Edge2->PrevInAEL = Prev; + Edge2->NextInAEL = Edge1; + Edge1->PrevInAEL = Edge2; + Edge1->NextInAEL = Next; + } + else if (Edge2->NextInAEL == Edge1) + { + TEdge* Next = Edge1->NextInAEL; + if (Next) Next->PrevInAEL = Edge2; + TEdge* Prev = Edge2->PrevInAEL; + if (Prev) Prev->NextInAEL = Edge1; + Edge1->PrevInAEL = Prev; + Edge1->NextInAEL = Edge2; + Edge2->PrevInAEL = Edge1; + Edge2->NextInAEL = Next; + } + else + { + TEdge* Next = Edge1->NextInAEL; + TEdge* Prev = Edge1->PrevInAEL; + Edge1->NextInAEL = Edge2->NextInAEL; + if (Edge1->NextInAEL) Edge1->NextInAEL->PrevInAEL = Edge1; + Edge1->PrevInAEL = Edge2->PrevInAEL; + if (Edge1->PrevInAEL) Edge1->PrevInAEL->NextInAEL = Edge1; + Edge2->NextInAEL = Next; + if (Edge2->NextInAEL) Edge2->NextInAEL->PrevInAEL = Edge2; + Edge2->PrevInAEL = Prev; + if (Edge2->PrevInAEL) Edge2->PrevInAEL->NextInAEL = Edge2; + } + + if (!Edge1->PrevInAEL) m_ActiveEdges = Edge1; + else if (!Edge2->PrevInAEL) m_ActiveEdges = Edge2; +} +//------------------------------------------------------------------------------ + +void ClipperBase::UpdateEdgeIntoAEL(TEdge *&e) +{ + if (!e->NextInLML) + throw clipperException("UpdateEdgeIntoAEL: invalid call"); + + e->NextInLML->OutIdx = e->OutIdx; + TEdge* AelPrev = e->PrevInAEL; + TEdge* AelNext = e->NextInAEL; + if (AelPrev) AelPrev->NextInAEL = e->NextInLML; + else m_ActiveEdges = e->NextInLML; + if (AelNext) AelNext->PrevInAEL = e->NextInLML; + e->NextInLML->Side = e->Side; + e->NextInLML->WindDelta = e->WindDelta; + e->NextInLML->WindCnt = e->WindCnt; + e->NextInLML->WindCnt2 = e->WindCnt2; + e = e->NextInLML; + e->Curr = e->Bot; + e->PrevInAEL = AelPrev; + e->NextInAEL = AelNext; + if (!IsHorizontal(*e)) InsertScanbeam(e->Top.Y); +} +//------------------------------------------------------------------------------ + +bool ClipperBase::LocalMinimaPending() +{ + return (m_CurrentLM != m_MinimaList.end()); +} + +//------------------------------------------------------------------------------ +// TClipper methods ... +//------------------------------------------------------------------------------ + +Clipper::Clipper(int initOptions) : ClipperBase() //constructor +{ + m_ExecuteLocked = false; + m_UseFullRange = false; + m_ReverseOutput = ((initOptions & ioReverseSolution) != 0); + m_StrictSimple = ((initOptions & ioStrictlySimple) != 0); + m_PreserveCollinear = ((initOptions & ioPreserveCollinear) != 0); + m_HasOpenPaths = false; +#ifdef use_xyz + m_ZFill = 0; +#endif +} +//------------------------------------------------------------------------------ + +#ifdef use_xyz +void Clipper::ZFillFunction(ZFillCallback zFillFunc) +{ + m_ZFill = zFillFunc; +} +//------------------------------------------------------------------------------ +#endif + +bool Clipper::Execute(ClipType clipType, Paths &solution, PolyFillType fillType) +{ + return Execute(clipType, solution, fillType, fillType); +} +//------------------------------------------------------------------------------ + +bool Clipper::Execute(ClipType clipType, PolyTree &polytree, PolyFillType fillType) +{ + return Execute(clipType, polytree, fillType, fillType); +} +//------------------------------------------------------------------------------ + +bool Clipper::Execute(ClipType clipType, Paths &solution, + PolyFillType subjFillType, PolyFillType clipFillType) +{ + if( m_ExecuteLocked ) return false; + if (m_HasOpenPaths) + throw clipperException("Error: PolyTree struct is needed for open path clipping."); + m_ExecuteLocked = true; + solution.resize(0); + m_SubjFillType = subjFillType; + m_ClipFillType = clipFillType; + m_ClipType = clipType; + m_UsingPolyTree = false; + bool succeeded = ExecuteInternal(); + if (succeeded) BuildResult(solution); + DisposeAllOutRecs(); + m_ExecuteLocked = false; + return succeeded; +} +//------------------------------------------------------------------------------ + +bool Clipper::Execute(ClipType clipType, PolyTree& polytree, + PolyFillType subjFillType, PolyFillType clipFillType) +{ + if( m_ExecuteLocked ) return false; + m_ExecuteLocked = true; + m_SubjFillType = subjFillType; + m_ClipFillType = clipFillType; + m_ClipType = clipType; + m_UsingPolyTree = true; + bool succeeded = ExecuteInternal(); + if (succeeded) BuildResult2(polytree); + DisposeAllOutRecs(); + m_ExecuteLocked = false; + return succeeded; +} +//------------------------------------------------------------------------------ + +void Clipper::FixHoleLinkage(OutRec &outrec) +{ + //skip OutRecs that (a) contain outermost polygons or + //(b) already have the correct owner/child linkage ... + if (!outrec.FirstLeft || + (outrec.IsHole != outrec.FirstLeft->IsHole && + outrec.FirstLeft->Pts)) return; + + OutRec* orfl = outrec.FirstLeft; + while (orfl && ((orfl->IsHole == outrec.IsHole) || !orfl->Pts)) + orfl = orfl->FirstLeft; + outrec.FirstLeft = orfl; +} +//------------------------------------------------------------------------------ + +bool Clipper::ExecuteInternal() +{ + bool succeeded = true; + try { + Reset(); + m_Maxima = MaximaList(); + m_SortedEdges = 0; + + succeeded = true; + cInt botY, topY; + if (!PopScanbeam(botY)) return false; + InsertLocalMinimaIntoAEL(botY); + while (PopScanbeam(topY) || LocalMinimaPending()) + { + ProcessHorizontals(); + ClearGhostJoins(); + if (!ProcessIntersections(topY)) + { + succeeded = false; + break; + } + ProcessEdgesAtTopOfScanbeam(topY); + botY = topY; + InsertLocalMinimaIntoAEL(botY); + } + } + catch(...) + { + succeeded = false; + } + + if (succeeded) + { + //fix orientations ... + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + OutRec *outRec = m_PolyOuts[i]; + if (!outRec->Pts || outRec->IsOpen) continue; + if ((outRec->IsHole ^ m_ReverseOutput) == (Area(*outRec) > 0)) + ReversePolyPtLinks(outRec->Pts); + } + + if (!m_Joins.empty()) JoinCommonEdges(); + + //unfortunately FixupOutPolygon() must be done after JoinCommonEdges() + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + OutRec *outRec = m_PolyOuts[i]; + if (!outRec->Pts) continue; + if (outRec->IsOpen) + FixupOutPolyline(*outRec); + else + FixupOutPolygon(*outRec); + } + + if (m_StrictSimple) DoSimplePolygons(); + } + + ClearJoins(); + ClearGhostJoins(); + return succeeded; +} +//------------------------------------------------------------------------------ + +void Clipper::SetWindingCount(TEdge &edge) +{ + TEdge *e = edge.PrevInAEL; + //find the edge of the same polytype that immediately preceeds 'edge' in AEL + while (e && ((e->PolyTyp != edge.PolyTyp) || (e->WindDelta == 0))) e = e->PrevInAEL; + if (!e) + { + if (edge.WindDelta == 0) + { + PolyFillType pft = (edge.PolyTyp == ptSubject ? m_SubjFillType : m_ClipFillType); + edge.WindCnt = (pft == pftNegative ? -1 : 1); + } + else + edge.WindCnt = edge.WindDelta; + edge.WindCnt2 = 0; + e = m_ActiveEdges; //ie get ready to calc WindCnt2 + } + else if (edge.WindDelta == 0 && m_ClipType != ctUnion) + { + edge.WindCnt = 1; + edge.WindCnt2 = e->WindCnt2; + e = e->NextInAEL; //ie get ready to calc WindCnt2 + } + else if (IsEvenOddFillType(edge)) + { + //EvenOdd filling ... + if (edge.WindDelta == 0) + { + //are we inside a subj polygon ... + bool Inside = true; + TEdge *e2 = e->PrevInAEL; + while (e2) + { + if (e2->PolyTyp == e->PolyTyp && e2->WindDelta != 0) + Inside = !Inside; + e2 = e2->PrevInAEL; + } + edge.WindCnt = (Inside ? 0 : 1); + } + else + { + edge.WindCnt = edge.WindDelta; + } + edge.WindCnt2 = e->WindCnt2; + e = e->NextInAEL; //ie get ready to calc WindCnt2 + } + else + { + //nonZero, Positive or Negative filling ... + if (e->WindCnt * e->WindDelta < 0) + { + //prev edge is 'decreasing' WindCount (WC) toward zero + //so we're outside the previous polygon ... + if (Abs(e->WindCnt) > 1) + { + //outside prev poly but still inside another. + //when reversing direction of prev poly use the same WC + if (e->WindDelta * edge.WindDelta < 0) edge.WindCnt = e->WindCnt; + //otherwise continue to 'decrease' WC ... + else edge.WindCnt = e->WindCnt + edge.WindDelta; + } + else + //now outside all polys of same polytype so set own WC ... + edge.WindCnt = (edge.WindDelta == 0 ? 1 : edge.WindDelta); + } else + { + //prev edge is 'increasing' WindCount (WC) away from zero + //so we're inside the previous polygon ... + if (edge.WindDelta == 0) + edge.WindCnt = (e->WindCnt < 0 ? e->WindCnt - 1 : e->WindCnt + 1); + //if wind direction is reversing prev then use same WC + else if (e->WindDelta * edge.WindDelta < 0) edge.WindCnt = e->WindCnt; + //otherwise add to WC ... + else edge.WindCnt = e->WindCnt + edge.WindDelta; + } + edge.WindCnt2 = e->WindCnt2; + e = e->NextInAEL; //ie get ready to calc WindCnt2 + } + + //update WindCnt2 ... + if (IsEvenOddAltFillType(edge)) + { + //EvenOdd filling ... + while (e != &edge) + { + if (e->WindDelta != 0) + edge.WindCnt2 = (edge.WindCnt2 == 0 ? 1 : 0); + e = e->NextInAEL; + } + } else + { + //nonZero, Positive or Negative filling ... + while ( e != &edge ) + { + edge.WindCnt2 += e->WindDelta; + e = e->NextInAEL; + } + } +} +//------------------------------------------------------------------------------ + +bool Clipper::IsEvenOddFillType(const TEdge& edge) const +{ + if (edge.PolyTyp == ptSubject) + return m_SubjFillType == pftEvenOdd; else + return m_ClipFillType == pftEvenOdd; +} +//------------------------------------------------------------------------------ + +bool Clipper::IsEvenOddAltFillType(const TEdge& edge) const +{ + if (edge.PolyTyp == ptSubject) + return m_ClipFillType == pftEvenOdd; else + return m_SubjFillType == pftEvenOdd; +} +//------------------------------------------------------------------------------ + +bool Clipper::IsContributing(const TEdge& edge) const +{ + PolyFillType pft, pft2; + if (edge.PolyTyp == ptSubject) + { + pft = m_SubjFillType; + pft2 = m_ClipFillType; + } else + { + pft = m_ClipFillType; + pft2 = m_SubjFillType; + } + + switch(pft) + { + case pftEvenOdd: + //return false if a subj line has been flagged as inside a subj polygon + if (edge.WindDelta == 0 && edge.WindCnt != 1) return false; + break; + case pftNonZero: + if (Abs(edge.WindCnt) != 1) return false; + break; + case pftPositive: + if (edge.WindCnt != 1) return false; + break; + default: //pftNegative + if (edge.WindCnt != -1) return false; + } + + switch(m_ClipType) + { + case ctIntersection: + switch(pft2) + { + case pftEvenOdd: + case pftNonZero: + return (edge.WindCnt2 != 0); + case pftPositive: + return (edge.WindCnt2 > 0); + default: + return (edge.WindCnt2 < 0); + } + break; + case ctUnion: + switch(pft2) + { + case pftEvenOdd: + case pftNonZero: + return (edge.WindCnt2 == 0); + case pftPositive: + return (edge.WindCnt2 <= 0); + default: + return (edge.WindCnt2 >= 0); + } + break; + case ctDifference: + if (edge.PolyTyp == ptSubject) + switch(pft2) + { + case pftEvenOdd: + case pftNonZero: + return (edge.WindCnt2 == 0); + case pftPositive: + return (edge.WindCnt2 <= 0); + default: + return (edge.WindCnt2 >= 0); + } + else + switch(pft2) + { + case pftEvenOdd: + case pftNonZero: + return (edge.WindCnt2 != 0); + case pftPositive: + return (edge.WindCnt2 > 0); + default: + return (edge.WindCnt2 < 0); + } + break; + case ctXor: + if (edge.WindDelta == 0) //XOr always contributing unless open + switch(pft2) + { + case pftEvenOdd: + case pftNonZero: + return (edge.WindCnt2 == 0); + case pftPositive: + return (edge.WindCnt2 <= 0); + default: + return (edge.WindCnt2 >= 0); + } + else + return true; + break; + default: + return true; + } +} +//------------------------------------------------------------------------------ + +OutPt* Clipper::AddLocalMinPoly(TEdge *e1, TEdge *e2, const IntPoint &Pt) +{ + OutPt* result; + TEdge *e, *prevE; + if (IsHorizontal(*e2) || ( e1->Dx > e2->Dx )) + { + result = AddOutPt(e1, Pt); + e2->OutIdx = e1->OutIdx; + e1->Side = esLeft; + e2->Side = esRight; + e = e1; + if (e->PrevInAEL == e2) + prevE = e2->PrevInAEL; + else + prevE = e->PrevInAEL; + } else + { + result = AddOutPt(e2, Pt); + e1->OutIdx = e2->OutIdx; + e1->Side = esRight; + e2->Side = esLeft; + e = e2; + if (e->PrevInAEL == e1) + prevE = e1->PrevInAEL; + else + prevE = e->PrevInAEL; + } + + if (prevE && prevE->OutIdx >= 0 && prevE->Top.Y < Pt.Y && e->Top.Y < Pt.Y) + { + cInt xPrev = TopX(*prevE, Pt.Y); + cInt xE = TopX(*e, Pt.Y); + if (xPrev == xE && (e->WindDelta != 0) && (prevE->WindDelta != 0) && + SlopesEqual(IntPoint(xPrev, Pt.Y), prevE->Top, IntPoint(xE, Pt.Y), e->Top, m_UseFullRange)) + { + OutPt* outPt = AddOutPt(prevE, Pt); + AddJoin(result, outPt, e->Top); + } + } + return result; +} +//------------------------------------------------------------------------------ + +void Clipper::AddLocalMaxPoly(TEdge *e1, TEdge *e2, const IntPoint &Pt) +{ + AddOutPt( e1, Pt ); + if (e2->WindDelta == 0) AddOutPt(e2, Pt); + if( e1->OutIdx == e2->OutIdx ) + { + e1->OutIdx = Unassigned; + e2->OutIdx = Unassigned; + } + else if (e1->OutIdx < e2->OutIdx) + AppendPolygon(e1, e2); + else + AppendPolygon(e2, e1); +} +//------------------------------------------------------------------------------ + +void Clipper::AddEdgeToSEL(TEdge *edge) +{ + //SEL pointers in PEdge are reused to build a list of horizontal edges. + //However, we don't need to worry about order with horizontal edge processing. + if( !m_SortedEdges ) + { + m_SortedEdges = edge; + edge->PrevInSEL = 0; + edge->NextInSEL = 0; + } + else + { + edge->NextInSEL = m_SortedEdges; + edge->PrevInSEL = 0; + m_SortedEdges->PrevInSEL = edge; + m_SortedEdges = edge; + } +} +//------------------------------------------------------------------------------ + +bool Clipper::PopEdgeFromSEL(TEdge *&edge) +{ + if (!m_SortedEdges) return false; + edge = m_SortedEdges; + DeleteFromSEL(m_SortedEdges); + return true; +} +//------------------------------------------------------------------------------ + +void Clipper::CopyAELToSEL() +{ + TEdge* e = m_ActiveEdges; + m_SortedEdges = e; + while ( e ) + { + e->PrevInSEL = e->PrevInAEL; + e->NextInSEL = e->NextInAEL; + e = e->NextInAEL; + } +} +//------------------------------------------------------------------------------ + +void Clipper::AddJoin(OutPt *op1, OutPt *op2, const IntPoint OffPt) +{ + Join* j = new Join; + j->OutPt1 = op1; + j->OutPt2 = op2; + j->OffPt = OffPt; + m_Joins.push_back(j); +} +//------------------------------------------------------------------------------ + +void Clipper::ClearJoins() +{ + for (JoinList::size_type i = 0; i < m_Joins.size(); i++) + delete m_Joins[i]; + m_Joins.resize(0); +} +//------------------------------------------------------------------------------ + +void Clipper::ClearGhostJoins() +{ + for (JoinList::size_type i = 0; i < m_GhostJoins.size(); i++) + delete m_GhostJoins[i]; + m_GhostJoins.resize(0); +} +//------------------------------------------------------------------------------ + +void Clipper::AddGhostJoin(OutPt *op, const IntPoint OffPt) +{ + Join* j = new Join; + j->OutPt1 = op; + j->OutPt2 = 0; + j->OffPt = OffPt; + m_GhostJoins.push_back(j); +} +//------------------------------------------------------------------------------ + +void Clipper::InsertLocalMinimaIntoAEL(const cInt botY) +{ + const LocalMinimum *lm; + while (PopLocalMinima(botY, lm)) + { + TEdge* lb = lm->LeftBound; + TEdge* rb = lm->RightBound; + + OutPt *Op1 = 0; + if (!lb) + { + //nb: don't insert LB into either AEL or SEL + InsertEdgeIntoAEL(rb, 0); + SetWindingCount(*rb); + if (IsContributing(*rb)) + Op1 = AddOutPt(rb, rb->Bot); + } + else if (!rb) + { + InsertEdgeIntoAEL(lb, 0); + SetWindingCount(*lb); + if (IsContributing(*lb)) + Op1 = AddOutPt(lb, lb->Bot); + InsertScanbeam(lb->Top.Y); + } + else + { + InsertEdgeIntoAEL(lb, 0); + InsertEdgeIntoAEL(rb, lb); + SetWindingCount( *lb ); + rb->WindCnt = lb->WindCnt; + rb->WindCnt2 = lb->WindCnt2; + if (IsContributing(*lb)) + Op1 = AddLocalMinPoly(lb, rb, lb->Bot); + InsertScanbeam(lb->Top.Y); + } + + if (rb) + { + if (IsHorizontal(*rb)) + { + AddEdgeToSEL(rb); + if (rb->NextInLML) + InsertScanbeam(rb->NextInLML->Top.Y); + } + else InsertScanbeam( rb->Top.Y ); + } + + if (!lb || !rb) continue; + + //if any output polygons share an edge, they'll need joining later ... + if (Op1 && IsHorizontal(*rb) && + m_GhostJoins.size() > 0 && (rb->WindDelta != 0)) + { + for (JoinList::size_type i = 0; i < m_GhostJoins.size(); ++i) + { + Join* jr = m_GhostJoins[i]; + //if the horizontal Rb and a 'ghost' horizontal overlap, then convert + //the 'ghost' join to a real join ready for later ... + if (HorzSegmentsOverlap(jr->OutPt1->Pt.X, jr->OffPt.X, rb->Bot.X, rb->Top.X)) + AddJoin(jr->OutPt1, Op1, jr->OffPt); + } + } + + if (lb->OutIdx >= 0 && lb->PrevInAEL && + lb->PrevInAEL->Curr.X == lb->Bot.X && + lb->PrevInAEL->OutIdx >= 0 && + SlopesEqual(lb->PrevInAEL->Bot, lb->PrevInAEL->Top, lb->Curr, lb->Top, m_UseFullRange) && + (lb->WindDelta != 0) && (lb->PrevInAEL->WindDelta != 0)) + { + OutPt *Op2 = AddOutPt(lb->PrevInAEL, lb->Bot); + AddJoin(Op1, Op2, lb->Top); + } + + if(lb->NextInAEL != rb) + { + + if (rb->OutIdx >= 0 && rb->PrevInAEL->OutIdx >= 0 && + SlopesEqual(rb->PrevInAEL->Curr, rb->PrevInAEL->Top, rb->Curr, rb->Top, m_UseFullRange) && + (rb->WindDelta != 0) && (rb->PrevInAEL->WindDelta != 0)) + { + OutPt *Op2 = AddOutPt(rb->PrevInAEL, rb->Bot); + AddJoin(Op1, Op2, rb->Top); + } + + TEdge* e = lb->NextInAEL; + if (e) + { + while( e != rb ) + { + //nb: For calculating winding counts etc, IntersectEdges() assumes + //that param1 will be to the Right of param2 ABOVE the intersection ... + IntersectEdges(rb , e , lb->Curr); //order important here + e = e->NextInAEL; + } + } + } + + } +} +//------------------------------------------------------------------------------ + +void Clipper::DeleteFromSEL(TEdge *e) +{ + TEdge* SelPrev = e->PrevInSEL; + TEdge* SelNext = e->NextInSEL; + if( !SelPrev && !SelNext && (e != m_SortedEdges) ) return; //already deleted + if( SelPrev ) SelPrev->NextInSEL = SelNext; + else m_SortedEdges = SelNext; + if( SelNext ) SelNext->PrevInSEL = SelPrev; + e->NextInSEL = 0; + e->PrevInSEL = 0; +} +//------------------------------------------------------------------------------ + +#ifdef use_xyz +void Clipper::SetZ(IntPoint& pt, TEdge& e1, TEdge& e2) +{ + if (pt.Z != 0 || !m_ZFill) return; + else if (pt == e1.Bot) pt.Z = e1.Bot.Z; + else if (pt == e1.Top) pt.Z = e1.Top.Z; + else if (pt == e2.Bot) pt.Z = e2.Bot.Z; + else if (pt == e2.Top) pt.Z = e2.Top.Z; + else (*m_ZFill)(e1.Bot, e1.Top, e2.Bot, e2.Top, pt); +} +//------------------------------------------------------------------------------ +#endif + +void Clipper::IntersectEdges(TEdge *e1, TEdge *e2, IntPoint &Pt) +{ + bool e1Contributing = ( e1->OutIdx >= 0 ); + bool e2Contributing = ( e2->OutIdx >= 0 ); + +#ifdef use_xyz + SetZ(Pt, *e1, *e2); +#endif + +#ifdef use_lines + //if either edge is on an OPEN path ... + if (e1->WindDelta == 0 || e2->WindDelta == 0) + { + //ignore subject-subject open path intersections UNLESS they + //are both open paths, AND they are both 'contributing maximas' ... + if (e1->WindDelta == 0 && e2->WindDelta == 0) return; + + //if intersecting a subj line with a subj poly ... + else if (e1->PolyTyp == e2->PolyTyp && + e1->WindDelta != e2->WindDelta && m_ClipType == ctUnion) + { + if (e1->WindDelta == 0) + { + if (e2Contributing) + { + AddOutPt(e1, Pt); + if (e1Contributing) e1->OutIdx = Unassigned; + } + } + else + { + if (e1Contributing) + { + AddOutPt(e2, Pt); + if (e2Contributing) e2->OutIdx = Unassigned; + } + } + } + else if (e1->PolyTyp != e2->PolyTyp) + { + //toggle subj open path OutIdx on/off when Abs(clip.WndCnt) == 1 ... + if ((e1->WindDelta == 0) && abs(e2->WindCnt) == 1 && + (m_ClipType != ctUnion || e2->WindCnt2 == 0)) + { + AddOutPt(e1, Pt); + if (e1Contributing) e1->OutIdx = Unassigned; + } + else if ((e2->WindDelta == 0) && (abs(e1->WindCnt) == 1) && + (m_ClipType != ctUnion || e1->WindCnt2 == 0)) + { + AddOutPt(e2, Pt); + if (e2Contributing) e2->OutIdx = Unassigned; + } + } + return; + } +#endif + + //update winding counts... + //assumes that e1 will be to the Right of e2 ABOVE the intersection + if ( e1->PolyTyp == e2->PolyTyp ) + { + if ( IsEvenOddFillType( *e1) ) + { + int oldE1WindCnt = e1->WindCnt; + e1->WindCnt = e2->WindCnt; + e2->WindCnt = oldE1WindCnt; + } else + { + if (e1->WindCnt + e2->WindDelta == 0 ) e1->WindCnt = -e1->WindCnt; + else e1->WindCnt += e2->WindDelta; + if ( e2->WindCnt - e1->WindDelta == 0 ) e2->WindCnt = -e2->WindCnt; + else e2->WindCnt -= e1->WindDelta; + } + } else + { + if (!IsEvenOddFillType(*e2)) e1->WindCnt2 += e2->WindDelta; + else e1->WindCnt2 = ( e1->WindCnt2 == 0 ) ? 1 : 0; + if (!IsEvenOddFillType(*e1)) e2->WindCnt2 -= e1->WindDelta; + else e2->WindCnt2 = ( e2->WindCnt2 == 0 ) ? 1 : 0; + } + + PolyFillType e1FillType, e2FillType, e1FillType2, e2FillType2; + if (e1->PolyTyp == ptSubject) + { + e1FillType = m_SubjFillType; + e1FillType2 = m_ClipFillType; + } else + { + e1FillType = m_ClipFillType; + e1FillType2 = m_SubjFillType; + } + if (e2->PolyTyp == ptSubject) + { + e2FillType = m_SubjFillType; + e2FillType2 = m_ClipFillType; + } else + { + e2FillType = m_ClipFillType; + e2FillType2 = m_SubjFillType; + } + + cInt e1Wc, e2Wc; + switch (e1FillType) + { + case pftPositive: e1Wc = e1->WindCnt; break; + case pftNegative: e1Wc = -e1->WindCnt; break; + default: e1Wc = Abs(e1->WindCnt); + } + switch(e2FillType) + { + case pftPositive: e2Wc = e2->WindCnt; break; + case pftNegative: e2Wc = -e2->WindCnt; break; + default: e2Wc = Abs(e2->WindCnt); + } + + if ( e1Contributing && e2Contributing ) + { + if ((e1Wc != 0 && e1Wc != 1) || (e2Wc != 0 && e2Wc != 1) || + (e1->PolyTyp != e2->PolyTyp && m_ClipType != ctXor) ) + { + AddLocalMaxPoly(e1, e2, Pt); + } + else + { + AddOutPt(e1, Pt); + AddOutPt(e2, Pt); + SwapSides( *e1 , *e2 ); + SwapPolyIndexes( *e1 , *e2 ); + } + } + else if ( e1Contributing ) + { + if (e2Wc == 0 || e2Wc == 1) + { + AddOutPt(e1, Pt); + SwapSides(*e1, *e2); + SwapPolyIndexes(*e1, *e2); + } + } + else if ( e2Contributing ) + { + if (e1Wc == 0 || e1Wc == 1) + { + AddOutPt(e2, Pt); + SwapSides(*e1, *e2); + SwapPolyIndexes(*e1, *e2); + } + } + else if ( (e1Wc == 0 || e1Wc == 1) && (e2Wc == 0 || e2Wc == 1)) + { + //neither edge is currently contributing ... + + cInt e1Wc2, e2Wc2; + switch (e1FillType2) + { + case pftPositive: e1Wc2 = e1->WindCnt2; break; + case pftNegative : e1Wc2 = -e1->WindCnt2; break; + default: e1Wc2 = Abs(e1->WindCnt2); + } + switch (e2FillType2) + { + case pftPositive: e2Wc2 = e2->WindCnt2; break; + case pftNegative: e2Wc2 = -e2->WindCnt2; break; + default: e2Wc2 = Abs(e2->WindCnt2); + } + + if (e1->PolyTyp != e2->PolyTyp) + { + AddLocalMinPoly(e1, e2, Pt); + } + else if (e1Wc == 1 && e2Wc == 1) + switch( m_ClipType ) { + case ctIntersection: + if (e1Wc2 > 0 && e2Wc2 > 0) + AddLocalMinPoly(e1, e2, Pt); + break; + case ctUnion: + if ( e1Wc2 <= 0 && e2Wc2 <= 0 ) + AddLocalMinPoly(e1, e2, Pt); + break; + case ctDifference: + if (((e1->PolyTyp == ptClip) && (e1Wc2 > 0) && (e2Wc2 > 0)) || + ((e1->PolyTyp == ptSubject) && (e1Wc2 <= 0) && (e2Wc2 <= 0))) + AddLocalMinPoly(e1, e2, Pt); + break; + case ctXor: + AddLocalMinPoly(e1, e2, Pt); + } + else + SwapSides( *e1, *e2 ); + } +} +//------------------------------------------------------------------------------ + +void Clipper::SetHoleState(TEdge *e, OutRec *outrec) +{ + TEdge *e2 = e->PrevInAEL; + TEdge *eTmp = 0; + while (e2) + { + if (e2->OutIdx >= 0 && e2->WindDelta != 0) + { + if (!eTmp) eTmp = e2; + else if (eTmp->OutIdx == e2->OutIdx) eTmp = 0; + } + e2 = e2->PrevInAEL; + } + if (!eTmp) + { + outrec->FirstLeft = 0; + outrec->IsHole = false; + } + else + { + outrec->FirstLeft = m_PolyOuts[eTmp->OutIdx]; + outrec->IsHole = !outrec->FirstLeft->IsHole; + } +} +//------------------------------------------------------------------------------ + +OutRec* GetLowermostRec(OutRec *outRec1, OutRec *outRec2) +{ + //work out which polygon fragment has the correct hole state ... + if (!outRec1->BottomPt) + outRec1->BottomPt = GetBottomPt(outRec1->Pts); + if (!outRec2->BottomPt) + outRec2->BottomPt = GetBottomPt(outRec2->Pts); + OutPt *OutPt1 = outRec1->BottomPt; + OutPt *OutPt2 = outRec2->BottomPt; + if (OutPt1->Pt.Y > OutPt2->Pt.Y) return outRec1; + else if (OutPt1->Pt.Y < OutPt2->Pt.Y) return outRec2; + else if (OutPt1->Pt.X < OutPt2->Pt.X) return outRec1; + else if (OutPt1->Pt.X > OutPt2->Pt.X) return outRec2; + else if (OutPt1->Next == OutPt1) return outRec2; + else if (OutPt2->Next == OutPt2) return outRec1; + else if (FirstIsBottomPt(OutPt1, OutPt2)) return outRec1; + else return outRec2; +} +//------------------------------------------------------------------------------ + +bool OutRec1RightOfOutRec2(OutRec* outRec1, OutRec* outRec2) +{ + do + { + outRec1 = outRec1->FirstLeft; + if (outRec1 == outRec2) return true; + } while (outRec1); + return false; +} +//------------------------------------------------------------------------------ + +OutRec* Clipper::GetOutRec(int Idx) +{ + OutRec* outrec = m_PolyOuts[Idx]; + while (outrec != m_PolyOuts[outrec->Idx]) + outrec = m_PolyOuts[outrec->Idx]; + return outrec; +} +//------------------------------------------------------------------------------ + +void Clipper::AppendPolygon(TEdge *e1, TEdge *e2) +{ + //get the start and ends of both output polygons ... + OutRec *outRec1 = m_PolyOuts[e1->OutIdx]; + OutRec *outRec2 = m_PolyOuts[e2->OutIdx]; + + OutRec *holeStateRec; + if (OutRec1RightOfOutRec2(outRec1, outRec2)) + holeStateRec = outRec2; + else if (OutRec1RightOfOutRec2(outRec2, outRec1)) + holeStateRec = outRec1; + else + holeStateRec = GetLowermostRec(outRec1, outRec2); + + //get the start and ends of both output polygons and + //join e2 poly onto e1 poly and delete pointers to e2 ... + + OutPt* p1_lft = outRec1->Pts; + OutPt* p1_rt = p1_lft->Prev; + OutPt* p2_lft = outRec2->Pts; + OutPt* p2_rt = p2_lft->Prev; + + //join e2 poly onto e1 poly and delete pointers to e2 ... + if( e1->Side == esLeft ) + { + if( e2->Side == esLeft ) + { + //z y x a b c + ReversePolyPtLinks(p2_lft); + p2_lft->Next = p1_lft; + p1_lft->Prev = p2_lft; + p1_rt->Next = p2_rt; + p2_rt->Prev = p1_rt; + outRec1->Pts = p2_rt; + } else + { + //x y z a b c + p2_rt->Next = p1_lft; + p1_lft->Prev = p2_rt; + p2_lft->Prev = p1_rt; + p1_rt->Next = p2_lft; + outRec1->Pts = p2_lft; + } + } else + { + if( e2->Side == esRight ) + { + //a b c z y x + ReversePolyPtLinks(p2_lft); + p1_rt->Next = p2_rt; + p2_rt->Prev = p1_rt; + p2_lft->Next = p1_lft; + p1_lft->Prev = p2_lft; + } else + { + //a b c x y z + p1_rt->Next = p2_lft; + p2_lft->Prev = p1_rt; + p1_lft->Prev = p2_rt; + p2_rt->Next = p1_lft; + } + } + + outRec1->BottomPt = 0; + if (holeStateRec == outRec2) + { + if (outRec2->FirstLeft != outRec1) + outRec1->FirstLeft = outRec2->FirstLeft; + outRec1->IsHole = outRec2->IsHole; + } + outRec2->Pts = 0; + outRec2->BottomPt = 0; + outRec2->FirstLeft = outRec1; + + int OKIdx = e1->OutIdx; + int ObsoleteIdx = e2->OutIdx; + + e1->OutIdx = Unassigned; //nb: safe because we only get here via AddLocalMaxPoly + e2->OutIdx = Unassigned; + + TEdge* e = m_ActiveEdges; + while( e ) + { + if( e->OutIdx == ObsoleteIdx ) + { + e->OutIdx = OKIdx; + e->Side = e1->Side; + break; + } + e = e->NextInAEL; + } + + outRec2->Idx = outRec1->Idx; +} +//------------------------------------------------------------------------------ + +OutPt* Clipper::AddOutPt(TEdge *e, const IntPoint &pt) +{ + if( e->OutIdx < 0 ) + { + OutRec *outRec = CreateOutRec(); + outRec->IsOpen = (e->WindDelta == 0); + OutPt* newOp = new OutPt; + outRec->Pts = newOp; + newOp->Idx = outRec->Idx; + newOp->Pt = pt; + newOp->Next = newOp; + newOp->Prev = newOp; + if (!outRec->IsOpen) + SetHoleState(e, outRec); + e->OutIdx = outRec->Idx; + return newOp; + } else + { + OutRec *outRec = m_PolyOuts[e->OutIdx]; + //OutRec.Pts is the 'Left-most' point & OutRec.Pts.Prev is the 'Right-most' + OutPt* op = outRec->Pts; + + bool ToFront = (e->Side == esLeft); + if (ToFront && (pt == op->Pt)) return op; + else if (!ToFront && (pt == op->Prev->Pt)) return op->Prev; + + OutPt* newOp = new OutPt; + newOp->Idx = outRec->Idx; + newOp->Pt = pt; + newOp->Next = op; + newOp->Prev = op->Prev; + newOp->Prev->Next = newOp; + op->Prev = newOp; + if (ToFront) outRec->Pts = newOp; + return newOp; + } +} +//------------------------------------------------------------------------------ + +OutPt* Clipper::GetLastOutPt(TEdge *e) +{ + OutRec *outRec = m_PolyOuts[e->OutIdx]; + if (e->Side == esLeft) + return outRec->Pts; + else + return outRec->Pts->Prev; +} +//------------------------------------------------------------------------------ + +void Clipper::ProcessHorizontals() +{ + TEdge* horzEdge; + while (PopEdgeFromSEL(horzEdge)) + ProcessHorizontal(horzEdge); +} +//------------------------------------------------------------------------------ + +inline bool IsMinima(TEdge *e) +{ + return e && (e->Prev->NextInLML != e) && (e->Next->NextInLML != e); +} +//------------------------------------------------------------------------------ + +inline bool IsMaxima(TEdge *e, const cInt Y) +{ + return e && e->Top.Y == Y && !e->NextInLML; +} +//------------------------------------------------------------------------------ + +inline bool IsIntermediate(TEdge *e, const cInt Y) +{ + return e->Top.Y == Y && e->NextInLML; +} +//------------------------------------------------------------------------------ + +TEdge *GetMaximaPair(TEdge *e) +{ + if ((e->Next->Top == e->Top) && !e->Next->NextInLML) + return e->Next; + else if ((e->Prev->Top == e->Top) && !e->Prev->NextInLML) + return e->Prev; + else return 0; +} +//------------------------------------------------------------------------------ + +TEdge *GetMaximaPairEx(TEdge *e) +{ + //as GetMaximaPair() but returns 0 if MaxPair isn't in AEL (unless it's horizontal) + TEdge* result = GetMaximaPair(e); + if (result && (result->OutIdx == Skip || + (result->NextInAEL == result->PrevInAEL && !IsHorizontal(*result)))) return 0; + return result; +} +//------------------------------------------------------------------------------ + +void Clipper::SwapPositionsInSEL(TEdge *Edge1, TEdge *Edge2) +{ + if( !( Edge1->NextInSEL ) && !( Edge1->PrevInSEL ) ) return; + if( !( Edge2->NextInSEL ) && !( Edge2->PrevInSEL ) ) return; + + if( Edge1->NextInSEL == Edge2 ) + { + TEdge* Next = Edge2->NextInSEL; + if( Next ) Next->PrevInSEL = Edge1; + TEdge* Prev = Edge1->PrevInSEL; + if( Prev ) Prev->NextInSEL = Edge2; + Edge2->PrevInSEL = Prev; + Edge2->NextInSEL = Edge1; + Edge1->PrevInSEL = Edge2; + Edge1->NextInSEL = Next; + } + else if( Edge2->NextInSEL == Edge1 ) + { + TEdge* Next = Edge1->NextInSEL; + if( Next ) Next->PrevInSEL = Edge2; + TEdge* Prev = Edge2->PrevInSEL; + if( Prev ) Prev->NextInSEL = Edge1; + Edge1->PrevInSEL = Prev; + Edge1->NextInSEL = Edge2; + Edge2->PrevInSEL = Edge1; + Edge2->NextInSEL = Next; + } + else + { + TEdge* Next = Edge1->NextInSEL; + TEdge* Prev = Edge1->PrevInSEL; + Edge1->NextInSEL = Edge2->NextInSEL; + if( Edge1->NextInSEL ) Edge1->NextInSEL->PrevInSEL = Edge1; + Edge1->PrevInSEL = Edge2->PrevInSEL; + if( Edge1->PrevInSEL ) Edge1->PrevInSEL->NextInSEL = Edge1; + Edge2->NextInSEL = Next; + if( Edge2->NextInSEL ) Edge2->NextInSEL->PrevInSEL = Edge2; + Edge2->PrevInSEL = Prev; + if( Edge2->PrevInSEL ) Edge2->PrevInSEL->NextInSEL = Edge2; + } + + if( !Edge1->PrevInSEL ) m_SortedEdges = Edge1; + else if( !Edge2->PrevInSEL ) m_SortedEdges = Edge2; +} +//------------------------------------------------------------------------------ + +TEdge* GetNextInAEL(TEdge *e, Direction dir) +{ + return dir == dLeftToRight ? e->NextInAEL : e->PrevInAEL; +} +//------------------------------------------------------------------------------ + +void GetHorzDirection(TEdge& HorzEdge, Direction& Dir, cInt& Left, cInt& Right) +{ + if (HorzEdge.Bot.X < HorzEdge.Top.X) + { + Left = HorzEdge.Bot.X; + Right = HorzEdge.Top.X; + Dir = dLeftToRight; + } else + { + Left = HorzEdge.Top.X; + Right = HorzEdge.Bot.X; + Dir = dRightToLeft; + } +} +//------------------------------------------------------------------------ + +/******************************************************************************* +* Notes: Horizontal edges (HEs) at scanline intersections (ie at the Top or * +* Bottom of a scanbeam) are processed as if layered. The order in which HEs * +* are processed doesn't matter. HEs intersect with other HE Bot.Xs only [#] * +* (or they could intersect with Top.Xs only, ie EITHER Bot.Xs OR Top.Xs), * +* and with other non-horizontal edges [*]. Once these intersections are * +* processed, intermediate HEs then 'promote' the Edge above (NextInLML) into * +* the AEL. These 'promoted' edges may in turn intersect [%] with other HEs. * +*******************************************************************************/ + +void Clipper::ProcessHorizontal(TEdge *horzEdge) +{ + Direction dir; + cInt horzLeft, horzRight; + bool IsOpen = (horzEdge->WindDelta == 0); + + GetHorzDirection(*horzEdge, dir, horzLeft, horzRight); + + TEdge* eLastHorz = horzEdge, *eMaxPair = 0; + while (eLastHorz->NextInLML && IsHorizontal(*eLastHorz->NextInLML)) + eLastHorz = eLastHorz->NextInLML; + if (!eLastHorz->NextInLML) + eMaxPair = GetMaximaPair(eLastHorz); + + MaximaList::const_iterator maxIt; + MaximaList::const_reverse_iterator maxRit; + if (m_Maxima.size() > 0) + { + //get the first maxima in range (X) ... + if (dir == dLeftToRight) + { + maxIt = m_Maxima.begin(); + while (maxIt != m_Maxima.end() && *maxIt <= horzEdge->Bot.X) maxIt++; + if (maxIt != m_Maxima.end() && *maxIt >= eLastHorz->Top.X) + maxIt = m_Maxima.end(); + } + else + { + maxRit = m_Maxima.rbegin(); + while (maxRit != m_Maxima.rend() && *maxRit > horzEdge->Bot.X) maxRit++; + if (maxRit != m_Maxima.rend() && *maxRit <= eLastHorz->Top.X) + maxRit = m_Maxima.rend(); + } + } + + OutPt* op1 = 0; + + for (;;) //loop through consec. horizontal edges + { + + bool IsLastHorz = (horzEdge == eLastHorz); + TEdge* e = GetNextInAEL(horzEdge, dir); + while(e) + { + + //this code block inserts extra coords into horizontal edges (in output + //polygons) whereever maxima touch these horizontal edges. This helps + //'simplifying' polygons (ie if the Simplify property is set). + if (m_Maxima.size() > 0) + { + if (dir == dLeftToRight) + { + while (maxIt != m_Maxima.end() && *maxIt < e->Curr.X) + { + if (horzEdge->OutIdx >= 0 && !IsOpen) + AddOutPt(horzEdge, IntPoint(*maxIt, horzEdge->Bot.Y)); + maxIt++; + } + } + else + { + while (maxRit != m_Maxima.rend() && *maxRit > e->Curr.X) + { + if (horzEdge->OutIdx >= 0 && !IsOpen) + AddOutPt(horzEdge, IntPoint(*maxRit, horzEdge->Bot.Y)); + maxRit++; + } + } + }; + + if ((dir == dLeftToRight && e->Curr.X > horzRight) || + (dir == dRightToLeft && e->Curr.X < horzLeft)) break; + + //Also break if we've got to the end of an intermediate horizontal edge ... + //nb: Smaller Dx's are to the right of larger Dx's ABOVE the horizontal. + if (e->Curr.X == horzEdge->Top.X && horzEdge->NextInLML && + e->Dx < horzEdge->NextInLML->Dx) break; + + if (horzEdge->OutIdx >= 0 && !IsOpen) //note: may be done multiple times + { +#ifdef use_xyz + if (dir == dLeftToRight) SetZ(e->Curr, *horzEdge, *e); + else SetZ(e->Curr, *e, *horzEdge); +#endif + op1 = AddOutPt(horzEdge, e->Curr); + TEdge* eNextHorz = m_SortedEdges; + while (eNextHorz) + { + if (eNextHorz->OutIdx >= 0 && + HorzSegmentsOverlap(horzEdge->Bot.X, + horzEdge->Top.X, eNextHorz->Bot.X, eNextHorz->Top.X)) + { + OutPt* op2 = GetLastOutPt(eNextHorz); + AddJoin(op2, op1, eNextHorz->Top); + } + eNextHorz = eNextHorz->NextInSEL; + } + AddGhostJoin(op1, horzEdge->Bot); + } + + //OK, so far we're still in range of the horizontal Edge but make sure + //we're at the last of consec. horizontals when matching with eMaxPair + if(e == eMaxPair && IsLastHorz) + { + if (horzEdge->OutIdx >= 0) + AddLocalMaxPoly(horzEdge, eMaxPair, horzEdge->Top); + DeleteFromAEL(horzEdge); + DeleteFromAEL(eMaxPair); + return; + } + + if(dir == dLeftToRight) + { + IntPoint Pt = IntPoint(e->Curr.X, horzEdge->Curr.Y); + IntersectEdges(horzEdge, e, Pt); + } + else + { + IntPoint Pt = IntPoint(e->Curr.X, horzEdge->Curr.Y); + IntersectEdges( e, horzEdge, Pt); + } + TEdge* eNext = GetNextInAEL(e, dir); + SwapPositionsInAEL( horzEdge, e ); + e = eNext; + } //end while(e) + + //Break out of loop if HorzEdge.NextInLML is not also horizontal ... + if (!horzEdge->NextInLML || !IsHorizontal(*horzEdge->NextInLML)) break; + + UpdateEdgeIntoAEL(horzEdge); + if (horzEdge->OutIdx >= 0) AddOutPt(horzEdge, horzEdge->Bot); + GetHorzDirection(*horzEdge, dir, horzLeft, horzRight); + + } //end for (;;) + + if (horzEdge->OutIdx >= 0 && !op1) + { + op1 = GetLastOutPt(horzEdge); + TEdge* eNextHorz = m_SortedEdges; + while (eNextHorz) + { + if (eNextHorz->OutIdx >= 0 && + HorzSegmentsOverlap(horzEdge->Bot.X, + horzEdge->Top.X, eNextHorz->Bot.X, eNextHorz->Top.X)) + { + OutPt* op2 = GetLastOutPt(eNextHorz); + AddJoin(op2, op1, eNextHorz->Top); + } + eNextHorz = eNextHorz->NextInSEL; + } + AddGhostJoin(op1, horzEdge->Top); + } + + if (horzEdge->NextInLML) + { + if(horzEdge->OutIdx >= 0) + { + op1 = AddOutPt( horzEdge, horzEdge->Top); + UpdateEdgeIntoAEL(horzEdge); + if (horzEdge->WindDelta == 0) return; + //nb: HorzEdge is no longer horizontal here + TEdge* ePrev = horzEdge->PrevInAEL; + TEdge* eNext = horzEdge->NextInAEL; + if (ePrev && ePrev->Curr.X == horzEdge->Bot.X && + ePrev->Curr.Y == horzEdge->Bot.Y && ePrev->WindDelta != 0 && + (ePrev->OutIdx >= 0 && ePrev->Curr.Y > ePrev->Top.Y && + SlopesEqual(*horzEdge, *ePrev, m_UseFullRange))) + { + OutPt* op2 = AddOutPt(ePrev, horzEdge->Bot); + AddJoin(op1, op2, horzEdge->Top); + } + else if (eNext && eNext->Curr.X == horzEdge->Bot.X && + eNext->Curr.Y == horzEdge->Bot.Y && eNext->WindDelta != 0 && + eNext->OutIdx >= 0 && eNext->Curr.Y > eNext->Top.Y && + SlopesEqual(*horzEdge, *eNext, m_UseFullRange)) + { + OutPt* op2 = AddOutPt(eNext, horzEdge->Bot); + AddJoin(op1, op2, horzEdge->Top); + } + } + else + UpdateEdgeIntoAEL(horzEdge); + } + else + { + if (horzEdge->OutIdx >= 0) AddOutPt(horzEdge, horzEdge->Top); + DeleteFromAEL(horzEdge); + } +} +//------------------------------------------------------------------------------ + +bool Clipper::ProcessIntersections(const cInt topY) +{ + if( !m_ActiveEdges ) return true; + try { + BuildIntersectList(topY); + size_t IlSize = m_IntersectList.size(); + if (IlSize == 0) return true; + if (IlSize == 1 || FixupIntersectionOrder()) ProcessIntersectList(); + else return false; + } + catch(...) + { + m_SortedEdges = 0; + DisposeIntersectNodes(); + throw clipperException("ProcessIntersections error"); + } + m_SortedEdges = 0; + return true; +} +//------------------------------------------------------------------------------ + +void Clipper::DisposeIntersectNodes() +{ + for (size_t i = 0; i < m_IntersectList.size(); ++i ) + delete m_IntersectList[i]; + m_IntersectList.clear(); +} +//------------------------------------------------------------------------------ + +void Clipper::BuildIntersectList(const cInt topY) +{ + if ( !m_ActiveEdges ) return; + + //prepare for sorting ... + TEdge* e = m_ActiveEdges; + m_SortedEdges = e; + while( e ) + { + e->PrevInSEL = e->PrevInAEL; + e->NextInSEL = e->NextInAEL; + e->Curr.X = TopX( *e, topY ); + e = e->NextInAEL; + } + + //bubblesort ... + bool isModified; + do + { + isModified = false; + e = m_SortedEdges; + while( e->NextInSEL ) + { + TEdge *eNext = e->NextInSEL; + IntPoint Pt; + if(e->Curr.X > eNext->Curr.X) + { + IntersectPoint(*e, *eNext, Pt); + if (Pt.Y < topY) Pt = IntPoint(TopX(*e, topY), topY); + IntersectNode * newNode = new IntersectNode; + newNode->Edge1 = e; + newNode->Edge2 = eNext; + newNode->Pt = Pt; + m_IntersectList.push_back(newNode); + + SwapPositionsInSEL(e, eNext); + isModified = true; + } + else + e = eNext; + } + if( e->PrevInSEL ) e->PrevInSEL->NextInSEL = 0; + else break; + } + while ( isModified ); + m_SortedEdges = 0; //important +} +//------------------------------------------------------------------------------ + + +void Clipper::ProcessIntersectList() +{ + for (size_t i = 0; i < m_IntersectList.size(); ++i) + { + IntersectNode* iNode = m_IntersectList[i]; + { + IntersectEdges( iNode->Edge1, iNode->Edge2, iNode->Pt); + SwapPositionsInAEL( iNode->Edge1 , iNode->Edge2 ); + } + delete iNode; + } + m_IntersectList.clear(); +} +//------------------------------------------------------------------------------ + +bool IntersectListSort(IntersectNode* node1, IntersectNode* node2) +{ + return node2->Pt.Y < node1->Pt.Y; +} +//------------------------------------------------------------------------------ + +inline bool EdgesAdjacent(const IntersectNode &inode) +{ + return (inode.Edge1->NextInSEL == inode.Edge2) || + (inode.Edge1->PrevInSEL == inode.Edge2); +} +//------------------------------------------------------------------------------ + +bool Clipper::FixupIntersectionOrder() +{ + //pre-condition: intersections are sorted Bottom-most first. + //Now it's crucial that intersections are made only between adjacent edges, + //so to ensure this the order of intersections may need adjusting ... + CopyAELToSEL(); + std::sort(m_IntersectList.begin(), m_IntersectList.end(), IntersectListSort); + size_t cnt = m_IntersectList.size(); + for (size_t i = 0; i < cnt; ++i) + { + if (!EdgesAdjacent(*m_IntersectList[i])) + { + size_t j = i + 1; + while (j < cnt && !EdgesAdjacent(*m_IntersectList[j])) j++; + if (j == cnt) return false; + std::swap(m_IntersectList[i], m_IntersectList[j]); + } + SwapPositionsInSEL(m_IntersectList[i]->Edge1, m_IntersectList[i]->Edge2); + } + return true; +} +//------------------------------------------------------------------------------ + +void Clipper::DoMaxima(TEdge *e) +{ + TEdge* eMaxPair = GetMaximaPairEx(e); + if (!eMaxPair) + { + if (e->OutIdx >= 0) + AddOutPt(e, e->Top); + DeleteFromAEL(e); + return; + } + + TEdge* eNext = e->NextInAEL; + while(eNext && eNext != eMaxPair) + { + IntersectEdges(e, eNext, e->Top); + SwapPositionsInAEL(e, eNext); + eNext = e->NextInAEL; + } + + if(e->OutIdx == Unassigned && eMaxPair->OutIdx == Unassigned) + { + DeleteFromAEL(e); + DeleteFromAEL(eMaxPair); + } + else if( e->OutIdx >= 0 && eMaxPair->OutIdx >= 0 ) + { + if (e->OutIdx >= 0) AddLocalMaxPoly(e, eMaxPair, e->Top); + DeleteFromAEL(e); + DeleteFromAEL(eMaxPair); + } +#ifdef use_lines + else if (e->WindDelta == 0) + { + if (e->OutIdx >= 0) + { + AddOutPt(e, e->Top); + e->OutIdx = Unassigned; + } + DeleteFromAEL(e); + + if (eMaxPair->OutIdx >= 0) + { + AddOutPt(eMaxPair, e->Top); + eMaxPair->OutIdx = Unassigned; + } + DeleteFromAEL(eMaxPair); + } +#endif + else throw clipperException("DoMaxima error"); +} +//------------------------------------------------------------------------------ + +void Clipper::ProcessEdgesAtTopOfScanbeam(const cInt topY) +{ + TEdge* e = m_ActiveEdges; + while( e ) + { + //1. process maxima, treating them as if they're 'bent' horizontal edges, + // but exclude maxima with horizontal edges. nb: e can't be a horizontal. + bool IsMaximaEdge = IsMaxima(e, topY); + + if(IsMaximaEdge) + { + TEdge* eMaxPair = GetMaximaPairEx(e); + IsMaximaEdge = (!eMaxPair || !IsHorizontal(*eMaxPair)); + } + + if(IsMaximaEdge) + { + if (m_StrictSimple) m_Maxima.push_back(e->Top.X); + TEdge* ePrev = e->PrevInAEL; + DoMaxima(e); + if( !ePrev ) e = m_ActiveEdges; + else e = ePrev->NextInAEL; + } + else + { + //2. promote horizontal edges, otherwise update Curr.X and Curr.Y ... + if (IsIntermediate(e, topY) && IsHorizontal(*e->NextInLML)) + { + UpdateEdgeIntoAEL(e); + if (e->OutIdx >= 0) + AddOutPt(e, e->Bot); + AddEdgeToSEL(e); + } + else + { + e->Curr.X = TopX( *e, topY ); + e->Curr.Y = topY; +#ifdef use_xyz + e->Curr.Z = topY == e->Top.Y ? e->Top.Z : (topY == e->Bot.Y ? e->Bot.Z : 0); +#endif + } + + //When StrictlySimple and 'e' is being touched by another edge, then + //make sure both edges have a vertex here ... + if (m_StrictSimple) + { + TEdge* ePrev = e->PrevInAEL; + if ((e->OutIdx >= 0) && (e->WindDelta != 0) && ePrev && (ePrev->OutIdx >= 0) && + (ePrev->Curr.X == e->Curr.X) && (ePrev->WindDelta != 0)) + { + IntPoint pt = e->Curr; +#ifdef use_xyz + SetZ(pt, *ePrev, *e); +#endif + OutPt* op = AddOutPt(ePrev, pt); + OutPt* op2 = AddOutPt(e, pt); + AddJoin(op, op2, pt); //StrictlySimple (type-3) join + } + } + + e = e->NextInAEL; + } + } + + //3. Process horizontals at the Top of the scanbeam ... + m_Maxima.sort(); + ProcessHorizontals(); + m_Maxima.clear(); + + //4. Promote intermediate vertices ... + e = m_ActiveEdges; + while(e) + { + if(IsIntermediate(e, topY)) + { + OutPt* op = 0; + if( e->OutIdx >= 0 ) + op = AddOutPt(e, e->Top); + UpdateEdgeIntoAEL(e); + + //if output polygons share an edge, they'll need joining later ... + TEdge* ePrev = e->PrevInAEL; + TEdge* eNext = e->NextInAEL; + if (ePrev && ePrev->Curr.X == e->Bot.X && + ePrev->Curr.Y == e->Bot.Y && op && + ePrev->OutIdx >= 0 && ePrev->Curr.Y > ePrev->Top.Y && + SlopesEqual(e->Curr, e->Top, ePrev->Curr, ePrev->Top, m_UseFullRange) && + (e->WindDelta != 0) && (ePrev->WindDelta != 0)) + { + OutPt* op2 = AddOutPt(ePrev, e->Bot); + AddJoin(op, op2, e->Top); + } + else if (eNext && eNext->Curr.X == e->Bot.X && + eNext->Curr.Y == e->Bot.Y && op && + eNext->OutIdx >= 0 && eNext->Curr.Y > eNext->Top.Y && + SlopesEqual(e->Curr, e->Top, eNext->Curr, eNext->Top, m_UseFullRange) && + (e->WindDelta != 0) && (eNext->WindDelta != 0)) + { + OutPt* op2 = AddOutPt(eNext, e->Bot); + AddJoin(op, op2, e->Top); + } + } + e = e->NextInAEL; + } +} +//------------------------------------------------------------------------------ + +void Clipper::FixupOutPolyline(OutRec &outrec) +{ + OutPt *pp = outrec.Pts; + OutPt *lastPP = pp->Prev; + while (pp != lastPP) + { + pp = pp->Next; + if (pp->Pt == pp->Prev->Pt) + { + if (pp == lastPP) lastPP = pp->Prev; + OutPt *tmpPP = pp->Prev; + tmpPP->Next = pp->Next; + pp->Next->Prev = tmpPP; + delete pp; + pp = tmpPP; + } + } + + if (pp == pp->Prev) + { + DisposeOutPts(pp); + outrec.Pts = 0; + return; + } +} +//------------------------------------------------------------------------------ + +void Clipper::FixupOutPolygon(OutRec &outrec) +{ + //FixupOutPolygon() - removes duplicate points and simplifies consecutive + //parallel edges by removing the middle vertex. + OutPt *lastOK = 0; + outrec.BottomPt = 0; + OutPt *pp = outrec.Pts; + bool preserveCol = m_PreserveCollinear || m_StrictSimple; + + for (;;) + { + if (pp->Prev == pp || pp->Prev == pp->Next) + { + DisposeOutPts(pp); + outrec.Pts = 0; + return; + } + + //test for duplicate points and collinear edges ... + if ((pp->Pt == pp->Next->Pt) || (pp->Pt == pp->Prev->Pt) || + (SlopesEqual(pp->Prev->Pt, pp->Pt, pp->Next->Pt, m_UseFullRange) && + (!preserveCol || !Pt2IsBetweenPt1AndPt3(pp->Prev->Pt, pp->Pt, pp->Next->Pt)))) + { + lastOK = 0; + OutPt *tmp = pp; + pp->Prev->Next = pp->Next; + pp->Next->Prev = pp->Prev; + pp = pp->Prev; + delete tmp; + } + else if (pp == lastOK) break; + else + { + if (!lastOK) lastOK = pp; + pp = pp->Next; + } + } + outrec.Pts = pp; +} +//------------------------------------------------------------------------------ + +int PointCount(OutPt *Pts) +{ + if (!Pts) return 0; + int result = 0; + OutPt* p = Pts; + do + { + result++; + p = p->Next; + } + while (p != Pts); + return result; +} +//------------------------------------------------------------------------------ + +void Clipper::BuildResult(Paths &polys) +{ + polys.reserve(m_PolyOuts.size()); + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + if (!m_PolyOuts[i]->Pts) continue; + Path pg; + OutPt* p = m_PolyOuts[i]->Pts->Prev; + int cnt = PointCount(p); + if (cnt < 2) continue; + pg.reserve(cnt); + for (int i = 0; i < cnt; ++i) + { + pg.push_back(p->Pt); + p = p->Prev; + } + polys.push_back(pg); + } +} +//------------------------------------------------------------------------------ + +void Clipper::BuildResult2(PolyTree& polytree) +{ + polytree.Clear(); + polytree.AllNodes.reserve(m_PolyOuts.size()); + //add each output polygon/contour to polytree ... + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); i++) + { + OutRec* outRec = m_PolyOuts[i]; + int cnt = PointCount(outRec->Pts); + if ((outRec->IsOpen && cnt < 2) || (!outRec->IsOpen && cnt < 3)) continue; + FixHoleLinkage(*outRec); + PolyNode* pn = new PolyNode(); + //nb: polytree takes ownership of all the PolyNodes + polytree.AllNodes.push_back(pn); + outRec->PolyNd = pn; + pn->Parent = 0; + pn->Index = 0; + pn->Contour.reserve(cnt); + OutPt *op = outRec->Pts->Prev; + for (int j = 0; j < cnt; j++) + { + pn->Contour.push_back(op->Pt); + op = op->Prev; + } + } + + //fixup PolyNode links etc ... + polytree.Childs.reserve(m_PolyOuts.size()); + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); i++) + { + OutRec* outRec = m_PolyOuts[i]; + if (!outRec->PolyNd) continue; + if (outRec->IsOpen) + { + outRec->PolyNd->m_IsOpen = true; + polytree.AddChild(*outRec->PolyNd); + } + else if (outRec->FirstLeft && outRec->FirstLeft->PolyNd) + outRec->FirstLeft->PolyNd->AddChild(*outRec->PolyNd); + else + polytree.AddChild(*outRec->PolyNd); + } +} +//------------------------------------------------------------------------------ + +void SwapIntersectNodes(IntersectNode &int1, IntersectNode &int2) +{ + //just swap the contents (because fIntersectNodes is a single-linked-list) + IntersectNode inode = int1; //gets a copy of Int1 + int1.Edge1 = int2.Edge1; + int1.Edge2 = int2.Edge2; + int1.Pt = int2.Pt; + int2.Edge1 = inode.Edge1; + int2.Edge2 = inode.Edge2; + int2.Pt = inode.Pt; +} +//------------------------------------------------------------------------------ + +inline bool E2InsertsBeforeE1(TEdge &e1, TEdge &e2) +{ + if (e2.Curr.X == e1.Curr.X) + { + if (e2.Top.Y > e1.Top.Y) + return e2.Top.X < TopX(e1, e2.Top.Y); + else return e1.Top.X > TopX(e2, e1.Top.Y); + } + else return e2.Curr.X < e1.Curr.X; +} +//------------------------------------------------------------------------------ + +bool GetOverlap(const cInt a1, const cInt a2, const cInt b1, const cInt b2, + cInt& Left, cInt& Right) +{ + if (a1 < a2) + { + if (b1 < b2) {Left = std::max(a1,b1); Right = std::min(a2,b2);} + else {Left = std::max(a1,b2); Right = std::min(a2,b1);} + } + else + { + if (b1 < b2) {Left = std::max(a2,b1); Right = std::min(a1,b2);} + else {Left = std::max(a2,b2); Right = std::min(a1,b1);} + } + return Left < Right; +} +//------------------------------------------------------------------------------ + +inline void UpdateOutPtIdxs(OutRec& outrec) +{ + OutPt* op = outrec.Pts; + do + { + op->Idx = outrec.Idx; + op = op->Prev; + } + while(op != outrec.Pts); +} +//------------------------------------------------------------------------------ + +void Clipper::InsertEdgeIntoAEL(TEdge *edge, TEdge* startEdge) +{ + if(!m_ActiveEdges) + { + edge->PrevInAEL = 0; + edge->NextInAEL = 0; + m_ActiveEdges = edge; + } + else if(!startEdge && E2InsertsBeforeE1(*m_ActiveEdges, *edge)) + { + edge->PrevInAEL = 0; + edge->NextInAEL = m_ActiveEdges; + m_ActiveEdges->PrevInAEL = edge; + m_ActiveEdges = edge; + } + else + { + if(!startEdge) startEdge = m_ActiveEdges; + while(startEdge->NextInAEL && + !E2InsertsBeforeE1(*startEdge->NextInAEL , *edge)) + startEdge = startEdge->NextInAEL; + edge->NextInAEL = startEdge->NextInAEL; + if(startEdge->NextInAEL) startEdge->NextInAEL->PrevInAEL = edge; + edge->PrevInAEL = startEdge; + startEdge->NextInAEL = edge; + } +} +//---------------------------------------------------------------------- + +OutPt* DupOutPt(OutPt* outPt, bool InsertAfter) +{ + OutPt* result = new OutPt; + result->Pt = outPt->Pt; + result->Idx = outPt->Idx; + if (InsertAfter) + { + result->Next = outPt->Next; + result->Prev = outPt; + outPt->Next->Prev = result; + outPt->Next = result; + } + else + { + result->Prev = outPt->Prev; + result->Next = outPt; + outPt->Prev->Next = result; + outPt->Prev = result; + } + return result; +} +//------------------------------------------------------------------------------ + +bool JoinHorz(OutPt* op1, OutPt* op1b, OutPt* op2, OutPt* op2b, + const IntPoint Pt, bool DiscardLeft) +{ + Direction Dir1 = (op1->Pt.X > op1b->Pt.X ? dRightToLeft : dLeftToRight); + Direction Dir2 = (op2->Pt.X > op2b->Pt.X ? dRightToLeft : dLeftToRight); + if (Dir1 == Dir2) return false; + + //When DiscardLeft, we want Op1b to be on the Left of Op1, otherwise we + //want Op1b to be on the Right. (And likewise with Op2 and Op2b.) + //So, to facilitate this while inserting Op1b and Op2b ... + //when DiscardLeft, make sure we're AT or RIGHT of Pt before adding Op1b, + //otherwise make sure we're AT or LEFT of Pt. (Likewise with Op2b.) + if (Dir1 == dLeftToRight) + { + while (op1->Next->Pt.X <= Pt.X && + op1->Next->Pt.X >= op1->Pt.X && op1->Next->Pt.Y == Pt.Y) + op1 = op1->Next; + if (DiscardLeft && (op1->Pt.X != Pt.X)) op1 = op1->Next; + op1b = DupOutPt(op1, !DiscardLeft); + if (op1b->Pt != Pt) + { + op1 = op1b; + op1->Pt = Pt; + op1b = DupOutPt(op1, !DiscardLeft); + } + } + else + { + while (op1->Next->Pt.X >= Pt.X && + op1->Next->Pt.X <= op1->Pt.X && op1->Next->Pt.Y == Pt.Y) + op1 = op1->Next; + if (!DiscardLeft && (op1->Pt.X != Pt.X)) op1 = op1->Next; + op1b = DupOutPt(op1, DiscardLeft); + if (op1b->Pt != Pt) + { + op1 = op1b; + op1->Pt = Pt; + op1b = DupOutPt(op1, DiscardLeft); + } + } + + if (Dir2 == dLeftToRight) + { + while (op2->Next->Pt.X <= Pt.X && + op2->Next->Pt.X >= op2->Pt.X && op2->Next->Pt.Y == Pt.Y) + op2 = op2->Next; + if (DiscardLeft && (op2->Pt.X != Pt.X)) op2 = op2->Next; + op2b = DupOutPt(op2, !DiscardLeft); + if (op2b->Pt != Pt) + { + op2 = op2b; + op2->Pt = Pt; + op2b = DupOutPt(op2, !DiscardLeft); + }; + } else + { + while (op2->Next->Pt.X >= Pt.X && + op2->Next->Pt.X <= op2->Pt.X && op2->Next->Pt.Y == Pt.Y) + op2 = op2->Next; + if (!DiscardLeft && (op2->Pt.X != Pt.X)) op2 = op2->Next; + op2b = DupOutPt(op2, DiscardLeft); + if (op2b->Pt != Pt) + { + op2 = op2b; + op2->Pt = Pt; + op2b = DupOutPt(op2, DiscardLeft); + }; + }; + + if ((Dir1 == dLeftToRight) == DiscardLeft) + { + op1->Prev = op2; + op2->Next = op1; + op1b->Next = op2b; + op2b->Prev = op1b; + } + else + { + op1->Next = op2; + op2->Prev = op1; + op1b->Prev = op2b; + op2b->Next = op1b; + } + return true; +} +//------------------------------------------------------------------------------ + +bool Clipper::JoinPoints(Join *j, OutRec* outRec1, OutRec* outRec2) +{ + OutPt *op1 = j->OutPt1, *op1b; + OutPt *op2 = j->OutPt2, *op2b; + + //There are 3 kinds of joins for output polygons ... + //1. Horizontal joins where Join.OutPt1 & Join.OutPt2 are vertices anywhere + //along (horizontal) collinear edges (& Join.OffPt is on the same horizontal). + //2. Non-horizontal joins where Join.OutPt1 & Join.OutPt2 are at the same + //location at the Bottom of the overlapping segment (& Join.OffPt is above). + //3. StrictSimple joins where edges touch but are not collinear and where + //Join.OutPt1, Join.OutPt2 & Join.OffPt all share the same point. + bool isHorizontal = (j->OutPt1->Pt.Y == j->OffPt.Y); + + if (isHorizontal && (j->OffPt == j->OutPt1->Pt) && + (j->OffPt == j->OutPt2->Pt)) + { + //Strictly Simple join ... + if (outRec1 != outRec2) return false; + op1b = j->OutPt1->Next; + while (op1b != op1 && (op1b->Pt == j->OffPt)) + op1b = op1b->Next; + bool reverse1 = (op1b->Pt.Y > j->OffPt.Y); + op2b = j->OutPt2->Next; + while (op2b != op2 && (op2b->Pt == j->OffPt)) + op2b = op2b->Next; + bool reverse2 = (op2b->Pt.Y > j->OffPt.Y); + if (reverse1 == reverse2) return false; + if (reverse1) + { + op1b = DupOutPt(op1, false); + op2b = DupOutPt(op2, true); + op1->Prev = op2; + op2->Next = op1; + op1b->Next = op2b; + op2b->Prev = op1b; + j->OutPt1 = op1; + j->OutPt2 = op1b; + return true; + } else + { + op1b = DupOutPt(op1, true); + op2b = DupOutPt(op2, false); + op1->Next = op2; + op2->Prev = op1; + op1b->Prev = op2b; + op2b->Next = op1b; + j->OutPt1 = op1; + j->OutPt2 = op1b; + return true; + } + } + else if (isHorizontal) + { + //treat horizontal joins differently to non-horizontal joins since with + //them we're not yet sure where the overlapping is. OutPt1.Pt & OutPt2.Pt + //may be anywhere along the horizontal edge. + op1b = op1; + while (op1->Prev->Pt.Y == op1->Pt.Y && op1->Prev != op1b && op1->Prev != op2) + op1 = op1->Prev; + while (op1b->Next->Pt.Y == op1b->Pt.Y && op1b->Next != op1 && op1b->Next != op2) + op1b = op1b->Next; + if (op1b->Next == op1 || op1b->Next == op2) return false; //a flat 'polygon' + + op2b = op2; + while (op2->Prev->Pt.Y == op2->Pt.Y && op2->Prev != op2b && op2->Prev != op1b) + op2 = op2->Prev; + while (op2b->Next->Pt.Y == op2b->Pt.Y && op2b->Next != op2 && op2b->Next != op1) + op2b = op2b->Next; + if (op2b->Next == op2 || op2b->Next == op1) return false; //a flat 'polygon' + + cInt Left, Right; + //Op1 --> Op1b & Op2 --> Op2b are the extremites of the horizontal edges + if (!GetOverlap(op1->Pt.X, op1b->Pt.X, op2->Pt.X, op2b->Pt.X, Left, Right)) + return false; + + //DiscardLeftSide: when overlapping edges are joined, a spike will created + //which needs to be cleaned up. However, we don't want Op1 or Op2 caught up + //on the discard Side as either may still be needed for other joins ... + IntPoint Pt; + bool DiscardLeftSide; + if (op1->Pt.X >= Left && op1->Pt.X <= Right) + { + Pt = op1->Pt; DiscardLeftSide = (op1->Pt.X > op1b->Pt.X); + } + else if (op2->Pt.X >= Left&& op2->Pt.X <= Right) + { + Pt = op2->Pt; DiscardLeftSide = (op2->Pt.X > op2b->Pt.X); + } + else if (op1b->Pt.X >= Left && op1b->Pt.X <= Right) + { + Pt = op1b->Pt; DiscardLeftSide = op1b->Pt.X > op1->Pt.X; + } + else + { + Pt = op2b->Pt; DiscardLeftSide = (op2b->Pt.X > op2->Pt.X); + } + j->OutPt1 = op1; j->OutPt2 = op2; + return JoinHorz(op1, op1b, op2, op2b, Pt, DiscardLeftSide); + } else + { + //nb: For non-horizontal joins ... + // 1. Jr.OutPt1.Pt.Y == Jr.OutPt2.Pt.Y + // 2. Jr.OutPt1.Pt > Jr.OffPt.Y + + //make sure the polygons are correctly oriented ... + op1b = op1->Next; + while ((op1b->Pt == op1->Pt) && (op1b != op1)) op1b = op1b->Next; + bool Reverse1 = ((op1b->Pt.Y > op1->Pt.Y) || + !SlopesEqual(op1->Pt, op1b->Pt, j->OffPt, m_UseFullRange)); + if (Reverse1) + { + op1b = op1->Prev; + while ((op1b->Pt == op1->Pt) && (op1b != op1)) op1b = op1b->Prev; + if ((op1b->Pt.Y > op1->Pt.Y) || + !SlopesEqual(op1->Pt, op1b->Pt, j->OffPt, m_UseFullRange)) return false; + }; + op2b = op2->Next; + while ((op2b->Pt == op2->Pt) && (op2b != op2))op2b = op2b->Next; + bool Reverse2 = ((op2b->Pt.Y > op2->Pt.Y) || + !SlopesEqual(op2->Pt, op2b->Pt, j->OffPt, m_UseFullRange)); + if (Reverse2) + { + op2b = op2->Prev; + while ((op2b->Pt == op2->Pt) && (op2b != op2)) op2b = op2b->Prev; + if ((op2b->Pt.Y > op2->Pt.Y) || + !SlopesEqual(op2->Pt, op2b->Pt, j->OffPt, m_UseFullRange)) return false; + } + + if ((op1b == op1) || (op2b == op2) || (op1b == op2b) || + ((outRec1 == outRec2) && (Reverse1 == Reverse2))) return false; + + if (Reverse1) + { + op1b = DupOutPt(op1, false); + op2b = DupOutPt(op2, true); + op1->Prev = op2; + op2->Next = op1; + op1b->Next = op2b; + op2b->Prev = op1b; + j->OutPt1 = op1; + j->OutPt2 = op1b; + return true; + } else + { + op1b = DupOutPt(op1, true); + op2b = DupOutPt(op2, false); + op1->Next = op2; + op2->Prev = op1; + op1b->Prev = op2b; + op2b->Next = op1b; + j->OutPt1 = op1; + j->OutPt2 = op1b; + return true; + } + } +} +//---------------------------------------------------------------------- + +static OutRec* ParseFirstLeft(OutRec* FirstLeft) +{ + while (FirstLeft && !FirstLeft->Pts) + FirstLeft = FirstLeft->FirstLeft; + return FirstLeft; +} +//------------------------------------------------------------------------------ + +void Clipper::FixupFirstLefts1(OutRec* OldOutRec, OutRec* NewOutRec) +{ + //tests if NewOutRec contains the polygon before reassigning FirstLeft + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + OutRec* outRec = m_PolyOuts[i]; + OutRec* firstLeft = ParseFirstLeft(outRec->FirstLeft); + if (outRec->Pts && firstLeft == OldOutRec) + { + if (Poly2ContainsPoly1(outRec->Pts, NewOutRec->Pts)) + outRec->FirstLeft = NewOutRec; + } + } +} +//---------------------------------------------------------------------- + +void Clipper::FixupFirstLefts2(OutRec* InnerOutRec, OutRec* OuterOutRec) +{ + //A polygon has split into two such that one is now the inner of the other. + //It's possible that these polygons now wrap around other polygons, so check + //every polygon that's also contained by OuterOutRec's FirstLeft container + //(including 0) to see if they've become inner to the new inner polygon ... + OutRec* orfl = OuterOutRec->FirstLeft; + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + OutRec* outRec = m_PolyOuts[i]; + + if (!outRec->Pts || outRec == OuterOutRec || outRec == InnerOutRec) + continue; + OutRec* firstLeft = ParseFirstLeft(outRec->FirstLeft); + if (firstLeft != orfl && firstLeft != InnerOutRec && firstLeft != OuterOutRec) + continue; + if (Poly2ContainsPoly1(outRec->Pts, InnerOutRec->Pts)) + outRec->FirstLeft = InnerOutRec; + else if (Poly2ContainsPoly1(outRec->Pts, OuterOutRec->Pts)) + outRec->FirstLeft = OuterOutRec; + else if (outRec->FirstLeft == InnerOutRec || outRec->FirstLeft == OuterOutRec) + outRec->FirstLeft = orfl; + } +} +//---------------------------------------------------------------------- +void Clipper::FixupFirstLefts3(OutRec* OldOutRec, OutRec* NewOutRec) +{ + //reassigns FirstLeft WITHOUT testing if NewOutRec contains the polygon + for (PolyOutList::size_type i = 0; i < m_PolyOuts.size(); ++i) + { + OutRec* outRec = m_PolyOuts[i]; + OutRec* firstLeft = ParseFirstLeft(outRec->FirstLeft); + if (outRec->Pts && firstLeft == OldOutRec) + outRec->FirstLeft = NewOutRec; + } +} +//---------------------------------------------------------------------- + +void Clipper::JoinCommonEdges() +{ + for (JoinList::size_type i = 0; i < m_Joins.size(); i++) + { + Join* join = m_Joins[i]; + + OutRec *outRec1 = GetOutRec(join->OutPt1->Idx); + OutRec *outRec2 = GetOutRec(join->OutPt2->Idx); + + if (!outRec1->Pts || !outRec2->Pts) continue; + if (outRec1->IsOpen || outRec2->IsOpen) continue; + + //get the polygon fragment with the correct hole state (FirstLeft) + //before calling JoinPoints() ... + OutRec *holeStateRec; + if (outRec1 == outRec2) holeStateRec = outRec1; + else if (OutRec1RightOfOutRec2(outRec1, outRec2)) holeStateRec = outRec2; + else if (OutRec1RightOfOutRec2(outRec2, outRec1)) holeStateRec = outRec1; + else holeStateRec = GetLowermostRec(outRec1, outRec2); + + if (!JoinPoints(join, outRec1, outRec2)) continue; + + if (outRec1 == outRec2) + { + //instead of joining two polygons, we've just created a new one by + //splitting one polygon into two. + outRec1->Pts = join->OutPt1; + outRec1->BottomPt = 0; + outRec2 = CreateOutRec(); + outRec2->Pts = join->OutPt2; + + //update all OutRec2.Pts Idx's ... + UpdateOutPtIdxs(*outRec2); + + if (Poly2ContainsPoly1(outRec2->Pts, outRec1->Pts)) + { + //outRec1 contains outRec2 ... + outRec2->IsHole = !outRec1->IsHole; + outRec2->FirstLeft = outRec1; + + if (m_UsingPolyTree) FixupFirstLefts2(outRec2, outRec1); + + if ((outRec2->IsHole ^ m_ReverseOutput) == (Area(*outRec2) > 0)) + ReversePolyPtLinks(outRec2->Pts); + + } else if (Poly2ContainsPoly1(outRec1->Pts, outRec2->Pts)) + { + //outRec2 contains outRec1 ... + outRec2->IsHole = outRec1->IsHole; + outRec1->IsHole = !outRec2->IsHole; + outRec2->FirstLeft = outRec1->FirstLeft; + outRec1->FirstLeft = outRec2; + + if (m_UsingPolyTree) FixupFirstLefts2(outRec1, outRec2); + + if ((outRec1->IsHole ^ m_ReverseOutput) == (Area(*outRec1) > 0)) + ReversePolyPtLinks(outRec1->Pts); + } + else + { + //the 2 polygons are completely separate ... + outRec2->IsHole = outRec1->IsHole; + outRec2->FirstLeft = outRec1->FirstLeft; + + //fixup FirstLeft pointers that may need reassigning to OutRec2 + if (m_UsingPolyTree) FixupFirstLefts1(outRec1, outRec2); + } + + } else + { + //joined 2 polygons together ... + + outRec2->Pts = 0; + outRec2->BottomPt = 0; + outRec2->Idx = outRec1->Idx; + + outRec1->IsHole = holeStateRec->IsHole; + if (holeStateRec == outRec2) + outRec1->FirstLeft = outRec2->FirstLeft; + outRec2->FirstLeft = outRec1; + + if (m_UsingPolyTree) FixupFirstLefts3(outRec2, outRec1); + } + } +} + +//------------------------------------------------------------------------------ +// ClipperOffset support functions ... +//------------------------------------------------------------------------------ + +DoublePoint GetUnitNormal(const IntPoint &pt1, const IntPoint &pt2) +{ + if(pt2.X == pt1.X && pt2.Y == pt1.Y) + return DoublePoint(0, 0); + + double Dx = (double)(pt2.X - pt1.X); + double dy = (double)(pt2.Y - pt1.Y); + double f = 1 *1.0/ std::sqrt( Dx*Dx + dy*dy ); + Dx *= f; + dy *= f; + return DoublePoint(dy, -Dx); +} + +//------------------------------------------------------------------------------ +// ClipperOffset class +//------------------------------------------------------------------------------ + +ClipperOffset::ClipperOffset(double miterLimit, double arcTolerance) +{ + this->MiterLimit = miterLimit; + this->ArcTolerance = arcTolerance; + m_lowest.X = -1; +} +//------------------------------------------------------------------------------ + +ClipperOffset::~ClipperOffset() +{ + Clear(); +} +//------------------------------------------------------------------------------ + +void ClipperOffset::Clear() +{ + for (int i = 0; i < m_polyNodes.ChildCount(); ++i) + delete m_polyNodes.Childs[i]; + m_polyNodes.Childs.clear(); + m_lowest.X = -1; +} +//------------------------------------------------------------------------------ + +void ClipperOffset::AddPath(const Path& path, JoinType joinType, EndType endType) +{ + int highI = (int)path.size() - 1; + if (highI < 0) return; + PolyNode* newNode = new PolyNode(); + newNode->m_jointype = joinType; + newNode->m_endtype = endType; + + //strip duplicate points from path and also get index to the lowest point ... + if (endType == etClosedLine || endType == etClosedPolygon) + while (highI > 0 && path[0] == path[highI]) highI--; + newNode->Contour.reserve(highI + 1); + newNode->Contour.push_back(path[0]); + int j = 0, k = 0; + for (int i = 1; i <= highI; i++) + if (newNode->Contour[j] != path[i]) + { + j++; + newNode->Contour.push_back(path[i]); + if (path[i].Y > newNode->Contour[k].Y || + (path[i].Y == newNode->Contour[k].Y && + path[i].X < newNode->Contour[k].X)) k = j; + } + if (endType == etClosedPolygon && j < 2) + { + delete newNode; + return; + } + m_polyNodes.AddChild(*newNode); + + //if this path's lowest pt is lower than all the others then update m_lowest + if (endType != etClosedPolygon) return; + if (m_lowest.X < 0) + m_lowest = IntPoint(m_polyNodes.ChildCount() - 1, k); + else + { + IntPoint ip = m_polyNodes.Childs[(int)m_lowest.X]->Contour[(int)m_lowest.Y]; + if (newNode->Contour[k].Y > ip.Y || + (newNode->Contour[k].Y == ip.Y && + newNode->Contour[k].X < ip.X)) + m_lowest = IntPoint(m_polyNodes.ChildCount() - 1, k); + } +} +//------------------------------------------------------------------------------ + +void ClipperOffset::AddPaths(const Paths& paths, JoinType joinType, EndType endType) +{ + for (Paths::size_type i = 0; i < paths.size(); ++i) + AddPath(paths[i], joinType, endType); +} +//------------------------------------------------------------------------------ + +void ClipperOffset::FixOrientations() +{ + //fixup orientations of all closed paths if the orientation of the + //closed path with the lowermost vertex is wrong ... + if (m_lowest.X >= 0 && + !Orientation(m_polyNodes.Childs[(int)m_lowest.X]->Contour)) + { + for (int i = 0; i < m_polyNodes.ChildCount(); ++i) + { + PolyNode& node = *m_polyNodes.Childs[i]; + if (node.m_endtype == etClosedPolygon || + (node.m_endtype == etClosedLine && Orientation(node.Contour))) + ReversePath(node.Contour); + } + } else + { + for (int i = 0; i < m_polyNodes.ChildCount(); ++i) + { + PolyNode& node = *m_polyNodes.Childs[i]; + if (node.m_endtype == etClosedLine && !Orientation(node.Contour)) + ReversePath(node.Contour); + } + } +} +//------------------------------------------------------------------------------ + +void ClipperOffset::Execute(Paths& solution, double delta) +{ + solution.clear(); + FixOrientations(); + DoOffset(delta); + + //now clean up 'corners' ... + Clipper clpr; + clpr.AddPaths(m_destPolys, ptSubject, true); + if (delta > 0) + { + clpr.Execute(ctUnion, solution, pftPositive, pftPositive); + } + else + { + IntRect r = clpr.GetBounds(); + Path outer(4); + outer[0] = IntPoint(r.left - 10, r.bottom + 10); + outer[1] = IntPoint(r.right + 10, r.bottom + 10); + outer[2] = IntPoint(r.right + 10, r.top - 10); + outer[3] = IntPoint(r.left - 10, r.top - 10); + + clpr.AddPath(outer, ptSubject, true); + clpr.ReverseSolution(true); + clpr.Execute(ctUnion, solution, pftNegative, pftNegative); + if (solution.size() > 0) solution.erase(solution.begin()); + } +} +//------------------------------------------------------------------------------ + +void ClipperOffset::Execute(PolyTree& solution, double delta) +{ + solution.Clear(); + FixOrientations(); + DoOffset(delta); + + //now clean up 'corners' ... + Clipper clpr; + clpr.AddPaths(m_destPolys, ptSubject, true); + if (delta > 0) + { + clpr.Execute(ctUnion, solution, pftPositive, pftPositive); + } + else + { + IntRect r = clpr.GetBounds(); + Path outer(4); + outer[0] = IntPoint(r.left - 10, r.bottom + 10); + outer[1] = IntPoint(r.right + 10, r.bottom + 10); + outer[2] = IntPoint(r.right + 10, r.top - 10); + outer[3] = IntPoint(r.left - 10, r.top - 10); + + clpr.AddPath(outer, ptSubject, true); + clpr.ReverseSolution(true); + clpr.Execute(ctUnion, solution, pftNegative, pftNegative); + //remove the outer PolyNode rectangle ... + if (solution.ChildCount() == 1 && solution.Childs[0]->ChildCount() > 0) + { + PolyNode* outerNode = solution.Childs[0]; + solution.Childs.reserve(outerNode->ChildCount()); + solution.Childs[0] = outerNode->Childs[0]; + solution.Childs[0]->Parent = outerNode->Parent; + for (int i = 1; i < outerNode->ChildCount(); ++i) + solution.AddChild(*outerNode->Childs[i]); + } + else + solution.Clear(); + } +} +//------------------------------------------------------------------------------ + +void ClipperOffset::DoOffset(double delta) +{ + m_destPolys.clear(); + m_delta = delta; + + //if Zero offset, just copy any CLOSED polygons to m_p and return ... + if (NEAR_ZERO(delta)) + { + m_destPolys.reserve(m_polyNodes.ChildCount()); + for (int i = 0; i < m_polyNodes.ChildCount(); i++) + { + PolyNode& node = *m_polyNodes.Childs[i]; + if (node.m_endtype == etClosedPolygon) + m_destPolys.push_back(node.Contour); + } + return; + } + + //see offset_triginometry3.svg in the documentation folder ... + if (MiterLimit > 2) m_miterLim = 2/(MiterLimit * MiterLimit); + else m_miterLim = 0.5; + + double y; + if (ArcTolerance <= 0.0) y = def_arc_tolerance; + else if (ArcTolerance > std::fabs(delta) * def_arc_tolerance) + y = std::fabs(delta) * def_arc_tolerance; + else y = ArcTolerance; + //see offset_triginometry2.svg in the documentation folder ... + double steps = pi / std::acos(1 - y / std::fabs(delta)); + if (steps > std::fabs(delta) * pi) + steps = std::fabs(delta) * pi; //ie excessive precision check + m_sin = std::sin(two_pi / steps); + m_cos = std::cos(two_pi / steps); + m_StepsPerRad = steps / two_pi; + if (delta < 0.0) m_sin = -m_sin; + + m_destPolys.reserve(m_polyNodes.ChildCount() * 2); + for (int i = 0; i < m_polyNodes.ChildCount(); i++) + { + PolyNode& node = *m_polyNodes.Childs[i]; + m_srcPoly = node.Contour; + + int len = (int)m_srcPoly.size(); + if (len == 0 || (delta <= 0 && (len < 3 || node.m_endtype != etClosedPolygon))) + continue; + + m_destPoly.clear(); + if (len == 1) + { + if (node.m_jointype == jtRound) + { + double X = 1.0, Y = 0.0; + for (cInt j = 1; j <= steps; j++) + { + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[0].X + X * delta), + Round(m_srcPoly[0].Y + Y * delta))); + double X2 = X; + X = X * m_cos - m_sin * Y; + Y = X2 * m_sin + Y * m_cos; + } + } + else + { + double X = -1.0, Y = -1.0; + for (int j = 0; j < 4; ++j) + { + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[0].X + X * delta), + Round(m_srcPoly[0].Y + Y * delta))); + if (X < 0) X = 1; + else if (Y < 0) Y = 1; + else X = -1; + } + } + m_destPolys.push_back(m_destPoly); + continue; + } + //build m_normals ... + m_normals.clear(); + m_normals.reserve(len); + for (int j = 0; j < len - 1; ++j) + m_normals.push_back(GetUnitNormal(m_srcPoly[j], m_srcPoly[j + 1])); + if (node.m_endtype == etClosedLine || node.m_endtype == etClosedPolygon) + m_normals.push_back(GetUnitNormal(m_srcPoly[len - 1], m_srcPoly[0])); + else + m_normals.push_back(DoublePoint(m_normals[len - 2])); + + if (node.m_endtype == etClosedPolygon) + { + int k = len - 1; + for (int j = 0; j < len; ++j) + OffsetPoint(j, k, node.m_jointype); + m_destPolys.push_back(m_destPoly); + } + else if (node.m_endtype == etClosedLine) + { + int k = len - 1; + for (int j = 0; j < len; ++j) + OffsetPoint(j, k, node.m_jointype); + m_destPolys.push_back(m_destPoly); + m_destPoly.clear(); + //re-build m_normals ... + DoublePoint n = m_normals[len -1]; + for (int j = len - 1; j > 0; j--) + m_normals[j] = DoublePoint(-m_normals[j - 1].X, -m_normals[j - 1].Y); + m_normals[0] = DoublePoint(-n.X, -n.Y); + k = 0; + for (int j = len - 1; j >= 0; j--) + OffsetPoint(j, k, node.m_jointype); + m_destPolys.push_back(m_destPoly); + } + else + { + int k = 0; + for (int j = 1; j < len - 1; ++j) + OffsetPoint(j, k, node.m_jointype); + + IntPoint pt1; + if (node.m_endtype == etOpenButt) + { + int j = len - 1; + pt1 = IntPoint((cInt)Round(m_srcPoly[j].X + m_normals[j].X * + delta), (cInt)Round(m_srcPoly[j].Y + m_normals[j].Y * delta)); + m_destPoly.push_back(pt1); + pt1 = IntPoint((cInt)Round(m_srcPoly[j].X - m_normals[j].X * + delta), (cInt)Round(m_srcPoly[j].Y - m_normals[j].Y * delta)); + m_destPoly.push_back(pt1); + } + else + { + int j = len - 1; + k = len - 2; + m_sinA = 0; + m_normals[j] = DoublePoint(-m_normals[j].X, -m_normals[j].Y); + if (node.m_endtype == etOpenSquare) + DoSquare(j, k); + else + DoRound(j, k); + } + + //re-build m_normals ... + for (int j = len - 1; j > 0; j--) + m_normals[j] = DoublePoint(-m_normals[j - 1].X, -m_normals[j - 1].Y); + m_normals[0] = DoublePoint(-m_normals[1].X, -m_normals[1].Y); + + k = len - 1; + for (int j = k - 1; j > 0; --j) OffsetPoint(j, k, node.m_jointype); + + if (node.m_endtype == etOpenButt) + { + pt1 = IntPoint((cInt)Round(m_srcPoly[0].X - m_normals[0].X * delta), + (cInt)Round(m_srcPoly[0].Y - m_normals[0].Y * delta)); + m_destPoly.push_back(pt1); + pt1 = IntPoint((cInt)Round(m_srcPoly[0].X + m_normals[0].X * delta), + (cInt)Round(m_srcPoly[0].Y + m_normals[0].Y * delta)); + m_destPoly.push_back(pt1); + } + else + { + k = 1; + m_sinA = 0; + if (node.m_endtype == etOpenSquare) + DoSquare(0, 1); + else + DoRound(0, 1); + } + m_destPolys.push_back(m_destPoly); + } + } +} +//------------------------------------------------------------------------------ + +void ClipperOffset::OffsetPoint(int j, int& k, JoinType jointype) +{ + //cross product ... + m_sinA = (m_normals[k].X * m_normals[j].Y - m_normals[j].X * m_normals[k].Y); + if (std::fabs(m_sinA * m_delta) < 1.0) + { + //dot product ... + double cosA = (m_normals[k].X * m_normals[j].X + m_normals[j].Y * m_normals[k].Y ); + if (cosA > 0) // angle => 0 degrees + { + m_destPoly.push_back(IntPoint(Round(m_srcPoly[j].X + m_normals[k].X * m_delta), + Round(m_srcPoly[j].Y + m_normals[k].Y * m_delta))); + return; + } + //else angle => 180 degrees + } + else if (m_sinA > 1.0) m_sinA = 1.0; + else if (m_sinA < -1.0) m_sinA = -1.0; + + if (m_sinA * m_delta < 0) + { + m_destPoly.push_back(IntPoint(Round(m_srcPoly[j].X + m_normals[k].X * m_delta), + Round(m_srcPoly[j].Y + m_normals[k].Y * m_delta))); + m_destPoly.push_back(m_srcPoly[j]); + m_destPoly.push_back(IntPoint(Round(m_srcPoly[j].X + m_normals[j].X * m_delta), + Round(m_srcPoly[j].Y + m_normals[j].Y * m_delta))); + } + else + switch (jointype) + { + case jtMiter: + { + double r = 1 + (m_normals[j].X * m_normals[k].X + + m_normals[j].Y * m_normals[k].Y); + if (r >= m_miterLim) DoMiter(j, k, r); else DoSquare(j, k); + break; + } + case jtSquare: DoSquare(j, k); break; + case jtRound: DoRound(j, k); break; + } + k = j; +} +//------------------------------------------------------------------------------ + +void ClipperOffset::DoSquare(int j, int k) +{ + double dx = std::tan(std::atan2(m_sinA, + m_normals[k].X * m_normals[j].X + m_normals[k].Y * m_normals[j].Y) / 4); + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[j].X + m_delta * (m_normals[k].X - m_normals[k].Y * dx)), + Round(m_srcPoly[j].Y + m_delta * (m_normals[k].Y + m_normals[k].X * dx)))); + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[j].X + m_delta * (m_normals[j].X + m_normals[j].Y * dx)), + Round(m_srcPoly[j].Y + m_delta * (m_normals[j].Y - m_normals[j].X * dx)))); +} +//------------------------------------------------------------------------------ + +void ClipperOffset::DoMiter(int j, int k, double r) +{ + double q = m_delta / r; + m_destPoly.push_back(IntPoint(Round(m_srcPoly[j].X + (m_normals[k].X + m_normals[j].X) * q), + Round(m_srcPoly[j].Y + (m_normals[k].Y + m_normals[j].Y) * q))); +} +//------------------------------------------------------------------------------ + +void ClipperOffset::DoRound(int j, int k) +{ + double a = std::atan2(m_sinA, + m_normals[k].X * m_normals[j].X + m_normals[k].Y * m_normals[j].Y); + int steps = std::max((int)Round(m_StepsPerRad * std::fabs(a)), 1); + + double X = m_normals[k].X, Y = m_normals[k].Y, X2; + for (int i = 0; i < steps; ++i) + { + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[j].X + X * m_delta), + Round(m_srcPoly[j].Y + Y * m_delta))); + X2 = X; + X = X * m_cos - m_sin * Y; + Y = X2 * m_sin + Y * m_cos; + } + m_destPoly.push_back(IntPoint( + Round(m_srcPoly[j].X + m_normals[j].X * m_delta), + Round(m_srcPoly[j].Y + m_normals[j].Y * m_delta))); +} + +//------------------------------------------------------------------------------ +// Miscellaneous public functions +//------------------------------------------------------------------------------ + +void Clipper::DoSimplePolygons() +{ + PolyOutList::size_type i = 0; + while (i < m_PolyOuts.size()) + { + OutRec* outrec = m_PolyOuts[i++]; + OutPt* op = outrec->Pts; + if (!op || outrec->IsOpen) continue; + do //for each Pt in Polygon until duplicate found do ... + { + OutPt* op2 = op->Next; + while (op2 != outrec->Pts) + { + if ((op->Pt == op2->Pt) && op2->Next != op && op2->Prev != op) + { + //split the polygon into two ... + OutPt* op3 = op->Prev; + OutPt* op4 = op2->Prev; + op->Prev = op4; + op4->Next = op; + op2->Prev = op3; + op3->Next = op2; + + outrec->Pts = op; + OutRec* outrec2 = CreateOutRec(); + outrec2->Pts = op2; + UpdateOutPtIdxs(*outrec2); + if (Poly2ContainsPoly1(outrec2->Pts, outrec->Pts)) + { + //OutRec2 is contained by OutRec1 ... + outrec2->IsHole = !outrec->IsHole; + outrec2->FirstLeft = outrec; + if (m_UsingPolyTree) FixupFirstLefts2(outrec2, outrec); + } + else + if (Poly2ContainsPoly1(outrec->Pts, outrec2->Pts)) + { + //OutRec1 is contained by OutRec2 ... + outrec2->IsHole = outrec->IsHole; + outrec->IsHole = !outrec2->IsHole; + outrec2->FirstLeft = outrec->FirstLeft; + outrec->FirstLeft = outrec2; + if (m_UsingPolyTree) FixupFirstLefts2(outrec, outrec2); + } + else + { + //the 2 polygons are separate ... + outrec2->IsHole = outrec->IsHole; + outrec2->FirstLeft = outrec->FirstLeft; + if (m_UsingPolyTree) FixupFirstLefts1(outrec, outrec2); + } + op2 = op; //ie get ready for the Next iteration + } + op2 = op2->Next; + } + op = op->Next; + } + while (op != outrec->Pts); + } +} +//------------------------------------------------------------------------------ + +void ReversePath(Path& p) +{ + std::reverse(p.begin(), p.end()); +} +//------------------------------------------------------------------------------ + +void ReversePaths(Paths& p) +{ + for (Paths::size_type i = 0; i < p.size(); ++i) + ReversePath(p[i]); +} +//------------------------------------------------------------------------------ + +void SimplifyPolygon(const Path &in_poly, Paths &out_polys, PolyFillType fillType) +{ + Clipper c; + c.StrictlySimple(true); + c.AddPath(in_poly, ptSubject, true); + c.Execute(ctUnion, out_polys, fillType, fillType); +} +//------------------------------------------------------------------------------ + +void SimplifyPolygons(const Paths &in_polys, Paths &out_polys, PolyFillType fillType) +{ + Clipper c; + c.StrictlySimple(true); + c.AddPaths(in_polys, ptSubject, true); + c.Execute(ctUnion, out_polys, fillType, fillType); +} +//------------------------------------------------------------------------------ + +void SimplifyPolygons(Paths &polys, PolyFillType fillType) +{ + SimplifyPolygons(polys, polys, fillType); +} +//------------------------------------------------------------------------------ + +inline double DistanceSqrd(const IntPoint& pt1, const IntPoint& pt2) +{ + double Dx = ((double)pt1.X - pt2.X); + double dy = ((double)pt1.Y - pt2.Y); + return (Dx*Dx + dy*dy); +} +//------------------------------------------------------------------------------ + +double DistanceFromLineSqrd( + const IntPoint& pt, const IntPoint& ln1, const IntPoint& ln2) +{ + //The equation of a line in general form (Ax + By + C = 0) + //given 2 points (x�,y�) & (x�,y�) is ... + //(y� - y�)x + (x� - x�)y + (y� - y�)x� - (x� - x�)y� = 0 + //A = (y� - y�); B = (x� - x�); C = (y� - y�)x� - (x� - x�)y� + //perpendicular distance of point (x�,y�) = (Ax� + By� + C)/Sqrt(A� + B�) + //see http://en.wikipedia.org/wiki/Perpendicular_distance + double A = double(ln1.Y - ln2.Y); + double B = double(ln2.X - ln1.X); + double C = A * ln1.X + B * ln1.Y; + C = A * pt.X + B * pt.Y - C; + return (C * C) / (A * A + B * B); +} +//--------------------------------------------------------------------------- + +bool SlopesNearCollinear(const IntPoint& pt1, + const IntPoint& pt2, const IntPoint& pt3, double distSqrd) +{ + //this function is more accurate when the point that's geometrically + //between the other 2 points is the one that's tested for distance. + //ie makes it more likely to pick up 'spikes' ... + if (Abs(pt1.X - pt2.X) > Abs(pt1.Y - pt2.Y)) + { + if ((pt1.X > pt2.X) == (pt1.X < pt3.X)) + return DistanceFromLineSqrd(pt1, pt2, pt3) < distSqrd; + else if ((pt2.X > pt1.X) == (pt2.X < pt3.X)) + return DistanceFromLineSqrd(pt2, pt1, pt3) < distSqrd; + else + return DistanceFromLineSqrd(pt3, pt1, pt2) < distSqrd; + } + else + { + if ((pt1.Y > pt2.Y) == (pt1.Y < pt3.Y)) + return DistanceFromLineSqrd(pt1, pt2, pt3) < distSqrd; + else if ((pt2.Y > pt1.Y) == (pt2.Y < pt3.Y)) + return DistanceFromLineSqrd(pt2, pt1, pt3) < distSqrd; + else + return DistanceFromLineSqrd(pt3, pt1, pt2) < distSqrd; + } +} +//------------------------------------------------------------------------------ + +bool PointsAreClose(IntPoint pt1, IntPoint pt2, double distSqrd) +{ + double Dx = (double)pt1.X - pt2.X; + double dy = (double)pt1.Y - pt2.Y; + return ((Dx * Dx) + (dy * dy) <= distSqrd); +} +//------------------------------------------------------------------------------ + +OutPt* ExcludeOp(OutPt* op) +{ + OutPt* result = op->Prev; + result->Next = op->Next; + op->Next->Prev = result; + result->Idx = 0; + return result; +} +//------------------------------------------------------------------------------ + +void CleanPolygon(const Path& in_poly, Path& out_poly, double distance) +{ + //distance = proximity in units/pixels below which vertices + //will be stripped. Default ~= sqrt(2). + + size_t size = in_poly.size(); + + if (size == 0) + { + out_poly.clear(); + return; + } + + OutPt* outPts = new OutPt[size]; + for (size_t i = 0; i < size; ++i) + { + outPts[i].Pt = in_poly[i]; + outPts[i].Next = &outPts[(i + 1) % size]; + outPts[i].Next->Prev = &outPts[i]; + outPts[i].Idx = 0; + } + + double distSqrd = distance * distance; + OutPt* op = &outPts[0]; + while (op->Idx == 0 && op->Next != op->Prev) + { + if (PointsAreClose(op->Pt, op->Prev->Pt, distSqrd)) + { + op = ExcludeOp(op); + size--; + } + else if (PointsAreClose(op->Prev->Pt, op->Next->Pt, distSqrd)) + { + ExcludeOp(op->Next); + op = ExcludeOp(op); + size -= 2; + } + else if (SlopesNearCollinear(op->Prev->Pt, op->Pt, op->Next->Pt, distSqrd)) + { + op = ExcludeOp(op); + size--; + } + else + { + op->Idx = 1; + op = op->Next; + } + } + + if (size < 3) size = 0; + out_poly.resize(size); + for (size_t i = 0; i < size; ++i) + { + out_poly[i] = op->Pt; + op = op->Next; + } + delete [] outPts; +} +//------------------------------------------------------------------------------ + +void CleanPolygon(Path& poly, double distance) +{ + CleanPolygon(poly, poly, distance); +} +//------------------------------------------------------------------------------ + +void CleanPolygons(const Paths& in_polys, Paths& out_polys, double distance) +{ + out_polys.resize(in_polys.size()); + for (Paths::size_type i = 0; i < in_polys.size(); ++i) + CleanPolygon(in_polys[i], out_polys[i], distance); +} +//------------------------------------------------------------------------------ + +void CleanPolygons(Paths& polys, double distance) +{ + CleanPolygons(polys, polys, distance); +} +//------------------------------------------------------------------------------ + +void Minkowski(const Path& poly, const Path& path, + Paths& solution, bool isSum, bool isClosed) +{ + int delta = (isClosed ? 1 : 0); + size_t polyCnt = poly.size(); + size_t pathCnt = path.size(); + Paths pp; + pp.reserve(pathCnt); + if (isSum) + for (size_t i = 0; i < pathCnt; ++i) + { + Path p; + p.reserve(polyCnt); + for (size_t j = 0; j < poly.size(); ++j) + p.push_back(IntPoint(path[i].X + poly[j].X, path[i].Y + poly[j].Y)); + pp.push_back(p); + } + else + for (size_t i = 0; i < pathCnt; ++i) + { + Path p; + p.reserve(polyCnt); + for (size_t j = 0; j < poly.size(); ++j) + p.push_back(IntPoint(path[i].X - poly[j].X, path[i].Y - poly[j].Y)); + pp.push_back(p); + } + + solution.clear(); + solution.reserve((pathCnt + delta) * (polyCnt + 1)); + for (size_t i = 0; i < pathCnt - 1 + delta; ++i) + for (size_t j = 0; j < polyCnt; ++j) + { + Path quad; + quad.reserve(4); + quad.push_back(pp[i % pathCnt][j % polyCnt]); + quad.push_back(pp[(i + 1) % pathCnt][j % polyCnt]); + quad.push_back(pp[(i + 1) % pathCnt][(j + 1) % polyCnt]); + quad.push_back(pp[i % pathCnt][(j + 1) % polyCnt]); + if (!Orientation(quad)) ReversePath(quad); + solution.push_back(quad); + } +} +//------------------------------------------------------------------------------ + +void MinkowskiSum(const Path& pattern, const Path& path, Paths& solution, bool pathIsClosed) +{ + Minkowski(pattern, path, solution, true, pathIsClosed); + Clipper c; + c.AddPaths(solution, ptSubject, true); + c.Execute(ctUnion, solution, pftNonZero, pftNonZero); +} +//------------------------------------------------------------------------------ + +void TranslatePath(const Path& input, Path& output, const IntPoint delta) +{ + //precondition: input != output + output.resize(input.size()); + for (size_t i = 0; i < input.size(); ++i) + output[i] = IntPoint(input[i].X + delta.X, input[i].Y + delta.Y); +} +//------------------------------------------------------------------------------ + +void MinkowskiSum(const Path& pattern, const Paths& paths, Paths& solution, bool pathIsClosed) +{ + Clipper c; + for (size_t i = 0; i < paths.size(); ++i) + { + Paths tmp; + Minkowski(pattern, paths[i], tmp, true, pathIsClosed); + c.AddPaths(tmp, ptSubject, true); + if (pathIsClosed) + { + Path tmp2; + TranslatePath(paths[i], tmp2, pattern[0]); + c.AddPath(tmp2, ptClip, true); + } + } + c.Execute(ctUnion, solution, pftNonZero, pftNonZero); +} +//------------------------------------------------------------------------------ + +void MinkowskiDiff(const Path& poly1, const Path& poly2, Paths& solution) +{ + Minkowski(poly1, poly2, solution, false, true); + Clipper c; + c.AddPaths(solution, ptSubject, true); + c.Execute(ctUnion, solution, pftNonZero, pftNonZero); +} +//------------------------------------------------------------------------------ + +enum NodeType {ntAny, ntOpen, ntClosed}; + +void AddPolyNodeToPaths(const PolyNode& polynode, NodeType nodetype, Paths& paths) +{ + bool match = true; + if (nodetype == ntClosed) match = !polynode.IsOpen(); + else if (nodetype == ntOpen) return; + + if (!polynode.Contour.empty() && match) + paths.push_back(polynode.Contour); + for (int i = 0; i < polynode.ChildCount(); ++i) + AddPolyNodeToPaths(*polynode.Childs[i], nodetype, paths); +} +//------------------------------------------------------------------------------ + +void PolyTreeToPaths(const PolyTree& polytree, Paths& paths) +{ + paths.resize(0); + paths.reserve(polytree.Total()); + AddPolyNodeToPaths(polytree, ntAny, paths); +} +//------------------------------------------------------------------------------ + +void ClosedPathsFromPolyTree(const PolyTree& polytree, Paths& paths) +{ + paths.resize(0); + paths.reserve(polytree.Total()); + AddPolyNodeToPaths(polytree, ntClosed, paths); +} +//------------------------------------------------------------------------------ + +void OpenPathsFromPolyTree(PolyTree& polytree, Paths& paths) +{ + paths.resize(0); + paths.reserve(polytree.Total()); + //Open paths are top level only, so ... + for (int i = 0; i < polytree.ChildCount(); ++i) + if (polytree.Childs[i]->IsOpen()) + paths.push_back(polytree.Childs[i]->Contour); +} +//------------------------------------------------------------------------------ + +std::ostream& operator <<(std::ostream &s, const IntPoint &p) +{ + s << "(" << p.X << "," << p.Y << ")"; + return s; +} +//------------------------------------------------------------------------------ + +std::ostream& operator <<(std::ostream &s, const Path &p) +{ + if (p.empty()) return s; + Path::size_type last = p.size() -1; + for (Path::size_type i = 0; i < last; i++) + s << "(" << p[i].X << "," << p[i].Y << "), "; + s << "(" << p[last].X << "," << p[last].Y << ")\n"; + return s; +} +//------------------------------------------------------------------------------ + +std::ostream& operator <<(std::ostream &s, const Paths &p) +{ + for (Paths::size_type i = 0; i < p.size(); i++) + s << p[i]; + s << "\n"; + return s; +} +//------------------------------------------------------------------------------ + +} //ClipperLib namespace diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.hpp b/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..50d8f6e96065a0422a5da02316b930aae2bd6f26 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_clipper.hpp @@ -0,0 +1,547 @@ +/******************************************************************************* +* * +* Author : Angus Johnson * +* Version : 6.4.2 * +* Date : 27 February 2017 * +* Website : http://www.angusj.com * +* Copyright : Angus Johnson 2010-2017 * +* * +* License: * +* Use, modification & distribution is subject to Boost Software License Ver 1. * +* http://www.boost.org/LICENSE_1_0.txt * +* * +* Attributions: * +* The code in this library is an extension of Bala Vatti's clipping algorithm: * +* "A generic solution to polygon clipping" * +* Communications of the ACM, Vol 35, Issue 7 (July 1992) pp 56-63. * +* http://portal.acm.org/citation.cfm?id=129906 * +* * +* Computer graphics and geometric modeling: implementation and algorithms * +* By Max K. Agoston * +* Springer; 1 edition (January 4, 2005) * +* http://books.google.com/books?q=vatti+clipping+agoston * +* * +* See also: * +* "Polygon Offsetting by Computing Winding Numbers" * +* Paper no. DETC2005-85513 pp. 565-575 * +* ASME 2005 International Design Engineering Technical Conferences * +* and Computers and Information in Engineering Conference (IDETC/CIE2005) * +* September 24-28, 2005 , Long Beach, California, USA * +* http://www.me.berkeley.edu/~mcmains/pubs/DAC05OffsetPolygon.pdf * +* * +*******************************************************************************/ + +#ifndef clipper_hpp +#define clipper_hpp + +#define CLIPPER_VERSION "6.4.2" + +//use_int32: When enabled 32bit ints are used instead of 64bit ints. This +//improve performance but coordinate values are limited to the range +/- 46340 +//#define use_int32 + +//use_xyz: adds a Z member to IntPoint. Adds a minor cost to perfomance. +//#define use_xyz + +//use_lines: Enables line clipping. Adds a very minor cost to performance. +#define use_lines + +//use_deprecated: Enables temporary support for the obsolete functions +//#define use_deprecated + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace ClipperLib { + +enum ClipType { + ctIntersection, ctUnion, ctDifference, ctXor +}; +enum PolyType { + ptSubject, ptClip +}; +//By far the most widely used winding rules for polygon filling are +//EvenOdd & NonZero (GDI, GDI+, XLib, OpenGL, Cairo, AGG, Quartz, SVG, Gr32) +//Others rules include Positive, Negative and ABS_GTR_EQ_TWO (only in OpenGL) +//see http://glprogramming.com/red/chapter11.html +enum PolyFillType { + pftEvenOdd, pftNonZero, pftPositive, pftNegative +}; + +#ifdef use_int32 +typedef int cInt; +static cInt const loRange = 0x7FFF; +static cInt const hiRange = 0x7FFF; +#else +typedef signed long long cInt; +static cInt const loRange = 0x3FFFFFFF; +static cInt const hiRange = 0x3FFFFFFFFFFFFFFFLL; +typedef signed long long long64; //used by Int128 class +typedef unsigned long long ulong64; + +#endif + +struct IntPoint { + cInt X; + cInt Y; +#ifdef use_xyz + cInt Z; + IntPoint(cInt x = 0, cInt y = 0, cInt z = 0): X(x), Y(y), Z(z) {}; +#else + + IntPoint(cInt x = 0, cInt y = 0) : X(x), Y(y) {}; +#endif + + friend inline bool operator==(const IntPoint &a, const IntPoint &b) { + return a.X == b.X && a.Y == b.Y; + } + + friend inline bool operator!=(const IntPoint &a, const IntPoint &b) { + return a.X != b.X || a.Y != b.Y; + } +}; +//------------------------------------------------------------------------------ + +typedef std::vector Path; +typedef std::vector Paths; + +inline Path &operator<<(Path &poly, const IntPoint &p) { + poly.push_back(p); + return poly; +} + +inline Paths &operator<<(Paths &polys, const Path &p) { + polys.push_back(p); + return polys; +} + +std::ostream &operator<<(std::ostream &s, const IntPoint &p); + +std::ostream &operator<<(std::ostream &s, const Path &p); + +std::ostream &operator<<(std::ostream &s, const Paths &p); + +struct DoublePoint { + double X; + double Y; + + DoublePoint(double x = 0, double y = 0) : X(x), Y(y) {} + + DoublePoint(IntPoint ip) : X((double) ip.X), Y((double) ip.Y) {} +}; +//------------------------------------------------------------------------------ + +#ifdef use_xyz +typedef void (*ZFillCallback)(IntPoint& e1bot, IntPoint& e1top, IntPoint& e2bot, IntPoint& e2top, IntPoint& pt); +#endif + +enum InitOptions { + ioReverseSolution = 1, ioStrictlySimple = 2, ioPreserveCollinear = 4 +}; +enum JoinType { + jtSquare, jtRound, jtMiter +}; +enum EndType { + etClosedPolygon, etClosedLine, etOpenButt, etOpenSquare, etOpenRound +}; + +class PolyNode; + +typedef std::vector PolyNodes; + +class PolyNode { +public: + PolyNode(); + + virtual ~PolyNode() {}; + Path Contour; + PolyNodes Childs; + PolyNode *Parent; + + PolyNode *GetNext() const; + + bool IsHole() const; + + bool IsOpen() const; + + int ChildCount() const; + +private: + //PolyNode& operator =(PolyNode& other); + unsigned Index; //node index in Parent.Childs + bool m_IsOpen; + JoinType m_jointype; + EndType m_endtype; + + PolyNode *GetNextSiblingUp() const; + + void AddChild(PolyNode &child); + + friend class Clipper; //to access Index + friend class ClipperOffset; +}; + +class PolyTree : public PolyNode { +public: + ~PolyTree() { Clear(); }; + + PolyNode *GetFirst() const; + + void Clear(); + + int Total() const; + +private: + //PolyTree& operator =(PolyTree& other); + PolyNodes AllNodes; + + friend class Clipper; //to access AllNodes +}; + +bool Orientation(const Path &poly); + +double Area(const Path &poly); + +int PointInPolygon(const IntPoint &pt, const Path &path); + +void SimplifyPolygon(const Path &in_poly, Paths &out_polys, PolyFillType fillType = pftEvenOdd); + +void SimplifyPolygons(const Paths &in_polys, Paths &out_polys, PolyFillType fillType = pftEvenOdd); + +void SimplifyPolygons(Paths &polys, PolyFillType fillType = pftEvenOdd); + +void CleanPolygon(const Path &in_poly, Path &out_poly, double distance = 1.415); + +void CleanPolygon(Path &poly, double distance = 1.415); + +void CleanPolygons(const Paths &in_polys, Paths &out_polys, double distance = 1.415); + +void CleanPolygons(Paths &polys, double distance = 1.415); + +void MinkowskiSum(const Path &pattern, const Path &path, Paths &solution, bool pathIsClosed); + +void MinkowskiSum(const Path &pattern, const Paths &paths, Paths &solution, bool pathIsClosed); + +void MinkowskiDiff(const Path &poly1, const Path &poly2, Paths &solution); + +void PolyTreeToPaths(const PolyTree &polytree, Paths &paths); + +void ClosedPathsFromPolyTree(const PolyTree &polytree, Paths &paths); + +void OpenPathsFromPolyTree(PolyTree &polytree, Paths &paths); + +void ReversePath(Path &p); + +void ReversePaths(Paths &p); + +struct IntRect { + cInt left; + cInt top; + cInt right; + cInt bottom; +}; + +//enums that are used internally ... +enum EdgeSide { + esLeft = 1, esRight = 2 +}; + +//forward declarations (for stuff used internally) ... +struct TEdge; +struct IntersectNode; +struct LocalMinimum; +struct OutPt; +struct OutRec; +struct Join; + +typedef std::vector PolyOutList; +typedef std::vector EdgeList; +typedef std::vector JoinList; +typedef std::vector IntersectList; + +//------------------------------------------------------------------------------ + +//ClipperBase is the ancestor to the Clipper class. It should not be +//instantiated directly. This class simply abstracts the conversion of sets of +//polygon coordinates into edge objects that are stored in a LocalMinima list. +class ClipperBase { +public: + ClipperBase(); + + virtual ~ClipperBase(); + + virtual bool AddPath(const Path &pg, PolyType PolyTyp, bool Closed); + + bool AddPaths(const Paths &ppg, PolyType PolyTyp, bool Closed); + + virtual void Clear(); + + IntRect GetBounds(); + + bool PreserveCollinear() { return m_PreserveCollinear; }; + + void PreserveCollinear(bool value) { m_PreserveCollinear = value; }; +protected: + void DisposeLocalMinimaList(); + + TEdge *AddBoundsToLML(TEdge *e, bool IsClosed); + + virtual void Reset(); + + TEdge *ProcessBound(TEdge *E, bool IsClockwise); + + void InsertScanbeam(const cInt Y); + + bool PopScanbeam(cInt &Y); + + bool LocalMinimaPending(); + + bool PopLocalMinima(cInt Y, const LocalMinimum *&locMin); + + OutRec *CreateOutRec(); + + void DisposeAllOutRecs(); + + void DisposeOutRec(PolyOutList::size_type index); + + void SwapPositionsInAEL(TEdge *edge1, TEdge *edge2); + + void DeleteFromAEL(TEdge *e); + + void UpdateEdgeIntoAEL(TEdge *&e); + + typedef std::vector MinimaList; + MinimaList::iterator m_CurrentLM; + MinimaList m_MinimaList; + + bool m_UseFullRange; + EdgeList m_edges; + bool m_PreserveCollinear; + bool m_HasOpenPaths; + PolyOutList m_PolyOuts; + TEdge *m_ActiveEdges; + + typedef std::priority_queue ScanbeamList; + ScanbeamList m_Scanbeam; +}; +//------------------------------------------------------------------------------ + +class Clipper : public virtual ClipperBase { +public: + Clipper(int initOptions = 0); + + bool Execute(ClipType clipType, + Paths &solution, + PolyFillType fillType = pftEvenOdd); + + bool Execute(ClipType clipType, + Paths &solution, + PolyFillType subjFillType, + PolyFillType clipFillType); + + bool Execute(ClipType clipType, + PolyTree &polytree, + PolyFillType fillType = pftEvenOdd); + + bool Execute(ClipType clipType, + PolyTree &polytree, + PolyFillType subjFillType, + PolyFillType clipFillType); + + bool ReverseSolution() { return m_ReverseOutput; }; + + void ReverseSolution(bool value) { m_ReverseOutput = value; }; + + bool StrictlySimple() { return m_StrictSimple; }; + + void StrictlySimple(bool value) { m_StrictSimple = value; }; + //set the callback function for z value filling on intersections (otherwise Z is 0) +#ifdef use_xyz + void ZFillFunction(ZFillCallback zFillFunc); +#endif +protected: + virtual bool ExecuteInternal(); + +private: + JoinList m_Joins; + JoinList m_GhostJoins; + IntersectList m_IntersectList; + ClipType m_ClipType; + typedef std::list MaximaList; + MaximaList m_Maxima; + TEdge *m_SortedEdges; + bool m_ExecuteLocked; + PolyFillType m_ClipFillType; + PolyFillType m_SubjFillType; + bool m_ReverseOutput; + bool m_UsingPolyTree; + bool m_StrictSimple; +#ifdef use_xyz + ZFillCallback m_ZFill; //custom callback +#endif + + void SetWindingCount(TEdge &edge); + + bool IsEvenOddFillType(const TEdge &edge) const; + + bool IsEvenOddAltFillType(const TEdge &edge) const; + + void InsertLocalMinimaIntoAEL(const cInt botY); + + void InsertEdgeIntoAEL(TEdge *edge, TEdge *startEdge); + + void AddEdgeToSEL(TEdge *edge); + + bool PopEdgeFromSEL(TEdge *&edge); + + void CopyAELToSEL(); + + void DeleteFromSEL(TEdge *e); + + void SwapPositionsInSEL(TEdge *edge1, TEdge *edge2); + + bool IsContributing(const TEdge &edge) const; + + bool IsTopHorz(const cInt XPos); + + void DoMaxima(TEdge *e); + + void ProcessHorizontals(); + + void ProcessHorizontal(TEdge *horzEdge); + + void AddLocalMaxPoly(TEdge *e1, TEdge *e2, const IntPoint &pt); + + OutPt *AddLocalMinPoly(TEdge *e1, TEdge *e2, const IntPoint &pt); + + OutRec *GetOutRec(int idx); + + void AppendPolygon(TEdge *e1, TEdge *e2); + + void IntersectEdges(TEdge *e1, TEdge *e2, IntPoint &pt); + + OutPt *AddOutPt(TEdge *e, const IntPoint &pt); + + OutPt *GetLastOutPt(TEdge *e); + + bool ProcessIntersections(const cInt topY); + + void BuildIntersectList(const cInt topY); + + void ProcessIntersectList(); + + void ProcessEdgesAtTopOfScanbeam(const cInt topY); + + void BuildResult(Paths &polys); + + void BuildResult2(PolyTree &polytree); + + void SetHoleState(TEdge *e, OutRec *outrec); + + void DisposeIntersectNodes(); + + bool FixupIntersectionOrder(); + + void FixupOutPolygon(OutRec &outrec); + + void FixupOutPolyline(OutRec &outrec); + + bool IsHole(TEdge *e); + + bool FindOwnerFromSplitRecs(OutRec &outRec, OutRec *&currOrfl); + + void FixHoleLinkage(OutRec &outrec); + + void AddJoin(OutPt *op1, OutPt *op2, const IntPoint offPt); + + void ClearJoins(); + + void ClearGhostJoins(); + + void AddGhostJoin(OutPt *op, const IntPoint offPt); + + bool JoinPoints(Join *j, OutRec *outRec1, OutRec *outRec2); + + void JoinCommonEdges(); + + void DoSimplePolygons(); + + void FixupFirstLefts1(OutRec *OldOutRec, OutRec *NewOutRec); + + void FixupFirstLefts2(OutRec *InnerOutRec, OutRec *OuterOutRec); + + void FixupFirstLefts3(OutRec *OldOutRec, OutRec *NewOutRec); + +#ifdef use_xyz + void SetZ(IntPoint& pt, TEdge& e1, TEdge& e2); +#endif +}; +//------------------------------------------------------------------------------ + +class ClipperOffset { +public: + ClipperOffset(double miterLimit = 2.0, double roundPrecision = 0.25); + + ~ClipperOffset(); + + void AddPath(const Path &path, JoinType joinType, EndType endType); + + void AddPaths(const Paths &paths, JoinType joinType, EndType endType); + + void Execute(Paths &solution, double delta); + + void Execute(PolyTree &solution, double delta); + + void Clear(); + + double MiterLimit; + double ArcTolerance; +private: + Paths m_destPolys; + Path m_srcPoly; + Path m_destPoly; + std::vector m_normals; + double m_delta, m_sinA, m_sin, m_cos; + double m_miterLim, m_StepsPerRad; + IntPoint m_lowest; + PolyNode m_polyNodes; + + void FixOrientations(); + + void DoOffset(double delta); + + void OffsetPoint(int j, int &k, JoinType jointype); + + void DoSquare(int j, int k); + + void DoMiter(int j, int k, double r); + + void DoRound(int j, int k); +}; +//------------------------------------------------------------------------------ + +class clipperException : public std::exception { +public: + clipperException(const char *description) : m_descr(description) {} + + virtual ~clipperException() throw() {} + + virtual const char *what() const throw() { return m_descr.c_str(); } + +private: + std::string m_descr; +}; +//------------------------------------------------------------------------------ + +} //ClipperLib namespace + +#endif //clipper_hpp + + diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.cpp b/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.cpp new file mode 100644 index 0000000000000000000000000000000000000000..df932d5ed3b767b3f95a549a6209cee044bfa267 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.cpp @@ -0,0 +1,141 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "ocr_crnn_process.h" +#include +#include +#include +#include +#include +#include + +const std::string CHARACTER_TYPE = "ch"; +const int MAX_DICT_LENGTH = 6624; +const std::vector REC_IMAGE_SHAPE = {3, 32, 320}; + +static cv::Mat crnn_resize_norm_img(cv::Mat img, float wh_ratio) { + int imgC, imgH, imgW; + imgC = REC_IMAGE_SHAPE[0]; + imgW = REC_IMAGE_SHAPE[2]; + imgH = REC_IMAGE_SHAPE[1]; + + if (CHARACTER_TYPE == "ch") + imgW = int(32 * wh_ratio); + + float ratio = float(img.cols) / float(img.rows); + int resize_w, resize_h; + if (ceilf(imgH * ratio) > imgW) + resize_w = imgW; + else + resize_w = int(ceilf(imgH * ratio)); + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f, cv::INTER_CUBIC); + + resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f); + + for (int h = 0; h < resize_img.rows; h++) { + for (int w = 0; w < resize_img.cols; w++) { + resize_img.at(h, w)[0] = (resize_img.at(h, w)[0] - 0.5) * 2; + resize_img.at(h, w)[1] = (resize_img.at(h, w)[1] - 0.5) * 2; + resize_img.at(h, w)[2] = (resize_img.at(h, w)[2] - 0.5) * 2; + } + } + + cv::Mat dist; + cv::copyMakeBorder(resize_img, dist, 0, 0, 0, int(imgW - resize_w), cv::BORDER_CONSTANT, + {0, 0, 0}); + + return dist; + +} + +cv::Mat crnn_resize_img(const cv::Mat &img, float wh_ratio) { + int imgC = REC_IMAGE_SHAPE[0]; + int imgW = REC_IMAGE_SHAPE[2]; + int imgH = REC_IMAGE_SHAPE[1]; + + if (CHARACTER_TYPE == "ch") { + imgW = int(32 * wh_ratio); + } + + float ratio = float(img.cols) / float(img.rows); + int resize_w; + if (ceilf(imgH * ratio) > imgW) + resize_w = imgW; + else + resize_w = int(ceilf(imgH * ratio)); + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, imgH)); + return resize_img; +} + + +cv::Mat get_rotate_crop_image(const cv::Mat &srcimage, const std::vector> &box) { + + std::vector> points = box; + + int x_collect[4] = {box[0][0], box[1][0], box[2][0], box[3][0]}; + int y_collect[4] = {box[0][1], box[1][1], box[2][1], box[3][1]}; + int left = int(*std::min_element(x_collect, x_collect + 4)); + int right = int(*std::max_element(x_collect, x_collect + 4)); + int top = int(*std::min_element(y_collect, y_collect + 4)); + int bottom = int(*std::max_element(y_collect, y_collect + 4)); + + cv::Mat img_crop; + srcimage(cv::Rect(left, top, right - left, bottom - top)).copyTo(img_crop); + + for (int i = 0; i < points.size(); i++) { + points[i][0] -= left; + points[i][1] -= top; + } + + int img_crop_width = int(sqrt(pow(points[0][0] - points[1][0], 2) + + pow(points[0][1] - points[1][1], 2))); + int img_crop_height = int(sqrt(pow(points[0][0] - points[3][0], 2) + + pow(points[0][1] - points[3][1], 2))); + + cv::Point2f pts_std[4]; + pts_std[0] = cv::Point2f(0., 0.); + pts_std[1] = cv::Point2f(img_crop_width, 0.); + pts_std[2] = cv::Point2f(img_crop_width, img_crop_height); + pts_std[3] = cv::Point2f(0.f, img_crop_height); + + cv::Point2f pointsf[4]; + pointsf[0] = cv::Point2f(points[0][0], points[0][1]); + pointsf[1] = cv::Point2f(points[1][0], points[1][1]); + pointsf[2] = cv::Point2f(points[2][0], points[2][1]); + pointsf[3] = cv::Point2f(points[3][0], points[3][1]); + + cv::Mat M = cv::getPerspectiveTransform(pointsf, pts_std); + + cv::Mat dst_img; + cv::warpPerspective(img_crop, dst_img, M, cv::Size(img_crop_width, img_crop_height), + cv::BORDER_REPLICATE); + + if (float(dst_img.rows) >= float(dst_img.cols) * 1.5) { + /* + cv::Mat srcCopy = cv::Mat(dst_img.rows, dst_img.cols, dst_img.depth()); + cv::transpose(dst_img, srcCopy); + cv::flip(srcCopy, srcCopy, 0); + return srcCopy; + */ + cv::transpose(dst_img, dst_img); + cv::flip(dst_img, dst_img, 0); + return dst_img; + } else { + return dst_img; + } + +} + diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.h b/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.h new file mode 100644 index 0000000000000000000000000000000000000000..1ab55e3439aec1d2d2a1a70b6d11d9e45fabe356 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_crnn_process.h @@ -0,0 +1,19 @@ +// +// Created by fujiayi on 2020/7/3. +// +#pragma once + +#include +#include + + +extern const std::vector REC_IMAGE_SHAPE; + +cv::Mat get_rotate_crop_image(const cv::Mat &srcimage, const std::vector>& box); + +cv::Mat crnn_resize_img(const cv::Mat& img, float wh_ratio); + +template +inline size_t argmax(ForwardIterator first, ForwardIterator last) { + return std::distance(first, std::max_element(first, last)); +} \ No newline at end of file diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.cpp b/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.cpp new file mode 100644 index 0000000000000000000000000000000000000000..76d7cd4da81dcfaeffdca194a87633a7bb5c27e7 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.cpp @@ -0,0 +1,372 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include +#include +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include "ocr_clipper.hpp" + + +static void getcontourarea(float **box, float unclip_ratio, float &distance) { + int pts_num = 4; + float area = 0.0f; + float dist = 0.0f; + for (int i = 0; i < pts_num; i++) { + area += box[i][0] * box[(i + 1) % pts_num][1] - box[i][1] * box[(i + 1) % pts_num][0]; + dist += sqrtf( + (box[i][0] - box[(i + 1) % pts_num][0]) * (box[i][0] - box[(i + 1) % pts_num][0]) + + (box[i][1] - box[(i + 1) % pts_num][1]) * (box[i][1] - box[(i + 1) % pts_num][1])); + } + area = fabs(float(area / 2.0)); + + distance = area * unclip_ratio / dist; + +} + +static cv::RotatedRect unclip(float **box) { + float unclip_ratio = 2.0; + float distance = 1.0; + + getcontourarea(box, unclip_ratio, distance); + + ClipperLib::ClipperOffset offset; + ClipperLib::Path p; + p << ClipperLib::IntPoint(int(box[0][0]), int(box[0][1])) + << ClipperLib::IntPoint(int(box[1][0]), int(box[1][1])) << + ClipperLib::IntPoint(int(box[2][0]), int(box[2][1])) + << ClipperLib::IntPoint(int(box[3][0]), int(box[3][1])); + offset.AddPath(p, ClipperLib::jtRound, ClipperLib::etClosedPolygon); + + ClipperLib::Paths soln; + offset.Execute(soln, distance); + std::vector points; + + for (int j = 0; j < soln.size(); j++) { + for (int i = 0; i < soln[soln.size() - 1].size(); i++) { + points.emplace_back(soln[j][i].X, soln[j][i].Y); + } + } + cv::RotatedRect res = cv::minAreaRect(points); + + return res; +} + +static float **Mat2Vec(cv::Mat mat) { + auto **array = new float *[mat.rows]; + for (int i = 0; i < mat.rows; ++i) + array[i] = new float[mat.cols]; + for (int i = 0; i < mat.rows; ++i) { + for (int j = 0; j < mat.cols; ++j) { + array[i][j] = mat.at(i, j); + } + } + + return array; +} + +static void quickSort(float **s, int l, int r) { + if (l < r) { + int i = l, j = r; + float x = s[l][0]; + float *xp = s[l]; + while (i < j) { + while (i < j && s[j][0] >= x) + j--; + if (i < j) + std::swap(s[i++], s[j]); + while (i < j && s[i][0] < x) + i++; + if (i < j) + std::swap(s[j--], s[i]); + } + s[i] = xp; + quickSort(s, l, i - 1); + quickSort(s, i + 1, r); + } +} + +static void quickSort_vector(std::vector> &box, int l, int r, int axis) { + if (l < r) { + int i = l, j = r; + int x = box[l][axis]; + std::vector xp(box[l]); + while (i < j) { + while (i < j && box[j][axis] >= x) + j--; + if (i < j) + std::swap(box[i++], box[j]); + while (i < j && box[i][axis] < x) + i++; + if (i < j) + std::swap(box[j--], box[i]); + } + box[i] = xp; + quickSort_vector(box, l, i - 1, axis); + quickSort_vector(box, i + 1, r, axis); + } +} + +static std::vector> order_points_clockwise(std::vector> pts) { + std::vector> box = pts; + quickSort_vector(box, 0, int(box.size() - 1), 0); + std::vector> leftmost = {box[0], box[1]}; + std::vector> rightmost = {box[2], box[3]}; + + if (leftmost[0][1] > leftmost[1][1]) + std::swap(leftmost[0], leftmost[1]); + + if (rightmost[0][1] > rightmost[1][1]) + std::swap(rightmost[0], rightmost[1]); + + std::vector> rect = {leftmost[0], rightmost[0], rightmost[1], leftmost[1]}; + return rect; +} + +static float **get_mini_boxes(cv::RotatedRect box, float &ssid) { + ssid = box.size.width >= box.size.height ? box.size.height : box.size.width; + + cv::Mat points; + cv::boxPoints(box, points); + // sorted box points + auto array = Mat2Vec(points); + quickSort(array, 0, 3); + + float *idx1 = array[0], *idx2 = array[1], *idx3 = array[2], *idx4 = array[3]; + if (array[3][1] <= array[2][1]) { + idx2 = array[3]; + idx3 = array[2]; + } else { + idx2 = array[2]; + idx3 = array[3]; + } + if (array[1][1] <= array[0][1]) { + idx1 = array[1]; + idx4 = array[0]; + } else { + idx1 = array[0]; + idx4 = array[1]; + } + + array[0] = idx1; + array[1] = idx2; + array[2] = idx3; + array[3] = idx4; + + return array; +} + +template T clamp(T x, T min, T max) { + if (x > max){ + return max; + } + if (x < min){ + return min; + } + return x; +} + +static float clampf(float x, float min, float max) { + if (x > max) + return max; + if (x < min) + return min; + return x; +} + + +float box_score_fast(float **box_array, cv::Mat pred) { + auto array = box_array; + int width = pred.cols; + int height = pred.rows; + + float box_x[4] = {array[0][0], array[1][0], array[2][0], array[3][0]}; + float box_y[4] = {array[0][1], array[1][1], array[2][1], array[3][1]}; + + int xmin = clamp(int(std::floorf(*(std::min_element(box_x, box_x + 4)))), 0, width - 1); + int xmax = clamp(int(std::ceilf(*(std::max_element(box_x, box_x + 4)))), 0, width - 1); + int ymin = clamp(int(std::floorf(*(std::min_element(box_y, box_y + 4)))), 0, height - 1); + int ymax = clamp(int(std::ceilf(*(std::max_element(box_y, box_y + 4)))), 0, height - 1); + + cv::Mat mask; + mask = cv::Mat::zeros(ymax - ymin + 1, xmax - xmin + 1, CV_8UC1); + + cv::Point root_point[4]; + root_point[0] = cv::Point(int(array[0][0]) - xmin, int(array[0][1]) - ymin); + root_point[1] = cv::Point(int(array[1][0]) - xmin, int(array[1][1]) - ymin); + root_point[2] = cv::Point(int(array[2][0]) - xmin, int(array[2][1]) - ymin); + root_point[3] = cv::Point(int(array[3][0]) - xmin, int(array[3][1]) - ymin); + const cv::Point *ppt[1] = {root_point}; + int npt[] = {4}; + cv::fillPoly(mask, ppt, npt, 1, cv::Scalar(1)); + + cv::Mat croppedImg; + pred(cv::Rect(xmin, ymin, xmax - xmin + 1, ymax - ymin + 1)).copyTo(croppedImg); + + auto score = cv::mean(croppedImg, mask)[0]; + return score; +} + + +std::vector>> +boxes_from_bitmap(const cv::Mat& pred, const cv::Mat& bitmap) { + const int min_size = 3; + const int max_candidates = 1000; + const float box_thresh = 0.5; + + int width = bitmap.cols; + int height = bitmap.rows; + + std::vector> contours; + std::vector hierarchy; + + cv::findContours(bitmap, contours, hierarchy, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE); + + int num_contours = contours.size() >= max_candidates ? max_candidates : contours.size(); + + std::vector>> boxes; + + for (int _i = 0; _i < num_contours; _i++) { + float ssid; + cv::RotatedRect box = cv::minAreaRect(contours[_i]); + auto array = get_mini_boxes(box, ssid); + + auto box_for_unclip = array; + //end get_mini_box + + if (ssid < min_size) { + continue; + } + + float score; + score = box_score_fast(array, pred); + //end box_score_fast + if (score < box_thresh) + continue; + + + // start for unclip + cv::RotatedRect points = unclip(box_for_unclip); + // end for unclip + + cv::RotatedRect clipbox = points; + auto cliparray = get_mini_boxes(clipbox, ssid); + + if (ssid < min_size + 2) continue; + + int dest_width = pred.cols; + int dest_height = pred.rows; + std::vector> intcliparray; + + for (int num_pt = 0; num_pt < 4; num_pt++) { + std::vector a{ + int(clampf(roundf(cliparray[num_pt][0] / float(width) * float(dest_width)), 0, + float(dest_width))), + int(clampf(roundf(cliparray[num_pt][1] / float(height) * float(dest_height)), 0, + float(dest_height)))}; + intcliparray.push_back(a); + } + boxes.push_back(intcliparray); + + }//end for + return boxes; +} + +int _max(int a, int b) { + return a >= b ? a : b; +} + +int _min(int a, int b) { + return a >= b ? b : a; +} + +std::vector>> +filter_tag_det_res(const std::vector>>& o_boxes, + float ratio_h, float ratio_w,const cv::Mat& srcimg) { + int oriimg_h = srcimg.rows; + int oriimg_w = srcimg.cols; + std::vector>> boxes{o_boxes}; + std::vector>> root_points; + for (int n = 0; n < boxes.size(); n++) { + boxes[n] = order_points_clockwise(boxes[n]); + for (int m = 0; m < boxes[0].size(); m++) { + boxes[n][m][0] /= ratio_w; + boxes[n][m][1] /= ratio_h; + + boxes[n][m][0] = int(_min(_max(boxes[n][m][0], 0), oriimg_w - 1)); + boxes[n][m][1] = int(_min(_max(boxes[n][m][1], 0), oriimg_h - 1)); + } + } + + for (int n = 0; n < boxes.size(); n++) { + int rect_width, rect_height; + rect_width = int(sqrt( + pow(boxes[n][0][0] - boxes[n][1][0], 2) + pow(boxes[n][0][1] - boxes[n][1][1], 2))); + rect_height = int(sqrt( + pow(boxes[n][0][0] - boxes[n][3][0], 2) + pow(boxes[n][0][1] - boxes[n][3][1], 2))); + if (rect_width <= 10 || rect_height <= 10) + continue; + root_points.push_back(boxes[n]); + } + return root_points; +} + +/* +using namespace std; +// read data from txt file +cv::Mat readtxt2(std::string path, int imgw, int imgh, int imgc) { + std::cout << "read data file from txt file! " << std::endl; + ifstream in(path); + string line; + int count = 0; + int i = 0, j = 0; + std::vector img_mean = {0.485, 0.456, 0.406}; + std::vector img_std = {0.229, 0.224, 0.225}; + + float trainData[imgh][imgw*imgc]; + + while (getline(in, line)) { + stringstream ss(line); + double x; + while (ss >> x) { +// trainData[i][j] = float(x) * img_std[j % 3] + img_mean[j % 3]; + trainData[i][j] = float(x); + j++; + } + i++; + j = 0; + } + + cv::Mat pred_map(imgh, imgw*imgc, CV_32FC1, (float *) trainData); + cv::Mat reshape_img = pred_map.reshape(imgc, imgh); + return reshape_img; +} + */ +//using namespace std; +// +//void writetxt(vector> data, std::string save_path){ +// +// ofstream fout(save_path); +// +// for (int i = 0; i < data.size(); i++) { +// for (int j=0; j< data[0].size(); j++){ +// fout << data[i][j] << " "; +// } +// fout << endl; +// } +// fout << endl; +// fout.close(); +//} diff --git a/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.h b/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.h new file mode 100644 index 0000000000000000000000000000000000000000..a0546d2d69c71587f0ed397d9140abb77df176a6 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/pdocr/ocr_db_post_process.h @@ -0,0 +1,10 @@ +// +// Created by fujiayi on 2020/7/2. +// +#pragma once + +std::vector>> boxes_from_bitmap(const cv::Mat& pred, const cv::Mat& bitmap); + +std::vector>> +filter_tag_det_res(const std::vector>>& o_boxes, + float ratio_h, float ratio_w, const cv::Mat& srcimg); \ No newline at end of file diff --git a/deploy/ios_demo/ocr_demo/timer.h b/deploy/ios_demo/ocr_demo/timer.h new file mode 100644 index 0000000000000000000000000000000000000000..e98be1bfb080d91cd66a72ad79c5721905516ca8 --- /dev/null +++ b/deploy/ios_demo/ocr_demo/timer.h @@ -0,0 +1,87 @@ +// +// timer.h +// face_demo +// +// Created by Li,Xiaoyang(SYS) on 2019/8/20. +// Copyright © 2019年 Li,Xiaoyang(SYS). All rights reserved. +// + +#ifndef timer_h +#define timer_h +#include +#include +class Timer final { + +public: + Timer() {} + + ~Timer() {} + + void clear() { + ms_time.clear(); + } + + void start() { + tstart = std::chrono::system_clock::now(); + } + + void end() { + tend = std::chrono::system_clock::now(); + auto ts = std::chrono::duration_cast(tend - tstart); + float elapse_ms = 1000.f * float(ts.count()) * std::chrono::microseconds::period::num / \ + std::chrono::microseconds::period::den; + ms_time.push_back(elapse_ms); + } + + float get_average_ms() { + if (ms_time.size() == 0) { + return 0.f; + } + float sum = 0.f; + for (auto i : ms_time){ + sum += i; + } + return sum / ms_time.size(); + } + + float get_sum_ms(){ + if (ms_time.size() == 0) { + return 0.f; + } + float sum = 0.f; + for (auto i : ms_time){ + sum += i; + } + return sum; + } + + // return tile (0-99) time. + float get_tile_time(float tile) { + + if (tile <0 || tile > 100) { + return -1.f; + } + int total_items = (int)ms_time.size(); + if (total_items <= 0) { + return -2.f; + } + ms_time.sort(); + int pos = (int)(tile * total_items / 100); + auto it = ms_time.begin(); + for (int i = 0; i < pos; ++i) { + ++it; + } + return *it; + } + + const std::list get_time_stat() { + return ms_time; + } + +private: + std::chrono::time_point tstart; + std::chrono::time_point tend; + std::list ms_time; +}; + +#endif /* timer_h */ diff --git a/deploy/lite/Makefile b/deploy/lite/Makefile index 96e05ecf01904fdcb21a103e78783da6dd748ca9..4c30d64475730f5b1cdc20713493b31d66540b4b 100644 --- a/deploy/lite/Makefile +++ b/deploy/lite/Makefile @@ -40,8 +40,8 @@ CXX_LIBS = ${OPENCV_LIBS} -L$(LITE_ROOT)/cxx/lib/ -lpaddle_light_api_shared $(SY #CXX_LIBS = $(LITE_ROOT)/cxx/lib/libpaddle_api_light_bundled.a $(SYSTEM_LIBS) -ocr_db_crnn: fetch_opencv ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o - $(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o -o ocr_db_crnn $(CXX_LIBS) $(LDFLAGS) +ocr_db_crnn: fetch_opencv ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o cls_process.o + $(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o cls_process.o -o ocr_db_crnn $(CXX_LIBS) $(LDFLAGS) ocr_db_crnn.o: ocr_db_crnn.cc $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o ocr_db_crnn.o -c ocr_db_crnn.cc @@ -49,6 +49,9 @@ ocr_db_crnn.o: ocr_db_crnn.cc crnn_process.o: fetch_opencv crnn_process.cc $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o crnn_process.o -c crnn_process.cc +cls_process.o: fetch_opencv cls_process.cc + $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o cls_process.o -c cls_process.cc + db_post_process.o: fetch_clipper fetch_opencv db_post_process.cc $(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o db_post_process.o -c db_post_process.cc @@ -73,5 +76,5 @@ fetch_opencv: .PHONY: clean clean: - rm -f ocr_db_crnn.o clipper.o db_post_process.o crnn_process.o + rm -f ocr_db_crnn.o clipper.o db_post_process.o crnn_process.o cls_process.o rm -f ocr_db_crnn diff --git a/deploy/lite/cls_process.cc b/deploy/lite/cls_process.cc new file mode 100644 index 0000000000000000000000000000000000000000..f522e4bc5a9050c981a91bcbd8dfefd719f034c2 --- /dev/null +++ b/deploy/lite/cls_process.cc @@ -0,0 +1,43 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "cls_process.h" //NOLINT +#include +#include +#include + +const std::vector rec_image_shape{3, 32, 100}; + +cv::Mat ClsResizeImg(cv::Mat img) { + int imgC, imgH, imgW; + imgC = rec_image_shape[0]; + imgH = rec_image_shape[1]; + imgW = rec_image_shape[2]; + + float ratio = static_cast(img.cols) / static_cast(img.rows); + + int resize_w, resize_h; + if (ceilf(imgH * ratio) > imgW) + resize_w = imgW; + else + resize_w = int(ceilf(imgH * ratio)); + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f, + cv::INTER_LINEAR); + if (resize_w < imgW) { + cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, imgW - resize_w, + cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0)); + } + return resize_img; +} \ No newline at end of file diff --git a/deploy/lite/cls_process.h b/deploy/lite/cls_process.h new file mode 100644 index 0000000000000000000000000000000000000000..eedeeb9ba7a14a5686fd27ce96ad26b74f2bf7ed --- /dev/null +++ b/deploy/lite/cls_process.h @@ -0,0 +1,29 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include +#include +#include +#include +#include +#include + +#include "math.h" //NOLINT +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" + +cv::Mat ClsResizeImg(cv::Mat img); \ No newline at end of file diff --git a/deploy/lite/config.txt b/deploy/lite/config.txt index 8ed835dd2c055b2cf1abb31c3c380204d66d7f2a..f08f8e49dfe45ab0f4e47c994d471e0515771df6 100644 --- a/deploy/lite/config.txt +++ b/deploy/lite/config.txt @@ -1,4 +1,4 @@ max_side_len 960 det_db_thresh 0.3 det_db_box_thresh 0.5 -det_db_unclip_ratio 2.0 \ No newline at end of file +det_db_unclip_ratio 1.6 \ No newline at end of file diff --git a/deploy/lite/db_post_process.cc b/deploy/lite/db_post_process.cc index ff9caa2f114336eab7500c6c929e042e3a7f6895..495016bc7df5e2a901983fa8d78aecbc471f1166 100644 --- a/deploy/lite/db_post_process.cc +++ b/deploy/lite/db_post_process.cc @@ -214,6 +214,9 @@ BoxesFromBitmap(const cv::Mat pred, const cv::Mat bitmap, for (int i = 0; i < num_contours; i++) { float ssid; + if (contours[i].size() <= 2) + continue; + cv::RotatedRect box = cv::minAreaRect(contours[i]); auto array = GetMiniBoxes(box, ssid); @@ -232,6 +235,8 @@ BoxesFromBitmap(const cv::Mat pred, const cv::Mat bitmap, // start for unclip cv::RotatedRect points = Unclip(box_for_unclip, unclip_ratio); + if (points.size.height < 1.001 && points.size.width < 1.001) + continue; // end for unclip cv::RotatedRect clipbox = points; @@ -288,7 +293,7 @@ FilterTagDetRes(std::vector>> boxes, float ratio_h, rect_height = static_cast(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) + pow(boxes[n][0][1] - boxes[n][3][1], 2))); - if (rect_width <= 10 || rect_height <= 10) + if (rect_width <= 4 || rect_height <= 4) continue; root_points.push_back(boxes[n]); } diff --git a/deploy/lite/ocr_db_crnn.cc b/deploy/lite/ocr_db_crnn.cc index 547afbf0a5e315dfbf971a785077116476a73c44..2786d15ad0581bf10f9e8de4b06f94cc5d4027aa 100644 --- a/deploy/lite/ocr_db_crnn.cc +++ b/deploy/lite/ocr_db_crnn.cc @@ -15,6 +15,7 @@ #include "paddle_api.h" // NOLINT #include +#include "cls_process.h" #include "crnn_process.h" #include "db_post_process.h" @@ -105,11 +106,55 @@ cv::Mat DetResizeImg(const cv::Mat img, int max_size_len, return resize_img; } +cv::Mat RunClsModel(cv::Mat img, std::shared_ptr predictor_cls, + const float thresh = 0.5) { + std::vector mean = {0.5f, 0.5f, 0.5f}; + std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; + + cv::Mat srcimg; + img.copyTo(srcimg); + cv::Mat crop_img; + cv::Mat resize_img; + + int index = 0; + float wh_ratio = + static_cast(crop_img.cols) / static_cast(crop_img.rows); + + resize_img = ClsResizeImg(crop_img); + resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f); + + const float *dimg = reinterpret_cast(resize_img.data); + + std::unique_ptr input_tensor0(std::move(predictor_cls->GetInput(0))); + input_tensor0->Resize({1, 3, resize_img.rows, resize_img.cols}); + auto *data0 = input_tensor0->mutable_data(); + + NeonMeanScale(dimg, data0, resize_img.rows * resize_img.cols, mean, scale); + // Run CLS predictor + predictor_cls->Run(); + + // Get output and run postprocess + std::unique_ptr softmax_out( + std::move(predictor_cls->GetOutput(0))); + std::unique_ptr label_out( + std::move(predictor_cls->GetOutput(1))); + auto *softmax_scores = softmax_out->mutable_data(); + auto *label_idxs = label_out->data(); + int label_idx = label_idxs[0]; + float score = softmax_scores[label_idx]; + + if (label_idx % 2 == 1 && score > thresh) { + cv::rotate(srcimg, srcimg, 1); + } + return srcimg; +} + void RunRecModel(std::vector>> boxes, cv::Mat img, std::shared_ptr predictor_crnn, std::vector &rec_text, std::vector &rec_text_score, - std::vector charactor_dict) { + std::vector charactor_dict, + std::shared_ptr predictor_cls) { std::vector mean = {0.5f, 0.5f, 0.5f}; std::vector scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; @@ -121,6 +166,7 @@ void RunRecModel(std::vector>> boxes, cv::Mat img, int index = 0; for (int i = boxes.size() - 1; i >= 0; i--) { crop_img = GetRotateCropImage(srcimg, boxes[i]); + crop_img = RunClsModel(crop_img, predictor_cls); float wh_ratio = static_cast(crop_img.cols) / static_cast(crop_img.rows); @@ -235,16 +281,18 @@ RunDetModel(std::shared_ptr predictor, cv::Mat img, } cv::Mat cbuf_map(shape_out[2], shape_out[3], CV_8UC1, - reinterpret_cast cbuf); + reinterpret_cast(cbuf)); cv::Mat pred_map(shape_out[2], shape_out[3], CV_32F, - reinterpret_cast pred); + reinterpret_cast(pred)); const double threshold = double(Config["det_db_thresh"]) * 255; const double maxvalue = 255; cv::Mat bit_map; cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); - - auto boxes = BoxesFromBitmap(pred_map, bit_map, Config); + cv::Mat dilation_map; + cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2, 2)); + cv::dilate(bit_map, dilation_map, dila_ele); + auto boxes = BoxesFromBitmap(pred_map, dilation_map, Config); std::vector>> filter_boxes = FilterTagDetRes(boxes, ratio_hw[0], ratio_hw[1], srcimg); @@ -318,13 +366,15 @@ std::map LoadConfigTxt(std::string config_path) { int main(int argc, char **argv) { if (argc < 5) { std::cerr << "[ERROR] usage: " << argv[0] - << " det_model_file rec_model_file image_path\n"; + << " det_model_file cls_model_file rec_model_file image_path " + "charactor_dict\n"; exit(1); } std::string det_model_file = argv[1]; std::string rec_model_file = argv[2]; - std::string img_path = argv[3]; - std::string dict_path = argv[4]; + std::string cls_model_file = argv[3]; + std::string img_path = argv[4]; + std::string dict_path = argv[5]; //// load config from txt file auto Config = LoadConfigTxt("./config.txt"); @@ -333,8 +383,10 @@ int main(int argc, char **argv) { auto det_predictor = loadModel(det_model_file); auto rec_predictor = loadModel(rec_model_file); + auto cls_predictor = loadModel(cls_model_file); auto charactor_dict = ReadDict(dict_path); + charactor_dict.push_back(" "); cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR); auto boxes = RunDetModel(det_predictor, srcimg, Config); @@ -342,7 +394,7 @@ int main(int argc, char **argv) { std::vector rec_text; std::vector rec_text_score; RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score, - charactor_dict); + charactor_dict, cls_predictor); auto end = std::chrono::system_clock::now(); auto duration = diff --git a/deploy/lite/prepare.sh b/deploy/lite/prepare.sh new file mode 100644 index 0000000000000000000000000000000000000000..daaa30c4bce619e3bca6009e20aa0afc61f9b1a2 --- /dev/null +++ b/deploy/lite/prepare.sh @@ -0,0 +1,9 @@ +#!/bin/bash + +mkdir -p $1/demo/cxx/ocr/debug/ +cp ../../ppocr/utils/ppocr_keys_v1.txt $1/demo/cxx/ocr/debug/ +cp -r ./* $1/demo/cxx/ocr/ +cp ./config.txt $1/demo/cxx/ocr/debug/ +cp ../../doc/imgs/11.jpg $1/demo/cxx/ocr/debug/ + +echo "Prepare Done" diff --git a/deploy/lite/readme.md b/deploy/lite/readme.md index bfd15e4b8ef01933f37120f991c05a98e653c0c9..93f38e057258edad0ac5dd3bbbb7b3789011cefb 100644 --- a/deploy/lite/readme.md +++ b/deploy/lite/readme.md @@ -15,10 +15,9 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 交叉编译环境用于编译 Paddle Lite 和 PaddleOCR 的C++ demo。 支持多种开发环境,不同开发环境的编译流程请参考对应文档。 -1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#docker) -2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#android) -3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#id13) -4. [Windows](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/x86.html#windows) +1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) +2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) +3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) ### 1.2 准备预测库 @@ -28,15 +27,15 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 |-|-| |Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)| |IOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)| - |x86(Linux)|[预测库](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/X86/Linux/inference_lite_lib.x86.linux.tar.gz)| - 注:如果是从下Paddle-Lite[官网文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/release_lib.html#android-toolchain-gcc)下载的预测库, - 注意选择`with_extra=ON,with_cv=ON`的下载链接。 + 注:1. 如果是从下Paddle-Lite[官网文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/release_lib.html#android-toolchain-gcc)下载的预测库, + 注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 如果使用量化的模型部署在端侧,建议使用Paddle-Lite develop分支编译预测库。 -- 2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: +- 2. [建议]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: ``` git clone https://github.com/PaddlePaddle/Paddle-Lite.git cd Paddle-Lite +# 务必使用develop分支编译预测库 git checkout develop ./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON ``` @@ -82,9 +81,12 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括 下述表格中提供了优化好的超轻量中文模型: -|模型简介|检测模型|识别模型|Paddle-Lite版本| -|-|-|-|-| -|超轻量级中文OCR opt优化模型|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_det_mv3_db_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_rec_mv3_crnn_opt.nb)|2.6.1| +|模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| +|-|-|-|-|-|-|-| +|V1.1|超轻量中文OCR 移动端模型|3.5M|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|develop| +|V1.0|轻量级中文OCR 移动端模型|8.6M|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_det_mv3_db_opt.nb)|---|[下载地址](https://paddleocr.bj.bcebos.com/deploy/lite/ch_rec_mv3_crnn_opt.nb)|develop| + +注意:V1.1 3.0M 轻量模型是使用PaddleSlim优化后的,需要配合Paddle-Lite最新预测库使用。 如果直接使用上述表格中的模型进行部署,可略过下述步骤,直接阅读 [2.2节](#2.2与手机联调)。 @@ -121,18 +123,27 @@ cd build.opt/lite/api/ 下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。 ``` -# 下载PaddleOCR的超轻量文inference模型,并解压 +# 【推荐】 下载PaddleOCR V1.1版本的中英文 inference模型,V1.1比1.0效果更好,模型更小 +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_prune_infer.tar +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_quant_infer.tar +# 转换V1.1检测模型 +./opt --model_file=./ch_ppocr_mobile_v1.1_det_prune_infer/model --param_file=./ch_ppocr_mobile_v1.1_det_prune_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_det_prune_opt --valid_targets=arm +# 转换V1.1识别模型 +./opt --model_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/model --param_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_rec_quant_opt --valid_targets=arm + + +# 或下载使用PaddleOCR的V1.0超轻量中英文 inference模型,解压并转换为移动端支持的模型 wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar - -# 转换检测模型 +# 转换V1.0检测模型 ./opt --model_file=./ch_det_mv3_db/model --param_file=./ch_det_mv3_db/params --optimize_out_type=naive_buffer --optimize_out=./ch_det_mv3_db_opt --valid_targets=arm - -# 转换识别模型 +# 转换V1.0识别模型 ./opt --model_file=./ch_rec_mv3_crnn/model --param_file=./ch_rec_mv3_crnn/params --optimize_out_type=naive_buffer --optimize_out=./ch_rec_mv3_crnn_opt --valid_targets=arm ``` -转换成功后,当前目录下会多出`ch_det_mv3_db_opt.nb`, `ch_rec_mv3_crnn_opt.nb`结尾的文件,即是转换成功的模型文件。 +# 转换V1.1检测模型 + +转换成功后,当前目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。 注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 转换的输入模型是paddle保存的inference模型 @@ -168,25 +179,30 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar ``` 4. 准备优化后的模型、预测库文件、测试图像和使用的字典文件。 - 在预测库`inference_lite_lib.android.armv8/demo/cxx/`下新建一个`ocr/`文件夹, - 将PaddleOCR repo中`PaddleOCR/deploy/lite/` 下的除`readme.md`所有文件放在新建的ocr文件夹下。在`ocr`文件夹下新建一个`debug`文件夹, - 将C++预测库so文件复制到debug文件夹下。 - ``` + ``` + git clone https://github.com/PaddlePaddle/PaddleOCR.git + cd PaddleOCR/deploy/lite/ + # 运行prepare.sh,准备预测库文件、测试图像和使用的字典文件,并放置在预测库中的demo/cxx/ocr文件夹下 + sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8 + # 进入OCR demo的工作目录 + cd /{lite prediction library path}/inference_lite_lib.android.armv8/ cd demo/cxx/ocr/ # 将C++预测动态库so文件复制到debug文件夹中 cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/ - ``` + ``` + 准备测试图像,以`PaddleOCR/doc/imgs/11.jpg`为例,将测试的图像复制到`demo/cxx/ocr/debug/`文件夹下。 - 准备字典文件,中文超轻量模型的字典文件是`PaddleOCR/ppocr/utils/ppocr_keys_v1.txt`,将其复制到`demo/cxx/ocr/debug/`文件夹下。 + 准备lite opt工具优化后的模型文件,比如使用`ch_ppocr_mobile_v1.1_det_prune_opt.nb,ch_ppocr_mobile_v1.1_rec_quant_opt.nb, ch_ppocr_mobile_cls_quant_opt.nb`,模型文件放置在`demo/cxx/ocr/debug/`文件夹下。 执行完成后,ocr文件夹下将有如下文件格式: ``` demo/cxx/ocr/ |-- debug/ -| |--ch_det_mv3_db_opt.nb 优化后的检测模型文件 -| |--ch_rec_mv3_crnn_opt.nb 优化后的识别模型文件 +| |--ch_ppocr_mobile_v1.1_det_prune_opt.nb 优化后的检测模型文件 +| |--ch_ppocr_mobile_v1.1_rec_quant_opt.nb 优化后的识别模型文件 +| |--ch_ppocr_mobile_cls_quant_opt.nb 优化后的文字方向分类器模型文件 | |--11.jpg 待测试图像 | |--ppocr_keys_v1.txt 字典文件 | |--libpaddle_light_api_shared.so C++预测库文件 @@ -217,7 +233,7 @@ demo/cxx/ocr/ adb shell cd /data/local/tmp/debug export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH - ./ocr_db_crnn ch_det_mv3_db_opt.nb ch_rec_mv3_crnn_opt.nb ./11.jpg ppocr_keys_v1.txt + ./ocr_db_crnn ch_ppocr_mobile_v1.1_det_prune_opt.nb ch_ppocr_mobile_v1.1_rec_quant_opt.nb ch_ppocr_mobile_cls_quant_opt.nb ./11.jpg ppocr_keys_v1.txt ``` 如果对代码做了修改,则需要重新编译并push到手机上。 diff --git a/deploy/lite/readme_en.md b/deploy/lite/readme_en.md new file mode 100644 index 0000000000000000000000000000000000000000..a62f0478843fb28b21404f282d981daf7a0c294f --- /dev/null +++ b/deploy/lite/readme_en.md @@ -0,0 +1,199 @@ + +# Tutorial of PaddleOCR Mobile deployment + +This tutorial will introduce how to use paddle-lite to deploy paddleOCR ultra-lightweight Chinese and English detection models on mobile phones. + +paddle-lite is a lightweight inference engine for PaddlePaddle. +It provides efficient inference capabilities for mobile phones and IOTs, +and extensively integrates cross-platform hardware to provide lightweight +deployment solutions for end-side deployment issues. + +## 1. Preparation + +- Computer (for Compiling Paddle Lite) +- Mobile phone (arm7 or arm8) + +## 2. Build PaddleLite library +1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) +2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) +3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) + +## 3. Download prebuild library for android and ios + +|Platform|Prebuild library Download Link| +|-|-| +|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)| +|IOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)| + +note: It is recommended to build prebuild library using [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) develop branch if developer wants to deploy the [quantitative](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md) model to mobile phone. + + +The structure of the prediction library is as follows: + +``` +inference_lite_lib.android.armv8/ +|-- cxx C++ prebuild library +| |-- include C++ +| | |-- paddle_api.h +| | |-- paddle_image_preprocess.h +| | |-- paddle_lite_factory_helper.h +| | |-- paddle_place.h +| | |-- paddle_use_kernels.h +| | |-- paddle_use_ops.h +| | `-- paddle_use_passes.h +| `-- lib +| |-- libpaddle_api_light_bundled.a C++ static library +| `-- libpaddle_light_api_shared.so C++ dynamic library +|-- java Java predict library +| |-- jar +| | `-- PaddlePredictor.jar +| |-- so +| | `-- libpaddle_lite_jni.so +| `-- src +|-- demo C++ and java demo +| |-- cxx +| `-- java +``` + + + +## 4. Inference Model Optimization + +Paddle Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. In order to make the optimization process more convenient and easy to use, Paddle Lite provide opt tools to automatically complete the optimization steps and output a lightweight, optimal executable model. + +If you use PaddleOCR 8.6M OCR model to deploy, you can directly download the optimized model. + + +|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle Lite branch | +|-|-|-|-|-|-| +|V1.1|extra-lightweight chinese OCR optimized model|3.5M|[Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|[Download](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb)|[Download](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|develop| +|V1.0|lightweight Chinese OCR optimized model|8.6M|[Download](https://paddleocr.bj.bcebos.com/deploy/lite/ch_det_mv3_db_opt.nb)|---|[Download](https://paddleocr.bj.bcebos.com/deploy/lite/ch_rec_mv3_crnn_opt.nb)|develop| + +If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model. + +``` +git clone https://github.com/PaddlePaddle/Paddle-Lite.git +cd Paddle-Lite +git checkout develop +./lite/tools/build.sh build_optimize_tool +``` + +The `opt` tool can be obtained by compiling Paddle Lite. + +After the compilation is complete, the opt file is located under `build.opt/lite/api/`. + +The `opt` can optimize the inference model saved by paddle.io.save_inference_model to get the model that the paddlelite API can use. + +The usage of opt is as follows: +``` +# 【Recommend】V1.1 is better than V1.0. steps for convert V1.1 model to nb file are as follows +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_prune_infer.tar +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_quant_infer.tar + +./opt --model_file=./ch_ppocr_mobile_v1.1_det_prune_infer/model --param_file=./ch_ppocr_mobile_v1.1_det_prune_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_det_prune_opt --valid_targets=arm +./opt --model_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/model --param_file=./ch_ppocr_mobile_v1.1_rec_quant_infer/params --optimize_out=./ch_ppocr_mobile_v1.1_rec_quant_opt --valid_targets=arm + +# or use V1.0 model +wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar +wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar + +./opt --model_file=./ch_det_mv3_db/model --param_file=./ch_det_mv3_db/params --optimize_out_type=naive_buffer --optimize_out=./ch_det_mv3_db_opt --valid_targets=arm +./opt --model_file=./ch_rec_mv3_crnn/model --param_file=./ch_rec_mv3_crnn/params --optimize_out_type=naive_buffer --optimize_out=./ch_rec_mv3_crnn_opt --valid_targets=arm + +``` + +When the above code command is completed, there will be two more files `.nb` in the current directory, which is the converted model file. + +## 5. Run optimized model on Phone + +1. Prepare an Android phone with arm8. If the compiled prediction library and opt file are armv7, you need an arm7 phone and modify ARM_ABI = arm7 in the Makefile. + +2. Make sure the phone is connected to the computer, open the USB debugging option of the phone, and select the file transfer mode. + +3. Install the adb tool on the computer. + 3.1 Install ADB for MAC + ``` + brew cask install android-platform-tools + ``` + 3.2 Install ADB for Linux + ``` + sudo apt update + sudo apt install -y wget adb + ``` + 3.3 Install ADB for windows + [Download Link](https://developer.android.com/studio) + + Verify whether adb is installed successfully + ``` + $ adb devices + + List of devices attached + 744be294 device + ``` + + If there is `device` output, it means the installation was successful. + +4. Prepare optimized models, prediction library files, test images and dictionary files used. + +``` + git clone https://github.com/PaddlePaddle/PaddleOCR.git + cd PaddleOCR/deploy/lite/ + # run prepare.sh + sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8 + + # + cd /{lite prediction library path}/inference_lite_lib.android.armv8/ + cd demo/cxx/ocr/ + # copy paddle-lite C++ .so file to debug/ directory + cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/ + + cd inference_lite_lib.android.armv8/demo/cxx/ocr/ + cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/ + +``` + +Prepare the test image, taking `PaddleOCR/doc/imgs/11.jpg` as an example, copy the image file to the `demo/cxx/ocr/debug/` folder. +Prepare the model files optimized by the lite opt tool, `ch_det_mv3_db_opt.nb, ch_rec_mv3_crnn_opt.nb`, +and place them under the `demo/cxx/ocr/debug/` folder. + + +The structure of the OCR demo is as follows after the above command is executed: +``` +demo/cxx/ocr/ +|-- debug/ +| |--ch_ppocr_mobile_v1.1_det_prune_opt.nb Detection model +| |--ch_ppocr_mobile_v1.1_rec_quant_opt.nb Recognition model +| |--ch_ppocr_mobile_cls_quant_opt.nb Text direction classification model +| |--11.jpg Image for OCR +| |--ppocr_keys_v1.txt Dictionary file +| |--libpaddle_light_api_shared.so C++ .so file +| |--config.txt Config file +|-- config.txt +|-- crnn_process.cc +|-- crnn_process.h +|-- db_post_process.cc +|-- db_post_process.h +|-- Makefile +|-- ocr_db_crnn.cc + +``` + +5. Run Model on phone + +``` +cd inference_lite_lib.android.armv8/demo/cxx/ocr/ +make -j +mv ocr_db_crnn ./debug/ +adb push debug /data/local/tmp/ +adb shell +cd /data/local/tmp/debug +export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH +# run model + ./ocr_db_crnn ch_ppocr_mobile_v1.1_det_prune_opt.nb ch_ppocr_mobile_v1.1_rec_quant_opt.nb ch_ppocr_mobile_cls_quant_opt.nb ./11.jpg ppocr_keys_v1.txt +``` + +The outputs are as follows: + +
+ +
diff --git a/deploy/pdserving/det_local_server.py b/deploy/pdserving/det_local_server.py new file mode 100644 index 0000000000000000000000000000000000000000..eb7948daadd018810997bba78367e86aa3398e31 --- /dev/null +++ b/deploy/pdserving/det_local_server.py @@ -0,0 +1,79 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +import time +import re +import base64 + + +class OCRService(WebService): + def init_det(self): + self.det_preprocess = Sequential([ + ResizeByFactor(32, 960), Div(255), + Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( + (2, 0, 1)) + ]) + self.filter_func = FilterBoxes(10, 10) + self.post_func = DBPostProcess({ + "thresh": 0.3, + "box_thresh": 0.5, + "max_candidates": 1000, + "unclip_ratio": 1.5, + "min_size": 3 + }) + + def preprocess(self, feed=[], fetch=[]): + data = base64.b64decode(feed[0]["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + self.ori_h, self.ori_w, _ = im.shape + det_img = self.det_preprocess(im) + _, self.new_h, self.new_w = det_img.shape + return {"image": det_img[np.newaxis, :].copy()}, ["concat_1.tmp_0"] + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + det_out = fetch_map["concat_1.tmp_0"] + ratio_list = [ + float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w + ] + dt_boxes_list = self.post_func(det_out, [ratio_list]) + dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) + return {"dt_boxes": dt_boxes.tolist()} + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_det_model") +ocr_service.init_det() +if sys.argv[1] == 'gpu': + ocr_service.set_gpus("0") + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) + ocr_service.run_debugger_service(gpu=True) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292) + ocr_service.run_debugger_service() +ocr_service.init_det() +ocr_service.run_web_service() diff --git a/deploy/pdserving/det_web_server.py b/deploy/pdserving/det_web_server.py new file mode 100644 index 0000000000000000000000000000000000000000..14be74130dcb413c31a3e76c150d74f65575f451 --- /dev/null +++ b/deploy/pdserving/det_web_server.py @@ -0,0 +1,78 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +import time +import re +import base64 + + +class OCRService(WebService): + def init_det(self): + self.det_preprocess = Sequential([ + ResizeByFactor(32, 960), Div(255), + Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( + (2, 0, 1)) + ]) + self.filter_func = FilterBoxes(10, 10) + self.post_func = DBPostProcess({ + "thresh": 0.3, + "box_thresh": 0.5, + "max_candidates": 1000, + "unclip_ratio": 1.5, + "min_size": 3 + }) + + def preprocess(self, feed=[], fetch=[]): + data = base64.b64decode(feed[0]["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + self.ori_h, self.ori_w, _ = im.shape + det_img = self.det_preprocess(im) + _, self.new_h, self.new_w = det_img.shape + print(det_img) + return {"image": det_img}, ["concat_1.tmp_0"] + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + det_out = fetch_map["concat_1.tmp_0"] + ratio_list = [ + float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w + ] + dt_boxes_list = self.post_func(det_out, [ratio_list]) + dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) + return {"dt_boxes": dt_boxes.tolist()} + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_det_model") +if sys.argv[1] == 'gpu': + ocr_service.set_gpus("0") + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") +ocr_service.init_det() +ocr_service.run_rpc_service() +ocr_service.run_web_service() diff --git a/deploy/pdserving/ocr_local_server.py b/deploy/pdserving/ocr_local_server.py new file mode 100644 index 0000000000000000000000000000000000000000..de5b3d13f12afd4a84c5d46625682c42f418d6bb --- /dev/null +++ b/deploy/pdserving/ocr_local_server.py @@ -0,0 +1,114 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +from paddle_serving_app.reader import OCRReader +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +from paddle_serving_app.local_predict import Debugger +import time +import re +import base64 + + +class OCRService(WebService): + def init_det_debugger(self, det_model_config): + self.det_preprocess = Sequential([ + ResizeByFactor(32, 960), Div(255), + Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( + (2, 0, 1)) + ]) + self.det_client = Debugger() + if sys.argv[1] == 'gpu': + self.det_client.load_model_config( + det_model_config, gpu=True, profile=False) + elif sys.argv[1] == 'cpu': + self.det_client.load_model_config( + det_model_config, gpu=False, profile=False) + self.ocr_reader = OCRReader() + + def preprocess(self, feed=[], fetch=[]): + data = base64.b64decode(feed[0]["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + ori_h, ori_w, _ = im.shape + det_img = self.det_preprocess(im) + _, new_h, new_w = det_img.shape + det_img = det_img[np.newaxis, :] + det_img = det_img.copy() + det_out = self.det_client.predict( + feed={"image": det_img}, fetch=["concat_1.tmp_0"]) + filter_func = FilterBoxes(10, 10) + post_func = DBPostProcess({ + "thresh": 0.3, + "box_thresh": 0.5, + "max_candidates": 1000, + "unclip_ratio": 1.5, + "min_size": 3 + }) + sorted_boxes = SortedBoxes() + ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w] + dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list]) + dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w]) + dt_boxes = sorted_boxes(dt_boxes) + get_rotate_crop_image = GetRotateCropImage() + img_list = [] + max_wh_ratio = 0 + for i, dtbox in enumerate(dt_boxes): + boximg = get_rotate_crop_image(im, dt_boxes[i]) + img_list.append(boximg) + h, w = boximg.shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + if len(img_list) == 0: + return [], [] + _, w, h = self.ocr_reader.resize_norm_img(img_list[0], + max_wh_ratio).shape + imgs = np.zeros((len(img_list), 3, w, h)).astype('float32') + for id, img in enumerate(img_list): + norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) + imgs[id] = norm_img + feed = {"image": imgs.copy()} + fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] + return feed, fetch + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) + res_lst = [] + for res in rec_res: + res_lst.append(res[0]) + res = {"res": res_lst} + return res + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_rec_model") +ocr_service.init_det_debugger(det_model_config="ocr_det_model") +if sys.argv[1] == 'gpu': + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) + ocr_service.run_debugger_service(gpu=True) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") + ocr_service.run_debugger_service() +ocr_service.run_web_service() diff --git a/deploy/pdserving/ocr_web_client.py b/deploy/pdserving/ocr_web_client.py new file mode 100644 index 0000000000000000000000000000000000000000..e2a92eb8ee4aa62059be184dd7e67237ed460f13 --- /dev/null +++ b/deploy/pdserving/ocr_web_client.py @@ -0,0 +1,37 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# -*- coding: utf-8 -*- + +import requests +import json +import cv2 +import base64 +import os, sys +import time + +def cv2_to_base64(image): + #data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(image).decode( + 'utf8') #data.tostring()).decode('utf8') + +headers = {"Content-type": "application/json"} +url = "http://127.0.0.1:9292/ocr/prediction" +test_img_dir = "../../doc/imgs/" +for img_file in os.listdir(test_img_dir): + with open(os.path.join(test_img_dir, img_file), 'rb') as file: + image_data1 = file.read() + image = cv2_to_base64(image_data1) + data = {"feed": [{"image": image}], "fetch": ["res"]} + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + print(r.json()) diff --git a/deploy/pdserving/ocr_web_server.py b/deploy/pdserving/ocr_web_server.py new file mode 100644 index 0000000000000000000000000000000000000000..6c0de44661958a6425f57039261969551ff552c5 --- /dev/null +++ b/deploy/pdserving/ocr_web_server.py @@ -0,0 +1,105 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +from paddle_serving_app.reader import OCRReader +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +import time +import re +import base64 + + +class OCRService(WebService): + def init_det_client(self, det_port, det_client_config): + self.det_preprocess = Sequential([ + ResizeByFactor(32, 960), Div(255), + Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( + (2, 0, 1)) + ]) + self.det_client = Client() + self.det_client.load_client_config(det_client_config) + self.det_client.connect(["127.0.0.1:{}".format(det_port)]) + self.ocr_reader = OCRReader() + + def preprocess(self, feed=[], fetch=[]): + data = base64.b64decode(feed[0]["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + ori_h, ori_w, _ = im.shape + det_img = self.det_preprocess(im) + det_out = self.det_client.predict( + feed={"image": det_img}, fetch=["concat_1.tmp_0"]) + _, new_h, new_w = det_img.shape + filter_func = FilterBoxes(10, 10) + post_func = DBPostProcess({ + "thresh": 0.3, + "box_thresh": 0.5, + "max_candidates": 1000, + "unclip_ratio": 1.5, + "min_size": 3 + }) + sorted_boxes = SortedBoxes() + ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w] + dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list]) + dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w]) + dt_boxes = sorted_boxes(dt_boxes) + get_rotate_crop_image = GetRotateCropImage() + feed_list = [] + img_list = [] + max_wh_ratio = 0 + for i, dtbox in enumerate(dt_boxes): + boximg = get_rotate_crop_image(im, dt_boxes[i]) + img_list.append(boximg) + h, w = boximg.shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for img in img_list: + norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) + feed = {"image": norm_img} + feed_list.append(feed) + fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] + return feed_list, fetch + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) + res_lst = [] + for res in rec_res: + res_lst.append(res[0]) + res = {"res": res_lst} + return res + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_rec_model") +if sys.argv[1] == 'gpu': + ocr_service.set_gpus("0") + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292) +ocr_service.init_det_client( + det_port=9293, + det_client_config="ocr_det_client/serving_client_conf.prototxt") +ocr_service.run_rpc_service() +ocr_service.run_web_service() diff --git a/deploy/pdserving/readme.md b/deploy/pdserving/readme.md new file mode 100644 index 0000000000000000000000000000000000000000..af12d508ba9c04e6032f2a392701e72b41462395 --- /dev/null +++ b/deploy/pdserving/readme.md @@ -0,0 +1,120 @@ +[English](readme_en.md) | 简体中文 + +PaddleOCR提供2种服务部署方式: +- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../hubserving/readme.md)。 +- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。 + +# Paddle Serving 服务部署 +本教程将介绍基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)部署PaddleOCR在线预测服务的详细步骤。 + +## 快速启动服务 + +### 1. 准备环境 +我们先安装Paddle Serving相关组件 +我们推荐用户使用GPU来做Paddle Serving的OCR服务部署 + +**CUDA版本:9.0** + +**CUDNN版本:7.0** + +**操作系统版本:CentOS 6以上** + +**Python版本: 2.7/3.6/3.7** + +**Python操作指南:** +``` +#CPU/GPU版本选择一个 +#GPU版本服务端 +python -m pip install paddle_serving_server_gpu +#CPU版本服务端 +python -m pip install paddle_serving_server +#客户端和App包使用以下链接(CPU,GPU通用) +python -m pip install paddle_serving_app paddle_serving_client +``` + + +### 2. 模型转换 +可以使用`paddle_serving_app`提供的模型,执行下列命令 +``` +python -m paddle_serving_app.package --get_model ocr_rec +tar -xzvf ocr_rec.tar.gz +python -m paddle_serving_app.package --get_model ocr_det +tar -xzvf ocr_det.tar.gz +``` +执行上述命令会下载`db_crnn_mobile`的模型,如果想要下载规模更大的`db_crnn_server`模型,可以在下载预测模型并解压之后。参考[如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md)。 + +我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在: + +``` +wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar +tar xf ch_rec_r34_vd_crnn_infer.tar +``` +因此我们按照Serving模型转换教程,运行下列python文件。 +``` +from paddle_serving_client.io import inference_model_to_serving +inference_model_dir = "ch_rec_r34_vd_crnn" +serving_client_dir = "serving_client_dir" +serving_server_dir = "serving_server_dir" +feed_var_names, fetch_var_names = inference_model_to_serving( + inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params") +``` +最终会在`serving_client_dir`和`serving_server_dir`生成客户端和服务端的模型配置。 + +### 3. 启动服务 +启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表: + +|版本|特点|适用场景| +|-|-|-| +|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况| +|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景| + +#### 方式1. 启动标准版服务 + +``` +# cpu,gpu启动二选一,以下是cpu启动 +python -m paddle_serving_server.serve --model ocr_det_model --port 9293 +python ocr_web_server.py cpu +# gpu启动 +python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0 +python ocr_web_server.py gpu +``` + +#### 方式2. 启动快速版服务 + +``` +# cpu,gpu启动二选一,以下是cpu启动 +python ocr_local_server.py cpu +# gpu启动 +python ocr_local_server.py gpu +``` + +## 发送预测请求 + +``` +python ocr_web_client.py +``` + +## 返回结果格式说明 + +返回结果是json格式 +``` +{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}} +``` +我们也可以打印结果json串中`res`字段的每一句话 +``` +土地整治与土壤修复研究中心 +华南农业大学1素图 +``` + +## 自定义修改服务逻辑 + +在`ocr_web_server.py`或是`ocr_local_server.py`当中的`preprocess`函数里面做了检测服务和识别服务的前处理,`postprocess`函数里面做了识别的后处理服务,可以在相应的函数中做修改。调用了`paddle_serving_app`库提供的常见CV模型的前处理/后处理库。 + +如果想要单独启动Paddle Serving的检测服务和识别服务,参见下列表格, 执行对应的脚本即可,并且在命令行参数注明用的CPU或是GPU来提供服务。 + +| 模型 | 标准版 | 快速版 | +| ---- | ----------------- | ------------------- | +| 检测 | det_web_server.py | det_local_server.py | +| 识别 | rec_web_server.py | rec_local_server.py | + +更多信息参见[Paddle Serving](https://github.com/PaddlePaddle/Serving) diff --git a/deploy/pdserving/readme_en.md b/deploy/pdserving/readme_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9a0c684fb6fb4f0eeff2552af70f62053d3351fb --- /dev/null +++ b/deploy/pdserving/readme_en.md @@ -0,0 +1,123 @@ +English | [简体中文](readme.md) + +PaddleOCR provides 2 service deployment methods: +- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../hubserving/readme_en.md) for usage. +- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please follow this tutorial. + +# Service deployment based on Paddle Serving + +This tutorial will introduce the detail steps of deploying PaddleOCR online prediction service based on [Paddle Serving](https://github.com/PaddlePaddle/Serving). + +## Quick start service + +### 1. Prepare the environment +Let's first install the relevant components of Paddle Serving. GPU is recommended for service deployment with Paddle Serving. + +**Requirements:** +- **CUDA version: 9.0** +- **CUDNN version: 7.0** +- **Operating system version: >= CentOS 6** +- **Python version: 2.7/3.6/3.7** + +**Installation:** +``` +# install GPU server +python -m pip install paddle_serving_server_gpu + +# or, install CPU server +python -m pip install paddle_serving_server + +# install client and App package (CPU/GPU) +python -m pip install paddle_serving_app paddle_serving_client +``` + +### 2. Model transformation +You can directly use converted model provided by `paddle_serving_app` for convenience. Execute the following command to obtain: +``` +python -m paddle_serving_app.package --get_model ocr_rec +tar -xzvf ocr_rec.tar.gz +python -m paddle_serving_app.package --get_model ocr_det +tar -xzvf ocr_det.tar.gz +``` +Executing the above command will download the `db_crnn_mobile` model, which is in different format with inference model. If you want to use other models for deployment, you can refer to the [tutorial](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md) to convert your inference model to a model which is deployable for Paddle Serving. + +We take `ch_rec_r34_vd_crnn` model as example. Download the inference model by executing the following command: +``` +wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar +tar xf ch_rec_r34_vd_crnn_infer.tar +``` + +Convert the downloaded model by executing the following python script: +``` +from paddle_serving_client.io import inference_model_to_serving +inference_model_dir = "ch_rec_r34_vd_crnn" +serving_client_dir = "serving_client_dir" +serving_server_dir = "serving_server_dir" +feed_var_names, fetch_var_names = inference_model_to_serving( + inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params") +``` + +Finally, model configuration of client and server will be generated in `serving_client_dir` and `serving_server_dir`. + +### 3. Start service +Start the standard version or the fast version service according to your actual needs. The comparison of the two versions is shown in the table below: + +|version|characteristics|recommended scenarios| +|-|-|-| +|standard version|High stability, suitable for distributed deployment|Large throughput and cross regional deployment| +|fast version|Easy to deploy and fast to predict|Suitable for scenarios which requires high prediction speed and fast iteration speed| + +#### Mode 1. Start the standard mode service + +``` +# start with CPU +python -m paddle_serving_server.serve --model ocr_det_model --port 9293 +python ocr_web_server.py cpu + +# or, with GPU +python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0 +python ocr_web_server.py gpu +``` + +#### Mode 2. Start the fast mode service + +``` +# start with CPU +python ocr_local_server.py cpu + +# or, with GPU +python ocr_local_server.py gpu +``` + +## Send prediction requests + +``` +python ocr_web_client.py +``` + +## Returned result format + +The returned result is a JSON string, eg. +``` +{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}} +``` + +You can also print the readable result in `res`: +``` +土地整治与土壤修复研究中心 +华南农业大学1素图 +``` + +## User defined service module modification + +The pre-processing and post-processing process, can be found in the `preprocess` and `postprocess` function in `ocr_web_server.py` or `ocr_local_server.py`. The pre-processing/post-processing library for common CV models provided by `paddle_serving_app` is called. +You can modify the corresponding code as actual needs. + +If you only want to start the detection service or the recognition service, execute the corresponding script reffering to the following table. Indicate the CPU or GPU is used in the start command parameters. + +| task | standard | fast | +| ---- | ----------------- | ------------------- | +| detection | det_web_server.py | det_local_server.py | +| recognition | rec_web_server.py | rec_local_server.py | + +More info can be found in [Paddle Serving](https://github.com/PaddlePaddle/Serving). diff --git a/deploy/pdserving/rec_local_server.py b/deploy/pdserving/rec_local_server.py new file mode 100644 index 0000000000000000000000000000000000000000..ba021c1cd5054071eb115b3e6e9c64cb572ff871 --- /dev/null +++ b/deploy/pdserving/rec_local_server.py @@ -0,0 +1,79 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +from paddle_serving_app.reader import OCRReader +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +import time +import re +import base64 + + +class OCRService(WebService): + def init_rec(self): + self.ocr_reader = OCRReader() + + def preprocess(self, feed=[], fetch=[]): + img_list = [] + for feed_data in feed: + data = base64.b64decode(feed_data["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + img_list.append(im) + max_wh_ratio = 0 + for i, boximg in enumerate(img_list): + h, w = boximg.shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + _, w, h = self.ocr_reader.resize_norm_img(img_list[0], + max_wh_ratio).shape + imgs = np.zeros((len(img_list), 3, w, h)).astype('float32') + for i, img in enumerate(img_list): + norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) + imgs[i] = norm_img + feed = {"image": imgs.copy()} + fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] + return feed, fetch + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) + res_lst = [] + for res in rec_res: + res_lst.append(res[0]) + res = {"res": res_lst} + return res + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_rec_model") +ocr_service.init_rec() +if sys.argv[1] == 'gpu': + ocr_service.set_gpus("0") + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) + ocr_service.run_debugger_service(gpu=True) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") + ocr_service.run_debugger_service() +ocr_service.run_web_service() diff --git a/deploy/pdserving/rec_web_server.py b/deploy/pdserving/rec_web_server.py new file mode 100644 index 0000000000000000000000000000000000000000..0f4e9f6d264ed602f387bfaf0303cd59af7823fa --- /dev/null +++ b/deploy/pdserving/rec_web_server.py @@ -0,0 +1,77 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle_serving_client import Client +from paddle_serving_app.reader import OCRReader +import cv2 +import sys +import numpy as np +import os +from paddle_serving_client import Client +from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor +from paddle_serving_app.reader import Div, Normalize, Transpose +from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes +if sys.argv[1] == 'gpu': + from paddle_serving_server_gpu.web_service import WebService +elif sys.argv[1] == 'cpu': + from paddle_serving_server.web_service import WebService +import time +import re +import base64 + + +class OCRService(WebService): + def init_rec(self): + self.ocr_reader = OCRReader() + + def preprocess(self, feed=[], fetch=[]): + # TODO: to handle batch rec images + img_list = [] + for feed_data in feed: + data = base64.b64decode(feed_data["image"].encode('utf8')) + data = np.fromstring(data, np.uint8) + im = cv2.imdecode(data, cv2.IMREAD_COLOR) + img_list.append(im) + feed_list = [] + max_wh_ratio = 0 + for i, boximg in enumerate(img_list): + h, w = boximg.shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for img in img_list: + norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) + feed = {"image": norm_img} + feed_list.append(feed) + fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] + return feed_list, fetch + + def postprocess(self, feed={}, fetch=[], fetch_map=None): + rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) + res_lst = [] + for res in rec_res: + res_lst.append(res[0]) + res = {"res": res_lst} + return res + + +ocr_service = OCRService(name="ocr") +ocr_service.load_model_config("ocr_rec_model") +ocr_service.init_rec() +if sys.argv[1] == 'gpu': + ocr_service.set_gpus("0") + ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) +elif sys.argv[1] == 'cpu': + ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") +ocr_service.run_rpc_service() +ocr_service.run_web_service() diff --git a/deploy/slim/prune/README.md b/deploy/slim/prune/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bff1b78e6b583592fb699ba46c6a3740a63dae75 --- /dev/null +++ b/deploy/slim/prune/README.md @@ -0,0 +1,75 @@ + +## 介绍 + +复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 +本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 +[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。 + + +在开始本教程之前,建议先了解: +1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md) +2. [分类模型裁剪教程](https://paddlepaddle.github.io/PaddleSlim/tutorials/pruning_tutorial/) +3. [PaddleSlim 裁剪压缩API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) + + +## 快速开始 + +模型裁剪主要包括五个步骤: +1. 安装 PaddleSlim +2. 准备训练好的模型 +3. 敏感度分析、训练 +4. 模型裁剪训练 +5. 导出模型、预测部署 + +### 1. 安装PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + +### 2. 获取预训练模型 +模型裁剪需要加载事先训练好的模型,PaddleOCR也提供了一系列模型[../../../doc/doc_ch/models_list.md],开发者可根据需要自行选择模型或使用自己的模型。 + +### 3. 敏感度分析训练 + +加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sensitivities_0.data,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。 +敏感度分析的具体细节见:[敏感度分析](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md) +敏感度文件内容格式: + sensitivities_0.data(Dict){ + 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} + 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} + } + + 例子: + { + 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} + 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} + } +加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) + +进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练: +```bash +python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights="your trained model" Global.test_batch_size_per_card=1 +``` + +### 4. 模型裁剪训练 +裁剪时通过之前的敏感度分析文件决定每个网络层的裁剪比例。在具体实现时,为了尽可能多的保留从图像中提取的低阶特征,我们跳过了backbone中靠近输入的4个卷积层。同样,为了减少由于裁剪导致的模型性能损失,我们通过之前敏感度分析所获得的敏感度表,人工挑选出了一些冗余较少,对裁剪较为敏感的[网络层](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41)(指在较低的裁剪比例下就导致很高性能损失的网络层),并在之后的裁剪过程中选择避开这些网络层。裁剪过后finetune的过程沿用OCR检测模型原始的训练策略。 + +```bash +python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1 +``` +通过对比可以发现,经过裁剪训练保存的模型更小。 + +### 5. 导出模型、预测部署 + +在得到裁剪训练保存的模型后,我们可以将其导出为inference_model: +```bash +python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model +``` + +inference model的预测和部署参考: +1. [inference model python端预测](../../../doc/doc_ch/inference.md) +2. [inference model C++预测](../../cpp_infer/readme.md) +3. [inference model在移动端部署](../../lite/readme.md) diff --git a/deploy/slim/prune/README_en.md b/deploy/slim/prune/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..7adbd86c6145a6854e19cca62b9f995e07cb6b13 --- /dev/null +++ b/deploy/slim/prune/README_en.md @@ -0,0 +1,85 @@ + +## Introduction + +Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance. + +This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model. +[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry. + +It is recommended that you could understand following pages before reading this example: +1. [PaddleOCR training methods](../../../doc/doc_ch/quickstart.md) +2. [The demo of prune](https://paddlepaddle.github.io/PaddleSlim/tutorials/pruning_tutorial/) +3. [PaddleSlim prune API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) + +## Quick start + +Five steps for OCR model prune: +1. Install PaddleSlim +2. Prepare the trained model +3. Sensitivity analysis and training +4. Model tailoring training +5. Export model, predict deployment + +### 1. Install PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + + +### 2. Download Pretrain Model +Model prune needs to load pre-trained models. +PaddleOCR also provides a series of (models)[../../../doc/doc_en/models_list_en.md]. Developers can choose their own models or use their own models according to their needs. + + +### 3. Pruning sensitivity analysis + + After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sensitivities_0.data. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md) + The data format of sensitivity file: + sensitivities_0.data(Dict){ + 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} + 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} + } + + example: + { + 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} + 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} + } + The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of correspoding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) + + +Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command: + +```bash + +python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1 + +``` + + + +### 4. Model pruning and Fine-tune + + When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model. + +```bash + +python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1 + +``` + + +### 5. Export inference model and deploy it + +We can export the pruned model as inference_model for deployment: +```bash +python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model +``` + +Reference for prediction and deployment of inference model: +1. [inference model python prediction](../../../doc/doc_en/inference_en.md) +2. [inference model C++ prediction](../../cpp_infer/readme_en.md) +3. [Deployment of inference model on mobile](../../lite/readme_en.md) diff --git a/deploy/slim/prune/export_prune_model.py b/deploy/slim/prune/export_prune_model.py new file mode 100644 index 0000000000000000000000000000000000000000..0603966f8eeb9a59fa8c4494bf47de9cdaf53774 --- /dev/null +++ b/deploy/slim/prune/export_prune_model.py @@ -0,0 +1,67 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.join(__dir__, '..', '..', '..')) +sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools')) + +import program +from paddle import fluid +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.utils.save_load import init_model +from paddleslim.prune import load_model + + +def main(): + startup_prog, eval_program, place, config, _ = program.preprocess() + + feeded_var_names, target_vars, fetches_var_name = program.build_export( + config, eval_program, startup_prog) + eval_program = eval_program.clone(for_test=True) + exe = fluid.Executor(place) + exe.run(startup_prog) + + if config['Global']['checkpoints'] is not None: + path = config['Global']['checkpoints'] + else: + path = config['Global']['pretrain_weights'] + + load_model(exe, eval_program, path) + + save_inference_dir = config['Global']['save_inference_dir'] + if not os.path.exists(save_inference_dir): + os.makedirs(save_inference_dir) + fluid.io.save_inference_model( + dirname=save_inference_dir, + feeded_var_names=feeded_var_names, + main_program=eval_program, + target_vars=target_vars, + executor=exe, + model_filename='model', + params_filename='params') + print("inference model saved in {}/model and {}/params".format( + save_inference_dir, save_inference_dir)) + print("save success, output_name_list:", fetches_var_name) + + +if __name__ == '__main__': + main() diff --git a/deploy/slim/prune/pruning_and_finetune.py b/deploy/slim/prune/pruning_and_finetune.py new file mode 100644 index 0000000000000000000000000000000000000000..0a03cb449523232dbeedf1d7066b4ea4ba01f31f --- /dev/null +++ b/deploy/slim/prune/pruning_and_finetune.py @@ -0,0 +1,145 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +import numpy as np +__dir__ = os.path.dirname(__file__) +sys.path.append(__dir__) +sys.path.append(os.path.join(__dir__, '..', '..', '..')) +sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools')) + +import tools.program as program +from paddle import fluid +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.data.reader_main import reader_main +from ppocr.utils.save_load import init_model +from ppocr.utils.character import CharacterOps +from ppocr.utils.utility import initial_logger +from paddleslim.prune import Pruner, save_model +from paddleslim.analysis import flops +from paddleslim.core.graph_wrapper import * +from paddleslim.prune import load_sensitivities, get_ratios_by_loss, merge_sensitive +logger = initial_logger() + +skip_list = [ + 'conv10_linear_weights', 'conv11_linear_weights', 'conv12_expand_weights', + 'conv12_linear_weights', 'conv12_se_2_weights', 'conv13_linear_weights', + 'conv2_linear_weights', 'conv4_linear_weights', 'conv5_expand_weights', + 'conv5_linear_weights', 'conv5_se_2_weights', 'conv6_linear_weights', + 'conv7_linear_weights', 'conv8_expand_weights', 'conv8_linear_weights', + 'conv9_expand_weights', 'conv9_linear_weights' +] + + +def main(): + config = program.load_config(FLAGS.config) + program.merge_config(FLAGS.opt) + logger.info(config) + + # check if set use_gpu=True in paddlepaddle cpu version + use_gpu = config['Global']['use_gpu'] + program.check_gpu(use_gpu) + + alg = config['Global']['algorithm'] + assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE'] + if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']: + config['Global']['char_ops'] = CharacterOps(config['Global']) + + place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() + startup_program = fluid.Program() + train_program = fluid.Program() + train_build_outputs = program.build( + config, train_program, startup_program, mode='train') + train_loader = train_build_outputs[0] + train_fetch_name_list = train_build_outputs[1] + train_fetch_varname_list = train_build_outputs[2] + train_opt_loss_name = train_build_outputs[3] + + eval_program = fluid.Program() + eval_build_outputs = program.build( + config, eval_program, startup_program, mode='eval') + eval_fetch_name_list = eval_build_outputs[1] + eval_fetch_varname_list = eval_build_outputs[2] + eval_program = eval_program.clone(for_test=True) + + train_reader = reader_main(config=config, mode="train") + train_loader.set_sample_list_generator(train_reader, places=place) + + eval_reader = reader_main(config=config, mode="eval") + + exe = fluid.Executor(place) + exe.run(startup_program) + + # compile program for multi-devices + init_model(config, train_program, exe) + + sen = load_sensitivities("sensitivities_0.data") + for i in skip_list: + sen.pop(i) + back_bone_list = ['conv' + str(x) for x in range(1, 5)] + for i in back_bone_list: + for key in list(sen.keys()): + if i + '_' in key: + sen.pop(key) + ratios = get_ratios_by_loss(sen, 0.03) + logger.info("FLOPs before pruning: {}".format(flops(eval_program))) + pruner = Pruner(criterion='geometry_median') + print("ratios: {}".format(ratios)) + pruned_val_program, _, _ = pruner.prune( + eval_program, + fluid.global_scope(), + params=ratios.keys(), + ratios=ratios.values(), + place=place, + only_graph=True) + + pruned_program, _, _ = pruner.prune( + train_program, + fluid.global_scope(), + params=ratios.keys(), + ratios=ratios.values(), + place=place) + logger.info("FLOPs after pruning: {}".format(flops(pruned_val_program))) + train_compile_program = program.create_multi_devices_program( + pruned_program, train_opt_loss_name) + + + train_info_dict = {'compile_program':train_compile_program,\ + 'train_program':pruned_program,\ + 'reader':train_loader,\ + 'fetch_name_list':train_fetch_name_list,\ + 'fetch_varname_list':train_fetch_varname_list} + + eval_info_dict = {'program':pruned_val_program,\ + 'reader':eval_reader,\ + 'fetch_name_list':eval_fetch_name_list,\ + 'fetch_varname_list':eval_fetch_varname_list} + + if alg in ['EAST', 'DB']: + program.train_eval_det_run( + config, exe, train_info_dict, eval_info_dict, is_pruning=True) + else: + program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict) + + +if __name__ == '__main__': + parser = program.ArgsParser() + FLAGS = parser.parse_args() + main() diff --git a/deploy/slim/prune/sensitivity_anal.py b/deploy/slim/prune/sensitivity_anal.py new file mode 100644 index 0000000000000000000000000000000000000000..b416f09a3a17527eccbd131917ca257b39bc7884 --- /dev/null +++ b/deploy/slim/prune/sensitivity_anal.py @@ -0,0 +1,115 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +__dir__ = os.path.dirname(__file__) +sys.path.append(__dir__) +sys.path.append(os.path.join(__dir__, '..', '..', '..')) +sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools')) + +import json +import cv2 +from paddle import fluid +import paddleslim as slim +from copy import deepcopy +from tools.eval_utils.eval_det_utils import eval_det_run + +from tools import program +from ppocr.utils.utility import initial_logger +from ppocr.data.reader_main import reader_main +from ppocr.utils.save_load import init_model +from ppocr.utils.character import CharacterOps +from ppocr.utils.utility import create_module +from ppocr.data.reader_main import reader_main + +logger = initial_logger() + + +def get_pruned_params(program): + params = [] + for param in program.global_block().all_parameters(): + if len( + param.shape + ) == 4 and 'depthwise' not in param.name and 'transpose' not in param.name: + params.append(param.name) + return params + + +def eval_function(eval_args, mode='eval'): + exe = eval_args['exe'] + config = eval_args['config'] + eval_info_dict = eval_args['eval_info_dict'] + metrics = eval_det_run(exe, config, eval_info_dict, mode=mode) + return metrics['hmean'] + + +def main(): + config = program.load_config(FLAGS.config) + program.merge_config(FLAGS.opt) + logger.info(config) + + # check if set use_gpu=True in paddlepaddle cpu version + use_gpu = config['Global']['use_gpu'] + program.check_gpu(use_gpu) + + alg = config['Global']['algorithm'] + assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE'] + if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']: + config['Global']['char_ops'] = CharacterOps(config['Global']) + + place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() + startup_prog = fluid.Program() + eval_program = fluid.Program() + eval_build_outputs = program.build( + config, eval_program, startup_prog, mode='test') + eval_fetch_name_list = eval_build_outputs[1] + eval_fetch_varname_list = eval_build_outputs[2] + eval_program = eval_program.clone(for_test=True) + exe = fluid.Executor(place) + exe.run(startup_prog) + + init_model(config, eval_program, exe) + + eval_reader = reader_main(config=config, mode="eval") + eval_info_dict = {'program':eval_program,\ + 'reader':eval_reader,\ + 'fetch_name_list':eval_fetch_name_list,\ + 'fetch_varname_list':eval_fetch_varname_list} + eval_args = dict() + eval_args = {'exe': exe, 'config': config, 'eval_info_dict': eval_info_dict} + metrics = eval_function(eval_args) + print("Baseline: {}".format(metrics)) + + params = get_pruned_params(eval_program) + print('Start to analyze') + sens_0 = slim.prune.sensitivity( + eval_program, + place, + params, + eval_function, + sensitivities_file="sensitivities_0.data", + pruned_ratios=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], + eval_args=eval_args, + criterion='geometry_median') + + +if __name__ == '__main__': + parser = program.ArgsParser() + FLAGS = parser.parse_args() + main() diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md new file mode 100755 index 0000000000000000000000000000000000000000..b35761c649ae5faf9e0db8663047419d991282fe --- /dev/null +++ b/deploy/slim/quantization/README.md @@ -0,0 +1,61 @@ + +## 介绍 +复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 +模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。 + +本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 +PaddleSlim(项目链接:https://github.com/PaddlePaddle/PaddleSlim)集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。 + +在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) + + +## 快速开始 +量化多适用于轻量模型在移动端的部署,当训练出一个模型后,如果希望进一步的压缩模型大小并加速预测,可使用量化的方法压缩模型。 + +模型量化主要包括五个步骤: +1. 安装 PaddleSlim +2. 准备训练好的模型 +3. 量化训练 +4. 导出量化推理模型 +5. 量化模型预测部署 + +### 1. 安装PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + +### 2. 准备训练好的模型 + +PaddleOCR提供了一系列训练好的[模型](../../../doc/doc_ch/models_list.md),如果待量化的模型不在列表中,需要按照[常规训练](../../../doc/doc_ch/quickstart.md)方法得到训练好的模型。 + +### 3. 量化训练 +量化训练包括离线量化训练和在线量化训练,在线量化训练效果更好,需加载预训练模型,在定义好量化策略后即可对模型进行量化。 + + +量化训练的代码位于slim/quantization/quant/py 中,比如训练检测模型,训练指令如下: +```bash +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model + +# 比如下载提供的训练模型 +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar +tar xf ch_ppocr_mobile_v1.1_det_train.tar +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_model_dir=./output/quant_model + +``` +如果要训练识别模型的量化,修改配置文件和加载的模型参数即可。 + +### 4. 导出模型 + +在得到量化训练保存的模型后,我们可以将其导出为inference_model,用于预测部署: + +```bash +python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model +``` + +### 5. 量化模型部署 + +上述步骤导出的量化模型,参数精度仍然是FP32,但是参数的数值范围是int8,导出的模型可以通过PaddleLite的opt模型转换工具完成模型转换。 +量化模型部署的可参考 [移动端模型部署](../lite/readme.md) diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md new file mode 100755 index 0000000000000000000000000000000000000000..4f28a54245b08a0156a74d3316f9532bf40423fc --- /dev/null +++ b/deploy/slim/quantization/README_en.md @@ -0,0 +1,68 @@ + +## Introduction + +Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. +Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number, +so as to reduce model calculation complexity and improve model inference performance. + +This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model. + +It is recommended that you could understand following pages before reading this example: +- [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md) +- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) + +## Quick Start +Quantization is mostly suitable for the deployment of lightweight models on mobile terminals. +After training, if you want to further compress the model size and accelerate the prediction, you can use quantization methods to compress the model according to the following steps. + +1. Install PaddleSlim +2. Prepare trained model +3. Quantization-Aware Training +4. Export inference model +5. Deploy quantization inference model + + +### 1. Install PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + + +### 2. Download Pretrain Model +PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md). +If the model to be quantified is not in the list, you need to follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to get the trained model. + + +### 3. Quant-Aware Training +Quantization training includes offline quantization training and online quantization training. +Online quantization training is more effective. It is necessary to load the pre-training model. +After the quantization strategy is defined, the model can be quantified. + +The code for quantization training is located in `slim/quantization/quant/py`. For example, to train a detection model, the training instructions are as follows: +```bash +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model + +# download provided model +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar +tar xf ch_ppocr_mobile_v1.1_det_train.tar +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_model_dir=./output/quant_model + +``` + + +### 4. Export inference model + +After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment: + +```bash +python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model +``` + +### 5. Deploy +The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8. +The derived model can be converted through the `opt tool` of PaddleLite. + +For quantitative model deployment, please refer to [Mobile terminal model deployment](../lite/readme_en.md) diff --git a/deploy/slim/quantization/export_model.py b/deploy/slim/quantization/export_model.py new file mode 100644 index 0000000000000000000000000000000000000000..d0d08b300066044d3088f669045e0536006c3140 --- /dev/null +++ b/deploy/slim/quantization/export_model.py @@ -0,0 +1,129 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +__dir__ = os.path.dirname(__file__) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..'))) +sys.path.append( + os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools'))) + + +def set_paddle_flags(**kwargs): + for key, value in kwargs.items(): + if os.environ.get(key, None) is None: + os.environ[key] = str(value) + + +# NOTE(paddle-dev): All of these flags should be +# set before `import paddle`. Otherwise, it would +# not take any effect. +set_paddle_flags( + FLAGS_eager_delete_tensor_gb=0, # enable GC to save memory +) + +import program +from paddle import fluid +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.utils.save_load import init_model, load_params +from ppocr.utils.character import CharacterOps +from ppocr.utils.utility import create_module +from ppocr.data.reader_main import reader_main + +from paddleslim.quant import quant_aware, convert +from paddle.fluid.layer_helper import LayerHelper +from eval_utils.eval_det_utils import eval_det_run +from eval_utils.eval_rec_utils import eval_rec_run + + +def main(): + # 1. quantization configs + quant_config = { + # weight quantize type, default is 'channel_wise_abs_max' + 'weight_quantize_type': 'channel_wise_abs_max', + # activation quantize type, default is 'moving_average_abs_max' + 'activation_quantize_type': 'moving_average_abs_max', + # weight quantize bit num, default is 8 + 'weight_bits': 8, + # activation quantize bit num, default is 8 + 'activation_bits': 8, + # ops of name_scope in not_quant_pattern list, will not be quantized + 'not_quant_pattern': ['skip_quant'], + # ops of type in quantize_op_types, will be quantized + 'quantize_op_types': ['conv2d', 'depthwise_conv2d', 'mul'], + # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8' + 'dtype': 'int8', + # window size for 'range_abs_max' quantization. defaulf is 10000 + 'window_size': 10000, + # The decay coefficient of moving average, default is 0.9 + 'moving_rate': 0.9, + } + + startup_prog, eval_program, place, config, alg_type = program.preprocess() + + feeded_var_names, target_vars, fetches_var_name = program.build_export( + config, eval_program, startup_prog) + + eval_program = eval_program.clone(for_test=True) + exe = fluid.Executor(place) + exe.run(startup_prog) + + eval_program = quant_aware( + eval_program, place, quant_config, scope=None, for_test=True) + + init_model(config, eval_program, exe) + + # 2. Convert the program before save inference program + # The dtype of eval_program's weights is float32, but in int8 range. + + eval_program = convert(eval_program, place, quant_config, scope=None) + + eval_fetch_name_list = fetches_var_name + eval_fetch_varname_list = [v.name for v in target_vars] + eval_reader = reader_main(config=config, mode="eval") + quant_info_dict = {'program':eval_program,\ + 'reader':eval_reader,\ + 'fetch_name_list':eval_fetch_name_list,\ + 'fetch_varname_list':eval_fetch_varname_list} + + if alg_type == 'det': + final_metrics = eval_det_run(exe, config, quant_info_dict, "eval") + else: + final_metrics = eval_rec_run(exe, config, quant_info_dict, "eval") + print(final_metrics) + + # 3. Save inference model + model_path = "./quant_model" + if not os.path.isdir(model_path): + os.makedirs(model_path) + + fluid.io.save_inference_model( + dirname=model_path, + feeded_var_names=feeded_var_names, + target_vars=target_vars, + executor=exe, + main_program=eval_program, + model_filename=model_path + '/model', + params_filename=model_path + '/params') + print("model saved as {}".format(model_path)) + + +if __name__ == '__main__': + main() diff --git a/deploy/slim/quantization/quant.py b/deploy/slim/quantization/quant.py new file mode 100755 index 0000000000000000000000000000000000000000..f54a328d4645910dcb24bd4d597e8d5f8867312a --- /dev/null +++ b/deploy/slim/quantization/quant.py @@ -0,0 +1,188 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..'))) +sys.path.append( + os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools'))) + + +def set_paddle_flags(**kwargs): + for key, value in kwargs.items(): + if os.environ.get(key, None) is None: + os.environ[key] = str(value) + + +# NOTE(paddle-dev): All of these flags should be +# set before `import paddle`. Otherwise, it would +# not take any effect. +set_paddle_flags( + FLAGS_eager_delete_tensor_gb=0, # enable GC to save memory +) + +import tools.program as program +from paddle import fluid +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.data.reader_main import reader_main +from ppocr.utils.save_load import init_model +from paddle.fluid.contrib.model_stat import summary + +# quant dependencies +import paddle +import paddle.fluid as fluid +from paddleslim.quant import quant_aware, convert +from paddle.fluid.layer_helper import LayerHelper + + +def pact(x): + """ + Process a variable using the pact method you define + Args: + x(Tensor): Paddle Tensor, need to be preprocess before quantization + Returns: + The processed Tensor x. + """ + helper = LayerHelper("pact", **locals()) + dtype = 'float32' + init_thres = 20 + u_param_attr = fluid.ParamAttr( + name=x.name + '_pact', + initializer=fluid.initializer.ConstantInitializer(value=init_thres), + regularizer=fluid.regularizer.L2Decay(0.0001), + learning_rate=1) + u_param = helper.create_parameter(attr=u_param_attr, shape=[1], dtype=dtype) + x = fluid.layers.elementwise_sub( + x, fluid.layers.relu(fluid.layers.elementwise_sub(x, u_param))) + x = fluid.layers.elementwise_add( + x, fluid.layers.relu(fluid.layers.elementwise_sub(-u_param, x))) + return x + + +def get_optimizer(): + """ + Build a program using a model and an optimizer + """ + return fluid.optimizer.AdamOptimizer(0.001) + + +def main(): + train_build_outputs = program.build( + config, train_program, startup_program, mode='train') + train_loader = train_build_outputs[0] + train_fetch_name_list = train_build_outputs[1] + train_fetch_varname_list = train_build_outputs[2] + train_opt_loss_name = train_build_outputs[3] + model_average = train_build_outputs[-1] + + eval_program = fluid.Program() + eval_build_outputs = program.build( + config, eval_program, startup_program, mode='eval') + eval_fetch_name_list = eval_build_outputs[1] + eval_fetch_varname_list = eval_build_outputs[2] + eval_program = eval_program.clone(for_test=True) + + train_reader = reader_main(config=config, mode="train") + train_loader.set_sample_list_generator(train_reader, places=place) + + eval_reader = reader_main(config=config, mode="eval") + + exe = fluid.Executor(place) + exe.run(startup_program) + + # 1. quantization configs + quant_config = { + # weight quantize type, default is 'channel_wise_abs_max' + 'weight_quantize_type': 'channel_wise_abs_max', + # activation quantize type, default is 'moving_average_abs_max' + 'activation_quantize_type': 'moving_average_abs_max', + # weight quantize bit num, default is 8 + 'weight_bits': 8, + # activation quantize bit num, default is 8 + 'activation_bits': 8, + # ops of name_scope in not_quant_pattern list, will not be quantized + 'not_quant_pattern': ['skip_quant'], + # ops of type in quantize_op_types, will be quantized + 'quantize_op_types': ['conv2d', 'depthwise_conv2d', 'mul'], + # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8' + 'dtype': 'int8', + # window size for 'range_abs_max' quantization. defaulf is 10000 + 'window_size': 10000, + # The decay coefficient of moving average, default is 0.9 + 'moving_rate': 0.9, + } + + # 2. quantization transform programs (training aware) + # Make some quantization transforms in the graph before training and testing. + # According to the weight and activation quantization type, the graph will be added + # some fake quantize operators and fake dequantize operators. + act_preprocess_func = pact + optimizer_func = get_optimizer + executor = exe + + eval_program = quant_aware( + eval_program, + place, + quant_config, + scope=None, + act_preprocess_func=act_preprocess_func, + optimizer_func=optimizer_func, + executor=executor, + for_test=True) + quant_train_program = quant_aware( + train_program, + place, + quant_config, + scope=None, + act_preprocess_func=act_preprocess_func, + optimizer_func=optimizer_func, + executor=executor, + for_test=False, + return_program=True) + + # compile program for multi-devices + train_compile_program = program.create_multi_devices_program( + quant_train_program, train_opt_loss_name, for_quant=True) + + init_model(config, quant_train_program, exe) + + train_info_dict = {'compile_program':train_compile_program,\ + 'train_program':quant_train_program,\ + 'reader':train_loader,\ + 'fetch_name_list':train_fetch_name_list,\ + 'fetch_varname_list':train_fetch_varname_list,\ + 'model_average': model_average} + + eval_info_dict = {'program':eval_program,\ + 'reader':eval_reader,\ + 'fetch_name_list':eval_fetch_name_list,\ + 'fetch_varname_list':eval_fetch_varname_list} + + if train_alg_type == 'det': + program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict) + else: + program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict) + + +if __name__ == '__main__': + startup_program, train_program, place, config, train_alg_type = program.preprocess( + ) + main() diff --git a/doc/PPOCR.pdf b/doc/PPOCR.pdf new file mode 100644 index 0000000000000000000000000000000000000000..219621ddb58a96b4b85ef4d74f05dd517c2eb630 Binary files /dev/null and b/doc/PPOCR.pdf differ diff --git a/doc/datasets/captcha_demo.png b/doc/datasets/captcha_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..047a72648c5766102fbfb9301a9c19917fe62b04 Binary files /dev/null and b/doc/datasets/captcha_demo.png differ diff --git a/doc/datasets/ccpd_demo.png b/doc/datasets/ccpd_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..a750d054f6d05ece9021261e2dba94616da940fe Binary files /dev/null and b/doc/datasets/ccpd_demo.png differ diff --git a/doc/datasets/cmb_demo.jpg b/doc/datasets/cmb_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..8299149a7c20041e4db7b75b751b705581106929 Binary files /dev/null and b/doc/datasets/cmb_demo.jpg differ diff --git a/doc/datasets/doc.jpg b/doc/datasets/doc.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f57e62abe187bab5b9d49d406e313c2439d0f89e Binary files /dev/null and b/doc/datasets/doc.jpg differ diff --git a/doc/datasets/nist_demo.png b/doc/datasets/nist_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..4c2ce11e2648b170d8342fb739a0d014079c50e4 Binary files /dev/null and b/doc/datasets/nist_demo.png differ diff --git a/doc/demo/build.png b/doc/demo/build.png new file mode 100644 index 0000000000000000000000000000000000000000..73fb58e24b497bf57881745774838adb90d07ffa Binary files /dev/null and b/doc/demo/build.png differ diff --git a/doc/demo/error.png b/doc/demo/error.png new file mode 100644 index 0000000000000000000000000000000000000000..6a463262b49a321f901729e04eedaa46b7f46a3b Binary files /dev/null and b/doc/demo/error.png differ diff --git a/doc/demo/proxy.png b/doc/demo/proxy.png new file mode 100644 index 0000000000000000000000000000000000000000..c6d72ce8df1c0cd0cca32f64157b7f0d3995991c Binary files /dev/null and b/doc/demo/proxy.png differ diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md index 38b8baeed585fbd58a56eca310cf2dbe4392b5f7..f782a1643c4f75cb82745000a24a8a17fb0ff4b8 100644 --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -1,51 +1,495 @@ -## FAQ +# FAQ -1. **预测报错:got an unexpected keyword argument 'gradient_clip'** -安装的paddle版本不对,目前本项目仅支持paddle1.7,近期会适配到1.8。 +## 写在前面 -2. **转换attention识别模型时报错:KeyError: 'predict'** -问题已解决,请更新到最新代码。 +- 我们收集整理了开源以来在issues和用户群中的常见问题并且给出了简要解答,旨在为OCR的开发者提供一些参考,也希望帮助大家少走一些弯路。 -3. **关于推理速度** -图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num,默认值为30,可以改为10或其他数值。 +- OCR领域大佬众多,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也**希望有识之士帮忙补充和修正**,万分感谢。 -4. **服务部署与移动端部署** -预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案,欢迎持续关注。 -5. **自研算法发布时间** -自研算法SAST、SRN、End2End-PSL都将在7-8月陆续发布,敬请期待。 +## PaddleOCR常见问题汇总(持续更新) -6. **如何在Windows或Mac系统上运行** -PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[快速安装](./installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。2、inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。 +* [【精选】OCR精选10个问题](#OCR精选10个问题) +* [【理论篇】OCR通用21个问题](#OCR通用问题) + * [基础知识3题](#基础知识) + * [数据集4题](#数据集) + * [模型训练调优6题](#模型训练调优) + * [预测部署8题](#预测部署) +* [【实战篇】PaddleOCR实战53个问题](#PaddleOCR实战问题) + * [使用咨询17题](#使用咨询) + * [数据集9题](#数据集) + * [模型训练调优13题](#模型训练调优) + * [预测部署14题](#预测部署) -7. **超轻量模型和通用OCR模型的区别** -目前PaddleOCR开源了2个中文模型,分别是8.6M超轻量中文模型和通用中文OCR模型。两者对比信息如下: - - 相同点:两者使用相同的**算法**和**训练数据**; - - 不同点:不同之处在于**骨干网络**和**通道参数**,超轻量模型使用MobileNetV3作为骨干网络,通用模型使用Resnet50_vd作为检测模型backbone,Resnet34_vd作为识别模型backbone,具体参数差异可对比两种模型训练的配置文件. + + + +## 【精选】OCR精选10个问题 + +#### Q1.1.1:基于深度学习的文字检测方法有哪几种?各有什么优缺点? + +**A**:常用的基于深度学习的文字检测方法一般可以分为基于回归的、基于分割的两大类,当然还有一些将两者进行结合的方法。 + +(1)基于回归的方法分为box回归和像素值回归。a. 采用box回归的方法主要有CTPN、Textbox系列和EAST,这类算法对规则形状文本检测效果较好,但无法准确检测不规则形状文本。 b. 像素值回归的方法主要有CRAFT和SA-Text,这类算法能够检测弯曲文本且对小文本效果优秀但是实时性能不够。 + +(2)基于分割的算法,如PSENet,这类算法不受文本形状的限制,对各种形状的文本都能取得较好的效果,但是往往后处理比较复杂,导致耗时严重。目前也有一些算法专门针对这个问题进行改进,如DB,将二值化进行近似,使其可导,融入训练,从而获取更准确的边界,大大降低了后处理的耗时。 + +#### Q1.1.2:对于中文行文本识别,CTC和Attention哪种更优? + +**A**:(1)从效果上来看,通用OCR场景CTC的识别效果优于Attention,因为带识别的字典中的字符比较多,常用中文汉字三千字以上,如果训练样本不足的情况下,对于这些字符的序列关系挖掘比较困难。中文场景下Attention模型的优势无法体现。而且Attention适合短语句识别,对长句子识别比较差。 + +(2)从训练和预测速度上,Attention的串行解码结构限制了预测速度,而CTC网络结构更高效,预测速度上更有优势。 + +#### Q1.1.3:弯曲形变的文字识别需要怎么处理?TPS应用场景是什么,是否好用? + +**A**:(1)在大多数情况下,如果遇到的场景弯曲形变不是太严重,检测4个顶点,然后直接通过仿射变换转正识别就足够了。 + +(2)如果不能满足需求,可以尝试使用TPS(Thin Plate Spline),即薄板样条插值。TPS是一种插值算法,经常用于图像变形等,通过少量的控制点就可以驱动图像进行变化。一般用在有弯曲形变的文本识别中,当检测到不规则的/弯曲的(如,使用基于分割的方法检测算法)文本区域,往往先使用TPS算法对文本区域矫正成矩形再进行识别,如,STAR-Net、RARE等识别算法中引入了TPS模块。 +**Warning**:TPS看起来美好,在实际应用时经常发现并不够鲁棒,并且会增加耗时,需要谨慎使用。 + +#### Q1.1.4:简单的对于精度要求不高的OCR任务,数据集需要准备多少张呢? + +**A**:(1)训练数据的数量和需要解决问题的复杂度有关系。难度越大,精度要求越高,则数据集需求越大,而且一般情况实际中的训练数据越多效果越好。 + +(2)对于精度要求不高的场景,检测任务和识别任务需要的数据量是不一样的。对于检测任务,500张图像可以保证基本的检测效果。对于识别任务,需要保证识别字典中每个字符出现在不同场景的行文本图像数目需要大于200张(举例,如果有字典中有5个字,每个字都需要出现在200张图片以上,那么最少要求的图像数量应该在200-1000张之间),这样可以保证基本的识别效果。 + +#### Q1.1.5:背景干扰的文字(如印章盖到落款上,需要识别落款或者印章中的文字),如何识别? + +**A**:(1)在人眼确认可识别的条件下,对于背景有干扰的文字,首先要保证检测框足够准确,如果检测框不准确,需要考虑是否可以通过过滤颜色等方式对图像预处理并且增加更多相关的训练数据;在识别的部分,注意在训练数据中加入背景干扰类的扩增图像。 + +(2)如果MobileNet模型不能满足需求,可以尝试ResNet系列大模型来获得更好的效果 +。 + +#### Q1.1.6:OCR领域常用的评估指标是什么? + +**A**:对于两阶段的可以分开来看,分别是检测和识别阶段 + +(1)检测阶段:先按照检测框和标注框的IOU评估,IOU大于某个阈值判断为检测准确。这里检测框和标注框不同于一般的通用目标检测框,是采用多边形进行表示。检测准确率:正确的检测框个数在全部检测框的占比,主要是判断检测指标。检测召回率:正确的检测框个数在全部标注框的占比,主要是判断漏检的指标。 + + +(2)识别阶段: +字符识别准确率,即正确识别的文本行占标注的文本行数量的比例,只有整行文本识别对才算正确识别。 + +(3)端到端统计: +端对端召回率:准确检测并正确识别文本行在全部标注文本行的占比; +端到端准确率:准确检测并正确识别文本行在 检测到的文本行数量 的占比; +准确检测的标准是检测框与标注框的IOU大于某个阈值,正确识别的的检测框中的文本与标注的文本相同。 + + +#### Q1.1.7:单张图上多语种并存识别(如单张图印刷体和手写文字并存),应该如何处理? + +**A**:单张图像中存在多种类型文本的情况很常见,典型的以学生的试卷为代表,一张图像同时存在手写体和印刷体两种文本,这类情况下,可以尝试”1个检测模型+1个N分类模型+N个识别模型”的解决方案。 +其中不同类型文本共用同一个检测模型,N分类模型指额外训练一个分类器,将检测到的文本进行分类,如手写+印刷的情况就是二分类,N种语言就是N分类,在识别的部分,针对每个类型的文本单独训练一个识别模型,如手写+印刷的场景,就需要训练一个手写体识别模型,一个印刷体识别模型,如果一个文本框的分类结果是手写体,那么就传给手写体识别模型进行识别,其他情况同理。 + +#### Q1.1.8:请问PaddleOCR项目中的中文超轻量和通用模型用了哪些数据集?训练多少样本,gpu什么配置,跑了多少个epoch,大概跑了多久? + +**A**: +(1)检测的话,LSVT街景数据集共3W张图像,超轻量模型,150epoch左右,2卡V100 跑了不到2天;通用模型:2卡V100 150epoch 不到4天。 +(2) +识别的话,520W左右的数据集(真实数据26W+合成数据500W)训练,超轻量模型:4卡V100,总共训练了5天左右。通用模型:4卡V100,共训练6天。 + +超轻量模型训练分为2个阶段: +(1)全量数据训练50epoch,耗时3天 +(2)合成数据+真实数据按照1:1数据采样,进行finetune训练200epoch,耗时2天 + +通用模型训练: +真实数据+合成数据,动态采样(1:1)训练,200epoch,耗时 6天左右。 + + +#### Q1.1.9:PaddleOCR模型推理方式有几种?各自的优缺点是什么 + +**A**:目前推理方式支持基于训练引擎推理和基于预测引擎推理。 + +(1)基于训练引擎推理不需要转换模型,但是需要先组网再load参数,语言只支持python,不适合系统集成。 + +(2)基于预测引擎的推理需要先转换模型为inference格式,然后可以进行不需要组网的推理,语言支持c++和python,适合系统集成。 + +#### Q1.1.10:PaddleOCR中,对于模型预测加速,CPU加速的途径有哪些?基于TenorRT加速GPU对输入有什么要求? + +**A**:(1)CPU可以使用mkldnn进行加速;对于python inference的话,可以把enable_mkldnn改为true,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/tools/infer/utility.py#L73),对于cpp inference的话,在配置文件里面配置use_mkldnn 1即可,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/deploy/cpp_infer/tools/config.txt#L6) + +(2)GPU需要注意变长输入问题等,TRT6 之后才支持变长输入 + + +## 【理论篇】OCR通用问题 +### 基础知识 + +#### Q2.1.1:CRNN能否识别两行的文字?还是说必须一行? + +**A**:CRNN是一种基于1D-CTC的算法,其原理决定无法识别2行或多行的文字,只能单行识别。 + +#### Q2.1.2:怎么判断行文本图像是否是颠倒的? + +**A**:有两种方案:(1)原始图像和颠倒图像都进行识别预测,取得分较高的为识别结果。 +(2)训练一个正常图像和颠倒图像的方向分类器进行判断。 + +#### Q2.1.3:目前OCR普遍是二阶段,端到端的方案在业界落地情况如何? + +**A**:端到端在文字分布密集的业务场景,效率会比较有保证,精度的话看自己业务数据积累情况,如果行级别的识别数据积累比较多的话two-stage会比较好。百度的落地场景,比如工业仪表识别、车牌识别都用到端到端解决方案。 + + +### 数据集 + +#### Q2.2.1:支持空格的模型,标注数据的时候是不是要标注空格?中间几个空格都要标注出来么? + +**A**:如果需要检测和识别模型,就需要在标注的时候把空格标注出来,而且在字典中增加空格对应的字符。标注过程中,如果中间几个空格标注一个就行。 + +#### Q2.2.2:如果考虑支持竖排文字识别,相关的数据集如何合成? + +**A**:竖排文字与横排文字合成方式相同,只是选择了垂直字体。合成工具推荐:[text_renderer](https://github.com/Sanster/text_renderer) + +#### Q2.2.3:训练文字识别模型,真实数据有30w,合成数据有500w,需要做样本均衡吗? + +**A**:需要,一般需要保证一个batch中真实数据样本和合成数据样本的比例是1:1~1:3左右效果比较理想。如果合成数据过大,会过拟合到合成数据,预测效果往往不佳。还有一种**启发性**的尝试是可以先用大量合成数据训练一个base模型,然后再用真实数据微调,在一些简单场景效果也是会有提升的。 + +#### Q2.2.4:请问一下,竖排文字识别时候,字的特征已经变了,这种情况在数据集和字典标注是新增一个类别还是多个角度的字共享一个类别? + +**A**:可以根据实际场景做不同的尝试,共享一个类别是可以收敛,效果也还不错。但是如果分开训练,同类样本之间一致性更好,更容易收敛,识别效果会更优。 + +### 模型训练调优 + +#### Q2.3.1:如何更换文本检测/识别的backbone? +**A**:无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。[**飞桨图像分类套件PaddleClas**](https://github.com/PaddlePaddle/PaddleClas)汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的[**117个预训练模型下载地址**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。 + + (1)文字检测骨干网络的替换,主要是确定类似与ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。 + + (2)文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中[MobileNetV3骨干网络](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py)的改动。 + +#### Q2.3.2:文本识别训练不加LSTM是否可以收敛? + +**A**:理论上是可以收敛的,加上LSTM模块主要是为了挖掘文字之间的序列关系,提升识别效果。对于有明显上下文语义的场景效果会比较明显。 + +#### Q2.3.3:文本识别中LSTM和GRU如何选择? + +**A**:从项目实践经验来看,序列模块采用LSTM的识别效果优于GRU,但是LSTM的计算量比GRU大一些,可以根据自己实际情况选择。 + +#### Q2.3.4:对于CRNN模型,backbone采用DenseNet和ResNet_vd,哪种网络结构更好? + +**A**:Backbone的识别效果在CRNN模型上的效果,与Imagenet 1000 图像分类任务上识别效果和效率一致。在图像分类任务上ResnNet_vd(79%+)的识别精度明显优于DenseNet(77%+),此外对于GPU,Nvidia针对ResNet系列模型做了优化,预测效率更高,所以相对而言,resnet_vd是较好选择。如果是移动端,可以优先考虑MobileNetV3系列。 + +#### Q2.3.5:训练识别时,如何选择合适的网络输入shape? + +**A**:一般高度采用32,最长宽度的选择,有两种方法: + +(1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。 + +(2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。 + +#### Q2.3.6:如何识别文字比较长的文本? + +**A**:在中文识别模型训练时,并不是采用直接将训练样本缩放到[3,32,320]进行训练,而是先等比例缩放图像,保证图像高度为32,宽度不足320的部分补0,宽高比大于10的样本直接丢弃。预测时,如果是单张图像预测,则按上述操作直接对图像缩放,不做宽度320的限制。如果是多张图预测,则采用batch方式预测,每个batch的宽度动态变换,采用这个batch中最长宽度。 + +### 预测部署 + +#### Q2.4.1:请问对于图片中的密集文字,有什么好的处理办法吗? + +**A**:可以先试用预训练模型测试一下,例如DB+CRNN,判断下密集文字图片中是检测还是识别的问题,然后针对性的改善。还有一种是如果图象中密集文字较小,可以尝试增大图像分辨率,对图像进行一定范围内的拉伸,将文字稀疏化,提高识别效果。 + +#### Q2.4.2:对于一些在识别时稍微模糊的文本,有没有一些图像增强的方式? + +**A**:在人类肉眼可以识别的前提下,可以考虑图像处理中的均值滤波、中值滤波或者高斯滤波等模糊算子尝试。也可以尝试从数据扩增扰动来强化模型鲁棒性,另外新的思路有对抗性训练和超分SR思路,可以尝试借鉴。但目前业界尚无普遍认可的最优方案,建议优先在数据采集阶段增加一些限制提升图片质量。 + +#### Q2.4.3:对于特定文字检测,例如身份证只检测姓名,检测指定区域文字更好,还是检测全部区域再筛选更好? + +**A**:两个角度来说明一般检测全部区域再筛选更好。 + +(1)由于特定文字和非特定文字之间的视觉特征并没有很强的区分行,只检测指定区域,容易造成特定文字漏检。 + +(2)产品的需求可能是变化的,不排除后续对于模型需求变化的可能性(比如又需要增加一个字段),相比于训练模型,后处理的逻辑会更容易调整。 + +#### Q2.4.4:对于小白如何快速入门中文OCR项目实践? + +**A**:建议可以先了解OCR方向的基础知识,大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。目前来看,从内容的完备性来看,PaddleOCR的中英文双语教程文档是有明显优势的,在数据集、模型训练、预测部署文档详实,可以快速入手。而且还有微信用户群答疑,非常适合学习实践。项目地址:[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) + +#### Q2.4.5:如何识别带空格的英文行文本图像? + +**A**:空格识别可以考虑以下两种方案: + +(1)优化文本检测算法。检测结果在空格处将文本断开。这种方案在检测数据标注时,需要将含有空格的文本行分成好多段。 + +(2)优化文本识别算法。在识别字典里面引入空格字符,然后在识别的训练数据中,如果用空行,进行标注。此外,合成数据时,通过拼接训练数据,生成含有空格的文本。 + +#### Q2.4.6:中英文一起识别时也可以加空格字符来训练吗 + +**A**:中文识别可以加空格当做分隔符训练,具体的效果如何没法给出直接评判,根据实际业务数据训练来判断。 + +#### Q2.4.7:低像素文字或者字号比较小的文字有什么超分辨率方法吗 + +**A**:超分辨率方法分为传统方法和基于深度学习的方法。基于深度学习的方法中,比较经典的有SRCNN,另外CVPR2020也有一篇超分辨率的工作可以参考文章:Unpaired Image Super-Resolution using Pseudo-Supervision,但是没有充分的实践验证过,需要看实际场景下的效果。 + +#### Q2.4.8:表格识别有什么好的模型 或者论文推荐么 + +**A**:表格目前学术界比较成熟的解决方案不多 ,可以尝试下分割的论文方案。 + + + +## 【实战篇】PaddleOCR实战问题 + +### 使用咨询 + +#### Q3.1.1:OSError: [WinError 126] 找不到指定的模块。mac pro python 3.4 shapely import 问题 + +**A**:这个问题是因为shapely库安装有误,可以参考 [#212](https://github.com/PaddlePaddle/PaddleOCR/issues/212) 这个issue重新安装一下 + +#### Q3.1.2:安装了paddle-gpu,运行时提示没有安装gpu版本的paddle,可能是什么原因? + +**A**:用户同时安装了paddle cpu和gpu版本,都删掉之后,重新安装gpu版本的padle就好了 + +#### Q3.1.3:试用报错:Cannot load cudnn shared library,是什么原因呢? + +**A**:需要把cudnn lib添加到LD_LIBRARY_PATH中去。 + +#### Q3.1.4:PaddlePaddle怎么指定GPU运行 os.environ["CUDA_VISIBLE_DEVICES"]这种不生效 + +**A**:通过设置 export CUDA_VISIBLE_DEVICES='0'环境变量 + +#### Q3.1.5:windows下训练没有问题,aistudio中提示数据路径有问题 + +**A**:需要把`\`改为`/`(windows和linux的文件夹分隔符不一样,windows下的是`\`,linux下是`/`) + +#### Q3.1.6:gpu版的paddle虽然能在cpu上运行,但是必须要有gpu设备 + +**A**:export CUDA_VISIBLE_DEVICES='',CPU是可以正常跑的 + +#### Q3.1.7:预测报错ImportError: dlopen: cannot load any more object with static TLS + +**A**:glibc的版本问题,运行需要glibc的版本号大于2.23。 + +#### Q3.1.8:提供的inference model和预训练模型的区别 + +**A**:inference model为固化模型,文件中包含网络结构和网络参数,多用于预测部署。预训练模型是训练过程中保存好的模型,多用于fine-tune训练或者断点训练。 + +#### Q3.1.9:模型的解码部分有后处理? + +**A**:有的检测的后处理在ppocr/postprocess路径下,识别的后处理均在ppocr/utils/character.py文件内 + +#### Q3.1.10:PaddleOCR中文模型是否支持数字识别? + +**A**:支持的,可以看下ppocr/utils/ppocr_keys_v1.txt 这个文件,是支持的识别字符列表,其中包含了数字识别。 + +#### Q3.1.11:PaddleOCR如何做到横排和竖排同时支持的? + +**A**:合成了一批竖排文字,逆时针旋转90度后加入训练集与横排一起训练。预测时根据图片长款比判断是否为竖排,若为竖排则将crop出的文本逆时针旋转90度后送入识别网络。 + +#### Q3.1.12:如何获取检测文本框的坐标? + +**A**:文本检测的结果有box和文本信息, 具体 [参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/9d33e36df550762b204d5fbfd7977a25e31b2c44/tools/infer/predict_system.py#L13) + +#### Q3.1.13:识别模型框出来的位置太紧凑,会丢失边缘的文字信息,导致识别错误 + +**A**: 可以在命令中加入 --det_db_unclip_ratio ,参数[定义位置](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L49),这个参数是检测后处理时控制文本框大小的,默认2.0,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。 + +#### Q3.1.14:英文手写体识别有计划提供的预训练模型吗? + +**A**:近期也在开展需求调研,如果企业用户需求较多,我们会考虑增加相应的研发投入,后续提供对应的预训练模型,如果有需求欢迎通过issue或者加入微信群联系我们。 + +#### Q3.1.15:PaddleOCR的算法可以用于手写文字检测识别吗?后续有计划推出手写预训练模型么? +**A**:理论上只要有相应的数据集,都是可以的。当然手写识别毕竟和印刷体有区别,对应训练调优策略可能需要适配性优化。 + + +#### Q3.1.16:PaddleOCR是否支持在Windows或Mac系统上运行? + +**A**:PaddleOCR已完成Windows和Mac系统适配,运行时注意两点: + +(1)在[快速安装](./installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。 + +(2)inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。 + +#### Q3.1.17:PaddleOCR开源的超轻量模型和通用OCR模型的区别? +**A**:目前PaddleOCR开源了2个中文模型,分别是8.6M超轻量中文模型和通用中文OCR模型。两者对比信息如下: +- 相同点:两者使用相同的**算法**和**训练数据**; +- 不同点:不同之处在于**骨干网络**和**通道参数**,超轻量模型使用MobileNetV3作为骨干网络,通用模型使用Resnet50_vd作为检测模型backbone,Resnet34_vd作为识别模型backbone,具体参数差异可对比两种模型训练的配置文件. |模型|骨干网络|检测训练配置|识别训练配置| |-|-|-|-| |8.6M超轻量中文OCR模型|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml| |通用中文OCR模型|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml| -8. **是否有计划开源仅识别数字或仅识别英文+数字的模型** -暂不计划开源仅数字、仅数字+英文、或其他小垂类专用模型。PaddleOCR开源了多种检测、识别算法供用户自定义训练,两种中文模型也是基于开源的算法库训练产出,有小垂类需求的小伙伴,可以按照教程准备好数据,选择合适的配置文件,自行训练,相信能有不错的效果。训练有任何问题欢迎提issue或在交流群提问,我们会及时解答。 -9. **开源模型使用的训练数据是什么,能否开源** -目前开源的模型,数据集和量级如下: - - 检测: - 英文数据集,ICDAR2015 - 中文数据集,LSVT街景数据集训练数据3w张图片 - - 识别: - 英文数据集,MJSynth和SynthText合成数据,数据量上千万。 - 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。 - - 其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。 +### 数据集 + +#### Q3.2.1:如何制作PaddleOCR支持的数据格式 + +**A**:可以参考检测与识别训练文档,里面有数据格式详细介绍。[检测文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md),[识别文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/recognition.md) + +#### Q3.2.2:请问一下,如果想用预训练模型,但是我的数据里面又出现了预训练模型字符集中没有的字符,新的字符是在字符集前面添加还是在后面添加? + +**A**:在后面添加,修改dict之后,就改变了模型最后一层fc的结构,之前训练到的参数没有用到,相当于从头训练,因此acc是0。 + +#### Q3.2.3:如何调试数据读取程序? + +**A**:tools/train.py中有一个test_reader()函数用于调试数据读取。 + +#### Q3.2.4:开源模型使用的训练数据是什么,能否开源? + +**A**:目前开源的模型,数据集和量级如下: + +- 检测: + - 英文数据集,ICDAR2015 + - 中文数据集,LSVT街景数据集训练数据3w张图片 + +- 识别: + - 英文数据集,MJSynth和SynthText合成数据,数据量上千万。 + - 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。 + +其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。 + +#### Q3.2.5:请问中文字符集多大呢?支持生僻字识别吗? + +**A**:中文字符集是6623, 支持生僻字识别。训练样本中有部分生僻字,但样本不多,如果有特殊需求建议使用自己的数据集做fine-tune。 + +#### Q3.2.6:中文文本检测、文本识别构建训练集的话,大概需要多少数据量 + +**A**:检测需要的数据相对较少,在PaddleOCR模型的基础上进行Fine-tune,一般需要500张可达到不错的效果。 +识别分英文和中文,一般英文场景需要几十万数据可达到不错的效果,中文则需要几百万甚至更多。 + +#### Q3.2.7:中文识别模型如何选择? + +**A**:中文模型共有2大类:通用模型和超轻量模型。他们各自的优势如下: +超轻量模型具有更小的模型大小,更快的预测速度。适合用于端侧使用。 +通用模型具有更高的模型精度,适合对模型大小不敏感的场景。 +此外基于以上模型,PaddleOCR还提供了支持空格识别的模型,主要针对中文场景中的英文句子。 +您可以根据实际使用需求进行选择。 + +#### Q3.2.8:图像旋转90° 文本检测可以正常检测到具体文本位置,但是识别准确度大幅降低,是否会考虑增加相应的旋转预处理? + +**A**:目前模型只支持两种方向的文字:水平和垂直。 为了降低模型大小,加快模型预测速度,PaddleOCR暂时没有加入图片的方向判断。建议用户在识别前自行转正,后期也会考虑添加选择角度判断。 + +#### Q3.2.9:同一张图通用检测出21个条目,轻量级检测出26个 ,难道不是轻量级的好吗? + +**A**:可以主要参考可视化效果,通用模型更倾向于检测一整行文字,轻量级可能会有一行文字被分成两段检测的情况,不是数量越多,效果就越好。 + +### 模型训练调优 + +#### Q3.3.1:文本长度超过25,应该怎么处理? + +**A**:默认训练时的文本可识别的最大长度为25,超过25的文本会被忽略不参与训练。如果您训练样本中的长文本较多,可以修改配置文件中的 max\_text\_length 字段,设置为更大的最长文本长度,具体位置在[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/fb9e47b262529386983edc21b33abfa16bbf06ac/configs/rec/rec_chinese_lite_train.yml#L13)。 + +#### Q3.3.2:配置文件里面检测的阈值设置么? + +**A**:有的,检测相关的参数主要有以下几个: +``max_side_len:预测时图像resize的长边尺寸 +thresh: 用于二值化输出图的阈值 +box_thresh:用于过滤文本框的阈值,低于此阈值的文本框不要 +unclip_ratio: 文本框扩张的系数,关系到文本框的大小`` + +这些参数的默认值见[代码](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L40),可以通过从命令行传递参数进行修改。 + +#### Q3.3.3:我想请教一下,你们在训练识别时候,lsvt里的非矩形框文字,你们是怎么做处理的呢。忽略掉还是去最小旋转框? + +**A**:现在是忽略处理的 + +#### Q3.3.4:训练过程中,如何恰当的停止训练(直接kill,经常还有显存占用的问题) + +**A**:可以通过下面的脚本终止所有包含train.py字段的进程, + +``` +ps -axu | grep train.py | awk '{print $2}' | xargs kill -9 +``` + +#### Q3.3.5:读数据进程数设置4~8时训练一会进程接连defunct后gpu利用率一直为0卡死 + +**A**:修改多进程的队列数后解决, 将[代码段]( https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/ppocr/data/reader_main.py#L75 ) 修改为: + +``` +return paddle.reader.multiprocess_reader(readers, False, queue_size=320) + +``` + +#### Q3.3.6:可不可以将pretrain_weights设置为空呢?想从零开始训练一个model + +**A**:这个是可以的,在训练通用识别模型的时候,pretrain_weights就设置为空,但是这样可能需要更长的迭代轮数才能达到相同的精度。 + +#### Q3.3.7:PaddleOCR默认不是200个step保存一次模型吗?为啥文件夹下面都没有生成 + +**A**:因为默认保存的起始点不是0,而是4000,将eval_batch_step [4000, 5000]改为[0, 2000] 就是从第0次迭代开始,每2000迭代保存一次模型 + +#### Q3.3.8:如何进行模型微调? + +**A**:注意配置好匹配的数据集合适,然后在finetune训练时,可以加载我们提供的预训练模型,设置配置文件中Global.pretrain_weights 参数为要加载的预训练模型路径。 + +#### Q3.3.9:文本检测换成自己的数据没法训练,有一些”###”是什么意思? + +**A**:数据格式有问题,”###” 表示要被忽略的文本区域,所以你的数据都被跳过了,可以换成其他任意字符或者就写个空的。 + +#### Q3.3.10:copy_from_cpu这个地方,这块input不变(t_data的size不变)连续调用两次copy_from_cpu()时,这里面的gpu_place会重新malloc GPU内存吗?还是只有当ele_size变化时才会重新在GPU上malloc呢? + +**A**:小于等于的时候都不会重新分配,只有大于的时候才会重新分配 + +#### Q3.3.11:自己训练出来的未inference转换的模型 可以当作预训练模型吗? + +**A**:可以的,但是如果训练数据两量少的话,可能会过拟合到少量数据上,泛化性能不佳。 + +#### Q3.3.12:使用带TPS的识别模型预测报错 + +**A**:直接更换配置文件里的Backbone.function即可,格式为:网络文件路径,网络Class名词。如果所需的backbone在PaddleOCR里没有提供,可以参照PaddleClas里面的网络结构,进行修改尝试。具体修改原则可以参考OCR通用问题中 "如何更换文本检测/识别的backbone" 的回答。 + +#### Q3.3.13:如何更换文本检测/识别的backbone?报错信息:``Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100) `` + +**A**:TPS模块暂时无法支持变长的输入,请设置 ``--rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape`` + +### 预测部署 + +#### Q3.4.1:如何pip安装opt模型转换工具? + +**A**:由于OCR端侧部署需要某些算子的支持,这些算子仅在Paddle-Lite 最新develop分支中,所以需要自己编译opt模型转换工具。opt工具可以通过编译PaddleLite获得,编译步骤参考[lite部署文档](https://github.com/PaddlePaddle/PaddleOCR/blob/0791714b91/deploy/lite/readme.md) 中2.1 模型优化部分。 + +#### Q3.4.2:如何将PaddleOCR预测模型封装成SDK + +**A**:如果是Python的话,可以使用tools/infer/predict_system.py中的TextSystem进行sdk封装,如果是c++的话,可以使用deploy/cpp_infer/src下面的DBDetector和CRNNRecognizer完成封装 + +#### Q3.4.3:服务部署可以只发布文本识别,而不带文本检测模型么? + +**A**:可以的。默认的服务部署是检测和识别串联预测的。也支持单独发布文本检测或文本识别模型,比如使用PaddleHUBPaddleOCR 模型时,deploy下有三个文件夹,分别是 + +- ocr_det:检测预测 +- ocr_rec: 识别预测 +- ocr_system: 检测识别串联预测 + +每个模块是单独分开的,所以可以选择只发布文本识别模型。使用PaddleServing部署时同理。 + + +#### Q3.4.4:为什么PaddleOCR检测预测是只支持一张图片测试?即test_batch_size_per_card=1 + +**A**:测试的时候,对图像等比例缩放,最长边960,不同图像等比例缩放后长宽不一致,无法组成batch,所以设置为test_batch_size为1。 + +#### Q3.4.5:为什么使用c++ inference和python inference结果不一致? + +**A**:可能是导出的inference model版本与预测库版本需要保持一致,比如在Windows下,Paddle官网提供的预测库版本是1.8,而PaddleOCR提供的inference model 版本是1.7,因此最终预测结果会有差别。可以在Paddle1.8环境下导出模型,再基于该模型进行预测。 +此外也需要保证两者的预测参数配置完全一致。 + +#### Q3.4.6:为什么第一张张图预测时间很长,第二张之后预测时间会降低? + +**A**:第一张图需要显存资源初始化,耗时较多。完成模型加载后,之后的预测时间会明显缩短。 + +#### Q3.4.7:请问opt工具可以直接转int8量化后的模型为.nb文件吗 + +**A**:有的,PaddleLite提供完善的opt工具,可以参考[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_with_data.html) + +#### Q3.4.8:请问在安卓端怎么设置这个参数 --det_db_unclip_ratio=3 + +**A**:在安卓APK上无法设置,没有暴露这个接口,如果使用的是PaddledOCR/deploy/lite/的demo,可以修改config.txt中的对应参数来设置 + +#### Q3.4.9:PaddleOCR模型是否可以转换成ONNX模型? + +**A**:目前暂不支持转ONNX,相关工作在研发中。 + +#### Q3.4.10:使用opt工具对检测模型转换时报错 can not found op arguments for node conv2_b_attr + +**A**:这个问题大概率是编译opt工具的Paddle-Lite不是develop分支,建议使用Paddle-Lite 的develop分支编译opt工具。 + +#### Q3.4.11:libopenblas.so找不到是什么意思? + +**A**:目前包括mkl和openblas两种版本的预测库,推荐使用mkl的预测库,如果下载的预测库是mkl的,编译的时候也需要勾选`with_mkl`选项 +,以Linux下编译为例,需要在设置这里为ON,`-DWITH_MKL=ON`,[参考链接](https://github.com/PaddlePaddle/PaddleOCR/blob/8a78af26df0dd8f15b734cc8db13e25d2a3656a2/deploy/cpp_infer/tools/build.sh#L12)。此外,使用预测库时,推荐在Linux或者Windows上进行开发,不推荐在MacOS上开发。 + +#### Q3.4.12:使用自定义字典训练,inference时如何修改 + +**A**:使用了自定义字典的话,用inference预测时,需要通过 --rec_char_dict_path 修改字典路径。详细操作可参考[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/inference.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E5%AD%97%E5%85%B8%E7%9A%84%E6%8E%A8%E7%90%86) -10. **使用带TPS的识别模型预测报错** -报错信息:Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100) -原因:TPS模块暂时无法支持变长的输入,请设置 --rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape +#### Q3.4.13:能否返回单字字符的位置? -11. **自定义字典训练的模型,识别结果出现字典里没出现的字** -预测时没有设置采用的自定义字典路径。设置方法是在预测时,通过增加输入参数rec_char_dict_path来设置。 +**A**:训练的时候标注是整个文本行的标注,所以预测的也是文本行位置,如果要获取单字符位置信息,可以根据预测的文本,计算字符数量,再去根据整个文本行的位置信息,估计文本块中每个字符的位置。 +#### Q3.4.14:PaddleOCR模型部署方式有哪几种? +**A**:目前有Inference部署,serving部署和手机端Paddle Lite部署,可根据不同场景做灵活的选择:Inference部署适用于本地离线部署,serving部署适用于云端部署,Paddle Lite部署适用于手机端集成。 diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md new file mode 100644 index 0000000000000000000000000000000000000000..c4a3b3255f7367aec672387272d47b64a02658ee --- /dev/null +++ b/doc/doc_ch/algorithm_overview.md @@ -0,0 +1,64 @@ + +## 算法介绍 +本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)。 + +- [1.文本检测算法](#文本检测算法) +- [2.文本识别算法](#文本识别算法) + + +### 1.文本检测算法 + +PaddleOCR开源的文本检测算法列表: +- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐) +- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498)) + +在ICDAR2015文本检测公开数据集上,算法效果如下: + +|模型|骨干网络|precision|recall|Hmean|下载链接| +|-|-|-|-|-|-| +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| + +在Total-text文本检测公开数据集上,算法效果如下: + +|模型|骨干网络|precision|recall|Hmean|下载链接| +|-|-|-|-|-|-| +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| + +**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) + +PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./detection.md)。 + + + +### 2.文本识别算法 + +PaddleOCR开源的文本识别算法列表: +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐) +- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) +- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) +- [x] SRN([paper](https://arxiv.org/abs/2003.12294)) + +参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: + +|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| +|-|-|-|-|-| +|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| +|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| +|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| +|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| +|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| +|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| + +**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 +原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 + +PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。 diff --git a/doc/doc_ch/android_demo.md b/doc/doc_ch/android_demo.md new file mode 100644 index 0000000000000000000000000000000000000000..b6d9a7921b950a5a3c911802f0fa0ad40ab4d850 --- /dev/null +++ b/doc/doc_ch/android_demo.md @@ -0,0 +1,57 @@ +# Android Demo 快速测试 + + +### 1. 安装最新版本的Android Studio + +可以从 https://developer.android.com/studio 下载。本Demo使用是4.0版本Android Studio编写。 + +### 2. 创建新项目 + +Demo测试的时候使用的是NDK 20b版本,20版本以上均可以支持编译成功。 + +如果您是初学者,可以用以下方式安装和测试NDK编译环境。 +点击 File -> New ->New Project, 新建 "Native C++" project + + +1. Start a new Android Studio project + 在项目模版中选择 Native C++ 选择PaddleOCR/depoly/android_demo 路径 + 进入项目后会自动编译,第一次编译会花费较长的时间,建议添加代理加速下载。 + +**代理添加:** + +选择 Android Studio -> Perferences -> Appearance & Behavior -> System Settings -> HTTP Proxy -> Manual proxy configuration + +![](../demo/proxy.png) + +2. 开始编译 + +点击编译按钮,连接手机,跟着Android Studio的引导完成操作。 + +在 Android Studio 里看到下图,表示编译完成: + +![](../demo/build.png) + +**提示:** 此时如果出现下列找不到OpenCV的报错信息,请重新点击编译,编译完成后退出项目,再次进入。 + +![](../demo/error.png) + +### 3. 发送到手机端 + +完成编译,点击运行,在手机端查看效果。 + +### 4. 如何自定义demo图片 + +1. 图片存放路径:android_demo/app/src/main/assets/images + + 将自定义图片放置在该路径下 + +2. 配置文件: android_demo/app/src/main/res/values/strings.xml + + 修改 IMAGE_PATH_DEFAULT 为自定义图片名即可 + + +# 获得更多支持 +前往[端计算模型生成平台EasyEdge](https://ai.baidu.com/easyedge/app/open_source_demo?referrerUrl=paddlelite),获得更多开发支持: + +- Demo APP:可使用手机扫码安装,方便手机端快速体验文字识别 +- SDK:模型被封装为适配不同芯片硬件和操作系统SDK,包括完善的接口,方便进行二次开发 diff --git a/doc/doc_ch/angle_class.md b/doc/doc_ch/angle_class.md new file mode 100644 index 0000000000000000000000000000000000000000..b2118661290ac0b6f2731a8fd9ba76dadcb21ded --- /dev/null +++ b/doc/doc_ch/angle_class.md @@ -0,0 +1,127 @@ +## 文字角度分类 + +### 数据准备 + +请按如下步骤设置数据集: + +训练数据的默认存储路径是 `PaddleOCR/train_data/cls`,如果您的磁盘上已有数据集,只需创建软链接至数据集目录: + +``` +ln -sf /train_data/cls/dataset +``` + +请参考下文组织您的数据。 +- 训练集 + +首先请将训练图片放入同一个文件夹(train_images),并用一个txt文件(cls_gt_train.txt)记录图片路径和标签。 + +**注意:** 默认请将图片路径和图片标签用 `\t` 分割,如用其他方式分割将造成训练报错 + +0和180分别表示图片的角度为0度和180度 + +``` +" 图像文件名 图像标注信息 " + +train_data/cls/word_001.jpg 0 +train_data/cls/word_002.jpg 180 +``` + +最终训练集应有如下文件结构: +``` +|-train_data + |-cls + |- cls_gt_train.txt + |- train + |- word_001.png + |- word_002.jpg + |- word_003.jpg + | ... +``` + +- 测试集 + +同训练集类似,测试集也需要提供一个包含所有图片的文件夹(test)和一个cls_gt_test.txt,测试集的结构如下所示: + +``` +|-train_data + |-cls + |- 和一个cls_gt_test.txt + |- test + |- word_001.jpg + |- word_002.jpg + |- word_003.jpg + | ... +``` + +### 启动训练 + +PaddleOCR提供了训练脚本、评估脚本和预测脚本。 + +开始训练: + +*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* + +``` +# 设置PYTHONPATH路径 +export PYTHONPATH=$PYTHONPATH:. +# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +# 启动训练 +python3 tools/train.py -c configs/cls/cls_mv3.yml +``` + +- 数据增强 + +PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入扰动,请在配置文件中设置 `distort: true`。 + +默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gasuss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse),随机数据增强(RandAugment)。 + +训练过程中除随机数据增强外每种扰动方式以50%的概率被选择,具体代码实现请参考: +[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py) +[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) + +*由于OpenCV的兼容性问题,扰动操作暂时只支持linux* + +### 训练 + +PaddleOCR支持训练和评估交替进行, 可以在 `configs/cls/cls_mv3.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/cls_mv3/best_accuracy` 。 + +如果验证集很大,测试将会比较耗时,建议减少评估次数,或训练完再进行评估。 + +**注意,预测/评估时的配置文件请务必与训练一致。** + +### 评估 + +评估数据集可以通过`configs/cls/cls_reader.yml` 修改EvalReader中的 `label_file_path` 设置。 + +*注意* 评估时必须确保配置文件中 infer_img 字段为空 +``` +export CUDA_VISIBLE_DEVICES=0 +# GPU 评估, Global.checkpoints 为待测权重 +python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy +``` + +### 预测 + +* 训练引擎的预测 + +使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。 + +默认预测图片存储在 `infer_img` 里,通过 `-o Global.checkpoints` 指定权重: + +``` +# 预测分类结果 +python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png +``` + +预测图片: + +![](../imgs_words/en/word_1.png) + +得到输入图像的预测结果: + +``` +infer_img: doc/imgs_words/en/word_1.png + scores: [[0.93161047 0.06838956]] + label: [0] +``` diff --git a/doc/doc_ch/benchmark.md b/doc/doc_ch/benchmark.md index 65d9a534a5fca0d31500db6ca4cf4cf7f360b958..520a2fcea35ef4bc19ae448517fbfcba61ed60b0 100644 --- a/doc/doc_ch/benchmark.md +++ b/doc/doc_ch/benchmark.md @@ -1,29 +1,51 @@ # Benchmark -本文给出了PaddleOCR超轻量中文模型(8.6M)在各平台的预测耗时benchmark。 +本文给出了中英文OCR系列模型精度指标和在各平台预测耗时的benchmark。 ## 测试数据 -- 从中文公开数据集[ICDAR2017-RCTW](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#ICDAR2017-RCTW-17)中随机采样**500**张图像。 -该集合大部分图片是通过手机摄像头在野外采集的。有些是截图。这些图片展示了各种各样的场景,包括街景、海报、菜单、室内场景和手机应用程序的截图。 +针对OCR实际应用场景,包括合同,车牌,铭牌,火车票,化验单,表格,证书,街景文字,名片,数码显示屏等,收集的300张图像,每张图平均有17个文本框,下图给出了一些图像示例。 -## 评估指标 -在四种平台上的预测耗时指标如下: +
+ +
-|长边尺寸(px)|T4(s)|V100(s)|Intel至强6148(s)|骁龙855(s)| -|-|-|-|-|-| -|960|0.092|0.057|0.319|0.354| -|640|0.067|0.045|0.198|0.236| -|480|0.057|0.043|0.151|0.175| +## 评估指标 -说明: +说明: +- v1.0是未添加优化策略的DB+CRNN模型,v1.1是添加多种优化策略和方向分类器的PP-OCR模型。slim_v1.1是使用裁剪或量化的模型。 +- 检测输入图像的的长边尺寸是960。 - 评估耗时阶段为图像输入到结果输出的完整阶段,包括了图像的预处理和后处理。 -- `Intel至强6148`为服务器端CPU型号,测试中使用Intel MKL-DNN 加速CPU预测速度,使用该操作需要: - - 更新到飞桨latest版本:https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev ,请根据自己环境的CUDA版本和Python版本选择相应的mkl版wheel包,如,CUDA10、Python3.7环境,应操作: - ```shell - # 获取安装包 - wget https://paddle-wheel.bj.bcebos.com/0.0.0-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl - # 安装 - pip3.7 install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl - ``` - - 预测时使用参数打开加速开关: `--enable_mkldnn True` +- `Intel至强6148`为服务器端CPU型号,测试中使用Intel MKL-DNN 加速。 - `骁龙855`为移动端处理平台型号。 + +不同预测模型大小和整体识别精度对比 + +| 模型名称 | 整体模型
大小\(M\) | 检测模型
大小\(M\) | 方向分类器
模型大小\(M\) | 识别模型
大小\(M\) | 整体识别
F\-score | +|:-:|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | +| ch\_ppocr\_server\_v1\.1 | 155\.1 | 47\.2 | 0\.9 | 107 | 0\.5414 | +| ch\_ppocr\_mobile\_v1\.0 | 8\.6 | 4\.1 | \- | 4\.5 | 0\.393 | +| ch\_ppocr\_server\_v1\.0 | 203\.8 | 98\.5 | \- | 105\.3 | 0\.4436 | + +不同预测模型在T4 GPU上预测速度对比,单位ms + +| 模型名称 | 整体 | 检测 | 方向分类器 | 识别 | +|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 137 | 35 | 24 | 78 | +| ch\_ppocr\_server\_v1\.1 | 204 | 39 | 25 | 140 | +| ch\_ppocr\_mobile\_v1\.0 | 117 | 41 | \- | 76 | +| ch\_ppocr\_server\_v1\.0 | 199 | 52 | \- | 147 | + +不同预测模型在CPU上预测速度对比,单位ms + +| 模型名称 | 整体 | 检测 | 方向分类器 | 识别 | +|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 421 | 164 | 51 | 206 | +| ch\_ppocr\_mobile\_v1\.0 | 398 | 219 | \- | 179 | + +裁剪量化模型和原始模型模型大小,整体识别精度和在SD 855上预测速度对比 + +| 模型名称 | 整体模型
大小\(M\) | 检测模型
大小\(M\) | 方向分类器
模型大小\(M\) | 识别模型
大小\(M\) | 整体识别
F\-score | SD 855
\(ms\) | +|:-:|:-:|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | 306 | +| ch\_ppocr\_mobile\_slim\_v1\.1 | 3\.5 | 1\.4 | 0\.5 | 1\.6 | 0\.521 | 268 | diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index bee2637094b1386210677788f0944d232f7ff82c..f9c664d4ea38e2e52dc76bfb5b63d9a515b106a7 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -10,7 +10,7 @@ ## 配置文件 Global 参数介绍 -以 `rec_chinese_lite_train.yml` 为例 +以 `rec_chinese_lite_train_v1.1.yml ` 为例 | 字段 | 用途 | 默认值 | 备注 | @@ -32,6 +32,10 @@ | loss_type | 设置 loss 类型 | ctc | 支持两种loss: ctc / attention | | distort | 设置是否使用数据增强 | false | 设置为true时,将在训练时随机进行扰动,支持的扰动操作可阅读[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) | | use_space_char | 设置是否识别空格 | false | 仅在 character_type=ch 时支持空格 | +| label_list | 设置方向分类器支持的角度 | ['0','180'] | 仅在方向分类器中生效 | +| average_window | ModelAverage优化器中的窗口长度计算比例 | 0.15 | 目前仅应用与SRN | +| max_average_window | 平均值计算窗口长度的最大值 | 15625 | 推荐设置为一轮训练中mini-batchs的数目| +| min_average_window | 平均值计算窗口长度的最小值 | 10000 | \ | | reader_yml | 设置reader配置文件 | ./configs/rec/rec_icdar15_reader.yml | \ | | pretrain_weights | 加载预训练模型路径 | ./pretrain_models/CRNN/best_accuracy | \ | | checkpoints | 加载模型参数路径 | None | 用于中断后加载参数继续训练 | @@ -60,6 +64,9 @@ | beta1 | 设置一阶矩估计的指数衰减率 | 0.9 | \ | | beta2 | 设置二阶矩估计的指数衰减率 | 0.999 | \ | | decay | 是否使用decay | \ | \ | -| function(decay) | 设置decay方式 | cosine_decay | 目前只支持cosin_decay | -| step_each_epoch | 每个epoch包含多少次迭代 | 20 | 计算方式:total_image_num / (batch_size_per_card * card_size) | -| total_epoch | 总共迭代多少个epoch | 1000 | 与Global.epoch_num 一致 | +| function(decay) | 设置decay方式 | - | 目前支持cosine_decay, cosine_decay_warmup与piecewise_decay | +| step_each_epoch | 每个epoch包含多少次迭代, cosine_decay/cosine_decay_warmup时有效 | 20 | 计算方式:total_image_num / (batch_size_per_card * card_size) | +| total_epoch | 总共迭代多少个epoch, cosine_decay/cosine_decay_warmup时有效 | 1000 | 与Global.epoch_num 一致 | +| warmup_minibatch | 线性warmup的迭代次数, cosine_decay_warmup时有效 | 1000 | \ | +| boundaries | 学习率下降时的迭代次数间隔, piecewise_decay时有效 | - | 参数为列表形式 | +| decay_rate | 学习率衰减系数, piecewise_decay时有效 | - | \ | diff --git a/doc/doc_ch/customize.md b/doc/doc_ch/customize.md index 6e471c1c4831f0d98f00b811f18ef12b377ecffa..5944bf08e4c4fac0da5a8e939936719e0385f4e1 100644 --- a/doc/doc_ch/customize.md +++ b/doc/doc_ch/customize.md @@ -6,7 +6,7 @@ PaddleOCR提供了EAST、DB两种文本检测算法,均支持MobileNetV3、ResNet50_vd两种骨干网络,根据需要选择相应的配置文件,启动训练。例如,训练使用MobileNetV3作为骨干网络的DB检测模型(即超轻量模型使用的配置): ``` -python3 tools/train.py -c configs/det/det_mv3_db.yml +python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee det_db.log ``` 更详细的数据准备和训练教程参考文档教程中[文本检测模型训练/评估/预测](./detection.md)。 @@ -14,7 +14,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml PaddleOCR提供了CRNN、Rosetta、STAR-Net、RARE四种文本识别算法,均支持MobileNetV3、ResNet34_vd两种骨干网络,根据需要选择相应的配置文件,启动训练。例如,训练使用MobileNetV3作为骨干网络的CRNN识别模型(即超轻量模型使用的配置): ``` -python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml +python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml 2>&1 | tee rec_ch_lite.log ``` 更详细的数据准备和训练教程参考文档教程中[文本识别模型训练/评估/预测](./recognition.md)。 diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index cfff12f976f0a0abe06c37187755d78358e5ca25..3945b7f0b996a18cfdb943c3ec68ce43fa53e6f3 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -1,19 +1,28 @@ # 文字检测 -本节以icdar15数据集为例,介绍PaddleOCR中检测模型的训练、评估与测试。 +本节以icdar2015数据集为例,介绍PaddleOCR中检测模型的训练、评估与测试。 ## 数据准备 icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。 将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/ 下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件 ,您可以通过wget的方式进行下载。 -``` +```shell # 在PaddleOCR路径下 cd PaddleOCR/ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt ``` +PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `train_data/gen_label.py`, 这里以训练集为例: + +``` +# 将官网下载的标签文件转换为 train_icdar2015_label.txt +python gen_label.py --mode="det" --root_path="icdar_c4_train_imgs/" \ + --input_path="ch4_training_localization_transcription_gt" \ + --output_label="train_icdar2015_label.txt" +``` + 解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,分别是: ``` /PaddleOCR/train_data/icdar2015/text_localization/ @@ -23,29 +32,31 @@ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_la └─ test_icdar2015_label.txt icdar数据集的测试标注 ``` -提供的标注文件格式为,其中中间是"\t"分隔: +提供的标注文件格式如下,中间用"\t"分隔: ``` " 图像文件名 json.dumps编码的图像标注信息" -ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}] +ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}] ``` json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 -`transcription` 表示当前文本框的文字,在文本检测任务中并不需要这个信息。 -如果您想在其他数据集上训练PaddleOCR,可以按照上述形式构建标注文件。 +`transcription` 表示当前文本框的文字,**当其内容为“###”时,表示该文本框无效,在训练时会跳过。** +如果您想在其他数据集上训练,可以按照上述形式构建标注文件。 ## 快速启动训练 -首先下载模型backbone的pretrain model,PaddleOCR的检测模型目前支持两种backbone,分别是MobileNetV3、ResNet50_vd, +首先下载模型backbone的pretrain model,PaddleOCR的检测模型目前支持两种backbone,分别是MobileNetV3、ResNet_vd系列, 您可以根据需求使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures)中的模型更换backbone。 -``` +```shell cd PaddleOCR/ # 下载MobileNetV3的预训练模型 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar -# 下载ResNet50的预训练模型 +# 或,下载ResNet18_vd的预训练模型 +wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar +# 或,下载ResNet50_vd的预训练模型 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar # 解压预训练模型文件,以MobileNetV3为例 -tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/ +tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/ # 注:正确解压backbone预训练权重文件后,文件夹下包含众多以网络层命名的权重文件,格式如下: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/ @@ -57,64 +68,67 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models ``` -**启动训练** +#### 启动训练 *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ +```shell +# 训练 mv3_db 模型,并将训练日志保存为 tain_det.log +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml \ + -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ \ + 2>&1 | tee train_det.log ``` -上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。 +上述指令中,通过-c 选择训练使用configs/det/det_db_mv3_v1.1.yml配置文件。 有关配置文件的详细解释,请参考[链接](./config.md)。 您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001 -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 +```shell +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001 ``` -**断点训练** +#### 断点训练 如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径: -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model +```shell +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model ``` -**注意**:Global.checkpoints的优先级高于Global.pretrain_weights的优先级,即同时指定两个参数时,优先加载Global.checkpoints指定的模型,如果Global.checkpoints指定的模型路径有误,会加载Global.pretrain_weights指定的模型。 +**注意**:`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。 ## 指标评估 PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。 -运行如下代码,根据配置文件det_db_mv3.yml中save_res_path指定的测试集检测结果文件,计算评估指标。 +运行如下代码,根据配置文件`det_db_mv3_v1.1.yml`中`save_res_path`指定的测试集检测结果文件,计算评估指标。 -评估时设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 +评估时设置后处理参数`box_thresh=0.6`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化 +```shell +python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` -python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 -``` -训练中模型参数默认保存在Global.save_model_dir目录下。在评估指标时,需要设置Global.checkpoints指向保存的参数文件。 +训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。 比如: -``` -python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 +```shell +python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` -* 注:box_thresh、unclip_ratio是DB后处理所需要的参数,在评估EAST模型时不需要设置 +* 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置 ## 测试检测效果 测试单张图像的检测效果 -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" ``` 测试DB模型时,调整后处理阈值, -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` 测试文件夹下所有图像的检测效果 -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" ``` diff --git a/doc/doc_ch/framework.png b/doc/doc_ch/framework.png new file mode 100644 index 0000000000000000000000000000000000000000..db151a0e161771f33a438d2e98b49777c34950f8 Binary files /dev/null and b/doc/doc_ch/framework.png differ diff --git a/doc/doc_ch/handwritten_datasets.md b/doc/doc_ch/handwritten_datasets.md index 21f68acc1a1974b26fce6778b0d87781207b53da..46e85e4f9dc22e4732f654f9a1ef2a715a498fcf 100644 --- a/doc/doc_ch/handwritten_datasets.md +++ b/doc/doc_ch/handwritten_datasets.md @@ -1,13 +1,28 @@ -## 手写中文OCR数据集 -这里整理了常用手写中文数据集,持续更新中,欢迎各位小伙伴贡献数据集~ +# 手写OCR数据集 +这里整理了常用手写数据集,持续更新中,欢迎各位小伙伴贡献数据集~ - [中科院自动化研究所-手写中文数据集](#中科院自动化研究所-手写中文数据集) +- [NIST手写单字数据集-英文](#NIST手写单字数据集-英文) -#### 1、中科院自动化研究所-手写中文数据集 +## 中科院自动化研究所-手写中文数据集 - **数据来源**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html -- **数据简介**:包含在线和离线两类手写单字数据,包含GB2312-80中的3755个一级汉字,共由720人手写完成。在线部分(HWDB)总共包含约210万个训练样本,53万个测试样本;离线部分(OLHWDB)总共包含约210万个训练样本,53万个测试样本。 - ![](../datasets/CASIA_0.jpg) - (a) 五张单字图片样例 +- **数据简介**: + * 包含在线和离线两类手写数据,`HWDB1.0~1.2`总共有3895135个手写单字样本,分属7356类(7185个汉字和171个英文字母、数字、符号);`HWDB2.0~2.2`总共有5091页图像,分割为52230个文本行和1349414个文字。所有文字和文本样本均存为灰度图像。部分单字样本图片如下所示。 + + ![](../datasets/CASIA_0.jpg) + - **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html - **使用建议**:数据为单字,白色背景,可以大量合成文字行进行训练。白色背景可以处理成透明状态,方便添加各种背景。对于需要语义的情况,建议从真实语料出发,抽取单字组成文字行 + + +## NIST手写单字数据集-英文(NIST Handprinted Forms and Characters Database) + +- **数据来源**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19) + +- **数据简介**: NIST19数据集适用于手写文档和字符识别的模型训练,从3600位作者的手写样本表格中提取得到,总共包含81万张字符图片。其中9张图片示例如下。 + + ![](../datasets/nist_demo.png) + + +- **下载地址**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19) diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index 34c40027fd50c3a920a5401ac917c8262692ff3c..c775de57b8538089f5ac9288dd825e044a183693 100644 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -1,19 +1,48 @@ # 基于Python预测引擎推理 -inference 模型(fluid.io.save_inference_model保存的模型) -一般是模型训练完成后保存的固化模型,多用于预测部署。 -训练过程中保存的模型是checkpoints模型,保存的是模型的参数,多用于恢复训练等。 -与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合与实际系统集成。更详细的介绍请参考文档[分类预测框架](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). +inference 模型(`fluid.io.save_inference_model`保存的模型) +一般是模型训练完成后保存的固化模型,多用于预测部署。训练过程中保存的模型是checkpoints模型,保存的是模型的参数,多用于恢复训练等。 +与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合与实际系统集成。更详细的介绍请参考文档[分类预测框架](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/extension/paddle_inference.md). 接下来首先介绍如何将训练的模型转换成inference模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。 + +- [一、训练模型转inference模型](#训练模型转inference模型) + - [检测模型转inference模型](#检测模型转inference模型) + - [识别模型转inference模型](#识别模型转inference模型) + - [方向分类模型转inference模型](#方向分类模型转inference模型) + +- [二、文本检测模型推理](#文本检测模型推理) + - [1. 超轻量中文检测模型推理](#超轻量中文检测模型推理) + - [2. DB文本检测模型推理](#DB文本检测模型推理) + - [3. EAST文本检测模型推理](#EAST文本检测模型推理) + - [4. SAST文本检测模型推理](#SAST文本检测模型推理) + +- [三、文本识别模型推理](#文本识别模型推理) + - [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理) + - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理) + - [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理) + - [4. 基于SRN损失的识别模型推理](#基于SRN损失的识别模型推理) + - [5. 自定义文本识别字典的推理](#自定义文本识别字典的推理) + - [6. 多语言模型的推理](#多语言模型的推理) + +- [四、方向分类模型推理](#方向识别模型推理) + - [1. 方向分类模型推理](#方向分类模型推理) + +- [五、文本检测、方向分类和文字识别串联推理](#文本检测、方向分类和文字识别串联推理) + - [1. 超轻量中文OCR模型推理](#超轻量中文OCR模型推理) + - [2. 其他模型推理](#其他模型推理) + + + ## 一、训练模型转inference模型 + ### 检测模型转inference模型 下载超轻量级中文检测模型: ``` -wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/ ``` 上述模型是以MobileNetV3为backbone训练的DB算法,将训练好的模型转换成inference模型只需要运行如下命令: ``` @@ -22,22 +51,23 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar & # Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 # Global.save_inference_dir参数设置转换的模型将保存的地址。 -python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/ +python3 tools/export_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/ ``` -转inference模型时,使用的配置文件和训练时使用的配置文件相同。另外,还需要设置配置文件中的Global.checkpoints、Global.save_inference_dir参数。 -其中Global.checkpoints指向训练中保存的模型参数文件,Global.save_inference_dir是生成的inference模型要保存的目录。 -转换成功后,在save_inference_dir 目录下有两个文件: +转inference模型时,使用的配置文件和训练时使用的配置文件相同。另外,还需要设置配置文件中的`Global.checkpoints`、`Global.save_inference_dir`参数。 +其中`Global.checkpoints`指向训练中保存的模型参数文件,`Global.save_inference_dir`是生成的inference模型要保存的目录。 +转换成功后,在`save_inference_dir`目录下有两个文件: ``` inference/det_db/ └─ model 检测inference模型的program文件 └─ params 检测inference模型的参数文件 ``` + ### 识别模型转inference模型 下载超轻量中文识别模型: ``` -wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/ ``` 识别模型转inference模型与检测的方式相同,如下: @@ -47,11 +77,11 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar # Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 # Global.save_inference_dir参数设置转换的模型将保存的地址。 -python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \ +python3 tools/export_model.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy \ Global.save_inference_dir=./inference/rec_crnn/ ``` -如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。 +**注意:**如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的`character_dict_path`是否是所需要的字典文件。 转换成功后,在目录下有两个文件: ``` @@ -60,11 +90,39 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa └─ params 识别inference模型的参数文件 ``` + +### 方向分类模型转inference模型 + +下载方向分类模型: +``` +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/ +``` + +方向分类模型转inference模型与检测的方式相同,如下: +``` +# -c后面设置训练算法的yml配置文件 +# -o配置可选参数 +# Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。 +# Global.save_inference_dir参数设置转换的模型将保存的地址。 + +python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \ + Global.save_inference_dir=./inference/cls/ +``` + +转换成功后,在目录下有两个文件: +``` +/inference/cls/ + └─ model 识别inference模型的program文件 + └─ params 识别inference模型的参数文件 +``` + + ## 二、文本检测模型推理 -下面将介绍超轻量中文检测模型推理、DB文本检测模型推理和EAST文本检测模型推理。默认配置是根据DB文本检测模型推理设置的。由于EAST和DB算法差别很大,在推理时,需要通过传入相应的参数适配EAST文本检测算法。 +文本检测模型推理,默认使用DB模型的配置参数。当不使用DB模型时,在推理时,需要通过传入相应的参数进行算法适配,细节参考下文。 -### 1.超轻量中文检测模型推理 + +### 1. 超轻量中文检测模型推理 超轻量中文检测模型推理,可以执行如下命令: @@ -72,11 +130,11 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" ``` -可视化文本检测结果默认保存到 ./inference_results 文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -![](imgs_results/det_res_2.jpg) +![](../imgs_results/det_res_2.jpg) -通过设置参数det_max_side_len的大小,改变检测算法中图片规范化的最大值。当图片的长宽都小于det_max_side_len,则使用原图预测,否则将图片等比例缩放到最大值,进行预测。该参数默认设置为det_max_side_len=960. 如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以执行如下命令: +通过设置参数`det_max_side_len`的大小,改变检测算法中图片规范化的最大值。当图片的长宽都小于`det_max_side_len`,则使用原图预测,否则将图片等比例缩放到最大值,进行预测。该参数默认设置为`det_max_side_len=960`。 如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以执行如下命令: ``` python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200 @@ -87,7 +145,8 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_di python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False ``` -### 2.DB文本检测模型推理 + +### 2. DB文本检测模型推理 首先将DB文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)),可以使用如下命令进行转换: @@ -105,13 +164,14 @@ DB文本检测模型推理,可以执行如下命令: python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/" ``` -可视化文本检测结果默认保存到 ./inference_results 文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: ![](../imgs_results/det_res_img_10_db.jpg) -**注意**:由于ICDAR2015数据集只有1000张训练图像,主要针对英文场景,所以上述模型对中文文本图像检测效果非常差。 +**注意**:由于ICDAR2015数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文文本图像检测效果会比较差。 -### 3.EAST文本检测模型推理 + +### 3. EAST文本检测模型推理 首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)),可以使用如下命令进行转换: @@ -123,24 +183,59 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east" ``` -EAST文本检测模型推理,需要设置参数det_algorithm,指定检测算法类型为EAST,可以执行如下命令: +**EAST文本检测模型推理,需要设置参数`--det_algorithm="EAST"`**,可以执行如下命令: ``` python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" ``` -可视化文本检测结果默认保存到 ./inference_results 文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: ![](../imgs_results/det_res_img_10_east.jpg) -**注意**:本代码库中EAST后处理中NMS采用的Python版本,所以预测速度比较耗时。如果采用C++版本,会有明显加速。 +**注意**:本代码库中,EAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。 + + + +### 4. SAST文本检测模型推理 +#### (1). 四边形文本检测模型(ICDAR2015) +首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)),可以使用如下命令进行转换: +``` +python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.checkpoints="./models/sast_r50_vd_icdar2015/best_accuracy" Global.save_inference_dir="./inference/det_sast_ic15" +``` +**SAST文本检测模型推理,需要设置参数`--det_algorithm="SAST"`**,可以执行如下命令: +``` +python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/" +``` +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: + +![](../imgs_results/det_res_img_10_sast.jpg) + +#### (2). 弯曲文本检测模型(Total-Text) +首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)),可以使用如下命令进行转换: + +``` +python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.checkpoints="./models/sast_r50_vd_total_text/best_accuracy" Global.save_inference_dir="./inference/det_sast_tt" +``` + +**SAST文本检测模型推理,需要设置参数`--det_algorithm="SAST"`,同时,还需要增加参数`--det_sast_polygon=True`,**可以执行如下命令: +``` +python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_sast_polygon=True +``` +可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: +![](../imgs_results/det_res_img623_sast.jpg) +**注意**:本代码库中,SAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。 + + + ## 三、文本识别模型推理 下面将介绍超轻量中文识别模型推理、基于CTC损失的识别模型推理和基于Attention损失的识别模型推理。对于中文文本识别,建议优先选择基于CTC损失的识别模型,实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。此外,如果训练时修改了文本的字典,请参考下面的自定义文本识别字典的推理。 -### 1.超轻量中文识别模型推理 + +### 1. 超轻量中文识别模型推理 超轻量中文识别模型推理,可以执行如下命令: @@ -155,7 +250,8 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695] -### 2.基于CTC损失的识别模型推理 + +### 2. 基于CTC损失的识别模型推理 我们以STAR-Net为例,介绍基于CTC损失的识别模型推理。 CRNN和Rosetta使用方式类似,不用设置识别算法参数rec_algorithm。 @@ -176,7 +272,8 @@ STAR-Net文本识别模型推理,可以执行如下命令: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` -### 3.基于Attention损失的识别模型推理 + +### 3. 基于Attention损失的识别模型推理 基于Attention损失的识别模型与ctc不同,需要额外设置识别算法参数 --rec_algorithm="RARE" @@ -201,31 +298,95 @@ Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555] self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` + +### 4. 基于SRN损失的识别模型推理 -### 4.自定义文本识别字典的推理 +基于SRN损失的识别模型,需要额外设置识别算法参数 --rec_algorithm="SRN"。 同时需要保证预测shape与训练时一致,如: --rec_image_shape="1, 64, 256" + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \ + --rec_model_dir="./inference/srn/" \ + --rec_image_shape="1, 64, 256" \ + --rec_char_type="en" \ + --rec_algorithm="SRN" +``` + + +### 5. 自定义文本识别字典的推理 如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径 ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" ``` -## 四、文本检测、识别串联推理 + +### 6. 多语言模型的推理 +如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, +需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" +``` +![](../imgs_words/korean/1.jpg) + +执行命令后,上图的预测结果为: +``` text +2020-09-19 16:15:05,076-INFO: index: [205 206 38 39] +2020-09-19 16:15:05,077-INFO: word : 바탕으로 +2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535 +``` + + +## 四、方向分类模型推理 -### 1.超轻量中文OCR模型推理 +下面将介绍方向分类模型推理。 -在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数det_model_dir指定检测inference模型的路径和参数rec_model_dir指定识别inference模型的路径。可视化识别结果默认保存到 ./inference_results 文件夹里面。 + +### 1. 方向分类模型推理 +方向分类模型推理,可以执行如下命令: + +``` +python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="./inference/cls/" ``` -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" + +![](../imgs_words/ch/word_4.jpg) + +执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下: + +Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963] + + +## 五、文本检测、方向分类和文字识别串联推理 + +### 1. 超轻量中文OCR模型推理 + +在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir`和`rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。可视化识别结果默认保存到 ./inference_results 文件夹里面。 + ``` +# 使用方向分类器 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true + +# 不使用方向分类器 +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false +``` + + + + 执行命令后,识别结果图像如下: ![](../imgs_results/2.jpg) -### 2.其他模型推理 + +### 2. 其他模型推理 + +如果想尝试使用其他检测算法或者识别算法,请参考上述文本检测模型推理和文本识别模型推理,更新相应配置和模型。 + +**注意:由于检测框矫正逻辑的局限性,暂不支持使用SAST弯曲文本检测模型(即,使用参数`--det_sast_polygon=True`时)进行模型串联。** -如果想尝试使用其他检测算法或者识别算法,请参考上述文本检测模型推理和文本识别模型推理,更新相应配置和模型,下面给出基于EAST文本检测和STAR-Net文本识别执行命令: +下面给出基于EAST文本检测和STAR-Net文本识别执行命令: ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" diff --git a/doc/doc_ch/installation.md b/doc/doc_ch/installation.md index b379b7af9b4e42262f1e5a27f67ad5db7042a32a..d4b0a67f3cbdcb7d4f3efa6ea44ff881f7598a38 100644 --- a/doc/doc_ch/installation.md +++ b/doc/doc_ch/installation.md @@ -2,15 +2,16 @@ 经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23 PaddleOCR 工作环境 -- PaddlePaddle1.7 -- python3 +- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0.0.beta +- python3.7 - glibc 2.23 +- cuDNN 7.6+ (GPU) -建议使用我们提供的docker运行PaddleOCR,有关docker使用请参考[链接](https://docs.docker.com/get-started/)。 +建议使用我们提供的docker运行PaddleOCR,有关docker、nvidia-docker使用请参考[链接](https://www.runoob.com/docker/docker-tutorial.html/)。 *如您希望使用 mac 或 windows直接运行预测代码,可以从第2步开始执行。* -1. (建议)准备docker环境。第一次使用这个镜像,会自动下载该镜像,请耐心等待。 +**1. (建议)准备docker环境。第一次使用这个镜像,会自动下载该镜像,请耐心等待。** ``` # 切换到工作目录下 cd /home/Projects @@ -20,10 +21,10 @@ cd /home/Projects 如果您希望在CPU环境下使用docker,使用docker而不是nvidia-docker创建docker sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash -如果您的机器安装的是CUDA9,请运行以下命令创建容器 +如果使用CUDA9,请运行以下命令创建容器 sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash -如果您的机器安装的是CUDA10,请运行以下命令创建容器 +如果使用CUDA10,请运行以下命令创建容器 sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash 您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。 @@ -46,24 +47,21 @@ docker images hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 ``` -2. 安装PaddlePaddle Fluid v1.7 +**2. 安装PaddlePaddle Fluid v2.0** ``` pip3 install --upgrade pip -如果您的机器安装的是CUDA9,请运行以下命令安装 -python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple - -如果您的机器安装的是CUDA10,请运行以下命令安装 -python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple +如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 +python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple 如果您的机器是CPU,请运行以下命令安装 -python3 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple +python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple 更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 ``` -3. 克隆PaddleOCR repo代码 +**3. 克隆PaddleOCR repo代码** ``` 【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR @@ -74,8 +72,11 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。 ``` -4. 安装第三方库 +**4. 安装第三方库** ``` cd PaddleOCR pip3 install -r requirments.txt ``` + +注意,windows环境下,建议从[这里](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)下载shapely安装包完成安装, +直接通过pip安装的shapely库可能出现`[winRrror 126] 找不到指定模块的问题`。 diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md new file mode 100644 index 0000000000000000000000000000000000000000..56dfd403a9394ffe2a91273de389716a80be5346 --- /dev/null +++ b/doc/doc_ch/models_list.md @@ -0,0 +1,71 @@ +## OCR模型列表(V1.1,9月22日更新) + +- [一、文本检测模型](#文本检测模型) +- [二、文本识别模型](#文本识别模型) + - [1. 中文识别模型](#中文识别模型) + - [2. 英文识别模型](#英文识别模型) + - [3. 多语言识别模型](#多语言识别模型) +- [三、文本方向分类模型](#文本方向分类模型) + +PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训练模型`、`slim模型`,模型区别说明如下: + +|模型类型|模型格式|简介| +|-|-|-| +|推理模型|model、params|用于python预测引擎推理,[详情](./inference.md)| +|训练模型、预训练模型|\*.pdmodel、\*.pdopt、\*.pdparams|训练过程中保存的checkpoints模型,保存的是模型的参数,多用于模型指标评估和恢复训练| +|slim模型|\*.nb|用于lite部署| + + + +### 一、文本检测模型 +|模型名称|模型简介|配置文件|推理模型大小|下载地址| +|-|-|-|-|-| +|ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|1.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)| +|ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| +|ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[det_r18_vd_db_v1.1.yml](../../configs/det/det_r18_vd_db_v1.1.yml)|47.2M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| + + + +### 二、文本识别模型 + + +#### 1. 中文识别模型 +|模型名称|模型简介|配置文件|推理模型大小|下载地址| +|-|-|-|-|-| +|ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|1.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | +|ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|4.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +|ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml)|105M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | + +**说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。 + + +#### 2. 英文识别模型 +|模型名称|模型简介|配置文件|推理模型大小|下载地址| +|-|-|-|-|-| +|en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|0.9M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb) | +|en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|2.0M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar) | + + +#### 3. 多语言识别模型(更多语言持续更新中...) +|模型名称|模型简介|配置文件|推理模型大小|下载地址| +|-|-|-|-|-| +| french_ppocr_mobile_v1.1_rec |法文识别|[rec_french_lite_train.yml](../../configs/rec/multi_languages/rec_french_lite_train.yml)|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar) | +| german_ppocr_mobile_v1.1_rec |德文识别|[rec_ger_lite_train.yml](../../configs/rec/multi_languages/rec_ger_lite_train.yml)|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar) | +| korean_ppocr_mobile_v1.1_rec |韩文识别|[rec_korean_lite_train.yml](../../configs/rec/multi_languages/rec_korean_lite_train.yml)|3.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar) | +| japan_ppocr_mobile_v1.1_rec |日文识别|[rec_japan_lite_train.yml](../../configs/rec/multi_languages/rec_japan_lite_train.yml)|3.7M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar) | + + + +### 三、文本方向分类模型 +|模型名称|模型简介|配置文件|推理模型大小|下载地址| +|-|-|-|-|-| +|ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|0.5M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | +|ch_ppocr_mobile_v1.1_cls|原始模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|850kb|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | + + +## OCR模型列表(V1.0,7月16日更新) + +|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| +|-|-|-|-|-| +|chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|chinese_db_crnn_server|通用中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar) |[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index fead57f3d12395c6b4a2417fe8a23b1e00a4579b..f411e8faa810c9272635bec7e9fb05351f37f743 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -5,14 +5,19 @@ 请先参考[快速安装](./installation.md)配置PaddleOCR运行环境。 +*注意:也可以通过 whl 包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)。* + ## 2.inference模型下载 -|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| -|-|-|-|-|-| -|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) +* 移动端和服务器端的检测与识别模型如下,更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](./doc_ch/models_list.md) + +| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | +| ------------ | --------------- | ----------------|---- | ---------- | -------- | +| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | -*windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下* + +* windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下 复制上表中的检测和识别的`inference模型`下载地址,并解压 @@ -22,6 +27,8 @@ mkdir inference && cd inference wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} # 下载识别模型并解压 wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} +# 下载方向分类器模型并解压 +wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package} cd .. ``` @@ -30,9 +37,11 @@ cd .. ``` mkdir inference && cd inference # 下载超轻量级中文OCR模型的检测模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar # 下载超轻量级中文OCR模型的识别模型并解压 -wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar +# 下载超轻量级中文OCR模型的文本方向分类器模型并解压 +wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar cd .. ``` @@ -40,10 +49,13 @@ cd .. ``` |-inference - |-ch_rec_mv3_crnn + |-ch_ppocr_mobile_v1.1_det_infer |- model |- params - |-ch_det_mv3_db + |-ch_ppocr_mobile_v1.1_rec_infer + |- model + |- params + |-ch_ppocr_mobile-v1.1_cls_infer |- model |- params ... @@ -51,42 +63,37 @@ cd .. ## 3.单张图像或者图像集合预测 -以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数det_model_dir指定检测inference模型的路径和参数rec_model_dir指定识别inference模型的路径。可视化识别结果默认保存到 ./inference_results 文件夹里面。 +以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数`det_model_dir`指定检测inference模型的路径、参数`rec_model_dir`指定识别inference模型的路径、参数`use_angle_cls`指定是否使用方向分类器、参数`cls_model_dir`指定方向分类器inference模型的路径、参数`use_space_char`指定是否预测空格字符。可视化识别结果默认保存到`./inference_results`文件夹里面。 ```bash # 预测image_dir指定的单张图像 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True # 预测image_dir指定的图像集合 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True # 如果想使用CPU进行预测,需设置use_gpu参数为False -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False ``` - 通用中文OCR模型 请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下: -``` + +```bash # 预测image_dir指定的单张图像 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True ``` -- 支持空格的通用中文OCR模型 - -请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下: - -*注意:请将代码更新到最新版本,并添加参数 `--use_space_char=True` * +* 注意: + - 如果希望使用不支持空格的识别模型,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_space_char=False`。 + - 如果不希望使用方向分类器,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_angle_cls=False`。 -``` -# 预测image_dir指定的单张图像 -python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" --use_space_char=True -``` 更多的文本检测、识别串联推理使用方式请参考文档教程中[基于Python预测引擎推理](./inference.md)。 此外,文档教程中也提供了中文OCR模型的其他预测部署方式: - [基于C++预测引擎推理](../../deploy/cpp_infer/readme.md) -- [服务部署](./serving.md) +- [服务部署](../../deploy/pdserving/readme.md) - [端侧部署](../../deploy/lite/readme.md) diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index b23837bedeae7368f750cbd1f1413c189abd6923..918050ccda2c6203bef0f147483df0bd752d6eb5 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -1,5 +1,24 @@ ## 文字识别 + +- [一、数据准备](#数据准备) + - [数据下载](#数据下载) + - [自定义数据集](#自定义数据集) + - [字典](#字典) + - [支持空格](#支持空格) + +- [二、启动训练](#文本检测模型推理) + - [1. 数据增强](#数据增强) + - [2. 训练](#训练) + - [3. 小语种](#小语种) + +- [三、评估](#评估) + +- [四、预测](#预测) + - [1. 训练引擎预测](#训练引擎预测) + + + ### 数据准备 @@ -13,12 +32,15 @@ PaddleOCR 支持两种数据格式: `lmdb` 用于训练公开数据,调试算 ln -sf /train_data/dataset ``` - + * 数据下载 若您本地没有数据集,可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),下载 benchmark 所需的lmdb格式数据集。 -* 使用自己数据集: +如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。 + + +* 使用自己数据集 若您希望使用自己的数据进行训练,请参考下文组织您的数据。 @@ -26,7 +48,7 @@ ln -sf /train_data/dataset 首先请将训练图片放入同一个文件夹(train_images),并用一个txt文件(rec_gt_train.txt)记录图片路径和标签。 -* 注意: 默认请将图片路径和图片标签用 \t 分割,如用其他方式分割将造成训练报错 +**注意:** 默认请将图片路径和图片标签用 \t 分割,如用其他方式分割将造成训练报错 ``` " 图像文件名 图像标注信息 " @@ -41,12 +63,16 @@ PaddleOCR 提供了一份用于训练 icdar2015 数据集的标签文件,通 wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt # 测试集标签 wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt +``` +PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `train_data/gen_label.py`, 这里以训练集为例: +``` +# 将官网下载的标签文件转换为 rec_gt_label.txt +python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt" ``` 最终训练集应有如下文件结构: - ``` |-train_data |-ic15_data @@ -72,7 +98,7 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t |- word_003.jpg | ... ``` - + - 字典 最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。 @@ -91,21 +117,36 @@ n word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,“and” 将被映射成 [2 5 1] `ppocr/utils/ppocr_keys_v1.txt` 是一个包含6623个字符的中文字典, + `ppocr/utils/ic15_dict.txt` 是一个包含36个字符的英文字典, + +`ppocr/utils/french_dict.txt` 是一个包含118个字符的法文字典 + +`ppocr/utils/japan_dict.txt` 是一个包含4399个字符的法文字典 + +`ppocr/utils/korean_dict.txt` 是一个包含3636个字符的法文字典 + +`ppocr/utils/german_dict.txt` 是一个包含131个字符的法文字典 + + 您可以按需使用。 +目前的多语言模型仍处在demo阶段,会持续优化模型并补充语种,**非常欢迎您为我们提供其他语言的字典和字体**, +如您愿意可将字典文件提交至 [utils](../../ppocr/utils) ,我们会在Repo中感谢您。 + - 自定义字典 如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。 并将 `character_type` 设置为 `ch`。 + - 添加空格类别 如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `true`。 **注意:`use_space_char` 仅在 `character_type=ch` 时生效** - + ### 启动训练 PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例: @@ -126,14 +167,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* ``` -# 设置PYTHONPATH路径 -export PYTHONPATH=$PYTHONPATH:. # GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号 export CUDA_VISIBLE_DEVICES=0,1,2,3 -# 训练icdar15英文数据 -python3 tools/train.py -c configs/rec/rec_icdar15_train.yml +# 训练icdar15英文数据 并将训练日志保存为 tain_rec.log +python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log ``` - + - 数据增强 PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入扰动,请在配置文件中设置 `distort: true`。 @@ -142,20 +181,24 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入 训练过程中每种扰动方式以50%的概率被选择,具体代码实现请参考:[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) -*由于OpenCV的兼容性问题,扰动操作暂时只支持GPU* +*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux* + - 训练 PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy` 。 如果验证集很大,测试将会比较耗时,建议减少评估次数,或训练完再进行评估。 -* 提示: 可通过 -c 参数选择 `configs/rec/` 路径下的多种模型配置进行训练,PaddleOCR支持的识别算法有: +**提示:** 可通过 -c 参数选择 `configs/rec/` 路径下的多种模型配置进行训练,PaddleOCR支持的识别算法有: | 配置文件 | 算法名称 | backbone | trans | seq | pred | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | +| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc | | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc | | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | @@ -165,8 +208,9 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | | rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | +| rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn | -训练中文数据,推荐使用`rec_chinese_lite_train.yml`,如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: +训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: 以 `rec_mv3_none_none_ctc.yml` 为例: ``` @@ -202,8 +246,54 @@ Optimizer: ``` **注意,预测/评估时的配置文件请务必与训练一致。** + +- 小语种 + +PaddleOCR也提供了多语言的, `configs/rec/multi_languages` 路径下的提供了多语言的配置文件,目前PaddleOCR支持的多语言算法有: +| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | + +多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以在 [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) 上下载,提取码:frgi。 + +如您希望在现有模型效果的基础上调优,请参考下列说明修改配置文件: + +以 `rec_french_lite_train` 为例: +``` +Global: + ... + # 添加自定义字典,如修改字典请将路径指向新字典 + character_dict_path: ./ppocr/utils/french_dict.txt + # 训练时添加数据增强 + distort: true + # 识别空格 + use_space_char: true + ... + # 修改reader类型 + reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml + ... +... +``` + +同时需要修改数据读取文件 `rec_french_reader.yml`: + +``` +TrainReader: + ... + # 修改训练数据存放的目录名 + img_set_dir: ./train_data + # 修改 label 文件名称 + label_file_path: ./train_data/french_train.txt + +... +``` + ### 评估 评估数据集可以通过 `configs/rec/rec_icdar15_reader.yml` 修改EvalReader中的 `label_file_path` 设置。 @@ -215,8 +305,10 @@ export CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy ``` + ### 预测 + * 训练引擎的预测 使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。 @@ -240,12 +332,12 @@ infer_img: doc/imgs_words/en/word_1.png word : joint ``` -预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml` 完成了中文模型的训练, +预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml` 完成了中文模型的训练, 您可以使用如下命令进行中文模型预测。 ``` # 预测中文结果 -python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg +python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg ``` 预测图片: diff --git a/doc/doc_ch/tree.md b/doc/doc_ch/tree.md new file mode 100644 index 0000000000000000000000000000000000000000..f730d8f01fae467f49a03d68d931eb4fda526626 --- /dev/null +++ b/doc/doc_ch/tree.md @@ -0,0 +1,208 @@ +# 整体目录结构 + +PaddleOCR 的整体目录结构介绍如下: + +``` +PaddleOCR +├── configs // 配置文件,可通过yml文件选择模型结构并修改超参 +│ ├── cls // 方向分类器相关配置文件 +│ │ ├── cls_mv3.yml // 训练配置相关,包括骨干网络、head、loss、优化器 +│ │ └── cls_reader.yml // 数据读取相关,数据读取方式、数据存储路径 +│ ├── det // 检测相关配置文件 +│ │ ├── det_db_icdar15_reader.yml // 数据读取 +│ │ ├── det_mv3_db.yml // 训练配置 +│ │ ... +│ └── rec // 识别相关配置文件 +│ ├── rec_benchmark_reader.yml // LMDB 格式数据读取相关 +│ ├── rec_chinese_common_train.yml // 通用中文训练配置 +│ ├── rec_icdar15_reader.yml // simple 数据读取相关,包括数据读取函数、数据路径、标签文件 +│ ... +├── deploy // 部署相关 +│ ├── android_demo // android_demo +│ │ ... +│ ├── cpp_infer // C++ infer +│ │ ├── CMakeLists.txt // Cmake 文件 +│ │ ├── docs // 说明文档 +│ │ │ └── windows_vs2019_build.md +│ │ ├── include // 头文件 +│ │ │ ├── clipper.h // clipper 库 +│ │ │ ├── config.h // 预测配置 +│ │ │ ├── ocr_cls.h // 方向分类器 +│ │ │ ├── ocr_det.h // 文字检测 +│ │ │ ├── ocr_rec.h // 文字识别 +│ │ │ ├── postprocess_op.h // 检测后处理 +│ │ │ ├── preprocess_op.h // 检测预处理 +│ │ │ └── utility.h // 工具 +│ │ ├── readme.md // 说明文档 +│ │ ├── ... +│ │ ├── src // 源文件 +│ │ │ ├── clipper.cpp +│ │ │ ├── config.cpp +│ │ │ ├── main.cpp +│ │ │ ├── ocr_cls.cpp +│ │ │ ├── ocr_det.cpp +│ │ │ ├── ocr_rec.cpp +│ │ │ ├── postprocess_op.cpp +│ │ │ ├── preprocess_op.cpp +│ │ │ └── utility.cpp +│ │ └── tools // 编译、执行脚本 +│ │ ├── build.sh // 编译脚本 +│ │ ├── config.txt // 配置文件 +│ │ └── run.sh // 测试启动脚本 +│ ├── docker +│ │ └── hubserving +│ │ ├── cpu +│ │ │ └── Dockerfile +│ │ ├── gpu +│ │ │ └── Dockerfile +│ │ ├── README_cn.md +│ │ ├── README.md +│ │ └── sample_request.txt +│ ├── hubserving // hubserving +│ │ ├── ocr_det // 文字检测 +│ │ │ ├── config.json // serving 配置 +│ │ │ ├── __init__.py +│ │ │ ├── module.py // 预测模型 +│ │ │ └── params.py // 预测参数 +│ │ ├── ocr_rec // 文字识别 +│ │ │ ├── config.json +│ │ │ ├── __init__.py +│ │ │ ├── module.py +│ │ │ └── params.py +│ │ └── ocr_system // 系统预测 +│ │ ├── config.json +│ │ ├── __init__.py +│ │ ├── module.py +│ │ └── params.py +│ ├── imgs // 预测图片 +│ │ ├── cpp_infer_pred_12.png +│ │ └── demo.png +│ ├── ios_demo // ios demo +│ │ ... +│ ├── lite // lite 部署 +│ │ ├── cls_process.cc // 方向分类器数据处理 +│ │ ├── cls_process.h +│ │ ├── config.txt // 检测配置参数 +│ │ ├── crnn_process.cc // crnn数据处理 +│ │ ├── crnn_process.h +│ │ ├── db_post_process.cc // db数据处理 +│ │ ├── db_post_process.h +│ │ ├── Makefile // 编译文件 +│ │ ├── ocr_db_crnn.cc // 串联预测 +│ │ ├── prepare.sh // 数据准备 +│ │ ├── readme.md // 说明文档 +│ │ ... +│ ├── pdserving // pdserving 部署 +│ │ ├── det_local_server.py // 检测 快速版,部署方便预测速度快 +│ │ ├── det_web_server.py // 检测 完整版,稳定性高分布式部署 +│ │ ├── ocr_local_server.py // 检测+识别 快速版 +│ │ ├── ocr_web_client.py // 客户端 +│ │ ├── ocr_web_server.py // 检测+识别 完整版 +│ │ ├── readme.md // 说明文档 +│ │ ├── rec_local_server.py // 识别 快速版 +│ │ └── rec_web_server.py // 识别 完整版 +│ └── slim +│ └── quantization // 量化相关 +│ ├── export_model.py // 导出模型 +│ ├── quant.py // 量化 +│ └── README.md // 说明文档 +├── doc // 文档教程 +│ ... +├── paddleocr.py +├── ppocr // 网络核心代码 +│ ├── data // 数据处理 +│ │ ├── cls // 方向分类器 +│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch +│ │ │ └── randaugment.py // 随机数据增广操作 +│ │ ├── det // 检测 +│ │ │ ├── data_augment.py // 数据增广操作 +│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch +│ │ │ ├── db_process.py // db 数据处理 +│ │ │ ├── east_process.py // east 数据处理 +│ │ │ ├── make_border_map.py // 生成边界图 +│ │ │ ├── make_shrink_map.py // 生成收缩图 +│ │ │ ├── random_crop_data.py // 随机切割 +│ │ │ └── sast_process.py // sast 数据处理 +│ │ ├── reader_main.py // 数据读取器主函数 +│ │ └── rec // 识别 +│ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,包含 LMDB_Reader 和 Simple_Reader +│ │ └── img_tools.py // 数据处理相关,包括数据归一化、扰动 +│ ├── __init__.py +│ ├── modeling // 组网相关 +│ │ ├── architectures // 模型架构,定义模型所需的各个模块 +│ │ │ ├── cls_model.py // 方向分类器 +│ │ │ ├── det_model.py // 检测 +│ │ │ └── rec_model.py // 识别 +│ │ ├── backbones // 骨干网络 +│ │ │ ├── det_mobilenet_v3.py // 检测 mobilenet_v3 +│ │ │ ├── det_resnet_vd.py +│ │ │ ├── det_resnet_vd_sast.py +│ │ │ ├── rec_mobilenet_v3.py // 识别 mobilenet_v3 +│ │ │ ├── rec_resnet_fpn.py +│ │ │ └── rec_resnet_vd.py +│ │ ├── common_functions.py // 公共函数 +│ │ ├── heads // 头函数 +│ │ │ ├── cls_head.py // 分类头 +│ │ │ ├── det_db_head.py // db 检测头 +│ │ │ ├── det_east_head.py // east 检测头 +│ │ │ ├── det_sast_head.py // sast 检测头 +│ │ │ ├── rec_attention_head.py // 识别 attention +│ │ │ ├── rec_ctc_head.py // 识别 ctc +│ │ │ ├── rec_seq_encoder.py // 识别 序列编码 +│ │ │ ├── rec_srn_all_head.py // 识别 srn 相关 +│ │ │ └── self_attention // srn attention +│ │ │ └── model.py +│ │ ├── losses // 损失函数 +│ │ │ ├── cls_loss.py // 方向分类器损失函数 +│ │ │ ├── det_basic_loss.py // 检测基础loss +│ │ │ ├── det_db_loss.py // DB loss +│ │ │ ├── det_east_loss.py // EAST loss +│ │ │ ├── det_sast_loss.py // SAST loss +│ │ │ ├── rec_attention_loss.py // attention loss +│ │ │ ├── rec_ctc_loss.py // ctc loss +│ │ │ └── rec_srn_loss.py // srn loss +│ │ └── stns // 空间变换网络 +│ │ └── tps.py // TPS 变换 +│ ├── optimizer.py // 优化器 +│ ├── postprocess // 后处理 +│ │ ├── db_postprocess.py // DB 后处理 +│ │ ├── east_postprocess.py // East 后处理 +│ │ ├── lanms // lanms 相关 +│ │ │ ... +│ │ ├── locality_aware_nms.py // nms +│ │ └── sast_postprocess.py // sast 后处理 +│ └── utils // 工具 +│ ├── character.py // 字符处理,包括对文本的编码和解码,计算预测准确率 +│ ├── check.py // 参数加载检查 +│ ├── ic15_dict.txt // 英文数字字典,区分大小写 +│ ├── ppocr_keys_v1.txt // 中文字典,用于训练中文模型 +│ ├── save_load.py // 模型保存和加载函数 +│ ├── stats.py // 统计 +│ └── utility.py // 工具函数,包含输入参数是否合法等相关检查工具 +├── README_en.md // 说明文档 +├── README.md +├── requirments.txt // 安装依赖 +├── setup.py // whl包打包脚本 +└── tools // 启动工具 + ├── eval.py // 评估函数 + ├── eval_utils // 评估工具 + │ ├── eval_cls_utils.py // 分类相关 + │ ├── eval_det_iou.py // 检测 iou 相关 + │ ├── eval_det_utils.py // 检测相关 + │ ├── eval_rec_utils.py // 识别相关 + │ └── __init__.py + ├── export_model.py // 导出 infer 模型 + ├── infer // 基于预测引擎预测 + │ ├── predict_cls.py + │ ├── predict_det.py + │ ├── predict_rec.py + │ ├── predict_system.py + │ └── utility.py + ├── infer_cls.py // 基于训练引擎 预测分类 + ├── infer_det.py // 基于训练引擎 预测检测 + ├── infer_rec.py // 基于训练引擎 预测识别 + ├── program.py // 整体流程 + ├── test_hubserving.py + └── train.py // 启动训练 + +``` diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md index 59514aad0885b61721e037a7c5d6b065409391fb..091aa0bc0a107ae26c946dcb1279e6bcefd8210f 100644 --- a/doc/doc_ch/update.md +++ b/doc/doc_ch/update.md @@ -1,4 +1,12 @@ # 更新 +- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941 +- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载) +- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) +- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) +- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) +- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) +- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294) +- 2020.7.23 发布7月21日B站直播课回放和PPT,课节1,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519) - 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统 - 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark - 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具 diff --git a/doc/doc_ch/vertical_and_multilingual_datasets.md b/doc/doc_ch/vertical_and_multilingual_datasets.md new file mode 100644 index 0000000000000000000000000000000000000000..802ade5f8eb3b0d3cc8335034a8fda8821464a8b --- /dev/null +++ b/doc/doc_ch/vertical_and_multilingual_datasets.md @@ -0,0 +1,79 @@ +# 垂类多语言OCR数据集 +这里整理了常用垂类和多语言OCR数据集,持续更新中,欢迎各位小伙伴贡献数据集~ +- [中国城市车牌数据集](#中国城市车牌数据集) +- [银行信用卡数据集](#银行信用卡数据集) +- [验证码数据集-Captcha](#验证码数据集-Captcha) +- [多语言数据集](#多语言数据集) + + + +## 中国城市车牌数据集 + +- **数据来源**:[https://github.com/detectRecog/CCPD](https://github.com/detectRecog/CCPD) + +- **数据简介**: 包含超过25万张中国城市车牌图片及车牌检测、识别信息的标注。包含以下几种不同场景中的车牌图片信息。 + * CCPD-Base: 通用车牌图片 + * CCPD-DB: 车牌区域亮度较亮、较暗或者不均匀 + * CCPD-FN: 车牌离摄像头拍摄位置相对更远或者更近 + * CCPD-Rotate: 车牌包含旋转(水平20\~50度,竖直-10\~10度) + * CCPD-Tilt: 车牌包含旋转(水平15\~45度,竖直15\~45度) + * CCPD-Blur: 车牌包含由于摄像机镜头抖动导致的模糊情况 + * CCPD-Weather: 车牌在雨天、雪天或者雾天拍摄得到 + * CCPD-Challenge: 至今在车牌检测识别任务中最有挑战性的一些图片 + * CCPD-NP: 没有安装车牌的新车图片。 + + ![](../datasets/ccpd_demo.png) + + +- **下载地址** + * 百度云下载地址(提取码是hm0U): [https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw](https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw) + * Google drive下载地址:[https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view](https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view) + + + +## 银行信用卡数据集 + +- **数据来源**: [https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870](https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870) + +- **数据简介**: 训练数据共提供了三类数据 + * 1.招行样卡数据: 包括卡面图片数据及标注数据,总共618张图片 + * 2.单字符数据: 包括图片及标注数据,总共37张图片。 + * 3.仅包含其他银行卡面,不具有更细致的信息,总共50张图片。 + + * demo图片展示如下,标注信息存储在excel表格中,下面的demo图片标注为 + * 前8位卡号:62257583 + * 卡片种类:本行卡 + * 有效期结束:07/41 + * 卡用户拼音:MICHAEL + + ![](../datasets/cmb_demo.jpg) + +- **下载地址**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip) + + + + +## 验证码数据集-Captcha + +- **数据来源**: [https://github.com/lepture/captcha](https://github.com/lepture/captcha) + +- **数据简介**: 这是一个数据合成的工具包,可以根据输入的文本,输出验证码图片,使用该工具包生成几张demo图片如下。 + + ![](../datasets/captcha_demo.png) + +- **下载地址**: 该数据集是生成得到,无下载地址。 + + + + +## 多语言数据集(Multi-lingual scene text detection and recognition) + +- **数据来源**: [https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads) + +- **数据简介**: 多语言检测数据集MLT同时包含了语种识别和检测任务。 + * 在检测任务中,训练集包含10000张图片,共有10种语言,每种语言包含1000张训练图片。测试集包含10000张图片。 + * 在识别任务中,训练集包含111998个样本。 + + +- **下载地址**: 训练集较大,分2部分下载,需要在网站上注册之后才能下载: +[https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads) diff --git a/doc/doc_ch/visualization.md b/doc/doc_ch/visualization.md index 5a711fe93cfd7959731a5ec73cc74120b175347a..cae846816cfdc40ac075ad83b9e5aafca2c3eaf7 100644 --- a/doc/doc_ch/visualization.md +++ b/doc/doc_ch/visualization.md @@ -1,45 +1,43 @@ # 效果展示 -- [超轻量级中文OCR效果展示](#超轻量级中文OCR) -- [通用中文OCR效果展示](#通用中文OCR) -- [支持空格的中文OCR效果展示](#支持空格的中文OCR) - -## 超轻量级中文OCR效果展示 + +## 通用ppocr_server_1.1效果展示
- + + + + + + + +
-
- -
-
- -
+ + +## 超轻量ppocr_mobile_1.0效果展示
- +
- +
- +
-
- -
- -## 通用中文OCR效果展示 + +## 通用ppocr_server_1.0效果展示
@@ -52,16 +50,3 @@
- - -## 支持空格的中文OCR效果展示 - -### 轻量级模型 -
- -
- -### 通用模型 -
- -
diff --git a/doc/doc_ch/whl.md b/doc/doc_ch/whl.md new file mode 100644 index 0000000000000000000000000000000000000000..1b04a9a8a967f39516db0c0f1be3e3842a87278b --- /dev/null +++ b/doc/doc_ch/whl.md @@ -0,0 +1,298 @@ +# paddleocr package使用说明 + +## 快速上手 + +### 安装whl包 + +pip安装 +```bash +pip install paddleocr +``` + +本地构建并安装 +```bash +python3 setup.py bdist_wheel +pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本号 +``` +### 1. 代码使用 + +* 检测+分类+识别全流程 +```python +from paddleocr import PaddleOCR, draw_ocr +# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换 +# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。 +ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path, cls=True) +for line in result: + print(line) + +# 显示结果 +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` +结果可视化 + +
+ +
+ + +* 检测+识别 +```python +from paddleocr import PaddleOCR, draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# 显示结果 +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` +结果可视化 + +
+ +
+ + +* 分类+识别 +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg' +result = ocr.ocr(img_path, det=False, cls=True) +for line in result: + print(line) +``` +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +* 单独执行检测 +```python +from paddleocr import PaddleOCR, draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path, rec=False) +for line in result: + print(line) + +# 显示结果 +from PIL import Image + +image = Image.open(img_path).convert('RGB') +im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` +结果是一个list,每个item只包含文本框 +```bash +[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]] +[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]] +[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]] +...... +``` +结果可视化 + + +
+ +
+ +* 单独执行识别 +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg' +result = ocr.ocr(img_path, det=False) +for line in result: + print(line) +``` +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +* 单独执行分类 +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg' +result = ocr.ocr(img_path, det=False, rec=False, cls=True) +for line in result: + print(line) +``` +结果是一个list,每个item只包含分类结果和分类置信度 +```bash +['0', 0.9999924] +``` + +### 通过命令行使用 + +查看帮助信息 +```bash +paddleocr -h +``` + +* 检测+分类+识别全流程 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` + +* 检测+识别 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` + +* 分类+识别 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false +``` + +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +* 单独执行检测 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false +``` +结果是一个list,每个item只包含文本框 +```bash +[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]] +[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]] +[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]] +...... +``` + +* 单独执行识别 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false +``` + +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +* 单独执行分类 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false +``` + +结果是一个list,每个item只包含分类结果和分类置信度 +```bash +['0', 0.9999924] +``` + +## 自定义模型 +当内置模型无法满足需求时,需要使用到自己训练的模型。 +首先,参照[inference.md](./inference.md) 第一节转换将检测、分类和识别模型转换为inference模型,然后按照如下方式使用 + +### 代码使用 +```python +from paddleocr import PaddleOCR, draw_ocr +# 模型路径下必须含有model和params文件 +ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True) +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path, cls=True) +for line in result: + print(line) + +# 显示结果 +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +### 通过命令行使用 + +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true +``` + +## 参数说明 + +| 字段 | 说明 | 默认值 | +|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------| +| use_gpu | 是否使用GPU | TRUE | +| gpu_mem | 初始化占用的GPU内存大小 | 8000M | +| image_dir | 通过命令行调用时执行预测的图片或文件夹路径 | | +| det_algorithm | 使用的检测算法类型 | DB | +| det_model_dir | 检测模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/det`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | +| det_max_side_len | 检测算法前向时图片长边的最大尺寸,当长边超出这个值时会将长边resize到这个大小,短边等比例缩放 | 960 | +| det_db_thresh | DB模型输出预测图的二值化阈值 | 0.3 | +| det_db_box_thresh | DB模型输出框的阈值,低于此值的预测框会被丢弃 | 0.5 | +| det_db_unclip_ratio | DB模型输出框扩大的比例 | 2 | +| det_east_score_thresh | EAST模型输出预测图的二值化阈值 | 0.8 | +| det_east_cover_thresh | EAST模型输出框的阈值,低于此值的预测框会被丢弃 | 0.1 | +| det_east_nms_thresh | EAST模型输出框NMS的阈值 | 0.2 | +| rec_algorithm | 使用的识别算法类型 | CRNN | +| rec_model_dir | 识别模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | +| rec_image_shape | 识别算法的输入图片尺寸 | "3,32,320" | +| rec_char_type | 识别算法的字符类型,中英文(ch)、英文(en)、法语(french)、德语(german)、韩语(korean)、日语(japan) | ch | +| rec_batch_num | 进行识别时,同时前向的图片数 | 30 | +| max_text_length | 识别算法能识别的最大文字长度 | 25 | +| rec_char_dict_path | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径 | ./ppocr/utils/ppocr_keys_v1.txt | +| use_space_char | 是否识别空格 | TRUE | +| use_angle_cls | 是否加载分类模型 | FALSE | +| cls_model_dir | 分类模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/cls`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | +| cls_image_shape | 分类算法的输入图片尺寸 | "3, 48, 192" | +| label_list | 分类算法的标签列表 | ['0', '180'] | +| cls_batch_num | 进行分类时,同时前向的图片数 |30 | +| enable_mkldnn | 是否启用mkldnn | FALSE | +| use_zero_copy_run | 是否通过zero_copy_run的方式进行前向 | FALSE | +| lang | 模型语言类型,目前支持 中文(ch)和英文(en) | ch | +| det | 前向时使用启动检测 | TRUE | +| rec | 前向时是否启动识别 | TRUE | +| cls | 前向时是否启动分类 | FALSE | diff --git a/doc/doc_en/FAQ_en.md b/doc/doc_en/FAQ_en.md index 04feb363777801088efa0425195afd9e065a5b1e..25777be77b6393c09c38e3c319ca1bd50cc3b1e8 100644 --- a/doc/doc_en/FAQ_en.md +++ b/doc/doc_en/FAQ_en.md @@ -45,9 +45,12 @@ At present, the open source model, dataset and magnitude are as follows: Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc. 10. **Error in using the model with TPS module for prediction** -Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100) +Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3]\(108) != Grid dimension[2]\(100) Solution:TPS does not support variable shape. Please set --rec_image_shape='3,32,100' and --rec_char_type='en' -11. **Custom dictionary used during training, the recognition results show that words do not appear in the dictionary** +11. **Custom dictionary used during training, the recognition results show that words do not appear in the dictionary** +The used custom dictionary path is not set when making prediction. The solution is setting parameter `rec_char_dict_path` to the corresponding dictionary file. -The used custom dictionary path is not set when making prediction. The solution is setting parameter `rec_char_dict_path` to the corresponding dictionary file. \ No newline at end of file + +12. **Results of cpp_infer and python_inference are very different** +Versions of exprted inference model and inference libraray should be same. For example, on Windows platform, version of the inference libraray that PaddlePaddle provides is 1.8, but version of the inference model that PaddleOCR provides is 1.7, you should export model yourself(`tools/export_model.py`) on PaddlePaddle1.8 and then use the exported model for inference. diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md new file mode 100644 index 0000000000000000000000000000000000000000..2e21fd621971e062384a9323e79a8cf4498d7495 --- /dev/null +++ b/doc/doc_en/algorithm_overview_en.md @@ -0,0 +1,66 @@ + +## Algorithm introduction + +This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v1.1 models list](./models_list_en.md). + + +- [1. Text Detection Algorithm](#TEXTDETECTIONALGORITHM) +- [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM) + + +### 1. Text Detection Algorithm + +PaddleOCR open source text detection algorithms list: +- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) +- [x] DB([paper](https://arxiv.org/abs/1911.08947)) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research) + +On the ICDAR2015 dataset, the text detection result is as follows: + +|Model|Backbone|precision|recall|Hmean|Download link| +|-|-|-|-|-|-| +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| + +On Total-Text dataset, the text detection result is as follows: + +|Model|Backbone|precision|recall|Hmean|Download link| +|-|-|-|-|-|-| +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| + +**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). + +For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) + + +### 2. Text Recognition Algorithm + +PaddleOCR open-source text recognition algorithms list: +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) +- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) +- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) +- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) + +Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: + +|Model|Backbone|Avg Accuracy|Module combination|Download link| +|-|-|-|-|-| +|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| +|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| +|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| +|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| +|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| +|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| + +**Note:** SRN model uses data expansion method to expand the two training sets mentioned above, and the expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry). + +The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar). + +Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) diff --git a/doc/doc_en/android_demo_en.md b/doc/doc_en/android_demo_en.md new file mode 100644 index 0000000000000000000000000000000000000000..bab4f5275c785d0dbbc1456ae29b98ff82cbae1a --- /dev/null +++ b/doc/doc_en/android_demo_en.md @@ -0,0 +1,60 @@ +# Android Demo quick start + +### 1. Install the latest version of Android Studio + +It can be downloaded from https://developer.android.com/studio . This Demo is written by Android Studio version 4.0. + +### 2. Create a new project + +The NDK version 20b is used in the demo test, and the compilation can be successfully supported for version 20 and above. + +If you are a beginner, you can install and test the NDK compilation environment in the following ways. + +File -> New ->New Project to create "Native C++" project + +1. Start a new Android Studio project + + Select Native C++ in the project template, select Paddle OCR/deploy/android_demo path + After entering the project, it will be automatically compiled. The first compilation + will take a long time. It is recommended to add an agent to speed up the download. + +**Agent add:** + + Android Studio -> Perferences -> Appearance & Behavior -> System Settings -> HTTP Proxy -> Manual proxy configuration + +![](../demo/proxy.png) + +2. Start compilation + +Click the compile button, connect the phone, and follow the instructions of Android Studio to complete the operation. + +When you see the following picture in Android Studio, the compilation is complete: + +![](../demo/build.png) + +**Tip:** At this time, if the following error message that OpenCV cannot be found appears, please re-click compile, +exit the project after compiling, and enter again. + +![](../demo/error.png) + +### 3. Send to mobile + +Complete the compilation, click Run, and check the effect on the mobile phone. + +### 4. How to customize the demo picture + +1. Image storage path: android_demo/app/src/main/assets/images + + Place the custom picture under this path + +2. Configuration file: android_demo/app/src/main/res/values/strings.xml + + Modify IMAGE_PATH_DEFAULT to a custom picture name + +# Get more support + +Go to [EasyEdge](https://ai.baidu.com/easyedge/app/open_source_demo?referrerUrl=paddlelite) to get more development support: + +- Demo APP: You can use your mobile phone to scan the code to install, which is convenient for the mobile terminal to quickly experience text recognition + +- SDK: The model is packaged to adapt to different chip hardware and operating system SDKs, including a complete interface to facilitate secondary development diff --git a/doc/doc_en/angle_class_en.md b/doc/doc_en/angle_class_en.md new file mode 100644 index 0000000000000000000000000000000000000000..c7fff3a1833570cda7687b87efb7c3af2ec49120 --- /dev/null +++ b/doc/doc_en/angle_class_en.md @@ -0,0 +1,126 @@ +## TEXT ANGLE CLASSIFICATION + +### DATA PREPARATION + +Please organize the dataset as follows: + +The default storage path for training data is `PaddleOCR/train_data/cls`, if you already have a dataset on your disk, just create a soft link to the dataset directory: + +``` +ln -sf /train_data/cls/dataset +``` + +please refer to the following to organize your data. + +- Training set + +First put the training images in the same folder (train_images), and use a txt file (cls_gt_train.txt) to store the image path and label. + +* Note: by default, the image path and image label are split with `\t`, if you use other methods to split, it will cause training error + +0 and 180 indicate that the angle of the image is 0 degrees and 180 degrees, respectively. + +``` +" Image file name Image annotation " + +train_data/word_001.jpg 0 +train_data/word_002.jpg 180 +``` + +The final training set should have the following file structure: + +``` +|-train_data + |-cls + |- cls_gt_train.txt + |- train + |- word_001.png + |- word_002.jpg + |- word_003.jpg + | ... +``` + +- Test set + +Similar to the training set, the test set also needs to be provided a folder +containing all images (test) and a cls_gt_test.txt. The structure of the test set is as follows: + +``` +|-train_data + |-cls + |- cls_gt_test.txt + |- test + |- word_001.jpg + |- word_002.jpg + |- word_003.jpg + | ... +``` + +### TRAINING + +PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. + +Start training: + +``` +# Set PYTHONPATH path +export PYTHONPATH=$PYTHONPATH:. +# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES +export CUDA_VISIBLE_DEVICES=0,1,2,3 +# Training icdar15 English data +python3 tools/train.py -c configs/cls/cls_mv3.yml +``` + +- Data Augmentation + +PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file. + +The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, RandAugment. + +Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: +[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py) +[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) + + +- Training + +PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/cls_mv3/best_accuracy` during the evaluation process. + +If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training. + +**Note that the configuration file for prediction/evaluation must be consistent with the training.** + +### EVALUATION + +The evaluation data set can be modified via `configs/cls/cls_reader.yml` setting of `label_file_path` in EvalReader. + +``` +export CUDA_VISIBLE_DEVICES=0 +# GPU evaluation, Global.checkpoints is the weight to be tested +python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy +``` + +### PREDICTION + +* Training engine prediction + +Using the model trained by paddleocr, you can quickly get prediction through the following script. + +The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`: + +``` +# Predict English results +python3 tools/infer_rec.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg +``` + +Input image: + +![](../imgs_words/en/word_1.png) + +Get the prediction result of the input image: + +``` +infer_img: doc/imgs_words/en/word_1.png + scores: [[0.93161047 0.06838956]] + label: [0] +``` diff --git a/doc/doc_en/benchmark_en.md b/doc/doc_en/benchmark_en.md new file mode 100644 index 0000000000000000000000000000000000000000..91b015941924add81f8b4f0d9d9ca13274348131 --- /dev/null +++ b/doc/doc_en/benchmark_en.md @@ -0,0 +1,56 @@ +# BENCHMARK + +This document gives the performance of the series models for Chinese and English recognition. + +## TEST DATA + +We collected 300 images for different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meter, etc. The following figure shows some images of the test set. + +
+ +
+ +## MEASUREMENT + +Explanation: +- v1.0 indicates DB+CRNN models without the strategies. v1.1 indicates the PP-OCR models with the strategies and the direction classify. slim_v1.1 indicates the PP-OCR models with prunner or quantization. + +- The long size of the input for the text detector is 960. + +- The evaluation time-consuming stage is the complete stage from image input to result output, including image pre-processing and post-processing. + +- ```Intel Xeon 6148``` is the server-side CPU model. Intel MKL-DNN is used in the test to accelerate the CPU prediction speed. + +- ```Snapdragon 855``` is a mobile processing platform model. + +Compares the model size and F-score: + +| Model Name | Model Size
of the
Whole System\(M\) | Model Size
of the Text
Detector\(M\) | Model Size
of the Direction
Classifier\(M\) | Model Size
of the Text
Recognizer \(M\) | F\-score | +|:-:|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | +| ch\_ppocr\_server\_v1\.1 | 155\.1 | 47\.2 | 0\.9 | 107 | 0\.5414 | +| ch\_ppocr\_mobile\_v1\.0 | 8\.6 | 4\.1 | \- | 4\.5 | 0\.393 | +| ch\_ppocr\_server\_v1\.0 | 203\.8 | 98\.5 | \- | 105\.3 | 0\.4436 | + +Compares the time-consuming on T4 GPU (ms): + +| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer | +|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 137 | 35 | 24 | 78 | +| ch\_ppocr\_server\_v1\.1 | 204 | 39 | 25 | 140 | +| ch\_ppocr\_mobile\_v1\.0 | 117 | 41 | \- | 76 | +| ch\_ppocr\_server\_v1\.0 | 199 | 52 | \- | 147 | + +Compares the time-consuming on CPU (ms): + +| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer | +|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 421 | 164 | 51 | 206 | +| ch\_ppocr\_mobile\_v1\.0 | 398 | 219 | \- | 179 | + +Compares the model size, F-score, the time-consuming on SD 855 of between the slim models and the original models: + +| Model Name | Model Size
of the
Whole System\(M\) | Model Size
of the Text
Detector\(M\) | Model Size
of the Direction
Classifier\(M\) | Model Size
of the Text
Recognizer \(M\) | F\-score | SD 855
\(ms\) | +|:-:|:-:|:-:|:-:|:-:|:-:|:-:| +| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | 306 | +| ch\_ppocr\_mobile\_slim\_v1\.1 | 3\.5 | 1\.4 | 0\.5 | 1\.6 | 0\.521 | 268 | diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index ffead1ee335a3d2f10491792a23a69e0f22a1755..722da6620fd03912c48a47679e7c13a23f15752e 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -10,7 +10,7 @@ The following list can be viewed via `--help` ## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE -Take `rec_chinese_lite_train.yml` as an example +Take `rec_chinese_lite_train_v1.1.yml` as an example | Parameter | Use | Default | Note | @@ -32,6 +32,7 @@ Take `rec_chinese_lite_train.yml` as an example | loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention | | distort | Set use distort | false | Support distort type ,read [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) | | use_space_char | Wether to recognize space | false | Only support in character_type=ch mode | + label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in the direction classifier | | reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ | | pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ | | checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption | @@ -60,6 +61,9 @@ Take `rec_icdar15_train.yml` as an example: | beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | \ | | beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | \ | | decay | Whether to use decay | \ | \ | -| function(decay) | Set the decay function | cosine_decay | Only support cosine_decay | -| step_each_epoch | The number of steps in an epoch. | 20 | Calculation :total_image_num / (batch_size_per_card * card_size) | -| total_epoch | The number of epochs | 1000 | Consistent with Global.epoch_num | +| function(decay) | Set the decay function | cosine_decay | Support cosine_decay, cosine_decay_warmup and piecewise_decay | +| step_each_epoch | The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup | 20 | Calculation: total_image_num / (batch_size_per_card * card_size) | +| total_epoch | The number of epochs. Used in cosine_decay/cosine_decay_warmup | 1000 | Consistent with Global.epoch_num | +| warmup_minibatch | Number of steps for linear warmup. Used in cosine_decay_warmup | 1000 | \ | +| boundaries | The step intervals to reduce learning rate. Used in piecewise_decay | - | The format is list | +| decay_rate | Learning rate decay rate. Used in piecewise_decay | - | \ | diff --git a/doc/doc_en/customize_en.md b/doc/doc_en/customize_en.md index b63de67c6226abbb5b4fb8d0ed57c19142307203..fb47c14f3346e918f32950c8eec5ada76345ce59 100644 --- a/doc/doc_en/customize_en.md +++ b/doc/doc_en/customize_en.md @@ -6,7 +6,7 @@ The process of making a customized ultra-lightweight OCR models can be divided i PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks, select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model : ``` -python3 tools/train.py -c configs/det/det_mv3_db.yml +python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee det_db.log ``` For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md) @@ -14,7 +14,7 @@ For more details about data preparation and training tutorials, refer to the doc PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd, select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network: ``` -python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml +python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml 2>&1 | tee rec_ch_lite.log ``` For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md) diff --git a/doc/doc_en/data_annotation_en.md b/doc/doc_en/data_annotation_en.md new file mode 100644 index 0000000000000000000000000000000000000000..176aa6d53be69771edd4187c9693fd39179d112f --- /dev/null +++ b/doc/doc_en/data_annotation_en.md @@ -0,0 +1,20 @@ +# DATA ANNOTATION TOOLS +There are the commonly used data annotation tools, which will be continuously updated. Welcome to contribute tools~ + +### 1. labelImg +- Tool description: Rectangular label +- Tool address: https://github.com/tzutalin/labelImg +- Sketch diagram: +![labelimg](../datasets/labelimg.jpg) + +### 2. roLabelImg +- Tool description: Label tool rewritten based on labelImg, supporting rotating rectangular label +- Tool address: https://github.com/cgvict/roLabelImg +- Sketch diagram: +![roLabelImg](../datasets/roLabelImg.png) + +### 3. labelme +- Tool description: Support four points, polygons, circles and other labels +- Tool address: https://github.com/wkentaro/labelme +- Sketch diagram: +![labelme](../datasets/labelme.jpg) diff --git a/doc/doc_en/data_synthesis_en.md b/doc/doc_en/data_synthesis_en.md new file mode 100644 index 0000000000000000000000000000000000000000..81b19d538d7c680ff0a73d7bff204ca667fdc552 --- /dev/null +++ b/doc/doc_en/data_synthesis_en.md @@ -0,0 +1,11 @@ +# DATA SYNTHESIS TOOLS + +In addition to open source data, users can also use synthesis tools to synthesize data. +There are the commonly used data synthesis tools, which will be continuously updated. Welcome to contribute tools~ + +* [Text_renderer](https://github.com/Sanster/text_renderer) +* [SynthText](https://github.com/ankush-me/SynthText) +* [SynthText_Chinese_version](https://github.com/JarveeLee/SynthText_Chinese_version) +* [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) +* [SynthText3D](https://github.com/MhLiao/SynthText3D) +* [UnrealText](https://github.com/Jyouhou/UnrealText/) diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 6bb496c91c32702eb408ea5afcfded33301d7535..96928364486c3aa122806ad5144681dfde785556 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -1,12 +1,12 @@ # TEXT DETECTION -This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. +This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. ## DATA PREPARATION The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget: -``` +```shell # Under the PaddleOCR path cd PaddleOCR/ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt @@ -22,29 +22,34 @@ After decompressing the data set and downloading the annotation file, PaddleOCR/ └─ test_icdar2015_label.txt Test annotation of icdar dataset ``` -The provided annotation file format is as follow: +The provided annotation file format is as follow, seperated by "\t": ``` " Image file name Image annotation information encoded by json.dumps" -ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}] +ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}] ``` -The image annotation after json.dumps() encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. +The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries. + +The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. + +`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.** -`transcription` represents the text of the current text box, and this information is not needed in the text detection task. -If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format. +If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format. ## TRAINING -First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs. -``` +First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs. +```shell cd PaddleOCR/ # Download the pre-trained model of MobileNetV3 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar -# Download the pre-trained model of ResNet50 +# or, download the pre-trained model of ResNet18_vd +wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar +# or, download the pre-trained model of ResNet50_vd wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar # decompressing the pre-training model file, take MobileNetV3 as an example -tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/ +tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/ # Note: After decompressing the backbone pre-training weight file correctly, the file list in the folder is as follows: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/ @@ -56,64 +61,65 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models ``` -**START TRAINING** -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml +#### START TRAINING +*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.* +```shell +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml 2>&1 | tee train_det.log ``` -In the above instruction, use `-c` to select the training to use the configs/det/det_db_mv3.yml configuration file. -For a detailed explanation of the configuration file, please refer to [link](./config_en.md). +In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file. +For a detailed explanation of the configuration file, please refer to [config](./config_en.md). -You can also use the `-o` parameter to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 +You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 +```shell +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001 ``` -**load trained model and conntinue training** -If you expect to load trained model and continue the training again, you can specify the `Global.checkpoints` parameter as the model path to be loaded. +#### load trained model and continue training +If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded. For example: -``` -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model +```shell +python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model ``` -**Note**:The priority of Global.checkpoints is higher than the priority of Global.pretrain_weights, that is, when two parameters are specified at the same time, the model specified by Global.checkpoints will be loaded first. If the model path specified by Global.checkpoints is wrong, the one specified by Global.pretrain_weights will be loaded. +**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded. ## EVALUATION PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. -Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml` +Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3_v1.1.yml` -When evaluating, set post-processing parameters box_thresh=0.6, unclip_ratio=1.5. If you use different datasets, different models for training, these two parameters should be adjusted for better result. +When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result. +```shell +python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` -python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 -``` -The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set Global.checkpoints to point to the saved parameter file. +The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. Such as: -``` -python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 +```shell +python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` -* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. +* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. -## TEST DETECTION RESULT +## TEST Test the detection result on a single image: -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" ``` When testing the DB model, adjust the post-processing threshold: -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` Test the detection result on all images in the folder: -``` -python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" +```shell +python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" ``` diff --git a/doc/doc_en/handwritten_datasets_en.md b/doc/doc_en/handwritten_datasets_en.md new file mode 100644 index 0000000000000000000000000000000000000000..da6008a2acfa684dd6efe37fda42eb0e2c7eb97a --- /dev/null +++ b/doc/doc_en/handwritten_datasets_en.md @@ -0,0 +1,28 @@ +# Handwritten OCR dataset +Here we have sorted out the commonly used handwritten OCR dataset datasets, which are being updated continuously. We welcome you to contribute datasets ~ +- [Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset](#Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset) +- [NIST handwritten single character dataset - English](#NIST handwritten single character dataset - English) + + +## Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset +- **Data source**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html +- **Data introduction**: + * It includes online and offline handwritten data,`HWDB1.0~1.2` has totally 3895135 handwritten single character samples, which belong to 7356 categories (7185 Chinese characters and 171 English letters, numbers and symbols);`HWDB2.0~2.2` has totally 5091 pages of images, which are divided into 52230 text lines and 1349414 words. All text and text samples are stored as grayscale images. Some sample words are shown below. + + ![](../datasets/CASIA_0.jpg) + +- **Download address**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html +- **使用建议**:Data for single character, white background, can form a large number of text lines for training. White background can be processed into transparent state, which is convenient to add various backgrounds. For the case of semantic needs, it is suggested to extract single character from real corpus to form text lines. + + + +## NIST handwritten single character dataset - English(NIST Handprinted Forms and Characters Database) + +- **Data source**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19) + +- **Data introduction**: NIST19 dataset is suitable for handwritten document and character recognition model training. It is extracted from the handwritten sample form of 3600 authors and contains 810000 character images in total. Nine of them are shown below. + + ![](../datasets/nist_demo.png) + + +- **Download address**: [https://www.nist.gov/srd/nist-special-database-19](https://www.nist.gov/srd/nist-special-database-19) diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index 0fd7a3725df653b29703a23618675fbc72a6e342..868d579369a7d78a419b978895f4244ae00acc7d 100644 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -1,25 +1,62 @@ -# PREDICTION FROM INFERENCE MODEL +# Reasoning based on Python prediction engine -The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment. +The inference model (the model saved by `fluid.io.save_inference_model`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment. The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training. -Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). +Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/extension/paddle_inference.md). Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model. +- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT) + - [Convert detection model to inference model](#Convert_detection_model) + - [Convert recognition model to inference model](#Convert_recognition_model) + - [Convert angle classification model to inference model](#Convert_angle_class_model) + + +- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE) + - [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION) + - [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION) + - [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION) + - [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION) + - [5. Multilingual model inference](#Multilingual model inference) + +- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE) + - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION) + - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION) + - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION) + - [4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION) + - [5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS) + - [6. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE) + +- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) + - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) + +- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION) + - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL) + - [2. OTHER MODELS](#OTHER_MODELS) + + ## CONVERT TRAINING MODEL TO INFERENCE MODEL + ### Convert detection model to inference model Download the lightweight Chinese detection model: ``` -wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/ ``` + The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command: ``` -python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/ +# -c Set the training algorithm yml configuration file +# -o Set optional parameters +# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# Global.save_inference_dir Set the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/ ``` + When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file. `Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved. After the conversion is successful, there are two files in the `save_inference_dir` directory: @@ -29,17 +66,22 @@ inference/det_db/ └─ params Check the parameter file of the inference model ``` + ### Convert recognition model to inference model Download the lightweight Chinese recognition model: ``` -wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/ ``` The recognition model is converted to the inference model in the same way as the detection, as follows: ``` -python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \ - Global.save_inference_dir=./inference/rec_crnn/ +# -c Set the training algorithm yml configuration file +# -o Set optional parameters +# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# Global.save_inference_dir Set the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy \ ``` If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path. @@ -51,10 +93,40 @@ After the conversion is successful, there are two files in the directory: └─ params Identify the parameter files of the inference model ``` + +### Convert angle classification model to inference model + +Download the angle classification model: +``` +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/ +``` + +The angle classification model is converted to the inference model in the same way as the detection, as follows: +``` +# -c Set the training algorithm yml configuration file +# -o Set optional parameters +# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# Global.save_inference_dir Set the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \ + Global.save_inference_dir=./inference/cls/ +``` + +After the conversion is successful, there are two files in the directory: +``` +/inference/cls/ + └─ model Identify the saved model files + └─ params Identify the parameter files of the inference model +``` + + + ## TEXT DETECTION MODEL INFERENCE -The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters. +The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. +Because EAST and DB algorithms are very different, when inference, it is necessary to **adapt the EAST text detection algorithm by passing in corresponding parameters**. + ### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE For lightweight Chinese detection model inference, you can execute the following commands: @@ -78,6 +150,7 @@ If you want to use the CPU for prediction, execute the command as follows python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False ``` + ### 2. DB TEXT DETECTION MODEL INFERENCE First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert: @@ -102,6 +175,7 @@ The visualized text detection results are saved to the `./inference_results` fol **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. + ### 3. EAST TEXT DETECTION MODEL INFERENCE First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert: @@ -114,23 +188,64 @@ First, convert the model saved in the EAST text detection training process into python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east" ``` -For EAST text detection model inference, you need to set the parameter det_algorithm, specify the detection algorithm type to EAST, run the following command: +**For EAST text detection model inference, you need to set the parameter ``--det_algorithm="EAST"``**, run the following command: ``` python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" ``` + The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: ![](../imgs_results/det_res_img_10_east.jpg) -**Note**: The Python version of NMS in EAST post-processing used in this codebase so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup. +**Note**: EAST post-processing locality aware NMS has two versions: Python and C++. The speed of C++ version is obviously faster than that of Python version. Due to the compilation version problem of NMS of C++ version, C++ version NMS will be called only in Python 3.5 environment, and python version NMS will be called in other cases. + + + +### 4. SAST TEXT DETECTION MODEL INFERENCE +#### (1). Quadrangle text detection model (ICDAR2015) +First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)), you can use the following command to convert: + +``` +python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.checkpoints="./models/sast_r50_vd_icdar2015/best_accuracy" Global.save_inference_dir="./inference/det_sast_ic15" +``` + +**For SAST quadrangle text detection model inference, you need to set the parameter `--det_algorithm="SAST"`**, run the following command: + +``` +python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/" +``` + +The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: + +![](../imgs_results/det_res_img_10_sast.jpg) + +#### (2). Curved text detection model (Total-Text) +First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), you can use the following command to convert: + +``` +python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.checkpoints="./models/sast_r50_vd_total_text/best_accuracy" Global.save_inference_dir="./inference/det_sast_tt" +``` + +**For SAST curved text detection model inference, you need to set the parameter `--det_algorithm="SAST"` and `--det_sast_polygon=True`**, run the following command: +``` +python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_sast_polygon=True +``` + +The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: +![](../imgs_results/det_res_img623_sast.jpg) + +**Note**: SAST post-processing locality aware NMS has two versions: Python and C++. The speed of C++ version is obviously faster than that of Python version. Due to the compilation version problem of NMS of C++ version, C++ version NMS will be called only in Python 3.5 environment, and python version NMS will be called in other cases. + + ## TEXT RECOGNITION MODEL INFERENCE The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the characters dictionary is modified during training, make sure that you use the same characters set during inferencing. Please check below for details. + ### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE For lightweight Chinese recognition model inference, you can execute the following commands: @@ -146,6 +261,7 @@ After executing the command, the prediction results (recognized text and score) Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695] + ### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`. @@ -166,6 +282,7 @@ For STAR-Net text recognition model inference, execute the following commands: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` + ### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE ![](../imgs_words_en/word_336.png) @@ -184,30 +301,97 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` -### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY + +### 4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE + +The recognition model based on SRN requires additional setting of the recognition algorithm parameter --rec_algorithm="SRN". +At the same time, it is necessary to ensure that the predicted shape is consistent with the training, such as: --rec_image_shape="1, 64, 256" + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \ + --rec_model_dir="./inference/srn/" \ + --rec_image_shape="1, 64, 256" \ + --rec_char_type="en" \ + --rec_algorithm="SRN" +``` + + + +### 5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" ``` -## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION + +### 6. MULTILINGAUL MODEL INFERENCE +If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, +You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/ utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" +``` +![](../imgs_words/korean/1.jpg) + +After executing the command, the prediction result of the above figure is: + +``` text +2020-09-19 16:15:05,076-INFO: index: [205 206 38 39] +2020-09-19 16:15:05,077-INFO: word : 바탕으로 +2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535 +``` + + +## ANGLE CLASSIFICATION MODEL INFERENCE + +The following will introduce the angle classification model inference. + + + +### 1.ANGLE CLASSIFICATION MODEL INFERENCE + +For angle classification model inference, you can execute the following commands: + +``` +python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="./inference/cls/" +``` + +![](../imgs_words/ch/word_4.jpg) + +After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen. + +Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963] + + +## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION + + ### 1. LIGHTWEIGHT CHINESE MODEL -When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default. +When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model.The visualized recognition results are saved to the `./inference_results` folder by default. ``` -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" +# use direction classifier +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true + +# not use use direction classifier +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" ``` After executing the command, the recognition result image is as follows: ![](../imgs_results/2.jpg) + ### 2. OTHER MODELS -If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model, the following command uses the combination of the EAST text detection and STAR-Net text recognition: +If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model. + +**Note: due to the limitation of rotation logic of detected box, SAST curved text detection model (using the parameter `det_sast_polygon=True`) is not supported for model combination yet.** + +The following command uses the combination of the EAST text detection and STAR-Net text recognition: ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md index cc3bfb52f5dc592cf9fe1dc1a9a6fd8f1d3bb5cf..7608e12d979e9f26d58a8f0504d3740e9cf67014 100644 --- a/doc/doc_en/installation_en.md +++ b/doc/doc_en/installation_en.md @@ -3,27 +3,29 @@ After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. PaddleOCR working environment: -- PaddlePaddle1.7 -- python3 +- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0.0.beta +- python3.7 - glibc 2.23 -It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://docs.docker.com/get-started/). +It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://www.runoob.com/docker/docker-tutorial.html/). -1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient. +*If you want to directly run the prediction code on mac or windows, you can start from step 2.* + +**1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient.** ``` # Switch to the working directory cd /home/Projects # You need to create a docker container for the first run, and do not need to run the current command when you run it again # Create a docker container named ppocr and map the current directory to the /paddle directory of the container -#If you want to use docker in a CPU environment, use docker instead of nvidia-docker to create docker +#If using CPU, use docker instead of nvidia-docker to create docker sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash ``` -If you have cuda9 installed on your machine, please run the following command to create a container: +If using CUDA9, please run the following command to create a container: ``` sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash ``` -If you have cuda10 installed on your machine, please run the following command to create a container: +If using CUDA10, please run the following command to create a container: ``` sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash ``` @@ -47,20 +49,20 @@ docker images hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 ``` -2. Install PaddlePaddle Fluid v1.7 (the higher version is not supported yet, the adaptation work is in progress) +**2. Install PaddlePaddle Fluid v2.0 ** ``` pip3 install --upgrade pip -# If you have cuda9 installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple +# If you have cuda9 or cuda10 installed on your machine, please run the following command to install +python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple -# If you have cuda10 installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple +# If you only have cpu on your machine, please run the following command to install +python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple ``` For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. -3. Clone PaddleOCR repo +**3. Clone PaddleOCR repo** ``` # Recommend git clone https://github.com/PaddlePaddle/PaddleOCR @@ -72,8 +74,14 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR # Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method. ``` -4. Install third-party libraries +**4. Install third-party libraries** ``` cd PaddleOCR pip3 install -r requirments.txt ``` + +If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows. + +Please try to download Shapely whl file using [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely). + +Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found) diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md new file mode 100644 index 0000000000000000000000000000000000000000..3017ed85d9f978241a38ddca0610311c268436e1 --- /dev/null +++ b/doc/doc_en/models_list_en.md @@ -0,0 +1,70 @@ +## OCR model list(V1.1, updated on 9.22) + +- [1. Text Detection Model](#Detection) +- [2. Text Recognition Model](#Recognition) + - [Chinese Recognition Model](#Chinese) + - [English Recognition Model](#English) + - [Multilingual Recognition Model](#Multilingual) +- [3. Text Angle Classification Model](#Angle) + +The downloadable models provided by PaddleOCR include `inference model`, `trained model`, `pre-trained model` and `slim model`. The differences between the models are as follows: + +|model type|model format|description| +|-|-|-| +|inference model|model、params|Used for reasoning based on Python prediction engine. [detail](./inference_en.md)| +|trained model / pre-trained model|\*.pdmodel、\*.pdopt、\*.pdparams|The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.| +|slim model|\*.nb|Generally used for Lite deployment| + + + +### 1. Text Detection Model +|model name|description|config|model size|download| +|-|-|-|-|-| +|ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)| +|ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| +|ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model, but achieved better performance|[det_r18_vd_db_v1.1.yml](../../configs/det/det_r18_vd_db_v1.1.yml)|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| + + + +### 2. Text Recognition Model + + +#### Chinese Recognition Model +|model name|description|config|model size|download| +|-|-|-|-|-| +|ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | +|ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +|ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml)|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | + +**Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset. + + +#### English Recognition Model +|model name|description|config|model size|download| +|-|-|-|-|-| +|en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb) | +|en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar) | + + +#### Multilingual Recognition Model(Updating...) +|model name|description|config|model size|download| +|-|-|-|-|-| +| french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_languages/rec_french_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar) | +| german_ppocr_mobile_v1.1_rec |German model for French recognition|[rec_ger_lite_train.yml](../../configs/rec/multi_languages/rec_ger_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar) | +| korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|[rec_korean_lite_train.yml](../../configs/rec/multi_languages/rec_korean_lite_train.yml)|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar) | +| japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|[rec_japan_lite_train.yml](../../configs/rec/multi_languages/rec_japan_lite_train.yml)|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar) | + + + +### 3. Text Angle Classification Model +|model name|description|config|model size|download| +|-|-|-|-|-| +|ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | +|ch_ppocr_mobile_v1.1_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|850kb|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | + + +## OCR model list(V1.0, updated on 7.16) +|model name|description|detection model|recognition model|recognition model supporting space recognition| +|-|-|-|-|-| +|chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md new file mode 100644 index 0000000000000000000000000000000000000000..b7c1f6dfd4c401827f468169711d9b0a8a499133 --- /dev/null +++ b/doc/doc_en/quickstart_en.md @@ -0,0 +1,102 @@ + +# Quick start of Chinese OCR model + +## 1. Prepare for the environment + +Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment. + +* Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md). + +## 2.inference models + +The detection and recognition models on the mobile and server sides are as follows. For more models (including multiple languages), please refer to [PP-OCR v1.1 series model list](./doc_ch/models_list.md) + + +| Model introduction | Model name | Recommended scene | Detection model | Direction Classifier | Recognition model | +| ------------ | --------------- | ----------------|---- | ---------- | -------- | +| Ultra-lightweight Chinese OCR model(8.1M) | ch_ppocr_mobile_v1.1_xx |Mobile-side/Server-side|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | +| Universal Chinese OCR model(155.1M) |ch_ppocr_server_v1.1_xx|Server-side |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | + +* If `wget` is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, then uncompress it and place it in the corresponding directory. + +Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them. + +``` +mkdir inference && cd inference +# Download the detection model and unzip +wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} +# Download the recognition model and unzip +wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package} +# Download the direction classifier model and unzip +wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package} +cd .. +``` + +Take the ultra-lightweight model as an example: + +``` +mkdir inference && cd inference +# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar +# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it +wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar +# Download the direction classifier model of the ultra-lightweight Chinese OCR model and uncompress it +wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar +cd .. +``` + +After decompression, the file structure should be as follows: + +``` +|-inference + |-ch_ppocr_mobile_v1.1_det_infer + |- model + |- params + |-ch_ppocr_mobile_v1.1_rec_infer + |- model + |- params + |-ch_ppocr_mobile_v1.1_cls_infer + |- model + |- params + ... +``` + +## 3. Single image or image set prediction + +* The following code implements text detection and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `rec_model_dir` specifies the path to identify the inference model, the parameter `use_angle_cls` specifies whether to use the direction classifier, the parameter `cls_model_dir` specifies the path to identify the direction classifier model, the parameter `use_space_char` specifies whether to predict the space char. The visual results are saved to the `./inference_results` folder by default. + + + +```bash + +# Predict a single image specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True + +# Predict imageset specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True + +# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False +``` + +- Universal Chinese OCR model + +Please follow the above steps to download the corresponding models and update the relevant parameters, The example is as follows. + +``` +# Predict a single image specified by image_dir +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True +``` + +* Note + - If you want to use the recognition model which does not support space char recognition, please update the source code to the latest version and add parameters `--use_space_char=False`. + - If you do not want to use direction classifier, please update the source code to the latest version and add parameters `--use_angle_cls=False`. + + +For more text detection and recognition tandem reasoning, please refer to the document tutorial +: [Inference with Python inference engine](./inference_en.md)。 + +In addition, the tutorial also provides other deployment methods for the Chinese OCR model: +- [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md) +- [Service deployment](../../deploy/pdserving/readme_en.md) +- [End-to-end deployment](../../deploy/lite/readme_en.md) diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index ac1bc2f335bd9470924c7e5934265a23b26342c6..61d829b8b5cb917ac8ba135f4588d73ac89a2f8c 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -1,5 +1,22 @@ ## TEXT RECOGNITION +- [DATA PREPARATION](#DATA_PREPARATION) + - [Dataset Download](#Dataset_download) + - [Costom Dataset](#Costom_Dataset) + - [Dictionary](#Dictionary) + - [Add Space Category](#Add_space_category) + +- [TRAINING](#TRAINING) + - [Data Augmentation](#Data_Augmentation) + - [Training](#Training) + - [Multi-language](#Multi_language) + +- [EVALUATION](#EVALUATION) + +- [PREDICTION](#PREDICTION) + - [Training engine prediction](#Training_engine_prediction) + + ### DATA PREPARATION @@ -13,11 +30,14 @@ The default storage path for training data is `PaddleOCR/train_data`, if you alr ln -sf /train_data/dataset ``` - + * Dataset download If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark +If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path. + + * Use your own dataset: If you want to use your own data for training, please refer to the following to organize your data. @@ -70,7 +90,7 @@ Similar to the training set, the test set also needs to be provided a folder con |- word_003.jpg | ... ``` - + - Dictionary Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. @@ -90,12 +110,36 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters. -`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters. +`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters + +`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters + +`ppocr/utils/japan_dict.txt` is a French dictionary with 4399 characters + +`ppocr/utils/korean_dict.txt` is a French dictionary with 3636 characters + +`ppocr/utils/german_dict.txt` is a French dictionary with 131 characters + +You can use it on demand. + +The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**, +If you like, you can submit the dictionary file to [utils](../../ppocr/utils) and we will thank you in the Repo. -You can use them if needed. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. +- Custom dictionary + +If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch. + + +- Add space category + +If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`. + +**Note: use_space_char only takes effect when character_type=ch** + + ### TRAINING PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: @@ -114,13 +158,22 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar Start training: ``` -# Set PYTHONPATH path -export PYTHONPATH=$PYTHONPATH:. # GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES export CUDA_VISIBLE_DEVICES=0,1,2,3 -# Training icdar15 English data -python3 tools/train.py -c configs/rec/rec_icdar15_train.yml +# Training icdar15 English data and saving the log as train_rec.log +python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log ``` + +- Data Augmentation + +PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file. + +The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse. + +Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) + + +- Training PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process. @@ -131,7 +184,10 @@ If the evaluation set is large, the test will be time-consuming. It is recommend | Configuration file | Algorithm | backbone | trans | seq | pred | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | +| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc | | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc | | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | @@ -142,7 +198,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend | rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | -For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file: +For training Chinese data, it is recommended to use +训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file: co Take `rec_mv3_none_none_ctc.yml` as an example: ``` @@ -178,8 +235,43 @@ Optimizer: ``` **Note that the configuration file for prediction/evaluation must be consistent with the training.** + +- Multi-language + +PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are: + +| Configuration file | Algorithm name | backbone | trans | seq | pred | language | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | +The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded on [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi. + +If you want to finetune on the basis of the existing model effect, please refer to the following instructions to modify the configuration file: + +Take `rec_french_lite_train` as an example: + +``` +Global: + ... + # Add a custom dictionary, if you modify the dictionary + # please point the path to the new dictionary + character_dict_path: ./ppocr/utils/french_dict.txt + # Add data augmentation during training + distort: true + # Identify spaces + use_space_char: true + ... + # Modify reader type + reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml + ... +... +``` + ### EVALUATION The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader. @@ -187,11 +279,13 @@ The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` ``` export CUDA_VISIBLE_DEVICES=0 # GPU evaluation, Global.checkpoints is the weight to be tested -python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy +python3 tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy ``` + ### PREDICTION + * Training engine prediction Using the model trained by paddleocr, you can quickly get prediction through the following script. @@ -200,7 +294,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci ``` # Predict English results -python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg +python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg ``` Input image: @@ -215,11 +309,11 @@ infer_img: doc/imgs_words/en/word_1.png word : joint ``` -The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to predict the Chinese model: +The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml`, you can use the following command to predict the Chinese model: ``` # Predict Chinese results -python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg +python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg ``` Input image: diff --git a/doc/doc_en/reference_en.md b/doc/doc_en/reference_en.md new file mode 100644 index 0000000000000000000000000000000000000000..55c56f5617bc8137d4d96bf887a0aa9a88c3f90f --- /dev/null +++ b/doc/doc_en/reference_en.md @@ -0,0 +1,55 @@ +# REFERENCE + +``` +1. EAST: +@inproceedings{zhou2017east, + title={EAST: an efficient and accurate scene text detector}, + author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun}, + booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition}, + pages={5551--5560}, + year={2017} +} + +2. DB: +@article{liao2019real, + title={Real-time Scene Text Detection with Differentiable Binarization}, + author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang}, + journal={arXiv preprint arXiv:1911.08947}, + year={2019} +} + +3. DTRB: +@inproceedings{baek2019wrong, + title={What is wrong with scene text recognition model comparisons? dataset and model analysis}, + author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={4715--4723}, + year={2019} +} + +4. SAST: +@inproceedings{wang2019single, + title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning}, + author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming}, + booktitle={Proceedings of the 27th ACM International Conference on Multimedia}, + pages={1277--1285}, + year={2019} +} + +5. SRN: +@article{yu2020towards, + title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks}, + author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui}, + journal={arXiv preprint arXiv:2003.12294}, + year={2020} +} + +6. end2end-psl: +@inproceedings{sun2019chinese, + title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning}, + author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={9086--9095}, + year={2019} +} +``` \ No newline at end of file diff --git a/doc/doc_en/tree_en.md b/doc/doc_en/tree_en.md new file mode 100644 index 0000000000000000000000000000000000000000..5919013bc60395f7e5dbcea19596f91406610f58 --- /dev/null +++ b/doc/doc_en/tree_en.md @@ -0,0 +1,208 @@ +# Overall directory structure + +The overall directory structure of PaddleOCR is introduced as follows: + +``` +PaddleOCR +├── configs // configuration file, you can select model structure and modify hyperparameters through yml file +│ ├── cls // Related configuration files of direction classifier +│ │ ├── cls_mv3.yml // training configuration related, including backbone network, head, loss, optimizer +│ │ └── cls_reader.yml // Data reading related, data reading method, data storage path +│ ├── det // Detection related configuration files +│ │ ├── det_db_icdar15_reader.yml // data read +│ │ ├── det_mv3_db.yml // training configuration +│ │ ... +│ └── rec // Identify related configuration files +│ ├── rec_benchmark_reader.yml // LMDB format data reading related +│ ├── rec_chinese_common_train.yml // General Chinese training configuration +│ ├── rec_icdar15_reader.yml // simple data reading related, including data reading function, data path, label file +│ ... +├── deploy // deployment related +│ ├── android_demo // android_demo +│ │ ... +│ ├── cpp_infer // C++ infer +│ │ ├── CMakeLists.txt // Cmake file +│ │ ├── docs // documentation +│ │ │ └── windows_vs2019_build.md +│ │ ├── include +│ │ │ ├── clipper.h // clipper library +│ │ │ ├── config.h // infer configuration +│ │ │ ├── ocr_cls.h // direction classifier +│ │ │ ├── ocr_det.h // text detection +│ │ │ ├── ocr_rec.h // text recognition +│ │ │ ├── postprocess_op.h // postprocess after detection +│ │ │ ├── preprocess_op.h // preprocess detection +│ │ │ └── utility.h // tools +│ │ ├── readme.md // documentation +│ │ ├── ... +│ │ ├── src // source file +│ │ │ ├── clipper.cpp +│ │ │ ├── config.cpp +│ │ │ ├── main.cpp +│ │ │ ├── ocr_cls.cpp +│ │ │ ├── ocr_det.cpp +│ │ │ ├── ocr_rec.cpp +│ │ │ ├── postprocess_op.cpp +│ │ │ ├── preprocess_op.cpp +│ │ │ └── utility.cpp +│ │ └── tools // compile and execute script +│ │ ├── build.sh // compile script +│ │ ├── config.txt // configuration file +│ │ └── run.sh // Test startup script +│ ├── docker +│ │ └── hubserving +│ │ ├── cpu +│ │ │ └── Dockerfile +│ │ ├── gpu +│ │ │ └── Dockerfile +│ │ ├── README_cn.md +│ │ ├── README.md +│ │ └── sample_request.txt +│ ├── hubserving // hubserving +│ │ ├── ocr_det // text detection +│ │ │ ├── config.json // serving configuration +│ │ │ ├── __init__.py +│ │ │ ├── module.py // prediction model +│ │ │ └── params.py // prediction parameters +│ │ ├── ocr_rec // text recognition +│ │ │ ├── config.json +│ │ │ ├── __init__.py +│ │ │ ├── module.py +│ │ │ └── params.py +│ │ └── ocr_system // system forecast +│ │ ├── config.json +│ │ ├── __init__.py +│ │ ├── module.py +│ │ └── params.py +│ ├── imgs // prediction picture +│ │ ├── cpp_infer_pred_12.png +│ │ └── demo.png +│ ├── ios_demo // ios demo +│ │ ... +│ ├── lite // lite deployment +│ │ ├── cls_process.cc // direction classifier data processing +│ │ ├── cls_process.h +│ │ ├── config.txt // check configuration parameters +│ │ ├── crnn_process.cc // crnn data processing +│ │ ├── crnn_process.h +│ │ ├── db_post_process.cc // db data processing +│ │ ├── db_post_process.h +│ │ ├── Makefile // compile file +│ │ ├── ocr_db_crnn.cc // series prediction +│ │ ├── prepare.sh // data preparation +│ │ ├── readme.md // documentation +│ │ ... +│ ├── pdserving // pdserving deployment +│ │ ├── det_local_server.py // fast detection version, easy deployment and fast prediction +│ │ ├── det_web_server.py // Full version of detection, high stability and distributed deployment +│ │ ├── ocr_local_server.py // detection + identification quick version +│ │ ├── ocr_web_client.py // client +│ │ ├── ocr_web_server.py // detection + identification full version +│ │ ├── readme.md // documentation +│ │ ├── rec_local_server.py // recognize quick version +│ │ └── rec_web_server.py // Identify the full version +│ └── slim +│ └── quantization // quantization related +│ ├── export_model.py // export model +│ ├── quant.py // quantization +│ └── README.md // Documentation +├── doc // Documentation tutorial +│ ... +├── paddleocr.py +├── ppocr // network core code +│ ├── data // data processing +│ │ ├── cls // direction classifier +│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch +│ │ │ └── randaugment.py // Random data augmentation operation +│ │ ├── det // detection +│ │ │ ├── data_augment.py // data augmentation operation +│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch +│ │ │ ├── db_process.py // db data processing +│ │ │ ├── east_process.py // east data processing +│ │ │ ├── make_border_map.py // Generate boundary map +│ │ │ ├── make_shrink_map.py // Generate shrink map +│ │ │ ├── random_crop_data.py // random crop +│ │ │ └── sast_process.py // sast data processing +│ │ ├── reader_main.py // main function of data reader +│ │ └── rec // recognation +│ │ ├── dataset_traversal.py // Data transmission, define data reader, including LMDB_Reader and Simple_Reader +│ │ └── img_tools.py // Data processing related, including data normalization and disturbance +│ ├── __init__.py +│ ├── modeling // networking related +│ │ ├── architectures // Model architecture, which defines the various modules required by the model +│ │ │ ├── cls_model.py // direction classifier +│ │ │ ├── det_model.py // detection +│ │ │ └── rec_model.py // recognition +│ │ ├── backbones // backbone network +│ │ │ ├── det_mobilenet_v3.py // detect mobilenet_v3 +│ │ │ ├── det_resnet_vd.py +│ │ │ ├── det_resnet_vd_sast.py +│ │ │ ├── rec_mobilenet_v3.py // recognize mobilenet_v3 +│ │ │ ├── rec_resnet_fpn.py +│ │ │ └── rec_resnet_vd.py +│ │ ├── common_functions.py // common functions +│ │ ├── heads +│ │ │ ├── cls_head.py // class header +│ │ │ ├── det_db_head.py // db detection head +│ │ │ ├── det_east_head.py // east detection head +│ │ │ ├── det_sast_head.py // sast detection head +│ │ │ ├── rec_attention_head.py // recognition attention +│ │ │ ├── rec_ctc_head.py // recognition ctc +│ │ │ ├── rec_seq_encoder.py // recognition sequence code +│ │ │ ├── rec_srn_all_head.py // srn related +│ │ │ └── self_attention // srn attention +│ │ │ └── model.py +│ │ ├── losses // loss function +│ │ │ ├── cls_loss.py // Directional classifier loss function +│ │ │ ├── det_basic_loss.py // detect basic loss +│ │ │ ├── det_db_loss.py // DB loss +│ │ │ ├── det_east_loss.py // EAST loss +│ │ │ ├── det_sast_loss.py // SAST loss +│ │ │ ├── rec_attention_loss.py // attention loss +│ │ │ ├── rec_ctc_loss.py // ctc loss +│ │ │ └── rec_srn_loss.py // srn loss +│ │ └── stns // Spatial transformation network +│ │ └── tps.py // TPS conversion +│ ├── optimizer.py // optimizer +│ ├── postprocess // post-processing +│ │ ├── db_postprocess.py // DB postprocess +│ │ ├── east_postprocess.py // East postprocess +│ │ ├── lanms // lanms related +│ │ │ ... +│ │ ├── locality_aware_nms.py // nms +│ │ └── sast_postprocess.py // sast post-processing +│ └── utils // tools +│ ├── character.py // Character processing, including text encoding and decoding, and calculation of prediction accuracy +│ ├── check.py // parameter loading check +│ ├── ic15_dict.txt // English number dictionary, case sensitive +│ ├── ppocr_keys_v1.txt // Chinese dictionary, used to train Chinese models +│ ├── save_load.py // model save and load function +│ ├── stats.py // Statistics +│ └── utility.py // Tool functions, including related check tools such as whether the input parameters are legal +├── README_en.md // documentation +├── README.md +├── requirments.txt // installation dependencies +├── setup.py // whl package packaging script +└── tools // start tool + ├── eval.py // evaluation function + ├── eval_utils // evaluation tools + │ ├── eval_cls_utils.py // category related + │ ├── eval_det_iou.py // detect iou related + │ ├── eval_det_utils.py // detection related + │ ├── eval_rec_utils.py // recognition related + │ └── __init__.py + ├── export_model.py // export infer model + ├── infer // Forecast based on prediction engine + │ ├── predict_cls.py + │ ├── predict_det.py + │ ├── predict_rec.py + │ ├── predict_system.py + │ └── utility.py + ├── infer_cls.py // Predict classification based on training engine + ├── infer_det.py // Predictive detection based on training engine + ├── infer_rec.py // Predictive recognition based on training engine + ├── program.py // overall process + ├── test_hubserving.py + └── train.py // start training + +``` diff --git a/doc/doc_en/tricks_en.md b/doc/doc_en/tricks_en.md new file mode 100644 index 0000000000000000000000000000000000000000..eab9c89236ca86d4e473fbb2776941fdd3e7567d --- /dev/null +++ b/doc/doc_en/tricks_en.md @@ -0,0 +1,68 @@ +## Tricks +Here we have sorted out some Chinese OCR training and prediction tricks, which are being updated continuously. You are welcome to contribute more OCR tricks ~ + +- [Replace Backbone Network](#ReplaceBackboneNetwork) +- [Long Chinese Text Recognition](#LongChineseTextRecognition) +- [Space Recognition](#SpaceRecognition) + + +#### 1、Replace Backbone Network +- **Problem Description** + + At present, ResNet_vd series and MobileNetV3 series are the backbone networks used in PaddleOCR, whether replacing the other backbone networks will help to improve the accuracy? What should be paid attention to when replacing? + +- **Tips** + - Whether text detection or text recognition, the choice of backbone network is a trade-off between prediction effect and prediction efficiency. Generally, a larger backbone network is selected, e.g. ResNet101_vd, then the performance of the detection or recognition is more accurate, but the time cost will increase accordingly. And a smaller backbone network is selected, e.g. MobileNetV3_small_x0_35, the prediction speed is faster, but the accuracy of detection or recognition will be reduced. Fortunately, the detection or recognition effect of different backbone networks is positively correlated with the performance of ImageNet 1000 classification task. [**PaddleClas**](https://github.com/PaddlePaddle/PaddleClas/blob/master/README_en.md) have sorted out the 23 series of classification network structures, such as ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet. It provides the top1 accuracy of classification, the time cost of GPU(V100 and T4) and CPU(SD 855), and the 117 pretrained models [**download addresses**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html). + + - Similar as the 4 stages of ResNet, the replacement of text detection backbone network is to determine those four stages to facilitate the integration of FPN like the object detection heads. In addition, for the text detection problem, the pre trained model in ImageNet1000 can accelerate the convergence and improve the accuracy. + + - In order to replace the backbone network of text recognition, we need to pay attention to the descending position of network width and height stride. Since the ratio between width and height is large in chinese text recognition, the frequency of height decrease is less and the frequency of width decrease is more. You can refer the [modifies of MobileNetV3](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR. + + +#### 2、Long Chinese Text Recognition +- **Problem Description** + The maximum resolution of Chinese recognition model during training is [3,32,320], if the text image to be recognized is too long, as shown in the figure below, how to adapt? + +
+ +
+ +- **Tips** + + During the training, the training samples are not directly resized to [3,32,320]. At first, the height of samples are resized to 32 and keep the ratio between the width and the height. When the width is less than 320, the excess parts are padding 0. Besides, when the ratio between the width and the height of the samples is larger than 10, these samples will be ignored. When the prediction for one image, do as above, but do not limit the max ratio between the width and the height. When the prediction for an images batch, do as training, but the resized target width is the longest width of the images in the batch. [Code as following](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py): + + ``` + def resize_norm_img(self, img, max_wh_ratio): + imgC, imgH, imgW = self.rec_image_shape + assert imgC == img.shape[2] + if self.character_type == "ch": + imgW = int((32 * max_wh_ratio)) + h, w = img.shape[:2] + ratio = w / float(h) + if math.ceil(imgH * ratio) > imgW: + resized_w = imgW + else: + resized_w = int(math.ceil(imgH * ratio)) + resized_image = cv2.resize(img, (resized_w, imgH)) + resized_image = resized_image.astype('float32') + resized_image = resized_image.transpose((2, 0, 1)) / 255 + resized_image -= 0.5 + resized_image /= 0.5 + padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) + padding_im[:, :, 0:resized_w] = resized_image + return padding_im + ``` + + +#### 3、Space Recognition +- **Problem Description** + + As shown in the figure below, for Chinese and English mixed scenes, in order to facilitate reading and using the recognition results, it is often necessary to recognize the spaces between words. How can this situation be adapted? + +
+ +
+ +- **Tips** + + There are two possible methods for space recognition. (1) Optimize the text detection. For spliting the text at the space in detection results, it needs to divide the text line with space into many segments when label the data for detection. (2) Optimize the text recognition. The space character is introduced into the recognition dictionary. Label the blank line in the training data for text recognition. In addition, we can also concat multiple word lines to synthesize the training data with spaces. PaddleOCR currently uses the second method. diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md index c3e868b7421b5add6e1ee7f8aca8f7af0fecc999..a8f75f6a8049fe5a4ac966178fd3096e646db9db 100644 --- a/doc/doc_en/update_en.md +++ b/doc/doc_en/update_en.md @@ -1,5 +1,16 @@ # RECENT UPDATES - +- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941 +- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md) +- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294) +- 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519) +- 2020.7.15, Add mobile App demo , support both iOS and Android ( based on easyedge and Paddle Lite) +- 2020.7.15, Improve the deployment ability, add the C + + inference , serving deployment. In addtion, the benchmarks of the ultra-lightweight Chinese OCR model are provided. +- 2020.7.15, Add several related datasets, data annotation and synthesis tools. +- 2020.7.9 Add a new model to support recognize the character "space". +- 2020.7.9 Add the data augument and learning rate decay strategies during training. +- 2020.6.8 Add [datasets](./doc/doc_en/datasets_en.md) and keep updating +- 2020.6.5 Support exporting `attention` model to `inference_model` +- 2020.6.5 Support separate prediction and recognition, output result score - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score - 2020.5.30 Provide Lightweight Chinese OCR online experience diff --git a/doc/doc_en/vertical_and_multilingual_datasets_en.md b/doc/doc_en/vertical_and_multilingual_datasets_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9d5ecff7e327213656870845f9897321f6521df6 --- /dev/null +++ b/doc/doc_en/vertical_and_multilingual_datasets_en.md @@ -0,0 +1,79 @@ +# Vertical multi-language OCR dataset +Here we have sorted out the commonly used vertical multi-language OCR dataset datasets, which are being updated continuously. We welcome you to contribute datasets ~ +- [Chinese urban license plate dataset](#Chinese urban license plate dataset) +- [Bank credit card dataset](#Bank credit card dataset) +- [Captcha dataset-Captcha](#Captcha dataset-Captcha) +- [multi-language dataset](#multi-language dataset) + + + +## Chinese urban license plate dataset + +- **Data source**:[https://github.com/detectRecog/CCPD](https://github.com/detectRecog/CCPD) + +- **Data introduction**: It contains more than 250000 vehicle license plate images and vehicle license plate detection and recognition information labeling. It contains the following license plate image information in different scenes. + * CCPD-Base: General license plate picture + * CCPD-DB: The brightness of license plate area is bright, dark or uneven + * CCPD-FN: The license plate is farther or closer to the camera location + * CCPD-Rotate: License plate includes rotation (horizontal 20\~50 degrees, vertical-10\~10 degrees) + * CCPD-Tilt: License plate includes rotation (horizontal 15\~45 degrees, vertical 15\~45 degrees) + * CCPD-Blur: The license plate contains blurring due to camera lens jitter + * CCPD-Weather: The license plate is photographed on rainy, snowy or foggy days + * CCPD-Challenge: So far, some of the most challenging images in license plate detection and recognition tasks + * CCPD-NP: Pictures of new cars without license plates. + + ![](../datasets/ccpd_demo.png) + + +- **Download address** + * Baidu cloud download address (extracted code is hm0U): [https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw](https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw) + * Google drive download address:[https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view](https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view) + + + +## Bank credit card dataset + +- **Data source**: [https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870](https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870) + +- **Data introduction**: There are three types of training data + * 1.Sample card data of China Merchants Bank: including card image data and annotation data, a total of 618 pictures + * 2.Single character data: including pictures and annotation data, 37 pictures in total. + * 3.There are only other bank cards, no more detailed information, a total of 50 pictures. + + * The demo image is shown as follows. The annotation information is stored in excel, and the demo image below is marked as + * Top 8 card number: 62257583 + * Card type: card of our bank + * End of validity: 07/41 + * Chinese phonetic alphabet of card users: MICHAEL + + ![](../datasets/cmb_demo.jpg) + +- **Download address**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip) + + + + +## Captcha dataset-Captcha + +- **Data source**: [https://github.com/lepture/captcha](https://github.com/lepture/captcha) + +- **Data introduction**: This is a toolkit for data synthesis. You can output captcha images according to the input text. Use the toolkit to generate several demo images as follows. + + ![](../datasets/captcha_demo.png) + +- **Download address**: The dataset is generated and has no download address. + + + + +## multi-language dataset(Multi-lingual scene text detection and recognition) + +- **Data source**: [https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads) + +- **Data introduction**: Multi language detection dataset MLT contains both language recognition and detection tasks. + * In the detection task, the training set contains 10000 images in 10 languages, and each language contains 1000 training images. The test set contains 10000 images. + * In the recognition task, the training set contains 111998 samples. + + +- **Download address**: The training set is large and can be downloaded in two parts. It can only be downloaded after registering on the website: +[https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads) diff --git a/doc/doc_en/visualization_en.md b/doc/doc_en/visualization_en.md new file mode 100644 index 0000000000000000000000000000000000000000..6696cf935c0f343bc3bca71003aecc22c6d98e0d --- /dev/null +++ b/doc/doc_en/visualization_en.md @@ -0,0 +1,53 @@ +# Visualization + + +## ppocr_server_1.1 + +
+ + + + + + + + +
+ + + + +## ppocr_mobile_1.0 + +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+ + + +## ppocr_server_1.0 + +
+ +
+ +
+ +
+ +
+ +
+ diff --git a/doc/doc_en/whl_en.md b/doc/doc_en/whl_en.md new file mode 100644 index 0000000000000000000000000000000000000000..ffbced346f7a3f661f382b5f2d826c20fef2c012 --- /dev/null +++ b/doc/doc_en/whl_en.md @@ -0,0 +1,308 @@ +# paddleocr package + +## Get started quickly +### install package +install by pypi +```bash +pip install paddleocr +``` + +build own whl package and install +```bash +python3 setup.py bdist_wheel +pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr +``` +### 1. Use by code + +* detection classification and recognition +```python +from paddleocr import PaddleOCR,draw_ocr +# Paddleocr supports Chinese, English, French, German, Korean and Japanese. +# You can set the parameter `lang` as `ch`, `en`, `french`, `german`, `korean`, `japan` +# to switch the language model in order. +ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path, cls=True) +for line in result: + print(line) + + +# draw result +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +Visualization of results + +
+ +
+ +* detection and recognition +```python +from paddleocr import PaddleOCR,draw_ocr +ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# draw result +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +Visualization of results + +
+ +
+ +* classification and recognition +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory +img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png' +result = ocr.ocr(img_path, det=False, cls=True) +for line in result: + print(line) +``` + +Output will be a list, each item contains recognition text and confidence +```bash +['PAIN', 0.990372] +``` + +* only detection +```python +from paddleocr import PaddleOCR,draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path,rec=False) +for line in result: + print(line) + +# draw result +from PIL import Image + +image = Image.open(img_path).convert('RGB') +im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +Output will be a list, each item only contains bounding box +```bash +[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]] +[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]] +[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]] +...... +``` + +Visualization of results + +
+ +
+ +* only recognition +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR(lang='en') # need to run only once to load model into memory +img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png' +result = ocr.ocr(img_path, det=False, cls=False) +for line in result: + print(line) +``` + +Output will be a list, each item contains recognition text and confidence +```bash +['PAIN', 0.990372] +``` + +* only classification +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory +img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png' +result = ocr.ocr(img_path, det=False, rec=False, cls=True) +for line in result: + print(line) +``` + +Output will be a list, each item contains classification result and confidence +```bash +['0', 0.99999964] +``` + +### Use by command line + +show help information +```bash +paddleocr -h +``` + +* detection classification and recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true -cls true --lang en +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +* detection and recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +* classification and recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --lang en +``` + +Output will be a list, each item contains text and recognition confidence +```bash +['PAIN', 0.990372] +``` + +* only detection +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false +``` + +Output will be a list, each item only contains bounding box +```bash +[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]] +[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]] +[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]] +...... +``` + +* only recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --cls false --lang en +``` + +Output will be a list, each item contains text and recognition confidence +```bash +['PAIN', 0.990372] +``` + +* only classification +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --rec false +``` + +Output will be a list, each item contains classification result and confidence +```bash +['0', 0.99999964] +``` + +## Use custom model +When the built-in model cannot meet the needs, you need to use your own trained model. +First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows + +### 1. Use by code + +```python +from paddleocr import PaddleOCR,draw_ocr +# The path of detection and recognition model must contain model and params files +ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True) +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path, cls=True) +for line in result: + print(line) + +# draw result +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +### Use by command line + +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true +``` + +## Parameter Description + +| Parameter | Description | Default value | +|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------| +| use_gpu | use GPU or not | TRUE | +| gpu_mem | GPU memory size used for initialization | 8000M | +| image_dir | The images path or folder path for predicting when used by the command line | | +| det_algorithm | Type of detection algorithm selected | DB | +| det_model_dir | the text detection inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/det`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None | +| det_max_side_len | The maximum size of the long side of the image. When the long side exceeds this value, the long side will be resized to this size, and the short side will be scaled proportionally | 960 | +| det_db_thresh | Binarization threshold value of DB output map | 0.3 | +| det_db_box_thresh | The threshold value of the DB output box. Boxes score lower than this value will be discarded | 0.5 | +| det_db_unclip_ratio | The expanded ratio of DB output box | 2 | +| det_east_score_thresh | Binarization threshold value of EAST output map | 0.8 | +| det_east_cover_thresh | The threshold value of the EAST output box. Boxes score lower than this value will be discarded | 0.1 | +| det_east_nms_thresh | The NMS threshold value of EAST model output box | 0.2 | +| rec_algorithm | Type of recognition algorithm selected | CRNN | +| rec_model_dir | the text recognition inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/rec`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None | +| rec_image_shape | image shape of recognition algorithm | "3,32,320" | +| rec_char_type | Character type of recognition algorithm, Chinese (ch) or English (en) | ch | +| rec_batch_num | When performing recognition, the batchsize of forward images | 30 | +| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 | +| rec_char_dict_path | the alphabet path which needs to be modified to your own path when `rec_model_Name` use mode 2 | ./ppocr/utils/ppocr_keys_v1.txt | +| use_space_char | Whether to recognize spaces | TRUE | +| use_angle_cls | Whether to load classification model | FALSE | +| cls_model_dir | the classification inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/cls`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None | +| cls_image_shape | image shape of classification algorithm | "3,48,192" | +| label_list | label list of classification algorithm | ['0','180'] | +| cls_batch_num | When performing classification, the batchsize of forward images | 30 | +| enable_mkldnn | Whether to enable mkldnn | FALSE | +| use_zero_copy_run | Whether to forward by zero_copy_run | FALSE | +| lang | The support language, now only Chinese(ch)、English(en)、French(french)、German(german)、Korean(korean)、Japanese(japan) are supported | ch | +| det | Enable detction when `ppocr.ocr` func exec | TRUE | +| rec | Enable recognition when `ppocr.ocr` func exec | TRUE | +| cls | Enable classification when `ppocr.ocr` func exec | FALSE | diff --git a/doc/french.ttf b/doc/french.ttf new file mode 100644 index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5 Binary files /dev/null and b/doc/french.ttf differ diff --git a/doc/german.ttf b/doc/german.ttf new file mode 100644 index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5 Binary files /dev/null and b/doc/german.ttf differ diff --git a/doc/imgs/french_0.jpg b/doc/imgs/french_0.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0c3cc4de545dc932926632d296f5ba50e2cc5f6d Binary files /dev/null and b/doc/imgs/french_0.jpg differ diff --git a/doc/imgs/ger_1.jpg b/doc/imgs/ger_1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..050bf7c7b7ea23343f0c0e0bb6197c55f2f915be Binary files /dev/null and b/doc/imgs/ger_1.jpg differ diff --git a/doc/imgs/ger_2.jpg b/doc/imgs/ger_2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b922702bfcbd7496901ff63e6279ec80499f79b0 Binary files /dev/null and b/doc/imgs/ger_2.jpg differ diff --git a/doc/imgs/japan_1.jpg b/doc/imgs/japan_1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ffcb2e4a801ddb443ecf113d9110b43308893822 Binary files /dev/null and b/doc/imgs/japan_1.jpg differ diff --git a/doc/imgs/japan_2.jpg b/doc/imgs/japan_2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0353874860a7bb275a4c4f61662d827a8351c2ff Binary files /dev/null and b/doc/imgs/japan_2.jpg differ diff --git a/doc/imgs/korean_1.jpg b/doc/imgs/korean_1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f1614e9f286a82261312731e13e8256410b160e3 Binary files /dev/null and b/doc/imgs/korean_1.jpg differ diff --git a/doc/imgs_en/img623.jpg b/doc/imgs_en/img623.jpg new file mode 100755 index 0000000000000000000000000000000000000000..2fae1b5b1ca1d557355933e93bbe268d8bba6778 Binary files /dev/null and b/doc/imgs_en/img623.jpg differ diff --git a/doc/imgs_results/1101.jpg b/doc/imgs_results/1101.jpg new file mode 100644 index 0000000000000000000000000000000000000000..fa8d809a9b133ca09e4265355493e5c60e311e44 Binary files /dev/null and b/doc/imgs_results/1101.jpg differ diff --git a/doc/imgs_results/1102.jpg b/doc/imgs_results/1102.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6988b12c4b836e88b67897a7b7141e12e236e7c0 Binary files /dev/null and b/doc/imgs_results/1102.jpg differ diff --git a/doc/imgs_results/1103.jpg b/doc/imgs_results/1103.jpg new file mode 100644 index 0000000000000000000000000000000000000000..3437f60b8e587b0fda9c88aa37c001a68ace59b4 Binary files /dev/null and b/doc/imgs_results/1103.jpg differ diff --git a/doc/imgs_results/1104.jpg b/doc/imgs_results/1104.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9297be0787ad6cc89c43acfcd1abd010c512c45b Binary files /dev/null and b/doc/imgs_results/1104.jpg differ diff --git a/doc/imgs_results/1105.jpg b/doc/imgs_results/1105.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6280e5eec8c05125bcde2a171d767a3fc3f3ea4d Binary files /dev/null and b/doc/imgs_results/1105.jpg differ diff --git a/doc/imgs_results/1106.jpg b/doc/imgs_results/1106.jpg new file mode 100644 index 0000000000000000000000000000000000000000..61f3915d5a36b02537681687dafb0e2e9303eea2 Binary files /dev/null and b/doc/imgs_results/1106.jpg differ diff --git a/doc/imgs_results/1110.jpg b/doc/imgs_results/1110.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b0c63e7c47c9ddbd555df34f8a9c17bf7d93043d Binary files /dev/null and b/doc/imgs_results/1110.jpg differ diff --git a/doc/imgs_results/1112.jpg b/doc/imgs_results/1112.jpg new file mode 100644 index 0000000000000000000000000000000000000000..35bec155034ba5860620f8c9d387dbc71607d6fe Binary files /dev/null and b/doc/imgs_results/1112.jpg differ diff --git a/doc/imgs_results/det_res_img623_sast.jpg b/doc/imgs_results/det_res_img623_sast.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b2dd538f7729724a33516091d11c081c7c2c1bd7 Binary files /dev/null and b/doc/imgs_results/det_res_img623_sast.jpg differ diff --git a/doc/imgs_results/det_res_img_10_sast.jpg b/doc/imgs_results/det_res_img_10_sast.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c63faf1354601f25cedb57a3b87f4467999f5457 Binary files /dev/null and b/doc/imgs_results/det_res_img_10_sast.jpg differ diff --git a/doc/imgs_results/whl/11_det.jpg b/doc/imgs_results/whl/11_det.jpg new file mode 100644 index 0000000000000000000000000000000000000000..fe0cd23cc24457f5d7084fff0c63c239d09c9969 Binary files /dev/null and b/doc/imgs_results/whl/11_det.jpg differ diff --git a/doc/imgs_results/whl/11_det_rec.jpg b/doc/imgs_results/whl/11_det_rec.jpg new file mode 100644 index 0000000000000000000000000000000000000000..31c566478fd874d10a61dcd54635453e34c20e4c Binary files /dev/null and b/doc/imgs_results/whl/11_det_rec.jpg differ diff --git a/doc/imgs_results/whl/12_det.jpg b/doc/imgs_results/whl/12_det.jpg new file mode 100644 index 0000000000000000000000000000000000000000..1d5ccf2a6b5d3fa9516560e0cb2646ad6b917da6 Binary files /dev/null and b/doc/imgs_results/whl/12_det.jpg differ diff --git a/doc/imgs_results/whl/12_det_rec.jpg b/doc/imgs_results/whl/12_det_rec.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9db8b57e1279362db2c9f3d6a3ba36b77bf13775 Binary files /dev/null and b/doc/imgs_results/whl/12_det_rec.jpg differ diff --git a/doc/imgs_words/french/1.jpg b/doc/imgs_words/french/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..077ca28e70b74ed07fa637011c80219aecc448d5 Binary files /dev/null and b/doc/imgs_words/french/1.jpg differ diff --git a/doc/imgs_words/french/2.jpg b/doc/imgs_words/french/2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..38a73caa621710a7eb7378603e0152ba9c14dd41 Binary files /dev/null and b/doc/imgs_words/french/2.jpg differ diff --git a/doc/imgs_words/german/1.jpg b/doc/imgs_words/german/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d26ec9ed14de65c2d27e37693ff0da133e774b94 Binary files /dev/null and b/doc/imgs_words/german/1.jpg differ diff --git a/doc/imgs_words/japan/1.jpg b/doc/imgs_words/japan/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..684879749764a1b6063da32d7910bff911e855f4 Binary files /dev/null and b/doc/imgs_words/japan/1.jpg differ diff --git a/doc/imgs_words/korean/1.jpg b/doc/imgs_words/korean/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..48a89389ae880783a39a13e9b06a861b88948fba Binary files /dev/null and b/doc/imgs_words/korean/1.jpg differ diff --git a/doc/imgs_words/korean/2.jpg b/doc/imgs_words/korean/2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b24f28914d574be44e147943d906f8634f149ed5 Binary files /dev/null and b/doc/imgs_words/korean/2.jpg differ diff --git a/doc/japan.ttc b/doc/japan.ttc new file mode 100644 index 0000000000000000000000000000000000000000..ad68243b968fc87b207928594c585039859b75a9 Binary files /dev/null and b/doc/japan.ttc differ diff --git a/doc/joinus.PNG b/doc/joinus.PNG new file mode 100644 index 0000000000000000000000000000000000000000..fa11f286d7d2d56d18d94e9034c3be77c974d42f Binary files /dev/null and b/doc/joinus.PNG differ diff --git a/doc/joinus.jpg b/doc/joinus.jpg deleted file mode 100644 index 6a287f3145c1910a2e25db35a94f5cbb14380b9d..0000000000000000000000000000000000000000 Binary files a/doc/joinus.jpg and /dev/null differ diff --git a/doc/korean.ttf b/doc/korean.ttf new file mode 100644 index 0000000000000000000000000000000000000000..e638ce37f67ff1cd9babf73387786eaeb5c52968 Binary files /dev/null and b/doc/korean.ttf differ diff --git a/doc/ocr-android-easyedge.png b/doc/ocr-android-easyedge.png index 3c92fbee073788961825af9f44527b82c864d1fa..8b2a846fa406fcb9b1d3314d76ce8b4cc2e76f2a 100644 Binary files a/doc/ocr-android-easyedge.png and b/doc/ocr-android-easyedge.png differ diff --git a/doc/ppocr_framework.png b/doc/ppocr_framework.png new file mode 100644 index 0000000000000000000000000000000000000000..ab51c88fe694b210a98423868cd90874be3c09ed Binary files /dev/null and b/doc/ppocr_framework.png differ diff --git a/doc/tricks/long_text_examples.jpg b/doc/tricks/long_text_examples.jpg new file mode 100644 index 0000000000000000000000000000000000000000..457615295bbfad1f64b52086d38f19fd04ab5c62 Binary files /dev/null and b/doc/tricks/long_text_examples.jpg differ diff --git a/paddleocr.py b/paddleocr.py new file mode 100644 index 0000000000000000000000000000000000000000..7e9b2402ad792b4d690b1147f042203df46872a5 --- /dev/null +++ b/paddleocr.py @@ -0,0 +1,279 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys + +__dir__ = os.path.dirname(__file__) +sys.path.append(os.path.join(__dir__, '')) + +import cv2 +import numpy as np +from pathlib import Path +import tarfile +import requests +from tqdm import tqdm + +from tools.infer import predict_system +from ppocr.utils.utility import initial_logger + +logger = initial_logger() +from ppocr.utils.utility import check_and_read_gif, get_image_file_list + +__all__ = ['PaddleOCR'] + +model_urls = { + 'det': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar', + 'rec': { + 'ch': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' + }, + 'en': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/ic15_dict.txt' + }, + 'french': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/french_dict.txt' + }, + 'german': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/german_dict.txt' + }, + 'korean': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/korean_dict.txt' + }, + 'japan': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/japan_dict.txt' + } + }, + 'cls': + 'https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar' +} + +SUPPORT_DET_MODEL = ['DB'] +SUPPORT_REC_MODEL = ['CRNN'] +BASE_DIR = os.path.expanduser("~/.paddleocr/") + + +def download_with_progressbar(url, save_path): + response = requests.get(url, stream=True) + total_size_in_bytes = int(response.headers.get('content-length', 0)) + block_size = 1024 # 1 Kibibyte + progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True) + with open(save_path, 'wb') as file: + for data in response.iter_content(block_size): + progress_bar.update(len(data)) + file.write(data) + progress_bar.close() + if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes: + logger.error("ERROR, something went wrong") + sys.exit(0) + + +def maybe_download(model_storage_directory, url): + # using custom model + if not os.path.exists(os.path.join( + model_storage_directory, 'model')) or not os.path.exists( + os.path.join(model_storage_directory, 'params')): + tmp_path = os.path.join(model_storage_directory, url.split('/')[-1]) + print('download {} to {}'.format(url, tmp_path)) + os.makedirs(model_storage_directory, exist_ok=True) + download_with_progressbar(url, tmp_path) + with tarfile.open(tmp_path, 'r') as tarObj: + for member in tarObj.getmembers(): + if "model" in member.name: + filename = 'model' + elif "params" in member.name: + filename = 'params' + else: + continue + file = tarObj.extractfile(member) + with open( + os.path.join(model_storage_directory, filename), + 'wb') as f: + f.write(file.read()) + os.remove(tmp_path) + + +def parse_args(): + import argparse + + def str2bool(v): + return v.lower() in ("true", "t", "1") + + parser = argparse.ArgumentParser() + # params for prediction engine + parser.add_argument("--use_gpu", type=str2bool, default=True) + parser.add_argument("--ir_optim", type=str2bool, default=True) + parser.add_argument("--use_tensorrt", type=str2bool, default=False) + parser.add_argument("--gpu_mem", type=int, default=8000) + + # params for text detector + parser.add_argument("--image_dir", type=str) + parser.add_argument("--det_algorithm", type=str, default='DB') + parser.add_argument("--det_model_dir", type=str, default=None) + parser.add_argument("--det_max_side_len", type=float, default=960) + + # DB parmas + parser.add_argument("--det_db_thresh", type=float, default=0.3) + parser.add_argument("--det_db_box_thresh", type=float, default=0.5) + parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0) + + # EAST parmas + parser.add_argument("--det_east_score_thresh", type=float, default=0.8) + parser.add_argument("--det_east_cover_thresh", type=float, default=0.1) + parser.add_argument("--det_east_nms_thresh", type=float, default=0.2) + + # params for text recognizer + parser.add_argument("--rec_algorithm", type=str, default='CRNN') + parser.add_argument("--rec_model_dir", type=str, default=None) + parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320") + parser.add_argument("--rec_char_type", type=str, default='ch') + parser.add_argument("--rec_batch_num", type=int, default=30) + parser.add_argument("--max_text_length", type=int, default=25) + parser.add_argument("--rec_char_dict_path", type=str, default=None) + parser.add_argument("--use_space_char", type=bool, default=True) + + # params for text classifier + parser.add_argument("--use_angle_cls", type=str2bool, default=False) + parser.add_argument("--cls_model_dir", type=str, default=None) + parser.add_argument("--cls_image_shape", type=str, default="3, 48, 192") + parser.add_argument("--label_list", type=list, default=['0', '180']) + parser.add_argument("--cls_batch_num", type=int, default=30) + parser.add_argument("--cls_thresh", type=float, default=0.9) + + parser.add_argument("--enable_mkldnn", type=bool, default=False) + parser.add_argument("--use_zero_copy_run", type=bool, default=False) + + parser.add_argument("--lang", type=str, default='ch') + parser.add_argument("--det", type=str2bool, default=True) + parser.add_argument("--rec", type=str2bool, default=True) + parser.add_argument("--cls", type=str2bool, default=False) + return parser.parse_args() + + +class PaddleOCR(predict_system.TextSystem): + def __init__(self, **kwargs): + """ + paddleocr package + args: + **kwargs: other params show in paddleocr --help + """ + postprocess_params = parse_args() + postprocess_params.__dict__.update(**kwargs) + self.use_angle_cls = postprocess_params.use_angle_cls + lang = postprocess_params.lang + assert lang in model_urls[ + 'rec'], 'param lang must in {}, but got {}'.format( + model_urls['rec'].keys(), lang) + if postprocess_params.rec_char_dict_path is None: + postprocess_params.rec_char_dict_path = model_urls['rec'][lang][ + 'dict_path'] + + # init model dir + if postprocess_params.det_model_dir is None: + postprocess_params.det_model_dir = os.path.join(BASE_DIR, 'det') + if postprocess_params.rec_model_dir is None: + postprocess_params.rec_model_dir = os.path.join( + BASE_DIR, 'rec/{}'.format(lang)) + if postprocess_params.cls_model_dir is None: + postprocess_params.cls_model_dir = os.path.join(BASE_DIR, 'cls') + print(postprocess_params) + # download model + maybe_download(postprocess_params.det_model_dir, model_urls['det']) + maybe_download(postprocess_params.rec_model_dir, + model_urls['rec'][lang]['url']) + if self.use_angle_cls: + maybe_download(postprocess_params.cls_model_dir, model_urls['cls']) + + if postprocess_params.det_algorithm not in SUPPORT_DET_MODEL: + logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL)) + sys.exit(0) + if postprocess_params.rec_algorithm not in SUPPORT_REC_MODEL: + logger.error('rec_algorithm must in {}'.format(SUPPORT_REC_MODEL)) + sys.exit(0) + + postprocess_params.rec_char_dict_path = Path( + __file__).parent / postprocess_params.rec_char_dict_path + + # init det_model and rec_model + super().__init__(postprocess_params) + + def ocr(self, img, det=True, rec=True, cls=False): + """ + ocr with paddleocr + args: + img: img for ocr, support ndarray, img_path and list or ndarray + det: use text detection or not, if false, only rec will be exec. default is True + rec: use text recognition or not, if false, only det will be exec. default is True + """ + assert isinstance(img, (np.ndarray, list, str)) + if cls and not self.use_angle_cls: + print('cls should be false when use_angle_cls is false') + exit(-1) + self.use_angle_cls = cls + if isinstance(img, str): + image_file = img + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) + if img is None: + logger.error("error in loading image:{}".format(image_file)) + return None + if det and rec: + dt_boxes, rec_res = self.__call__(img) + return [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)] + elif det and not rec: + dt_boxes, elapse = self.text_detector(img) + if dt_boxes is None: + return None + return [box.tolist() for box in dt_boxes] + else: + if not isinstance(img, list): + img = [img] + if self.use_angle_cls: + img, cls_res, elapse = self.text_classifier(img) + if not rec: + return cls_res + rec_res, elapse = self.text_recognizer(img) + return rec_res + + +def main(): + # for com + args = parse_args() + image_file_list = get_image_file_list(args.image_dir) + if len(image_file_list) == 0: + logger.error('no images find in {}'.format(args.image_dir)) + return + ocr_engine = PaddleOCR() + for img_path in image_file_list: + print(img_path) + result = ocr_engine.ocr(img_path, + det=args.det, + rec=args.rec, + cls=args.cls) + for line in result: + print(line) diff --git a/ppocr/data/cls/__init__.py b/ppocr/data/cls/__init__.py new file mode 100755 index 0000000000000000000000000000000000000000..abf198b97e6e818e1fbe59006f98492640bcee54 --- /dev/null +++ b/ppocr/data/cls/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/ppocr/data/cls/dataset_traversal.py b/ppocr/data/cls/dataset_traversal.py new file mode 100755 index 0000000000000000000000000000000000000000..01f8c89c839f0c8f6d07ca6ad9676947ce25f6ab --- /dev/null +++ b/ppocr/data/cls/dataset_traversal.py @@ -0,0 +1,144 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys +import math +import random +import numpy as np +import cv2 + +from ppocr.utils.utility import initial_logger +from ppocr.utils.utility import get_image_file_list + +logger = initial_logger() + +from ppocr.data.rec.img_tools import resize_norm_img, warp +from ppocr.data.cls.randaugment import RandAugment + + +def random_crop(img): + img_h, img_w = img.shape[:2] + if img_w > img_h * 4: + w = random.randint(img_h * 2, img_w) + i = random.randint(0, img_w - w) + + img = img[:, i:i + w, :] + return img + + +class SimpleReader(object): + def __init__(self, params): + if params['mode'] != 'train': + self.num_workers = 1 + else: + self.num_workers = params['num_workers'] + if params['mode'] != 'test': + self.img_set_dir = params['img_set_dir'] + self.label_file_path = params['label_file_path'] + self.use_gpu = params['use_gpu'] + self.image_shape = params['image_shape'] + self.mode = params['mode'] + self.infer_img = params['infer_img'] + self.use_distort = params['mode'] == 'train' and params['distort'] + self.randaug = RandAugment() + self.label_list = params['label_list'] + if "distort" in params: + self.use_distort = params['distort'] and params['use_gpu'] + if not params['use_gpu']: + logger.info( + "Distort operation can only support in GPU.Distort will be set to False." + ) + if params['mode'] == 'train': + self.batch_size = params['train_batch_size_per_card'] + self.drop_last = True + else: + self.batch_size = params['test_batch_size_per_card'] + self.drop_last = False + self.use_distort = False + + def __call__(self, process_id): + if self.mode != 'train': + process_id = 0 + + def get_device_num(): + if self.use_gpu: + gpus = os.environ.get("CUDA_VISIBLE_DEVICES", 1) + gpu_num = len(gpus.split(',')) + return gpu_num + else: + cpu_num = os.environ.get("CPU_NUM", 1) + return int(cpu_num) + + def sample_iter_reader(): + if self.mode != 'train' and self.infer_img is not None: + image_file_list = get_image_file_list(self.infer_img) + for single_img in image_file_list: + img = cv2.imread(single_img) + if img.shape[-1] == 1 or len(list(img.shape)) == 2: + img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) + norm_img = resize_norm_img(img, self.image_shape) + + norm_img = norm_img[np.newaxis, :] + yield norm_img + else: + with open(self.label_file_path, "rb") as fin: + label_infor_list = fin.readlines() + img_num = len(label_infor_list) + img_id_list = list(range(img_num)) + random.shuffle(img_id_list) + if sys.platform == "win32" and self.num_workers != 1: + print("multiprocess is not fully compatible with Windows." + "num_workers will be 1.") + self.num_workers = 1 + if self.batch_size * get_device_num( + ) * self.num_workers > img_num: + raise Exception( + "The number of the whole data ({}) is smaller than the batch_size * devices_num * num_workers ({})". + format(img_num, self.batch_size * get_device_num() * + self.num_workers)) + for img_id in range(process_id, img_num, self.num_workers): + label_infor = label_infor_list[img_id_list[img_id]] + substr = label_infor.decode('utf-8').strip("\n").split("\t") + label = self.label_list.index(substr[1]) + + img_path = self.img_set_dir + "/" + substr[0] + img = cv2.imread(img_path) + if img is None: + logger.info("{} does not exist!".format(img_path)) + continue + if img.shape[-1] == 1 or len(list(img.shape)) == 2: + img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) + + if self.use_distort: + img = warp(img, 10) + img = self.randaug(img) + norm_img = resize_norm_img(img, self.image_shape) + norm_img = norm_img[np.newaxis, :] + yield (norm_img, label) + + def batch_iter_reader(): + batch_outs = [] + for outs in sample_iter_reader(): + batch_outs.append(outs) + if len(batch_outs) == self.batch_size: + yield batch_outs + batch_outs = [] + if not self.drop_last: + if len(batch_outs) != 0: + yield batch_outs + + if self.infer_img is None: + return batch_iter_reader + return sample_iter_reader diff --git a/ppocr/data/cls/randaugment.py b/ppocr/data/cls/randaugment.py new file mode 100644 index 0000000000000000000000000000000000000000..21345c05be59f6d1c9ae5a8d396ffed2dd9b0ca1 --- /dev/null +++ b/ppocr/data/cls/randaugment.py @@ -0,0 +1,135 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +from PIL import Image, ImageEnhance, ImageOps +import numpy as np +import random +import six + + +class RawRandAugment(object): + def __init__(self, num_layers=2, magnitude=5, fillcolor=(128, 128, 128)): + self.num_layers = num_layers + self.magnitude = magnitude + self.max_level = 10 + + abso_level = self.magnitude / self.max_level + self.level_map = { + "shearX": 0.3 * abso_level, + "shearY": 0.3 * abso_level, + "translateX": 150.0 / 331 * abso_level, + "translateY": 150.0 / 331 * abso_level, + "rotate": 30 * abso_level, + "color": 0.9 * abso_level, + "posterize": int(4.0 * abso_level), + "solarize": 256.0 * abso_level, + "contrast": 0.9 * abso_level, + "sharpness": 0.9 * abso_level, + "brightness": 0.9 * abso_level, + "autocontrast": 0, + "equalize": 0, + "invert": 0 + } + + # from https://stackoverflow.com/questions/5252170/ + # specify-image-filling-color-when-rotating-in-python-with-pil-and-setting-expand + def rotate_with_fill(img, magnitude): + rot = img.convert("RGBA").rotate(magnitude) + return Image.composite(rot, + Image.new("RGBA", rot.size, (128, ) * 4), + rot).convert(img.mode) + + rnd_ch_op = random.choice + + self.func = { + "shearX": lambda img, magnitude: img.transform( + img.size, + Image.AFFINE, + (1, magnitude * rnd_ch_op([-1, 1]), 0, 0, 1, 0), + Image.BICUBIC, + fillcolor=fillcolor), + "shearY": lambda img, magnitude: img.transform( + img.size, + Image.AFFINE, + (1, 0, 0, magnitude * rnd_ch_op([-1, 1]), 1, 0), + Image.BICUBIC, + fillcolor=fillcolor), + "translateX": lambda img, magnitude: img.transform( + img.size, + Image.AFFINE, + (1, 0, magnitude * img.size[0] * rnd_ch_op([-1, 1]), 0, 1, 0), + fillcolor=fillcolor), + "translateY": lambda img, magnitude: img.transform( + img.size, + Image.AFFINE, + (1, 0, 0, 0, 1, magnitude * img.size[1] * rnd_ch_op([-1, 1])), + fillcolor=fillcolor), + "rotate": lambda img, magnitude: rotate_with_fill(img, magnitude), + "color": lambda img, magnitude: ImageEnhance.Color(img).enhance( + 1 + magnitude * rnd_ch_op([-1, 1])), + "posterize": lambda img, magnitude: + ImageOps.posterize(img, magnitude), + "solarize": lambda img, magnitude: + ImageOps.solarize(img, magnitude), + "contrast": lambda img, magnitude: + ImageEnhance.Contrast(img).enhance( + 1 + magnitude * rnd_ch_op([-1, 1])), + "sharpness": lambda img, magnitude: + ImageEnhance.Sharpness(img).enhance( + 1 + magnitude * rnd_ch_op([-1, 1])), + "brightness": lambda img, magnitude: + ImageEnhance.Brightness(img).enhance( + 1 + magnitude * rnd_ch_op([-1, 1])), + "autocontrast": lambda img, magnitude: + ImageOps.autocontrast(img), + "equalize": lambda img, magnitude: ImageOps.equalize(img), + "invert": lambda img, magnitude: ImageOps.invert(img) + } + + def __call__(self, img): + avaiable_op_names = list(self.level_map.keys()) + for layer_num in range(self.num_layers): + op_name = np.random.choice(avaiable_op_names) + img = self.func[op_name](img, self.level_map[op_name]) + return img + + +class RandAugment(RawRandAugment): + """ RandAugment wrapper to auto fit different img types """ + + def __init__(self, *args, **kwargs): + if six.PY2: + super(RandAugment, self).__init__(*args, **kwargs) + else: + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + + if six.PY2: + img = super(RandAugment, self).__call__(img) + else: + img = super().__call__(img) + + if isinstance(img, Image.Image): + img = np.asarray(img) + + return img diff --git a/ppocr/data/det/dataset_traversal.py b/ppocr/data/det/dataset_traversal.py index 737cbe2e90dc259c1266b554ae0c6735ea3a52d2..bd055c82be3bd5c762ccaf7a0134f89fbe4fe290 100644 --- a/ppocr/data/det/dataset_traversal.py +++ b/ppocr/data/det/dataset_traversal.py @@ -31,19 +31,24 @@ class TrainReader(object): def __init__(self, params): self.num_workers = params['num_workers'] self.label_file_path = params['label_file_path'] + print(self.label_file_path) + self.use_mul_data = False + if isinstance(self.label_file_path, list): + self.use_mul_data = True + self.data_ratio_list = params['data_ratio_list'] self.batch_size = params['train_batch_size_per_card'] assert 'process_function' in params,\ "absence process_function in Reader" self.process = create_module(params['process_function'])(params) - def __call__(self, process_id): + def __call__(self, process_id): def sample_iter_reader(): with open(self.label_file_path, "rb") as fin: label_infor_list = fin.readlines() img_num = len(label_infor_list) img_id_list = list(range(img_num)) random.shuffle(img_id_list) - if sys.platform == "win32": + if sys.platform == "win32" and self.num_workers != 1: print("multiprocess is not fully compatible with Windows." "num_workers will be 1.") self.num_workers = 1 @@ -54,13 +59,64 @@ class TrainReader(object): continue yield outs + def sample_iter_reader_mul(): + batch_size = 1000 + data_source_list = self.label_file_path + batch_size_list = list(map(int, [max(1.0, batch_size * x) for x in self.data_ratio_list])) + print(self.data_ratio_list, batch_size_list) + + data_filename_list, data_size_list, fetch_record_list = [], [], [] + for data_source in data_source_list: + image_files = open(data_source, "rb").readlines() + random.shuffle(image_files) + data_filename_list.append(image_files) + data_size_list.append(len(image_files)) + fetch_record_list.append(0) + + image_batch = [] + # get a batch of img_fns and poly_fns + for i in range(0, len(batch_size_list)): + bs = batch_size_list[i] + ds = data_size_list[i] + image_names = data_filename_list[i] + fetch_record = fetch_record_list[i] + data_path = data_source_list[i] + for j in range(fetch_record, fetch_record + bs): + index = j % ds + image_batch.append(image_names[index]) + + if (fetch_record + bs) > ds: + fetch_record_list[i] = 0 + random.shuffle(data_filename_list[i]) + else: + fetch_record_list[i] = fetch_record + bs + + if sys.platform == "win32": + print("multiprocess is not fully compatible with Windows." + "num_workers will be 1.") + self.num_workers = 1 + + for label_infor in image_batch: + outs = self.process(label_infor) + if outs is None: + continue + yield outs + def batch_iter_reader(): batch_outs = [] - for outs in sample_iter_reader(): - batch_outs.append(outs) - if len(batch_outs) == self.batch_size: - yield batch_outs - batch_outs = [] + if self.use_mul_data: + print("Sample date from multiple datasets!") + for outs in sample_iter_reader_mul(): + batch_outs.append(outs) + if len(batch_outs) == self.batch_size: + yield batch_outs + batch_outs = [] + else: + for outs in sample_iter_reader(): + batch_outs.append(outs) + if len(batch_outs) == self.batch_size: + yield batch_outs + batch_outs = [] return batch_iter_reader diff --git a/ppocr/data/det/db_process.py b/ppocr/data/det/db_process.py index b64b8c8d227f293aff0eff90d1d85dee8dd85fce..9534c59ef69d830a8d991f421539c5e4e5bb3d39 100644 --- a/ppocr/data/det/db_process.py +++ b/ppocr/data/det/db_process.py @@ -17,7 +17,7 @@ import cv2 import numpy as np import json import sys -from ppocr.utils.utility import initial_logger +from ppocr.utils.utility import initial_logger, check_and_read_gif logger = initial_logger() from .data_augment import AugmentData @@ -100,7 +100,9 @@ class DBProcessTrain(object): def __call__(self, label_infor): img_path, gt_label = self.convert_label_infor(label_infor) - imgvalue = cv2.imread(img_path) + imgvalue, flag = check_and_read_gif(img_path) + if not flag: + imgvalue = cv2.imread(img_path) if imgvalue is None: logger.info("{} does not exist!".format(img_path)) return None diff --git a/ppocr/data/det/east_process.py b/ppocr/data/det/east_process.py index 7f0deda24d62240d42702a9354d43c9e070e7c53..e2581caa20bb0e63f67e110d483d3335f51d56b7 100755 --- a/ppocr/data/det/east_process.py +++ b/ppocr/data/det/east_process.py @@ -16,7 +16,8 @@ import math import cv2 import numpy as np import json - +import sys +import os class EASTProcessTrain(object): def __init__(self, params): @@ -52,7 +53,7 @@ class EASTProcessTrain(object): label_infor = label_infor.decode() label_infor = label_infor.encode('utf-8').decode('utf-8-sig') substr = label_infor.strip("\n").split("\t") - img_path = self.img_set_dir + substr[0] + img_path = os.path.join(self.img_set_dir, substr[0]) label = json.loads(substr[1]) nBox = len(label) wordBBs, txts, txt_tags = [], [], [] @@ -78,7 +79,7 @@ class EASTProcessTrain(object): dst_polys = [] rand_degree_ratio = np.random.rand() rand_degree_cnt = 1 - if rand_degree_ratio > 0.333 and rand_degree_ratio < 0.666: + if 0.333 < rand_degree_ratio < 0.666: rand_degree_cnt = 2 elif rand_degree_ratio > 0.666: rand_degree_cnt = 3 @@ -138,7 +139,7 @@ class EASTProcessTrain(object): continue if p_area > 0: #'poly in wrong direction' - if tag == False: + if not tag: tag = True #reversed cases should be ignore poly = poly[(0, 3, 2, 1), :] validated_polys.append(poly) diff --git a/ppocr/data/det/random_crop_data.py b/ppocr/data/det/random_crop_data.py index d0c081e785cb17282b5486c718446b97a580b6cc..3e8629092e9a74a0764a0c04de829a02d00b6844 100644 --- a/ppocr/data/det/random_crop_data.py +++ b/ppocr/data/det/random_crop_data.py @@ -1,4 +1,4 @@ -# -*- coding:utf-8 -*- +# -*- coding:utf-8 -*- from __future__ import absolute_import from __future__ import division @@ -121,24 +121,22 @@ def RandomCropData(data, size): all_care_polys = [ text_polys[i] for i, tag in enumerate(ignore_tags) if not tag ] - # 计算crop区域 crop_x, crop_y, crop_w, crop_h = crop_area(im, all_care_polys, min_crop_side_ratio, max_tries) - # crop 图片 保持比例填充 - scale_w = size[0] / crop_w - scale_h = size[1] / crop_h + dh, dw = size + scale_w = dw / crop_w + scale_h = dh / crop_h scale = min(scale_w, scale_h) h = int(crop_h * scale) w = int(crop_w * scale) if keep_ratio: - padimg = np.zeros((size[1], size[0], im.shape[2]), im.dtype) + padimg = np.zeros((dh, dw, im.shape[2]), im.dtype) padimg[:h, :w] = cv2.resize( im[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w], (w, h)) img = padimg else: img = cv2.resize(im[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w], - tuple(size)) - # crop 文本框 + (dw, dh)) text_polys_crop = [] ignore_tags_crop = [] texts_crop = [] diff --git a/ppocr/data/det/sast_process.py b/ppocr/data/det/sast_process.py new file mode 100644 index 0000000000000000000000000000000000000000..74a848465f4cedb4a7007f61adc509d80885922c --- /dev/null +++ b/ppocr/data/det/sast_process.py @@ -0,0 +1,781 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import math +import cv2 +import numpy as np +import json + + +class SASTProcessTrain(object): + """ + SAST process function for training + """ + def __init__(self, params): + self.img_set_dir = params['img_set_dir'] + self.min_crop_side_ratio = params['min_crop_side_ratio'] + self.min_crop_size = params['min_crop_size'] + image_shape = params['image_shape'] + self.input_size = image_shape[1] + self.min_text_size = params['min_text_size'] + self.max_text_size = params['max_text_size'] + + def convert_label_infor(self, label_infor): + label_infor = label_infor.decode() + label_infor = label_infor.encode('utf-8').decode('utf-8-sig') + substr = label_infor.strip("\n").split("\t") + img_path = self.img_set_dir + substr[0] + label = json.loads(substr[1]) + nBox = len(label) + wordBBs, txts, txt_tags = [], [], [] + for bno in range(0, nBox): + wordBB = label[bno]['points'] + txt = label[bno]['transcription'] + wordBBs.append(wordBB) + txts.append(txt) + if txt == '###': + txt_tags.append(True) + else: + txt_tags.append(False) + wordBBs = np.array(wordBBs, dtype=np.float32) + txt_tags = np.array(txt_tags, dtype=np.bool) + return img_path, wordBBs, txt_tags, txts + + def quad_area(self, poly): + """ + compute area of a polygon + :param poly: + :return: + """ + edge = [ + (poly[1][0] - poly[0][0]) * (poly[1][1] + poly[0][1]), + (poly[2][0] - poly[1][0]) * (poly[2][1] + poly[1][1]), + (poly[3][0] - poly[2][0]) * (poly[3][1] + poly[2][1]), + (poly[0][0] - poly[3][0]) * (poly[0][1] + poly[3][1]) + ] + return np.sum(edge) / 2. + + def gen_quad_from_poly(self, poly): + """ + Generate min area quad from poly. + """ + point_num = poly.shape[0] + min_area_quad = np.zeros((4, 2), dtype=np.float32) + if True: + rect = cv2.minAreaRect(poly.astype(np.int32)) # (center (x,y), (width, height), angle of rotation) + center_point = rect[0] + box = np.array(cv2.boxPoints(rect)) + + first_point_idx = 0 + min_dist = 1e4 + for i in range(4): + dist = np.linalg.norm(box[(i + 0) % 4] - poly[0]) + \ + np.linalg.norm(box[(i + 1) % 4] - poly[point_num // 2 - 1]) + \ + np.linalg.norm(box[(i + 2) % 4] - poly[point_num // 2]) + \ + np.linalg.norm(box[(i + 3) % 4] - poly[-1]) + if dist < min_dist: + min_dist = dist + first_point_idx = i + for i in range(4): + min_area_quad[i] = box[(first_point_idx + i) % 4] + + return min_area_quad + + def check_and_validate_polys(self, polys, tags, xxx_todo_changeme): + """ + check so that the text poly is in the same direction, + and also filter some invalid polygons + :param polys: + :param tags: + :return: + """ + (h, w) = xxx_todo_changeme + if polys.shape[0] == 0: + return polys, np.array([]), np.array([]) + polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w - 1) + polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h - 1) + + validated_polys = [] + validated_tags = [] + hv_tags = [] + for poly, tag in zip(polys, tags): + quad = self.gen_quad_from_poly(poly) + p_area = self.quad_area(quad) + if abs(p_area) < 1: + print('invalid poly') + continue + if p_area > 0: + if tag == False: + print('poly in wrong direction') + tag = True # reversed cases should be ignore + poly = poly[(0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1), :] + quad = quad[(0, 3, 2, 1), :] + + len_w = np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[3] - quad[2]) + len_h = np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2]) + hv_tag = 1 + + if len_w * 2.0 < len_h: + hv_tag = 0 + + validated_polys.append(poly) + validated_tags.append(tag) + hv_tags.append(hv_tag) + return np.array(validated_polys), np.array(validated_tags), np.array(hv_tags) + + def crop_area(self, im, polys, tags, hv_tags, txts, crop_background=False, max_tries=25): + """ + make random crop from the input image + :param im: + :param polys: + :param tags: + :param crop_background: + :param max_tries: 50 -> 25 + :return: + """ + h, w, _ = im.shape + pad_h = h // 10 + pad_w = w // 10 + h_array = np.zeros((h + pad_h * 2), dtype=np.int32) + w_array = np.zeros((w + pad_w * 2), dtype=np.int32) + for poly in polys: + poly = np.round(poly, decimals=0).astype(np.int32) + minx = np.min(poly[:, 0]) + maxx = np.max(poly[:, 0]) + w_array[minx + pad_w: maxx + pad_w] = 1 + miny = np.min(poly[:, 1]) + maxy = np.max(poly[:, 1]) + h_array[miny + pad_h: maxy + pad_h] = 1 + # ensure the cropped area not across a text + h_axis = np.where(h_array == 0)[0] + w_axis = np.where(w_array == 0)[0] + if len(h_axis) == 0 or len(w_axis) == 0: + return im, polys, tags, hv_tags, txts + for i in range(max_tries): + xx = np.random.choice(w_axis, size=2) + xmin = np.min(xx) - pad_w + xmax = np.max(xx) - pad_w + xmin = np.clip(xmin, 0, w - 1) + xmax = np.clip(xmax, 0, w - 1) + yy = np.random.choice(h_axis, size=2) + ymin = np.min(yy) - pad_h + ymax = np.max(yy) - pad_h + ymin = np.clip(ymin, 0, h - 1) + ymax = np.clip(ymax, 0, h - 1) + # if xmax - xmin < ARGS.min_crop_side_ratio * w or \ + # ymax - ymin < ARGS.min_crop_side_ratio * h: + if xmax - xmin < self.min_crop_size or \ + ymax - ymin < self.min_crop_size: + # area too small + continue + if polys.shape[0] != 0: + poly_axis_in_area = (polys[:, :, 0] >= xmin) & (polys[:, :, 0] <= xmax) \ + & (polys[:, :, 1] >= ymin) & (polys[:, :, 1] <= ymax) + selected_polys = np.where(np.sum(poly_axis_in_area, axis=1) == 4)[0] + else: + selected_polys = [] + if len(selected_polys) == 0: + # no text in this area + if crop_background: + txts_tmp = [] + for selected_poly in selected_polys: + txts_tmp.append(txts[selected_poly]) + txts = txts_tmp + return im[ymin : ymax + 1, xmin : xmax + 1, :], \ + polys[selected_polys], tags[selected_polys], hv_tags[selected_polys], txts + else: + continue + im = im[ymin: ymax + 1, xmin: xmax + 1, :] + polys = polys[selected_polys] + tags = tags[selected_polys] + hv_tags = hv_tags[selected_polys] + txts_tmp = [] + for selected_poly in selected_polys: + txts_tmp.append(txts[selected_poly]) + txts = txts_tmp + polys[:, :, 0] -= xmin + polys[:, :, 1] -= ymin + return im, polys, tags, hv_tags, txts + + return im, polys, tags, hv_tags, txts + + def generate_direction_map(self, poly_quads, direction_map): + """ + """ + width_list = [] + height_list = [] + for quad in poly_quads: + quad_w = (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3])) / 2.0 + quad_h = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[2] - quad[1])) / 2.0 + width_list.append(quad_w) + height_list.append(quad_h) + norm_width = max(sum(width_list) / (len(width_list) + 1e-6), 1.0) + average_height = max(sum(height_list) / (len(height_list) + 1e-6), 1.0) + + for quad in poly_quads: + direct_vector_full = ((quad[1] + quad[2]) - (quad[0] + quad[3])) / 2.0 + direct_vector = direct_vector_full / (np.linalg.norm(direct_vector_full) + 1e-6) * norm_width + direction_label = tuple(map(float, [direct_vector[0], direct_vector[1], 1.0 / (average_height + 1e-6)])) + cv2.fillPoly(direction_map, quad.round().astype(np.int32)[np.newaxis, :, :], direction_label) + return direction_map + + def calculate_average_height(self, poly_quads): + """ + """ + height_list = [] + for quad in poly_quads: + quad_h = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[2] - quad[1])) / 2.0 + height_list.append(quad_h) + average_height = max(sum(height_list) / len(height_list), 1.0) + return average_height + + def generate_tcl_label(self, hw, polys, tags, ds_ratio, + tcl_ratio=0.3, shrink_ratio_of_width=0.15): + """ + Generate polygon. + """ + h, w = hw + h, w = int(h * ds_ratio), int(w * ds_ratio) + polys = polys * ds_ratio + + score_map = np.zeros((h, w,), dtype=np.float32) + tbo_map = np.zeros((h, w, 5), dtype=np.float32) + training_mask = np.ones((h, w,), dtype=np.float32) + direction_map = np.ones((h, w, 3)) * np.array([0, 0, 1]).reshape([1, 1, 3]).astype(np.float32) + + for poly_idx, poly_tag in enumerate(zip(polys, tags)): + poly = poly_tag[0] + tag = poly_tag[1] + + # generate min_area_quad + min_area_quad, center_point = self.gen_min_area_quad_from_poly(poly) + min_area_quad_h = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[3]) + + np.linalg.norm(min_area_quad[1] - min_area_quad[2])) + min_area_quad_w = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[1]) + + np.linalg.norm(min_area_quad[2] - min_area_quad[3])) + + if min(min_area_quad_h, min_area_quad_w) < self.min_text_size * ds_ratio \ + or min(min_area_quad_h, min_area_quad_w) > self.max_text_size * ds_ratio: + continue + + if tag: + # continue + cv2.fillPoly(training_mask, poly.astype(np.int32)[np.newaxis, :, :], 0.15) + else: + tcl_poly = self.poly2tcl(poly, tcl_ratio) + tcl_quads = self.poly2quads(tcl_poly) + poly_quads = self.poly2quads(poly) + # stcl map + stcl_quads, quad_index = self.shrink_poly_along_width(tcl_quads, shrink_ratio_of_width=shrink_ratio_of_width, + expand_height_ratio=1.0 / tcl_ratio) + # generate tcl map + cv2.fillPoly(score_map, np.round(stcl_quads).astype(np.int32), 1.0) + + # generate tbo map + for idx, quad in enumerate(stcl_quads): + quad_mask = np.zeros((h, w), dtype=np.float32) + quad_mask = cv2.fillPoly(quad_mask, np.round(quad[np.newaxis, :, :]).astype(np.int32), 1.0) + tbo_map = self.gen_quad_tbo(poly_quads[quad_index[idx]], quad_mask, tbo_map) + return score_map, tbo_map, training_mask + + def generate_tvo_and_tco(self, hw, polys, tags, tcl_ratio=0.3, ds_ratio=0.25): + """ + Generate tcl map, tvo map and tbo map. + """ + h, w = hw + h, w = int(h * ds_ratio), int(w * ds_ratio) + polys = polys * ds_ratio + poly_mask = np.zeros((h, w), dtype=np.float32) + + tvo_map = np.ones((9, h, w), dtype=np.float32) + tvo_map[0:-1:2] = np.tile(np.arange(0, w), (h, 1)) + tvo_map[1:-1:2] = np.tile(np.arange(0, w), (h, 1)).T + poly_tv_xy_map = np.zeros((8, h, w), dtype=np.float32) + + # tco map + tco_map = np.ones((3, h, w), dtype=np.float32) + tco_map[0] = np.tile(np.arange(0, w), (h, 1)) + tco_map[1] = np.tile(np.arange(0, w), (h, 1)).T + poly_tc_xy_map = np.zeros((2, h, w), dtype=np.float32) + + poly_short_edge_map = np.ones((h, w), dtype=np.float32) + + for poly, poly_tag in zip(polys, tags): + + if poly_tag == True: + continue + + # adjust point order for vertical poly + poly = self.adjust_point(poly) + + # generate min_area_quad + min_area_quad, center_point = self.gen_min_area_quad_from_poly(poly) + min_area_quad_h = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[3]) + + np.linalg.norm(min_area_quad[1] - min_area_quad[2])) + min_area_quad_w = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[1]) + + np.linalg.norm(min_area_quad[2] - min_area_quad[3])) + + # generate tcl map and text, 128 * 128 + tcl_poly = self.poly2tcl(poly, tcl_ratio) + + # generate poly_tv_xy_map + for idx in range(4): + cv2.fillPoly(poly_tv_xy_map[2 * idx], + np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), + float(min(max(min_area_quad[idx, 0], 0), w))) + cv2.fillPoly(poly_tv_xy_map[2 * idx + 1], + np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), + float(min(max(min_area_quad[idx, 1], 0), h))) + + # generate poly_tc_xy_map + for idx in range(2): + cv2.fillPoly(poly_tc_xy_map[idx], + np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), float(center_point[idx])) + + # generate poly_short_edge_map + cv2.fillPoly(poly_short_edge_map, + np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), + float(max(min(min_area_quad_h, min_area_quad_w), 1.0))) + + # generate poly_mask and training_mask + cv2.fillPoly(poly_mask, np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), 1) + + tvo_map *= poly_mask + tvo_map[:8] -= poly_tv_xy_map + tvo_map[-1] /= poly_short_edge_map + tvo_map = tvo_map.transpose((1, 2, 0)) + + tco_map *= poly_mask + tco_map[:2] -= poly_tc_xy_map + tco_map[-1] /= poly_short_edge_map + tco_map = tco_map.transpose((1, 2, 0)) + + return tvo_map, tco_map + + def adjust_point(self, poly): + """ + adjust point order. + """ + point_num = poly.shape[0] + if point_num == 4: + len_1 = np.linalg.norm(poly[0] - poly[1]) + len_2 = np.linalg.norm(poly[1] - poly[2]) + len_3 = np.linalg.norm(poly[2] - poly[3]) + len_4 = np.linalg.norm(poly[3] - poly[0]) + + if (len_1 + len_3) * 1.5 < (len_2 + len_4): + poly = poly[[1, 2, 3, 0], :] + + elif point_num > 4: + vector_1 = poly[0] - poly[1] + vector_2 = poly[1] - poly[2] + cos_theta = np.dot(vector_1, vector_2) / (np.linalg.norm(vector_1) * np.linalg.norm(vector_2) + 1e-6) + theta = np.arccos(np.round(cos_theta, decimals=4)) + + if abs(theta) > (70 / 180 * math.pi): + index = list(range(1, point_num)) + [0] + poly = poly[np.array(index), :] + return poly + + def gen_min_area_quad_from_poly(self, poly): + """ + Generate min area quad from poly. + """ + point_num = poly.shape[0] + min_area_quad = np.zeros((4, 2), dtype=np.float32) + if point_num == 4: + min_area_quad = poly + center_point = np.sum(poly, axis=0) / 4 + else: + rect = cv2.minAreaRect(poly.astype(np.int32)) # (center (x,y), (width, height), angle of rotation) + center_point = rect[0] + box = np.array(cv2.boxPoints(rect)) + + first_point_idx = 0 + min_dist = 1e4 + for i in range(4): + dist = np.linalg.norm(box[(i + 0) % 4] - poly[0]) + \ + np.linalg.norm(box[(i + 1) % 4] - poly[point_num // 2 - 1]) + \ + np.linalg.norm(box[(i + 2) % 4] - poly[point_num // 2]) + \ + np.linalg.norm(box[(i + 3) % 4] - poly[-1]) + if dist < min_dist: + min_dist = dist + first_point_idx = i + + for i in range(4): + min_area_quad[i] = box[(first_point_idx + i) % 4] + + return min_area_quad, center_point + + def shrink_quad_along_width(self, quad, begin_width_ratio=0., end_width_ratio=1.): + """ + Generate shrink_quad_along_width. + """ + ratio_pair = np.array([[begin_width_ratio], [end_width_ratio]], dtype=np.float32) + p0_1 = quad[0] + (quad[1] - quad[0]) * ratio_pair + p3_2 = quad[3] + (quad[2] - quad[3]) * ratio_pair + return np.array([p0_1[0], p0_1[1], p3_2[1], p3_2[0]]) + + def shrink_poly_along_width(self, quads, shrink_ratio_of_width, expand_height_ratio=1.0): + """ + shrink poly with given length. + """ + upper_edge_list = [] + + def get_cut_info(edge_len_list, cut_len): + for idx, edge_len in enumerate(edge_len_list): + cut_len -= edge_len + if cut_len <= 0.000001: + ratio = (cut_len + edge_len_list[idx]) / edge_len_list[idx] + return idx, ratio + + for quad in quads: + upper_edge_len = np.linalg.norm(quad[0] - quad[1]) + upper_edge_list.append(upper_edge_len) + + # length of left edge and right edge. + left_length = np.linalg.norm(quads[0][0] - quads[0][3]) * expand_height_ratio + right_length = np.linalg.norm(quads[-1][1] - quads[-1][2]) * expand_height_ratio + + shrink_length = min(left_length, right_length, sum(upper_edge_list)) * shrink_ratio_of_width + # shrinking length + upper_len_left = shrink_length + upper_len_right = sum(upper_edge_list) - shrink_length + + left_idx, left_ratio = get_cut_info(upper_edge_list, upper_len_left) + left_quad = self.shrink_quad_along_width(quads[left_idx], begin_width_ratio=left_ratio, end_width_ratio=1) + right_idx, right_ratio = get_cut_info(upper_edge_list, upper_len_right) + right_quad = self.shrink_quad_along_width(quads[right_idx], begin_width_ratio=0, end_width_ratio=right_ratio) + + out_quad_list = [] + if left_idx == right_idx: + out_quad_list.append([left_quad[0], right_quad[1], right_quad[2], left_quad[3]]) + else: + out_quad_list.append(left_quad) + for idx in range(left_idx + 1, right_idx): + out_quad_list.append(quads[idx]) + out_quad_list.append(right_quad) + + return np.array(out_quad_list), list(range(left_idx, right_idx + 1)) + + def vector_angle(self, A, B): + """ + Calculate the angle between vector AB and x-axis positive direction. + """ + AB = np.array([B[1] - A[1], B[0] - A[0]]) + return np.arctan2(*AB) + + def theta_line_cross_point(self, theta, point): + """ + Calculate the line through given point and angle in ax + by + c =0 form. + """ + x, y = point + cos = np.cos(theta) + sin = np.sin(theta) + return [sin, -cos, cos * y - sin * x] + + def line_cross_two_point(self, A, B): + """ + Calculate the line through given point A and B in ax + by + c =0 form. + """ + angle = self.vector_angle(A, B) + return self.theta_line_cross_point(angle, A) + + def average_angle(self, poly): + """ + Calculate the average angle between left and right edge in given poly. + """ + p0, p1, p2, p3 = poly + angle30 = self.vector_angle(p3, p0) + angle21 = self.vector_angle(p2, p1) + return (angle30 + angle21) / 2 + + def line_cross_point(self, line1, line2): + """ + line1 and line2 in 0=ax+by+c form, compute the cross point of line1 and line2 + """ + a1, b1, c1 = line1 + a2, b2, c2 = line2 + d = a1 * b2 - a2 * b1 + + if d == 0: + #print("line1", line1) + #print("line2", line2) + print('Cross point does not exist') + return np.array([0, 0], dtype=np.float32) + else: + x = (b1 * c2 - b2 * c1) / d + y = (a2 * c1 - a1 * c2) / d + + return np.array([x, y], dtype=np.float32) + + def quad2tcl(self, poly, ratio): + """ + Generate center line by poly clock-wise point. (4, 2) + """ + ratio_pair = np.array([[0.5 - ratio / 2], [0.5 + ratio / 2]], dtype=np.float32) + p0_3 = poly[0] + (poly[3] - poly[0]) * ratio_pair + p1_2 = poly[1] + (poly[2] - poly[1]) * ratio_pair + return np.array([p0_3[0], p1_2[0], p1_2[1], p0_3[1]]) + + def poly2tcl(self, poly, ratio): + """ + Generate center line by poly clock-wise point. + """ + ratio_pair = np.array([[0.5 - ratio / 2], [0.5 + ratio / 2]], dtype=np.float32) + tcl_poly = np.zeros_like(poly) + point_num = poly.shape[0] + + for idx in range(point_num // 2): + point_pair = poly[idx] + (poly[point_num - 1 - idx] - poly[idx]) * ratio_pair + tcl_poly[idx] = point_pair[0] + tcl_poly[point_num - 1 - idx] = point_pair[1] + return tcl_poly + + def gen_quad_tbo(self, quad, tcl_mask, tbo_map): + """ + Generate tbo_map for give quad. + """ + # upper and lower line function: ax + by + c = 0; + up_line = self.line_cross_two_point(quad[0], quad[1]) + lower_line = self.line_cross_two_point(quad[3], quad[2]) + + quad_h = 0.5 * (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2])) + quad_w = 0.5 * (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3])) + + # average angle of left and right line. + angle = self.average_angle(quad) + + xy_in_poly = np.argwhere(tcl_mask == 1) + for y, x in xy_in_poly: + point = (x, y) + line = self.theta_line_cross_point(angle, point) + cross_point_upper = self.line_cross_point(up_line, line) + cross_point_lower = self.line_cross_point(lower_line, line) + ##FIX, offset reverse + upper_offset_x, upper_offset_y = cross_point_upper - point + lower_offset_x, lower_offset_y = cross_point_lower - point + tbo_map[y, x, 0] = upper_offset_y + tbo_map[y, x, 1] = upper_offset_x + tbo_map[y, x, 2] = lower_offset_y + tbo_map[y, x, 3] = lower_offset_x + tbo_map[y, x, 4] = 1.0 / max(min(quad_h, quad_w), 1.0) * 2 + return tbo_map + + def poly2quads(self, poly): + """ + Split poly into quads. + """ + quad_list = [] + point_num = poly.shape[0] + + # point pair + point_pair_list = [] + for idx in range(point_num // 2): + point_pair = [poly[idx], poly[point_num - 1 - idx]] + point_pair_list.append(point_pair) + + quad_num = point_num // 2 - 1 + for idx in range(quad_num): + # reshape and adjust to clock-wise + quad_list.append((np.array(point_pair_list)[[idx, idx + 1]]).reshape(4, 2)[[0, 2, 3, 1]]) + + return np.array(quad_list) + + def extract_polys(self, poly_txt_path): + """ + Read text_polys, txt_tags, txts from give txt file. + """ + text_polys, txt_tags, txts = [], [], [] + + with open(poly_txt_path) as f: + for line in f.readlines(): + poly_str, txt = line.strip().split('\t') + poly = map(float, poly_str.split(',')) + text_polys.append(np.array(poly, dtype=np.float32).reshape(-1, 2)) + txts.append(txt) + if txt == '###': + txt_tags.append(True) + else: + txt_tags.append(False) + + return np.array(map(np.array, text_polys)), \ + np.array(txt_tags, dtype=np.bool), txts + + def __call__(self, label_infor): + infor = self.convert_label_infor(label_infor) + im_path, text_polys, text_tags, text_strs = infor + im = cv2.imread(im_path) + if im is None: + return None + if text_polys.shape[0] == 0: + return None + + h, w, _ = im.shape + text_polys, text_tags, hv_tags = self.check_and_validate_polys(text_polys, text_tags, (h, w)) + + if text_polys.shape[0] == 0: + return None + + #set aspect ratio and keep area fix + asp_scales = np.arange(1.0, 1.55, 0.1) + asp_scale = np.random.choice(asp_scales) + + if np.random.rand() < 0.5: + asp_scale = 1.0 / asp_scale + asp_scale = math.sqrt(asp_scale) + + asp_wx = asp_scale + asp_hy = 1.0 / asp_scale + im = cv2.resize(im, dsize=None, fx=asp_wx, fy=asp_hy) + text_polys[:, :, 0] *= asp_wx + text_polys[:, :, 1] *= asp_hy + + h, w, _ = im.shape + if max(h, w) > 2048: + rd_scale = 2048.0 / max(h, w) + im = cv2.resize(im, dsize=None, fx=rd_scale, fy=rd_scale) + text_polys *= rd_scale + h, w, _ = im.shape + if min(h, w) < 16: + return None + + #no background + im, text_polys, text_tags, hv_tags, text_strs = self.crop_area(im, \ + text_polys, text_tags, hv_tags, text_strs, crop_background=False) + if text_polys.shape[0] == 0: + return None + #continue for all ignore case + if np.sum((text_tags * 1.0)) >= text_tags.size: + return None + new_h, new_w, _ = im.shape + if (new_h is None) or (new_w is None): + return None + #resize image + std_ratio = float(self.input_size) / max(new_w, new_h) + rand_scales = np.array([0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.0, 1.0, 1.0, 1.0]) + rz_scale = std_ratio * np.random.choice(rand_scales) + im = cv2.resize(im, dsize=None, fx=rz_scale, fy=rz_scale) + text_polys[:, :, 0] *= rz_scale + text_polys[:, :, 1] *= rz_scale + + #add gaussian blur + if np.random.rand() < 0.1 * 0.5: + ks = np.random.permutation(5)[0] + 1 + ks = int(ks/2)*2 + 1 + im = cv2.GaussianBlur(im, ksize=(ks, ks), sigmaX=0, sigmaY=0) + #add brighter + if np.random.rand() < 0.1 * 0.5: + im = im * (1.0 + np.random.rand() * 0.5) + im = np.clip(im, 0.0, 255.0) + #add darker + if np.random.rand() < 0.1 * 0.5: + im = im * (1.0 - np.random.rand() * 0.5) + im = np.clip(im, 0.0, 255.0) + + # Padding the im to [input_size, input_size] + new_h, new_w, _ = im.shape + if min(new_w, new_h) < self.input_size * 0.5: + return None + + im_padded = np.ones((self.input_size, self.input_size, 3), dtype=np.float32) + im_padded[:, :, 2] = 0.485 * 255 + im_padded[:, :, 1] = 0.456 * 255 + im_padded[:, :, 0] = 0.406 * 255 + + # Random the start position + del_h = self.input_size - new_h + del_w = self.input_size - new_w + sh, sw = 0, 0 + if del_h > 1: + sh = int(np.random.rand() * del_h) + if del_w > 1: + sw = int(np.random.rand() * del_w) + + # Padding + im_padded[sh: sh + new_h, sw: sw + new_w, :] = im.copy() + text_polys[:, :, 0] += sw + text_polys[:, :, 1] += sh + + score_map, border_map, training_mask = self.generate_tcl_label((self.input_size, self.input_size), + text_polys, text_tags, 0.25) + + # SAST head + tvo_map, tco_map = self.generate_tvo_and_tco((self.input_size, self.input_size), text_polys, text_tags, tcl_ratio=0.3, ds_ratio=0.25) + # print("test--------tvo_map shape:", tvo_map.shape) + + im_padded[:, :, 2] -= 0.485 * 255 + im_padded[:, :, 1] -= 0.456 * 255 + im_padded[:, :, 0] -= 0.406 * 255 + im_padded[:, :, 2] /= (255.0 * 0.229) + im_padded[:, :, 1] /= (255.0 * 0.224) + im_padded[:, :, 0] /= (255.0 * 0.225) + im_padded = im_padded.transpose((2, 0, 1)) + + return im_padded[::-1, :, :], score_map[np.newaxis, :, :], border_map.transpose((2, 0, 1)), training_mask[np.newaxis, :, :], tvo_map.transpose((2, 0, 1)), tco_map.transpose((2, 0, 1)) + + +class SASTProcessTest(object): + """ + SAST process function for test + """ + def __init__(self, params): + super(SASTProcessTest, self).__init__() + if 'max_side_len' in params: + self.max_side_len = params['max_side_len'] + else: + self.max_side_len = 2400 + + def resize_image(self, im): + """ + resize image to a size multiple of max_stride which is required by the network + :param im: the resized image + :param max_side_len: limit of max image size to avoid out of memory in gpu + :return: the resized image and the resize ratio + """ + h, w, _ = im.shape + + resize_w = w + resize_h = h + + # Fix the longer side + if resize_h > resize_w: + ratio = float(self.max_side_len) / resize_h + else: + ratio = float(self.max_side_len) / resize_w + + resize_h = int(resize_h * ratio) + resize_w = int(resize_w * ratio) + + max_stride = 128 + resize_h = (resize_h + max_stride - 1) // max_stride * max_stride + resize_w = (resize_w + max_stride - 1) // max_stride * max_stride + im = cv2.resize(im, (int(resize_w), int(resize_h))) + ratio_h = resize_h / float(h) + ratio_w = resize_w / float(w) + + return im, (ratio_h, ratio_w) + + def __call__(self, im): + src_h, src_w, _ = im.shape + im, (ratio_h, ratio_w) = self.resize_image(im) + img_mean = [0.485, 0.456, 0.406] + img_std = [0.229, 0.224, 0.225] + im = im[:, :, ::-1].astype(np.float32) + im = im / 255 + im -= img_mean + im /= img_std + im = im.transpose((2, 0, 1)) + im = im[np.newaxis, :] + return [im, (ratio_h, ratio_w, src_h, src_w)] diff --git a/ppocr/data/rec/dataset_traversal.py b/ppocr/data/rec/dataset_traversal.py index 510a028451302a92ebc179792ecbcb1ff8649807..02aa13baa92261ef9dbf888c2d479925aacb0be6 100755 --- a/ppocr/data/rec/dataset_traversal.py +++ b/ppocr/data/rec/dataset_traversal.py @@ -26,7 +26,7 @@ from ppocr.utils.utility import initial_logger from ppocr.utils.utility import get_image_file_list logger = initial_logger() -from .img_tools import process_image, get_img_data +from .img_tools import process_image, process_image_srn, get_img_data class LMDBReader(object): @@ -43,6 +43,9 @@ class LMDBReader(object): self.mode = params['mode'] self.drop_last = False self.use_tps = False + self.num_heads = None + if "num_heads" in params: + self.num_heads = params['num_heads'] if "tps" in params: self.ues_tps = True self.use_distort = False @@ -119,12 +122,20 @@ class LMDBReader(object): img = cv2.imread(single_img) if img.shape[-1] == 1 or len(list(img.shape)) == 2: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) - norm_img = process_image( - img=img, - image_shape=self.image_shape, - char_ops=self.char_ops, - tps=self.use_tps, - infer_mode=True) + if self.loss_type == 'srn': + norm_img = process_image_srn( + img=img, + image_shape=self.image_shape, + num_heads=self.num_heads, + char_ops=self.char_ops, + max_text_length=self.max_text_length) + else: + norm_img = process_image( + img=img, + image_shape=self.image_shape, + char_ops=self.char_ops, + tps=self.use_tps, + infer_mode=True) yield norm_img else: lmdb_sets = self.load_hierarchical_lmdb_dataset() @@ -144,14 +155,25 @@ class LMDBReader(object): if sample_info is None: continue img, label = sample_info - outs = process_image( - img=img, - image_shape=self.image_shape, - label=label, - char_ops=self.char_ops, - loss_type=self.loss_type, - max_text_length=self.max_text_length, - distort=self.use_distort) + outs = [] + if self.loss_type == "srn": + outs = process_image_srn( + img=img, + image_shape=self.image_shape, + num_heads=self.num_heads, + max_text_length=self.max_text_length, + label=label, + char_ops=self.char_ops, + loss_type=self.loss_type) + + else: + outs = process_image( + img=img, + image_shape=self.image_shape, + label=label, + char_ops=self.char_ops, + loss_type=self.loss_type, + max_text_length=self.max_text_length) if outs is None: continue yield outs @@ -185,6 +207,7 @@ class SimpleReader(object): if params['mode'] != 'test': self.img_set_dir = params['img_set_dir'] self.label_file_path = params['label_file_path'] + self.use_gpu = params['use_gpu'] self.char_ops = params['char_ops'] self.image_shape = params['image_shape'] self.loss_type = params['loss_type'] @@ -192,6 +215,8 @@ class SimpleReader(object): self.mode = params['mode'] self.infer_img = params['infer_img'] self.use_tps = False + if "num_heads" in params: + self.num_heads = params['num_heads'] if "tps" in params: self.use_tps = True self.use_distort = False @@ -213,6 +238,15 @@ class SimpleReader(object): if self.mode != 'train': process_id = 0 + def get_device_num(): + if self.use_gpu: + gpus = os.environ.get("CUDA_VISIBLE_DEVICES", '1') + gpu_num = len(gpus.split(',')) + return gpu_num + else: + cpu_num = os.environ.get("CPU_NUM", 1) + return int(cpu_num) + def sample_iter_reader(): if self.mode != 'train' and self.infer_img is not None: image_file_list = get_image_file_list(self.infer_img) @@ -220,12 +254,20 @@ class SimpleReader(object): img = cv2.imread(single_img) if img.shape[-1] == 1 or len(list(img.shape)) == 2: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) - norm_img = process_image( - img=img, - image_shape=self.image_shape, - char_ops=self.char_ops, - tps=self.use_tps, - infer_mode=True) + if self.loss_type == 'srn': + norm_img = process_image_srn( + img=img, + image_shape=self.image_shape, + char_ops=self.char_ops, + num_heads=self.num_heads, + max_text_length=self.max_text_length) + else: + norm_img = process_image( + img=img, + image_shape=self.image_shape, + char_ops=self.char_ops, + tps=self.use_tps, + infer_mode=True) yield norm_img else: with open(self.label_file_path, "rb") as fin: @@ -233,10 +275,16 @@ class SimpleReader(object): img_num = len(label_infor_list) img_id_list = list(range(img_num)) random.shuffle(img_id_list) - if sys.platform == "win32": + if sys.platform == "win32" and self.num_workers != 1: print("multiprocess is not fully compatible with Windows." "num_workers will be 1.") self.num_workers = 1 + if self.batch_size * get_device_num( + ) * self.num_workers > img_num: + raise Exception( + "The number of the whole data ({}) is smaller than the batch_size * devices_num * num_workers ({})". + format(img_num, self.batch_size * get_device_num() * + self.num_workers)) for img_id in range(process_id, img_num, self.num_workers): label_infor = label_infor_list[img_id_list[img_id]] substr = label_infor.decode('utf-8').strip("\n").split("\t") @@ -249,14 +297,25 @@ class SimpleReader(object): img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) label = substr[1] - outs = process_image( - img=img, - image_shape=self.image_shape, - label=label, - char_ops=self.char_ops, - loss_type=self.loss_type, - max_text_length=self.max_text_length, - distort=self.use_distort) + if self.loss_type == "srn": + outs = process_image_srn( + img=img, + image_shape=self.image_shape, + num_heads=self.num_heads, + max_text_length=self.max_text_length, + label=label, + char_ops=self.char_ops, + loss_type=self.loss_type) + + else: + outs = process_image( + img=img, + image_shape=self.image_shape, + label=label, + char_ops=self.char_ops, + loss_type=self.loss_type, + max_text_length=self.max_text_length, + distort=self.use_distort) if outs is None: continue yield outs diff --git a/ppocr/data/rec/img_tools.py b/ppocr/data/rec/img_tools.py index d41abd9ba5c867853984b4b3b4c9eda41ebeff7c..8b497e6b803ba0fffaefc3e12c366130504b9ce0 100755 --- a/ppocr/data/rec/img_tools.py +++ b/ppocr/data/rec/img_tools.py @@ -360,7 +360,7 @@ def process_image(img, text = char_ops.encode(label) if len(text) == 0 or len(text) > max_text_length: logger.info( - "Warning in ppocr/data/rec/img_tools.py:line362: Wrong data type." + "Warning in ppocr/data/rec/img_tools.py: Wrong data type." "Excepted string with length between 1 and {}, but " "got '{}'. Label is '{}'".format(max_text_length, len(text), label)) @@ -381,3 +381,86 @@ def process_image(img, assert False, "Unsupport loss_type %s in process_image"\ % loss_type return (norm_img) + +def resize_norm_img_srn(img, image_shape): + imgC, imgH, imgW = image_shape + + img_black = np.zeros((imgH, imgW)) + im_hei = img.shape[0] + im_wid = img.shape[1] + + if im_wid <= im_hei * 1: + img_new = cv2.resize(img, (imgH * 1, imgH)) + elif im_wid <= im_hei * 2: + img_new = cv2.resize(img, (imgH * 2, imgH)) + elif im_wid <= im_hei * 3: + img_new = cv2.resize(img, (imgH * 3, imgH)) + else: + img_new = cv2.resize(img, (imgW, imgH)) + + img_np = np.asarray(img_new) + img_np = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY) + img_black[:, 0:img_np.shape[1]] = img_np + img_black = img_black[:, :, np.newaxis] + + row, col, c = img_black.shape + c = 1 + + return np.reshape(img_black, (c, row, col)).astype(np.float32) + +def srn_other_inputs(image_shape, + num_heads, + max_text_length, + char_num): + + imgC, imgH, imgW = image_shape + feature_dim = int((imgH / 8) * (imgW / 8)) + + encoder_word_pos = np.array(range(0, feature_dim)).reshape((feature_dim, 1)).astype('int64') + gsrm_word_pos = np.array(range(0, max_text_length)).reshape((max_text_length, 1)).astype('int64') + + lbl_weight = np.array([int(char_num-1)] * max_text_length).reshape((-1,1)).astype('int64') + + gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length)) + gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape([-1, 1, max_text_length, max_text_length]) + gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1, [1, num_heads, 1, 1]) * [-1e9] + + gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape([-1, 1, max_text_length, max_text_length]) + gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2, [1, num_heads, 1, 1]) * [-1e9] + + encoder_word_pos = encoder_word_pos[np.newaxis, :] + gsrm_word_pos = gsrm_word_pos[np.newaxis, :] + + return [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2] + +def process_image_srn(img, + image_shape, + num_heads, + max_text_length, + label=None, + char_ops=None, + loss_type=None): + norm_img = resize_norm_img_srn(img, image_shape) + norm_img = norm_img[np.newaxis, :] + char_num = char_ops.get_char_num() + + [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2] = \ + srn_other_inputs(image_shape, num_heads, max_text_length,char_num) + + if label is not None: + text = char_ops.encode(label) + if len(text) == 0 or len(text) > max_text_length: + return None + else: + if loss_type == "srn": + text_padded = [int(char_num-1)] * max_text_length + for i in range(len(text)): + text_padded[i] = text[i] + lbl_weight[i] = [1.0] + text_padded = np.array(text_padded) + text = text_padded.reshape(-1, 1) + return (norm_img, text,encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2,lbl_weight) + else: + assert False, "Unsupport loss_type %s in process_image"\ + % loss_type + return (norm_img, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2) diff --git a/ppocr/modeling/architectures/cls_model.py b/ppocr/modeling/architectures/cls_model.py new file mode 100755 index 0000000000000000000000000000000000000000..ad3ad0e7cf4010a14c70a700ed02d02ee1f1323b --- /dev/null +++ b/ppocr/modeling/architectures/cls_model.py @@ -0,0 +1,85 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from paddle import fluid + +from ppocr.utils.utility import create_module +from ppocr.utils.utility import initial_logger + +logger = initial_logger() +from copy import deepcopy + + +class ClsModel(object): + def __init__(self, params): + super(ClsModel, self).__init__() + global_params = params['Global'] + self.infer_img = global_params['infer_img'] + + backbone_params = deepcopy(params["Backbone"]) + backbone_params.update(global_params) + self.backbone = create_module(backbone_params['function']) \ + (params=backbone_params) + + head_params = deepcopy(params["Head"]) + head_params.update(global_params) + self.head = create_module(head_params['function']) \ + (params=head_params) + + loss_params = deepcopy(params["Loss"]) + loss_params.update(global_params) + self.loss = create_module(loss_params['function']) \ + (params=loss_params) + + self.image_shape = global_params['image_shape'] + + def create_feed(self, mode): + image_shape = deepcopy(self.image_shape) + image_shape.insert(0, -1) + if mode == "train": + image = fluid.data(name='image', shape=image_shape, dtype='float32') + label = fluid.data(name='label', shape=[None, 1], dtype='int64') + feed_list = [image, label] + labels = {'label': label} + loader = fluid.io.DataLoader.from_generator( + feed_list=feed_list, + capacity=64, + use_double_buffer=True, + iterable=False) + else: + labels = None + loader = None + image = fluid.data(name='image', shape=image_shape, dtype='float32') + return image, labels, loader + + def __call__(self, mode): + image, labels, loader = self.create_feed(mode) + inputs = image + conv_feas = self.backbone(inputs) + predicts = self.head(conv_feas, labels, mode) + if mode == "train": + loss = self.loss(predicts, labels) + label = labels['label'] + acc = fluid.layers.accuracy(predicts['predict'], label, k=1) + outputs = {'total_loss': loss, 'decoded_out': \ + predicts['decoded_out'], 'label': label, 'acc': acc} + return loader, outputs + elif mode == "export": + return [image, predicts] + else: + return loader, predicts diff --git a/ppocr/modeling/architectures/det_model.py b/ppocr/modeling/architectures/det_model.py old mode 100755 new mode 100644 index 7ea0949bc80c55724182704616a93b879be92314..e4c32b8eba056f8f6483e9ed2170a1650023fb0a --- a/ppocr/modeling/architectures/det_model.py +++ b/ppocr/modeling/architectures/det_model.py @@ -67,6 +67,7 @@ class DetModel(object): image = fluid.layers.data( name='image', shape=image_shape, dtype='float32') + image.stop_gradient = False if mode == "train": if self.algorithm == "EAST": h, w = int(image_shape[1] // 4), int(image_shape[2] // 4) @@ -97,6 +98,26 @@ class DetModel(object): 'shrink_mask':shrink_mask,\ 'threshold_map':threshold_map,\ 'threshold_mask':threshold_mask} + elif self.algorithm == "SAST": + input_score = fluid.layers.data( + name='score', shape=[1, 128, 128], dtype='float32') + input_border = fluid.layers.data( + name='border', shape=[5, 128, 128], dtype='float32') + input_mask = fluid.layers.data( + name='mask', shape=[1, 128, 128], dtype='float32') + input_tvo = fluid.layers.data( + name='tvo', shape=[9, 128, 128], dtype='float32') + input_tco = fluid.layers.data( + name='tco', shape=[3, 128, 128], dtype='float32') + feed_list = [ + image, input_score, input_border, input_mask, input_tvo, + input_tco + ] + labels = {'input_score': input_score,\ + 'input_border': input_border,\ + 'input_mask': input_mask,\ + 'input_tvo': input_tvo,\ + 'input_tco': input_tco} loader = fluid.io.DataLoader.from_generator( feed_list=feed_list, capacity=64, diff --git a/ppocr/modeling/architectures/rec_model.py b/ppocr/modeling/architectures/rec_model.py index e80a50ab6504c80aa7f10759576208486caf7c3f..261462044a9000561517c3657f5b5a6090fd107a 100755 --- a/ppocr/modeling/architectures/rec_model.py +++ b/ppocr/modeling/architectures/rec_model.py @@ -58,12 +58,17 @@ class RecModel(object): self.loss_type = global_params['loss_type'] self.image_shape = global_params['image_shape'] self.max_text_length = global_params['max_text_length'] + if "num_heads" in global_params: + self.num_heads = global_params["num_heads"] + else: + self.num_heads = None def create_feed(self, mode): image_shape = deepcopy(self.image_shape) image_shape.insert(0, -1) if mode == "train": image = fluid.data(name='image', shape=image_shape, dtype='float32') + image.stop_gradient = False if self.loss_type == "attention": label_in = fluid.data( name='label_in', @@ -77,6 +82,48 @@ class RecModel(object): lod_level=1) feed_list = [image, label_in, label_out] labels = {'label_in': label_in, 'label_out': label_out} + elif self.loss_type == "srn": + encoder_word_pos = fluid.data( + name="encoder_word_pos", + shape=[ + -1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)), + 1 + ], + dtype="int64") + gsrm_word_pos = fluid.data( + name="gsrm_word_pos", + shape=[-1, self.max_text_length, 1], + dtype="int64") + gsrm_slf_attn_bias1 = fluid.data( + name="gsrm_slf_attn_bias1", + shape=[ + -1, self.num_heads, self.max_text_length, + self.max_text_length + ], + dtype="float32") + gsrm_slf_attn_bias2 = fluid.data( + name="gsrm_slf_attn_bias2", + shape=[ + -1, self.num_heads, self.max_text_length, + self.max_text_length + ], + dtype="float32") + lbl_weight = fluid.layers.data( + name="lbl_weight", shape=[-1, 1], dtype='int64') + label = fluid.data( + name='label', shape=[-1, 1], dtype='int32', lod_level=1) + feed_list = [ + image, label, encoder_word_pos, gsrm_word_pos, + gsrm_slf_attn_bias1, gsrm_slf_attn_bias2, lbl_weight + ] + labels = { + 'label': label, + 'encoder_word_pos': encoder_word_pos, + 'gsrm_word_pos': gsrm_word_pos, + 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1, + 'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2, + 'lbl_weight': lbl_weight + } else: label = fluid.data( name='label', shape=[None, 1], dtype='int32', lod_level=1) @@ -88,7 +135,9 @@ class RecModel(object): use_double_buffer=True, iterable=False) else: - if self.char_type == "ch" and self.infer_img: + labels = None + loader = None + if self.char_type == "ch" and self.infer_img and self.loss_type != "srn": image_shape[-1] = -1 if self.tps != None: logger.info( @@ -98,8 +147,40 @@ class RecModel(object): ) image_shape = deepcopy(self.image_shape) image = fluid.data(name='image', shape=image_shape, dtype='float32') - labels = None - loader = None + image.stop_gradient = False + if self.loss_type == "srn": + encoder_word_pos = fluid.data( + name="encoder_word_pos", + shape=[ + -1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)), + 1 + ], + dtype="int64") + gsrm_word_pos = fluid.data( + name="gsrm_word_pos", + shape=[-1, self.max_text_length, 1], + dtype="int64") + gsrm_slf_attn_bias1 = fluid.data( + name="gsrm_slf_attn_bias1", + shape=[ + -1, self.num_heads, self.max_text_length, + self.max_text_length + ], + dtype="float32") + gsrm_slf_attn_bias2 = fluid.data( + name="gsrm_slf_attn_bias2", + shape=[ + -1, self.num_heads, self.max_text_length, + self.max_text_length + ], + dtype="float32") + labels = { + 'encoder_word_pos': encoder_word_pos, + 'gsrm_word_pos': gsrm_word_pos, + 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1, + 'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2 + } + return image, labels, loader def __call__(self, mode): @@ -117,13 +198,32 @@ class RecModel(object): label = labels['label_out'] else: label = labels['label'] - outputs = {'total_loss':loss, 'decoded_out':\ - decoded_out, 'label':label} + if self.loss_type == 'srn': + total_loss, img_loss, word_loss = self.loss(predicts, labels) + outputs = { + 'total_loss': total_loss, + 'img_loss': img_loss, + 'word_loss': word_loss, + 'decoded_out': decoded_out, + 'label': label + } + else: + outputs = {'total_loss':loss, 'decoded_out':\ + decoded_out, 'label':label} return loader, outputs + elif mode == "export": predict = predicts['predict'] if self.loss_type == "ctc": predict = fluid.layers.softmax(predict) + if self.loss_type == "srn": + return [ + image, labels, { + 'decoded_out': decoded_out, + 'predicts': predict + } + ] + return [image, {'decoded_out': decoded_out, 'predicts': predict}] else: predict = predicts['predict'] diff --git a/ppocr/modeling/backbones/det_mobilenet_v3.py b/ppocr/modeling/backbones/det_mobilenet_v3.py index 87f5dd72452bbd4be96f8532ff6486c299318c5a..508f2bbf2d73216fc35a2d906f94731707a25736 100755 --- a/ppocr/modeling/backbones/det_mobilenet_v3.py +++ b/ppocr/modeling/backbones/det_mobilenet_v3.py @@ -79,6 +79,8 @@ class MobileNetV3(): assert self.scale in supported_scale, \ "supported scale are {} but input scale is {}".format(supported_scale, self.scale) + self.disable_se = params.get('disable_se', False) + def __call__(self, input): scale = self.scale inplanes = self.inplanes @@ -232,7 +234,7 @@ class MobileNetV3(): num_groups=num_mid_filter, use_cudnn=False, name=name + '_depthwise') - if use_se: + if use_se and not self.disable_se: conv1 = self.se_block( input=conv1, num_out_filter=num_mid_filter, name=name + '_se') diff --git a/ppocr/modeling/backbones/det_resnet_vd_sast.py b/ppocr/modeling/backbones/det_resnet_vd_sast.py new file mode 100644 index 0000000000000000000000000000000000000000..14fe713852d009bef8dd7f3739c2e8070789e359 --- /dev/null +++ b/ppocr/modeling/backbones/det_resnet_vd_sast.py @@ -0,0 +1,274 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.fluid as fluid +from paddle.fluid.param_attr import ParamAttr + +__all__ = ["ResNet"] + + +class ResNet(object): + def __init__(self, params): + """ + the Resnet backbone network for detection module. + Args: + params(dict): the super parameters for network build + """ + self.layers = params['layers'] + supported_layers = [18, 34, 50, 101, 152] + assert self.layers in supported_layers, \ + "supported layers are {} but input layer is {}".format(supported_layers, self.layers) + self.is_3x3 = True + + def __call__(self, input): + layers = self.layers + is_3x3 = self.is_3x3 + # if layers == 18: + # depth = [2, 2, 2, 2] + # elif layers == 34 or layers == 50: + # depth = [3, 4, 6, 3] + # elif layers == 101: + # depth = [3, 4, 23, 3] + # elif layers == 152: + # depth = [3, 8, 36, 3] + # elif layers == 200: + # depth = [3, 12, 48, 3] + # num_filters = [64, 128, 256, 512] + # outs = [] + + if layers == 18: + depth = [2, 2, 2, 2]#, 3, 3] + elif layers == 34 or layers == 50: + #depth = [3, 4, 6, 3]#, 3, 3] + depth = [3, 4, 6, 3, 3]#, 3] + elif layers == 101: + depth = [3, 4, 23, 3]#, 3, 3] + elif layers == 152: + depth = [3, 8, 36, 3]#, 3, 3] + num_filters = [64, 128, 256, 512, 512]#, 512] + blocks = {} + + idx = 'block_0' + blocks[idx] = input + + if is_3x3 == False: + conv = self.conv_bn_layer( + input=input, + num_filters=64, + filter_size=7, + stride=2, + act='relu') + else: + conv = self.conv_bn_layer( + input=input, + num_filters=32, + filter_size=3, + stride=2, + act='relu', + name='conv1_1') + conv = self.conv_bn_layer( + input=conv, + num_filters=32, + filter_size=3, + stride=1, + act='relu', + name='conv1_2') + conv = self.conv_bn_layer( + input=conv, + num_filters=64, + filter_size=3, + stride=1, + act='relu', + name='conv1_3') + idx = 'block_1' + blocks[idx] = conv + + conv = fluid.layers.pool2d( + input=conv, + pool_size=3, + pool_stride=2, + pool_padding=1, + pool_type='max') + + if layers >= 50: + for block in range(len(depth)): + for i in range(depth[block]): + if layers in [101, 152, 200] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + conv = self.bottleneck_block( + input=conv, + num_filters=num_filters[block], + stride=2 if i == 0 and block != 0 else 1, + if_first=block == i == 0, + name=conv_name) + # outs.append(conv) + idx = 'block_' + str(block + 2) + blocks[idx] = conv + else: + for block in range(len(depth)): + for i in range(depth[block]): + conv_name = "res" + str(block + 2) + chr(97 + i) + conv = self.basic_block( + input=conv, + num_filters=num_filters[block], + stride=2 if i == 0 and block != 0 else 1, + if_first=block == i == 0, + name=conv_name) + # outs.append(conv) + idx = 'block_' + str(block + 2) + blocks[idx] = conv + # return outs + return blocks + + def conv_bn_layer(self, + input, + num_filters, + filter_size, + stride=1, + groups=1, + act=None, + name=None): + conv = fluid.layers.conv2d( + input=input, + num_filters=num_filters, + filter_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + act=None, + param_attr=ParamAttr(name=name + "_weights"), + bias_attr=False) + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + return fluid.layers.batch_norm( + input=conv, + act=act, + param_attr=ParamAttr(name=bn_name + '_scale'), + bias_attr=ParamAttr(bn_name + '_offset'), + moving_mean_name=bn_name + '_mean', + moving_variance_name=bn_name + '_variance') + + def conv_bn_layer_new(self, + input, + num_filters, + filter_size, + stride=1, + groups=1, + act=None, + name=None): + pool = fluid.layers.pool2d( + input=input, + pool_size=2, + pool_stride=2, + pool_padding=0, + pool_type='avg', + ceil_mode=True) + + conv = fluid.layers.conv2d( + input=pool, + num_filters=num_filters, + filter_size=filter_size, + stride=1, + padding=(filter_size - 1) // 2, + groups=groups, + act=None, + param_attr=ParamAttr(name=name + "_weights"), + bias_attr=False) + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + return fluid.layers.batch_norm( + input=conv, + act=act, + param_attr=ParamAttr(name=bn_name + '_scale'), + bias_attr=ParamAttr(bn_name + '_offset'), + moving_mean_name=bn_name + '_mean', + moving_variance_name=bn_name + '_variance') + + def shortcut(self, input, ch_out, stride, name, if_first=False): + ch_in = input.shape[1] + if ch_in != ch_out or stride != 1: + if if_first: + return self.conv_bn_layer(input, ch_out, 1, stride, name=name) + else: + return self.conv_bn_layer_new( + input, ch_out, 1, stride, name=name) + elif if_first: + return self.conv_bn_layer(input, ch_out, 1, stride, name=name) + else: + return input + + def bottleneck_block(self, input, num_filters, stride, name, if_first): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=1, + act='relu', + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + stride=stride, + act='relu', + name=name + "_branch2b") + conv2 = self.conv_bn_layer( + input=conv1, + num_filters=num_filters * 4, + filter_size=1, + act=None, + name=name + "_branch2c") + + short = self.shortcut( + input, + num_filters * 4, + stride, + if_first=if_first, + name=name + "_branch1") + + return fluid.layers.elementwise_add(x=short, y=conv2, act='relu') + + def basic_block(self, input, num_filters, stride, name, if_first): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=3, + act='relu', + stride=stride, + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + act=None, + name=name + "_branch2b") + short = self.shortcut( + input, + num_filters, + stride, + if_first=if_first, + name=name + "_branch1") + return fluid.layers.elementwise_add(x=short, y=conv1, act='relu') diff --git a/ppocr/modeling/backbones/rec_mobilenet_v3.py b/ppocr/modeling/backbones/rec_mobilenet_v3.py index 506209cc9c701ac889b0d9f7eb4c1cb706a1f175..ff39a81210b7b71914f3c447b5e0035ac03db73b 100755 --- a/ppocr/modeling/backbones/rec_mobilenet_v3.py +++ b/ppocr/modeling/backbones/rec_mobilenet_v3.py @@ -31,16 +31,28 @@ __all__ = [ class MobileNetV3(): def __init__(self, params): - self.scale = params['scale'] - model_name = params['model_name'] + self.scale = params.get("scale", 0.5) + model_name = params.get("model_name", "small") + large_stride = params.get("large_stride", [1, 2, 2, 2]) + small_stride = params.get("small_stride", [2, 2, 2, 2]) + + assert isinstance(large_stride, list), "large_stride type must " \ + "be list but got {}".format(type(large_stride)) + assert isinstance(small_stride, list), "small_stride type must " \ + "be list but got {}".format(type(small_stride)) + assert len(large_stride) == 4, "large_stride length must be " \ + "4 but got {}".format(len(large_stride)) + assert len(small_stride) == 4, "small_stride length must be " \ + "4 but got {}".format(len(small_stride)) + self.inplanes = 16 if model_name == "large": self.cfg = [ # k, exp, c, se, nl, s, - [3, 16, 16, False, 'relu', 1], - [3, 64, 24, False, 'relu', (2, 1)], + [3, 16, 16, False, 'relu', large_stride[0]], + [3, 64, 24, False, 'relu', (large_stride[1], 1)], [3, 72, 24, False, 'relu', 1], - [5, 72, 40, True, 'relu', (2, 1)], + [5, 72, 40, True, 'relu', (large_stride[2], 1)], [5, 120, 40, True, 'relu', 1], [5, 120, 40, True, 'relu', 1], [3, 240, 80, False, 'hard_swish', 1], @@ -49,7 +61,7 @@ class MobileNetV3(): [3, 184, 80, False, 'hard_swish', 1], [3, 480, 112, True, 'hard_swish', 1], [3, 672, 112, True, 'hard_swish', 1], - [5, 672, 160, True, 'hard_swish', (2, 1)], + [5, 672, 160, True, 'hard_swish', (large_stride[3], 1)], [5, 960, 160, True, 'hard_swish', 1], [5, 960, 160, True, 'hard_swish', 1], ] @@ -58,15 +70,15 @@ class MobileNetV3(): elif model_name == "small": self.cfg = [ # k, exp, c, se, nl, s, - [3, 16, 16, True, 'relu', (2, 1)], - [3, 72, 24, False, 'relu', (2, 1)], + [3, 16, 16, True, 'relu', (small_stride[0], 1)], + [3, 72, 24, False, 'relu', (small_stride[1], 1)], [3, 88, 24, False, 'relu', 1], - [5, 96, 40, True, 'hard_swish', (2, 1)], + [5, 96, 40, True, 'hard_swish', (small_stride[2], 1)], [5, 240, 40, True, 'hard_swish', 1], [5, 240, 40, True, 'hard_swish', 1], [5, 120, 48, True, 'hard_swish', 1], [5, 144, 48, True, 'hard_swish', 1], - [5, 288, 96, True, 'hard_swish', (2, 1)], + [5, 288, 96, True, 'hard_swish', (small_stride[3], 1)], [5, 576, 96, True, 'hard_swish', 1], [5, 576, 96, True, 'hard_swish', 1], ] @@ -78,7 +90,7 @@ class MobileNetV3(): supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25] assert self.scale in supported_scale, \ - "supported scale are {} but input scale is {}".format(supported_scale, scale) + "supported scales are {} but input scale is {}".format(supported_scale, self.scale) def __call__(self, input): scale = self.scale diff --git a/ppocr/modeling/backbones/rec_resnet_fpn.py b/ppocr/modeling/backbones/rec_resnet_fpn.py new file mode 100755 index 0000000000000000000000000000000000000000..0a05b5def8b79943f045d9cc941835cddc82bfdb --- /dev/null +++ b/ppocr/modeling/backbones/rec_resnet_fpn.py @@ -0,0 +1,246 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +import paddle +import paddle.fluid as fluid +from paddle.fluid.param_attr import ParamAttr + +__all__ = [ + "ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152" +] + +Trainable = True +w_nolr = fluid.ParamAttr(trainable=Trainable) +train_parameters = { + "input_size": [3, 224, 224], + "input_mean": [0.485, 0.456, 0.406], + "input_std": [0.229, 0.224, 0.225], + "learning_strategy": { + "name": "piecewise_decay", + "batch_size": 256, + "epochs": [30, 60, 90], + "steps": [0.1, 0.01, 0.001, 0.0001] + } +} + + +class ResNet(): + def __init__(self, params): + self.layers = params['layers'] + self.params = train_parameters + + def __call__(self, input): + layers = self.layers + supported_layers = [18, 34, 50, 101, 152] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format(supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + stride_list = [(2, 2), (2, 2), (1, 1), (1, 1)] + num_filters = [64, 128, 256, 512] + + conv = self.conv_bn_layer( + input=input, + num_filters=64, + filter_size=7, + stride=2, + act='relu', + name="conv1") + F = [] + if layers >= 50: + for block in range(len(depth)): + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + conv = self.bottleneck_block( + input=conv, + num_filters=num_filters[block], + stride=stride_list[block] if i == 0 else 1, + name=conv_name) + F.append(conv) + else: + for block in range(len(depth)): + for i in range(depth[block]): + conv_name = "res" + str(block + 2) + chr(97 + i) + + if i == 0 and block != 0: + stride = (2, 1) + else: + stride = (1, 1) + + conv = self.basic_block( + input=conv, + num_filters=num_filters[block], + stride=stride, + if_first=block == i == 0, + name=conv_name) + F.append(conv) + + base = F[-1] + for i in [-2, -3]: + b, c, w, h = F[i].shape + if (w, h) == base.shape[2:]: + base = base + else: + base = fluid.layers.conv2d_transpose( + input=base, + num_filters=c, + filter_size=4, + stride=2, + padding=1, + act=None, + param_attr=w_nolr, + bias_attr=w_nolr) + base = fluid.layers.batch_norm( + base, act="relu", param_attr=w_nolr, bias_attr=w_nolr) + base = fluid.layers.concat([base, F[i]], axis=1) + base = fluid.layers.conv2d( + base, + num_filters=c, + filter_size=1, + param_attr=w_nolr, + bias_attr=w_nolr) + base = fluid.layers.conv2d( + base, + num_filters=c, + filter_size=3, + padding=1, + param_attr=w_nolr, + bias_attr=w_nolr) + base = fluid.layers.batch_norm( + base, act="relu", param_attr=w_nolr, bias_attr=w_nolr) + + base = fluid.layers.conv2d( + base, + num_filters=512, + filter_size=1, + bias_attr=w_nolr, + param_attr=w_nolr) + + return base + + def conv_bn_layer(self, + input, + num_filters, + filter_size, + stride=1, + groups=1, + act=None, + name=None): + conv = fluid.layers.conv2d( + input=input, + num_filters=num_filters, + filter_size=2 if stride == (1, 1) else filter_size, + dilation=2 if stride == (1, 1) else 1, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + act=None, + param_attr=ParamAttr( + name=name + "_weights", trainable=Trainable), + bias_attr=False, + name=name + '.conv2d.output.1') + + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + return fluid.layers.batch_norm( + input=conv, + act=act, + name=bn_name + '.output.1', + param_attr=ParamAttr( + name=bn_name + '_scale', trainable=Trainable), + bias_attr=ParamAttr( + bn_name + '_offset', trainable=Trainable), + moving_mean_name=bn_name + '_mean', + moving_variance_name=bn_name + '_variance', ) + + def shortcut(self, input, ch_out, stride, is_first, name): + ch_in = input.shape[1] + if ch_in != ch_out or stride != 1 or is_first == True: + if stride == (1, 1): + return self.conv_bn_layer(input, ch_out, 1, 1, name=name) + else: #stride == (2,2) + return self.conv_bn_layer(input, ch_out, 1, stride, name=name) + + else: + return input + + def bottleneck_block(self, input, num_filters, stride, name): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=1, + act='relu', + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + stride=stride, + act='relu', + name=name + "_branch2b") + conv2 = self.conv_bn_layer( + input=conv1, + num_filters=num_filters * 4, + filter_size=1, + act=None, + name=name + "_branch2c") + + short = self.shortcut( + input, + num_filters * 4, + stride, + is_first=False, + name=name + "_branch1") + + return fluid.layers.elementwise_add( + x=short, y=conv2, act='relu', name=name + ".add.output.5") + + def basic_block(self, input, num_filters, stride, is_first, name): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=3, + act='relu', + stride=stride, + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + act=None, + name=name + "_branch2b") + short = self.shortcut( + input, num_filters, stride, is_first, name=name + "_branch1") + return fluid.layers.elementwise_add(x=short, y=conv1, act='relu') diff --git a/ppocr/modeling/heads/cls_head.py b/ppocr/modeling/heads/cls_head.py new file mode 100644 index 0000000000000000000000000000000000000000..4567adcbaef3ddb14f3b46b27f772e3836db6793 --- /dev/null +++ b/ppocr/modeling/heads/cls_head.py @@ -0,0 +1,46 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +import paddle +import paddle.fluid as fluid + + +class ClsHead(object): + def __init__(self, params): + super(ClsHead, self).__init__() + self.class_dim = params['class_dim'] + + def __call__(self, inputs, labels=None, mode=None): + pool = fluid.layers.pool2d( + input=inputs, pool_type='avg', global_pooling=True) + stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) + + out = fluid.layers.fc( + input=pool, + size=self.class_dim, + param_attr=fluid.param_attr.ParamAttr( + name="fc_0.w_0", + initializer=fluid.initializer.Uniform(-stdv, stdv)), + bias_attr=fluid.param_attr.ParamAttr(name="fc_0.b_0")) + + softmax_out = fluid.layers.softmax(out, use_cudnn=False) + out_label = fluid.layers.argmax(out, axis=1) + predicts = {'predict': softmax_out, 'decoded_out': out_label} + return predicts diff --git a/ppocr/modeling/heads/det_db_head.py b/ppocr/modeling/heads/det_db_head.py index 56998044d8923a2bbda094e01b5a4eb2f5496bb3..59b3a1601a3d070ea7101cd71ceb388147d946b8 100644 --- a/ppocr/modeling/heads/det_db_head.py +++ b/ppocr/modeling/heads/det_db_head.py @@ -123,6 +123,13 @@ class DBHead(object): return fluid.layers.reciprocal(1 + fluid.layers.exp(-self.k * (x - y))) def __call__(self, conv_features, mode="train"): + """ + Fuse different levels of feature map from backbone in the FPN manner. + Args: + conv_features(list): feature maps from backbone + mode(str): runtime mode, can be "train", "eval" or "test" + Return: predicts + """ c2, c3, c4, c5 = conv_features param_attr = fluid.initializer.MSRAInitializer(uniform=False) in5 = fluid.layers.conv2d( diff --git a/ppocr/modeling/heads/det_sast_head.py b/ppocr/modeling/heads/det_sast_head.py new file mode 100644 index 0000000000000000000000000000000000000000..0097913dd7e08c76c45064940416e7c9ffc32f26 --- /dev/null +++ b/ppocr/modeling/heads/det_sast_head.py @@ -0,0 +1,228 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.fluid as fluid +from ..common_functions import conv_bn_layer, deconv_bn_layer +from collections import OrderedDict + + +class SASTHead(object): + """ + SAST: + see arxiv: https://arxiv.org/abs/1908.05498 + args: + params(dict): the super parameters for network build + """ + + def __init__(self, params): + self.model_name = params['model_name'] + self.with_cab = params['with_cab'] + + def FPN_Up_Fusion(self, blocks): + """ + blocks{}: contain block_2, block_3, block_4, block_5, block_6, block_7 with + 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 resolution. + """ + f = [blocks['block_6'], blocks['block_5'], blocks['block_4'], blocks['block_3'], blocks['block_2']] + num_outputs = [256, 256, 192, 192, 128] + g = [None, None, None, None, None] + h = [None, None, None, None, None] + for i in range(5): + h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i], + filter_size=1, stride=1, act=None, name='fpn_up_h'+str(i)) + + for i in range(4): + if i == 0: + g[i] = deconv_bn_layer(input=h[i], num_filters=num_outputs[i + 1], act=None, name='fpn_up_g0') + #print("g[{}] shape: {}".format(i, g[i].shape)) + else: + g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i]) + g[i] = fluid.layers.relu(g[i]) + #g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i], + # filter_size=1, stride=1, act='relu') + g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i], + filter_size=3, stride=1, act='relu', name='fpn_up_g%d_1'%i) + g[i] = deconv_bn_layer(input=g[i], num_filters=num_outputs[i + 1], act=None, name='fpn_up_g%d_2'%i) + #print("g[{}] shape: {}".format(i, g[i].shape)) + + g[4] = fluid.layers.elementwise_add(x=g[3], y=h[4]) + g[4] = fluid.layers.relu(g[4]) + g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4], + filter_size=3, stride=1, act='relu', name='fpn_up_fusion_1') + g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4], + filter_size=1, stride=1, act=None, name='fpn_up_fusion_2') + + return g[4] + + def FPN_Down_Fusion(self, blocks): + """ + blocks{}: contain block_2, block_3, block_4, block_5, block_6, block_7 with + 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 resolution. + """ + f = [blocks['block_0'], blocks['block_1'], blocks['block_2']] + num_outputs = [32, 64, 128] + g = [None, None, None] + h = [None, None, None] + for i in range(3): + h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i], + filter_size=3, stride=1, act=None, name='fpn_down_h'+str(i)) + for i in range(2): + if i == 0: + g[i] = conv_bn_layer(input=h[i], num_filters=num_outputs[i+1], filter_size=3, stride=2, act=None, name='fpn_down_g0') + else: + g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i]) + g[i] = fluid.layers.relu(g[i]) + g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i], filter_size=3, stride=1, act='relu', name='fpn_down_g%d_1'%i) + g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i+1], filter_size=3, stride=2, act=None, name='fpn_down_g%d_2'%i) + # print("g[{}] shape: {}".format(i, g[i].shape)) + g[2] = fluid.layers.elementwise_add(x=g[1], y=h[2]) + g[2] = fluid.layers.relu(g[2]) + g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2], + filter_size=3, stride=1, act='relu', name='fpn_down_fusion_1') + g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2], + filter_size=1, stride=1, act=None, name='fpn_down_fusion_2') + return g[2] + + def SAST_Header1(self, f_common): + """Detector header.""" + #f_score + f_score = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_score1') + f_score = conv_bn_layer(input=f_score, num_filters=64, filter_size=3, stride=1, act='relu', name='f_score2') + f_score = conv_bn_layer(input=f_score, num_filters=128, filter_size=1, stride=1, act='relu', name='f_score3') + f_score = conv_bn_layer(input=f_score, num_filters=1, filter_size=3, stride=1, name='f_score4') + f_score = fluid.layers.sigmoid(f_score) + # print("f_score shape: {}".format(f_score.shape)) + + #f_boder + f_border = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_border1') + f_border = conv_bn_layer(input=f_border, num_filters=64, filter_size=3, stride=1, act='relu', name='f_border2') + f_border = conv_bn_layer(input=f_border, num_filters=128, filter_size=1, stride=1, act='relu', name='f_border3') + f_border = conv_bn_layer(input=f_border, num_filters=4, filter_size=3, stride=1, name='f_border4') + # print("f_border shape: {}".format(f_border.shape)) + + return f_score, f_border + + def SAST_Header2(self, f_common): + """Detector header.""" + #f_tvo + f_tvo = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tvo1') + f_tvo = conv_bn_layer(input=f_tvo, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tvo2') + f_tvo = conv_bn_layer(input=f_tvo, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tvo3') + f_tvo = conv_bn_layer(input=f_tvo, num_filters=8, filter_size=3, stride=1, name='f_tvo4') + # print("f_tvo shape: {}".format(f_tvo.shape)) + + #f_tco + f_tco = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tco1') + f_tco = conv_bn_layer(input=f_tco, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tco2') + f_tco = conv_bn_layer(input=f_tco, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tco3') + f_tco = conv_bn_layer(input=f_tco, num_filters=2, filter_size=3, stride=1, name='f_tco4') + # print("f_tco shape: {}".format(f_tco.shape)) + + return f_tvo, f_tco + + def cross_attention(self, f_common): + """ + """ + f_shape = fluid.layers.shape(f_common) + f_theta = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_theta') + f_phi = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_phi') + f_g = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_g') + ### horizon + fh_theta = f_theta + fh_phi = f_phi + fh_g = f_g + #flatten + fh_theta = fluid.layers.transpose(fh_theta, [0, 2, 3, 1]) + fh_theta = fluid.layers.reshape(fh_theta, [f_shape[0] * f_shape[2], f_shape[3], 128]) + fh_phi = fluid.layers.transpose(fh_phi, [0, 2, 3, 1]) + fh_phi = fluid.layers.reshape(fh_phi, [f_shape[0] * f_shape[2], f_shape[3], 128]) + fh_g = fluid.layers.transpose(fh_g, [0, 2, 3, 1]) + fh_g = fluid.layers.reshape(fh_g, [f_shape[0] * f_shape[2], f_shape[3], 128]) + #correlation + fh_attn = fluid.layers.matmul(fh_theta, fluid.layers.transpose(fh_phi, [0, 2, 1])) + #scale + fh_attn = fh_attn / (128 ** 0.5) + fh_attn = fluid.layers.softmax(fh_attn) + #weighted sum + fh_weight = fluid.layers.matmul(fh_attn, fh_g) + fh_weight = fluid.layers.reshape(fh_weight, [f_shape[0], f_shape[2], f_shape[3], 128]) + # print("fh_weight: {}".format(fh_weight.shape)) + fh_weight = fluid.layers.transpose(fh_weight, [0, 3, 1, 2]) + fh_weight = conv_bn_layer(input=fh_weight, num_filters=128, filter_size=1, stride=1, name='fh_weight') + #short cut + fh_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fh_sc') + f_h = fluid.layers.relu(fh_weight + fh_sc) + ###### + #vertical + fv_theta = fluid.layers.transpose(f_theta, [0, 1, 3, 2]) + fv_phi = fluid.layers.transpose(f_phi, [0, 1, 3, 2]) + fv_g = fluid.layers.transpose(f_g, [0, 1, 3, 2]) + #flatten + fv_theta = fluid.layers.transpose(fv_theta, [0, 2, 3, 1]) + fv_theta = fluid.layers.reshape(fv_theta, [f_shape[0] * f_shape[3], f_shape[2], 128]) + fv_phi = fluid.layers.transpose(fv_phi, [0, 2, 3, 1]) + fv_phi = fluid.layers.reshape(fv_phi, [f_shape[0] * f_shape[3], f_shape[2], 128]) + fv_g = fluid.layers.transpose(fv_g, [0, 2, 3, 1]) + fv_g = fluid.layers.reshape(fv_g, [f_shape[0] * f_shape[3], f_shape[2], 128]) + #correlation + fv_attn = fluid.layers.matmul(fv_theta, fluid.layers.transpose(fv_phi, [0, 2, 1])) + #scale + fv_attn = fv_attn / (128 ** 0.5) + fv_attn = fluid.layers.softmax(fv_attn) + #weighted sum + fv_weight = fluid.layers.matmul(fv_attn, fv_g) + fv_weight = fluid.layers.reshape(fv_weight, [f_shape[0], f_shape[3], f_shape[2], 128]) + # print("fv_weight: {}".format(fv_weight.shape)) + fv_weight = fluid.layers.transpose(fv_weight, [0, 3, 2, 1]) + fv_weight = conv_bn_layer(input=fv_weight, num_filters=128, filter_size=1, stride=1, name='fv_weight') + #short cut + fv_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fv_sc') + f_v = fluid.layers.relu(fv_weight + fv_sc) + ###### + f_attn = fluid.layers.concat([f_h, f_v], axis=1) + f_attn = conv_bn_layer(input=f_attn, num_filters=128, filter_size=1, stride=1, act='relu', name='f_attn') + return f_attn + + def __call__(self, blocks, with_cab=False): + # for k, v in blocks.items(): + # print(k, v.shape) + + #down fpn + f_down = self.FPN_Down_Fusion(blocks) + # print("f_down shape: {}".format(f_down.shape)) + #up fpn + f_up = self.FPN_Up_Fusion(blocks) + # print("f_up shape: {}".format(f_up.shape)) + #fusion + f_common = fluid.layers.elementwise_add(x=f_down, y=f_up) + f_common = fluid.layers.relu(f_common) + # print("f_common: {}".format(f_common.shape)) + + if self.with_cab: + # print('enhence f_common with CAB.') + f_common = self.cross_attention(f_common) + + f_score, f_border= self.SAST_Header1(f_common) + f_tvo, f_tco = self.SAST_Header2(f_common) + + predicts = OrderedDict() + predicts['f_score'] = f_score + predicts['f_border'] = f_border + predicts['f_tvo'] = f_tvo + predicts['f_tco'] = f_tco + return predicts \ No newline at end of file diff --git a/ppocr/modeling/heads/rec_ctc_head.py b/ppocr/modeling/heads/rec_ctc_head.py index 37b4b00f8a16b219c978c685ea8a0d37234b40e4..84948c2b20933d0f2086a42442a420d1b6b1eeee 100755 --- a/ppocr/modeling/heads/rec_ctc_head.py +++ b/ppocr/modeling/heads/rec_ctc_head.py @@ -32,14 +32,16 @@ class CTCPredict(object): self.char_num = params['char_num'] self.encoder = SequenceEncoder(params) self.encoder_type = params['encoder_type'] + self.fc_decay = params.get("fc_decay", 0.0004) def __call__(self, inputs, labels=None, mode=None): - encoder_features = self.encoder(inputs) - if self.encoder_type != "reshape": - encoder_features = fluid.layers.concat(encoder_features, axis=1) - name = "ctc_fc" - para_attr, bias_attr = get_para_bias_attr( - l2_decay=0.0004, k=encoder_features.shape[1], name=name) + with fluid.scope_guard("skip_quant"): + encoder_features = self.encoder(inputs) + if self.encoder_type != "reshape": + encoder_features = fluid.layers.concat(encoder_features, axis=1) + name = "ctc_fc" + para_attr, bias_attr = get_para_bias_attr( + l2_decay=self.fc_decay, k=encoder_features.shape[1], name=name) predict = fluid.layers.fc(input=encoder_features, size=self.char_num + 1, param_attr=para_attr, diff --git a/ppocr/modeling/heads/rec_srn_all_head.py b/ppocr/modeling/heads/rec_srn_all_head.py new file mode 100755 index 0000000000000000000000000000000000000000..e1bb955d437faca243e3768e24b47f4328218624 --- /dev/null +++ b/ppocr/modeling/heads/rec_srn_all_head.py @@ -0,0 +1,230 @@ +#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +import paddle +import paddle.fluid as fluid +from paddle.fluid.param_attr import ParamAttr +import numpy as np +from .self_attention.model import wrap_encoder +from .self_attention.model import wrap_encoder_forFeature +gradient_clip = 10 + + +class SRNPredict(object): + def __init__(self, params): + super(SRNPredict, self).__init__() + self.char_num = params['char_num'] + self.max_length = params['max_text_length'] + + self.num_heads = params['num_heads'] + self.num_encoder_TUs = params['num_encoder_TUs'] + self.num_decoder_TUs = params['num_decoder_TUs'] + self.hidden_dims = params['hidden_dims'] + + def pvam(self, inputs, others): + + b, c, h, w = inputs.shape + conv_features = fluid.layers.reshape(x=inputs, shape=[-1, c, h * w]) + conv_features = fluid.layers.transpose(x=conv_features, perm=[0, 2, 1]) + + #===== Transformer encoder ===== + b, t, c = conv_features.shape + encoder_word_pos = others["encoder_word_pos"] + gsrm_word_pos = others["gsrm_word_pos"] + + enc_inputs = [conv_features, encoder_word_pos, None] + word_features = wrap_encoder_forFeature( + src_vocab_size=-1, + max_length=t, + n_layer=self.num_encoder_TUs, + n_head=self.num_heads, + d_key=int(self.hidden_dims / self.num_heads), + d_value=int(self.hidden_dims / self.num_heads), + d_model=self.hidden_dims, + d_inner_hid=self.hidden_dims, + prepostprocess_dropout=0.1, + attention_dropout=0.1, + relu_dropout=0.1, + preprocess_cmd="n", + postprocess_cmd="da", + weight_sharing=True, + enc_inputs=enc_inputs, ) + fluid.clip.set_gradient_clip( + fluid.clip.GradientClipByValue(gradient_clip)) + + #===== Parallel Visual Attention Module ===== + b, t, c = word_features.shape + + word_features = fluid.layers.fc(word_features, c, num_flatten_dims=2) + word_features_ = fluid.layers.reshape(word_features, [-1, 1, t, c]) + word_features_ = fluid.layers.expand(word_features_, + [1, self.max_length, 1, 1]) + word_pos_feature = fluid.layers.embedding(gsrm_word_pos, + [self.max_length, c]) + word_pos_ = fluid.layers.reshape(word_pos_feature, + [-1, self.max_length, 1, c]) + word_pos_ = fluid.layers.expand(word_pos_, [1, 1, t, 1]) + temp = fluid.layers.elementwise_add( + word_features_, word_pos_, act='tanh') + + attention_weight = fluid.layers.fc(input=temp, + size=1, + num_flatten_dims=3, + bias_attr=False) + attention_weight = fluid.layers.reshape( + x=attention_weight, shape=[-1, self.max_length, t]) + attention_weight = fluid.layers.softmax(input=attention_weight, axis=-1) + + pvam_features = fluid.layers.matmul(attention_weight, + word_features) #[b, max_length, c] + + return pvam_features + + def gsrm(self, pvam_features, others): + + #===== GSRM Visual-to-semantic embedding block ===== + b, t, c = pvam_features.shape + word_out = fluid.layers.fc( + input=fluid.layers.reshape(pvam_features, [-1, c]), + size=self.char_num, + act="softmax") + #word_out.stop_gradient = True + word_ids = fluid.layers.argmax(word_out, axis=1) + word_ids.stop_gradient = True + word_ids = fluid.layers.reshape(x=word_ids, shape=[-1, t, 1]) + + #===== GSRM Semantic reasoning block ===== + """ + This module is achieved through bi-transformers, + ngram_feature1 is the froward one, ngram_fetaure2 is the backward one + """ + pad_idx = self.char_num + gsrm_word_pos = others["gsrm_word_pos"] + gsrm_slf_attn_bias1 = others["gsrm_slf_attn_bias1"] + gsrm_slf_attn_bias2 = others["gsrm_slf_attn_bias2"] + + def prepare_bi(word_ids): + """ + prepare bi for gsrm + word1 for forward; word2 for backward + """ + word1 = fluid.layers.cast(word_ids, "float32") + word1 = fluid.layers.pad(word1, [0, 0, 1, 0, 0, 0], + pad_value=1.0 * pad_idx) + word1 = fluid.layers.cast(word1, "int64") + word1 = word1[:, :-1, :] + word2 = word_ids + return word1, word2 + + word1, word2 = prepare_bi(word_ids) + word1.stop_gradient = True + word2.stop_gradient = True + enc_inputs_1 = [word1, gsrm_word_pos, gsrm_slf_attn_bias1] + enc_inputs_2 = [word2, gsrm_word_pos, gsrm_slf_attn_bias2] + + gsrm_feature1 = wrap_encoder( + src_vocab_size=self.char_num + 1, + max_length=self.max_length, + n_layer=self.num_decoder_TUs, + n_head=self.num_heads, + d_key=int(self.hidden_dims / self.num_heads), + d_value=int(self.hidden_dims / self.num_heads), + d_model=self.hidden_dims, + d_inner_hid=self.hidden_dims, + prepostprocess_dropout=0.1, + attention_dropout=0.1, + relu_dropout=0.1, + preprocess_cmd="n", + postprocess_cmd="da", + weight_sharing=True, + enc_inputs=enc_inputs_1, ) + gsrm_feature2 = wrap_encoder( + src_vocab_size=self.char_num + 1, + max_length=self.max_length, + n_layer=self.num_decoder_TUs, + n_head=self.num_heads, + d_key=int(self.hidden_dims / self.num_heads), + d_value=int(self.hidden_dims / self.num_heads), + d_model=self.hidden_dims, + d_inner_hid=self.hidden_dims, + prepostprocess_dropout=0.1, + attention_dropout=0.1, + relu_dropout=0.1, + preprocess_cmd="n", + postprocess_cmd="da", + weight_sharing=True, + enc_inputs=enc_inputs_2, ) + gsrm_feature2 = fluid.layers.pad(gsrm_feature2, [0, 0, 0, 1, 0, 0], + pad_value=0.) + gsrm_feature2 = gsrm_feature2[:, 1:, ] + gsrm_features = gsrm_feature1 + gsrm_feature2 + + b, t, c = gsrm_features.shape + + gsrm_out = fluid.layers.matmul( + x=gsrm_features, + y=fluid.default_main_program().global_block().var( + "src_word_emb_table"), + transpose_y=True) + b, t, c = gsrm_out.shape + gsrm_out = fluid.layers.softmax(input=fluid.layers.reshape(gsrm_out, + [-1, c])) + + return gsrm_features, word_out, gsrm_out + + def vsfd(self, pvam_features, gsrm_features): + + #===== Visual-Semantic Fusion Decoder Module ===== + b, t, c1 = pvam_features.shape + b, t, c2 = gsrm_features.shape + combine_features_ = fluid.layers.concat( + [pvam_features, gsrm_features], axis=2) + img_comb_features_ = fluid.layers.reshape( + x=combine_features_, shape=[-1, c1 + c2]) + img_comb_features_map = fluid.layers.fc(input=img_comb_features_, + size=c1, + act="sigmoid") + img_comb_features_map = fluid.layers.reshape( + x=img_comb_features_map, shape=[-1, t, c1]) + combine_features = img_comb_features_map * pvam_features + ( + 1.0 - img_comb_features_map) * gsrm_features + img_comb_features = fluid.layers.reshape( + x=combine_features, shape=[-1, c1]) + + fc_out = fluid.layers.fc(input=img_comb_features, + size=self.char_num, + act="softmax") + return fc_out + + def __call__(self, inputs, others, mode=None): + + pvam_features = self.pvam(inputs, others) + gsrm_features, word_out, gsrm_out = self.gsrm(pvam_features, others) + final_out = self.vsfd(pvam_features, gsrm_features) + + _, decoded_out = fluid.layers.topk(input=final_out, k=1) + predicts = { + 'predict': final_out, + 'decoded_out': decoded_out, + 'word_out': word_out, + 'gsrm_out': gsrm_out + } + + return predicts diff --git a/ppocr/modeling/heads/self_attention/__init__.py b/ppocr/modeling/heads/self_attention/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/ppocr/modeling/heads/self_attention/model.py b/ppocr/modeling/heads/self_attention/model.py new file mode 100644 index 0000000000000000000000000000000000000000..8bf34e4ac6a2c3c33d2a46b1f4f9dbfaf8db8f57 --- /dev/null +++ b/ppocr/modeling/heads/self_attention/model.py @@ -0,0 +1,485 @@ +from functools import partial +import numpy as np + +import paddle.fluid as fluid +import paddle.fluid.layers as layers + +encoder_data_input_fields = ( + "src_word", + "src_pos", + "src_slf_attn_bias", ) + + +def wrap_layer_with_block(layer, block_idx): + """ + Make layer define support indicating block, by which we can add layers + to other blocks within current block. This will make it easy to define + cache among while loop. + """ + + class BlockGuard(object): + """ + BlockGuard class. + + BlockGuard class is used to switch to the given block in a program by + using the Python `with` keyword. + """ + + def __init__(self, block_idx=None, main_program=None): + self.main_program = fluid.default_main_program( + ) if main_program is None else main_program + self.old_block_idx = self.main_program.current_block().idx + self.new_block_idx = block_idx + + def __enter__(self): + self.main_program.current_block_idx = self.new_block_idx + + def __exit__(self, exc_type, exc_val, exc_tb): + self.main_program.current_block_idx = self.old_block_idx + if exc_type is not None: + return False # re-raise exception + return True + + def layer_wrapper(*args, **kwargs): + with BlockGuard(block_idx): + return layer(*args, **kwargs) + + return layer_wrapper + + +def multi_head_attention(queries, + keys, + values, + attn_bias, + d_key, + d_value, + d_model, + n_head=1, + dropout_rate=0., + cache=None, + gather_idx=None, + static_kv=False): + """ + Multi-Head Attention. Note that attn_bias is added to the logit before + computing softmax activiation to mask certain selected positions so that + they will not considered in attention weights. + """ + keys = queries if keys is None else keys + values = keys if values is None else values + + if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3): + raise ValueError( + "Inputs: quries, keys and values should all be 3-D tensors.") + + def __compute_qkv(queries, keys, values, n_head, d_key, d_value): + """ + Add linear projection to queries, keys, and values. + """ + q = layers.fc(input=queries, + size=d_key * n_head, + bias_attr=False, + num_flatten_dims=2) + # For encoder-decoder attention in inference, insert the ops and vars + # into global block to use as cache among beam search. + fc_layer = wrap_layer_with_block( + layers.fc, fluid.default_main_program().current_block() + .parent_idx) if cache is not None and static_kv else layers.fc + k = fc_layer( + input=keys, + size=d_key * n_head, + bias_attr=False, + num_flatten_dims=2) + v = fc_layer( + input=values, + size=d_value * n_head, + bias_attr=False, + num_flatten_dims=2) + return q, k, v + + def __split_heads_qkv(queries, keys, values, n_head, d_key, d_value): + """ + Reshape input tensors at the last dimension to split multi-heads + and then transpose. Specifically, transform the input tensor with shape + [bs, max_sequence_length, n_head * hidden_dim] to the output tensor + with shape [bs, n_head, max_sequence_length, hidden_dim]. + """ + # The value 0 in shape attr means copying the corresponding dimension + # size of the input as the output dimension size. + reshaped_q = layers.reshape( + x=queries, shape=[0, 0, n_head, d_key], inplace=True) + # permuate the dimensions into: + # [batch_size, n_head, max_sequence_len, hidden_size_per_head] + q = layers.transpose(x=reshaped_q, perm=[0, 2, 1, 3]) + # For encoder-decoder attention in inference, insert the ops and vars + # into global block to use as cache among beam search. + reshape_layer = wrap_layer_with_block( + layers.reshape, + fluid.default_main_program().current_block() + .parent_idx) if cache is not None and static_kv else layers.reshape + transpose_layer = wrap_layer_with_block( + layers.transpose, + fluid.default_main_program().current_block(). + parent_idx) if cache is not None and static_kv else layers.transpose + reshaped_k = reshape_layer( + x=keys, shape=[0, 0, n_head, d_key], inplace=True) + k = transpose_layer(x=reshaped_k, perm=[0, 2, 1, 3]) + reshaped_v = reshape_layer( + x=values, shape=[0, 0, n_head, d_value], inplace=True) + v = transpose_layer(x=reshaped_v, perm=[0, 2, 1, 3]) + + if cache is not None: # only for faster inference + if static_kv: # For encoder-decoder attention in inference + cache_k, cache_v = cache["static_k"], cache["static_v"] + # To init the static_k and static_v in cache. + # Maybe we can use condition_op(if_else) to do these at the first + # step in while loop to replace these, however it might be less + # efficient. + static_cache_init = wrap_layer_with_block( + layers.assign, + fluid.default_main_program().current_block().parent_idx) + static_cache_init(k, cache_k) + static_cache_init(v, cache_v) + else: # For decoder self-attention in inference + cache_k, cache_v = cache["k"], cache["v"] + # gather cell states corresponding to selected parent + select_k = layers.gather(cache_k, index=gather_idx) + select_v = layers.gather(cache_v, index=gather_idx) + if not static_kv: + # For self attention in inference, use cache and concat time steps. + select_k = layers.concat([select_k, k], axis=2) + select_v = layers.concat([select_v, v], axis=2) + # update cell states(caches) cached in global block + layers.assign(select_k, cache_k) + layers.assign(select_v, cache_v) + return q, select_k, select_v + return q, k, v + + def __combine_heads(x): + """ + Transpose and then reshape the last two dimensions of inpunt tensor x + so that it becomes one dimension, which is reverse to __split_heads. + """ + if len(x.shape) != 4: + raise ValueError("Input(x) should be a 4-D Tensor.") + + trans_x = layers.transpose(x, perm=[0, 2, 1, 3]) + # The value 0 in shape attr means copying the corresponding dimension + # size of the input as the output dimension size. + return layers.reshape( + x=trans_x, + shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]], + inplace=True) + + def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate): + """ + Scaled Dot-Product Attention + """ + # print(q) + # print(k) + + product = layers.matmul(x=q, y=k, transpose_y=True, alpha=d_key**-0.5) + if attn_bias: + product += attn_bias + weights = layers.softmax(product) + if dropout_rate: + weights = layers.dropout( + weights, dropout_prob=dropout_rate, seed=None, is_test=False) + out = layers.matmul(weights, v) + return out + + q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value) + q, k, v = __split_heads_qkv(q, k, v, n_head, d_key, d_value) + + ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_model, + dropout_rate) + + out = __combine_heads(ctx_multiheads) + + # Project back to the model size. + proj_out = layers.fc(input=out, + size=d_model, + bias_attr=False, + num_flatten_dims=2) + return proj_out + + +def positionwise_feed_forward(x, d_inner_hid, d_hid, dropout_rate): + """ + Position-wise Feed-Forward Networks. + This module consists of two linear transformations with a ReLU activation + in between, which is applied to each position separately and identically. + """ + hidden = layers.fc(input=x, + size=d_inner_hid, + num_flatten_dims=2, + act="relu") + if dropout_rate: + hidden = layers.dropout( + hidden, dropout_prob=dropout_rate, seed=None, is_test=False) + out = layers.fc(input=hidden, size=d_hid, num_flatten_dims=2) + return out + + +def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.): + """ + Add residual connection, layer normalization and droput to the out tensor + optionally according to the value of process_cmd. + This will be used before or after multi-head attention and position-wise + feed-forward networks. + """ + for cmd in process_cmd: + if cmd == "a": # add residual connection + out = out + prev_out if prev_out else out + elif cmd == "n": # add layer normalization + out = layers.layer_norm( + out, + begin_norm_axis=len(out.shape) - 1, + param_attr=fluid.initializer.Constant(1.), + bias_attr=fluid.initializer.Constant(0.)) + elif cmd == "d": # add dropout + if dropout_rate: + out = layers.dropout( + out, dropout_prob=dropout_rate, seed=None, is_test=False) + return out + + +pre_process_layer = partial(pre_post_process_layer, None) +post_process_layer = pre_post_process_layer + + +def prepare_encoder( + src_word, # [b,t,c] + src_pos, + src_vocab_size, + src_emb_dim, + src_max_len, + dropout_rate=0., + bos_idx=0, + word_emb_param_name=None, + pos_enc_param_name=None): + """Add word embeddings and position encodings. + The output tensor has a shape of: + [batch_size, max_src_length_in_batch, d_model]. + This module is used at the bottom of the encoder stacks. + """ + + src_word_emb = src_word + src_word_emb = layers.cast(src_word_emb, 'float32') + + src_word_emb = layers.scale(x=src_word_emb, scale=src_emb_dim**0.5) + src_pos_enc = layers.embedding( + src_pos, + size=[src_max_len, src_emb_dim], + param_attr=fluid.ParamAttr( + name=pos_enc_param_name, trainable=False)) + src_pos_enc.stop_gradient = True + enc_input = src_word_emb + src_pos_enc + return layers.dropout( + enc_input, dropout_prob=dropout_rate, seed=None, + is_test=False) if dropout_rate else enc_input + + +def prepare_decoder(src_word, + src_pos, + src_vocab_size, + src_emb_dim, + src_max_len, + dropout_rate=0., + bos_idx=0, + word_emb_param_name=None, + pos_enc_param_name=None): + """Add word embeddings and position encodings. + The output tensor has a shape of: + [batch_size, max_src_length_in_batch, d_model]. + This module is used at the bottom of the encoder stacks. + """ + src_word_emb = layers.embedding( + src_word, + size=[src_vocab_size, src_emb_dim], + padding_idx=bos_idx, # set embedding of bos to 0 + param_attr=fluid.ParamAttr( + name=word_emb_param_name, + initializer=fluid.initializer.Normal(0., src_emb_dim**-0.5))) + + src_word_emb = layers.scale(x=src_word_emb, scale=src_emb_dim**0.5) + src_pos_enc = layers.embedding( + src_pos, + size=[src_max_len, src_emb_dim], + param_attr=fluid.ParamAttr( + name=pos_enc_param_name, trainable=False)) + src_pos_enc.stop_gradient = True + enc_input = src_word_emb + src_pos_enc + return layers.dropout( + enc_input, dropout_prob=dropout_rate, seed=None, + is_test=False) if dropout_rate else enc_input + + +def encoder_layer(enc_input, + attn_bias, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd="n", + postprocess_cmd="da"): + """The encoder layers that can be stacked to form a deep encoder. + This module consits of a multi-head (self) attention followed by + position-wise feed-forward networks and both the two components companied + with the post_process_layer to add residual connection, layer normalization + and droput. + """ + attn_output = multi_head_attention( + pre_process_layer(enc_input, preprocess_cmd, + prepostprocess_dropout), None, None, attn_bias, d_key, + d_value, d_model, n_head, attention_dropout) + attn_output = post_process_layer(enc_input, attn_output, postprocess_cmd, + prepostprocess_dropout) + ffd_output = positionwise_feed_forward( + pre_process_layer(attn_output, preprocess_cmd, prepostprocess_dropout), + d_inner_hid, d_model, relu_dropout) + return post_process_layer(attn_output, ffd_output, postprocess_cmd, + prepostprocess_dropout) + + +def encoder(enc_input, + attn_bias, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd="n", + postprocess_cmd="da"): + """ + The encoder is composed of a stack of identical layers returned by calling + encoder_layer. + """ + for i in range(n_layer): + enc_output = encoder_layer( + enc_input, + attn_bias, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, ) + enc_input = enc_output + enc_output = pre_process_layer(enc_output, preprocess_cmd, + prepostprocess_dropout) + return enc_output + + +def wrap_encoder_forFeature(src_vocab_size, + max_length, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, + weight_sharing, + enc_inputs=None, + bos_idx=0): + """ + The wrapper assembles together all needed layers for the encoder. + img, src_pos, src_slf_attn_bias = enc_inputs + img + """ + + conv_features, src_pos, src_slf_attn_bias = enc_inputs # + b, t, c = conv_features.shape + + enc_input = prepare_encoder( + conv_features, + src_pos, + src_vocab_size, + d_model, + max_length, + prepostprocess_dropout, + bos_idx=bos_idx, + word_emb_param_name="src_word_emb_table") + + enc_output = encoder( + enc_input, + src_slf_attn_bias, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, ) + return enc_output + + +def wrap_encoder(src_vocab_size, + max_length, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, + weight_sharing, + enc_inputs=None, + bos_idx=0): + """ + The wrapper assembles together all needed layers for the encoder. + img, src_pos, src_slf_attn_bias = enc_inputs + img + """ + + src_word, src_pos, src_slf_attn_bias = enc_inputs # + + enc_input = prepare_decoder( + src_word, + src_pos, + src_vocab_size, + d_model, + max_length, + prepostprocess_dropout, + bos_idx=bos_idx, + word_emb_param_name="src_word_emb_table") + + enc_output = encoder( + enc_input, + src_slf_attn_bias, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, ) + return enc_output diff --git a/ppocr/modeling/losses/cls_loss.py b/ppocr/modeling/losses/cls_loss.py new file mode 100755 index 0000000000000000000000000000000000000000..c187dce3618feef7503b01704a72a040145816a2 --- /dev/null +++ b/ppocr/modeling/losses/cls_loss.py @@ -0,0 +1,33 @@ +# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.fluid as fluid + + +class ClsLoss(object): + def __init__(self, params): + super(ClsLoss, self).__init__() + self.loss_func = fluid.layers.cross_entropy + + def __call__(self, predicts, labels): + predict = predicts['predict'] + label = labels['label'] + # softmax_out = fluid.layers.softmax(predict, use_cudnn=False) + cost = fluid.layers.cross_entropy(input=predict, label=label) + sum_cost = fluid.layers.mean(cost) + return sum_cost diff --git a/ppocr/modeling/losses/det_sast_loss.py b/ppocr/modeling/losses/det_sast_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..fb1a545af9d67b61878b3fb476fdc81746e8b992 --- /dev/null +++ b/ppocr/modeling/losses/det_sast_loss.py @@ -0,0 +1,115 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.fluid as fluid + + +class SASTLoss(object): + """ + SAST Loss function + """ + + def __init__(self, params=None): + super(SASTLoss, self).__init__() + + def __call__(self, predicts, labels): + """ + tcl_pos: N x 128 x 3 + tcl_mask: N x 128 x 1 + tcl_label: N x X list or LoDTensor + """ + + f_score = predicts['f_score'] + f_border = predicts['f_border'] + f_tvo = predicts['f_tvo'] + f_tco = predicts['f_tco'] + + l_score = labels['input_score'] + l_border = labels['input_border'] + l_mask = labels['input_mask'] + l_tvo = labels['input_tvo'] + l_tco = labels['input_tco'] + + #score_loss + intersection = fluid.layers.reduce_sum(f_score * l_score * l_mask) + union = fluid.layers.reduce_sum(f_score * l_mask) + fluid.layers.reduce_sum(l_score * l_mask) + score_loss = 1.0 - 2 * intersection / (union + 1e-5) + + #border loss + l_border_split, l_border_norm = fluid.layers.split(l_border, num_or_sections=[4, 1], dim=1) + f_border_split = f_border + l_border_norm_split = fluid.layers.expand(x=l_border_norm, expand_times=[1, 4, 1, 1]) + l_border_score = fluid.layers.expand(x=l_score, expand_times=[1, 4, 1, 1]) + l_border_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 4, 1, 1]) + border_diff = l_border_split - f_border_split + abs_border_diff = fluid.layers.abs(border_diff) + border_sign = abs_border_diff < 1.0 + border_sign = fluid.layers.cast(border_sign, dtype='float32') + border_sign.stop_gradient = True + border_in_loss = 0.5 * abs_border_diff * abs_border_diff * border_sign + \ + (abs_border_diff - 0.5) * (1.0 - border_sign) + border_out_loss = l_border_norm_split * border_in_loss + border_loss = fluid.layers.reduce_sum(border_out_loss * l_border_score * l_border_mask) / \ + (fluid.layers.reduce_sum(l_border_score * l_border_mask) + 1e-5) + + #tvo_loss + l_tvo_split, l_tvo_norm = fluid.layers.split(l_tvo, num_or_sections=[8, 1], dim=1) + f_tvo_split = f_tvo + l_tvo_norm_split = fluid.layers.expand(x=l_tvo_norm, expand_times=[1, 8, 1, 1]) + l_tvo_score = fluid.layers.expand(x=l_score, expand_times=[1, 8, 1, 1]) + l_tvo_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 8, 1, 1]) + # + tvo_geo_diff = l_tvo_split - f_tvo_split + abs_tvo_geo_diff = fluid.layers.abs(tvo_geo_diff) + tvo_sign = abs_tvo_geo_diff < 1.0 + tvo_sign = fluid.layers.cast(tvo_sign, dtype='float32') + tvo_sign.stop_gradient = True + tvo_in_loss = 0.5 * abs_tvo_geo_diff * abs_tvo_geo_diff * tvo_sign + \ + (abs_tvo_geo_diff - 0.5) * (1.0 - tvo_sign) + tvo_out_loss = l_tvo_norm_split * tvo_in_loss + tvo_loss = fluid.layers.reduce_sum(tvo_out_loss * l_tvo_score * l_tvo_mask) / \ + (fluid.layers.reduce_sum(l_tvo_score * l_tvo_mask) + 1e-5) + + #tco_loss + l_tco_split, l_tco_norm = fluid.layers.split(l_tco, num_or_sections=[2, 1], dim=1) + f_tco_split = f_tco + l_tco_norm_split = fluid.layers.expand(x=l_tco_norm, expand_times=[1, 2, 1, 1]) + l_tco_score = fluid.layers.expand(x=l_score, expand_times=[1, 2, 1, 1]) + l_tco_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 2, 1, 1]) + # + tco_geo_diff = l_tco_split - f_tco_split + abs_tco_geo_diff = fluid.layers.abs(tco_geo_diff) + tco_sign = abs_tco_geo_diff < 1.0 + tco_sign = fluid.layers.cast(tco_sign, dtype='float32') + tco_sign.stop_gradient = True + tco_in_loss = 0.5 * abs_tco_geo_diff * abs_tco_geo_diff * tco_sign + \ + (abs_tco_geo_diff - 0.5) * (1.0 - tco_sign) + tco_out_loss = l_tco_norm_split * tco_in_loss + tco_loss = fluid.layers.reduce_sum(tco_out_loss * l_tco_score * l_tco_mask) / \ + (fluid.layers.reduce_sum(l_tco_score * l_tco_mask) + 1e-5) + + + # total loss + tvo_lw, tco_lw = 1.5, 1.5 + score_lw, border_lw = 1.0, 1.0 + total_loss = score_loss * score_lw + border_loss * border_lw + \ + tvo_loss * tvo_lw + tco_loss * tco_lw + + losses = {'total_loss':total_loss, "score_loss":score_loss,\ + "border_loss":border_loss, 'tvo_loss':tvo_loss, 'tco_loss':tco_loss} + return losses \ No newline at end of file diff --git a/ppocr/modeling/losses/rec_srn_loss.py b/ppocr/modeling/losses/rec_srn_loss.py new file mode 100755 index 0000000000000000000000000000000000000000..b1ebd86fadfc48831ca3d03e79ef65f4be2aa5e8 --- /dev/null +++ b/ppocr/modeling/losses/rec_srn_loss.py @@ -0,0 +1,55 @@ +#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +import paddle +import paddle.fluid as fluid + + +class SRNLoss(object): + def __init__(self, params): + super(SRNLoss, self).__init__() + self.char_num = params['char_num'] + + def __call__(self, predicts, others): + predict = predicts['predict'] + word_predict = predicts['word_out'] + gsrm_predict = predicts['gsrm_out'] + label = others['label'] + lbl_weight = others['lbl_weight'] + + casted_label = fluid.layers.cast(x=label, dtype='int64') + cost_word = fluid.layers.cross_entropy( + input=word_predict, label=casted_label) + cost_gsrm = fluid.layers.cross_entropy( + input=gsrm_predict, label=casted_label) + cost_vsfd = fluid.layers.cross_entropy( + input=predict, label=casted_label) + + cost_word = fluid.layers.reshape( + x=fluid.layers.reduce_sum(cost_word), shape=[1]) + cost_gsrm = fluid.layers.reshape( + x=fluid.layers.reduce_sum(cost_gsrm), shape=[1]) + cost_vsfd = fluid.layers.reshape( + x=fluid.layers.reduce_sum(cost_vsfd), shape=[1]) + + sum_cost = fluid.layers.sum( + [cost_word, cost_vsfd * 2.0, cost_gsrm * 0.15]) + + return [sum_cost, cost_vsfd, cost_word] diff --git a/ppocr/optimizer.py b/ppocr/optimizer.py old mode 100755 new mode 100644 index c50b14c8d926d6e5b3798f5248ab1f80cbb57011..fd315cd1319d4925e893705957a42f931a39076e --- a/ppocr/optimizer.py +++ b/ppocr/optimizer.py @@ -14,12 +14,50 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function +import math import paddle.fluid as fluid +from paddle.fluid.regularizer import L2Decay +from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter +import paddle.fluid.layers.ops as ops + from ppocr.utils.utility import initial_logger logger = initial_logger() +def cosine_decay_with_warmup(learning_rate, + step_each_epoch, + epochs=500, + warmup_minibatch=1000): + """Applies cosine decay to the learning rate. + lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1) + decrease lr for every mini-batch and start with warmup. + """ + global_step = _decay_step_counter() + lr = fluid.layers.tensor.create_global_var( + shape=[1], + value=0.0, + dtype='float32', + persistable=True, + name="learning_rate") + + warmup_minibatch = fluid.layers.fill_constant( + shape=[1], + dtype='float32', + value=float(warmup_minibatch), + force_cpu=True) + + with fluid.layers.control_flow.Switch() as switch: + with switch.case(global_step < warmup_minibatch): + decayed_lr = learning_rate * (1.0 * global_step / warmup_minibatch) + fluid.layers.tensor.assign(input=decayed_lr, output=lr) + with switch.default(): + decayed_lr = learning_rate * \ + (ops.cos((global_step - warmup_minibatch) * (math.pi / (epochs * step_each_epoch))) + 1)/2 + fluid.layers.tensor.assign(input=decayed_lr, output=lr) + return lr + + def AdamDecay(params, parameter_list=None): """ define optimizer function @@ -31,21 +69,87 @@ def AdamDecay(params, parameter_list=None): base_lr = params['base_lr'] beta1 = params['beta1'] beta2 = params['beta2'] + l2_decay = params.get("l2_decay", 0.0) + if 'decay' in params: + supported_decay_mode = [ + "cosine_decay", "cosine_decay_warmup", "piecewise_decay" + ] params = params['decay'] decay_mode = params['function'] - step_each_epoch = params['step_each_epoch'] - total_epoch = params['total_epoch'] + assert decay_mode in supported_decay_mode, "Supported decay mode is {}, but got {}".format( + supported_decay_mode, decay_mode) + if decay_mode == "cosine_decay": + step_each_epoch = params['step_each_epoch'] + total_epoch = params['total_epoch'] base_lr = fluid.layers.cosine_decay( learning_rate=base_lr, step_each_epoch=step_each_epoch, epochs=total_epoch) - else: - logger.info("Only support Cosine decay currently") + elif decay_mode == "cosine_decay_warmup": + step_each_epoch = params['step_each_epoch'] + total_epoch = params['total_epoch'] + warmup_minibatch = params.get("warmup_minibatch", 1000) + base_lr = cosine_decay_with_warmup( + learning_rate=base_lr, + step_each_epoch=step_each_epoch, + epochs=total_epoch, + warmup_minibatch=warmup_minibatch) + elif decay_mode == "piecewise_decay": + boundaries = params["boundaries"] + decay_rate = params["decay_rate"] + values = [ + base_lr * decay_rate**idx + for idx in range(len(boundaries) + 1) + ] + base_lr = fluid.layers.piecewise_decay(boundaries, values) + optimizer = fluid.optimizer.Adam( learning_rate=base_lr, beta1=beta1, beta2=beta2, + regularization=L2Decay(regularization_coeff=l2_decay), parameter_list=parameter_list) return optimizer + + +def RMSProp(params, parameter_list=None): + """ + define optimizer function + args: + params(dict): the super parameters + parameter_list (list): list of Variable names to update to minimize loss + return: + """ + base_lr = params.get("base_lr", 0.001) + l2_decay = params.get("l2_decay", 0.00005) + + if 'decay' in params: + supported_decay_mode = ["cosine_decay", "piecewise_decay"] + params = params['decay'] + decay_mode = params['function'] + assert decay_mode in supported_decay_mode, "Supported decay mode is {}, but got {}".format( + supported_decay_mode, decay_mode) + + if decay_mode == "cosine_decay": + step_each_epoch = params['step_each_epoch'] + total_epoch = params['total_epoch'] + base_lr = fluid.layers.cosine_decay( + learning_rate=base_lr, + step_each_epoch=step_each_epoch, + epochs=total_epoch) + elif decay_mode == "piecewise_decay": + boundaries = params["boundaries"] + decay_rate = params["decay_rate"] + values = [ + base_lr * decay_rate**idx + for idx in range(len(boundaries) + 1) + ] + base_lr = fluid.layers.piecewise_decay(boundaries, values) + + optimizer = fluid.optimizer.RMSProp( + learning_rate=base_lr, + regularization=fluid.regularizer.L2Decay(regularization_coeff=l2_decay)) + + return optimizer diff --git a/ppocr/postprocess/db_postprocess.py b/ppocr/postprocess/db_postprocess.py index f115f12ed177dda87d7caae2167e44dca037c9ae..0792cde0a46c6d3441d84ebd1e804083a04fb302 100644 --- a/ppocr/postprocess/db_postprocess.py +++ b/ppocr/postprocess/db_postprocess.py @@ -37,6 +37,7 @@ class DBPostProcess(object): self.max_candidates = params['max_candidates'] self.unclip_ratio = params['unclip_ratio'] self.min_size = 3 + self.dilation_kernel = np.array([[1, 1], [1, 1]]) def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height): ''' @@ -140,8 +141,9 @@ class DBPostProcess(object): boxes_batch = [] for batch_index in range(pred.shape[0]): height, width = pred.shape[-2:] - tmp_boxes, tmp_scores = self.boxes_from_bitmap( - pred[batch_index], segmentation[batch_index], width, height) + + mask = cv2.dilate(np.array(segmentation[batch_index]).astype(np.uint8), self.dilation_kernel) + tmp_boxes, tmp_scores = self.boxes_from_bitmap(pred[batch_index], mask, width, height) boxes = [] for k in range(len(tmp_boxes)): diff --git a/ppocr/postprocess/east_postprocess.py b/ppocr/postprocess/east_postprocess.py index 8200df3c281383fbc8c3f8df4f5090d923fb4d73..270cf6699bb7f77c730c6ff80b49f1798b9bb720 100755 --- a/ppocr/postprocess/east_postprocess.py +++ b/ppocr/postprocess/east_postprocess.py @@ -22,9 +22,9 @@ import cv2 import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) class EASTPostPocess(object): diff --git a/ppocr/postprocess/lanms/.ycm_extra_conf.py b/ppocr/postprocess/lanms/.ycm_extra_conf.py index 3c8673ddbbd92042660545d123c6bbba4f0d8273..cd1a74e920bad8d84b755b5dbfbf83e6884836d6 100644 --- a/ppocr/postprocess/lanms/.ycm_extra_conf.py +++ b/ppocr/postprocess/lanms/.ycm_extra_conf.py @@ -25,7 +25,7 @@ import ycm_core # These are the compilation flags that will be used in case there's no # compilation database set (by default, one is not set). # CHANGE THIS LIST OF FLAGS. YES, THIS IS THE DROID YOU HAVE BEEN LOOKING FOR. -sys.path.append(os.path.dirname(__file__)) +sys.path.append(os.path.dirname(os.path.abspath(__file__))) BASE_DIR = os.path.dirname(os.path.realpath(__file__)) diff --git a/ppocr/postprocess/sast_postprocess.py b/ppocr/postprocess/sast_postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..ddce65542c6d12244419a37db838e070401b5290 --- /dev/null +++ b/ppocr/postprocess/sast_postprocess.py @@ -0,0 +1,289 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys +__dir__ = os.path.dirname(__file__) +sys.path.append(__dir__) +sys.path.append(os.path.join(__dir__, '..')) + +import numpy as np +from .locality_aware_nms import nms_locality +# import lanms +import cv2 +import time + + +class SASTPostProcess(object): + """ + The post process for SAST. + """ + + def __init__(self, params): + self.score_thresh = params.get('score_thresh', 0.5) + self.nms_thresh = params.get('nms_thresh', 0.2) + self.sample_pts_num = params.get('sample_pts_num', 2) + self.shrink_ratio_of_width = params.get('shrink_ratio_of_width', 0.3) + self.expand_scale = params.get('expand_scale', 1.0) + self.tcl_map_thresh = 0.5 + + # c++ la-nms is faster, but only support python 3.5 + self.is_python35 = False + if sys.version_info.major == 3 and sys.version_info.minor == 5: + self.is_python35 = True + + def point_pair2poly(self, point_pair_list): + """ + Transfer vertical point_pairs into poly point in clockwise. + """ + # constract poly + point_num = len(point_pair_list) * 2 + point_list = [0] * point_num + for idx, point_pair in enumerate(point_pair_list): + point_list[idx] = point_pair[0] + point_list[point_num - 1 - idx] = point_pair[1] + return np.array(point_list).reshape(-1, 2) + + def shrink_quad_along_width(self, quad, begin_width_ratio=0., end_width_ratio=1.): + """ + Generate shrink_quad_along_width. + """ + ratio_pair = np.array([[begin_width_ratio], [end_width_ratio]], dtype=np.float32) + p0_1 = quad[0] + (quad[1] - quad[0]) * ratio_pair + p3_2 = quad[3] + (quad[2] - quad[3]) * ratio_pair + return np.array([p0_1[0], p0_1[1], p3_2[1], p3_2[0]]) + + def expand_poly_along_width(self, poly, shrink_ratio_of_width=0.3): + """ + expand poly along width. + """ + point_num = poly.shape[0] + left_quad = np.array([poly[0], poly[1], poly[-2], poly[-1]], dtype=np.float32) + left_ratio = -shrink_ratio_of_width * np.linalg.norm(left_quad[0] - left_quad[3]) / \ + (np.linalg.norm(left_quad[0] - left_quad[1]) + 1e-6) + left_quad_expand = self.shrink_quad_along_width(left_quad, left_ratio, 1.0) + right_quad = np.array([poly[point_num // 2 - 2], poly[point_num // 2 - 1], + poly[point_num // 2], poly[point_num // 2 + 1]], dtype=np.float32) + right_ratio = 1.0 + \ + shrink_ratio_of_width * np.linalg.norm(right_quad[0] - right_quad[3]) / \ + (np.linalg.norm(right_quad[0] - right_quad[1]) + 1e-6) + right_quad_expand = self.shrink_quad_along_width(right_quad, 0.0, right_ratio) + poly[0] = left_quad_expand[0] + poly[-1] = left_quad_expand[-1] + poly[point_num // 2 - 1] = right_quad_expand[1] + poly[point_num // 2] = right_quad_expand[2] + return poly + + def restore_quad(self, tcl_map, tcl_map_thresh, tvo_map): + """Restore quad.""" + xy_text = np.argwhere(tcl_map[:, :, 0] > tcl_map_thresh) + xy_text = xy_text[:, ::-1] # (n, 2) + + # Sort the text boxes via the y axis + xy_text = xy_text[np.argsort(xy_text[:, 1])] + + scores = tcl_map[xy_text[:, 1], xy_text[:, 0], 0] + scores = scores[:, np.newaxis] + + # Restore + point_num = int(tvo_map.shape[-1] / 2) + assert point_num == 4 + tvo_map = tvo_map[xy_text[:, 1], xy_text[:, 0], :] + xy_text_tile = np.tile(xy_text, (1, point_num)) # (n, point_num * 2) + quads = xy_text_tile - tvo_map + + return scores, quads, xy_text + + def quad_area(self, quad): + """ + compute area of a quad. + """ + edge = [ + (quad[1][0] - quad[0][0]) * (quad[1][1] + quad[0][1]), + (quad[2][0] - quad[1][0]) * (quad[2][1] + quad[1][1]), + (quad[3][0] - quad[2][0]) * (quad[3][1] + quad[2][1]), + (quad[0][0] - quad[3][0]) * (quad[0][1] + quad[3][1]) + ] + return np.sum(edge) / 2. + + def nms(self, dets): + if self.is_python35: + import lanms + dets = lanms.merge_quadrangle_n9(dets, self.nms_thresh) + else: + dets = nms_locality(dets, self.nms_thresh) + return dets + + def cluster_by_quads_tco(self, tcl_map, tcl_map_thresh, quads, tco_map): + """ + Cluster pixels in tcl_map based on quads. + """ + instance_count = quads.shape[0] + 1 # contain background + instance_label_map = np.zeros(tcl_map.shape[:2], dtype=np.int32) + if instance_count == 1: + return instance_count, instance_label_map + + # predict text center + xy_text = np.argwhere(tcl_map[:, :, 0] > tcl_map_thresh) + n = xy_text.shape[0] + xy_text = xy_text[:, ::-1] # (n, 2) + tco = tco_map[xy_text[:, 1], xy_text[:, 0], :] # (n, 2) + pred_tc = xy_text - tco + + # get gt text center + m = quads.shape[0] + gt_tc = np.mean(quads, axis=1) # (m, 2) + + pred_tc_tile = np.tile(pred_tc[:, np.newaxis, :], (1, m, 1)) # (n, m, 2) + gt_tc_tile = np.tile(gt_tc[np.newaxis, :, :], (n, 1, 1)) # (n, m, 2) + dist_mat = np.linalg.norm(pred_tc_tile - gt_tc_tile, axis=2) # (n, m) + xy_text_assign = np.argmin(dist_mat, axis=1) + 1 # (n,) + + instance_label_map[xy_text[:, 1], xy_text[:, 0]] = xy_text_assign + return instance_count, instance_label_map + + def estimate_sample_pts_num(self, quad, xy_text): + """ + Estimate sample points number. + """ + eh = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2])) / 2.0 + ew = (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3])) / 2.0 + + dense_sample_pts_num = max(2, int(ew)) + dense_xy_center_line = xy_text[np.linspace(0, xy_text.shape[0] - 1, dense_sample_pts_num, + endpoint=True, dtype=np.float32).astype(np.int32)] + + dense_xy_center_line_diff = dense_xy_center_line[1:] - dense_xy_center_line[:-1] + estimate_arc_len = np.sum(np.linalg.norm(dense_xy_center_line_diff, axis=1)) + + sample_pts_num = max(2, int(estimate_arc_len / eh)) + return sample_pts_num + + def detect_sast(self, tcl_map, tvo_map, tbo_map, tco_map, ratio_w, ratio_h, src_w, src_h, + shrink_ratio_of_width=0.3, tcl_map_thresh=0.5, offset_expand=1.0, out_strid=4.0): + """ + first resize the tcl_map, tvo_map and tbo_map to the input_size, then restore the polys + """ + # restore quad + scores, quads, xy_text = self.restore_quad(tcl_map, tcl_map_thresh, tvo_map) + dets = np.hstack((quads, scores)).astype(np.float32, copy=False) + dets = self.nms(dets) + if dets.shape[0] == 0: + return [] + quads = dets[:, :-1].reshape(-1, 4, 2) + + # Compute quad area + quad_areas = [] + for quad in quads: + quad_areas.append(-self.quad_area(quad)) + + # instance segmentation + # instance_count, instance_label_map = cv2.connectedComponents(tcl_map.astype(np.uint8), connectivity=8) + instance_count, instance_label_map = self.cluster_by_quads_tco(tcl_map, tcl_map_thresh, quads, tco_map) + + # restore single poly with tcl instance. + poly_list = [] + for instance_idx in range(1, instance_count): + xy_text = np.argwhere(instance_label_map == instance_idx)[:, ::-1] + quad = quads[instance_idx - 1] + q_area = quad_areas[instance_idx - 1] + if q_area < 5: + continue + + # + len1 = float(np.linalg.norm(quad[0] -quad[1])) + len2 = float(np.linalg.norm(quad[1] -quad[2])) + min_len = min(len1, len2) + if min_len < 3: + continue + + # filter small CC + if xy_text.shape[0] <= 0: + continue + + # filter low confidence instance + xy_text_scores = tcl_map[xy_text[:, 1], xy_text[:, 0], 0] + if np.sum(xy_text_scores) / quad_areas[instance_idx - 1] < 0.1: + # if np.sum(xy_text_scores) / quad_areas[instance_idx - 1] < 0.05: + continue + + # sort xy_text + left_center_pt = np.array([[(quad[0, 0] + quad[-1, 0]) / 2.0, + (quad[0, 1] + quad[-1, 1]) / 2.0]]) # (1, 2) + right_center_pt = np.array([[(quad[1, 0] + quad[2, 0]) / 2.0, + (quad[1, 1] + quad[2, 1]) / 2.0]]) # (1, 2) + proj_unit_vec = (right_center_pt - left_center_pt) / \ + (np.linalg.norm(right_center_pt - left_center_pt) + 1e-6) + proj_value = np.sum(xy_text * proj_unit_vec, axis=1) + xy_text = xy_text[np.argsort(proj_value)] + + # Sample pts in tcl map + if self.sample_pts_num == 0: + sample_pts_num = self.estimate_sample_pts_num(quad, xy_text) + else: + sample_pts_num = self.sample_pts_num + xy_center_line = xy_text[np.linspace(0, xy_text.shape[0] - 1, sample_pts_num, + endpoint=True, dtype=np.float32).astype(np.int32)] + + point_pair_list = [] + for x, y in xy_center_line: + # get corresponding offset + offset = tbo_map[y, x, :].reshape(2, 2) + if offset_expand != 1.0: + offset_length = np.linalg.norm(offset, axis=1, keepdims=True) + expand_length = np.clip(offset_length * (offset_expand - 1), a_min=0.5, a_max=3.0) + offset_detal = offset / offset_length * expand_length + offset = offset + offset_detal + # original point + ori_yx = np.array([y, x], dtype=np.float32) + point_pair = (ori_yx + offset)[:, ::-1]* out_strid / np.array([ratio_w, ratio_h]).reshape(-1, 2) + point_pair_list.append(point_pair) + + # ndarry: (x, 2), expand poly along width + detected_poly = self.point_pair2poly(point_pair_list) + detected_poly = self.expand_poly_along_width(detected_poly, shrink_ratio_of_width) + detected_poly[:, 0] = np.clip(detected_poly[:, 0], a_min=0, a_max=src_w) + detected_poly[:, 1] = np.clip(detected_poly[:, 1], a_min=0, a_max=src_h) + poly_list.append(detected_poly) + + return poly_list + + def __call__(self, outs_dict, ratio_list): + score_list = outs_dict['f_score'] + border_list = outs_dict['f_border'] + tvo_list = outs_dict['f_tvo'] + tco_list = outs_dict['f_tco'] + + img_num = len(ratio_list) + poly_lists = [] + for ino in range(img_num): + p_score = score_list[ino].transpose((1,2,0)) + p_border = border_list[ino].transpose((1,2,0)) + p_tvo = tvo_list[ino].transpose((1,2,0)) + p_tco = tco_list[ino].transpose((1,2,0)) + # print(p_score.shape, p_border.shape, p_tvo.shape, p_tco.shape) + ratio_h, ratio_w, src_h, src_w = ratio_list[ino] + + poly_list = self.detect_sast(p_score, p_tvo, p_border, p_tco, ratio_w, ratio_h, src_w, src_h, + shrink_ratio_of_width=self.shrink_ratio_of_width, + tcl_map_thresh=self.tcl_map_thresh, offset_expand=self.expand_scale) + + poly_lists.append(poly_list) + + return poly_lists + diff --git a/ppocr/utils/character.py b/ppocr/utils/character.py index 9a3db8dd92454c65256d1cadf7f155b6882ee171..97237cfa71a3d3ae0684ecbefbb2511f09bcd3a2 100755 --- a/ppocr/utils/character.py +++ b/ppocr/utils/character.py @@ -25,10 +25,13 @@ class CharacterOps(object): def __init__(self, config): self.character_type = config['character_type'] self.loss_type = config['loss_type'] + self.max_text_len = config['max_text_length'] if self.character_type == "en": self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) - elif self.character_type == "ch": + elif self.character_type in [ + "ch", 'japan', 'korean', 'french', 'german' + ]: character_dict_path = config['character_dict_path'] add_space = False if 'use_space_char' in config: @@ -54,6 +57,8 @@ class CharacterOps(object): self.end_str = "eos" if self.loss_type == "attention": dict_character = [self.beg_str, self.end_str] + dict_character + elif self.loss_type == "srn": + dict_character = dict_character + [self.beg_str, self.end_str] self.dict = {} for i, char in enumerate(dict_character): self.dict[char] = i @@ -147,6 +152,42 @@ def cal_predicts_accuracy(char_ops, return acc, acc_num, img_num +def cal_predicts_accuracy_srn(char_ops, + preds, + labels, + max_text_len, + is_debug=False): + acc_num = 0 + img_num = 0 + + char_num = char_ops.get_char_num() + + total_len = preds.shape[0] + img_num = int(total_len / max_text_len) + for i in range(img_num): + cur_label = [] + cur_pred = [] + for j in range(max_text_len): + if labels[j + i * max_text_len] != int(char_num - 1): #0 + cur_label.append(labels[j + i * max_text_len][0]) + else: + break + + for j in range(max_text_len + 1): + if j < len(cur_label) and preds[j + i * max_text_len][ + 0] != cur_label[j]: + break + elif j == len(cur_label) and j == max_text_len: + acc_num += 1 + break + elif j == len(cur_label) and preds[j + i * max_text_len][0] == int( + char_num - 1): + acc_num += 1 + break + acc = acc_num * 1.0 / img_num + return acc, acc_num, img_num + + def convert_rec_attention_infer_res(preds): img_num = preds.shape[0] target_lod = [0] diff --git a/ppocr/utils/french_dict.txt b/ppocr/utils/french_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7cd8ec503b3c70e03b72795761a858bb9a0d34d --- /dev/null +++ b/ppocr/utils/french_dict.txt @@ -0,0 +1,118 @@ +! +" +% +& +' +( +) ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +] +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +« +³ +µ +º +» +À +Á +Â +Å +É +Ê +Î +Ö +ß +à +á +â +ä +å +æ +ç +è +é +ê +ë +í +î +ï +ñ +ò +ó +ô +ö +ø +ù +ú +û +ü + diff --git a/ppocr/utils/german_dict.txt b/ppocr/utils/german_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..30c4d4218e8a77386db912e24117b1f197466e83 --- /dev/null +++ b/ppocr/utils/german_dict.txt @@ -0,0 +1,131 @@ +! +" +$ +% +& +' +( +) ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +> +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +] +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +£ +§ +­ +² +´ +µ +· +º +¼ +½ +¿ +À +Á +Ä +Å +Ç +É +Í +Ï +Ô +Ö +Ø +Ù +Ü +ß +à +á +â +ã +ä +å +æ +ç +è +é +ê +ë +í +ï +ñ +ò +ó +ô +ö +ø +ù +ú +û +ü + diff --git a/ppocr/utils/ic15_dict.txt b/ppocr/utils/ic15_dict.txt index 71043689051fb5a2da516b2e005d1d9b0fdecfb3..6fbd99f46acca8391a5e86ae546c637399204506 100644 --- a/ppocr/utils/ic15_dict.txt +++ b/ppocr/utils/ic15_dict.txt @@ -34,3 +34,30 @@ w x y z +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z + diff --git a/ppocr/utils/japan_dict.txt b/ppocr/utils/japan_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..339d4b89e5159a346636641a0814874faa59754a --- /dev/null +++ b/ppocr/utils/japan_dict.txt @@ -0,0 +1,4399 @@ +! +" +# +$ +% +& +' +( +) +* ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +< += +> +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +] +_ +` +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +© +° +² +´ +½ +Á +Ä +Å +Ç +È +É +Í +Ó +Ö +× +Ü +ß +à +á +â +ã +ä +å +æ +ç +è +é +ê +ë +í +ð +ñ +ò +ó +ô +õ +ö +ø +ú +û +ü +ý +ā +ă +ą +ć +Č +č +đ +ē +ė +ę +ğ +ī +ı +Ł +ł +ń +ň +ō +ř +Ş +ş +Š +š +ţ +ū +ż +Ž +ž +Ș +ș +ț +Δ +α +λ +μ +φ +Г +О +а +в +л +о +р +с +т +я +ồ +​ +— +― +’ +“ +” +… +℃ +→ +∇ +− +■ +☆ +  +、 +。 +々 +〆 +〈 +〉 +「 +」 +『 +』 +〔 +〕 +〜 +ぁ +あ +ぃ +い +う +ぇ +え +ぉ +お +か +が +き +ぎ +く +ぐ +け +げ +こ +ご +さ +ざ +し +じ +す +ず +せ +ぜ +そ +ぞ +た +だ +ち +ぢ +っ +つ +づ +て +で +と +ど +な +に +ぬ +ね +の +は +ば +ぱ +ひ +び +ぴ +ふ +ぶ +ぷ +へ +べ +ぺ +ほ +ぼ +ぽ +ま +み +む +め +も +ゃ +や +ゅ +ゆ +ょ +よ +ら +り +る +れ +ろ +わ +ゑ +を +ん +ゝ +ゞ +ァ +ア +ィ +イ +ゥ +ウ +ェ +エ +ォ +オ +カ +ガ +キ +ギ +ク +グ +ケ +ゲ +コ +ゴ +サ +ザ +シ +ジ +ス +ズ +セ +ゼ +ソ +ゾ +タ +ダ +チ +ヂ +ッ +ツ +ヅ +テ +デ +ト +ド +ナ +ニ +ヌ +ネ +ノ +ハ +バ +パ +ヒ +ビ +ピ +フ +ブ +プ +ヘ +ベ +ペ +ホ +ボ +ポ +マ +ミ +ム +メ +モ +ャ +ヤ +ュ +ユ +ョ +ヨ +ラ +リ +ル +レ +ロ +ワ +ヰ +ン +ヴ +ヵ +ヶ +・ +ー +㈱ +一 +丁 +七 +万 +丈 +三 +上 +下 +不 +与 +丑 +且 +世 +丘 +丙 +丞 +両 +並 +中 +串 +丸 +丹 +主 +丼 +丿 +乃 +久 +之 +乎 +乏 +乗 +乘 +乙 +九 +乞 +也 +乱 +乳 +乾 +亀 +了 +予 +争 +事 +二 +于 +互 +五 +井 +亘 +亙 +些 +亜 +亟 +亡 +交 +亥 +亦 +亨 +享 +京 +亭 +亮 +人 +什 +仁 +仇 +今 +介 +仍 +仏 +仔 +仕 +他 +仗 +付 +仙 +代 +令 +以 +仮 +仰 +仲 +件 +任 +企 +伊 +伍 +伎 +伏 +伐 +休 +会 +伝 +伯 +估 +伴 +伶 +伸 +伺 +似 +伽 +佃 +但 +位 +低 +住 +佐 +佑 +体 +何 +余 +佚 +佛 +作 +佩 +佳 +併 +佶 +使 +侈 +例 +侍 +侏 +侑 +侘 +供 +依 +侠 +価 +侮 +侯 +侵 +侶 +便 +係 +促 +俄 +俊 +俔 +俗 +俘 +保 +信 +俣 +俤 +修 +俯 +俳 +俵 +俸 +俺 +倉 +個 +倍 +倒 +候 +借 +倣 +値 +倫 +倭 +倶 +倹 +偃 +假 +偈 +偉 +偏 +偐 +偕 +停 +健 +側 +偵 +偶 +偽 +傀 +傅 +傍 +傑 +傘 +備 +催 +傭 +傲 +傳 +債 +傷 +傾 +僊 +働 +像 +僑 +僕 +僚 +僧 +僭 +僮 +儀 +億 +儇 +儒 +儛 +償 +儡 +優 +儲 +儺 +儼 +兀 +允 +元 +兄 +充 +兆 +先 +光 +克 +兌 +免 +兎 +児 +党 +兜 +入 +全 +八 +公 +六 +共 +兵 +其 +具 +典 +兼 +内 +円 +冊 +再 +冑 +冒 +冗 +写 +冠 +冤 +冥 +冨 +冬 +冲 +决 +冶 +冷 +准 +凉 +凋 +凌 +凍 +凛 +凝 +凞 +几 +凡 +処 +凪 +凰 +凱 +凶 +凸 +凹 +出 +函 +刀 +刃 +分 +切 +刈 +刊 +刎 +刑 +列 +初 +判 +別 +利 +刪 +到 +制 +刷 +券 +刹 +刺 +刻 +剃 +則 +削 +剋 +前 +剖 +剛 +剣 +剤 +剥 +剪 +副 +剰 +割 +創 +剽 +劇 +劉 +劔 +力 +功 +加 +劣 +助 +努 +劫 +劭 +励 +労 +効 +劾 +勃 +勅 +勇 +勉 +勒 +動 +勘 +務 +勝 +募 +勢 +勤 +勧 +勲 +勺 +勾 +勿 +匁 +匂 +包 +匏 +化 +北 +匙 +匝 +匠 +匡 +匣 +匯 +匲 +匹 +区 +医 +匿 +十 +千 +升 +午 +卉 +半 +卍 +卑 +卒 +卓 +協 +南 +単 +博 +卜 +占 +卦 +卯 +印 +危 +即 +却 +卵 +卸 +卿 +厄 +厚 +原 +厠 +厨 +厩 +厭 +厳 +去 +参 +又 +叉 +及 +友 +双 +反 +収 +叔 +取 +受 +叙 +叛 +叟 +叡 +叢 +口 +古 +句 +叩 +只 +叫 +召 +可 +台 +叱 +史 +右 +叶 +号 +司 +吃 +各 +合 +吉 +吊 +同 +名 +后 +吏 +吐 +向 +君 +吝 +吟 +吠 +否 +含 +吸 +吹 +吻 +吽 +吾 +呂 +呆 +呈 +呉 +告 +呑 +周 +呪 +呰 +味 +呼 +命 +咀 +咄 +咋 +和 +咒 +咫 +咲 +咳 +咸 +哀 +品 +哇 +哉 +員 +哨 +哩 +哭 +哲 +哺 +唄 +唆 +唇 +唐 +唖 +唯 +唱 +唳 +唸 +唾 +啄 +商 +問 +啓 +啼 +善 +喋 +喚 +喜 +喝 +喧 +喩 +喪 +喫 +喬 +單 +喰 +営 +嗅 +嗇 +嗔 +嗚 +嗜 +嗣 +嘆 +嘉 +嘗 +嘘 +嘩 +嘯 +嘱 +嘲 +嘴 +噂 +噌 +噛 +器 +噴 +噺 +嚆 +嚢 +囀 +囃 +囉 +囚 +四 +回 +因 +団 +困 +囲 +図 +固 +国 +圀 +圃 +國 +圏 +園 +圓 +團 +圜 +土 +圧 +在 +圭 +地 +址 +坂 +均 +坊 +坐 +坑 +坡 +坤 +坦 +坪 +垂 +型 +垢 +垣 +埃 +埋 +城 +埒 +埔 +域 +埠 +埴 +埵 +執 +培 +基 +埼 +堀 +堂 +堅 +堆 +堕 +堤 +堪 +堯 +堰 +報 +場 +堵 +堺 +塀 +塁 +塊 +塑 +塔 +塗 +塘 +塙 +塚 +塞 +塩 +填 +塵 +塾 +境 +墉 +墓 +増 +墜 +墟 +墨 +墳 +墺 +墻 +墾 +壁 +壇 +壊 +壌 +壕 +士 +壬 +壮 +声 +壱 +売 +壷 +壹 +壺 +壽 +変 +夏 +夕 +外 +夙 +多 +夜 +夢 +夥 +大 +天 +太 +夫 +夬 +夭 +央 +失 +夷 +夾 +奄 +奇 +奈 +奉 +奎 +奏 +契 +奔 +奕 +套 +奘 +奠 +奢 +奥 +奨 +奪 +奮 +女 +奴 +奸 +好 +如 +妃 +妄 +妊 +妍 +妓 +妖 +妙 +妥 +妨 +妬 +妲 +妹 +妻 +妾 +姉 +始 +姐 +姓 +委 +姚 +姜 +姞 +姥 +姦 +姨 +姪 +姫 +姶 +姻 +姿 +威 +娑 +娘 +娟 +娠 +娩 +娯 +娼 +婆 +婉 +婚 +婢 +婦 +婬 +婿 +媄 +媒 +媓 +媚 +媛 +媞 +媽 +嫁 +嫄 +嫉 +嫌 +嫐 +嫗 +嫡 +嬉 +嬌 +嬢 +嬪 +嬬 +嬾 +孁 +子 +孔 +字 +存 +孚 +孝 +孟 +季 +孤 +学 +孫 +孵 +學 +宅 +宇 +守 +安 +宋 +完 +宍 +宏 +宕 +宗 +官 +宙 +定 +宛 +宜 +宝 +実 +客 +宣 +室 +宥 +宮 +宰 +害 +宴 +宵 +家 +宸 +容 +宿 +寂 +寄 +寅 +密 +寇 +富 +寒 +寓 +寔 +寛 +寝 +察 +寡 +實 +寧 +審 +寮 +寵 +寶 +寸 +寺 +対 +寿 +封 +専 +射 +将 +尉 +尊 +尋 +對 +導 +小 +少 +尖 +尚 +尤 +尪 +尭 +就 +尹 +尺 +尻 +尼 +尽 +尾 +尿 +局 +居 +屈 +届 +屋 +屍 +屎 +屏 +屑 +屓 +展 +属 +屠 +層 +履 +屯 +山 +岐 +岑 +岡 +岩 +岫 +岬 +岳 +岷 +岸 +峠 +峡 +峨 +峯 +峰 +島 +峻 +崇 +崋 +崎 +崑 +崖 +崗 +崛 +崩 +嵌 +嵐 +嵩 +嵯 +嶂 +嶋 +嶠 +嶺 +嶼 +嶽 +巀 +巌 +巒 +巖 +川 +州 +巡 +巣 +工 +左 +巧 +巨 +巫 +差 +己 +巳 +巴 +巷 +巻 +巽 +巾 +市 +布 +帆 +希 +帖 +帚 +帛 +帝 +帥 +師 +席 +帯 +帰 +帳 +帷 +常 +帽 +幄 +幅 +幇 +幌 +幔 +幕 +幟 +幡 +幢 +幣 +干 +平 +年 +并 +幸 +幹 +幻 +幼 +幽 +幾 +庁 +広 +庄 +庇 +床 +序 +底 +庖 +店 +庚 +府 +度 +座 +庫 +庭 +庵 +庶 +康 +庸 +廂 +廃 +廉 +廊 +廓 +廟 +廠 +廣 +廬 +延 +廷 +建 +廻 +廼 +廿 +弁 +弄 +弉 +弊 +弌 +式 +弐 +弓 +弔 +引 +弖 +弗 +弘 +弛 +弟 +弥 +弦 +弧 +弱 +張 +強 +弼 +弾 +彈 +彊 +彌 +彎 +当 +彗 +彙 +彝 +形 +彦 +彩 +彫 +彬 +彭 +彰 +影 +彷 +役 +彼 +往 +征 +徂 +径 +待 +律 +後 +徐 +徑 +徒 +従 +得 +徠 +御 +徧 +徨 +復 +循 +徭 +微 +徳 +徴 +德 +徹 +徽 +心 +必 +忉 +忌 +忍 +志 +忘 +忙 +応 +忠 +快 +忯 +念 +忻 +忽 +忿 +怒 +怖 +思 +怠 +怡 +急 +性 +怨 +怪 +怯 +恂 +恋 +恐 +恒 +恕 +恣 +恤 +恥 +恨 +恩 +恬 +恭 +息 +恵 +悉 +悌 +悍 +悔 +悟 +悠 +患 +悦 +悩 +悪 +悲 +悼 +情 +惇 +惑 +惚 +惜 +惟 +惠 +惣 +惧 +惨 +惰 +想 +惹 +惺 +愈 +愉 +愍 +意 +愔 +愚 +愛 +感 +愷 +愿 +慈 +態 +慌 +慎 +慕 +慢 +慣 +慧 +慨 +慮 +慰 +慶 +憂 +憎 +憐 +憑 +憙 +憤 +憧 +憩 +憬 +憲 +憶 +憾 +懇 +應 +懌 +懐 +懲 +懸 +懺 +懽 +懿 +戈 +戊 +戌 +戎 +成 +我 +戒 +戔 +或 +戚 +戟 +戦 +截 +戮 +戯 +戴 +戸 +戻 +房 +所 +扁 +扇 +扈 +扉 +手 +才 +打 +払 +托 +扮 +扱 +扶 +批 +承 +技 +抄 +把 +抑 +抓 +投 +抗 +折 +抜 +択 +披 +抱 +抵 +抹 +押 +抽 +担 +拇 +拈 +拉 +拍 +拏 +拐 +拒 +拓 +拘 +拙 +招 +拝 +拠 +拡 +括 +拭 +拳 +拵 +拶 +拾 +拿 +持 +挂 +指 +按 +挑 +挙 +挟 +挨 +振 +挺 +挽 +挿 +捉 +捕 +捗 +捜 +捧 +捨 +据 +捺 +捻 +掃 +掄 +授 +掌 +排 +掖 +掘 +掛 +掟 +採 +探 +掣 +接 +控 +推 +掩 +措 +掬 +掲 +掴 +掻 +掾 +揃 +揄 +揆 +揉 +描 +提 +揖 +揚 +換 +握 +揮 +援 +揶 +揺 +損 +搦 +搬 +搭 +携 +搾 +摂 +摘 +摩 +摸 +摺 +撃 +撒 +撞 +撤 +撥 +撫 +播 +撮 +撰 +撲 +撹 +擁 +操 +擔 +擦 +擬 +擾 +攘 +攝 +攣 +支 +收 +改 +攻 +放 +政 +故 +敏 +救 +敗 +教 +敢 +散 +敦 +敬 +数 +整 +敵 +敷 +斂 +文 +斉 +斎 +斐 +斑 +斗 +料 +斜 +斟 +斤 +斥 +斧 +斬 +断 +斯 +新 +方 +於 +施 +旁 +旅 +旋 +旌 +族 +旗 +旛 +无 +旡 +既 +日 +旦 +旧 +旨 +早 +旬 +旭 +旺 +旻 +昂 +昆 +昇 +昉 +昌 +明 +昏 +易 +昔 +星 +映 +春 +昧 +昨 +昪 +昭 +是 +昵 +昼 +晁 +時 +晃 +晋 +晏 +晒 +晟 +晦 +晧 +晩 +普 +景 +晴 +晶 +智 +暁 +暇 +暈 +暉 +暑 +暖 +暗 +暘 +暢 +暦 +暫 +暮 +暲 +暴 +暹 +暾 +曄 +曇 +曉 +曖 +曙 +曜 +曝 +曠 +曰 +曲 +曳 +更 +書 +曹 +曼 +曽 +曾 +替 +最 +會 +月 +有 +朋 +服 +朏 +朔 +朕 +朗 +望 +朝 +期 +朧 +木 +未 +末 +本 +札 +朱 +朴 +机 +朽 +杁 +杉 +李 +杏 +材 +村 +杓 +杖 +杜 +杞 +束 +条 +杢 +杣 +来 +杭 +杮 +杯 +東 +杲 +杵 +杷 +杼 +松 +板 +枅 +枇 +析 +枓 +枕 +林 +枚 +果 +枝 +枠 +枡 +枢 +枯 +枳 +架 +柄 +柊 +柏 +某 +柑 +染 +柔 +柘 +柚 +柯 +柱 +柳 +柴 +柵 +査 +柾 +柿 +栂 +栃 +栄 +栖 +栗 +校 +株 +栲 +栴 +核 +根 +栻 +格 +栽 +桁 +桂 +桃 +框 +案 +桐 +桑 +桓 +桔 +桜 +桝 +桟 +桧 +桴 +桶 +桾 +梁 +梅 +梆 +梓 +梔 +梗 +梛 +條 +梟 +梢 +梧 +梨 +械 +梱 +梲 +梵 +梶 +棄 +棋 +棒 +棗 +棘 +棚 +棟 +棠 +森 +棲 +棹 +棺 +椀 +椅 +椋 +植 +椎 +椏 +椒 +椙 +検 +椥 +椹 +椿 +楊 +楓 +楕 +楚 +楞 +楠 +楡 +楢 +楨 +楪 +楫 +業 +楮 +楯 +楳 +極 +楷 +楼 +楽 +概 +榊 +榎 +榕 +榛 +榜 +榮 +榱 +榴 +槃 +槇 +槊 +構 +槌 +槍 +槐 +様 +槙 +槻 +槽 +槿 +樂 +樋 +樓 +樗 +標 +樟 +模 +権 +横 +樫 +樵 +樹 +樺 +樽 +橇 +橋 +橘 +機 +橿 +檀 +檄 +檎 +檐 +檗 +檜 +檣 +檥 +檬 +檮 +檸 +檻 +櫃 +櫓 +櫛 +櫟 +櫨 +櫻 +欄 +欅 +欠 +次 +欣 +欧 +欲 +欺 +欽 +款 +歌 +歎 +歓 +止 +正 +此 +武 +歩 +歪 +歯 +歳 +歴 +死 +殆 +殉 +殊 +残 +殖 +殯 +殴 +段 +殷 +殺 +殻 +殿 +毀 +毅 +母 +毎 +毒 +比 +毘 +毛 +毫 +毬 +氈 +氏 +民 +気 +水 +氷 +永 +氾 +汀 +汁 +求 +汎 +汐 +汗 +汚 +汝 +江 +池 +汪 +汰 +汲 +決 +汽 +沂 +沃 +沅 +沆 +沈 +沌 +沐 +沓 +沖 +沙 +没 +沢 +沱 +河 +沸 +油 +治 +沼 +沽 +沿 +況 +泉 +泊 +泌 +法 +泗 +泡 +波 +泣 +泥 +注 +泯 +泰 +泳 +洋 +洒 +洗 +洛 +洞 +津 +洩 +洪 +洲 +洸 +洹 +活 +洽 +派 +流 +浄 +浅 +浙 +浚 +浜 +浣 +浦 +浩 +浪 +浮 +浴 +海 +浸 +涅 +消 +涌 +涙 +涛 +涯 +液 +涵 +涼 +淀 +淄 +淆 +淇 +淋 +淑 +淘 +淡 +淤 +淨 +淫 +深 +淳 +淵 +混 +淹 +添 +清 +済 +渉 +渋 +渓 +渕 +渚 +減 +渟 +渠 +渡 +渤 +渥 +渦 +温 +渫 +測 +港 +游 +渾 +湊 +湖 +湘 +湛 +湧 +湫 +湯 +湾 +湿 +満 +源 +準 +溜 +溝 +溢 +溥 +溪 +溶 +溺 +滄 +滅 +滋 +滌 +滑 +滕 +滝 +滞 +滴 +滸 +滹 +滿 +漁 +漂 +漆 +漉 +漏 +漑 +演 +漕 +漠 +漢 +漣 +漫 +漬 +漱 +漸 +漿 +潅 +潔 +潙 +潜 +潟 +潤 +潭 +潮 +潰 +潴 +澁 +澂 +澄 +澎 +澗 +澤 +澪 +澱 +澳 +激 +濁 +濃 +濟 +濠 +濡 +濤 +濫 +濯 +濱 +濾 +瀉 +瀋 +瀑 +瀕 +瀞 +瀟 +瀧 +瀬 +瀾 +灌 +灑 +灘 +火 +灯 +灰 +灸 +災 +炉 +炊 +炎 +炒 +炭 +炮 +炷 +点 +為 +烈 +烏 +烙 +烝 +烹 +焔 +焙 +焚 +無 +焦 +然 +焼 +煇 +煉 +煌 +煎 +煕 +煙 +煤 +煥 +照 +煩 +煬 +煮 +煽 +熈 +熊 +熙 +熟 +熨 +熱 +熹 +熾 +燃 +燈 +燎 +燔 +燕 +燗 +燥 +燭 +燻 +爆 +爐 +爪 +爬 +爲 +爵 +父 +爺 +爼 +爽 +爾 +片 +版 +牌 +牒 +牘 +牙 +牛 +牝 +牟 +牡 +牢 +牧 +物 +牲 +特 +牽 +犂 +犠 +犬 +犯 +状 +狂 +狄 +狐 +狗 +狙 +狛 +狡 +狩 +独 +狭 +狷 +狸 +狼 +猊 +猛 +猟 +猥 +猨 +猩 +猪 +猫 +献 +猴 +猶 +猷 +猾 +猿 +獄 +獅 +獏 +獣 +獲 +玄 +玅 +率 +玉 +王 +玖 +玩 +玲 +珀 +珂 +珈 +珉 +珊 +珍 +珎 +珞 +珠 +珣 +珥 +珪 +班 +現 +球 +理 +琉 +琢 +琥 +琦 +琮 +琲 +琳 +琴 +琵 +琶 +瑁 +瑋 +瑙 +瑚 +瑛 +瑜 +瑞 +瑠 +瑤 +瑩 +瑪 +瑳 +瑾 +璃 +璋 +璜 +璞 +璧 +璨 +環 +璵 +璽 +璿 +瓊 +瓔 +瓜 +瓢 +瓦 +瓶 +甍 +甑 +甕 +甘 +甚 +甞 +生 +産 +甥 +用 +甫 +田 +由 +甲 +申 +男 +町 +画 +界 +畏 +畑 +畔 +留 +畜 +畝 +畠 +畢 +略 +番 +異 +畳 +當 +畷 +畸 +畺 +畿 +疆 +疇 +疋 +疎 +疏 +疑 +疫 +疱 +疲 +疹 +疼 +疾 +病 +症 +痒 +痔 +痕 +痘 +痙 +痛 +痢 +痩 +痴 +痺 +瘍 +瘡 +瘧 +療 +癇 +癌 +癒 +癖 +癡 +癪 +発 +登 +白 +百 +的 +皆 +皇 +皋 +皐 +皓 +皮 +皺 +皿 +盂 +盃 +盆 +盈 +益 +盒 +盗 +盛 +盞 +盟 +盡 +監 +盤 +盥 +盧 +目 +盲 +直 +相 +盾 +省 +眉 +看 +県 +眞 +真 +眠 +眷 +眺 +眼 +着 +睡 +督 +睦 +睨 +睿 +瞋 +瞑 +瞞 +瞬 +瞭 +瞰 +瞳 +瞻 +瞼 +瞿 +矍 +矛 +矜 +矢 +知 +矧 +矩 +短 +矮 +矯 +石 +砂 +砌 +研 +砕 +砥 +砦 +砧 +砲 +破 +砺 +硝 +硫 +硬 +硯 +碁 +碇 +碌 +碑 +碓 +碕 +碗 +碣 +碧 +碩 +確 +碾 +磁 +磐 +磔 +磧 +磨 +磬 +磯 +礁 +礎 +礒 +礙 +礫 +礬 +示 +礼 +社 +祀 +祁 +祇 +祈 +祉 +祐 +祓 +祕 +祖 +祗 +祚 +祝 +神 +祟 +祠 +祢 +祥 +票 +祭 +祷 +祺 +禁 +禄 +禅 +禊 +禍 +禎 +福 +禔 +禖 +禛 +禦 +禧 +禮 +禰 +禹 +禽 +禿 +秀 +私 +秋 +科 +秒 +秘 +租 +秤 +秦 +秩 +称 +移 +稀 +程 +税 +稔 +稗 +稙 +稚 +稜 +稠 +種 +稱 +稲 +稷 +稻 +稼 +稽 +稿 +穀 +穂 +穆 +積 +穎 +穏 +穗 +穜 +穢 +穣 +穫 +穴 +究 +空 +突 +窃 +窄 +窒 +窓 +窟 +窠 +窩 +窪 +窮 +窯 +竃 +竄 +竈 +立 +站 +竜 +竝 +竟 +章 +童 +竪 +竭 +端 +竴 +競 +竹 +竺 +竽 +竿 +笄 +笈 +笏 +笑 +笙 +笛 +笞 +笠 +笥 +符 +第 +笹 +筅 +筆 +筇 +筈 +等 +筋 +筌 +筍 +筏 +筐 +筑 +筒 +答 +策 +筝 +筥 +筧 +筬 +筮 +筯 +筰 +筵 +箆 +箇 +箋 +箏 +箒 +箔 +箕 +算 +箙 +箜 +管 +箪 +箭 +箱 +箸 +節 +篁 +範 +篆 +篇 +築 +篋 +篌 +篝 +篠 +篤 +篥 +篦 +篩 +篭 +篳 +篷 +簀 +簒 +簡 +簧 +簪 +簫 +簺 +簾 +簿 +籀 +籃 +籌 +籍 +籐 +籟 +籠 +籤 +籬 +米 +籾 +粂 +粉 +粋 +粒 +粕 +粗 +粘 +粛 +粟 +粥 +粧 +粮 +粳 +精 +糊 +糖 +糜 +糞 +糟 +糠 +糧 +糯 +糸 +糺 +系 +糾 +紀 +約 +紅 +紋 +納 +紐 +純 +紗 +紘 +紙 +級 +紛 +素 +紡 +索 +紫 +紬 +累 +細 +紳 +紵 +紹 +紺 +絁 +終 +絃 +組 +絅 +経 +結 +絖 +絞 +絡 +絣 +給 +統 +絲 +絵 +絶 +絹 +絽 +綏 +經 +継 +続 +綜 +綟 +綬 +維 +綱 +網 +綴 +綸 +綺 +綽 +綾 +綿 +緊 +緋 +総 +緑 +緒 +線 +締 +緥 +編 +緩 +緬 +緯 +練 +緻 +縁 +縄 +縅 +縒 +縛 +縞 +縢 +縣 +縦 +縫 +縮 +縹 +總 +績 +繁 +繊 +繋 +繍 +織 +繕 +繝 +繦 +繧 +繰 +繹 +繼 +纂 +纈 +纏 +纐 +纒 +纛 +缶 +罔 +罠 +罧 +罪 +置 +罰 +署 +罵 +罷 +罹 +羂 +羅 +羆 +羇 +羈 +羊 +羌 +美 +群 +羨 +義 +羯 +羲 +羹 +羽 +翁 +翅 +翌 +習 +翔 +翛 +翠 +翡 +翫 +翰 +翺 +翻 +翼 +耀 +老 +考 +者 +耆 +而 +耐 +耕 +耗 +耨 +耳 +耶 +耽 +聊 +聖 +聘 +聚 +聞 +聟 +聡 +聨 +聯 +聰 +聲 +聴 +職 +聾 +肄 +肆 +肇 +肉 +肋 +肌 +肖 +肘 +肛 +肝 +股 +肢 +肥 +肩 +肪 +肯 +肱 +育 +肴 +肺 +胃 +胆 +背 +胎 +胖 +胚 +胝 +胞 +胡 +胤 +胱 +胴 +胸 +能 +脂 +脅 +脆 +脇 +脈 +脊 +脚 +脛 +脩 +脱 +脳 +腋 +腎 +腐 +腑 +腔 +腕 +腫 +腰 +腱 +腸 +腹 +腺 +腿 +膀 +膏 +膚 +膜 +膝 +膠 +膣 +膨 +膩 +膳 +膵 +膾 +膿 +臂 +臆 +臈 +臍 +臓 +臘 +臚 +臣 +臥 +臨 +自 +臭 +至 +致 +臺 +臼 +舂 +舅 +與 +興 +舌 +舍 +舎 +舒 +舖 +舗 +舘 +舜 +舞 +舟 +舩 +航 +般 +舳 +舶 +船 +艇 +艘 +艦 +艮 +良 +色 +艶 +芋 +芒 +芙 +芝 +芥 +芦 +芬 +芭 +芯 +花 +芳 +芸 +芹 +芻 +芽 +芿 +苅 +苑 +苔 +苗 +苛 +苞 +苡 +若 +苦 +苧 +苫 +英 +苴 +苻 +茂 +范 +茄 +茅 +茎 +茗 +茘 +茜 +茨 +茲 +茵 +茶 +茸 +茹 +草 +荊 +荏 +荒 +荘 +荷 +荻 +荼 +莞 +莪 +莫 +莬 +莱 +莵 +莽 +菅 +菊 +菌 +菓 +菖 +菘 +菜 +菟 +菩 +菫 +華 +菱 +菴 +萄 +萊 +萌 +萍 +萎 +萠 +萩 +萬 +萱 +落 +葉 +著 +葛 +葡 +董 +葦 +葩 +葬 +葭 +葱 +葵 +葺 +蒋 +蒐 +蒔 +蒙 +蒟 +蒡 +蒲 +蒸 +蒻 +蒼 +蒿 +蓄 +蓆 +蓉 +蓋 +蓑 +蓬 +蓮 +蓼 +蔀 +蔑 +蔓 +蔚 +蔡 +蔦 +蔬 +蔭 +蔵 +蔽 +蕃 +蕉 +蕊 +蕎 +蕨 +蕩 +蕪 +蕭 +蕾 +薄 +薇 +薊 +薔 +薗 +薙 +薛 +薦 +薨 +薩 +薪 +薫 +薬 +薭 +薮 +藁 +藉 +藍 +藏 +藐 +藝 +藤 +藩 +藪 +藷 +藹 +藺 +藻 +蘂 +蘆 +蘇 +蘊 +蘭 +虎 +虐 +虔 +虚 +虜 +虞 +號 +虫 +虹 +虻 +蚊 +蚕 +蛇 +蛉 +蛍 +蛎 +蛙 +蛛 +蛟 +蛤 +蛭 +蛮 +蛸 +蛹 +蛾 +蜀 +蜂 +蜃 +蜆 +蜊 +蜘 +蜜 +蜷 +蜻 +蝉 +蝋 +蝕 +蝙 +蝠 +蝦 +蝶 +蝿 +螂 +融 +螣 +螺 +蟄 +蟇 +蟠 +蟷 +蟹 +蟻 +蠢 +蠣 +血 +衆 +行 +衍 +衒 +術 +街 +衙 +衛 +衝 +衞 +衡 +衢 +衣 +表 +衫 +衰 +衵 +衷 +衽 +衾 +衿 +袁 +袈 +袋 +袍 +袒 +袖 +袙 +袞 +袢 +被 +袰 +袱 +袴 +袷 +袿 +裁 +裂 +裃 +装 +裏 +裔 +裕 +裘 +裙 +補 +裟 +裡 +裲 +裳 +裴 +裸 +裹 +製 +裾 +褂 +褄 +複 +褌 +褐 +褒 +褥 +褪 +褶 +褻 +襄 +襖 +襞 +襟 +襠 +襦 +襪 +襲 +襴 +襷 +西 +要 +覆 +覇 +覈 +見 +規 +視 +覗 +覚 +覧 +親 +覲 +観 +覺 +觀 +角 +解 +触 +言 +訂 +計 +討 +訓 +託 +記 +訛 +訟 +訢 +訥 +訪 +設 +許 +訳 +訴 +訶 +診 +註 +証 +詐 +詔 +評 +詛 +詞 +詠 +詢 +詣 +試 +詩 +詫 +詮 +詰 +話 +該 +詳 +誄 +誅 +誇 +誉 +誌 +認 +誓 +誕 +誘 +語 +誠 +誡 +誣 +誤 +誥 +誦 +説 +読 +誰 +課 +誼 +誾 +調 +談 +請 +諌 +諍 +諏 +諒 +論 +諚 +諜 +諟 +諡 +諦 +諧 +諫 +諭 +諮 +諱 +諶 +諷 +諸 +諺 +諾 +謀 +謄 +謌 +謎 +謗 +謙 +謚 +講 +謝 +謡 +謫 +謬 +謹 +證 +識 +譚 +譛 +譜 +警 +譬 +譯 +議 +譲 +譴 +護 +讀 +讃 +讐 +讒 +谷 +谿 +豅 +豆 +豊 +豎 +豐 +豚 +象 +豪 +豫 +豹 +貌 +貝 +貞 +負 +財 +貢 +貧 +貨 +販 +貪 +貫 +責 +貯 +貰 +貴 +買 +貸 +費 +貼 +貿 +賀 +賁 +賂 +賃 +賄 +資 +賈 +賊 +賎 +賑 +賓 +賛 +賜 +賞 +賠 +賢 +賣 +賤 +賦 +質 +賭 +購 +賽 +贄 +贅 +贈 +贋 +贔 +贖 +赤 +赦 +走 +赴 +起 +超 +越 +趙 +趣 +足 +趺 +趾 +跋 +跏 +距 +跡 +跨 +跪 +路 +跳 +践 +踊 +踏 +踐 +踞 +踪 +踵 +蹄 +蹉 +蹊 +蹟 +蹲 +蹴 +躅 +躇 +躊 +躍 +躑 +躙 +躪 +身 +躬 +躯 +躰 +車 +軋 +軌 +軍 +軒 +軟 +転 +軸 +軻 +軽 +軾 +較 +載 +輌 +輔 +輜 +輝 +輦 +輩 +輪 +輯 +輸 +輿 +轄 +轍 +轟 +轢 +辛 +辞 +辟 +辥 +辦 +辨 +辰 +辱 +農 +辺 +辻 +込 +迂 +迅 +迎 +近 +返 +迢 +迦 +迪 +迫 +迭 +述 +迷 +迹 +追 +退 +送 +逃 +逅 +逆 +逍 +透 +逐 +逓 +途 +逕 +逗 +這 +通 +逝 +逞 +速 +造 +逢 +連 +逮 +週 +進 +逸 +逼 +遁 +遂 +遅 +遇 +遊 +運 +遍 +過 +遐 +道 +達 +違 +遙 +遜 +遠 +遡 +遣 +遥 +適 +遭 +遮 +遯 +遵 +遷 +選 +遺 +遼 +避 +邀 +邁 +邂 +邃 +還 +邇 +邉 +邊 +邑 +那 +邦 +邨 +邪 +邯 +邵 +邸 +郁 +郊 +郎 +郡 +郢 +部 +郭 +郴 +郵 +郷 +都 +鄂 +鄙 +鄭 +鄰 +鄲 +酉 +酋 +酌 +配 +酎 +酒 +酔 +酢 +酥 +酪 +酬 +酵 +酷 +酸 +醍 +醐 +醒 +醗 +醜 +醤 +醪 +醵 +醸 +采 +釈 +釉 +釋 +里 +重 +野 +量 +釐 +金 +釘 +釜 +針 +釣 +釧 +釿 +鈍 +鈎 +鈐 +鈔 +鈞 +鈦 +鈴 +鈷 +鈸 +鈿 +鉄 +鉇 +鉉 +鉋 +鉛 +鉢 +鉤 +鉦 +鉱 +鉾 +銀 +銃 +銅 +銈 +銑 +銕 +銘 +銚 +銜 +銭 +鋏 +鋒 +鋤 +鋭 +鋲 +鋳 +鋸 +鋺 +鋼 +錆 +錍 +錐 +錘 +錠 +錣 +錦 +錫 +錬 +錯 +録 +錵 +鍋 +鍍 +鍑 +鍔 +鍛 +鍬 +鍮 +鍵 +鍼 +鍾 +鎌 +鎖 +鎗 +鎚 +鎧 +鎬 +鎮 +鎰 +鎹 +鏃 +鏑 +鏡 +鐃 +鐇 +鐐 +鐔 +鐘 +鐙 +鐚 +鐡 +鐵 +鐸 +鑁 +鑊 +鑑 +鑒 +鑚 +鑠 +鑢 +鑰 +鑵 +鑷 +鑼 +鑽 +鑿 +長 +門 +閃 +閇 +閉 +開 +閏 +閑 +間 +閔 +閘 +関 +閣 +閤 +閥 +閦 +閨 +閬 +閲 +閻 +閼 +閾 +闇 +闍 +闔 +闕 +闘 +關 +闡 +闢 +闥 +阜 +阪 +阮 +阯 +防 +阻 +阿 +陀 +陂 +附 +陌 +降 +限 +陛 +陞 +院 +陣 +除 +陥 +陪 +陬 +陰 +陳 +陵 +陶 +陸 +険 +陽 +隅 +隆 +隈 +隊 +隋 +階 +随 +隔 +際 +障 +隠 +隣 +隧 +隷 +隻 +隼 +雀 +雁 +雄 +雅 +集 +雇 +雉 +雊 +雋 +雌 +雍 +雑 +雖 +雙 +雛 +離 +難 +雨 +雪 +雫 +雰 +雲 +零 +雷 +雹 +電 +需 +震 +霊 +霍 +霖 +霜 +霞 +霧 +霰 +露 +靈 +青 +靖 +静 +靜 +非 +面 +革 +靫 +靭 +靱 +靴 +靺 +鞁 +鞄 +鞆 +鞋 +鞍 +鞏 +鞘 +鞠 +鞨 +鞭 +韋 +韓 +韜 +韮 +音 +韶 +韻 +響 +頁 +頂 +頃 +項 +順 +須 +頌 +預 +頑 +頒 +頓 +領 +頚 +頬 +頭 +頴 +頸 +頻 +頼 +顆 +題 +額 +顎 +顔 +顕 +顗 +願 +顛 +類 +顧 +顯 +風 +飛 +食 +飢 +飩 +飫 +飯 +飲 +飴 +飼 +飽 +飾 +餃 +餅 +餉 +養 +餌 +餐 +餓 +餘 +餝 +餡 +館 +饂 +饅 +饉 +饋 +饌 +饒 +饗 +首 +馗 +香 +馨 +馬 +馳 +馴 +駄 +駅 +駆 +駈 +駐 +駒 +駕 +駝 +駿 +騁 +騎 +騏 +騒 +験 +騙 +騨 +騰 +驕 +驚 +驛 +驢 +骨 +骸 +髄 +體 +高 +髙 +髢 +髪 +髭 +髮 +髷 +髻 +鬘 +鬚 +鬢 +鬨 +鬯 +鬱 +鬼 +魁 +魂 +魄 +魅 +魏 +魔 +魚 +魯 +鮎 +鮑 +鮒 +鮪 +鮫 +鮭 +鮮 +鯉 +鯔 +鯖 +鯛 +鯨 +鯰 +鯱 +鰐 +鰒 +鰭 +鰯 +鰰 +鰹 +鰻 +鱈 +鱒 +鱗 +鱧 +鳥 +鳩 +鳰 +鳳 +鳴 +鳶 +鴈 +鴉 +鴎 +鴛 +鴟 +鴦 +鴨 +鴫 +鴻 +鵄 +鵜 +鵞 +鵡 +鵬 +鵲 +鵺 +鶉 +鶏 +鶯 +鶴 +鷄 +鷙 +鷲 +鷹 +鷺 +鸚 +鸞 +鹸 +鹽 +鹿 +麁 +麒 +麓 +麗 +麝 +麞 +麟 +麦 +麩 +麹 +麺 +麻 +麾 +麿 +黄 +黌 +黍 +黒 +黙 +黛 +黠 +鼈 +鼉 +鼎 +鼓 +鼠 +鼻 +齊 +齋 +齟 +齢 +齬 +龍 +龕 +龗 +! +# +% +& +( +) ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; += +? +@ +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +R +S +T +U +V +W +X +Z +a +c +d +e +f +h +i +j +k +l +m +n +o +p +r +s +t +u +y +z +~ +・ + diff --git a/ppocr/utils/korean_dict.txt b/ppocr/utils/korean_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..a13899f14dfe3bfc25b34904390c7b1e4ed8674b --- /dev/null +++ b/ppocr/utils/korean_dict.txt @@ -0,0 +1,3688 @@ +! +" +# +$ +% +& +' +* ++ +- +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +< += +> +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +\ +] +^ +_ +` +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +{ +| +} +~ +© +° +² +½ +Á +Ä +Å +Ç +É +Í +Î +Ó +Ö +× +Ü +ß +à +á +â +ã +ä +å +æ +ç +è +é +ê +ë +ì +í +î +ï +ð +ñ +ò +ó +ô +õ +ö +ø +ú +û +ü +ý +ā +ă +ą +ć +Č +č +đ +ē +ė +ę +ě +ğ +ī +İ +ı +Ł +ł +ń +ň +ō +ř +Ş +ş +Š +š +ţ +ū +ź +ż +Ž +ž +Ș +ș +Α +Δ +α +λ +φ +Г +О +а +в +л +о +р +с +т +я +​ +’ +“ +” +→ +∇ +∼ +「 +」 +ア +カ +グ +ニ +ラ +ン +ㄱ +ㄴ +ㄷ +ㄸ +ㄹ +ㅂ +ㅅ +ㅆ +ㅇ +ㅈ +ㅊ +ㅋ +ㅌ +ㅎ +ㅓ +ㅜ +ㅣ +一 +丁 +七 +三 +上 +下 +不 +丑 +世 +丘 +丞 +中 +丸 +丹 +主 +乃 +久 +之 +乎 +乘 +九 +也 +乳 +乾 +事 +二 +云 +互 +五 +井 +亞 +亡 +交 +亥 +亨 +享 +京 +亭 +人 +仁 +今 +他 +仙 +代 +令 +以 +仰 +仲 +件 +任 +企 +伊 +伍 +伎 +伏 +伐 +休 +伯 +伴 +伸 +佃 +佈 +位 +低 +住 +佐 +何 +佛 +作 +使 +來 +供 +依 +侯 +侵 +侶 +便 +俗 +保 +俠 +信 +修 +俱 +俳 +倉 +個 +倍 +倒 +候 +借 +値 +倫 +倭 +假 +偈 +偉 +偏 +停 +偶 +傅 +傑 +傳 +傷 +傾 +像 +僞 +僥 +僧 +價 +儀 +儉 +儒 +優 +儼 +兀 +允 +元 +兆 +先 +光 +克 +兒 +入 +內 +全 +八 +公 +六 +共 +兵 +其 +具 +典 +兼 +再 +冠 +冥 +冶 +准 +凞 +凡 +凱 +出 +函 +刀 +分 +刊 +刑 +列 +初 +判 +別 +利 +到 +制 +券 +刺 +刻 +則 +前 +剛 +副 +創 +劃 +劑 +力 +功 +加 +劣 +助 +劫 +勇 +動 +務 +勝 +勢 +勳 +勸 +匈 +化 +北 +匠 +區 +十 +千 +午 +半 +卍 +卑 +卒 +卓 +南 +博 +卜 +占 +卦 +印 +危 +卵 +卷 +卽 +卿 +厄 +原 +厦 +去 +參 +又 +叉 +友 +反 +叔 +受 +口 +古 +句 +可 +台 +史 +右 +司 +各 +合 +吉 +同 +名 +后 +吏 +吐 +君 +吠 +吳 +呂 +告 +周 +味 +呵 +命 +和 +咳 +咸 +咽 +哀 +品 +哨 +哮 +哲 +唐 +唯 +唱 +商 +問 +啼 +善 +喆 +喉 +喜 +喩 +喪 +嘗 +器 +嚴 +囊 +四 +回 +因 +困 +固 +圈 +國 +圍 +園 +圓 +圖 +團 +土 +在 +地 +均 +坊 +坐 +坑 +坵 +型 +垢 +城 +域 +埴 +執 +培 +基 +堂 +堅 +堆 +堤 +堯 +報 +場 +塔 +塚 +塞 +塵 +境 +墜 +墟 +墨 +墳 +墾 +壁 +壇 +壓 +壤 +士 +壬 +壯 +壺 +壽 +夏 +夕 +外 +多 +夜 +夢 +大 +天 +太 +夫 +央 +失 +夷 +奄 +奇 +奉 +奎 +奏 +契 +奔 +奮 +女 +奴 +好 +如 +妄 +妊 +妖 +妙 +始 +姑 +姓 +姚 +姜 +威 +婆 +婚 +婦 +媒 +媚 +子 +孔 +字 +存 +孝 +孟 +季 +孤 +孫 +學 +孺 +宇 +守 +安 +宋 +宗 +官 +宙 +定 +客 +宣 +室 +宮 +害 +家 +容 +寂 +寃 +寄 +寅 +密 +寇 +富 +寒 +寓 +實 +審 +寫 +寬 +寶 +寸 +寺 +封 +將 +專 +尊 +對 +小 +少 +尙 +尹 +尼 +尿 +局 +居 +屈 +屋 +屍 +屎 +屛 +層 +屬 +山 +岐 +岡 +岩 +岳 +岸 +峙 +峰 +島 +峻 +峽 +崇 +崔 +崖 +崩 +嶋 +巖 +川 +州 +巢 +工 +左 +巧 +巨 +巫 +差 +己 +巷 +市 +布 +帝 +師 +帶 +常 +帽 +幕 +干 +平 +年 +幹 +幻 +幼 +幽 +庇 +序 +店 +府 +度 +座 +庫 +庭 +康 +廟 +廣 +廳 +延 +廷 +建 +廻 +弁 +式 +弑 +弓 +引 +弘 +弟 +弱 +張 +强 +弼 +彌 +彛 +形 +彬 +影 +役 +彼 +彿 +往 +征 +待 +律 +後 +徐 +徑 +得 +從 +循 +微 +德 +徹 +心 +必 +忌 +忍 +志 +忠 +思 +怡 +急 +性 +恐 +恒 +恨 +恩 +悅 +悖 +患 +悲 +情 +惑 +惟 +惠 +惡 +想 +惺 +愁 +意 +愚 +愛 +感 +愼 +慈 +態 +慕 +慣 +慧 +慾 +憂 +憤 +憺 +應 +懸 +戎 +成 +我 +戟 +戮 +戰 +戴 +戶 +房 +所 +手 +才 +打 +批 +承 +技 +抄 +把 +抗 +抱 +抽 +拇 +拓 +拘 +拙 +拜 +拾 +持 +指 +捌 +捨 +捿 +授 +掌 +排 +接 +推 +提 +揚 +揭 +援 +損 +搗 +摩 +播 +操 +擒 +擔 +擘 +據 +擧 +攘 +攝 +攬 +支 +改 +攻 +放 +政 +故 +敍 +敎 +救 +敗 +散 +敬 +整 +數 +文 +斗 +料 +斛 +斜 +斧 +斯 +新 +斷 +方 +於 +施 +旋 +族 +旗 +日 +旨 +早 +旱 +昌 +明 +易 +昔 +星 +春 +昧 +昭 +是 +時 +晉 +晋 +晩 +普 +景 +晴 +晶 +智 +暈 +暑 +暗 +暘 +曉 +曜 +曠 +曦 +曰 +曲 +書 +曹 +曼 +曾 +最 +會 +月 +有 +朋 +服 +望 +朝 +期 +木 +未 +末 +本 +朱 +朴 +李 +材 +村 +杖 +杜 +杞 +杭 +杯 +東 +松 +板 +林 +果 +枝 +枯 +枰 +枾 +柏 +柑 +柱 +栗 +校 +栢 +核 +根 +格 +桀 +桂 +案 +桎 +桑 +桓 +桔 +梁 +梏 +梓 +梗 +條 +梨 +梵 +棗 +棟 +森 +植 +椒 +楊 +楓 +楚 +業 +楮 +極 +榮 +槃 +槍 +樂 +樓 +樗 +樣 +樸 +樹 +樺 +樽 +橄 +橋 +橘 +機 +橡 +檀 +檎 +權 +欌 +欖 +次 +欲 +歌 +歐 +止 +正 +此 +步 +武 +歲 +歸 +死 +殖 +段 +殷 +殺 +殿 +毅 +母 +毒 +比 +毛 +氏 +民 +氣 +水 +永 +求 +汎 +汗 +江 +池 +沅 +沒 +沖 +沙 +沛 +河 +油 +治 +沼 +沿 +泉 +泊 +法 +泗 +泡 +波 +注 +泰 +洋 +洙 +洛 +洞 +津 +洲 +活 +派 +流 +浅 +浦 +浮 +浴 +海 +涅 +涇 +消 +涌 +液 +淑 +淡 +淨 +淫 +深 +淳 +淵 +淸 +渠 +渡 +游 +渾 +湖 +湯 +源 +溪 +溫 +溶 +滄 +滅 +滋 +滯 +滿 +漁 +漆 +漢 +漫 +漸 +潑 +潤 +潭 +澄 +澎 +澤 +澳 +澹 +濁 +濕 +濟 +濤 +濯 +瀋 +瀝 +灣 +火 +灰 +灸 +災 +炎 +炭 +点 +烈 +烏 +烙 +焚 +無 +焦 +然 +煌 +煎 +照 +煬 +煮 +熟 +熱 +燁 +燈 +燔 +燕 +燥 +燧 +燮 +爲 +爵 +父 +片 +版 +牌 +牛 +牝 +牟 +牡 +物 +特 +犧 +犬 +狀 +狗 +猥 +猩 +猪 +獨 +獵 +獸 +獻 +玄 +玉 +王 +玲 +珍 +珠 +珪 +班 +現 +球 +理 +琴 +瑞 +瑟 +瑪 +璃 +璋 +璽 +瓜 +瓦 +甑 +甘 +生 +産 +用 +甫 +田 +由 +甲 +申 +男 +界 +畏 +留 +畜 +畢 +略 +番 +異 +畵 +當 +畸 +疏 +疑 +疫 +疹 +疼 +病 +症 +痔 +痛 +痺 +瘀 +瘍 +瘡 +療 +癌 +癖 +登 +發 +白 +百 +的 +皆 +皇 +皮 +盂 +盆 +益 +盛 +盜 +盟 +盡 +盤 +盧 +目 +直 +相 +省 +看 +眞 +眼 +睡 +督 +瞋 +矢 +矣 +知 +短 +石 +破 +碍 +碑 +磁 +磨 +磬 +示 +社 +祇 +祖 +祝 +神 +祥 +祭 +祺 +禁 +禅 +禍 +福 +禦 +禪 +禮 +禹 +禽 +禾 +秀 +私 +秉 +秋 +科 +秘 +秤 +秦 +秩 +移 +稀 +稗 +種 +稱 +稷 +稼 +稽 +穀 +穆 +積 +空 +窮 +竅 +立 +章 +童 +竭 +端 +竹 +笑 +符 +第 +筆 +等 +筍 +答 +策 +箋 +箕 +管 +箱 +節 +篇 +簡 +米 +粉 +粘 +粥 +精 +糖 +糞 +系 +紀 +紂 +約 +紅 +紋 +純 +紙 +級 +素 +索 +紫 +紬 +累 +細 +紳 +終 +組 +結 +絡 +統 +絲 +絶 +絹 +經 +綠 +維 +綱 +網 +綸 +綽 +緖 +線 +緣 +緯 +縣 +縱 +總 +織 +繡 +繩 +繪 +繭 +纂 +續 +罕 +置 +罰 +羅 +羊 +美 +群 +義 +羽 +翁 +習 +翟 +老 +考 +者 +而 +耐 +耕 +耳 +聃 +聖 +聞 +聰 +聲 +職 +肇 +肉 +肖 +肝 +股 +肥 +育 +肺 +胃 +胎 +胚 +胞 +胡 +胥 +能 +脂 +脈 +脚 +脛 +脣 +脩 +脫 +脯 +脾 +腋 +腎 +腫 +腸 +腹 +膜 +膠 +膨 +膽 +臆 +臟 +臣 +臥 +臨 +自 +至 +致 +臺 +臼 +臾 +與 +興 +舊 +舌 +舍 +舒 +舜 +舟 +般 +船 +艦 +良 +色 +芋 +花 +芳 +芽 +苑 +苔 +苕 +苛 +苞 +若 +苦 +英 +茂 +茵 +茶 +茹 +荀 +荇 +草 +荒 +荷 +莊 +莫 +菊 +菌 +菜 +菩 +菫 +華 +菴 +菽 +萊 +萍 +萬 +落 +葉 +著 +葛 +董 +葬 +蒙 +蒜 +蒲 +蒸 +蒿 +蓮 +蔓 +蔘 +蔡 +蔬 +蕃 +蕉 +蕓 +薄 +薑 +薛 +薩 +薪 +薺 +藏 +藝 +藤 +藥 +藩 +藻 +蘆 +蘇 +蘊 +蘚 +蘭 +虎 +處 +虛 +虞 +虹 +蜀 +蜂 +蜜 +蝕 +蝶 +融 +蟬 +蟲 +蠶 +蠻 +血 +衆 +行 +術 +衛 +衡 +衣 +表 +袁 +裔 +裕 +裙 +補 +製 +複 +襄 +西 +要 +見 +視 +親 +覺 +觀 +角 +解 +言 +訂 +訊 +訓 +託 +記 +訣 +設 +診 +註 +評 +詩 +話 +詵 +誅 +誌 +認 +誕 +語 +誠 +誤 +誥 +誦 +說 +調 +談 +諍 +論 +諡 +諫 +諭 +諸 +謙 +講 +謝 +謠 +證 +識 +譚 +譜 +譯 +議 +護 +讀 +變 +谷 +豆 +豊 +豚 +象 +豪 +豫 +貝 +貞 +財 +貧 +貨 +貪 +貫 +貴 +貸 +費 +資 +賊 +賓 +賞 +賢 +賣 +賦 +質 +贍 +赤 +赫 +走 +起 +超 +越 +趙 +趣 +趨 +足 +趾 +跋 +跡 +路 +踏 +蹟 +身 +躬 +車 +軍 +軒 +軟 +載 +輓 +輕 +輪 +輯 +輸 +輻 +輿 +轅 +轉 +辨 +辭 +辯 +辰 +農 +近 +迦 +述 +追 +逆 +透 +逐 +通 +逝 +造 +逢 +連 +進 +逵 +遂 +遊 +運 +遍 +過 +道 +達 +遠 +遡 +適 +遷 +選 +遺 +遽 +還 +邊 +邑 +那 +邪 +郞 +郡 +部 +都 +鄒 +鄕 +鄭 +鄲 +配 +酒 +酸 +醉 +醫 +醯 +釋 +里 +重 +野 +量 +釐 +金 +針 +鈍 +鈴 +鉞 +銀 +銅 +銘 +鋼 +錄 +錢 +錦 +鎭 +鏡 +鐘 +鐵 +鑑 +鑛 +長 +門 +閃 +開 +間 +閔 +閣 +閥 +閭 +閻 +闕 +關 +阪 +防 +阿 +陀 +降 +限 +陝 +院 +陰 +陳 +陵 +陶 +陸 +陽 +隆 +隊 +隋 +階 +際 +障 +隣 +隨 +隱 +隷 +雀 +雄 +雅 +集 +雇 +雌 +雖 +雙 +雜 +離 +難 +雨 +雪 +雲 +電 +霜 +露 +靈 +靑 +靖 +靜 +非 +面 +革 +靴 +鞏 +韓 +音 +韶 +韻 +順 +須 +頊 +頌 +領 +頭 +顔 +願 +顚 +類 +顯 +風 +飛 +食 +飢 +飮 +飯 +飾 +養 +餓 +餘 +首 +香 +馨 +馬 +駒 +騫 +騷 +驕 +骨 +骸 +髓 +體 +高 +髥 +髮 +鬪 +鬱 +鬼 +魏 +魔 +魚 +魯 +鮮 +鰍 +鰐 +鳥 +鳧 +鳳 +鴨 +鵲 +鶴 +鷄 +鷹 +鹽 +鹿 +麗 +麥 +麻 +黃 +黑 +默 +點 +黨 +鼎 +齊 +齋 +齒 +龍 +龜 +가 +각 +간 +갇 +갈 +갉 +감 +갑 +값 +갓 +갔 +강 +갖 +갗 +같 +갚 +갛 +개 +객 +갠 +갤 +갬 +갭 +갯 +갰 +갱 +갸 +걀 +걔 +걘 +거 +걱 +건 +걷 +걸 +검 +겁 +것 +겄 +겅 +겆 +겉 +겊 +겋 +게 +겐 +겔 +겟 +겠 +겡 +겨 +격 +겪 +견 +결 +겸 +겹 +겻 +겼 +경 +곁 +계 +곕 +곗 +고 +곡 +곤 +곧 +골 +곪 +곬 +곯 +곰 +곱 +곳 +공 +곶 +과 +곽 +관 +괄 +괌 +광 +괘 +괜 +괭 +괴 +괸 +굉 +교 +구 +국 +군 +굳 +굴 +굵 +굶 +굼 +굽 +굿 +궁 +궂 +궈 +권 +궐 +궜 +궝 +궤 +귀 +귄 +귈 +귓 +규 +균 +귤 +그 +극 +근 +글 +긁 +금 +급 +긋 +긍 +기 +긴 +길 +김 +깁 +깃 +깅 +깊 +까 +깍 +깎 +깐 +깔 +깜 +깝 +깟 +깡 +깥 +깨 +깬 +깰 +깻 +깼 +깽 +꺄 +꺼 +꺽 +꺾 +껀 +껄 +껌 +껍 +껏 +껐 +껑 +께 +껴 +꼈 +꼍 +꼐 +꼬 +꼭 +꼴 +꼼 +꼽 +꼿 +꽁 +꽂 +꽃 +꽉 +꽝 +꽤 +꽥 +꾀 +꾜 +꾸 +꾹 +꾼 +꿀 +꿇 +꿈 +꿉 +꿋 +꿍 +꿎 +꿔 +꿨 +꿩 +꿰 +꿴 +뀄 +뀌 +뀐 +뀔 +뀜 +뀝 +끄 +끈 +끊 +끌 +끓 +끔 +끕 +끗 +끙 +끝 +끼 +끽 +낀 +낄 +낌 +낍 +낏 +낑 +나 +낙 +낚 +난 +낟 +날 +낡 +남 +납 +낫 +났 +낭 +낮 +낯 +낱 +낳 +내 +낵 +낸 +낼 +냄 +냅 +냇 +냈 +냉 +냐 +냔 +냘 +냥 +너 +넉 +넋 +넌 +널 +넓 +넘 +넙 +넛 +넜 +넝 +넣 +네 +넥 +넨 +넬 +넴 +넵 +넷 +넸 +넹 +녀 +녁 +년 +념 +녔 +녕 +녘 +녜 +노 +녹 +논 +놀 +놈 +놋 +농 +높 +놓 +놔 +놨 +뇌 +뇨 +뇩 +뇽 +누 +눅 +눈 +눌 +눔 +눕 +눗 +눠 +눴 +뉘 +뉜 +뉩 +뉴 +늄 +늅 +늉 +느 +늑 +는 +늘 +늙 +늠 +늡 +능 +늦 +늪 +늬 +니 +닉 +닌 +닐 +님 +닙 +닛 +닝 +닢 +다 +닥 +닦 +단 +닫 +달 +닭 +닮 +닯 +닳 +담 +답 +닷 +당 +닻 +닿 +대 +댁 +댄 +댈 +댐 +댑 +댓 +댔 +댕 +댜 +더 +덕 +덖 +던 +덜 +덟 +덤 +덥 +덧 +덩 +덫 +덮 +데 +덱 +덴 +델 +뎀 +뎃 +뎅 +뎌 +뎠 +뎨 +도 +독 +돈 +돋 +돌 +돔 +돕 +돗 +동 +돛 +돝 +돼 +됐 +되 +된 +될 +됨 +됩 +됴 +두 +둑 +둔 +둘 +둠 +둡 +둣 +둥 +둬 +뒀 +뒤 +뒬 +뒷 +뒹 +듀 +듈 +듐 +드 +득 +든 +듣 +들 +듦 +듬 +듭 +듯 +등 +듸 +디 +딕 +딘 +딛 +딜 +딤 +딥 +딧 +딨 +딩 +딪 +따 +딱 +딴 +딸 +땀 +땄 +땅 +때 +땐 +땔 +땜 +땝 +땠 +땡 +떠 +떡 +떤 +떨 +떫 +떰 +떱 +떳 +떴 +떵 +떻 +떼 +떽 +뗀 +뗄 +뗍 +뗏 +뗐 +뗑 +또 +똑 +똘 +똥 +뙤 +뚜 +뚝 +뚤 +뚫 +뚱 +뛰 +뛴 +뛸 +뜀 +뜁 +뜨 +뜩 +뜬 +뜯 +뜰 +뜸 +뜻 +띄 +띈 +띌 +띔 +띕 +띠 +띤 +띨 +띱 +띵 +라 +락 +란 +랄 +람 +랍 +랏 +랐 +랑 +랒 +랗 +래 +랙 +랜 +랠 +램 +랩 +랫 +랬 +랭 +랴 +략 +량 +러 +럭 +런 +럴 +럼 +럽 +럿 +렀 +렁 +렇 +레 +렉 +렌 +렐 +렘 +렙 +렛 +렝 +려 +력 +련 +렬 +렴 +렵 +렷 +렸 +령 +례 +로 +록 +론 +롤 +롬 +롭 +롯 +롱 +롸 +롹 +뢰 +뢴 +뢸 +룃 +료 +룐 +룡 +루 +룩 +룬 +룰 +룸 +룹 +룻 +룽 +뤄 +뤘 +뤼 +류 +륙 +륜 +률 +륨 +륭 +르 +륵 +른 +를 +름 +릅 +릇 +릉 +릎 +리 +릭 +린 +릴 +림 +립 +릿 +링 +마 +막 +만 +많 +맏 +말 +맑 +맘 +맙 +맛 +망 +맞 +맡 +맣 +매 +맥 +맨 +맬 +맴 +맵 +맷 +맸 +맹 +맺 +먀 +먁 +머 +먹 +먼 +멀 +멈 +멋 +멍 +멎 +메 +멕 +멘 +멜 +멤 +멥 +멧 +멩 +며 +멱 +면 +멸 +몄 +명 +몇 +모 +목 +몫 +몬 +몰 +몸 +몹 +못 +몽 +뫼 +묘 +무 +묵 +묶 +문 +묻 +물 +묽 +뭄 +뭅 +뭇 +뭉 +뭍 +뭏 +뭐 +뭔 +뭘 +뭡 +뭣 +뮈 +뮌 +뮐 +뮤 +뮬 +므 +믈 +믐 +미 +믹 +민 +믿 +밀 +밈 +밉 +밋 +밌 +밍 +및 +밑 +바 +박 +밖 +반 +받 +발 +밝 +밟 +밤 +밥 +밧 +방 +밭 +배 +백 +밴 +밸 +뱀 +뱁 +뱃 +뱄 +뱅 +뱉 +뱍 +뱐 +버 +벅 +번 +벌 +범 +법 +벗 +벙 +벚 +베 +벡 +벤 +벨 +벰 +벱 +벳 +벵 +벼 +벽 +변 +별 +볍 +볏 +볐 +병 +볕 +보 +복 +볶 +본 +볼 +봄 +봅 +봇 +봉 +봐 +봤 +뵈 +뵐 +뵙 +부 +북 +분 +붇 +불 +붉 +붐 +붓 +붕 +붙 +뷔 +뷰 +뷴 +뷸 +브 +븐 +블 +비 +빅 +빈 +빌 +빔 +빕 +빗 +빙 +빚 +빛 +빠 +빡 +빤 +빨 +빳 +빴 +빵 +빻 +빼 +빽 +뺀 +뺄 +뺌 +뺏 +뺐 +뺑 +뺨 +뻐 +뻑 +뻔 +뻗 +뻘 +뻣 +뻤 +뻥 +뻬 +뼈 +뼉 +뼘 +뽀 +뽈 +뽐 +뽑 +뽕 +뾰 +뿌 +뿍 +뿐 +뿔 +뿜 +쁘 +쁜 +쁠 +쁨 +삐 +삔 +삘 +사 +삭 +삯 +산 +살 +삵 +삶 +삼 +삽 +삿 +샀 +상 +샅 +새 +색 +샌 +샐 +샘 +샙 +샛 +샜 +생 +샤 +샨 +샬 +샴 +샵 +샷 +샹 +서 +석 +섞 +선 +섣 +설 +섬 +섭 +섯 +섰 +성 +섶 +세 +섹 +센 +셀 +셈 +셉 +셋 +셌 +셍 +셔 +션 +셜 +셨 +셰 +셴 +셸 +소 +속 +손 +솔 +솜 +솝 +솟 +송 +솥 +쇄 +쇠 +쇤 +쇳 +쇼 +숀 +숄 +숍 +수 +숙 +순 +숟 +술 +숨 +숩 +숫 +숭 +숯 +숱 +숲 +숴 +쉐 +쉘 +쉬 +쉭 +쉰 +쉴 +쉼 +쉽 +슈 +슐 +슘 +슛 +슝 +스 +슥 +슨 +슬 +슭 +슴 +습 +슷 +승 +시 +식 +신 +싣 +실 +싫 +심 +십 +싯 +싱 +싶 +싸 +싹 +싼 +쌀 +쌈 +쌉 +쌌 +쌍 +쌓 +쌔 +쌘 +쌩 +써 +썩 +썬 +썰 +썸 +썹 +썼 +썽 +쎄 +쎈 +쏘 +쏙 +쏜 +쏟 +쏠 +쏭 +쏴 +쐈 +쐐 +쐬 +쑤 +쑥 +쑨 +쒀 +쒔 +쓰 +쓱 +쓴 +쓸 +씀 +씁 +씌 +씨 +씩 +씬 +씰 +씸 +씹 +씻 +씽 +아 +악 +안 +앉 +않 +알 +앎 +앓 +암 +압 +앗 +았 +앙 +앞 +애 +액 +앤 +앨 +앰 +앱 +앳 +앴 +앵 +야 +약 +얀 +얄 +얇 +얌 +얍 +얏 +양 +얕 +얗 +얘 +얜 +어 +억 +언 +얹 +얻 +얼 +얽 +엄 +업 +없 +엇 +었 +엉 +엊 +엌 +엎 +에 +엑 +엔 +엘 +엠 +엡 +엣 +엥 +여 +역 +엮 +연 +열 +엷 +염 +엽 +엾 +엿 +였 +영 +옅 +옆 +옇 +예 +옌 +옐 +옙 +옛 +오 +옥 +온 +올 +옭 +옮 +옳 +옴 +옵 +옷 +옹 +옻 +와 +왁 +완 +왈 +왑 +왓 +왔 +왕 +왜 +왠 +왱 +외 +왼 +요 +욕 +욘 +욜 +욤 +용 +우 +욱 +운 +울 +움 +웁 +웃 +웅 +워 +웍 +원 +월 +웜 +웠 +웡 +웨 +웬 +웰 +웸 +웹 +위 +윅 +윈 +윌 +윔 +윗 +윙 +유 +육 +윤 +율 +윱 +윳 +융 +으 +윽 +은 +을 +읊 +음 +읍 +응 +의 +읜 +읠 +이 +익 +인 +일 +읽 +잃 +임 +입 +잇 +있 +잉 +잊 +잎 +자 +작 +잔 +잖 +잘 +잠 +잡 +잣 +잤 +장 +잦 +재 +잭 +잰 +잴 +잽 +잿 +쟀 +쟁 +쟈 +쟉 +쟤 +저 +적 +전 +절 +젊 +점 +접 +젓 +정 +젖 +제 +젝 +젠 +젤 +젬 +젭 +젯 +져 +젼 +졀 +졌 +졍 +조 +족 +존 +졸 +좀 +좁 +종 +좇 +좋 +좌 +좍 +좽 +죄 +죠 +죤 +주 +죽 +준 +줄 +줌 +줍 +줏 +중 +줘 +줬 +쥐 +쥔 +쥘 +쥬 +쥴 +즈 +즉 +즌 +즐 +즘 +즙 +증 +지 +직 +진 +짇 +질 +짊 +짐 +집 +짓 +징 +짖 +짙 +짚 +짜 +짝 +짠 +짢 +짤 +짧 +짬 +짭 +짰 +짱 +째 +짹 +짼 +쨀 +쨉 +쨋 +쨌 +쨍 +쩄 +쩌 +쩍 +쩐 +쩔 +쩜 +쩝 +쩡 +쩨 +쪄 +쪘 +쪼 +쪽 +쪾 +쫀 +쫄 +쫑 +쫓 +쫙 +쬐 +쭈 +쭉 +쭐 +쭙 +쯔 +쯤 +쯧 +찌 +찍 +찐 +찔 +찜 +찝 +찡 +찢 +찧 +차 +착 +찬 +찮 +찰 +참 +찹 +찻 +찼 +창 +찾 +채 +책 +챈 +챌 +챔 +챕 +챗 +챘 +챙 +챠 +챤 +처 +척 +천 +철 +첨 +첩 +첫 +청 +체 +첵 +첸 +첼 +쳄 +쳇 +쳉 +쳐 +쳔 +쳤 +초 +촉 +촌 +촘 +촛 +총 +촨 +촬 +최 +쵸 +추 +축 +춘 +출 +춤 +춥 +춧 +충 +춰 +췄 +췌 +취 +췬 +츄 +츠 +측 +츨 +츰 +층 +치 +칙 +친 +칠 +칡 +침 +칩 +칫 +칭 +카 +칵 +칸 +칼 +캄 +캅 +캇 +캉 +캐 +캔 +캘 +캠 +캡 +캣 +캤 +캥 +캬 +커 +컥 +컨 +컫 +컬 +컴 +컵 +컷 +컸 +컹 +케 +켄 +켈 +켐 +켓 +켕 +켜 +켠 +켤 +켭 +켯 +켰 +코 +콕 +콘 +콜 +콤 +콥 +콧 +콩 +콰 +콱 +콴 +콸 +쾅 +쾌 +쾡 +쾨 +쾰 +쿄 +쿠 +쿡 +쿤 +쿨 +쿰 +쿵 +쿼 +퀀 +퀄 +퀘 +퀭 +퀴 +퀵 +퀸 +퀼 +큐 +큘 +크 +큰 +클 +큼 +큽 +키 +킥 +킨 +킬 +킴 +킵 +킷 +킹 +타 +탁 +탄 +탈 +탉 +탐 +탑 +탓 +탔 +탕 +태 +택 +탠 +탤 +탬 +탭 +탯 +탰 +탱 +터 +턱 +턴 +털 +텀 +텁 +텃 +텄 +텅 +테 +텍 +텐 +텔 +템 +텝 +텡 +텨 +톈 +토 +톡 +톤 +톨 +톰 +톱 +톳 +통 +퇴 +툇 +투 +툭 +툰 +툴 +툼 +퉁 +퉈 +퉜 +튀 +튄 +튈 +튕 +튜 +튠 +튤 +튬 +트 +특 +튼 +튿 +틀 +틈 +틉 +틋 +틔 +티 +틱 +틴 +틸 +팀 +팁 +팅 +파 +팍 +팎 +판 +팔 +팜 +팝 +팟 +팠 +팡 +팥 +패 +팩 +팬 +팰 +팸 +팻 +팼 +팽 +퍼 +퍽 +펀 +펄 +펌 +펍 +펐 +펑 +페 +펙 +펜 +펠 +펨 +펩 +펫 +펭 +펴 +편 +펼 +폄 +폈 +평 +폐 +포 +폭 +폰 +폴 +폼 +폿 +퐁 +표 +푭 +푸 +푹 +푼 +풀 +품 +풋 +풍 +퓨 +퓬 +퓰 +퓸 +프 +픈 +플 +픔 +픕 +피 +픽 +핀 +필 +핌 +핍 +핏 +핑 +하 +학 +한 +할 +핥 +함 +합 +핫 +항 +해 +핵 +핸 +핼 +햄 +햅 +햇 +했 +행 +햐 +향 +헀 +허 +헉 +헌 +헐 +험 +헙 +헛 +헝 +헤 +헥 +헨 +헬 +헴 +헵 +헷 +헹 +혀 +혁 +현 +혈 +혐 +협 +혓 +혔 +형 +혜 +호 +혹 +혼 +홀 +홈 +홉 +홋 +홍 +홑 +화 +확 +환 +활 +홧 +황 +홰 +홱 +횃 +회 +획 +횝 +횟 +횡 +효 +후 +훅 +훈 +훌 +훑 +훔 +훗 +훤 +훨 +훼 +휄 +휑 +휘 +휙 +휜 +휠 +휩 +휭 +휴 +휼 +흄 +흉 +흐 +흑 +흔 +흘 +흙 +흠 +흡 +흣 +흥 +흩 +희 +흰 +흽 +히 +힉 +힌 +힐 +힘 +힙 +힝 +車 +滑 +金 +奈 +羅 +洛 +卵 +欄 +蘭 +郎 +來 +盧 +老 +魯 +綠 +鹿 +論 +雷 +樓 +縷 +凌 +樂 +不 +參 +葉 +沈 +若 +兩 +凉 +梁 +呂 +女 +廬 +麗 +黎 +曆 +歷 +戀 +蓮 +連 +列 +烈 +裂 +念 +獵 +靈 +領 +例 +禮 +醴 +惡 +尿 +料 +遼 +龍 +暈 +柳 +流 +類 +六 +陸 +倫 +律 +栗 +利 +李 +梨 +理 +離 +燐 +林 +臨 +立 +茶 +切 +宅 + diff --git a/ppocr/utils/save_load.py b/ppocr/utils/save_load.py old mode 100755 new mode 100644 index 4d23f62656f5561dd93f40fa97a3f7874e4b2040..f2346b3d22bbc4cf2e6d3ac54eaa1b378df6338b --- a/ppocr/utils/save_load.py +++ b/ppocr/utils/save_load.py @@ -21,7 +21,6 @@ import os import shutil import tempfile -import paddle import paddle.fluid as fluid from .utility import initial_logger @@ -110,17 +109,20 @@ def init_model(config, program, exe): """ checkpoints = config['Global'].get('checkpoints') if checkpoints: - path = checkpoints - fluid.load(program, path, exe) - logger.info("Finish initing model from {}".format(path)) - return - - pretrain_weights = config['Global'].get('pretrain_weights') - if pretrain_weights: - path = pretrain_weights - load_params(exe, program, path) - logger.info("Finish initing model from {}".format(path)) - return + if os.path.exists(checkpoints + '.pdparams'): + path = checkpoints + fluid.load(program, path, exe) + logger.info("Finish initing model from {}".format(path)) + else: + raise ValueError("Model checkpoints {} does not exists," + "check if you lost the file prefix.".format( + checkpoints + '.pdparams')) + else: + pretrain_weights = config['Global'].get('pretrain_weights') + if pretrain_weights: + path = pretrain_weights + load_params(exe, program, path) + logger.info("Finish initing model from {}".format(path)) def save_model(program, model_path): diff --git a/ppocr/utils/utility.py b/ppocr/utils/utility.py index c824d4404ad5f1fbee0d672fbc06d365f7d4eade..2cf3c8f5c9ebba07ee1c21fe2248fe3f600126d9 100755 --- a/ppocr/utils/utility.py +++ b/ppocr/utils/utility.py @@ -14,6 +14,9 @@ import logging import os +import imghdr +import cv2 +from paddle import fluid def initial_logger(): @@ -61,29 +64,29 @@ def get_image_file_list(img_file): if img_file is None or not os.path.exists(img_file): raise Exception("not found any img file in {}".format(img_file)) - img_end = ['jpg', 'png', 'jpeg', 'JPEG', 'JPG', 'bmp'] - if os.path.isfile(img_file) and img_file.split('.')[-1] in img_end: + img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif', 'GIF'} + if os.path.isfile(img_file) and imghdr.what(img_file) in img_end: imgs_lists.append(img_file) elif os.path.isdir(img_file): for single_file in os.listdir(img_file): - if single_file.split('.')[-1] in img_end: - imgs_lists.append(os.path.join(img_file, single_file)) + file_path = os.path.join(img_file, single_file) + if imghdr.what(file_path) in img_end: + imgs_lists.append(file_path) if len(imgs_lists) == 0: raise Exception("not found any img file in {}".format(img_file)) return imgs_lists -from paddle import fluid - +def check_and_read_gif(img_path): + if os.path.basename(img_path)[-3:] in ['gif', 'GIF']: + gif = cv2.VideoCapture(img_path) + ret, frame = gif.read() + if not ret: + logging.info("Cannot read {}. This gif image maybe corrupted.") + return None, False + if len(frame.shape) == 2 or frame.shape[-1] == 1: + frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2RGB) + imgvalue = frame[:, :, ::-1] + return imgvalue, True + return None, False -def create_multi_devices_program(program, loss_var_name): - build_strategy = fluid.BuildStrategy() - build_strategy.memory_optimize = False - build_strategy.enable_inplace = True - exec_strategy = fluid.ExecutionStrategy() - exec_strategy.num_iteration_per_drop_scope = 1 - compile_program = fluid.CompiledProgram(program).with_data_parallel( - loss_name=loss_var_name, - build_strategy=build_strategy, - exec_strategy=exec_strategy) - return compile_program diff --git a/requirments.txt b/requirments.txt index 94e8478ffad88a6e5cd69424c6aa485400cfae06..ec538138beaed70ec8f5285ea0c4114f22e3b0ef 100644 --- a/requirments.txt +++ b/requirments.txt @@ -1,4 +1,6 @@ shapely imgaug pyclipper -lmdb \ No newline at end of file +lmdb +tqdm +numpy \ No newline at end of file diff --git a/setup.py b/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..2cea853dca9ed67a64e52595b0f95606d21ca16e --- /dev/null +++ b/setup.py @@ -0,0 +1,56 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from setuptools import setup +from io import open + +with open('requirments.txt', encoding="utf-8-sig") as f: + requirements = f.readlines() + requirements.append('tqdm') + + +def readme(): + with open('doc/doc_en/whl_en.md', encoding="utf-8-sig") as f: + README = f.read() + return README + + +setup( + name='paddleocr', + packages=['paddleocr'], + package_dir={'paddleocr': ''}, + include_package_data=True, + entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]}, + version='1.0.0', + install_requires=requirements, + license='Apache License 2.0', + description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embeded and IoT devices', + long_description=readme(), + long_description_content_type='text/markdown', + url='https://github.com/PaddlePaddle/PaddleOCR', + download_url='https://github.com/PaddlePaddle/PaddleOCR.git', + keywords=[ + 'ocr textdetection textrecognition paddleocr crnn east star-net rosetta ocrlite db chineseocr chinesetextdetection chinesetextrecognition' + ], + classifiers=[ + 'Intended Audience :: Developers', 'Operating System :: OS Independent', + 'Natural Language :: Chinese (Simplified)', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.2', + 'Programming Language :: Python :: 3.3', + 'Programming Language :: Python :: 3.4', + 'Programming Language :: Python :: 3.5', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', 'Topic :: Utilities' + ], ) diff --git a/tools/eval.py b/tools/eval.py index 2db6273a7b00ccded42fc876b1bf06dfbfce9a6a..aff5fc7111a062c9b4346e9c2dcbc8f9225fe8da 100755 --- a/tools/eval.py +++ b/tools/eval.py @@ -18,9 +18,9 @@ from __future__ import print_function import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) def set_paddle_flags(**kwargs): @@ -45,26 +45,12 @@ from ppocr.utils.save_load import init_model from eval_utils.eval_det_utils import eval_det_run from eval_utils.eval_rec_utils import test_rec_benchmark from eval_utils.eval_rec_utils import eval_rec_run -from ppocr.utils.character import CharacterOps +from eval_utils.eval_cls_utils import eval_cls_run def main(): - config = program.load_config(FLAGS.config) - program.merge_config(FLAGS.opt) - logger.info(config) - - # check if set use_gpu=True in paddlepaddle cpu version - use_gpu = config['Global']['use_gpu'] - program.check_gpu(use_gpu) - - alg = config['Global']['algorithm'] - assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE'] - if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']: - config['Global']['char_ops'] = CharacterOps(config['Global']) - - place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() - startup_prog = fluid.Program() - eval_program = fluid.Program() + startup_prog, eval_program, place, config, train_alg_type = program.preprocess( + ) eval_build_outputs = program.build( config, eval_program, startup_prog, mode='test') eval_fetch_name_list = eval_build_outputs[1] @@ -75,14 +61,22 @@ def main(): init_model(config, eval_program, exe) - if alg in ['EAST', 'DB']: + if train_alg_type == 'det': eval_reader = reader_main(config=config, mode="eval") eval_info_dict = {'program':eval_program,\ 'reader':eval_reader,\ 'fetch_name_list':eval_fetch_name_list,\ 'fetch_varname_list':eval_fetch_varname_list} metrics = eval_det_run(exe, config, eval_info_dict, "eval") - print("Eval result", metrics) + logger.info("Eval result: {}".format(metrics)) + elif train_alg_type == 'cls': + eval_reader = reader_main(config=config, mode="eval") + eval_info_dict = {'program': eval_program, \ + 'reader': eval_reader, \ + 'fetch_name_list': eval_fetch_name_list, \ + 'fetch_varname_list': eval_fetch_varname_list} + metrics = eval_cls_run(exe, eval_info_dict) + logger.info("Eval result: {}".format(metrics)) else: reader_type = config['Global']['reader_yml'] if "benchmark" not in reader_type: @@ -92,7 +86,7 @@ def main(): 'fetch_name_list': eval_fetch_name_list, \ 'fetch_varname_list': eval_fetch_varname_list} metrics = eval_rec_run(exe, config, eval_info_dict, "eval") - print("Eval result:", metrics) + logger.info("Eval result: {}".format(metrics)) else: eval_info_dict = {'program':eval_program,\ 'fetch_name_list':eval_fetch_name_list,\ @@ -101,6 +95,4 @@ def main(): if __name__ == '__main__': - parser = program.ArgsParser() - FLAGS = parser.parse_args() main() diff --git a/tools/eval_utils/eval_cls_utils.py b/tools/eval_utils/eval_cls_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..9c9b26677dc57b70f7641c26e5ef57ce1d77f1af --- /dev/null +++ b/tools/eval_utils/eval_cls_utils.py @@ -0,0 +1,70 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +__all__ = ['eval_cls_run'] + +import logging + +FORMAT = '%(asctime)s-%(levelname)s: %(message)s' +logging.basicConfig(level=logging.INFO, format=FORMAT) +logger = logging.getLogger(__name__) + + +def eval_cls_run(exe, eval_info_dict): + """ + Run evaluation program, return program outputs. + """ + total_sample_num = 0 + total_acc_num = 0 + total_batch_num = 0 + + for data in eval_info_dict['reader'](): + img_num = len(data) + img_list = [] + label_list = [] + for ino in range(img_num): + img_list.append(data[ino][0]) + label_list.append(data[ino][1]) + + img_list = np.concatenate(img_list, axis=0) + outs = exe.run(eval_info_dict['program'], \ + feed={'image': img_list}, \ + fetch_list=eval_info_dict['fetch_varname_list'], \ + return_numpy=False) + softmax_outs = np.array(outs[1]) + if len(softmax_outs.shape) != 1: + softmax_outs = np.array(outs[0]) + acc, acc_num = cal_cls_acc(softmax_outs, label_list) + total_acc_num += acc_num + total_sample_num += len(label_list) + # logger.info("eval batch id: {}, acc: {}".format(total_batch_num, acc)) + total_batch_num += 1 + avg_acc = total_acc_num * 1.0 / total_sample_num + metrics = {'avg_acc': avg_acc, "total_acc_num": total_acc_num, \ + "total_sample_num": total_sample_num} + return metrics + + +def cal_cls_acc(preds, labels): + acc_num = 0 + for pred, label in zip(preds, labels): + if pred == label: + acc_num += 1 + return acc_num / len(preds), acc_num diff --git a/tools/eval_utils/eval_det_iou.py b/tools/eval_utils/eval_det_iou.py index 2f5ff2f10f1d5ecd1d33d2c742633945288ffe4c..64405984e3b90aa58f21324e727713075c5c4dc4 100644 --- a/tools/eval_utils/eval_det_iou.py +++ b/tools/eval_utils/eval_det_iou.py @@ -88,8 +88,8 @@ class DetectionIoUEvaluator(object): points = gt[n]['points'] # transcription = gt[n]['text'] dontCare = gt[n]['ignore'] - points = Polygon(points) - points = points.buffer(0) +# points = Polygon(points) +# points = points.buffer(0) if not Polygon(points).is_valid or not Polygon(points).is_simple: continue @@ -105,8 +105,8 @@ class DetectionIoUEvaluator(object): for n in range(len(pred)): points = pred[n]['points'] - points = Polygon(points) - points = points.buffer(0) +# points = Polygon(points) +# points = points.buffer(0) if not Polygon(points).is_valid or not Polygon(points).is_simple: continue diff --git a/tools/eval_utils/eval_rec_utils.py b/tools/eval_utils/eval_rec_utils.py index a88455439463f17009a0584b8e5a04a43a92883e..4479d9dff2c352c54a26ac1bfdbddab497fff418 100644 --- a/tools/eval_utils/eval_rec_utils.py +++ b/tools/eval_utils/eval_rec_utils.py @@ -29,7 +29,7 @@ FORMAT = '%(asctime)s-%(levelname)s: %(message)s' logging.basicConfig(level=logging.INFO, format=FORMAT) logger = logging.getLogger(__name__) -from ppocr.utils.character import cal_predicts_accuracy +from ppocr.utils.character import cal_predicts_accuracy, cal_predicts_accuracy_srn from ppocr.utils.character import convert_rec_label_to_lod from ppocr.utils.character import convert_rec_attention_infer_res from ppocr.utils.utility import create_module @@ -60,21 +60,60 @@ def eval_rec_run(exe, config, eval_info_dict, mode): for ino in range(img_num): img_list.append(data[ino][0]) label_list.append(data[ino][1]) - img_list = np.concatenate(img_list, axis=0) - outs = exe.run(eval_info_dict['program'], \ + + if config['Global']['loss_type'] != "srn": + img_list = np.concatenate(img_list, axis=0) + outs = exe.run(eval_info_dict['program'], \ feed={'image': img_list}, \ fetch_list=eval_info_dict['fetch_varname_list'], \ return_numpy=False) - preds = np.array(outs[0]) - if preds.shape[1] != 1: - preds, preds_lod = convert_rec_attention_infer_res(preds) + preds = np.array(outs[0]) + + if config['Global']['loss_type'] == "attention": + preds, preds_lod = convert_rec_attention_infer_res(preds) + else: + preds_lod = outs[0].lod()[0] + labels, labels_lod = convert_rec_label_to_lod(label_list) + acc, acc_num, sample_num = cal_predicts_accuracy( + char_ops, preds, preds_lod, labels, labels_lod, + is_remove_duplicate) else: - preds_lod = outs[0].lod()[0] - labels, labels_lod = convert_rec_label_to_lod(label_list) - acc, acc_num, sample_num = cal_predicts_accuracy( - char_ops, preds, preds_lod, labels, labels_lod, is_remove_duplicate) + encoder_word_pos_list = [] + gsrm_word_pos_list = [] + gsrm_slf_attn_bias1_list = [] + gsrm_slf_attn_bias2_list = [] + for ino in range(img_num): + encoder_word_pos_list.append(data[ino][2]) + gsrm_word_pos_list.append(data[ino][3]) + gsrm_slf_attn_bias1_list.append(data[ino][4]) + gsrm_slf_attn_bias2_list.append(data[ino][5]) + + img_list = np.concatenate(img_list, axis=0) + label_list = np.concatenate(label_list, axis=0) + encoder_word_pos_list = np.concatenate( + encoder_word_pos_list, axis=0).astype(np.int64) + gsrm_word_pos_list = np.concatenate( + gsrm_word_pos_list, axis=0).astype(np.int64) + gsrm_slf_attn_bias1_list = np.concatenate( + gsrm_slf_attn_bias1_list, axis=0).astype(np.float32) + gsrm_slf_attn_bias2_list = np.concatenate( + gsrm_slf_attn_bias2_list, axis=0).astype(np.float32) + + labels = label_list + + outs = exe.run(eval_info_dict['program'], \ + feed={'image': img_list, 'encoder_word_pos': encoder_word_pos_list, + 'gsrm_word_pos': gsrm_word_pos_list, 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1_list, + 'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2_list}, \ + fetch_list=eval_info_dict['fetch_varname_list'], \ + return_numpy=False) + preds = np.array(outs[0]) + acc, acc_num, sample_num = cal_predicts_accuracy_srn( + char_ops, preds, labels, config['Global']['max_text_length']) + total_acc_num += acc_num total_sample_num += sample_num + #logger.info("eval batch id: {}, acc: {}".format(total_batch_num, acc)) total_batch_num += 1 avg_acc = total_acc_num * 1.0 / total_sample_num metrics = {'avg_acc': avg_acc, "total_acc_num": total_acc_num, \ diff --git a/tools/export_model.py b/tools/export_model.py index 4415eda84048395da094c0bc6e5685979f94a2e8..0bd06b98dcacc06893becbefacbea198c432bc39 100644 --- a/tools/export_model.py +++ b/tools/export_model.py @@ -18,9 +18,9 @@ from __future__ import print_function import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) def set_paddle_flags(**kwargs): @@ -41,27 +41,11 @@ from paddle import fluid from ppocr.utils.utility import initial_logger logger = initial_logger() from ppocr.utils.save_load import init_model -from ppocr.utils.character import CharacterOps -from ppocr.utils.utility import create_module -def main(): - config = program.load_config(FLAGS.config) - program.merge_config(FLAGS.opt) - logger.info(config) - - # check if set use_gpu=True in paddlepaddle cpu version - use_gpu = config['Global']['use_gpu'] - program.check_gpu(use_gpu) - alg = config['Global']['algorithm'] - assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE'] - if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']: - config['Global']['char_ops'] = CharacterOps(config['Global']) - - place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() - startup_prog = fluid.Program() - eval_program = fluid.Program() +def main(): + startup_prog, eval_program, place, config, _ = program.preprocess() feeded_var_names, target_vars, fetches_var_name = program.build_export( config, eval_program, startup_prog) @@ -88,6 +72,4 @@ def main(): if __name__ == '__main__': - parser = program.ArgsParser() - FLAGS = parser.parse_args() main() diff --git a/tools/infer/predict_cls.py b/tools/infer/predict_cls.py new file mode 100755 index 0000000000000000000000000000000000000000..3c14011a24cf5afcecc5edd5a54e395a0f171f53 --- /dev/null +++ b/tools/infer/predict_cls.py @@ -0,0 +1,146 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import sys + +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) + +import tools.infer.utility as utility +from ppocr.utils.utility import initial_logger + +logger = initial_logger() +from ppocr.utils.utility import get_image_file_list, check_and_read_gif +import cv2 +import copy +import numpy as np +import math +import time +from paddle import fluid + + +class TextClassifier(object): + def __init__(self, args): + self.predictor, self.input_tensor, self.output_tensors = \ + utility.create_predictor(args, mode="cls") + self.cls_image_shape = [int(v) for v in args.cls_image_shape.split(",")] + self.cls_batch_num = args.rec_batch_num + self.label_list = args.label_list + self.use_zero_copy_run = args.use_zero_copy_run + self.cls_thresh = args.cls_thresh + + def resize_norm_img(self, img): + imgC, imgH, imgW = self.cls_image_shape + h = img.shape[0] + w = img.shape[1] + ratio = w / float(h) + if math.ceil(imgH * ratio) > imgW: + resized_w = imgW + else: + resized_w = int(math.ceil(imgH * ratio)) + resized_image = cv2.resize(img, (resized_w, imgH)) + resized_image = resized_image.astype('float32') + if self.cls_image_shape[0] == 1: + resized_image = resized_image / 255 + resized_image = resized_image[np.newaxis, :] + else: + resized_image = resized_image.transpose((2, 0, 1)) / 255 + resized_image -= 0.5 + resized_image /= 0.5 + padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) + padding_im[:, :, 0:resized_w] = resized_image + return padding_im + + def __call__(self, img_list): + img_list = copy.deepcopy(img_list) + img_num = len(img_list) + # Calculate the aspect ratio of all text bars + width_list = [] + for img in img_list: + width_list.append(img.shape[1] / float(img.shape[0])) + # Sorting can speed up the cls process + indices = np.argsort(np.array(width_list)) + + cls_res = [['', 0.0]] * img_num + batch_num = self.cls_batch_num + predict_time = 0 + for beg_img_no in range(0, img_num, batch_num): + end_img_no = min(img_num, beg_img_no + batch_num) + norm_img_batch = [] + max_wh_ratio = 0 + for ino in range(beg_img_no, end_img_no): + h, w = img_list[indices[ino]].shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for ino in range(beg_img_no, end_img_no): + norm_img = self.resize_norm_img(img_list[indices[ino]]) + norm_img = norm_img[np.newaxis, :] + norm_img_batch.append(norm_img) + norm_img_batch = np.concatenate(norm_img_batch) + norm_img_batch = norm_img_batch.copy() + starttime = time.time() + + if self.use_zero_copy_run: + self.input_tensor.copy_from_cpu(norm_img_batch) + self.predictor.zero_copy_run() + else: + norm_img_batch = fluid.core.PaddleTensor(norm_img_batch) + self.predictor.run([norm_img_batch]) + + prob_out = self.output_tensors[0].copy_to_cpu() + label_out = self.output_tensors[1].copy_to_cpu() + if len(label_out.shape) != 1: + prob_out, label_out = label_out, prob_out + + elapse = time.time() - starttime + predict_time += elapse + for rno in range(len(label_out)): + label_idx = label_out[rno] + score = prob_out[rno][label_idx] + label = self.label_list[label_idx] + cls_res[indices[beg_img_no + rno]] = [label, score] + if '180' in label and score > self.cls_thresh: + img_list[indices[beg_img_no + rno]] = cv2.rotate( + img_list[indices[beg_img_no + rno]], 1) + return img_list, cls_res, predict_time + + +def main(args): + image_file_list = get_image_file_list(args.image_dir) + text_classifier = TextClassifier(args) + valid_image_file_list = [] + img_list = [] + for image_file in image_file_list[:10]: + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) + if img is None: + logger.info("error in loading image:{}".format(image_file)) + continue + valid_image_file_list.append(image_file) + img_list.append(img) + try: + img_list, cls_res, predict_time = text_classifier(img_list) + except Exception as e: + print(e) + exit() + for ino in range(len(img_list)): + print("Predicts of %s:%s" % (valid_image_file_list[ino], cls_res[ino])) + print("Total predict time for %d images:%.3f" % + (len(img_list), predict_time)) + + +if __name__ == "__main__": + main(utility.parse_args()) diff --git a/tools/infer/predict_det.py b/tools/infer/predict_det.py index 93a21c512d77d495842b5750a034df4d5255acfb..c57986b7590d6ea526097e5c251e9ea7827d36f7 100755 --- a/tools/infer/predict_det.py +++ b/tools/infer/predict_det.py @@ -13,30 +13,36 @@ # limitations under the License. import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '../..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) + +import cv2 +import copy +import numpy as np +import math +import time +import sys + +import paddle.fluid as fluid import tools.infer.utility as utility from ppocr.utils.utility import initial_logger logger = initial_logger() -from ppocr.utils.utility import get_image_file_list -import cv2 +from ppocr.utils.utility import get_image_file_list, check_and_read_gif +from ppocr.data.det.sast_process import SASTProcessTest from ppocr.data.det.east_process import EASTProcessTest from ppocr.data.det.db_process import DBProcessTest from ppocr.postprocess.db_postprocess import DBPostProcess from ppocr.postprocess.east_postprocess import EASTPostPocess -import copy -import numpy as np -import math -import time -import sys +from ppocr.postprocess.sast_postprocess import SASTPostProcess class TextDetector(object): def __init__(self, args): max_side_len = args.det_max_side_len self.det_algorithm = args.det_algorithm + self.use_zero_copy_run = args.use_zero_copy_run preprocess_params = {'max_side_len': max_side_len} postprocess_params = {} if self.det_algorithm == "DB": @@ -52,6 +58,20 @@ class TextDetector(object): postprocess_params["cover_thresh"] = args.det_east_cover_thresh postprocess_params["nms_thresh"] = args.det_east_nms_thresh self.postprocess_op = EASTPostPocess(postprocess_params) + elif self.det_algorithm == "SAST": + self.preprocess_op = SASTProcessTest(preprocess_params) + postprocess_params["score_thresh"] = args.det_sast_score_thresh + postprocess_params["nms_thresh"] = args.det_sast_nms_thresh + self.det_sast_polygon = args.det_sast_polygon + if self.det_sast_polygon: + postprocess_params["sample_pts_num"] = 6 + postprocess_params["expand_scale"] = 1.2 + postprocess_params["shrink_ratio_of_width"] = 0.2 + else: + postprocess_params["sample_pts_num"] = 2 + postprocess_params["expand_scale"] = 1.0 + postprocess_params["shrink_ratio_of_width"] = 0.3 + self.postprocess_op = SASTPostProcess(postprocess_params) else: logger.info("unknown det_algorithm:{}".format(self.det_algorithm)) sys.exit(0) @@ -84,7 +104,7 @@ class TextDetector(object): return rect def clip_det_res(self, points, img_height, img_width): - for pno in range(4): + for pno in range(points.shape[0]): points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1)) points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1)) return points @@ -97,12 +117,21 @@ class TextDetector(object): box = self.clip_det_res(box, img_height, img_width) rect_width = int(np.linalg.norm(box[0] - box[1])) rect_height = int(np.linalg.norm(box[0] - box[3])) - if rect_width <= 10 or rect_height <= 10: + if rect_width <= 3 or rect_height <= 3: continue dt_boxes_new.append(box) dt_boxes = np.array(dt_boxes_new) return dt_boxes + def filter_tag_det_res_only_clip(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + dt_boxes_new = [] + for box in dt_boxes: + box = self.clip_det_res(box, img_height, img_width) + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + def __call__(self, img): ori_im = img.copy() im, ratio_list = self.preprocess_op(img) @@ -110,8 +139,12 @@ class TextDetector(object): return None, 0 im = im.copy() starttime = time.time() - self.input_tensor.copy_from_cpu(im) - self.predictor.zero_copy_run() + if self.use_zero_copy_run: + self.input_tensor.copy_from_cpu(im) + self.predictor.zero_copy_run() + else: + im = fluid.core.PaddleTensor(im) + self.predictor.run([im]) outputs = [] for output_tensor in self.output_tensors: output = output_tensor.copy_to_cpu() @@ -120,11 +153,20 @@ class TextDetector(object): if self.det_algorithm == "EAST": outs_dict['f_geo'] = outputs[0] outs_dict['f_score'] = outputs[1] + elif self.det_algorithm == 'SAST': + outs_dict['f_border'] = outputs[0] + outs_dict['f_score'] = outputs[1] + outs_dict['f_tco'] = outputs[2] + outs_dict['f_tvo'] = outputs[3] else: outs_dict['maps'] = outputs[0] + dt_boxes_list = self.postprocess_op(outs_dict, [ratio_list]) dt_boxes = dt_boxes_list[0] - dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape) + if self.det_algorithm == "SAST" and self.det_sast_polygon: + dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes, ori_im.shape) + else: + dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape) elapse = time.time() - starttime return dt_boxes, elapse @@ -135,8 +177,13 @@ if __name__ == "__main__": text_detector = TextDetector(args) count = 0 total_time = 0 + draw_img_save = "./inference_results" + if not os.path.exists(draw_img_save): + os.makedirs(draw_img_save) for image_file in image_file_list: - img = cv2.imread(image_file) + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) if img is None: logger.info("error in loading image:{}".format(image_file)) continue @@ -147,6 +194,7 @@ if __name__ == "__main__": print("Predict time of %s:" % image_file, elapse) src_im = utility.draw_text_det_res(dt_boxes, image_file) img_name_pure = image_file.split("/")[-1] - cv2.imwrite("./inference_results/det_res_%s" % img_name_pure, src_im) + cv2.imwrite( + os.path.join(draw_img_save, "det_res_%s" % img_name_pure), src_im) if count > 1: print("Avg Time:", total_time / (count - 1)) diff --git a/tools/infer/predict_rec.py b/tools/infer/predict_rec.py index bd96548d827d7b47a059648e8cedc20086488801..06273e9f9e5b42a9ecc829c435662e9aabcdd224 100755 --- a/tools/infer/predict_rec.py +++ b/tools/infer/predict_rec.py @@ -17,15 +17,18 @@ __dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) -import tools.infer.utility as utility -from ppocr.utils.utility import initial_logger -logger = initial_logger() -from ppocr.utils.utility import get_image_file_list import cv2 import copy import numpy as np import math import time + +import paddle.fluid as fluid + +import tools.infer.utility as utility +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.utils.utility import get_image_file_list, check_and_read_gif from ppocr.utils.character import CharacterOps @@ -37,17 +40,23 @@ class TextRecognizer(object): self.character_type = args.rec_char_type self.rec_batch_num = args.rec_batch_num self.rec_algorithm = args.rec_algorithm + self.text_len = args.max_text_length + self.use_zero_copy_run = args.use_zero_copy_run char_ops_params = { "character_type": args.rec_char_type, "character_dict_path": args.rec_char_dict_path, - "use_space_char": args.use_space_char + "use_space_char": args.use_space_char, + "max_text_length": args.max_text_length } - if self.rec_algorithm != "RARE": + if self.rec_algorithm in ["CRNN", "Rosetta", "STAR-Net"]: char_ops_params['loss_type'] = 'ctc' self.loss_type = 'ctc' - else: + elif self.rec_algorithm == "RARE": char_ops_params['loss_type'] = 'attention' self.loss_type = 'attention' + elif self.rec_algorithm == "SRN": + char_ops_params['loss_type'] = 'srn' + self.loss_type = 'srn' self.char_ops = CharacterOps(char_ops_params) def resize_norm_img(self, img, max_wh_ratio): @@ -70,6 +79,83 @@ class TextRecognizer(object): padding_im[:, :, 0:resized_w] = resized_image return padding_im + def resize_norm_img_srn(self, img, image_shape): + imgC, imgH, imgW = image_shape + + img_black = np.zeros((imgH, imgW)) + im_hei = img.shape[0] + im_wid = img.shape[1] + + if im_wid <= im_hei * 1: + img_new = cv2.resize(img, (imgH * 1, imgH)) + elif im_wid <= im_hei * 2: + img_new = cv2.resize(img, (imgH * 2, imgH)) + elif im_wid <= im_hei * 3: + img_new = cv2.resize(img, (imgH * 3, imgH)) + else: + img_new = cv2.resize(img, (imgW, imgH)) + + img_np = np.asarray(img_new) + img_np = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY) + img_black[:, 0:img_np.shape[1]] = img_np + img_black = img_black[:, :, np.newaxis] + + row, col, c = img_black.shape + c = 1 + + return np.reshape(img_black, (c, row, col)).astype(np.float32) + + def srn_other_inputs(self, image_shape, num_heads, max_text_length, + char_num): + + imgC, imgH, imgW = image_shape + feature_dim = int((imgH / 8) * (imgW / 8)) + + encoder_word_pos = np.array(range(0, feature_dim)).reshape( + (feature_dim, 1)).astype('int64') + gsrm_word_pos = np.array(range(0, max_text_length)).reshape( + (max_text_length, 1)).astype('int64') + + gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length)) + gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape( + [-1, 1, max_text_length, max_text_length]) + gsrm_slf_attn_bias1 = np.tile( + gsrm_slf_attn_bias1, + [1, num_heads, 1, 1]).astype('float32') * [-1e9] + + gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape( + [-1, 1, max_text_length, max_text_length]) + gsrm_slf_attn_bias2 = np.tile( + gsrm_slf_attn_bias2, + [1, num_heads, 1, 1]).astype('float32') * [-1e9] + + encoder_word_pos = encoder_word_pos[np.newaxis, :] + gsrm_word_pos = gsrm_word_pos[np.newaxis, :] + + return [ + encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, + gsrm_slf_attn_bias2 + ] + + def process_image_srn(self, + img, + image_shape, + num_heads, + max_text_length, + char_ops=None): + norm_img = self.resize_norm_img_srn(img, image_shape) + norm_img = norm_img[np.newaxis, :] + char_num = char_ops.get_char_num() + + [encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, gsrm_slf_attn_bias2] = \ + self.srn_other_inputs(image_shape, num_heads, max_text_length, char_num) + + gsrm_slf_attn_bias1 = gsrm_slf_attn_bias1.astype(np.float32) + gsrm_slf_attn_bias2 = gsrm_slf_attn_bias2.astype(np.float32) + + return (norm_img, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1, + gsrm_slf_attn_bias2) + def __call__(self, img_list): img_num = len(img_list) # Calculate the aspect ratio of all text bars @@ -79,7 +165,7 @@ class TextRecognizer(object): # Sorting can speed up the recognition process indices = np.argsort(np.array(width_list)) - # rec_res = [] + #rec_res = [] rec_res = [['', 0.0]] * img_num batch_num = self.rec_batch_num predict_time = 0 @@ -93,16 +179,62 @@ class TextRecognizer(object): wh_ratio = w * 1.0 / h max_wh_ratio = max(max_wh_ratio, wh_ratio) for ino in range(beg_img_no, end_img_no): - # norm_img = self.resize_norm_img(img_list[ino], max_wh_ratio) - norm_img = self.resize_norm_img(img_list[indices[ino]], - max_wh_ratio) - norm_img = norm_img[np.newaxis, :] - norm_img_batch.append(norm_img) - norm_img_batch = np.concatenate(norm_img_batch) + if self.loss_type != "srn": + norm_img = self.resize_norm_img(img_list[indices[ino]], + max_wh_ratio) + norm_img = norm_img[np.newaxis, :] + norm_img_batch.append(norm_img) + else: + norm_img = self.process_image_srn(img_list[indices[ino]], + self.rec_image_shape, 8, + 25, self.char_ops) + encoder_word_pos_list = [] + gsrm_word_pos_list = [] + gsrm_slf_attn_bias1_list = [] + gsrm_slf_attn_bias2_list = [] + encoder_word_pos_list.append(norm_img[1]) + gsrm_word_pos_list.append(norm_img[2]) + gsrm_slf_attn_bias1_list.append(norm_img[3]) + gsrm_slf_attn_bias2_list.append(norm_img[4]) + norm_img_batch.append(norm_img[0]) + + norm_img_batch = np.concatenate(norm_img_batch, axis=0) norm_img_batch = norm_img_batch.copy() - starttime = time.time() - self.input_tensor.copy_from_cpu(norm_img_batch) - self.predictor.zero_copy_run() + + if self.loss_type == "srn": + starttime = time.time() + encoder_word_pos_list = np.concatenate(encoder_word_pos_list) + gsrm_word_pos_list = np.concatenate(gsrm_word_pos_list) + gsrm_slf_attn_bias1_list = np.concatenate( + gsrm_slf_attn_bias1_list) + gsrm_slf_attn_bias2_list = np.concatenate( + gsrm_slf_attn_bias2_list) + starttime = time.time() + + norm_img_batch = fluid.core.PaddleTensor(norm_img_batch) + encoder_word_pos_list = fluid.core.PaddleTensor( + encoder_word_pos_list) + gsrm_word_pos_list = fluid.core.PaddleTensor(gsrm_word_pos_list) + gsrm_slf_attn_bias1_list = fluid.core.PaddleTensor( + gsrm_slf_attn_bias1_list) + gsrm_slf_attn_bias2_list = fluid.core.PaddleTensor( + gsrm_slf_attn_bias2_list) + + inputs = [ + norm_img_batch, encoder_word_pos_list, + gsrm_slf_attn_bias1_list, gsrm_slf_attn_bias2_list, + gsrm_word_pos_list + ] + + self.predictor.run(inputs) + else: + starttime = time.time() + if self.use_zero_copy_run: + self.input_tensor.copy_from_cpu(norm_img_batch) + self.predictor.zero_copy_run() + else: + norm_img_batch = fluid.core.PaddleTensor(norm_img_batch) + self.predictor.run([norm_img_batch]) if self.loss_type == "ctc": rec_idx_batch = self.output_tensors[0].copy_to_cpu() @@ -122,11 +254,31 @@ class TextRecognizer(object): ind = np.argmax(probs, axis=1) blank = probs.shape[1] valid_ind = np.where(ind != (blank - 1))[0] - score = np.mean(probs[valid_ind, ind[valid_ind]]) if len(valid_ind) == 0: continue + score = np.mean(probs[valid_ind, ind[valid_ind]]) # rec_res.append([preds_text, score]) rec_res[indices[beg_img_no + rno]] = [preds_text, score] + elif self.loss_type == 'srn': + rec_idx_batch = self.output_tensors[0].copy_to_cpu() + probs = self.output_tensors[1].copy_to_cpu() + char_num = self.char_ops.get_char_num() + preds = rec_idx_batch.reshape(-1) + elapse = time.time() - starttime + predict_time += elapse + total_preds = preds.copy() + for ino in range(int(len(rec_idx_batch) / self.text_len)): + preds = total_preds[ino * self.text_len:(ino + 1) * + self.text_len] + ind = np.argmax(probs, axis=1) + valid_ind = np.where(preds != int(char_num - 1))[0] + if len(valid_ind) == 0: + continue + score = np.mean(probs[valid_ind, ind[valid_ind]]) + preds = preds[:valid_ind[-1] + 1] + preds_text = self.char_ops.decode(preds) + + rec_res[indices[beg_img_no + ino]] = [preds_text, score] else: rec_idx_batch = self.output_tensors[0].copy_to_cpu() predict_batch = self.output_tensors[1].copy_to_cpu() @@ -153,12 +305,15 @@ def main(args): valid_image_file_list = [] img_list = [] for image_file in image_file_list: - img = cv2.imread(image_file, cv2.IMREAD_COLOR) + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) if img is None: logger.info("error in loading image:{}".format(image_file)) continue valid_image_file_list.append(image_file) img_list.append(img) + try: rec_res, predict_time = text_recognizer(img_list) except Exception as e: diff --git a/tools/infer/predict_system.py b/tools/infer/predict_system.py index 1b7aff1ca8213633b7eff73e13d0ea75eeaedc0b..766cc577b07bc9845af610a7a7e1ba50614d64eb 100755 --- a/tools/infer/predict_system.py +++ b/tools/infer/predict_system.py @@ -13,21 +13,24 @@ # limitations under the License. import os import sys + __dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) import tools.infer.utility as utility from ppocr.utils.utility import initial_logger + logger = initial_logger() import cv2 import tools.infer.predict_det as predict_det import tools.infer.predict_rec as predict_rec +import tools.infer.predict_cls as predict_cls import copy import numpy as np import math import time -from ppocr.utils.utility import get_image_file_list +from ppocr.utils.utility import get_image_file_list, check_and_read_gif from PIL import Image from tools.infer.utility import draw_ocr from tools.infer.utility import draw_ocr_box_txt @@ -37,6 +40,9 @@ class TextSystem(object): def __init__(self, args): self.text_detector = predict_det.TextDetector(args) self.text_recognizer = predict_rec.TextRecognizer(args) + self.use_angle_cls = args.use_angle_cls + if self.use_angle_cls: + self.text_classifier = predict_cls.TextClassifier(args) def get_rotate_crop_image(self, img, points): ''' @@ -49,18 +55,23 @@ class TextSystem(object): points[:, 0] = points[:, 0] - left points[:, 1] = points[:, 1] - top ''' - img_crop_width = int(max(np.linalg.norm(points[0] - points[1]), - np.linalg.norm(points[2] - points[3]))) - img_crop_height = int(max(np.linalg.norm(points[0] - points[3]), - np.linalg.norm(points[1] - points[2]))) - pts_std = np.float32([[0, 0], - [img_crop_width, 0], + img_crop_width = int( + max( + np.linalg.norm(points[0] - points[1]), + np.linalg.norm(points[2] - points[3]))) + img_crop_height = int( + max( + np.linalg.norm(points[0] - points[3]), + np.linalg.norm(points[1] - points[2]))) + pts_std = np.float32([[0, 0], [img_crop_width, 0], [img_crop_width, img_crop_height], [0, img_crop_height]]) M = cv2.getPerspectiveTransform(points, pts_std) - dst_img = cv2.warpPerspective(img, M, (img_crop_width, img_crop_height), - borderMode=cv2.BORDER_REPLICATE, - flags=cv2.INTER_CUBIC) + dst_img = cv2.warpPerspective( + img, + M, (img_crop_width, img_crop_height), + borderMode=cv2.BORDER_REPLICATE, + flags=cv2.INTER_CUBIC) dst_img_height, dst_img_width = dst_img.shape[0:2] if dst_img_height * 1.0 / dst_img_width >= 1.5: dst_img = np.rot90(dst_img) @@ -86,6 +97,11 @@ class TextSystem(object): tmp_box = copy.deepcopy(dt_boxes[bno]) img_crop = self.get_rotate_crop_image(ori_im, tmp_box) img_crop_list.append(img_crop) + if self.use_angle_cls: + img_crop_list, angle_list, elapse = self.text_classifier( + img_crop_list) + print("cls num : {}, elapse : {}".format( + len(img_crop_list), elapse)) rec_res, elapse = self.text_recognizer(img_crop_list) print("rec_res num : {}, elapse : {}".format(len(rec_res), elapse)) # self.print_draw_crop_rec_res(img_crop_list, rec_res) @@ -105,8 +121,8 @@ def sorted_boxes(dt_boxes): _boxes = list(sorted_boxes) for i in range(num_boxes - 1): - if abs(_boxes[i+1][0][1] - _boxes[i][0][1]) < 10 and \ - (_boxes[i + 1][0][0] < _boxes[i][0][0]): + if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \ + (_boxes[i + 1][0][0] < _boxes[i][0][0]): tmp = _boxes[i] _boxes[i] = _boxes[i + 1] _boxes[i + 1] = tmp @@ -117,27 +133,26 @@ def main(args): image_file_list = get_image_file_list(args.image_dir) text_sys = TextSystem(args) is_visualize = True - tackle_img_num = 0 + font_path = args.vis_font_path for image_file in image_file_list: - img = cv2.imread(image_file) + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) if img is None: logger.info("error in loading image:{}".format(image_file)) continue starttime = time.time() - tackle_img_num += 1 - if not args.use_gpu and args.enable_mkldnn and tackle_img_num % 30 == 0: - text_sys = TextSystem(args) dt_boxes, rec_res = text_sys(img) elapse = time.time() - starttime print("Predict time of %s: %.3fs" % (image_file, elapse)) + + drop_score = 0.5 dt_num = len(dt_boxes) - dt_boxes_final = [] for dno in range(dt_num): text, score = rec_res[dno] - if score >= 0.5: + if score >= drop_score: text_str = "%s, %.3f" % (text, score) print(text_str) - dt_boxes_final.append(dt_boxes[dno]) if is_visualize: image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) @@ -145,8 +160,13 @@ def main(args): txts = [rec_res[i][0] for i in range(len(rec_res))] scores = [rec_res[i][1] for i in range(len(rec_res))] - draw_img = draw_ocr( - image, boxes, txts, scores, draw_txt=True, drop_score=0.5) + draw_img = draw_ocr_box_txt( + image, + boxes, + txts, + scores, + drop_score=drop_score, + font_path=font_path) draw_img_save = "./inference_results/" if not os.path.exists(draw_img_save): os.makedirs(draw_img_save) diff --git a/tools/infer/utility.py b/tools/infer/utility.py index bab211a8dfa396b1aa8996cf7105452316441ed1..1eb57d34490edd3557dbe09a541b0a173ac7bbdf 100755 --- a/tools/infer/utility.py +++ b/tools/infer/utility.py @@ -15,6 +15,7 @@ import argparse import os, sys from ppocr.utils.utility import initial_logger + logger = initial_logger() from paddle.fluid.core import PaddleTensor from paddle.fluid.core import AnalysisConfig @@ -31,46 +32,66 @@ def parse_args(): return v.lower() in ("true", "t", "1") parser = argparse.ArgumentParser() - #params for prediction engine + # params for prediction engine parser.add_argument("--use_gpu", type=str2bool, default=True) parser.add_argument("--ir_optim", type=str2bool, default=True) parser.add_argument("--use_tensorrt", type=str2bool, default=False) parser.add_argument("--gpu_mem", type=int, default=8000) - #params for text detector + # params for text detector parser.add_argument("--image_dir", type=str) parser.add_argument("--det_algorithm", type=str, default='DB') parser.add_argument("--det_model_dir", type=str) parser.add_argument("--det_max_side_len", type=float, default=960) - #DB parmas + # DB parmas parser.add_argument("--det_db_thresh", type=float, default=0.3) parser.add_argument("--det_db_box_thresh", type=float, default=0.5) - parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0) + parser.add_argument("--det_db_unclip_ratio", type=float, default=1.6) - #EAST parmas + # EAST parmas parser.add_argument("--det_east_score_thresh", type=float, default=0.8) parser.add_argument("--det_east_cover_thresh", type=float, default=0.1) parser.add_argument("--det_east_nms_thresh", type=float, default=0.2) - #params for text recognizer + # SAST parmas + parser.add_argument("--det_sast_score_thresh", type=float, default=0.5) + parser.add_argument("--det_sast_nms_thresh", type=float, default=0.2) + parser.add_argument("--det_sast_polygon", type=bool, default=False) + + # params for text recognizer parser.add_argument("--rec_algorithm", type=str, default='CRNN') parser.add_argument("--rec_model_dir", type=str) parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320") parser.add_argument("--rec_char_type", type=str, default='ch') - parser.add_argument("--rec_batch_num", type=int, default=30) + parser.add_argument("--rec_batch_num", type=int, default=6) + parser.add_argument("--max_text_length", type=int, default=25) parser.add_argument( "--rec_char_dict_path", type=str, default="./ppocr/utils/ppocr_keys_v1.txt") - parser.add_argument("--use_space_char", type=bool, default=True) - parser.add_argument("--enable_mkldnn", type=bool, default=False) + parser.add_argument("--use_space_char", type=str2bool, default=True) + parser.add_argument( + "--vis_font_path", type=str, default="./doc/simfang.ttf") + + # params for text classifier + parser.add_argument("--use_angle_cls", type=str2bool, default=False) + parser.add_argument("--cls_model_dir", type=str) + parser.add_argument("--cls_image_shape", type=str, default="3, 48, 192") + parser.add_argument("--label_list", type=list, default=['0', '180']) + parser.add_argument("--cls_batch_num", type=int, default=30) + parser.add_argument("--cls_thresh", type=float, default=0.9) + + parser.add_argument("--enable_mkldnn", type=str2bool, default=False) + parser.add_argument("--use_zero_copy_run", type=str2bool, default=False) return parser.parse_args() def create_predictor(args, mode): if mode == "det": model_dir = args.det_model_dir + elif mode == 'cls': + model_dir = args.cls_model_dir else: model_dir = args.rec_model_dir @@ -94,17 +115,23 @@ def create_predictor(args, mode): config.disable_gpu() config.set_cpu_math_library_num_threads(6) if args.enable_mkldnn: + # cache 10 different shapes for mkldnn to avoid memory leak + config.set_mkldnn_cache_capacity(10) config.enable_mkldnn() - - #config.enable_memory_optim() + + # config.enable_memory_optim() config.disable_glog_info() - # use zero copy - config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") - config.switch_use_feed_fetch_ops(False) + if args.use_zero_copy_run: + config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") + config.switch_use_feed_fetch_ops(False) + else: + config.switch_use_feed_fetch_ops(True) + predictor = create_paddle_predictor(config) input_names = predictor.get_input_names() - input_tensor = predictor.get_input_tensor(input_names[0]) + for name in input_names: + input_tensor = predictor.get_input_tensor(name) output_names = predictor.get_output_names() output_tensors = [] for output_name in output_names: @@ -133,7 +160,12 @@ def resize_img(img, input_size=600): return im -def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): +def draw_ocr(image, + boxes, + txts=None, + scores=None, + drop_score=0.5, + font_path="./doc/simfang.ttf"): """ Visualize the results of OCR detection and recognition args: @@ -141,39 +173,52 @@ def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): boxes(list): boxes with shape(N, 4, 2) txts(list): the texts scores(list): txxs corresponding scores - draw_txt(bool): whether draw text or not drop_score(float): only scores greater than drop_threshold will be visualized + font_path: the path of font which is used to draw text return(array): the visualized img """ if scores is None: scores = [1] * len(boxes) - for (box, score) in zip(boxes, scores): - if score < drop_score or math.isnan(score): + box_num = len(boxes) + for i in range(box_num): + if scores is not None and (scores[i] < drop_score or + math.isnan(scores[i])): continue - box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) + box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64) image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) - - if draw_txt: + if txts is not None: img = np.array(resize_img(image, input_size=600)) txt_img = text_visual( - txts, scores, img_h=img.shape[0], img_w=600, threshold=drop_score) + txts, + scores, + img_h=img.shape[0], + img_w=600, + threshold=drop_score, + font_path=font_path) img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) return img return image -def draw_ocr_box_txt(image, boxes, txts): +def draw_ocr_box_txt(image, + boxes, + txts, + scores=None, + drop_score=0.5, + font_path="./doc/simfang.ttf"): h, w = image.height, image.width img_left = image.copy() img_right = Image.new('RGB', (w, h), (255, 255, 255)) import random - # 每次使用相同的随机种子 ,可以保证两次颜色一致 + random.seed(0) draw_left = ImageDraw.Draw(img_left) draw_right = ImageDraw.Draw(img_right) - for (box, txt) in zip(boxes, txts): + for idx, (box, txt) in enumerate(zip(boxes, txts)): + if scores is not None and scores[idx] < drop_score: + continue color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) draw_left.polygon(box, fill=color) @@ -189,8 +234,7 @@ def draw_ocr_box_txt(image, boxes, txts): 1])**2) if box_height > 2 * box_width: font_size = max(int(box_width * 0.9), 10) - font = ImageFont.truetype( - "./doc/simfang.ttf", font_size, encoding="utf-8") + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") cur_y = box[0][1] for c in txt: char_size = font.getsize(c) @@ -199,8 +243,7 @@ def draw_ocr_box_txt(image, boxes, txts): cur_y += char_size[1] else: font_size = max(int(box_height * 0.8), 10) - font = ImageFont.truetype( - "./doc/simfang.ttf", font_size, encoding="utf-8") + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") draw_right.text( [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) img_left = Image.blend(image, img_left, 0.5) @@ -235,7 +278,12 @@ def str_count(s): return s_len - math.ceil(en_dg_count / 2) -def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): +def text_visual(texts, + scores, + img_h=400, + img_w=600, + threshold=0., + font_path="./doc/simfang.ttf"): """ create new blank img and draw txt on it args: @@ -243,6 +291,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): scores(list|None): corresponding score of each txt img_h(int): the height of blank img img_w(int): the width of blank img + font_path: the path of font which is used to draw text return(array): """ @@ -261,7 +310,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): font_size = 20 txt_color = (0, 0, 0) - font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8") + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") gap = font_size + 5 txt_img_list = [] @@ -342,6 +391,6 @@ if __name__ == '__main__': txts.append(dic['transcription']) scores.append(round(dic['scores'], 3)) - new_img = draw_ocr(image, boxes, txts, scores, draw_txt=True) + new_img = draw_ocr(image, boxes, txts, scores) cv2.imwrite(img_name, new_img) diff --git a/tools/infer_cls.py b/tools/infer_cls.py new file mode 100755 index 0000000000000000000000000000000000000000..aebdc0761b7ec48f81143ecbb758ce0e4da2edf7 --- /dev/null +++ b/tools/infer_cls.py @@ -0,0 +1,114 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import os +import sys + +__dir__ = os.path.dirname(__file__) +sys.path.append(__dir__) +sys.path.append(os.path.join(__dir__, '..')) + + +def set_paddle_flags(**kwargs): + for key, value in kwargs.items(): + if os.environ.get(key, None) is None: + os.environ[key] = str(value) + + +# NOTE(paddle-dev): All of these flags should be +# set before `import paddle`. Otherwise, it would +# not take any effect. +set_paddle_flags( + FLAGS_eager_delete_tensor_gb=0, # enable GC to save memory +) + +import tools.program as program +from paddle import fluid +from ppocr.utils.utility import initial_logger + +logger = initial_logger() +from ppocr.data.reader_main import reader_main +from ppocr.utils.save_load import init_model +from ppocr.utils.utility import create_module +from ppocr.utils.utility import get_image_file_list + + +def main(): + config = program.load_config(FLAGS.config) + program.merge_config(FLAGS.opt) + logger.info(config) + + # check if set use_gpu=True in paddlepaddle cpu version + use_gpu = config['Global']['use_gpu'] + # check_gpu(use_gpu) + + place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() + exe = fluid.Executor(place) + + rec_model = create_module(config['Architecture']['function'])(params=config) + startup_prog = fluid.Program() + eval_prog = fluid.Program() + with fluid.program_guard(eval_prog, startup_prog): + with fluid.unique_name.guard(): + _, outputs = rec_model(mode="test") + fetch_name_list = list(outputs.keys()) + fetch_varname_list = [outputs[v].name for v in fetch_name_list] + eval_prog = eval_prog.clone(for_test=True) + exe.run(startup_prog) + + init_model(config, eval_prog, exe) + + blobs = reader_main(config, 'test')() + infer_img = config['Global']['infer_img'] + infer_list = get_image_file_list(infer_img) + max_img_num = len(infer_list) + if len(infer_list) == 0: + logger.info("Can not find img in infer_img dir.") + for i in range(max_img_num): + logger.info("infer_img:%s" % infer_list[i]) + img = next(blobs) + predict = exe.run(program=eval_prog, + feed={"image": img}, + fetch_list=fetch_varname_list, + return_numpy=False) + scores = np.array(predict[0]) + label = np.array(predict[1]) + if len(label.shape) != 1: + label, scores = scores, label + logger.info('\t scores: {}'.format(scores)) + logger.info('\t label: {}'.format(label)) + # save for inference model + target_var = [] + for key, values in outputs.items(): + target_var.append(values) + + fluid.io.save_inference_model( + "./output", + feeded_var_names=['image'], + target_vars=target_var, + executor=exe, + main_program=eval_prog, + model_filename="model", + params_filename="params") + + +if __name__ == '__main__': + parser = program.ArgsParser() + FLAGS = parser.parse_args() + main() diff --git a/tools/infer_det.py b/tools/infer_det.py index f7dbd8eb1ec6c29b4fc0d098909bbc63beed91d8..1e7fdcc46a1d2f47a7928d6dc171ae393b15f901 100755 --- a/tools/infer_det.py +++ b/tools/infer_det.py @@ -22,9 +22,9 @@ import json import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) def set_paddle_flags(**kwargs): @@ -70,7 +70,7 @@ def draw_det_res(dt_boxes, config, img, img_name): def main(): config = program.load_config(FLAGS.config) program.merge_config(FLAGS.opt) - print(config) + logger.info(config) # check if set use_gpu=True in paddlepaddle cpu version use_gpu = config['Global']['use_gpu'] @@ -134,8 +134,10 @@ def main(): dic = {'f_score': outs[0], 'f_geo': outs[1]} elif config['Global']['algorithm'] == 'DB': dic = {'maps': outs[0]} + elif config['Global']['algorithm'] == 'SAST': + dic = {'f_score': outs[0], 'f_border': outs[1], 'f_tvo': outs[2], 'f_tco': outs[3]} else: - raise Exception("only support algorithm: ['EAST', 'DB']") + raise Exception("only support algorithm: ['EAST', 'DB', 'SAST']") dt_boxes_list = postprocess(dic, ratio_list) for ino in range(img_num): dt_boxes = dt_boxes_list[ino] @@ -149,7 +151,7 @@ def main(): fout.write(otstr.encode()) src_img = cv2.imread(img_name) draw_det_res(dt_boxes, config, src_img, img_name) - + logger.info("success!") diff --git a/tools/infer_rec.py b/tools/infer_rec.py index 9abbf076e9a4902cdd0876c320021fc45e227a2c..29fc5b40a890cd6e8fa3ca7d3f0999835555d9bd 100755 --- a/tools/infer_rec.py +++ b/tools/infer_rec.py @@ -19,9 +19,9 @@ from __future__ import print_function import numpy as np import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) def set_paddle_flags(**kwargs): @@ -64,7 +64,6 @@ def main(): exe = fluid.Executor(place) rec_model = create_module(config['Architecture']['function'])(params=config) - startup_prog = fluid.Program() eval_prog = fluid.Program() with fluid.program_guard(eval_prog, startup_prog): @@ -84,12 +83,38 @@ def main(): if len(infer_list) == 0: logger.info("Can not find img in infer_img dir.") for i in range(max_img_num): - print("infer_img:%s" % infer_list[i]) + logger.info("infer_img:%s" % infer_list[i]) img = next(blobs) - predict = exe.run(program=eval_prog, - feed={"image": img}, - fetch_list=fetch_varname_list, - return_numpy=False) + if loss_type != "srn": + predict = exe.run(program=eval_prog, + feed={"image": img}, + fetch_list=fetch_varname_list, + return_numpy=False) + else: + encoder_word_pos_list = [] + gsrm_word_pos_list = [] + gsrm_slf_attn_bias1_list = [] + gsrm_slf_attn_bias2_list = [] + encoder_word_pos_list.append(img[1]) + gsrm_word_pos_list.append(img[2]) + gsrm_slf_attn_bias1_list.append(img[3]) + gsrm_slf_attn_bias2_list.append(img[4]) + + encoder_word_pos_list = np.concatenate( + encoder_word_pos_list, axis=0).astype(np.int64) + gsrm_word_pos_list = np.concatenate( + gsrm_word_pos_list, axis=0).astype(np.int64) + gsrm_slf_attn_bias1_list = np.concatenate( + gsrm_slf_attn_bias1_list, axis=0).astype(np.float32) + gsrm_slf_attn_bias2_list = np.concatenate( + gsrm_slf_attn_bias2_list, axis=0).astype(np.float32) + + predict = exe.run(program=eval_prog, \ + feed={'image': img[0], 'encoder_word_pos': encoder_word_pos_list, + 'gsrm_word_pos': gsrm_word_pos_list, 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1_list, + 'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2_list}, \ + fetch_list=fetch_varname_list, \ + return_numpy=False) if loss_type == "ctc": preds = np.array(predict[0]) preds = preds.reshape(-1) @@ -114,10 +139,21 @@ def main(): score = np.mean(probs[0, 1:end_pos[1]]) preds = preds.reshape(-1) preds_text = char_ops.decode(preds) - - print("\t index:", preds) - print("\t word :", preds_text) - print("\t score :", score) + elif loss_type == "srn": + char_num = char_ops.get_char_num() + preds = np.array(predict[0]) + preds = preds.reshape(-1) + probs = np.array(predict[1]) + ind = np.argmax(probs, axis=1) + valid_ind = np.where(preds != int(char_num - 1))[0] + if len(valid_ind) == 0: + continue + score = np.mean(probs[valid_ind, ind[valid_ind]]) + preds = preds[:valid_ind[-1] + 1] + preds_text = char_ops.decode(preds) + logger.info("\t index: {}".format(preds)) + logger.info("\t word : {}".format(preds_text)) + logger.info("\t score: {}".format(score)) # save for inference model target_var = [] diff --git a/tools/program.py b/tools/program.py index ff8743f15e2925a92fff76e7430e761d7baa720e..2ef203f4cb08231fa04cf2e4c8ee41a40470a0ae 100755 --- a/tools/program.py +++ b/tools/program.py @@ -22,6 +22,7 @@ import yaml import os from ppocr.utils.utility import create_module from ppocr.utils.utility import initial_logger + logger = initial_logger() import paddle.fluid as fluid @@ -29,9 +30,10 @@ import time from ppocr.utils.stats import TrainingStats from eval_utils.eval_det_utils import eval_det_run from eval_utils.eval_rec_utils import eval_rec_run +from eval_utils.eval_cls_utils import eval_cls_run from ppocr.utils.save_load import save_model import numpy as np -from ppocr.utils.character import cal_predicts_accuracy +from ppocr.utils.character import cal_predicts_accuracy, cal_predicts_accuracy_srn, CharacterOps class ArgsParser(ArgumentParser): @@ -81,10 +83,8 @@ default_config = {'Global': {'debug': False, }} def load_config(file_path): """ Load config from yml/yaml file. - Args: file_path (str): Path of the config file to be loaded. - Returns: global config """ merge_config(default_config) @@ -103,10 +103,8 @@ def load_config(file_path): def merge_config(config): """ Merge config into global config. - Args: config (dict): Config to be merged. - Returns: global config """ for key, value in config.items(): @@ -157,13 +155,11 @@ def build(config, main_prog, startup_prog, mode): 3. create a model 4. create fetchs 5. create an optimizer - Args: config(dict): config main_prog(): main program startup_prog(): startup program is_train(bool): train or valid - Returns: dataloader(): a bridge between the model and the data fetchs(dict): dict of model outputs(included loss and measures) @@ -176,8 +172,16 @@ def build(config, main_prog, startup_prog, mode): fetch_name_list = list(outputs.keys()) fetch_varname_list = [outputs[v].name for v in fetch_name_list] opt_loss_name = None + model_average = None + img_loss_name = None + word_loss_name = None if mode == "train": opt_loss = outputs['total_loss'] + # srn loss + #img_loss = outputs['img_loss'] + #word_loss = outputs['word_loss'] + #img_loss_name = img_loss.name + #word_loss_name = word_loss.name opt_params = config['Optimizer'] optimizer = create_module(opt_params['function'])(opt_params) optimizer.minimize(opt_loss) @@ -185,28 +189,58 @@ def build(config, main_prog, startup_prog, mode): global_lr = optimizer._global_learning_rate() fetch_name_list.insert(0, "lr") fetch_varname_list.insert(0, global_lr.name) - return (dataloader, fetch_name_list, fetch_varname_list, opt_loss_name) + if "loss_type" in config["Global"]: + if config['Global']["loss_type"] == 'srn': + model_average = fluid.optimizer.ModelAverage( + config['Global']['average_window'], + min_average_window=config['Global'][ + 'min_average_window'], + max_average_window=config['Global'][ + 'max_average_window']) + + return (dataloader, fetch_name_list, fetch_varname_list, opt_loss_name, + model_average) def build_export(config, main_prog, startup_prog): """ + Build input and output for exporting a checkpoints model to an inference model + Args: + config(dict): config + main_prog(): main program + startup_prog(): startup program + Returns: + feeded_var_names(list[str]): var names of input for exported inference model + target_vars(list[Variable]): output vars for exported inference model + fetches_var_name: dict of checkpoints model outputs(included loss and measures) """ with fluid.program_guard(main_prog, startup_prog): with fluid.unique_name.guard(): func_infor = config['Architecture']['function'] model = create_module(func_infor)(params=config) - image, outputs = model(mode='export') + algorithm = config['Global']['algorithm'] + if algorithm == "SRN": + image, others, outputs = model(mode='export') + else: + image, outputs = model(mode='export') fetches_var_name = sorted([name for name in outputs.keys()]) fetches_var = [outputs[name] for name in fetches_var_name] - feeded_var_names = [image.name] + if algorithm == "SRN": + others_var_names = sorted([name for name in others.keys()]) + feeded_var_names = [image.name] + others_var_names + else: + feeded_var_names = [image.name] + target_vars = fetches_var return feeded_var_names, target_vars, fetches_var_name -def create_multi_devices_program(program, loss_var_name): +def create_multi_devices_program(program, loss_var_name, for_quant=False): build_strategy = fluid.BuildStrategy() build_strategy.memory_optimize = False build_strategy.enable_inplace = True + if for_quant: + build_strategy.fuse_all_reduce_ops = False exec_strategy = fluid.ExecutionStrategy() exec_strategy.num_iteration_per_drop_scope = 1 compile_program = fluid.CompiledProgram(program).with_data_parallel( @@ -216,7 +250,14 @@ def create_multi_devices_program(program, loss_var_name): return compile_program -def train_eval_det_run(config, exe, train_info_dict, eval_info_dict): +def train_eval_det_run(config, + exe, + train_info_dict, + eval_info_dict, + is_pruning=False): + ''' + main program of evaluation for detection + ''' train_batch_id = 0 log_smooth_window = config['Global']['log_smooth_window'] epoch_num = config['Global']['epoch_num'] @@ -272,7 +313,14 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict): best_batch_id = train_batch_id best_epoch = epoch save_path = save_model_dir + "/best_accuracy" - save_model(train_info_dict['train_program'], save_path) + if is_pruning: + import paddleslim as slim + slim.prune.save_model( + exe, train_info_dict['train_program'], + save_path) + else: + save_model(train_info_dict['train_program'], + save_path) strs = 'Test iter: {}, metrics:{}, best_hmean:{:.6f}, best_epoch:{}, best_batch_id:{}'.format( train_batch_id, metrics, best_eval_hmean, best_epoch, best_batch_id) @@ -283,14 +331,27 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict): train_loader.reset() if epoch == 0 and save_epoch_step == 1: save_path = save_model_dir + "/iter_epoch_0" - save_model(train_info_dict['train_program'], save_path) + if is_pruning: + import paddleslim as slim + slim.prune.save_model(exe, train_info_dict['train_program'], + save_path) + else: + save_model(train_info_dict['train_program'], save_path) if epoch > 0 and epoch % save_epoch_step == 0: save_path = save_model_dir + "/iter_epoch_%d" % (epoch) - save_model(train_info_dict['train_program'], save_path) + if is_pruning: + import paddleslim as slim + slim.prune.save_model(exe, train_info_dict['train_program'], + save_path) + else: + save_model(train_info_dict['train_program'], save_path) return def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): + ''' + main program of evaluation for recognition + ''' train_batch_id = 0 log_smooth_window = config['Global']['log_smooth_window'] epoch_num = config['Global']['epoch_num'] @@ -329,14 +390,20 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): lr = np.mean(np.array(train_outs[fetch_map['lr']])) preds_idx = fetch_map['decoded_out'] preds = np.array(train_outs[preds_idx]) - preds_lod = train_outs[preds_idx].lod()[0] labels_idx = fetch_map['label'] labels = np.array(train_outs[labels_idx]) - labels_lod = train_outs[labels_idx].lod()[0] - acc, acc_num, img_num = cal_predicts_accuracy( - config['Global']['char_ops'], preds, preds_lod, labels, - labels_lod) + if config['Global']['loss_type'] != 'srn': + preds_lod = train_outs[preds_idx].lod()[0] + labels_lod = train_outs[labels_idx].lod()[0] + + acc, acc_num, img_num = cal_predicts_accuracy( + config['Global']['char_ops'], preds, preds_lod, labels, + labels_lod) + else: + acc, acc_num, img_num = cal_predicts_accuracy_srn( + config['Global']['char_ops'], preds, labels, + config['Global']['max_text_length']) t2 = time.time() train_batch_elapse = t2 - t1 stats = {'loss': loss, 'acc': acc} @@ -350,6 +417,9 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): if train_batch_id > 0 and\ train_batch_id % eval_batch_step == 0: + model_average = train_info_dict['model_average'] + if model_average != None: + model_average.apply(exe) metrics = eval_rec_run(exe, config, eval_info_dict, "eval") eval_acc = metrics['avg_acc'] eval_sample_num = metrics['total_sample_num'] @@ -374,3 +444,117 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): save_path = save_model_dir + "/iter_epoch_%d" % (epoch) save_model(train_info_dict['train_program'], save_path) return + + +def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict): + train_batch_id = 0 + log_smooth_window = config['Global']['log_smooth_window'] + epoch_num = config['Global']['epoch_num'] + print_batch_step = config['Global']['print_batch_step'] + eval_batch_step = config['Global']['eval_batch_step'] + start_eval_step = 0 + if type(eval_batch_step) == list and len(eval_batch_step) >= 2: + start_eval_step = eval_batch_step[0] + eval_batch_step = eval_batch_step[1] + logger.info( + "During the training process, after the {}th iteration, an evaluation is run every {} iterations". + format(start_eval_step, eval_batch_step)) + save_epoch_step = config['Global']['save_epoch_step'] + save_model_dir = config['Global']['save_model_dir'] + if not os.path.exists(save_model_dir): + os.makedirs(save_model_dir) + train_stats = TrainingStats(log_smooth_window, ['loss', 'acc']) + best_eval_acc = -1 + best_batch_id = 0 + best_epoch = 0 + train_loader = train_info_dict['reader'] + for epoch in range(epoch_num): + train_loader.start() + try: + while True: + t1 = time.time() + train_outs = exe.run( + program=train_info_dict['compile_program'], + fetch_list=train_info_dict['fetch_varname_list'], + return_numpy=False) + fetch_map = dict( + zip(train_info_dict['fetch_name_list'], + range(len(train_outs)))) + + loss = np.mean(np.array(train_outs[fetch_map['total_loss']])) + lr = np.mean(np.array(train_outs[fetch_map['lr']])) + acc = np.mean(np.array(train_outs[fetch_map['acc']])) + + t2 = time.time() + train_batch_elapse = t2 - t1 + stats = {'loss': loss, 'acc': acc} + train_stats.update(stats) + if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \ + % print_batch_step == 0: + logs = train_stats.log() + strs = 'epoch: {}, iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format( + epoch, train_batch_id, lr, logs, train_batch_elapse) + logger.info(strs) + + if train_batch_id > 0 and\ + train_batch_id % eval_batch_step == 0: + model_average = train_info_dict['model_average'] + if model_average != None: + model_average.apply(exe) + metrics = eval_cls_run(exe, eval_info_dict) + eval_acc = metrics['avg_acc'] + eval_sample_num = metrics['total_sample_num'] + if eval_acc > best_eval_acc: + best_eval_acc = eval_acc + best_batch_id = train_batch_id + best_epoch = epoch + save_path = save_model_dir + "/best_accuracy" + save_model(train_info_dict['train_program'], save_path) + strs = 'Test iter: {}, acc:{:.6f}, best_acc:{:.6f}, best_epoch:{}, best_batch_id:{}, eval_sample_num:{}'.format( + train_batch_id, eval_acc, best_eval_acc, best_epoch, + best_batch_id, eval_sample_num) + logger.info(strs) + train_batch_id += 1 + + except fluid.core.EOFException: + train_loader.reset() + if epoch == 0 and save_epoch_step == 1: + save_path = save_model_dir + "/iter_epoch_0" + save_model(train_info_dict['train_program'], save_path) + if epoch > 0 and epoch % save_epoch_step == 0: + save_path = save_model_dir + "/iter_epoch_%d" % (epoch) + save_model(train_info_dict['train_program'], save_path) + return + + +def preprocess(): + # load config from yml file + FLAGS = ArgsParser().parse_args() + config = load_config(FLAGS.config) + merge_config(FLAGS.opt) + logger.info(config) + + # check if set use_gpu=True in paddlepaddle cpu version + use_gpu = config['Global']['use_gpu'] + check_gpu(use_gpu) + + # check whether the set algorithm belongs to the supported algorithm list + alg = config['Global']['algorithm'] + assert alg in [ + 'EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN', 'CLS' + ] + if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN']: + config['Global']['char_ops'] = CharacterOps(config['Global']) + + place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() + startup_program = fluid.Program() + train_program = fluid.Program() + + if alg in ['EAST', 'DB', 'SAST']: + train_alg_type = 'det' + elif alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN']: + train_alg_type = 'rec' + else: + train_alg_type = 'cls' + + return startup_program, train_program, place, config, train_alg_type diff --git a/tools/test_hubserving.py b/tools/test_hubserving.py index ea5929069eab71c0e8ca347ce05ea1b0b94b495e..f28ff39e441e9f0d8a4c6e1081827daf8aff9792 100644 --- a/tools/test_hubserving.py +++ b/tools/test_hubserving.py @@ -41,19 +41,19 @@ def draw_server_result(image_file, res): if len(res) == 0: return np.array(image) keys = res[0].keys() - if 'text_region' not in keys: # for ocr_rec, draw function is invalid - print("draw function is invalid for ocr_rec!") + if 'text_region' not in keys: # for ocr_rec, draw function is invalid + logger.info("draw function is invalid for ocr_rec!") return None - elif 'text' not in keys: # for ocr_det - print("draw text boxes only!") + elif 'text' not in keys: # for ocr_det + logger.info("draw text boxes only!") boxes = [] for dno in range(len(res)): boxes.append(res[dno]['text_region']) boxes = np.array(boxes) draw_img = draw_boxes(image, boxes) return draw_img - else: # for ocr_system - print("draw boxes and texts!") + else: # for ocr_system + logger.info("draw boxes and texts!") boxes = [] texts = [] scores = [] @@ -63,7 +63,8 @@ def draw_server_result(image_file, res): scores.append(res[dno]['confidence']) boxes = np.array(boxes) scores = np.array(scores) - draw_img = draw_ocr(image, boxes, texts, scores, draw_txt=True, drop_score=0.5) + draw_img = draw_ocr( + image, boxes, texts, scores, draw_txt=True, drop_score=0.5) return draw_img @@ -81,13 +82,13 @@ def main(url, image_path): # 发送HTTP请求 starttime = time.time() - data = {'images':[cv2_to_base64(img)]} + data = {'images': [cv2_to_base64(img)]} r = requests.post(url=url, headers=headers, data=json.dumps(data)) elapse = time.time() - starttime total_time += elapse - print("Predict time of %s: %.3fs" % (image_file, elapse)) + logger.info("Predict time of %s: %.3fs" % (image_file, elapse)) res = r.json()["results"][0] - # print(res) + logger.info(res) if is_visualize: draw_img = draw_server_result(image_file, res) @@ -98,17 +99,18 @@ def main(url, image_path): cv2.imwrite( os.path.join(draw_img_save, os.path.basename(image_file)), draw_img[:, :, ::-1]) - print("The visualized image saved in {}".format( + logger.info("The visualized image saved in {}".format( os.path.join(draw_img_save, os.path.basename(image_file)))) cnt += 1 if cnt % 100 == 0: - print(cnt, "processed") - print("avg time cost: ", float(total_time)/cnt) + logger.info("{} processed".format(cnt)) + logger.info("avg time cost: {}".format(float(total_time) / cnt)) -if __name__ == '__main__': + +if __name__ == '__main__': if len(sys.argv) != 3: - print("Usage: %s server_url image_path" % sys.argv[0]) + logger.info("Usage: %s server_url image_path" % sys.argv[0]) else: server_url = sys.argv[1] image_path = sys.argv[2] - main(server_url, image_path) \ No newline at end of file + main(server_url, image_path) diff --git a/tools/train.py b/tools/train.py index ccdf3c70907eca311407b81548da653cbb1b987f..cf0171b340f8cebb6251d2ef12efb14d3cdb709e 100755 --- a/tools/train.py +++ b/tools/train.py @@ -18,9 +18,9 @@ from __future__ import print_function import os import sys -__dir__ = os.path.dirname(__file__) +__dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) -sys.path.append(os.path.join(__dir__, '..')) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) def set_paddle_flags(**kwargs): @@ -42,34 +42,20 @@ from ppocr.utils.utility import initial_logger logger = initial_logger() from ppocr.data.reader_main import reader_main from ppocr.utils.save_load import init_model -from ppocr.utils.character import CharacterOps from paddle.fluid.contrib.model_stat import summary def main(): - config = program.load_config(FLAGS.config) - program.merge_config(FLAGS.opt) - logger.info(config) - - # check if set use_gpu=True in paddlepaddle cpu version - use_gpu = config['Global']['use_gpu'] - program.check_gpu(use_gpu) - - alg = config['Global']['algorithm'] - assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE'] - if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']: - config['Global']['char_ops'] = CharacterOps(config['Global']) - - place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() - startup_program = fluid.Program() - train_program = fluid.Program() + # build train program train_build_outputs = program.build( config, train_program, startup_program, mode='train') train_loader = train_build_outputs[0] train_fetch_name_list = train_build_outputs[1] train_fetch_varname_list = train_build_outputs[2] train_opt_loss_name = train_build_outputs[3] + model_average = train_build_outputs[-1] + # build eval program eval_program = fluid.Program() eval_build_outputs = program.build( config, eval_program, startup_program, mode='eval') @@ -77,9 +63,11 @@ def main(): eval_fetch_varname_list = eval_build_outputs[2] eval_program = eval_program.clone(for_test=True) + # initialize train reader train_reader = reader_main(config=config, mode="train") train_loader.set_sample_list_generator(train_reader, places=place) + # initialize eval reader eval_reader = reader_main(config=config, mode="eval") exe = fluid.Executor(place) @@ -91,7 +79,8 @@ def main(): # dump mode structure if config['Global']['debug']: - if 'attention' in config['Global']['loss_type']: + if train_alg_type == 'rec' and 'attention' in config['Global'][ + 'loss_type']: logger.warning('Does not suport dump attention...') else: summary(train_program) @@ -102,23 +91,24 @@ def main(): 'train_program':train_program,\ 'reader':train_loader,\ 'fetch_name_list':train_fetch_name_list,\ - 'fetch_varname_list':train_fetch_varname_list} + 'fetch_varname_list':train_fetch_varname_list,\ + 'model_average': model_average} eval_info_dict = {'program':eval_program,\ 'reader':eval_reader,\ 'fetch_name_list':eval_fetch_name_list,\ 'fetch_varname_list':eval_fetch_varname_list} - if alg in ['EAST', 'DB']: + if train_alg_type == 'det': program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict) - else: + elif train_alg_type == 'rec': program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict) + else: + program.train_eval_cls_run(config, exe, train_info_dict, eval_info_dict) def test_reader(): - config = program.load_config(FLAGS.config) - program.merge_config(FLAGS.opt) - print(config) + logger.info(config) train_reader = reader_main(config=config, mode="train") import time starttime = time.time() @@ -129,14 +119,14 @@ def test_reader(): if count % 1 == 0: batch_time = time.time() - starttime starttime = time.time() - print("reader:", count, len(data), batch_time) + logger.info("reader:", count, len(data), batch_time) except Exception as e: logger.info(e) logger.info("finish reader: {}, Success!".format(count)) if __name__ == '__main__': - parser = program.ArgsParser() - FLAGS = parser.parse_args() + startup_program, train_program, place, config, train_alg_type = program.preprocess( + ) main() # test_reader() diff --git a/train_data/gen_label.py b/train_data/gen_label.py new file mode 100644 index 0000000000000000000000000000000000000000..552f279f34efa0be437d404273c510585da12f83 --- /dev/null +++ b/train_data/gen_label.py @@ -0,0 +1,74 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +import os +import argparse + + +def gen_rec_label(input_path, out_label): + with open(out_label, 'w') as out_file: + with open(input_path, 'r') as f: + for line in f.readlines(): + tmp = line.strip('\n').replace(" ", "").split(',') + img_path, label = tmp[0], tmp[1] + label = label.replace("\"", "") + out_file.write(img_path + '\t' + label + '\n') + + +def gen_det_label(root_path, input_dir, out_label): + with open(out_label, 'w') as out_file: + for label_file in os.listdir(input_dir): + img_path = root_path + label_file[3:-4] + ".jpg" + label = [] + with open(os.path.join(input_dir, label_file), 'r') as f: + for line in f.readlines(): + tmp = line.strip("\n\r").replace("\xef\xbb\xbf", "").split(',') + points = tmp[:-2] + s = [] + for i in range(0, len(points), 2): + b = points[i:i + 2] + s.append(b) + result = {"transcription": tmp[-1], "points": s} + label.append(result) + out_file.write(img_path + '\t' + str(label) + '\n') + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument( + '--mode', + type=str, + default="rec", + help='Generate rec_label or det_label, can be set rec or det') + parser.add_argument( + '--root_path', + type=str, + default=".", + help='The root directory of images.Only takes effect when mode=det ') + parser.add_argument( + '--input_path', + type=str, + default=".", + help='Input_label or input path to be converted') + parser.add_argument( + '--output_label', + type=str, + default="out_label.txt", + help='Output file name') + + args = parser.parse_args() + if args.mode == "rec": + print("Generate rec label") + gen_rec_label(args.input_path, args.output_label) + elif args.mode == "det": + gen_det_label(args.root_path, args.input_path, args.output_label)