diff --git a/.gitignore b/.gitignore index 1a2dd675e961f1804fa58e2e2e49118536b84ce9..9eecb4f1056fc040d4c9579d593bee2cc4013837 100644 --- a/.gitignore +++ b/.gitignore @@ -21,3 +21,7 @@ output/ *.log .clang-format .clang_format.hook + +build/ +dist/ +paddleocr.egg-info/ \ No newline at end of file diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..388882df0c3701780dd6371bc91887356a7bca40 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,8 @@ +include LICENSE.txt +include README.md + +recursive-include ppocr/utils *.txt utility.py character.py check.py +recursive-include ppocr/data/det *.py +recursive-include ppocr/postprocess *.py +recursive-include ppocr/postprocess/lanms *.* +recursive-include tools/infer *.py diff --git a/README.md b/README.md index 08a27d8e7e2dd0a1ddcc774b0dd19189fcfb248b..c828163fcf1cba448b749bcb795749b5a00f686d 100644 --- a/README.md +++ b/README.md @@ -1,230 +1,209 @@ -English | [简体中文](README_cn.md) +[English](README_en.md) | 简体中文 -## Introduction -PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice. +## 简介 +PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 -**Recent updates** -- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294) -- 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519) -- 2020.7.15, Add mobile App demo , support both iOS and Android ( based on easyedge and Paddle Lite) -- 2020.7.15, Improve the deployment ability, add the C + + inference , serving deployment. In addition, the benchmarks of the ultra-lightweight OCR model are provided. -- 2020.7.15, Add several related datasets, data annotation and synthesis tools. 
-- [more](./doc/doc_en/update_en.md) +**近期更新** +- 2020.8.26 更新OCR相关的85个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) +- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md) +- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) +- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294) +- 2020.7.23 发布7月21日B站直播课回放和PPT,课节1,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519) +- 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统 +- [more](./doc/doc_ch/update.md) -## Features -- Ultra-lightweight OCR model, total model size is only 8.6M - - Single model supports Chinese/English numbers combination recognition, vertical text recognition, long text recognition - - Detection model DB (4.1M) + recognition model CRNN (4.5M) -- Various text detection algorithms: EAST, DB -- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE -- Support Linux, Windows, macOS and other systems. -## Visualization +## 特性 +- 超轻量级中文OCR模型,总模型仅8.6M + - 单模型支持中英文数字组合识别、竖排文本识别、长文本识别 + - 检测模型DB(4.1M)+识别模型CRNN(4.5M) +- 实用通用中文OCR模型 +- 多种预测推理部署方案,包括服务部署和端侧部署 +- 多种文本检测训练算法,EAST、DB、SAST +- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE、SRN +- 可运行于Linux、Windows、MacOS等多种系统 -![](doc/imgs_results/11.jpg) +## 快速体验 -![](doc/imgs_results/img_10.jpg) - -[More visualization](./doc/doc_en/visualization_en.md) +
+ +
-You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) +上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 -Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) +- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr +- 移动端DEMO体验(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统):[安装包二维码获取地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) - Also, you can scan the QR code below to install the App (**Android support only**) + Android手机也可以扫描下面二维码安装体验。
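The 2020.8.24 update above adds installation via a whl package (`pip install paddleocr`, per the linked whl.md usage doc). For orientation, that doc describes `PaddleOCR.ocr()` as returning one `[box, (text, confidence)]` entry per detected text line; the sketch below consumes a hand-written result in that shape. The sample values and the `keep_confident_text` helper are illustrative, not real model output or library API:

```python
# Illustrative only: a result in the documented [box, (text, confidence)]
# shape that paddleocr's PaddleOCR.ocr() returns; values are made up.
sample_result = [
    [[[24, 36], [304, 34], [304, 72], [24, 78]], ("PaddleOCR", 0.97)],
    [[[30, 90], [300, 88], [300, 120], [30, 122]], ("whl demo", 0.42)],
]

def keep_confident_text(result, drop_score=0.5):
    # Keep only lines whose recognition confidence clears the threshold,
    # similar in spirit to the inference pipeline's drop_score parameter.
    return [text for _box, (text, score) in result if score >= drop_score]

print(keep_confident_text(sample_result))  # ['PaddleOCR']
```

Actual use would be `pip install paddleocr` followed by `from paddleocr import PaddleOCR`; see the whl documentation linked in the updates list for the supported options.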
-- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md) - +## 中文OCR模型列表 -### Supported Models: - -|Model Name|Description |Detection Model link|Recognition Model link| Support for space Recognition Model link| +|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| |-|-|-|-|-| -|db_crnn_mobile|ultra-lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) - - -## Tutorials -- [Installation](./doc/doc_en/installation_en.md) -- [Quick Start](./doc/doc_en/quickstart_en.md) -- Algorithm introduction - - [Text Detection Algorithm](#TEXTDETECTIONALGORITHM) - - [Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM) - - [END-TO-END OCR Algorithm](#ENDENDOCRALGORITHM) -- Model training/evaluation - - [Text Detection](./doc/doc_en/detection_en.md) - - [Text Recognition](./doc/doc_en/recognition_en.md) - - [Yml Configuration](./doc/doc_en/config_en.md) - - [Tricks](./doc/doc_en/tricks_en.md) -- Deployment - - [Python 
Inference](./doc/doc_en/inference_en.md) - - [C++ Inference](./deploy/cpp_infer/readme_en.md) - - [Serving](./doc/doc_en/serving_en.md) - - [Mobile](./deploy/lite/readme_en.md) - - Model Quantization and Compression (coming soon) - - [Benchmark](./doc/doc_en/benchmark_en.md) -- Datasets - - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md) - - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md) - - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md) - - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md) - - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md) -- [FAQ](#FAQ) -- Visualization - - [Ultra-lightweight Chinese/English OCR Visualization](#UCOCRVIS) - - [General Chinese/English OCR Visualization](#GeOCRVIS) - - [Chinese/English OCR Visualization (Support Space Recognition )](#SpaceOCRVIS) -- [Community](#Community) -- [References](./doc/doc_en/reference_en.md) -- [License](#LICENSE) -- [Contribution](#CONTRIBUTION) - - -## Text Detection Algorithm - -PaddleOCR open source text detection algorithms list: +|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / 
[预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) + +## 文档教程 +- [快速安装](./doc/doc_ch/installation.md) +- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) +- 算法介绍 + - [文本检测](#文本检测算法) + - [文本识别](#文本识别算法) +- 模型训练/评估 + - [文本检测](./doc/doc_ch/detection.md) + - [文本识别](./doc/doc_ch/recognition.md) + - [yml参数配置文件介绍](./doc/doc_ch/config.md) + - [中文OCR训练预测技巧](./doc/doc_ch/tricks.md) +- 预测部署 + - [基于Python预测引擎推理](./doc/doc_ch/inference.md) + - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) + - [服务化部署](./doc/doc_ch/serving.md) + - [端侧部署](./deploy/lite/readme.md) + - 模型量化压缩(coming soon) + - [Benchmark](./doc/doc_ch/benchmark.md) +- 数据集 + - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) + - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) + - [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md) + - [常用数据标注工具](./doc/doc_ch/data_annotation.md) + - [常用数据合成工具](./doc/doc_ch/data_synthesis.md) +- 效果展示 + - [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示) + - [通用中文OCR效果展示](#通用中文OCR效果展示) + - [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示) +- FAQ + - [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md) + - [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md) + - [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md) +- [技术交流群](#欢迎加入PaddleOCR技术交流群) +- [参考文献](./doc/doc_ch/reference.md) +- [许可证书](#许可证书) +- [贡献代码](#贡献代码) + + +## 算法介绍 + +### 1.文本检测算法 + +PaddleOCR开源的文本检测算法列表: - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(百度自研) -On the ICDAR2015 dataset, the text detection result is as follows: +在ICDAR2015文本检测公开数据集上,算法效果如下: -|Model|Backbone|precision|recall|Hmean|Download link| +|模型|骨干网络|precision|recall|Hmean|下载链接| |-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download 
link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| -On Total-Text dataset, the text detection result is as follows: +在Total-text文本检测公开数据集上,算法效果如下: -|Model|Backbone|precision|recall|Hmean|Download link| +|模型|骨干网络|precision|recall|Hmean|下载链接| |-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| + +**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) + -**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). 
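The Hmean column in the detection tables above is the harmonic mean (F1 score) of precision and recall, which can be sanity-checked against the reported rows:

```python
def hmean(precision, recall):
    # Harmonic mean of precision and recall (the F1 score), as used in
    # the ICDAR2015 / Total-Text detection tables; values are in percent.
    return 2 * precision * recall / (precision + recall)

# EAST / ResNet50_vd row: 88.18 / 85.51 -> reported Hmean 86.82
print(round(hmean(88.18, 85.51), 2))  # 86.82
# SAST / ResNet50_vd row: 92.18 / 82.96 -> reported Hmean 87.33
print(round(hmean(92.18, 82.96), 2))  # 87.33
```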
+使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下: -For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 3w training data,the related configuration and pre-trained models for text detection task are as follows: -|Model|Backbone|Configuration file|Pre-trained model| +|模型|骨干网络|配置文件|预训练模型| |-|-|-|-| -|ultra-lightweight OCR model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| -|General OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| +|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| +|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| -* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result. 
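The DB models in this section are evaluated with the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. As a rough sketch of what those two knobs do — assuming the standard DB formulation, where candidate boxes scoring below box_thresh are dropped and surviving polygons are expanded by an offset of area × unclip_ratio / perimeter (this is a simplified illustration, not PaddleOCR's actual post-processing code):

```python
def db_postprocess_sketch(boxes, box_thresh=0.6, unclip_ratio=1.5):
    # boxes: list of (score, area, perimeter) for candidate text regions.
    # Returns (score, expansion_offset) for each box that passes the
    # threshold, using the DB expansion distance D = A * r / L.
    kept = []
    for score, area, perimeter in boxes:
        if score < box_thresh:
            continue  # box_thresh filters out low-confidence regions
        kept.append((score, area * unclip_ratio / perimeter))
    return kept

# Unit square (area 1, perimeter 4) at score 0.8 -> offset 0.375;
# the 0.5-score box is filtered out by box_thresh=0.6.
print(db_postprocess_sketch([(0.8, 1.0, 4.0), (0.5, 1.0, 4.0)]))  # [(0.8, 0.375)]
```

Raising unclip_ratio grows the detected boxes, and lowering box_thresh keeps more marginal detections, which is why the note below the table suggests tuning both per dataset.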
+* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 -For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) +PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。 - -## Text Recognition Algorithm + +### 2.文本识别算法 -PaddleOCR open-source text recognition algorithms list: +PaddleOCR开源的文本识别算法列表: - [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) - [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) - [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) +- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(百度自研) -Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: +参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: -|Model|Backbone|Avg Accuracy|Module combination|Download link| +|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| |-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download 
link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| -|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| -|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| -|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| - -**Note:** SRN model uses data expansion method to expand the two training sets mentioned above, and the expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry). - -The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar). - -We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w training data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the model. 
The related configuration and pre-trained models are as follows: - -|Model|Backbone|Configuration file|Pre-trained model| +|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| +|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| +|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| +|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| +|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| +|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| + +**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 +原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 + +使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下: + +|模型|骨干网络|配置文件|预训练模型| |-|-|-|-| -|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained 
model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| -|General OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| - -Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) +|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| +|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| - -## END-TO-END OCR Algorithm -- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, coming soon) +PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。 -## Visualization +## 效果展示 - -### 1.Ultra-lightweight Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md) + +### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
- -### 2. General Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md) + +### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
- -### 3.Chinese/English OCR Visualization (Space_support) [more](./doc/doc_en/visualization_en.md) + +### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
- - -## FAQ -1. Error when using attention-based recognition model: KeyError: 'predict' - - The inference of recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as the one based on CTC loss. - -2. About inference speed - - When there are a lot of texts in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or other values. - -3. Service deployment and mobile deployment - - It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates. - -4. Release time of self-developed algorithm - - Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient. - -[more](./doc/doc_en/FAQ_en.md) - - -## Community -Scan the QR code below with your wechat and completing the questionnaire, you can access to offical technical exchange group. + +## 欢迎加入PaddleOCR技术交流群 +请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍
- -## License -This project is released under Apache 2.0 license - - -## Contribution -We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. - -- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation. -- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualize function、add .gitgnore and discard set PYTHONPATH manually. -- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure. -- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets. -- Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively. -- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style. -- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services. 
+ +## 许可证书 +本项目的发布受Apache 2.0 license许可认证。 + + +## 贡献代码 +我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。 + +- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 和 [Karl Horky](https://github.com/karlhorky) 贡献修改英文文档 +- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题 +- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码 +- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集 +- 非常感谢 [authorfu](https://github.com/authorfu) 贡献Android和[xiadeye](https://github.com/xiadeye) 贡献IOS的demo代码 +- 非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。 +- 非常感谢 [tangmq](https://gitee.com/tangmq) 给PaddleOCR增加Docker化部署服务,支持快速发布可调用的Restful API服务。 diff --git a/README_cn.md b/README_cn.md deleted file mode 100644 index 10bcbc505ac25d856c60ec16ad758be7011af751..0000000000000000000000000000000000000000 --- a/README_cn.md +++ /dev/null @@ -1,228 +0,0 @@ -[English](README.md) | 简体中文 - -## 简介 -PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 - -**近期更新** -- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294) -- 2020.7.23 发布7月21日B站直播课回放和PPT,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519) -- 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统 -- 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark -- 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具 -- [more](./doc/doc_ch/update.md) - - -## 特性 -- 超轻量级中文OCR模型,总模型仅8.6M - - 单模型支持中英文数字组合识别、竖排文本识别、长文本识别 - - 检测模型DB(4.1M)+识别模型CRNN(4.5M) -- 实用通用中文OCR模型 -- 多种预测推理部署方案,包括服务部署和端侧部署 -- 多种文本检测训练算法,EAST、DB -- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE -- 可运行于Linux、Windows、MacOS等多种系统 - -## 快速体验 - -
- -
- -上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 - -- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr -- 移动端DEMO体验(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统):[安装包二维码获取地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) - - Android手机也可以扫描下面二维码安装体验。 - -
- -
- -- [**中文OCR模型快速使用**](./doc/doc_ch/quickstart.md) - - -## 中文OCR模型列表 - -|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| -|-|-|-|-|-| -|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) - -## 文档教程 -- [快速安装](./doc/doc_ch/installation.md) -- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) -- 算法介绍 - - [文本检测](#文本检测算法) - - [文本识别](#文本识别算法) - - [端到端OCR](#端到端OCR算法) -- 模型训练/评估 - - [文本检测](./doc/doc_ch/detection.md) - - [文本识别](./doc/doc_ch/recognition.md) - - [yml参数配置文件介绍](./doc/doc_ch/config.md) - - [中文OCR训练预测技巧](./doc/doc_ch/tricks.md) -- 预测部署 - - [基于Python预测引擎推理](./doc/doc_ch/inference.md) - - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - - [服务化部署](./doc/doc_ch/serving.md) - - [端侧部署](./deploy/lite/readme.md) - - 模型量化压缩(coming soon) - - [Benchmark](./doc/doc_ch/benchmark.md) -- 数据集 - - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) - - [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md) - - [常用数据标注工具](./doc/doc_ch/data_annotation.md) - - 
[常用数据合成工具](./doc/doc_ch/data_synthesis.md) -- [FAQ](#FAQ) -- 效果展示 - - [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示) - - [通用中文OCR效果展示](#通用中文OCR效果展示) - - [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示) -- [技术交流群](#欢迎加入PaddleOCR技术交流群) -- [参考文献](./doc/doc_ch/reference.md) -- [许可证书](#许可证书) -- [贡献代码](#贡献代码) - - -## 算法介绍 - -### 1.文本检测算法 - -PaddleOCR开源的文本检测算法列表: -- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) -- [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(百度自研) - -在ICDAR2015文本检测公开数据集上,算法效果如下: - -|模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| - -在Total-text文本检测公开数据集上,算法效果如下: - -|模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| - -**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) - - -使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下: - -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| -|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| - -* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 - 
-PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。 - - -### 2.文本识别算法 - -PaddleOCR开源的文本识别算法列表: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) -- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) -- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(百度自研) - -参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: - -|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| -|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| -|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| -|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| - -**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 -原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 - 
-使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下: - -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| - -PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。 - - -### 3.端到端OCR算法 -- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(百度自研, coming soon) - -## 效果展示 - - -### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md) - -
- -
- - -### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md) - -
- -
- - -### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md) - -
- -
- - -## FAQ -1. **转换attention识别模型时报错:KeyError: 'predict'** -问题已解,请更新到最新代码。 - -2. **关于推理速度** -图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num,默认值为30,可以改为10或其他数值。 - -3. **服务部署与移动端部署** -预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案,欢迎持续关注。 - -4. **自研算法发布时间** -自研算法SAST、SRN、End2End-PSL都将在7-8月陆续发布,敬请期待。 - -[more](./doc/doc_ch/FAQ.md) - - -## 欢迎加入PaddleOCR技术交流群 -请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍 - -
- -
- -## 许可证书 -本项目的发布受Apache 2.0 license许可认证。 - - -## 贡献代码 -我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。 - -- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 和 [Karl Horky](https://github.com/karlhorky) 贡献修改英文文档 -- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题 -- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码 -- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集 -- 非常感谢 [authorfu](https://github.com/authorfu) 贡献Android和[xiadeye](https://github.com/xiadeye) 贡献IOS的demo代码 -- 非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。 -- 非常感谢 [tangmq](https://gitee.com/tangmq) 给PaddleOCR增加Docker化部署服务,支持快速发布可调用的Restful API服务。 diff --git a/README_en.md b/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..37250da2cd3f6ccee76b522bf10745ecb8cd649e --- /dev/null +++ b/README_en.md @@ -0,0 +1,231 @@ +English | [简体中文](README.md) + +## Introduction +PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice. + +**Recent updates** +- 2020.8.24 Support the use of PaddleOCR through whl package installation, please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md) +- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294) +- 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519) +- 2020.7.15, Add mobile App demo, supporting both iOS and Android (based on EasyEdge and Paddle Lite) +- 2020.7.15, Improve the deployment ability, add C++ inference and serving deployment. In addition, the benchmarks of the ultra-lightweight OCR model are provided.
+- 2020.7.15, Add several related datasets, data annotation and synthesis tools. +- [more](./doc/doc_en/update_en.md) + +## Features +- Ultra-lightweight OCR model, total model size is only 8.6M + - Single model supports Chinese/English numbers combination recognition, vertical text recognition, long text recognition + - Detection model DB (4.1M) + recognition model CRNN (4.5M) +- Various text detection algorithms: EAST, DB +- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE +- Support Linux, Windows, macOS and other systems. + +## Visualization + +![](doc/imgs_results/11.jpg) + +![](doc/imgs_results/img_10.jpg) + +[More visualization](./doc/doc_en/visualization_en.md) + +You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) + +Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) + + Also, you can scan the QR code below to install the App (**Android support only**) + +
+ +
+ +- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md) + + + +### Supported Models: + +|Model Name|Description |Detection Model link|Recognition Model link| Support for space Recognition Model link| +|-|-|-|-|-| +|db_crnn_mobile|ultra-lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) + + +## Tutorials +- [Installation](./doc/doc_en/installation_en.md) +- [Quick Start](./doc/doc_en/quickstart_en.md) +- Algorithm introduction + - [Text Detection Algorithm](#TEXTDETECTIONALGORITHM) + - [Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM) + - [END-TO-END OCR Algorithm](#ENDENDOCRALGORITHM) +- Model training/evaluation + - [Text Detection](./doc/doc_en/detection_en.md) + - [Text Recognition](./doc/doc_en/recognition_en.md) + - [Yml Configuration](./doc/doc_en/config_en.md) + - [Tricks](./doc/doc_en/tricks_en.md) +- Deployment + - [Python Inference](./doc/doc_en/inference_en.md) + - [C++ 
Inference](./deploy/cpp_infer/readme_en.md) + - [Serving](./doc/doc_en/serving_en.md) + - [Mobile](./deploy/lite/readme_en.md) + - Model Quantization and Compression (coming soon) + - [Benchmark](./doc/doc_en/benchmark_en.md) +- Datasets + - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md) + - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md) + - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md) + - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md) + - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md) +- [FAQ](#FAQ) +- Visualization + - [Ultra-lightweight Chinese/English OCR Visualization](#UCOCRVIS) + - [General Chinese/English OCR Visualization](#GeOCRVIS) + - [Chinese/English OCR Visualization (Support Space Recognition )](#SpaceOCRVIS) +- [Community](#Community) +- [References](./doc/doc_en/reference_en.md) +- [License](#LICENSE) +- [Contribution](#CONTRIBUTION) + + +## Text Detection Algorithm + +PaddleOCR open source text detection algorithms list: +- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) +- [x] DB([paper](https://arxiv.org/abs/1911.08947)) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research) + +On the ICDAR2015 dataset, the text detection result is as follows: + +|Model|Backbone|precision|recall|Hmean|Download link| +|-|-|-|-|-|-| +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| + +On Total-Text dataset, the text detection 
result is as follows: + +|Model|Backbone|precision|recall|Hmean|Download link| +|-|-|-|-|-|-| +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| + +**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). + +For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 3w training data,the related configuration and pre-trained models for text detection task are as follows: +|Model|Backbone|Configuration file|Pre-trained model| +|-|-|-|-| +|ultra-lightweight OCR model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| +|General OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| + +* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result. 
+ +For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) + + +## Text Recognition Algorithm + +PaddleOCR open-source text recognition algorithms list: +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) +- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) +- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) +- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) + +Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: + +|Model|Backbone|Avg Accuracy|Module combination|Download link| +|-|-|-|-|-| +|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| +|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| +|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| +|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| +|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| 
+|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| + +**Note:** SRN model uses data expansion method to expand the two training sets mentioned above, and the expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry). + +The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar). + +We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w training data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the model. The related configuration and pre-trained models are as follows: + +|Model|Backbone|Configuration file|Pre-trained model| +|-|-|-|-| +|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)| +|General OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| + +Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) + + +## END-TO-END OCR Algorithm +- [ ] 
[End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, coming soon) + +## Visualization + + +### 1.Ultra-lightweight Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md) + +
+ +
+ + +### 2. General Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md) + +
+ +
+ + +### 3.Chinese/English OCR Visualization (Space_support) [more](./doc/doc_en/visualization_en.md) + +
+ +
+
+<a name="FAQ"></a>
+## FAQ
+1. Error when using the attention-based recognition model: KeyError: 'predict'
+
+    The inference of recognition models based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, we have also found that recognition models based on attention loss are less effective than those based on CTC loss.
+
+2. About inference speed
+
+    When there is a lot of text in an image, the prediction time increases. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or another value.
+
+3. Service deployment and mobile deployment
+
+    Service deployment based on Serving and mobile deployment based on Paddle Lite are now available; see the Deployment section above for details.
+
+4. Release time of self-developed algorithms
+
+    The self-developed algorithms SAST and SRN have been open sourced (see the recent updates above); End2End-PSL is coming soon.
+
+[more](./doc/doc_en/FAQ_en.md)
+
+<a name="Community"></a>
+## Community
+Scan the QR code below with your Wechat and complete the questionnaire to get access to the official technical exchange group.
+
+ +
+
+<a name="LICENSE"></a>
+## License
+This project is released under the Apache 2.0 license.
+
+<a name="CONTRIBUTION"></a>
+## Contribution
+We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
+
+- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
+- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualization function, adding .gitignore, and removing the need to set PYTHONPATH manually.
+- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
+- Thanks to [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR dataset.
+- Thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and [xiadeye](https://github.com/xiadeye) for contributing the iOS demo.
+- Thanks to [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
+- Thanks to [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable RESTful API services.
diff --git a/__init__.py b/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..7d94f66be072067172d56da13d8bb27d9aeac431
--- /dev/null
+++ b/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +__all__ = ['PaddleOCR', 'draw_ocr'] +from .paddleocr import PaddleOCR +from .tools.infer.utility import draw_ocr diff --git a/deploy/cpp_infer/include/config.h b/deploy/cpp_infer/include/config.h index c5257d8ade72bfc7a68c5ca5c2c78fd5b6c1983c..8db693b121f1f91e30672de53e9b969babb49f8b 100644 --- a/deploy/cpp_infer/include/config.h +++ b/deploy/cpp_infer/include/config.h @@ -41,6 +41,8 @@ public: this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"])); + this->use_zero_copy_run = bool(stoi(config_map_["use_zero_copy_run"])); + this->max_side_len = stoi(config_map_["max_side_len"]); this->det_db_thresh = stod(config_map_["det_db_thresh"]); @@ -68,6 +70,8 @@ public: bool use_mkldnn = false; + bool use_zero_copy_run = false; + int max_side_len = 960; double det_db_thresh = 0.3; diff --git a/deploy/cpp_infer/include/ocr_det.h b/deploy/cpp_infer/include/ocr_det.h index ed2667eecfea9a09d7da77df37f43a7b9e9bb349..0308d07f3bac67a275452500184e0959b16e8003 100644 --- a/deploy/cpp_infer/include/ocr_det.h +++ b/deploy/cpp_infer/include/ocr_det.h @@ -39,8 +39,8 @@ public: explicit DBDetector(const std::string &model_dir, const bool &use_gpu, const int &gpu_id, const int &gpu_mem, const int &cpu_math_library_num_threads, - const bool &use_mkldnn, const int &max_side_len, - const double &det_db_thresh, + const bool &use_mkldnn, const bool &use_zero_copy_run, + const int &max_side_len, const double &det_db_thresh, const double &det_db_box_thresh, const double &det_db_unclip_ratio, const bool &visualize) { @@ -49,6 +49,7 @@ public: this->gpu_mem_ = gpu_mem; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->use_mkldnn_ = use_mkldnn; + this->use_zero_copy_run_ = use_zero_copy_run; this->max_side_len_ = max_side_len; @@ -75,6 +76,7 @@ private: int gpu_mem_ = 4000; int cpu_math_library_num_threads_ = 4; bool use_mkldnn_ = false; + bool use_zero_copy_run_ = 
false; int max_side_len_ = 960; diff --git a/deploy/cpp_infer/include/ocr_rec.h b/deploy/cpp_infer/include/ocr_rec.h index 471aeb58758d1de1d48b4da1067c8532457ddc92..520f0f2879dcec6b30861755b119227efa11b29c 100644 --- a/deploy/cpp_infer/include/ocr_rec.h +++ b/deploy/cpp_infer/include/ocr_rec.h @@ -38,12 +38,14 @@ public: explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu, const int &gpu_id, const int &gpu_mem, const int &cpu_math_library_num_threads, - const bool &use_mkldnn, const string &label_path) { + const bool &use_mkldnn, const bool &use_zero_copy_run, + const string &label_path) { this->use_gpu_ = use_gpu; this->gpu_id_ = gpu_id; this->gpu_mem_ = gpu_mem; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->use_mkldnn_ = use_mkldnn; + this->use_zero_copy_run_ = use_zero_copy_run; this->label_list_ = Utility::ReadDict(label_path); this->label_list_.push_back(" "); @@ -64,6 +66,7 @@ private: int gpu_mem_ = 4000; int cpu_math_library_num_threads_ = 4; bool use_mkldnn_ = false; + bool use_zero_copy_run_ = false; std::vector label_list_; diff --git a/deploy/cpp_infer/src/main.cpp b/deploy/cpp_infer/src/main.cpp index 27c98e5b84367de09f95c901d168c2d318902c43..1dd33b301e8b7da1df2a6325cedb10b8156c43d2 100644 --- a/deploy/cpp_infer/src/main.cpp +++ b/deploy/cpp_infer/src/main.cpp @@ -48,14 +48,15 @@ int main(int argc, char **argv) { cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR); - DBDetector det(config.det_model_dir, config.use_gpu, config.gpu_id, - config.gpu_mem, config.cpu_math_library_num_threads, - config.use_mkldnn, config.max_side_len, config.det_db_thresh, - config.det_db_box_thresh, config.det_db_unclip_ratio, - config.visualize); + DBDetector det( + config.det_model_dir, config.use_gpu, config.gpu_id, config.gpu_mem, + config.cpu_math_library_num_threads, config.use_mkldnn, + config.use_zero_copy_run, config.max_side_len, config.det_db_thresh, + config.det_db_box_thresh, config.det_db_unclip_ratio, 
config.visualize);
 
   CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id,
                      config.gpu_mem, config.cpu_math_library_num_threads,
-                     config.use_mkldnn, config.char_list_file);
+                     config.use_mkldnn, config.use_zero_copy_run,
+                     config.char_list_file);
 
   auto start = std::chrono::system_clock::now();
   std::vector<std::vector<std::vector<int>>> boxes;
diff --git a/deploy/cpp_infer/src/ocr_det.cpp b/deploy/cpp_infer/src/ocr_det.cpp
index c87b653ceab011ef0593e7fb87358325deaf882b..56fbace8cc6fa27f8172bed248573f15d0c98dac 100644
--- a/deploy/cpp_infer/src/ocr_det.cpp
+++ b/deploy/cpp_infer/src/ocr_det.cpp
@@ -31,7 +31,8 @@ void DBDetector::LoadModel(const std::string &model_dir) {
   }
 
   // false for zero copy tensor
-  config.SwitchUseFeedFetchOps(false);
+  // true for common tensor
+  config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
 
   // true for multiple input
   config.SwitchSpecifyInputNames(true);
@@ -59,12 +60,22 @@ void DBDetector::Run(cv::Mat &img,
   std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
   this->permute_op_.Run(&resize_img, input.data());
 
-  auto input_names = this->predictor_->GetInputNames();
-  auto input_t = this->predictor_->GetInputTensor(input_names[0]);
-  input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
-  input_t->copy_from_cpu(input.data());
-
-  this->predictor_->ZeroCopyRun();
+  // Inference.
+  if (this->use_zero_copy_run_) {
+    auto input_names = this->predictor_->GetInputNames();
+    auto input_t = this->predictor_->GetInputTensor(input_names[0]);
+    input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
+    input_t->copy_from_cpu(input.data());
+    this->predictor_->ZeroCopyRun();
+  } else {
+    paddle::PaddleTensor input_t;
+    input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
+    input_t.data =
+        paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
+    input_t.dtype = PaddleDType::FLOAT32;
+    std::vector<paddle::PaddleTensor> outputs;
+    this->predictor_->Run({input_t}, &outputs, 1);
+  }
 
   std::vector<float> out_data;
   auto output_names = this->predictor_->GetOutputNames();
diff --git a/deploy/cpp_infer/src/ocr_rec.cpp b/deploy/cpp_infer/src/ocr_rec.cpp
index bbd7b9b2269ca776c2433d502893986ebda809c3..a3486db46f6eb6ad0df49619744924e6ef70dd01 100644
--- a/deploy/cpp_infer/src/ocr_rec.cpp
+++ b/deploy/cpp_infer/src/ocr_rec.cpp
@@ -39,18 +39,29 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
     this->permute_op_.Run(&resize_img, input.data());
 
-    auto input_names = this->predictor_->GetInputNames();
-    auto input_t = this->predictor_->GetInputTensor(input_names[0]);
-    input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
-    input_t->copy_from_cpu(input.data());
-
-    this->predictor_->ZeroCopyRun();
+    // Inference.
+    if (this->use_zero_copy_run_) {
+      auto input_names = this->predictor_->GetInputNames();
+      auto input_t = this->predictor_->GetInputTensor(input_names[0]);
+      input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
+      input_t->copy_from_cpu(input.data());
+      this->predictor_->ZeroCopyRun();
+    } else {
+      paddle::PaddleTensor input_t;
+      input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
+      input_t.data =
+          paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
+      input_t.dtype = PaddleDType::FLOAT32;
+      std::vector<paddle::PaddleTensor> outputs;
+      this->predictor_->Run({input_t}, &outputs, 1);
+    }
 
     std::vector<int64_t> rec_idx;
     auto output_names = this->predictor_->GetOutputNames();
     auto output_t = this->predictor_->GetOutputTensor(output_names[0]);
     auto rec_idx_lod = output_t->lod();
     auto shape_out = output_t->shape();
+
     int out_num = std::accumulate(shape_out.begin(), shape_out.end(), 1,
                                   std::multiplies<int>());
@@ -120,7 +131,8 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
   }
 
   // false for zero copy tensor
-  config.SwitchUseFeedFetchOps(false);
+  // true for common tensor
+  config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
 
   // true for multiple input
   config.SwitchSpecifyInputNames(true);
diff --git a/deploy/cpp_infer/tools/config.txt b/deploy/cpp_infer/tools/config.txt
index a049fc7d9dfaac88e69581b7c0aad8af8a9efaab..40beea3a2e6f0260a42202d6411ffb10907bf871 100644
--- a/deploy/cpp_infer/tools/config.txt
+++ b/deploy/cpp_infer/tools/config.txt
@@ -4,6 +4,7 @@ gpu_id 0
 gpu_mem 4000
 cpu_math_library_num_threads 10
 use_mkldnn 0
+use_zero_copy_run 1
 
 # det config
 max_side_len 960
diff --git a/deploy/hubserving/ocr_det/params.py b/deploy/hubserving/ocr_det/params.py
index 0b950114f82d88f20d2ce521628ea9dda7740ab4..e88ab45c7bb548ef971465d4aaefb30d247ab17f 100644
--- a/deploy/hubserving/ocr_det/params.py
+++ b/deploy/hubserving/ocr_det/params.py
@@ -36,4 +36,6 @@ def read_params():
     # cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
     # cfg.use_space_char = True
 
-
return cfg \ No newline at end of file + cfg.use_zero_copy_run = False + + return cfg diff --git a/deploy/hubserving/ocr_rec/params.py b/deploy/hubserving/ocr_rec/params.py index a6b2ee1902d2073f94202a0e9268e7bd821dfa21..59772e2163d1d5f8279dee85432b5bf93502914e 100644 --- a/deploy/hubserving/ocr_rec/params.py +++ b/deploy/hubserving/ocr_rec/params.py @@ -38,4 +38,6 @@ def read_params(): cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt" cfg.use_space_char = True - return cfg \ No newline at end of file + cfg.use_zero_copy_run = False + + return cfg diff --git a/deploy/hubserving/ocr_system/params.py b/deploy/hubserving/ocr_system/params.py index 6ece2d6fcfc703125ab5d1b3fa4566c43937d583..0ff56d37d50b30b09bb13b529a48a260dfe8f84a 100644 --- a/deploy/hubserving/ocr_system/params.py +++ b/deploy/hubserving/ocr_system/params.py @@ -38,4 +38,6 @@ def read_params(): cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt" cfg.use_space_char = True - return cfg \ No newline at end of file + cfg.use_zero_copy_run = False + + return cfg diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md index c184135e3527f27ea119cff46372edd0d9dde3fe..affe9e6da5251494aa531abd6fe1e75daf278ace 100644 --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -1,25 +1,286 @@ -## FAQ +# FAQ -1. **预测报错:got an unexpected keyword argument 'gradient_clip'** -安装的paddle版本不对,目前本项目仅支持paddle1.7,近期会适配到1.8。 +## 写在前面 -2. **转换attention识别模型时报错:KeyError: 'predict'** -问题已解决,请更新到最新代码。 +- 我们收集整理了issues和用户群中的常见问题和解答,并且会不断更新,旨在为OCR的开发者提供一些参考,也希望帮助大家少走一些弯路。 -3. **关于推理速度** -图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num,默认值为30,可以改为10或其他数值。 +- OCR领域大佬众多,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也**希望有识之士帮忙补充和修正**,万分感谢。 -4. **服务部署与移动端部署** -预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案,欢迎持续关注。 -5. **自研算法发布时间** -自研算法SAST、SRN、End2End-PSL都将在7-8月陆续发布,敬请期待。 +## PaddleOCR常见问题汇总(持续更新) -6. 
**如何在Windows或Mac系统上运行**
-PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[快速安装](./installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。2、inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。
+* [【精选】OCR精选10个问题](#【精选】OCR精选10个问题)
+* [【理论篇】OCR通用21个问题](#【理论篇】OCR通用问题)
+  * [基础知识3题](#基础知识)
+  * [数据集4题](#数据集)
+  * [模型训练调优6题](#模型训练调优)
+  * [预测部署8题](#预测部署)
+* [【实战篇】PaddleOCR实战54个问题](#【实战篇】PaddleOCR实战问题)
+  * [使用咨询18题](#使用咨询)
+  * [数据集9题](#数据集)
+  * [模型训练调优13题](#模型训练调优)
+  * [预测部署14题](#预测部署)
-7. **超轻量模型和通用OCR模型的区别**
-目前PaddleOCR开源了2个中文模型,分别是8.6M超轻量中文模型和通用中文OCR模型。两者对比信息如下:
+
+
+## 【精选】OCR精选10个问题
+
+#### Q1.1.1:基于深度学习的文字检测方法有哪几种?各有什么优缺点?
+
+**A**:常用的基于深度学习的文字检测方法一般可以分为基于回归的、基于分割的两大类,当然还有一些将两者进行结合的方法。
+(1)基于回归的方法分为box回归和像素值回归。a. 采用box回归的方法主要有CTPN、Textbox系列和EAST,这类算法对规则形状文本检测效果较好,但无法准确检测不规则形状文本。b. 像素值回归的方法主要有CRAFT和SA-Text,这类算法能够检测弯曲文本,且对小文本效果优秀,但是实时性能不够。
+(2)基于分割的算法,如PSENet,这类算法不受文本形状的限制,对各种形状的文本都能取得较好的效果,但是往往后处理比较复杂,导致耗时严重。目前也有一些算法专门针对这个问题进行改进,如DB,将二值化进行近似,使其可导,融入训练,从而获取更准确的边界,大大降低了后处理的耗时。
+
+#### Q1.1.2:对于中文行文本识别,CTC和Attention哪种更优?
+
+**A**:(1)从效果上来看,通用OCR场景CTC的识别效果优于Attention,因为待识别的字典中的字符比较多,常用中文汉字三千字以上,在训练样本不足的情况下,对于这些字符的序列关系挖掘比较困难。中文场景下Attention模型的优势无法体现。而且Attention适合短语句识别,对长句子识别比较差。
+(2)从训练和预测速度上,Attention的串行解码结构限制了预测速度,而CTC网络结构更高效,预测速度上更有优势。
+
+#### Q1.1.3:弯曲形变的文字识别需要怎么处理?TPS应用场景是什么,是否好用?
+
+**A**:(1)在大多数情况下,如果遇到的场景弯曲形变不是太严重,检测4个顶点,然后直接通过仿射变换转正识别就足够了。
+(2)如果不能满足需求,可以尝试使用TPS(Thin Plate Spline),即薄板样条插值。TPS是一种插值算法,经常用于图像变形等,通过少量的控制点就可以驱动图像进行变化。一般用在有弯曲形变的文本识别中,当检测到不规则的/弯曲的(如,使用基于分割的方法检测算法)文本区域,往往先使用TPS算法对文本区域矫正成矩形再进行识别,如,STAR-Net、RARE等识别算法中引入了TPS模块。
+**Warning**:TPS看起来美好,在实际应用时经常发现并不够鲁棒,并且会增加耗时,需要谨慎使用。
+
+#### Q1.1.4:简单的对于精度要求不高的OCR任务,数据集需要准备多少张呢?
+
+**A**:(1)训练数据的数量和需要解决问题的复杂度有关系。难度越大,精度要求越高,则数据集需求越大,而且一般情况实际中的训练数据越多效果越好。
+(2)对于精度要求不高的场景,检测任务和识别任务需要的数据量是不一样的。对于检测任务,500张图像可以保证基本的检测效果。对于识别任务,需要保证识别字典中每个字符出现在不同场景的行文本图像数目需要大于200张(举例,如果字典中有5个字,每个字都需要出现在200张图片以上,那么最少要求的图像数量应该在200-1000张之间),这样可以保证基本的识别效果。
+
+#### Q1.1.5:背景干扰的文字(如印章盖到落款上,需要识别落款或者印章中的文字),如何识别?
+
+**A**:(1)在人眼确认可识别的条件下,对于背景有干扰的文字,首先要保证检测框足够准确,如果检测框不准确,需要考虑是否可以通过过滤颜色等方式对图像预处理并且增加更多相关的训练数据;在识别的部分,注意在训练数据中加入背景干扰类的扩增图像。
+(2)如果MobileNet模型不能满足需求,可以尝试ResNet系列大模型来获得更好的效果。
+
+#### Q1.1.6:OCR领域常用的评估指标是什么?
+
+**A**:对于两阶段的方法,可以分开来看,分别是检测阶段和识别阶段。
+
+(1)检测阶段:先按照检测框和标注框的IOU评估,IOU大于某个阈值判断为检测准确。这里检测框和标注框不同于一般的通用目标检测框,是采用多边形进行表示。
+
+检测准确率:正确的检测框个数在全部检测框的占比,主要是判断误检的指标。
+
+检测召回率:正确的检测框个数在全部标注框的占比,主要是判断漏检的指标。
+
+
+(2)识别阶段:
+字符识别准确率,即正确识别的文本行占标注的文本行数量的比例,只有整行文本识别对才算正确识别。
+
+(3)端到端统计:
+端到端准确率:准确检测并正确识别文本行在全部标注文本行的占比;
+端到端召回率:准确检测并正确识别文本行在检测到的文本行数量的占比;准确检测的标准是检测框与标注框的IOU大于某个阈值,正确识别的检测框中的文本与标注的文本相同。
+
+
+#### Q1.1.7:单张图上多语种并存识别(如单张图印刷体和手写文字并存),应该如何处理?
+
+**A**:单张图像中存在多种类型文本的情况很常见,典型的以学生的试卷为代表,一张图像同时存在手写体和印刷体两种文本,这类情况下,可以尝试“1个检测模型+1个N分类模型+N个识别模型”的解决方案。
+其中不同类型文本共用同一个检测模型,N分类模型指额外训练一个分类器,将检测到的文本进行分类,如手写+印刷的情况就是二分类,N种语言就是N分类,在识别的部分,针对每个类型的文本单独训练一个识别模型,如手写+印刷的场景,就需要训练一个手写体识别模型,一个印刷体识别模型,如果一个文本框的分类结果是手写体,那么就传给手写体识别模型进行识别,其他情况同理。
+
+#### Q1.1.8:请问PaddleOCR项目中的中文超轻量和通用模型用了哪些数据集?训练多少样本,gpu什么配置,跑了多少个epoch,大概跑了多久?
+
+**A**:
+(1)检测的话,LSVT街景数据集共3W张图像,超轻量模型,150epoch左右,2卡V100 跑了不到2天;通用模型:2卡V100 150epoch 不到4天。
+(2)识别的话,520W左右的数据集(真实数据26W+合成数据500W)训练,超轻量模型:4卡V100,总共训练了5天左右。通用模型:4卡V100,共训练6天。
+
+超轻量模型训练分为2个阶段:
+(1)全量数据训练50epoch,耗时3天
+(2)合成数据+真实数据按照1:1数据采样,进行finetune训练200epoch,耗时2天
+
+通用模型训练:
+真实数据+合成数据,动态采样(1:1)训练,200epoch,耗时 6天左右。
+
+
+#### Q1.1.9:PaddleOCR模型推理方式有几种?各自的优缺点是什么?
+
+**A**:目前推理方式支持基于训练引擎推理和基于预测引擎推理。
+(1)基于训练引擎推理不需要转换模型,但是需要先组网再load参数,语言只支持python,不适合系统集成。
+(2)基于预测引擎的推理需要先转换模型为inference格式,然后可以进行不需要组网的推理,语言支持c++和python,适合系统集成。
+
+#### Q1.1.10:PaddleOCR中,对于模型预测加速,CPU加速的途径有哪些?基于TensorRT加速GPU对输入有什么要求?
+
+**A**:(1)CPU可以使用mkldnn进行加速;对于python inference的话,可以把enable_mkldnn改为true,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/tools/infer/utility.py#L73);对于cpp inference的话,在配置文件里面配置use_mkldnn 1即可,[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/deploy/cpp_infer/tools/config.txt#L6)
+(2)GPU需要注意变长输入问题等,TRT6之后才支持变长输入
+
+
+## 【理论篇】OCR通用问题
+### 基础知识
+
+#### Q2.1.1:CRNN能否识别两行的文字?还是说必须一行?
+
+**A**:CRNN是一种基于1D-CTC的算法,其原理决定无法识别2行或多行的文字,只能单行识别。
+
+#### Q2.1.2:怎么判断行文本图像是否是颠倒的?
+
+**A**:有两种方案:(1)原始图像和颠倒图像都进行识别预测,取得分较高的为识别结果。
+(2)训练一个正常图像和颠倒图像的方向分类器进行判断。
+
+#### Q2.1.3:目前OCR普遍是二阶段,端到端的方案在业界落地情况如何?
+
+**A**:端到端在文字分布密集的业务场景,效率会比较有保证,精度的话看自己业务数据积累情况,如果行级别的识别数据积累比较多的话two-stage会比较好。百度的落地场景,比如工业仪表识别、车牌识别都用到端到端解决方案。
+
+
+### 数据集
+
+#### Q2.2.1:支持空格的模型,标注数据的时候是不是要标注空格?中间几个空格都要标注出来么?
+
+**A**:如果需要模型检测和识别空格,就需要在标注的时候把空格标注出来,而且在字典中增加空格对应的字符。标注过程中,中间如果有几个连续空格,只标注一个空格即可。
+
+#### Q2.2.2:如果考虑支持竖排文字识别,相关的数据集如何合成?
+
+**A**:竖排文字与横排文字合成方式相同,只是选择了垂直字体。合成工具推荐:[text_renderer](https://github.com/Sanster/text_renderer)
+
+#### Q2.2.3:训练文字识别模型,真实数据有30w,合成数据有500w,需要做样本均衡吗?
+
+**A**:需要,一般需要保证一个batch中真实数据样本和合成数据样本的比例是1:1~1:3左右效果比较理想。如果合成数据过大,会过拟合到合成数据,预测效果往往不佳。还有一种**启发性**的尝试是可以先用大量合成数据训练一个base模型,然后再用真实数据微调,在一些简单场景效果也是会有提升的。
+
+#### Q2.2.4:请问一下,竖排文字识别时候,字的特征已经变了,这种情况在数据集和字典标注是新增一个类别还是多个角度的字共享一个类别?
+
+**A**:可以根据实际场景做不同的尝试,共享一个类别是可以收敛,效果也还不错。但是如果分开训练,同类样本之间一致性更好,更容易收敛,识别效果会更优。
+
+### 模型训练调优
+
+#### Q2.3.1:如何更换文本检测/识别的backbone?
+**A**:无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是,不同骨干网络的检测或识别效果,与其在ImageNet数据集1000类图像分类任务上的效果正相关。[**飞桨图像分类套件PaddleClas**](https://github.com/PaddlePaddle/PaddleClas)汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,以及它们在上述图像分类任务上的top1识别准确率、GPU(V100和T4)和CPU(骁龙855)的预测耗时,并提供了相应的[**117个预训练模型下载地址**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。
+  (1)文字检测骨干网络的替换,主要是确定类似于ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。
+  (2)文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中[MobileNetV3骨干网络](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py)的改动。
+
+#### Q2.3.2:文本识别训练不加LSTM是否可以收敛?
+
+**A**:理论上是可以收敛的,加上LSTM模块主要是为了挖掘文字之间的序列关系,提升识别效果。对于有明显上下文语义的场景效果会比较明显。
+
+#### Q2.3.3:文本识别中LSTM和GRU如何选择?
+
+**A**:从项目实践经验来看,序列模块采用LSTM的识别效果优于GRU,但是LSTM的计算量比GRU大一些,可以根据自己实际情况选择。
+
+#### Q2.3.4:对于CRNN模型,backbone采用DenseNet和ResNet_vd,哪种网络结构更好?
+
+**A**:Backbone在CRNN模型上的识别效果,与其在ImageNet 1000类图像分类任务上的识别效果和效率趋势一致。在图像分类任务上ResNet_vd(79%+)的识别精度明显优于DenseNet(77%+),此外对于GPU,Nvidia针对ResNet系列模型做了优化,预测效率更高,所以相对而言,resnet_vd是较好选择。如果是移动端,可以优先考虑MobileNetV3系列。
+
+#### Q2.3.5:训练识别时,如何选择合适的网络输入shape?
+
+**A**:一般高度采用32,最长宽度的选择,有两种方法:
+(1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。
+(2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。
+
+#### Q2.3.6:如何识别文字比较长的文本?
+
+**A**:在中文识别模型训练时,并不是采用直接将训练样本缩放到[3,32,320]进行训练,而是先等比例缩放图像,保证图像高度为32,宽度不足320的部分补0,宽高比大于10的样本直接丢弃。预测时,如果是单张图像预测,则按上述操作直接对图像缩放,不做宽度320的限制。如果是多张图预测,则采用batch方式预测,每个batch的宽度动态变换,采用这个batch中最长宽度。
+
+### 预测部署
+
+#### Q2.4.1:请问对于图片中的密集文字,有什么好的处理办法吗?
+
+**A**:可以先试用预训练模型测试一下,例如DB+CRNN,判断下密集文字图片中是检测还是识别的问题,然后针对性地改善。还有一种是如果图像中密集文字较小,可以尝试增大图像分辨率,对图像进行一定范围内的拉伸,将文字稀疏化,提高识别效果。
+
+#### Q2.4.2:对于一些在识别时稍微模糊的文本,有没有一些图像增强的方式?
+
+**A**:在人类肉眼可以识别的前提下,可以考虑图像处理中的均值滤波、中值滤波或者高斯滤波等模糊算子尝试。也可以尝试从数据扩增扰动来强化模型鲁棒性,另外新的思路有对抗性训练和超分SR思路,可以尝试借鉴。但目前业界尚无普遍认可的最优方案,建议优先在数据采集阶段增加一些限制提升图片质量。
+
+#### Q2.4.3:对于特定文字检测,例如身份证只检测姓名,检测指定区域文字更好,还是检测全部区域再筛选更好?
+
+**A**:从两个角度来说明,一般检测全部区域再筛选更好。
+(1)由于特定文字和非特定文字之间的视觉特征并没有很强的区分性,只检测指定区域,容易造成特定文字漏检。
+(2)产品的需求可能是变化的,不排除后续对于模型需求变化的可能性(比如又需要增加一个字段),相比于训练模型,后处理的逻辑会更容易调整。
+
+#### Q2.4.4:对于小白如何快速入门中文OCR项目实践?
+
+**A**:建议可以先了解OCR方向的基础知识,大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。从内容的完备性来看,PaddleOCR的中英文双语教程文档是有明显优势的,在数据集、模型训练、预测部署方面文档详实,可以快速入手。而且还有微信用户群答疑,非常适合学习实践。项目地址:[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+
+#### Q2.4.5:如何识别带空格的英文行文本图像?
+
+**A**:空格识别可以考虑以下两种方案:
+(1)优化文本检测算法。检测结果在空格处将文本断开。这种方案在检测数据标注时,需要将含有空格的文本行分成好多段。
+(2)优化文本识别算法。在识别字典里面引入空格字符,然后在识别的训练数据中,如有空格,进行标注。此外,合成数据时,通过拼接训练数据,生成含有空格的文本。
+
+#### Q2.4.6:中英文一起识别时也可以加空格字符来训练吗?
+
+**A**:中文识别可以加空格当做分隔符训练,具体的效果如何没法给出直接评判,根据实际业务数据训练来判断。
+
+#### Q2.4.7:低像素文字或者字号比较小的文字有什么超分辨率方法吗?
+
+**A**:超分辨率方法分为传统方法和基于深度学习的方法。基于深度学习的方法中,比较经典的有SRCNN,另外CVPR2020也有一篇超分辨率的工作可以参考:Unpaired Image Super-Resolution using Pseudo-Supervision,但是没有充分的实践验证过,需要看实际场景下的效果。
+
+#### Q2.4.8:表格识别有什么好的模型或者论文推荐么?
+
+**A**:表格识别目前学术界比较成熟的解决方案不多,可以尝试下分割的论文方案。
+
+
+
+## 【实战篇】PaddleOCR实战问题
+
+### 使用咨询
+
+#### Q3.1.1:OSError: [WinError 126] 找不到指定的模块。mac pro python 3.4 shapely import 问题
+
+**A**:这个问题是因为shapely库安装有误,可以参考 [#212](https://github.com/PaddlePaddle/PaddleOCR/issues/212) 这个issue重新安装一下
+
+#### Q3.1.2:安装了paddle-gpu,运行时提示没有安装gpu版本的paddle,可能是什么原因?
+
+**A**:用户同时安装了paddle cpu和gpu版本,都删掉之后,重新安装gpu版本的paddle就好了
+
+#### Q3.1.3:试用报错:Cannot load cudnn shared library,是什么原因呢?
+ +**A**:需要把cudnn lib添加到LD_LIBRARY_PATH中去。 + +#### Q3.1.4:PaddlePaddle怎么指定GPU运行 os.environ["CUDA_VISIBLE_DEVICES"]这种不生效 + +**A**:通过设置环境变量 export CUDA_VISIBLE_DEVICES='0' 来指定 + +#### Q3.1.5:windows下训练没有问题,aistudio中提示数据路径有问题 + +**A**:需要把`\`改为`/`(windows和linux的文件夹分隔符不一样,windows下的是`\`,linux下是`/`) + +#### Q3.1.6:gpu版的paddle虽然能在cpu上运行,但是必须要有gpu设备 + +**A**:设置 export CUDA_VISIBLE_DEVICES='' 后,CPU是可以正常跑的 + +#### Q3.1.7:预测报错ImportError: dlopen: cannot load any more object with static TLS + +**A**:glibc的版本问题,运行需要glibc的版本号大于2.23。 + +#### Q3.1.8:提供的inference model和预训练模型的区别 + +**A**:inference model为固化模型,文件中包含网络结构和网络参数,多用于预测部署。预训练模型是训练过程中保存好的模型,多用于fine-tune训练或者断点训练。 + +#### Q3.1.9:模型的解码部分有后处理吗? + +**A**:有的。检测的后处理在ppocr/postprocess路径下,识别的后处理均在ppocr/utils/character.py文件内 + +#### Q3.1.10:PaddleOCR中文模型是否支持数字识别? + +**A**:支持的,可以看下ppocr/utils/ppocr_keys_v1.txt 这个文件,是支持的识别字符列表,其中包含了数字识别。 + +#### Q3.1.11:PaddleOCR如何做到横排和竖排同时支持的? + +**A**:合成了一批竖排文字,逆时针旋转90度后加入训练集与横排一起训练。预测时根据图片长宽比判断是否为竖排,若为竖排则将crop出的文本逆时针旋转90度后送入识别网络。 + +#### Q3.1.12:如何获取检测文本框的坐标? + +**A**:文本检测的结果有box和文本信息,具体可[参考代码](https://github.com/PaddlePaddle/PaddleOCR/blob/9d33e36df550762b204d5fbfd7977a25e31b2c44/tools/infer/predict_system.py#L13) + +#### Q3.1.13:识别模型框出来的位置太紧凑,会丢失边缘的文字信息,导致识别错误 + +**A**: 可以在命令中加入 --det_db_unclip_ratio ,参考[定义位置](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L49),这个参数是检测后处理时控制文本框大小的,默认2.0,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。 + +#### Q3.1.14:英文手写体识别有计划提供的预训练模型吗? + +**A**:近期也在开展需求调研,如果企业用户需求较多,我们会考虑增加相应的研发投入,后续提供对应的预训练模型,如果有需求欢迎通过issue或者加入微信群联系我们。 + +#### Q3.1.15:超轻量模型和通用OCR模型的区别? + +**A**:目前PaddleOCR开源了2个中文模型,分别是8.6M超轻量中文模型和通用中文OCR模型。两者对比信息如下: - 相同点:两者使用相同的**算法**和**训练数据**; - 不同点:不同之处在于**骨干网络**和**通道参数**,超轻量模型使用MobileNetV3作为骨干网络,通用模型使用Resnet50_vd作为检测模型backbone,Resnet34_vd作为识别模型backbone,具体参数差异可对比两种模型训练的配置文件。 + + +#### Q3.1.16:PaddleOCR的算法可以用于手写文字检测识别吗?后续有计划推出手写预训练模型么? + +**A**:理论上只要有相应的数据集,都是可以的。当然手写识别毕竟和印刷体有区别,对应训练调优策略可能需要适配性优化。 + +#### Q3.1.17:PaddleOCR是否支持在Windows或Mac系统上运行?
+**A**:PaddleOCR已完成Windows和Mac系统适配,并且python预测支持使用pip包安装。运行时注意两点:1、在[快速安装](./installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。2、inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。 @@ -28,26 +289,186 @@ PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[ |8.6M超轻量中文OCR模型|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml| |通用中文OCR模型|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml| -8. **是否有计划开源仅识别数字或仅识别英文+数字的模型** -暂不计划开源仅数字、仅数字+英文、或其他小垂类专用模型。PaddleOCR开源了多种检测、识别算法供用户自定义训练,两种中文模型也是基于开源的算法库训练产出,有小垂类需求的小伙伴,可以按照教程准备好数据,选择合适的配置文件,自行训练,相信能有不错的效果。训练有任何问题欢迎提issue或在交流群提问,我们会及时解答。 +#### Q3.1.18:是否有计划开源仅识别数字或仅识别英文+数字的模型 + +**A**:目前主要是开源通用类OCR模型,暂不计划开源小垂类专用模型。PaddleOCR开源了多种检测、识别算法供用户自定义训练,两种中文模型也是基于开源的算法库训练产出,有小垂类需求的小伙伴,可以按照教程准备好数据,选择合适的配置文件,自行训练,相信能有不错的效果。训练有任何问题欢迎提issue或在交流群提问,我们会及时解答。 + + +### 数据集 -9. **开源模型使用的训练数据是什么,能否开源** -目前开源的模型,数据集和量级如下: +#### Q3.2.1:如何制作PaddleOCR支持的数据格式 + +**A**:可以参考检测与识别训练文档,里面有数据格式详细介绍。[检测文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md),[识别文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/recognition.md) + +#### Q3.2.2:请问一下,如果想用预训练模型,但是我的数据里面又出现了预训练模型字符集中没有的字符,新的字符是在字符集前面添加还是在后面添加? + +**A**:在后面添加,修改dict之后,就改变了模型最后一层fc的结构,之前训练到的参数没有用到,相当于从头训练,因此acc是0。 + +#### Q3.2.3:如何调试数据读取程序? + +**A**:tools/train.py中有一个test_reader()函数用于调试数据读取。 + +#### Q3.2.4:开源模型使用的训练数据是什么,能否开源? + +**A**:目前开源的模型,数据集和量级如下: - 检测: 英文数据集,ICDAR2015 中文数据集,LSVT街景数据集训练数据3w张图片 - 识别: 英文数据集,MJSynth和SynthText合成数据,数据量上千万。 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。 - 其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。 -10.
**使用带TPS的识别模型预测报错** -报错信息:Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3]\(320) != Grid dimension[2]\(100) -原因:TPS模块暂时无法支持变长的输入,请设置 --rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape +#### Q3.2.5:请问中文字符集多大呢?支持生僻字识别吗? + +**A**:中文字符集是6623, 支持生僻字识别。训练样本中有部分生僻字,但样本不多,如果有特殊需求建议使用自己的数据集做fine-tune。 + +#### Q3.2.6:中文文本检测、文本识别构建训练集的话,大概需要多少数据量 + +**A**:检测需要的数据相对较少,在PaddleOCR模型的基础上进行Fine-tune,一般需要500张可达到不错的效果。 +识别分英文和中文,一般英文场景需要几十万数据可达到不错的效果,中文则需要几百万甚至更多。 + +#### Q3.2.7:中文识别模型如何选择? + +**A**:中文模型共有2大类:通用模型和超轻量模型。它们各自的优势如下: +超轻量模型具有更小的模型大小,更快的预测速度。适合用于端侧使用。 +通用模型具有更高的模型精度,适合对模型大小不敏感的场景。 +此外基于以上模型,PaddleOCR还提供了支持空格识别的模型,主要针对中文场景中的英文句子。 +您可以根据实际使用需求进行选择。 + +#### Q3.2.8:图像旋转90° 文本检测可以正常检测到具体文本位置,但是识别准确度大幅降低,是否会考虑增加相应的旋转预处理? + +**A**:目前模型只支持两种方向的文字:水平和垂直。为了降低模型大小,加快模型预测速度,PaddleOCR暂时没有加入图片的方向判断。建议用户在识别前自行转正,后期也会考虑添加旋转角度判断。 + +#### Q3.2.9:同一张图通用检测出21个条目,轻量级检测出26个,难道不是轻量级的好吗? + +**A**:可以主要参考可视化效果,通用模型更倾向于检测一整行文字,轻量级可能会有一行文字被分成两段检测的情况,不是数量越多,效果就越好。 + +### 模型训练、调优 + +#### Q3.3.1:文本长度超过25,应该怎么处理? + +**A**:默认训练时的文本可识别的最大长度为25,超过25的文本会被忽略不参与训练。如果您训练样本中的长文本较多,可以修改配置文件中的 max\_text\_length 字段,设置为更大的最长文本长度,具体位置在[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/fb9e47b262529386983edc21b33abfa16bbf06ac/configs/rec/rec_chinese_lite_train.yml#L13)。 + +#### Q3.3.2:配置文件里面有检测的阈值设置么? + +**A**:有的,检测相关的参数主要有以下几个: +``max_side_len:预测时图像resize的长边尺寸 +thresh: 用于二值化输出图的阈值 +box_thresh:用于过滤文本框的阈值,低于此阈值的文本框将被丢弃 +unclip_ratio: 文本框扩张的系数,关系到文本框的大小`` + +这些参数的默认值见[代码](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L40),可以通过命令行传递参数进行修改。 + +#### Q3.3.3:我想请教一下,你们在训练识别时候,lsvt里的非矩形框文字,你们是怎么做处理的呢。忽略掉还是去最小旋转框?
+ +**A**:现在是忽略处理的 + +#### Q3.3.4:训练过程中,如何恰当地停止训练(直接kill,经常还有显存占用的问题) + +**A**:可以通过下面的命令终止所有包含train.py字段的进程:`ps -axu | grep train.py | awk '{print $2}' | xargs kill -9` + +#### Q3.3.5:读数据进程数设置4~8时训练一会进程接连defunct后gpu利用率一直为0卡死 + +**A**:修改多进程的队列数后解决, 将[代码段]( https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/ppocr/data/reader_main.py#L75 ) 修改为: + +``` +return paddle.reader.multiprocess_reader(readers, False, queue_size=320) +``` + +#### Q3.3.6:可不可以将pretrain_weights设置为空呢?想从零开始训练一个model + +**A**:这个是可以的,在训练通用识别模型的时候,pretrain_weights就设置为空,但是这样可能需要更长的迭代轮数才能达到相同的精度。 + +#### Q3.3.7:PaddleOCR默认不是200个step保存一次模型吗?为啥文件夹下面都没有生成 + +**A**:将eval_batch_step由[4000, 5000]改为[0, 5000],就是从第0次迭代开始,每5000次迭代保存一次模型 + +#### Q3.3.8:如何进行模型微调? + +**A**:注意配置好合适的数据集,然后在finetune训练时,可以加载我们提供的预训练模型,设置配置文件中Global.pretrain_weights 参数为要加载的预训练模型路径。 + +#### Q3.3.9:文本检测换成自己的数据没法训练,有一些”###”是什么意思? + +**A**:数据格式有问题,”###” 表示要被忽略的文本区域,所以你的数据都被跳过了,可以换成其他任意字符或者就写个空的。 + +#### Q3.3.10:copy_from_cpu这个地方,这块input不变(t_data的size不变)连续调用两次copy_from_cpu()时,这里面的gpu_place会重新malloc GPU内存吗?还是只有当ele_size变化时才会重新在GPU上malloc呢? + +**A**:小于等于的时候都不会重新分配,只有大于的时候才会重新分配 + +#### Q3.3.11:自己训练出来的未inference转换的模型可以当作预训练模型吗? + +**A**:可以的,但是如果训练数据量少的话,可能会过拟合到少量数据上,泛化性能不佳。 + +#### Q3.3.12:如何更换文本检测/识别的backbone? + +**A**:直接更换配置文件里的Backbone.function即可,格式为:网络文件路径,网络Class名称。如果所需的backbone在PaddleOCR里没有提供,可以参照PaddleClas里面的网络结构,进行修改尝试。具体修改原则可以参考OCR通用问题中 "如何更换文本检测/识别的backbone" 的回答。 + +#### Q3.3.13:使用带TPS的识别模型预测报错 + +**A**:报错信息为:`Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100)`。原因是TPS模块暂时无法支持变长的输入,请设置 --rec_image_shape='3,32,100' --rec_char_type='en' 固定输入shape + +### 预测部署 + +#### Q3.4.1:如何pip安装opt模型转换工具?
+ +**A**:由于OCR端侧部署需要某些算子的支持,这些算子仅在Paddle-Lite 最新develop分支中,所以需要自己编译opt模型转换工具。opt工具可以通过编译PaddleLite获得,编译步骤参考:https://github.com/PaddlePaddle/PaddleOCR/blob/0791714b91/deploy/lite/readme.md 中2.1 模型优化部分。 + +#### Q3.4.2:如何将PaddleOCR预测模型封装成SDK + +**A**:如果是Python的话,可以使用tools/infer/predict_system.py中的TextSystem进行sdk封装,如果是c++的话,可以使用deploy/cpp_infer/src下面的DBDetector和CRNNRecognizer完成封装 + +#### Q3.4.3:服务部署可以只发布文本识别模型么?(不带文本检测模型) + +**A**:可以的。默认的服务部署是检测和识别串联预测的。也支持单独发布文本检测或文本识别模型,比如使用PaddleHub PaddleOCR模型时,deploy下有三个文件夹,分别是 +ocr_det:检测预测 +ocr_rec: 识别预测 +ocr_system: 检测识别串联预测 +每个模块是单独分开的,所以可以选择只发布文本识别模型。使用PaddleServing部署时同理。 + + +#### Q3.4.4:为什么PaddleOCR检测预测是只支持一张图片测试?即test_batch_size_per_card=1 + +**A**:测试的时候,对图像等比例缩放,最长边960,不同图像等比例缩放后长宽不一致,无法组成batch,所以将test_batch_size设置为1。 + +#### Q3.4.5:为什么使用c++ inference和py inference结果不一致 + +**A**:导出的inference model版本与预测库版本需要保持一致,比如在Windows下,Paddle官网提供的预测库版本是1.8,而PaddleOCR提供的inference model 版本是1.7,因此最终预测结果会有差别。可以在Paddle1.8环境下导出模型,再基于该模型进行预测。 +此外也需要保证两者的预测参数配置完全一致。 + +#### Q3.4.6:为什么第一张图预测时间很长,第二张之后预测时间会降低? + +**A**:第一张图需要初始化,耗时较多。完成模型加载后,之后的预测时间很短。 + +#### Q3.4.7:请问opt工具可以直接转int8量化后的模型为.nb文件吗 + +**A**:有的,PaddleLite提供完善的opt工具,可以参考[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_with_data.html) + +#### Q3.4.8:请问在安卓端怎么设置这个参数 --det_db_unclip_ratio=3 + +**A**:在安卓APK上无法设置,没有暴露这个接口,如果使用的是PaddleOCR/deploy/lite/的demo,可以修改config.txt中的对应参数来设置 + +#### Q3.4.9:PaddleOCR模型是否可以转换成ONNX模型? + +**A**:目前不支持转ONNX + +#### Q3.4.10:使用opt工具对检测模型转换时报错 can not found op arguments for node conv2_b_attr + +**A**:这个问题大概率是编译opt工具的Paddle-Lite不是develop分支,建议使用Paddle-Lite 的develop分支编译opt工具。 + +#### Q3.4.11:libopenblas.so找不到是什么意思?
+ +**A**:目前包括mkl和openblas两种版本的预测库,推荐使用mkl的预测库,如果下载的预测库是mkl的,编译的时候也需要勾选`with_mkl`选项,以Linux下编译为例,需要将这里设置为ON:`-DWITH_MKL=ON`,[参考链接](https://github.com/PaddlePaddle/PaddleOCR/blob/8a78af26df0dd8f15b734cc8db13e25d2a3656a2/deploy/cpp_infer/tools/build.sh#L12)。此外,使用预测库时,推荐在Linux或者Windows上进行开发,不推荐在MacOS上开发。 +#### Q3.4.12:使用自定义字典训练,inference时如何修改 + +**A**:使用了自定义字典的话,用inference预测时,需要通过 --rec_char_dict_path 修改字典路径。详细操作可参考[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/inference.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E5%AD%97%E5%85%B8%E7%9A%84%E6%8E%A8%E7%90%86) + +#### Q3.4.13:能否返回单字字符的位置? -11. **自定义字典训练的模型,识别结果出现字典里没出现的字** -预测时没有设置采用的自定义字典路径。设置方法是在预测时,通过增加输入参数rec_char_dict_path来设置。 +**A**:训练的时候标注是整个文本行的标注,所以预测的也是文本行位置,如果要获取单字符位置信息,可以根据预测的文本,计算字符数量,再去根据整个文本行的位置信息,估计文本块中每个字符的位置。 -12. **cpp infer与python inference的结果不一致,相差较大** -导出的inference model版本与预测库版本需要保持一致,比如在Windows下,Paddle官网提供的预测库版本是1.8,而PaddleOCR提供的inference model 版本是1.7,因此最终预测结果会有差别。可以在Paddle1.8环境下导出模型,再基于该模型进行预测。 +#### Q3.4.14:PaddleOCR模型部署方式有哪几种?
+**A**:目前有Inference部署,serving部署和手机端Paddle Lite部署,可根据不同场景做灵活的选择:Inference部署适用于本地离线部署,serving部署适用于云端部署,Paddle Lite部署适用于手机端集成。 diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index 03fe1b3280881472c830cf5ac57dee183a94b373..fe8db9c893cf0e6190111de5fe7627d2fe52a4fd 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -63,8 +63,9 @@ | beta1 | 设置一阶矩估计的指数衰减率 | 0.9 | \ | | beta2 | 设置二阶矩估计的指数衰减率 | 0.999 | \ | | decay | 是否使用decay | \ | \ | -| function(decay) | 设置decay方式 | - | 目前支持cosine_decay与piecewise_decay | -| step_each_epoch | 每个epoch包含多少次迭代, cosine_decay时有效 | 20 | 计算方式:total_image_num / (batch_size_per_card * card_size) | -| total_epoch | 总共迭代多少个epoch, cosine_decay时有效 | 1000 | 与Global.epoch_num 一致 | +| function(decay) | 设置decay方式 | - | 目前支持cosine_decay, cosine_decay_warmup与piecewise_decay | +| step_each_epoch | 每个epoch包含多少次迭代, cosine_decay/cosine_decay_warmup时有效 | 20 | 计算方式:total_image_num / (batch_size_per_card * card_size) | +| total_epoch | 总共迭代多少个epoch, cosine_decay/cosine_decay_warmup时有效 | 1000 | 与Global.epoch_num 一致 | +| warmup_minibatch | 线性warmup的迭代次数, cosine_decay_warmup时有效 | 1000 | \ | | boundaries | 学习率下降时的迭代次数间隔, piecewise_decay时有效 | - | 参数为列表形式 | | decay_rate | 学习率衰减系数, piecewise_decay时有效 | - | \ | diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index fead57f3d12395c6b4a2417fe8a23b1e00a4579b..701b50ed36fc69a6285550e6f53f6f3a09a1a63d 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -5,6 +5,8 @@ 请先参考[快速安装](./installation.md)配置PaddleOCR运行环境。 +*注意:也可以通过 whl 包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)。* + ## 2.inference模型下载 |模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md index 1cd7788511c29df8934efe2c1462aaca68c9b92b..23a47df580da065af0ab62aca2c50e507f564f05 100644 --- a/doc/doc_ch/update.md +++ b/doc/doc_ch/update.md @@ -1,6 +1,8 @@ # 更新 +- 2020.8.24 
支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) +- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) - 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294) -- 2020.7.23 发布7月21日B站直播课回放和PPT,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519) +- 2020.7.23 发布7月21日B站直播课回放和PPT,课节1,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519) - 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统 - 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark - 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具 diff --git a/doc/doc_ch/whl.md b/doc/doc_ch/whl.md new file mode 100644 index 0000000000000000000000000000000000000000..280cc2f62ec40ec2228128c9ddd95088904f647b --- /dev/null +++ b/doc/doc_ch/whl.md @@ -0,0 +1,194 @@ +# paddleocr package使用说明 + +## 快速上手 + +### 安装whl包 + +pip安装 +```bash +pip install paddleocr +``` + +本地构建并安装 +```bash +python setup.py bdist_wheel +pip install dist/paddleocr-0.0.3-py3-none-any.whl +``` +### 1. 
代码使用 + +* 检测+识别全流程 +```python +from paddleocr import PaddleOCR, draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# 显示结果 +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` +结果可视化 + +
+ +
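上文 `ocr.ocr` 返回的 result 是由 [文本框, [文字, 置信度]] 组成的 list,拿到结果后常见的后处理是按置信度过滤并取出纯文本。下面给出一个最小示意(其中 `extract_texts` 函数名与 0.5 的阈值均为本文假设,并非 paddleocr 提供的接口):

```python
# 示意:按识别置信度过滤 result 并取出纯文本
# extract_texts 与 score_thresh 均为示例假设,非 paddleocr 内置接口
def extract_texts(result, score_thresh=0.5):
    texts = []
    for box, (text, score) in result:  # 每个 item 为 [文本框, [文字, 置信度]]
        if score >= score_thresh:
            texts.append(text)
    return texts

# 以上文示例输出中的前两条结果为输入
result = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]],
]
print(extract_texts(result))  # ['纯臻营养护发素', '产品信息/参数']
```

实际使用时可根据业务需要调整阈值,或同时保留文本框坐标。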
+ +* 单独执行检测 +```python +from paddleocr import PaddleOCR, draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path,rec=False) +for line in result: + print(line) + +# 显示结果 +from PIL import Image + +image = Image.open(img_path).convert('RGB') +im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` +结果是一个list,每个item只包含文本框 +```bash +[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]] +[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]] +[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]] +...... +``` +结果可视化 + + +
+ +
+ +* 单独执行识别 +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg' +result = ocr.ocr(img_path,det=False) +for line in result: + print(line) +``` +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +### 通过命令行使用 + +查看帮助信息 +```bash +paddleocr -h +``` + +* 检测+识别全流程 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg +``` +结果是一个list,每个item包含了文本框,文字和识别置信度 +```bash +[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] +[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] +[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] +...... +``` + +* 单独执行检测 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false +``` +结果是一个list,每个item只包含文本框 +```bash +[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]] +[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]] +[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]] +...... 
+``` + +* 单独执行识别 +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false +``` + +结果是一个list,每个item只包含识别结果和识别置信度 +```bash +['韩国小馆', 0.9907421] +``` + +## 自定义模型 +当内置模型无法满足需求时,需要使用到自己训练的模型。 +首先,参照[inference.md](./inference.md) 第一节转换将检测和识别模型转换为inference模型,然后按照如下方式使用 + +### 代码使用 +```python +from paddleocr import PaddleOCR, draw_ocr +# 检测模型和识别模型路径下必须含有model和params文件 +ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}') +img_path = 'PaddleOCR/doc/imgs/11.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# 显示结果 +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +### 通过命令行使用 + +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} +``` + +## 参数说明 + +| 字段 | 说明 | 默认值 | +|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------| +| use_gpu | 是否使用GPU | TRUE | +| gpu_mem | 初始化占用的GPU内存大小 | 8000M | +| image_dir | 通过命令行调用时执行预测的图片或文件夹路径 | | +| det_algorithm | 使用的检测算法类型 | DB | +| det_model_dir | 检测模型所在文件夹。传参方式有两种,1. 
None: 自动下载内置模型到 `~/.paddleocr/det`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | +| det_max_side_len | 检测算法前向时图片长边的最大尺寸,当长边超出这个值时会将长边resize到这个大小,短边等比例缩放 | 960 | +| det_db_thresh | DB模型输出预测图的二值化阈值 | 0.3 | +| det_db_box_thresh | DB模型输出框的阈值,低于此值的预测框会被丢弃 | 0.5 | +| det_db_unclip_ratio | DB模型输出框扩大的比例 | 2 | +| det_east_score_thresh | EAST模型输出预测图的二值化阈值 | 0.8 | +| det_east_cover_thresh | EAST模型输出框的阈值,低于此值的预测框会被丢弃 | 0.1 | +| det_east_nms_thresh | EAST模型输出框NMS的阈值 | 0.2 | +| rec_algorithm | 使用的识别算法类型 | CRNN | +| rec_model_dir | 识别模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None | +| rec_image_shape | 识别算法的输入图片尺寸 | "3,32,320" | +| rec_char_type | 识别算法的字符类型,中文(ch)或英文(en) | ch | +| rec_batch_num | 进行识别时,同时前向的图片数 | 30 | +| max_text_length | 识别算法能识别的最大文字长度 | 25 | +| rec_char_dict_path | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径 | ./ppocr/utils/ppocr_keys_v1.txt | +| use_space_char | 是否识别空格 | TRUE | +| enable_mkldnn | 是否启用mkldnn | FALSE | +| det | 前向时是否启动检测 | TRUE | +| rec | 前向时是否启动识别 | TRUE | diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index 66578424a60488a986eaff6fe937e4ffbc1bf59e..b54def895f0758df7cdbd089253d6acd712d2b8e 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -60,8 +60,9 @@ Take `rec_icdar15_train.yml` as an example: | beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | \ | | beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | \ | | decay | Whether to use decay | \ | \ | -| function(decay) | Set the decay function | cosine_decay | Support cosine_decay and piecewise_decay | -| step_each_epoch | The number of steps in an epoch. Used in cosine_decay | 20 | Calculation :total_image_num / (batch_size_per_card * card_size) | -| total_epoch | The number of epochs.
Used in cosine_decay | 1000 | Consistent with Global.epoch_num | +| function(decay) | Set the decay function | cosine_decay | Support cosine_decay, cosine_decay_warmup and piecewise_decay | +| step_each_epoch | The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup | 20 | Calculation: total_image_num / (batch_size_per_card * card_size) | +| total_epoch | The number of epochs. Used in cosine_decay/cosine_decay_warmup | 1000 | Consistent with Global.epoch_num | +| warmup_minibatch | Number of steps for linear warmup. Used in cosine_decay_warmup | 1000 | \ | | boundaries | The step intervals to reduce learning rate. Used in piecewise_decay | - | The format is list | | decay_rate | Learning rate decay rate. Used in piecewise_decay | - | \ | diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index bf22f22fee75a028e5f5effd6f7e36b08c194222..d1fa1683fcfea14be477c910fb2a8dc7709c5d36 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -5,6 +5,7 @@ Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment. 
+*Note: PaddleOCR also supports installation and use as a whl package, please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md).* ## 2.inference models diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md index dc839d8955afcfa2d1efbee5e02d35f384d6c627..ca050370989ba3cded8c7211b7ab297ebe239c5f 100644 --- a/doc/doc_en/update_en.md +++ b/doc/doc_en/update_en.md @@ -1,4 +1,5 @@ # RECENT UPDATES +- 2020.8.24 Support the use of PaddleOCR through whl package installation, please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md) - 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294) - 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519) - 2020.7.15, Add mobile App demo , support both iOS and Android ( based on easyedge and Paddle Lite) diff --git a/doc/doc_en/whl_en.md b/doc/doc_en/whl_en.md new file mode 100644 index 0000000000000000000000000000000000000000..73ab78c111fd4c59a7866ba061877cc91100fb93 --- /dev/null +++ b/doc/doc_en/whl_en.md @@ -0,0 +1,199 @@ +# paddleocr package + +## Get started quickly +### install package +install by pypi +```bash +pip install paddleocr +``` + +build own whl package and install +```bash +python setup.py bdist_wheel +pip install dist/paddleocr-0.0.3-py3-none-any.whl +``` +### 1.
Use by code + +* detection and recognition +```python +from paddleocr import PaddleOCR,draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# draw result +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +Visualization of results + +
+ +
+ +* only detection +```python +from paddleocr import PaddleOCR,draw_ocr +ocr = PaddleOCR() # need to run only once to download and load model into memory +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path,rec=False) +for line in result: + print(line) + +# draw result +from PIL import Image + +image = Image.open(img_path).convert('RGB') +im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +Output will be a list, each item only contains bounding box +```bash +[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]] +[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]] +[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]] +...... +``` + +Visualization of results + +
+ +
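Each box printed above is a quadrilateral given as four corner points. As a small illustrative helper (the name `quad_to_rect` is assumed here, not part of the paddleocr API), the quadrilateral can be reduced to an axis-aligned rectangle, e.g. for cropping the region out of the original image:

```python
# Illustrative helper: convert a 4-point detection box into an
# axis-aligned (left, top, right, bottom) rectangle.
# quad_to_rect is an assumed name, not part of the paddleocr package.
def quad_to_rect(quad):
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))

# First box from the sample detection output above
box = [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
print(quad_to_rect(box))  # (756.0, 812.0, 805.0, 830.0)
```

The resulting tuple follows PIL's `(left, upper, right, lower)` crop convention, so `Image.open(img_path).crop(quad_to_rect(box))` would extract the detected region.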
+ +* only recognition +```python +from paddleocr import PaddleOCR +ocr = PaddleOCR() # need to run only once to load model into memory +img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png' +result = ocr.ocr(img_path,det=False) +for line in result: + print(line) +``` + +Output will be a list, each item contains text and recognition confidence +```bash +['PAIN', 0.990372] +``` + +### Use by command line + +show help information +```bash +paddleocr -h +``` + +* detection and recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg +``` + +Output will be a list, each item contains bounding box, text and recognition confidence +```bash +[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]] +[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]] +[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]] +...... +``` + +* only detection +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false +``` + +Output will be a list, each item only contains bounding box +```bash +[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]] +[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]] +[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]] +...... +``` + +* only recognition +```bash +paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false +``` + +Output will be a list, each item contains text and recognition confidence +```bash +['PAIN', 0.990372] +``` + +## Use custom model +When the built-in model cannot meet the needs, you need to use your own trained model. +First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows + +### 1. 
Use by code + +```python +from paddleocr import PaddleOCR,draw_ocr +# The path of detection and recognition model must contain model and params files +ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}') +img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg' +result = ocr.ocr(img_path) +for line in result: + print(line) + +# draw result +from PIL import Image +image = Image.open(img_path).convert('RGB') +boxes = [line[0] for line in result] +txts = [line[1][0] for line in result] +scores = [line[1][1] for line in result] +im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf') +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +### Use by command line + +```bash +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} +``` + +## Parameter Description + +| Parameter | Description | Default value | +|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------| +| use_gpu | use GPU or not | TRUE | +| gpu_mem | GPU memory size used for initialization | 8000M | +| image_dir | The images path or folder path for predicting when used by the command line | | +| det_algorithm | Type of detection algorithm selected | DB | +| det_model_dir | the text detection inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/det`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None | +| det_max_side_len | The maximum size of the long side of the image.
When the long side exceeds this value, the long side will be resized to this size, and the short side will be scaled proportionally | 960 | +| det_db_thresh | Binarization threshold value of DB output map | 0.3 | +| det_db_box_thresh | The threshold value of the DB output box. Boxes with score lower than this value will be discarded | 0.5 | +| det_db_unclip_ratio | The expanded ratio of DB output box | 2 | +| det_east_score_thresh | Binarization threshold value of EAST output map | 0.8 | +| det_east_cover_thresh | The threshold value of the EAST output box. Boxes with score lower than this value will be discarded | 0.1 | +| det_east_nms_thresh | The NMS threshold value of EAST model output box | 0.2 | +| rec_algorithm | Type of recognition algorithm selected | CRNN | +| rec_model_dir | the text recognition inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/rec`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None | +| rec_image_shape | image shape of recognition algorithm | "3,32,320" | +| rec_char_type | Character type of recognition algorithm, Chinese (ch) or English (en) | ch | +| rec_batch_num | When performing recognition, the batchsize of forward images | 30 | +| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 | +| rec_char_dict_path | the alphabet path which needs to be modified to your own path when `rec_model_dir` uses mode 2 | ./ppocr/utils/ppocr_keys_v1.txt | +| use_space_char | Whether to recognize spaces | TRUE | +| enable_mkldnn | Whether to enable mkldnn | FALSE | +| det | Enable detection when `ppocr.ocr` func exec | TRUE | +| rec | Enable recognition when `ppocr.ocr` func exec | TRUE | diff --git a/doc/imgs_results/whl/11_det.jpg b/doc/imgs_results/whl/11_det.jpg new file mode 100644 index
0000000000000000000000000000000000000000..fe0cd23cc24457f5d7084fff0c63c239d09c9969 Binary files /dev/null and b/doc/imgs_results/whl/11_det.jpg differ diff --git a/doc/imgs_results/whl/11_det_rec.jpg b/doc/imgs_results/whl/11_det_rec.jpg new file mode 100644 index 0000000000000000000000000000000000000000..31c566478fd874d10a61dcd54635453e34c20e4c Binary files /dev/null and b/doc/imgs_results/whl/11_det_rec.jpg differ diff --git a/doc/imgs_results/whl/12_det.jpg b/doc/imgs_results/whl/12_det.jpg new file mode 100644 index 0000000000000000000000000000000000000000..1d5ccf2a6b5d3fa9516560e0cb2646ad6b917da6 Binary files /dev/null and b/doc/imgs_results/whl/12_det.jpg differ diff --git a/doc/imgs_results/whl/12_det_rec.jpg b/doc/imgs_results/whl/12_det_rec.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9db8b57e1279362db2c9f3d6a3ba36b77bf13775 Binary files /dev/null and b/doc/imgs_results/whl/12_det_rec.jpg differ diff --git a/paddleocr.py b/paddleocr.py new file mode 100644 index 0000000000000000000000000000000000000000..65bca7ae243e15e4788b5b637be65d57cf9504e5 --- /dev/null +++ b/paddleocr.py @@ -0,0 +1,212 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import sys + +__dir__ = os.path.dirname(__file__) +sys.path.append(os.path.join(__dir__, '')) + +import cv2 +import numpy as np +from pathlib import Path +import tarfile +import requests +from tqdm import tqdm + +from tools.infer import predict_system +from ppocr.utils.utility import initial_logger + +logger = initial_logger() +from ppocr.utils.utility import check_and_read_gif, get_image_file_list + +__all__ = ['PaddleOCR'] + +model_params = { + 'det': 'https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar', + 'rec': + 'https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar', +} + +SUPPORT_DET_MODEL = ['DB'] +SUPPORT_REC_MODEL = ['CRNN'] +BASE_DIR = os.path.expanduser("~/.paddleocr/") + + +def download_with_progressbar(url, save_path): + response = requests.get(url, stream=True) + total_size_in_bytes = int(response.headers.get('content-length', 0)) + block_size = 1024 # 1 Kibibyte + progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True) + with open(save_path, 'wb') as file: + for data in response.iter_content(block_size): + progress_bar.update(len(data)) + file.write(data) + progress_bar.close() + if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes: + logger.error("ERROR, something went wrong") + sys.exit(0) + + +def maybe_download(model_storage_directory, url): + # using custom model + if not os.path.exists(os.path.join( + model_storage_directory, 'model')) or not os.path.exists( + os.path.join(model_storage_directory, 'params')): + tmp_path = os.path.join(model_storage_directory, url.split('/')[-1]) + print('download {} to {}'.format(url, tmp_path)) + os.makedirs(model_storage_directory, exist_ok=True) + download_with_progressbar(url, tmp_path) + with tarfile.open(tmp_path, 'r') as tarObj: + for member in tarObj.getmembers(): + if "model" in member.name: + filename = 'model' + elif "params" in member.name: + filename = 'params' + else: + continue + file = 
tarObj.extractfile(member) + with open( + os.path.join(model_storage_directory, filename), + 'wb') as f: + f.write(file.read()) + os.remove(tmp_path) + + +def parse_args(): + import argparse + + def str2bool(v): + return v.lower() in ("true", "t", "1") + + parser = argparse.ArgumentParser() + # params for prediction engine + parser.add_argument("--use_gpu", type=str2bool, default=True) + parser.add_argument("--ir_optim", type=str2bool, default=True) + parser.add_argument("--use_tensorrt", type=str2bool, default=False) + parser.add_argument("--gpu_mem", type=int, default=8000) + + # params for text detector + parser.add_argument("--image_dir", type=str) + parser.add_argument("--det_algorithm", type=str, default='DB') + parser.add_argument("--det_model_dir", type=str, default=None) + parser.add_argument("--det_max_side_len", type=float, default=960) + + # DB params + parser.add_argument("--det_db_thresh", type=float, default=0.3) + parser.add_argument("--det_db_box_thresh", type=float, default=0.5) + parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0) + + # EAST params + parser.add_argument("--det_east_score_thresh", type=float, default=0.8) + parser.add_argument("--det_east_cover_thresh", type=float, default=0.1) + parser.add_argument("--det_east_nms_thresh", type=float, default=0.2) + + # params for text recognizer + parser.add_argument("--rec_algorithm", type=str, default='CRNN') + parser.add_argument("--rec_model_dir", type=str, default=None) + parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320") + parser.add_argument("--rec_char_type", type=str, default='ch') + parser.add_argument("--rec_batch_num", type=int, default=30) + parser.add_argument("--max_text_length", type=int, default=25) + parser.add_argument( + "--rec_char_dict_path", + type=str, + default="./ppocr/utils/ppocr_keys_v1.txt") + parser.add_argument("--use_space_char", type=bool, default=True) + parser.add_argument("--enable_mkldnn", type=bool, default=False) + + 
parser.add_argument("--det", type=str2bool, default=True) + parser.add_argument("--rec", type=str2bool, default=True) + return parser.parse_args() + + +class PaddleOCR(predict_system.TextSystem): + def __init__(self, **kwargs): + """ + paddleocr package + args: + **kwargs: other params shown in paddleocr --help + """ + postprocess_params = parse_args() + postprocess_params.__dict__.update(**kwargs) + + # init model dir + if postprocess_params.det_model_dir is None: + postprocess_params.det_model_dir = os.path.join(BASE_DIR, 'det') + if postprocess_params.rec_model_dir is None: + postprocess_params.rec_model_dir = os.path.join(BASE_DIR, 'rec') + print(postprocess_params) + # download model + maybe_download(postprocess_params.det_model_dir, model_params['det']) + maybe_download(postprocess_params.rec_model_dir, model_params['rec']) + + if postprocess_params.det_algorithm not in SUPPORT_DET_MODEL: + logger.error('det_algorithm must be in {}'.format(SUPPORT_DET_MODEL)) + sys.exit(0) + if postprocess_params.rec_algorithm not in SUPPORT_REC_MODEL: + logger.error('rec_algorithm must be in {}'.format(SUPPORT_REC_MODEL)) + sys.exit(0) + + postprocess_params.rec_char_dict_path = Path( + __file__).parent / postprocess_params.rec_char_dict_path + + # init det_model and rec_model + super().__init__(postprocess_params) + + def ocr(self, img, det=True, rec=True): + """ + ocr with paddleocr + args: + img: img for ocr, supports ndarray, img_path and list of ndarray + det: use text detection or not, if false, only rec will be executed. default is True + rec: use text recognition or not, if false, only det will be executed. 
default is True + """ + assert isinstance(img, (np.ndarray, list, str)) + if isinstance(img, str): + image_file = img + img, flag = check_and_read_gif(image_file) + if not flag: + img = cv2.imread(image_file) + if img is None: + logger.error("error in loading image:{}".format(image_file)) + return None + if det and rec: + dt_boxes, rec_res = self.__call__(img) + return [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)] + elif det and not rec: + dt_boxes, elapse = self.text_detector(img) + if dt_boxes is None: + return None + return [box.tolist() for box in dt_boxes] + else: + if not isinstance(img, list): + img = [img] + rec_res, elapse = self.text_recognizer(img) + return rec_res + + +def main(): + # for com + args = parse_args() + image_file_list = get_image_file_list(args.image_dir) + if len(image_file_list) == 0: + logger.error('no images found in {}'.format(args.image_dir)) + return + ocr_engine = PaddleOCR() + for img_path in image_file_list: + print(img_path) + result = ocr_engine.ocr(img_path, det=args.det, rec=args.rec) + for line in result: + print(line) diff --git a/ppocr/optimizer.py b/ppocr/optimizer.py index 55f2eba14c4be738c0dbc686cd32afbcff62f874..fd315cd1319d4925e893705957a42f931a39076e 100644 --- a/ppocr/optimizer.py +++ b/ppocr/optimizer.py @@ -14,14 +14,50 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function +import math import paddle.fluid as fluid from paddle.fluid.regularizer import L2Decay +from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter +import paddle.fluid.layers.ops as ops from ppocr.utils.utility import initial_logger logger = initial_logger() +def cosine_decay_with_warmup(learning_rate, + step_each_epoch, + epochs=500, + warmup_minibatch=1000): + """Applies cosine decay to the learning rate. + lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1) + decrease lr for every mini-batch and start with warmup. 
+ """ + global_step = _decay_step_counter() + lr = fluid.layers.tensor.create_global_var( + shape=[1], + value=0.0, + dtype='float32', + persistable=True, + name="learning_rate") + + warmup_minibatch = fluid.layers.fill_constant( + shape=[1], + dtype='float32', + value=float(warmup_minibatch), + force_cpu=True) + + with fluid.layers.control_flow.Switch() as switch: + with switch.case(global_step < warmup_minibatch): + decayed_lr = learning_rate * (1.0 * global_step / warmup_minibatch) + fluid.layers.tensor.assign(input=decayed_lr, output=lr) + with switch.default(): + decayed_lr = learning_rate * \ + (ops.cos((global_step - warmup_minibatch) * (math.pi / (epochs * step_each_epoch))) + 1)/2 + fluid.layers.tensor.assign(input=decayed_lr, output=lr) + return lr + + def AdamDecay(params, parameter_list=None): """ define optimizer function @@ -36,7 +72,9 @@ def AdamDecay(params, parameter_list=None): l2_decay = params.get("l2_decay", 0.0) if 'decay' in params: - supported_decay_mode = ["cosine_decay", "piecewise_decay"] + supported_decay_mode = [ + "cosine_decay", "cosine_decay_warmup", "piecewise_decay" + ] params = params['decay'] decay_mode = params['function'] assert decay_mode in supported_decay_mode, "Supported decay mode is {}, but got {}".format( @@ -49,6 +87,15 @@ def AdamDecay(params, parameter_list=None): learning_rate=base_lr, step_each_epoch=step_each_epoch, epochs=total_epoch) + elif decay_mode == "cosine_decay_warmup": + step_each_epoch = params['step_each_epoch'] + total_epoch = params['total_epoch'] + warmup_minibatch = params.get("warmup_minibatch", 1000) + base_lr = cosine_decay_with_warmup( + learning_rate=base_lr, + step_each_epoch=step_each_epoch, + epochs=total_epoch, + warmup_minibatch=warmup_minibatch) elif decay_mode == "piecewise_decay": boundaries = params["boundaries"] decay_rate = params["decay_rate"] @@ -104,5 +151,5 @@ def RMSProp(params, parameter_list=None): optimizer = fluid.optimizer.RMSProp( learning_rate=base_lr, 
regularization=fluid.regularizer.L2Decay(regularization_coeff=l2_decay)) - - return optimizer \ No newline at end of file + + return optimizer diff --git a/requirments.txt b/requirments.txt index 94e8478ffad88a6e5cd69424c6aa485400cfae06..ec538138beaed70ec8f5285ea0c4114f22e3b0ef 100644 --- a/requirments.txt +++ b/requirments.txt @@ -1,4 +1,6 @@ shapely imgaug pyclipper -lmdb \ No newline at end of file +lmdb +tqdm +numpy \ No newline at end of file diff --git a/setup.py b/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..7141f170f3afa2be5217faff66a2aeb12dbefcbe --- /dev/null +++ b/setup.py @@ -0,0 +1,56 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from setuptools import setup +from io import open + +with open('requirments.txt', encoding="utf-8-sig") as f: + requirements = f.readlines() + requirements.append('tqdm') + + +def readme(): + with open('doc/doc_en/whl_en.md', encoding="utf-8-sig") as f: + README = f.read() + return README + + +setup( + name='paddleocr', + packages=['paddleocr'], + package_dir={'paddleocr': ''}, + include_package_data=True, + entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]}, + version='0.0.3', + install_requires=requirements, + license='Apache License 2.0', + description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)', + long_description=readme(), + long_description_content_type='text/markdown', + url='https://github.com/PaddlePaddle/PaddleOCR', + download_url='https://github.com/PaddlePaddle/PaddleOCR.git', + keywords=[ + 'ocr textdetection textrecognition paddleocr crnn east star-net rosetta ocrlite db chineseocr chinesetextdetection chinesetextrecognition' + ], + classifiers=[ + 'Intended Audience :: Developers', 'Operating System :: OS Independent', + 'Natural Language :: Chinese (Simplified)', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.2', + 'Programming Language :: Python :: 3.3', + 'Programming Language :: Python :: 3.4', + 'Programming Language :: Python :: 3.5', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', 'Topic :: Utilities' + ], ) diff --git a/tools/infer/predict_det.py b/tools/infer/predict_det.py index 82877c0ef71b56e6afb4ea43725981640c0e8c64..625f87abc39fc0e9d7683f72dafec1d53324873a 100755 --- a/tools/infer/predict_det.py +++ b/tools/infer/predict_det.py @@ -17,28 +17,32 @@ __dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) +import cv2 +import copy +import 
numpy as np +import math +import time +import sys + +import paddle.fluid as fluid + import tools.infer.utility as utility from ppocr.utils.utility import initial_logger logger = initial_logger() from ppocr.utils.utility import get_image_file_list, check_and_read_gif -import cv2 from ppocr.data.det.sast_process import SASTProcessTest from ppocr.data.det.east_process import EASTProcessTest from ppocr.data.det.db_process import DBProcessTest from ppocr.postprocess.db_postprocess import DBPostProcess from ppocr.postprocess.east_postprocess import EASTPostPocess from ppocr.postprocess.sast_postprocess import SASTPostProcess -import copy -import numpy as np -import math -import time -import sys class TextDetector(object): def __init__(self, args): max_side_len = args.det_max_side_len self.det_algorithm = args.det_algorithm + self.use_zero_copy_run = args.use_zero_copy_run preprocess_params = {'max_side_len': max_side_len} postprocess_params = {} if self.det_algorithm == "DB": @@ -127,7 +131,7 @@ class TextDetector(object): dt_boxes_new.append(box) dt_boxes = np.array(dt_boxes_new) return dt_boxes - + def __call__(self, img): ori_im = img.copy() im, ratio_list = self.preprocess_op(img) @@ -135,8 +139,12 @@ class TextDetector(object): return None, 0 im = im.copy() starttime = time.time() - self.input_tensor.copy_from_cpu(im) - self.predictor.zero_copy_run() + if self.use_zero_copy_run: + self.input_tensor.copy_from_cpu(im) + self.predictor.zero_copy_run() + else: + im = fluid.core.PaddleTensor(im) + self.predictor.run([im]) outputs = [] for output_tensor in self.output_tensors: output = output_tensor.copy_to_cpu() @@ -152,7 +160,7 @@ class TextDetector(object): outs_dict['f_tvo'] = outputs[3] else: outs_dict['maps'] = outputs[0] - + dt_boxes_list = self.postprocess_op(outs_dict, [ratio_list]) dt_boxes = dt_boxes_list[0] if self.det_algorithm == "SAST" and self.det_sast_polygon: diff --git a/tools/infer/predict_rec.py b/tools/infer/predict_rec.py index 
c81b4eb2560ee5ad66a85c96efe4de935a2beee1..6a379853a4a7d62cbffcbebbf09e2fb3e2207b27 100755 --- a/tools/infer/predict_rec.py +++ b/tools/infer/predict_rec.py @@ -17,15 +17,18 @@ __dir__ = os.path.dirname(os.path.abspath(__file__)) sys.path.append(__dir__) sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) -import tools.infer.utility as utility -from ppocr.utils.utility import initial_logger -logger = initial_logger() -from ppocr.utils.utility import get_image_file_list, check_and_read_gif import cv2 import copy import numpy as np import math import time + +import paddle.fluid as fluid + +import tools.infer.utility as utility +from ppocr.utils.utility import initial_logger +logger = initial_logger() +from ppocr.utils.utility import get_image_file_list, check_and_read_gif from ppocr.utils.character import CharacterOps @@ -37,6 +40,7 @@ class TextRecognizer(object): self.character_type = args.rec_char_type self.rec_batch_num = args.rec_batch_num self.rec_algorithm = args.rec_algorithm + self.use_zero_copy_run = args.use_zero_copy_run char_ops_params = { "character_type": args.rec_char_type, "character_dict_path": args.rec_char_dict_path, @@ -102,8 +106,12 @@ class TextRecognizer(object): norm_img_batch = np.concatenate(norm_img_batch) norm_img_batch = norm_img_batch.copy() starttime = time.time() - self.input_tensor.copy_from_cpu(norm_img_batch) - self.predictor.zero_copy_run() + if self.use_zero_copy_run: + self.input_tensor.copy_from_cpu(norm_img_batch) + self.predictor.zero_copy_run() + else: + norm_img_batch = fluid.core.PaddleTensor(norm_img_batch) + self.predictor.run([norm_img_batch]) if self.loss_type == "ctc": rec_idx_batch = self.output_tensors[0].copy_to_cpu() diff --git a/tools/infer/predict_system.py b/tools/infer/predict_system.py index f8a62679bc17d10380983319a3f239d4a7339646..647a76b20496335cd059242890f86fffe1e3ac1a 100755 --- a/tools/infer/predict_system.py +++ b/tools/infer/predict_system.py @@ -157,7 +157,6 @@ def main(args): boxes, 
txts, scores, - draw_txt=True, drop_score=drop_score) draw_img_save = "./inference_results/" if not os.path.exists(draw_img_save): diff --git a/tools/infer/utility.py b/tools/infer/utility.py index 392bc4dfa5831ab64c0ed920ea3e9bfdea04925d..9d7ce13d37567ac80e194a6500a0f629ede4b1d4 100755 --- a/tools/infer/utility.py +++ b/tools/infer/utility.py @@ -71,6 +71,7 @@ def parse_args(): default="./ppocr/utils/ppocr_keys_v1.txt") parser.add_argument("--use_space_char", type=bool, default=True) parser.add_argument("--enable_mkldnn", type=bool, default=False) + parser.add_argument("--use_zero_copy_run", type=bool, default=False) return parser.parse_args() @@ -105,9 +106,12 @@ def create_predictor(args, mode): #config.enable_memory_optim() config.disable_glog_info() - # use zero copy - config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") - config.switch_use_feed_fetch_ops(False) + if args.use_zero_copy_run: + config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") + config.switch_use_feed_fetch_ops(False) + else: + config.switch_use_feed_fetch_ops(True) + predictor = create_paddle_predictor(config) input_names = predictor.get_input_names() input_tensor = predictor.get_input_tensor(input_names[0]) @@ -139,7 +143,12 @@ def resize_img(img, input_size=600): return im -def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): +def draw_ocr(image, + boxes, + txts=None, + scores=None, + drop_score=0.5, + font_path="./doc/simfang.ttf"): """ Visualize the results of OCR detection and recognition args: @@ -147,23 +156,29 @@ def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): boxes(list): boxes with shape(N, 4, 2) txts(list): the texts scores(list): txts corresponding scores - draw_txt(bool): whether draw text or not drop_score(float): only scores greater than drop_threshold will be visualized + font_path: the path of font which is used to draw text return(array): the visualized img """ if scores is None: scores = [1] * len(boxes) - for 
(box, score) in zip(boxes, scores): - if score < drop_score or math.isnan(score): + box_num = len(boxes) + for i in range(box_num): + if scores is not None and (scores[i] < drop_score or + math.isnan(scores[i])): continue - box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) + box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64) image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) - - if draw_txt: + if txts is not None: img = np.array(resize_img(image, input_size=600)) txt_img = text_visual( - txts, scores, img_h=img.shape[0], img_w=600, threshold=drop_score) + txts, + scores, + img_h=img.shape[0], + img_w=600, + threshold=drop_score, + font_path=font_path) img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) return img return image @@ -241,7 +256,12 @@ def str_count(s): return s_len - math.ceil(en_dg_count / 2) -def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): +def text_visual(texts, + scores, + img_h=400, + img_w=600, + threshold=0., + font_path="./doc/simfang.ttf"): """ create new blank img and draw txt on it args: @@ -249,6 +269,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): scores(list|None): corresponding score of each txt img_h(int): the height of blank img img_w(int): the width of blank img + font_path: the path of font which is used to draw text return(array): """ @@ -267,7 +288,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): font_size = 20 txt_color = (0, 0, 0) - font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8") + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") gap = font_size + 5 txt_img_list = [] @@ -348,6 +369,6 @@ if __name__ == '__main__': txts.append(dic['transcription']) scores.append(round(dic['scores'], 3)) - new_img = draw_ocr(image, boxes, txts, scores, draw_txt=True) + new_img = draw_ocr(image, boxes, txts, scores) cv2.imwrite(img_name, new_img)
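The `cosine_decay_with_warmup` schedule added to `ppocr/optimizer.py` above is expressed through fluid control-flow ops (`Switch`, `assign`), which makes the resulting curve hard to read at a glance. Below is a framework-free sketch of the same formula; the helper name `cosine_warmup_lr` is invented here for illustration and is not part of the diff:

```python
import math

def cosine_warmup_lr(base_lr, global_step, step_each_epoch,
                     epochs=500, warmup_minibatch=1000):
    """Pure-Python mirror of cosine_decay_with_warmup:
    linear warmup over the first warmup_minibatch steps, then
    half-cosine decay over the remaining epochs * step_each_epoch steps."""
    if global_step < warmup_minibatch:
        # warmup branch: lr ramps linearly from 0 up to base_lr
        return base_lr * (1.0 * global_step / warmup_minibatch)
    # decay branch: cos goes from 1 to -1, so lr goes from base_lr to 0
    angle = (global_step - warmup_minibatch) * math.pi / (epochs * step_each_epoch)
    return base_lr * (math.cos(angle) + 1) / 2

# lr at warmup start, mid-warmup, warmup end, and end of training
for step in (0, 500, 1000, 1000 + 500 * 100):
    print(step, cosine_warmup_lr(0.001, step, step_each_epoch=100))
```

At `global_step == warmup_minibatch` the two branches meet at `base_lr` (the cosine argument is zero), and the rate decays to zero only at step `warmup_minibatch + epochs * step_each_epoch`, matching the `Switch`/`default` logic in the diff.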