diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index 9b882812f33a781a448a4f0a89fe15c349f587ae..c28ad72aeb68459c85058f634236e71d00767ca4 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -164,10 +164,10 @@ python PPOCRLabel.py
 - Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, supports Chinese, English and number recognition, and multiple language detection.
-- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages​include French, German, Korean, and Japanese.
+- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese.
 For specific model download links, please refer to [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating)
-- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/ 2.3/PPOCRLabel/PPOCRLabel.py#L116) :
+- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/PPOCRLabel.py#L116) :
 add parameter `det_model_dir` in `self.ocr = PaddleOCR(use_pdserving=False, use_angle_cls=True, det=True, cls=True, use_gpu=gpu,
lang=lang) ` diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md index 9cd11a7c5c4cbe9a02d9921e32017bb23be77c88..73899d8c4b4abb7f9a8028788a1639675165f676 100644 --- a/PPOCRLabel/README_ch.md +++ b/PPOCRLabel/README_ch.md @@ -73,25 +73,26 @@ PPOCRLabel --lang ch # 启动 > 如果上述安装出现问题,可以参考3.6节 错误提示 -#### 1.2.2 本地构建whl包并安装 +#### 1.2.2 通过Python脚本运行PPOCRLabel + +如果您对PPOCRLabel文件有所更改(例如指定新的内置模型),通过Python脚本运行会更加方面的看到更改的结果。如果仍然需要通过whl包启动,则需要参考下节重新编译whl包。 ```bash -cd PaddleOCR/PPOCRLabel -python3 setup.py bdist_wheel -pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple +cd ./PPOCRLabel # 切换到PPOCRLabel目录 +python PPOCRLabel.py --lang ch ``` -#### 1.2.3 通过Python脚本运行PPOCRLabel +#### 1.2.3 本地构建whl包并安装 -如果您对PPOCRLabel文件有所更改,通过Python脚本运行会更加方面的看到更改的结果 +编译与安装新的whl包,其中1.0.2为版本号,可在 `setup.py` 中指定新版本。 ```bash -cd ./PPOCRLabel # 切换到PPOCRLabel目录 -python PPOCRLabel.py --lang ch +cd PaddleOCR/PPOCRLabel +python3 setup.py bdist_wheel +pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple ``` - ## 2. 使用 ### 2.1 操作步骤 diff --git a/README.md b/README.md index 251d51c0080fe5f2cd9fec76479526d142de368f..45f81a5838ad8a335dd4b2be70eff074aa65e57e 100644 --- a/README.md +++ b/README.md @@ -23,8 +23,8 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools **Recent updates** -- 2021.12.21 OCR open source online course starts. The lesson starts at 8:30 every night and lasts for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207 -- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa)). 
+- 2021.3 release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa)).
+- 2021.12.21 release PaddleOCR v2.4, OCR open source online course starts. The lesson starts at 8:30 every night and lasts for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
 - PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Course Address](https://aistudio.baidu.com/aistudio/education/group/info/6758).
 - 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
 - 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
@@ -104,7 +104,7 @@ For a new language request, please refer to [Guideline for new language_requests - [Quick Start](./doc/doc_en/quickstart_en.md) - [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md) - PP-OCR Industry Landing: from Training to Deployment - - [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md) + - [PP-OCR Model Zoo](./doc/doc_en/models_en.md) - [PP-OCR Model Download](./doc/doc_en/models_list_en.md) - [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md) - [PP-OCR Training](./doc/doc_en/training_en.md) @@ -112,6 +112,10 @@ For a new language request, please refer to [Guideline for new language_requests - [Text Recognition](./doc/doc_en/recognition_en.md) - [Text Direction Classification](./doc/doc_en/angle_class_en.md) - [Yml Configuration](./doc/doc_en/config_en.md) + - PP-OCR Models Compression + - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md) + - [Model Quantization](./deploy/slim/quantization/README_en.md) + - [Model Pruning](./deploy/slim/prune/README_en.md) - Inference and Deployment - [C++ Inference](./deploy/cpp_infer/readme_en.md) - [Serving](./deploy/pdserving/README.md) diff --git a/README_ch.md b/README_ch.md index cf3cde61bc10e9d8ccea0d838853ec27ef37e20d..4c9a6ab2952dce0935e9093a48b66e654ef31030 100755 --- a/README_ch.md +++ b/README_ch.md @@ -18,9 +18,8 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力开发者训练出更好的模型,并应用落地。 ## 近期更新 - -- 2021.12.21《动手学OCR · 十讲》课程开讲,12月21日起每晚八点半线上授课![免费报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)。 -- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR,[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM,[文档](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa))。 +- 2021.3 
OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR,[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM,[文档](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa))。 +- 2021.12.21 发布PaddleOCR v2.4,《动手学OCR · 十讲》课程开讲,12月21日起每晚八点半线上授课![免费报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)。 - PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[课程回放](https://aistudio.baidu.com/aistudio/education/group/info/6758)。 - 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。 - 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。 @@ -79,22 +78,26 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 文档教程 - [运行环境准备](./doc/doc_ch/environment.md) -- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md) +- [快速开始(中英文/多语言/版面分析)](./doc/doc_ch/quickstart.md) - [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md) - PP-OCR产业落地:从训练到部署 - - [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md) + - [PP-OCR模型库](./doc/doc_ch/models.md) - [PP-OCR模型下载](./doc/doc_ch/models_list.md) - - [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md) + - [Python引擎的PP-OCR模型库推理](./doc/doc_ch/inference_ppocr.md) - [PP-OCR模型训练](./doc/doc_ch/training.md) - [文本检测](./doc/doc_ch/detection.md) - [文本识别](./doc/doc_ch/recognition.md) - [文本方向分类器](./doc/doc_ch/angle_class.md) - - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md) - [配置文件内容与生成](./doc/doc_ch/config.md) + - PP-OCR模型压缩 + - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md) + - [模型量化](./deploy/slim/quantization/README.md) + - [模型裁剪](./deploy/slim/prune/README.md) - PP-OCR模型推理部署 - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [服务化部署](./deploy/pdserving/README_CN.md) - [端侧部署](./deploy/lite/readme.md) + - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md) - [Benchmark](./doc/doc_ch/benchmark.md) - 
[PP-Structure信息提取](./ppstructure/README_ch.md) - [版面分析](./ppstructure/layout/README_ch.md) diff --git a/deploy/paddle2onnx/readme.md b/deploy/paddle2onnx/readme.md index e08f2adee5d315cecba703ecdf515c09cd1569d2..8e821892142d65caddd6fa3bd8ff24a372fe9a5d 100644 --- a/deploy/paddle2onnx/readme.md +++ b/deploy/paddle2onnx/readme.md @@ -1,4 +1,4 @@ -# paddle2onnx 模型转化与预测 +# Paddle2ONNX模型转化与预测 本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型,并基于 ONNXRuntime 引擎预测。 diff --git a/deploy/pdserving/README.md b/deploy/pdserving/README.md index 7ee001423084be2ed300135f706ed22f7e63a3ab..7ed52af90df653251e2501a032b26a00d9b96984 100644 --- a/deploy/pdserving/README.md +++ b/deploy/pdserving/README.md @@ -30,29 +30,31 @@ The introduction and tutorial of Paddle Serving service deployment framework ref PaddleOCR operating environment and Paddle Serving operating environment are needed. 1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md). - Download the corresponding paddlepaddle whl package according to the environment, it is recommended to install version 2.2.2. + Download the corresponding paddle whl package according to the environment, it is recommended to install version 2.2.2 2. 
The steps of PaddleServing operating environment prepare are as follows:
- ```bash
- # Install serving which used to start the service
- wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
- pip3 install paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
- # Install paddle-serving-server for cuda10.1
- # wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
- # pip3 install paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
- # Install serving which used to start the service
- wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp37-none-any.whl
- pip3 install paddle_serving_client-0.7.0-cp37-none-any.whl
+```bash
+# Install serving which used to start the service
+wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl
+pip3 install paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl
+# Install paddle-serving-server for cuda10.1
+wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl
+# pip3 install paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl
- # Install serving-app
- wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.7.0-py3-none-any.whl
- pip3 install paddle_serving_app-0.7.0-py3-none-any.whl
- ```
+# Install serving which used to start the service
+wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp37-none-any.whl
+pip3 install paddle_serving_client-0.8.3-cp37-none-any.whl
- **note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md).
+# Install serving-app
+wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-py3-none-any.whl
+pip3 install paddle_serving_app-0.8.3-py3-none-any.whl
+```
+
+
+**note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md).
@@ -187,6 +189,26 @@ The recognition model is the same.
 2021-05-13 03:42:36,979 chl1(In: ['det'], Out: ['rec']) size[6/0]
 2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0]
 ```
+## C++ Serving
+
+1. Compile Serving
+
+ To improve predictive performance, C++ services also provide multiple model concatenation services. Unlike Python Pipeline services, multiple model concatenation requires the pre - and post-model processing code to be written on the server side, so local recompilation is required to generate serving. Specific may refer to the official document: [how to compile Serving](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Compile_EN.md)
+
+2. Run the following command to start the service.
+ ```
+ # Start the service and save the running log in log.txt
+ python3 -m paddle_serving_server.serve --model ppocrv2_det_serving ppocrv2_rec_serving --op GeneralDetectionOp GeneralRecOp --port 9293 &>log.txt &
+ ```
+ After the service is successfully started, a log similar to the following will be printed in log.txt
+ ![](./imgs/start_server.png)
+
+3. Send service request
+ ```
+ python3 ocr_cpp_client.py ppocrv2_det_client ppocrv2_rec_client
+ ```
+ After successfully running, the predicted result of the model will be printed in the cmd window.
An example of the result is: + ![](./imgs/results.png) ## WINDOWS Users diff --git a/deploy/pdserving/README_CN.md b/deploy/pdserving/README_CN.md index d50aff4810d7421450a696e243b8e796f26793d7..aad9e14e504481b8f9d113e6e293bfe4609d57b3 100644 --- a/deploy/pdserving/README_CN.md +++ b/deploy/pdserving/README_CN.md @@ -8,8 +8,7 @@ PaddleOCR提供2种服务部署方式: # 基于PaddleServing的服务部署 -本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PPOCR -动态图模型的pipeline在线服务。 +本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。 相比较于hubserving部署,PaddleServing具备以下优点: - 支持客户端和服务端之间高并发和高效通信 @@ -22,6 +21,7 @@ PaddleOCR提供2种服务部署方式: - [环境准备](#环境准备) - [模型转换](#模型转换) - [Paddle Serving pipeline部署](#部署) +- [Paddle Serving C++ 部署](#C++) - [Windows用户](#Windows用户) - [FAQ](#FAQ) @@ -31,35 +31,37 @@ PaddleOCR提供2种服务部署方式: 需要准备PaddleOCR的运行环境和Paddle Serving的运行环境。 - 准备PaddleOCR的运行环境[链接](../../doc/doc_ch/installation.md) + 根据环境下载对应的paddlepaddle whl包,推荐安装2.2.2版本 - 准备PaddleServing的运行环境,步骤如下 ```bash # 安装serving,用于启动服务 -wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl -pip3 install paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl +wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl +pip3 install paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl # 如果是cuda10.1环境,可以使用下面的命令安装paddle-serving-server -# wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl -# pip3 install paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl +wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl +# pip3 install paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl # 安装client,用于向服务发送请求 -wget 
https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp37-none-any.whl -pip3 install paddle_serving_client-0.7.0-cp37-none-any.whl +wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp37-none-any.whl +pip3 install paddle_serving_client-0.8.3-cp37-none-any.whl + # 安装serving-app -wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.7.0-py3-none-any.whl -pip3 install paddle_serving_app-0.7.0-py3-none-any.whl +wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-py3-none-any.whl +pip3 install paddle_serving_app-0.8.3-py3-none-any.whl ``` -**Note:** 如果要安装最新版本的PaddleServing参考[链接](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md)。 +**Note:** 如果要安装最新版本的PaddleServing参考[链接](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Latest_Packages_CN.md)。 ## 模型转换 使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。 -首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th) +首先,下载PP-OCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th) ```bash # 下载并解压 OCR 文本检测模型 @@ -188,6 +190,45 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \ 2021-05-13 03:42:36,979 chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0] ``` + +## Paddle Serving C++ 部署 + +基于python的服务部署,显然具有二次开发便捷的优势,然而真正落地应用,往往需要追求更优的性能。PaddleServing 也提供了性能更优的C++部署版本。 + +C++ 服务部署在环境搭建和数据准备阶段与 python 相同,区别在于启动服务和客户端发送请求时不同。 + +1. 准备 Serving 环境 + +为了提高预测性能,C++ 服务同样提供了多模型串联服务。与python pipeline服务不同,多模型串联的过程中需要将模型前后处理代码写在服务端,因此需要在本地重新编译生成serving。具体可参考官方文档:[如何编译Serving](https://github.com/PaddlePaddle/Serving/blob/v0.8.3/doc/Compile_CN.md) + +完成编译后,注意要安装编译出的三个whl包,并设置SERVING_BIN环境变量。 + +2. 
启动服务可运行如下命令: + +一个服务启动两个模型串联,只需要在--model后依次按顺序传入模型文件夹的相对路径,且需要在--op后依次传入自定义C++OP类名称: + + ``` + # 启动服务,运行日志保存在log.txt + python3 -m paddle_serving_server.serve --model ppocrv2_det_serving ppocrv2_rec_serving --op GeneralDetectionOp GeneralRecOp --port 9293 &>log.txt & + ``` + 成功启动服务后,log.txt中会打印类似如下日志 + ![](./imgs/start_server.png) + +3. 发送服务请求: + ``` + python3 ocr_cpp_client.py ppocrv2_det_client ppocrv2_rec_client + ``` + + 成功运行后,模型预测的结果会打印在cmd窗口中,结果示例为: + ![](./imgs/results.png) + + 在浏览器中输入服务器 ip:端口号,可以看到当前服务的实时QPS。(端口号范围需要是8000-9000) + + 在200张真实图片上测试,把检测长边限制为960。T4 GPU 上 QPS 峰值可达到51左右,约为pipeline的 2.12 倍。 + + ![](./imgs/c++_qps.png) + + ## Windows用户 diff --git a/deploy/pdserving/imgs/c++_qps.png b/deploy/pdserving/imgs/c++_qps.png new file mode 100644 index 0000000000000000000000000000000000000000..dc406acd624ea3f5fd51a56ae7c6d299c8211b48 Binary files /dev/null and b/deploy/pdserving/imgs/c++_qps.png differ diff --git a/deploy/pdserving/ocr_cpp_client.py b/deploy/pdserving/ocr_cpp_client.py index 2baa7565ac78b9551c788c7b36457bce38828eb5..21c5537fdfdf80363d70d2f493c8fb22386c70ac 100755 --- a/deploy/pdserving/ocr_cpp_client.py +++ b/deploy/pdserving/ocr_cpp_client.py @@ -45,7 +45,6 @@ for img_file in os.listdir(test_img_dir): image_data = file.read() image = cv2_to_base64(image_data) res_list = [] - #print(image) fetch_map = client.predict( feed={"x": image}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True) print("fetrch map:", fetch_map) diff --git a/deploy/slim/prune/README.md b/deploy/slim/prune/README.md index 7b8dd169c5fa9d01421070f1ccc2bd4e8ed543a2..6d04f1648705071d70c1e9f17cd30d6825f92467 100644 --- a/deploy/slim/prune/README.md +++ b/deploy/slim/prune/README.md @@ -1,5 +1,5 @@ -## 介绍 +# PP-OCR模型裁剪 复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 @@ -7,13 +7,13 @@ 在开始本教程之前,建议先了解: -1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md) +1. 
[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md) 2. [模型裁剪教程](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md) - ## 快速开始 模型裁剪主要包括四个步骤: + 1. 安装 PaddleSlim 2. 准备训练好的模型 3. 敏感度分析、裁剪训练 @@ -35,16 +35,19 @@ python3 setup.py install 加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sen.pickle,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/9b01b195f0c4bc34a1ab434751cb260e13d64d9e/paddleslim/dygraph/prune/filter_pruner.py#L75)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。 敏感度文件内容格式: - sen.pickle(Dict){ +``` +sen.pickle(Dict){ 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} } - 例子: +例子: { 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} } +``` + 加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) 进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练: diff --git a/deploy/slim/prune/README_en.md b/deploy/slim/prune/README_en.md index f0d652f249686c1d462cd2aa71f4766cf39e763e..aca8d79290016d4602a86ef04fd4e8fa24a39ad7 100644 --- a/deploy/slim/prune/README_en.md +++ b/deploy/slim/prune/README_en.md @@ -1,9 +1,9 @@ -## Introduction +# PP-OCR Models Pruning Generally, a more complex model would achive 
better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
-This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
+This example uses PaddleSlim provided [APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
 [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
 It is recommended that you could understand following pages before reading this example:
@@ -37,25 +37,26 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.
 After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically.
For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
 The data format of sensitivity file:
- sen.pickle(Dict){
+
+```
+sen.pickle(Dict){
 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
 }
-
- example:
+example:
 {
 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
 }
+```
+
 The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance.
The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command: ```bash - python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model" Global.save_model_dir=./output/prune_model/ - ``` diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md index 62bc408f5eeda6d8366834200e8d8a20d1dc82cd..8d3f779e0028a62d8396601166283f0ee54d43a7 100644 --- a/deploy/slim/quantization/README.md +++ b/deploy/slim/quantization/README.md @@ -1,12 +1,12 @@ -## 介绍 +# PP-OCR模型量化 复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。 -在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) +在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) ## 快速开始 diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md index 4cafe5f44e48a479cf5b0e4209b8e335a7e4917d..e9e0933d353afca13619aff61b19a0c4242b5653 100644 --- a/deploy/slim/quantization/README_en.md +++ b/deploy/slim/quantization/README_en.md @@ -1,5 +1,5 @@ -## Introduction +# PP-OCR Models Quantization Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model. 
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number, diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md index 3a266c4bb8fe5516f844bea9f0aa21359d51660e..23bec4b978ab34f144a2ec7256e09412f5440646 100644 --- a/doc/doc_ch/environment.md +++ b/doc/doc_ch/environment.md @@ -1,20 +1,19 @@ # 运行环境准备 -Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。 +Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建Python环境。 推荐环境: -- PaddlePaddle >= 2.0.0 (2.1.2) -- python3.7 +- PaddlePaddle >= 2.1.2 +- Python 3.7 - CUDA10.1 / CUDA10.2 - CUDNN 7.6 -如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。 +> 如果您已经安装Python环境,可以直接参考[PaddleOCR快速开始](./quickstart.md) * [1. Python环境搭建](#1) + [1.1 Windows](#1.1) + [1.2 Mac](#1.2) + [1.3 Linux](#1.3) -* [2. 安装PaddlePaddle](#2) @@ -212,7 +211,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh # 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本 - ``` + ``` - 安装Anaconda: @@ -311,21 +310,3 @@ sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=hos # ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令 sudo docker container exec -it ppocr /bin/bash ``` - - - -## 2. 
安装PaddlePaddle - -- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple -``` - -- 如果您的机器是CPU,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple -``` - -更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 diff --git a/doc/doc_ch/models_and_config.md b/doc/doc_ch/models_and_config.md deleted file mode 100644 index 89afc89a99bed364fd2abe247946dfe9e552ae86..0000000000000000000000000000000000000000 --- a/doc/doc_ch/models_and_config.md +++ /dev/null @@ -1,47 +0,0 @@ - -# PP-OCR模型与配置文件 -PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验。 - -本章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。 - ------- - -下面我们首先了解一些OCR相关的基本概念: - -- [1. OCR 简要介绍](#1-ocr-----) - * [1.1 OCR 检测模型基本概念](#11-ocr---------) - * [1.2 OCR 识别模型基本概念](#12-ocr---------) - * [1.3 PP-OCR模型](#13-pp-ocr--) - - -## 1. OCR 简要介绍 -本节简要介绍OCR检测模型、识别模型的基本概念,并介绍PaddleOCR的PP-OCR模型。 - -OCR(Optical Character Recognition,光学字符识别)目前是文字识别的统称,已不限于文档或书本文字识别,更包括识别自然场景下的文字,又可以称为STR(Scene Text Recognition)。 - -OCR文字识别一般包括两个部分,文本检测和文本识别;文本检测首先利用检测算法检测到图像中的文本行;然后检测到的文本行用识别算法去识别到具体文字。 - - -### 1.1 OCR 检测模型基本概念 - -文本检测就是要定位图像中的文字区域,然后通常以边界框的形式将单词或文本行标记出来。传统的文字检测算法多是通过手工提取特征的方式,特点是速度快,简单场景效果好,但是面对自然场景,效果会大打折扣。当前多是采用深度学习方法来做。 - -基于深度学习的文本检测算法可以大致分为以下几类: -1. 基于目标检测的方法;一般是预测得到文本框后,通过NMS筛选得到最终文本框,多是四点文本框,对弯曲文本场景效果不理想。典型算法为EAST、Text Box等方法。 -2. 基于分割的方法;将文本行当成分割目标,然后通过分割结果构建外接文本框,可以处理弯曲文本,对于文本交叉场景问题效果不理想。典型算法为DB、PSENet等方法。 -3. 混合目标检测和分割的方法; - - -### 1.2 OCR 识别模型基本概念 - -OCR识别算法的输入数据一般是文本行,背景信息不多,文字占据主要部分,识别算法目前可以分为两类算法: -1. 基于CTC的方法;即识别算法的文字预测模块是基于CTC的,常用的算法组合为CNN+RNN+CTC。目前也有一些算法尝试在网络中加入transformer模块等等。 -2. 
基于Attention的方法;即识别算法的文字预测模块是基于Attention的,常用算法组合是CNN+RNN+Attention。 - - -### 1.3 PP-OCR模型 - -PaddleOCR 中集成了很多OCR算法,文本检测算法有DB、EAST、SAST等等,文本识别算法有CRNN、RARE、StarNet、Rosetta、SRN等算法。 - -其中PaddleOCR针对中英文自然场景通用OCR,推出了PP-OCR系列模型,PP-OCR模型由DB+CRNN算法组成,利用海量中文数据训练加上模型调优方法,在中文场景上具备较高的文本检测识别能力。并且PaddleOCR推出了高精度超轻量PP-OCRv2模型,检测模型仅3M,识别模型仅8.5M,利用[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)的模型量化方法,可以在保持精度不降低的情况下,将检测模型压缩到0.8M,识别压缩到3M,更加适用于移动端部署场景。 - diff --git a/doc/doc_ch/pgnet.md b/doc/doc_ch/pgnet.md index 0aee58ec1aca24d06305c47569fdf156df6ee874..1234502f7840a9d39e7f0c85b240d3a4e106ccc0 100644 --- a/doc/doc_ch/pgnet.md +++ b/doc/doc_ch/pgnet.md @@ -97,7 +97,7 @@ train.txt标注文件格式如下,文件名和标注信息中间用"\t"分隔 " 图像文件名 json.dumps编码的图像标注信息" rgb/img11.jpg [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, 308.0], [259.0, 296.0], [286.0, 291.0], [313.0, 295.0], [338.0, 305.0], [362.0, 320.0], [349.0, 347.0], [330.0, 337.0], [310.0, 329.0], [290.0, 324.0], [269.0, 328.0], [249.0, 336.0], [231.0, 346.0]]}, {...}] ``` -json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 +json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的多点坐标(如:4点、8点以及14点等),从左上角的点开始顺时针排列。 `transcription` 表示当前文本框的文字,**当其内容为“###”时,表示该文本框无效,在训练时会跳过。** 如果您想在其他数据集上训练,可以按照上述形式构建标注文件。 diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 1e0d914140072416710a1b37d72ea88a038793ba..d2126192764fa32c7c7a3651b463b8b23240ea6c 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -1,6 +1,9 @@ # PaddleOCR快速开始 -- [1. 安装PaddleOCR whl包](#1) +- [1. 安装](#1) + - [1.1 安装PaddlePaddle](#11) + - [1.2 安装PaddleOCR whl包](#12) + - [2. 便捷使用](#2) - [2.1 命令行使用](#21) - [2.1.1 中英文模型](#211) @@ -9,10 +12,35 @@ - [2.2 Python脚本使用](#22) - [2.2.1 中英文与多语言使用](#221) - [2.2.2 版面分析](#222) +- [3.小结](#3) -## 1. 安装PaddleOCR whl包 +## 1. 
安装 + + + +### 1.1 安装PaddlePaddle + +> 如果您没有基础的Python运行环境,请参考[运行环境准备](./environment.md)。 + +- 您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 + + ```bash + python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + ``` + +- 您的机器是CPU,请运行以下命令安装 + + ```bash + python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + ``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 1.2 安装PaddleOCR whl包 ```bash pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本 @@ -257,3 +285,11 @@ im_show = draw_structure_result(image, result,font_path=font_path) im_show = Image.fromarray(im_show) im_show.save('result.jpg') ``` + + + +## 3. 小结 + +通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。 + +PaddleOCR是一套丰富领先实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图,然后克隆PaddleOCR项目,正式开启PaddleOCR的应用之旅。 diff --git a/doc/doc_en/environment_en.md b/doc/doc_en/environment_en.md index fc87f10c104628df0268bc6f8910c5914aeba225..6521d3c4144aa579be2075d14826e9dcb9ad9dd6 100644 --- a/doc/doc_en/environment_en.md +++ b/doc/doc_en/environment_en.md @@ -1,18 +1,19 @@ # Environment Preparation -Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle. +Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. Recommended working environment: -- PaddlePaddle >= 2.0.0 (2.1.2) +- PaddlePaddle >= 2.1.2 - Python 3.7 - CUDA 10.1 / CUDA 10.2 - cuDNN 7.6 +> If you already have a Python environment installed, you can skip to [PaddleOCR Quick Start](./quickstart_en.md). + * [1. Python Environment Setup](#1) + [1.1 Windows](#1.1) + [1.2 Mac](#1.2) + [1.3 Linux](#1.3) -* [2. 
Install PaddlePaddle 2.0](#2) @@ -330,21 +331,3 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags # ctrl+P+Q to exit docker, to re-enter docker using the following command: sudo docker container exec -it ppocr /bin/bash ``` - - - -## 2. Install PaddlePaddle 2.0 - -- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install - -```bash -python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple -``` - -- If you have no available GPU on your machine, please run the following command to install the CPU version - -```bash -python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple -``` - -For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. diff --git a/doc/doc_en/models_and_config_en.md b/doc/doc_en/models_and_config_en.md deleted file mode 100644 index c47fb5597eb56c823dff4c6d52cf3b114f3d9c0e..0000000000000000000000000000000000000000 --- a/doc/doc_en/models_and_config_en.md +++ /dev/null @@ -1,48 +0,0 @@ -# PP-OCR Model and Configuration -The chapter on PP-OCR model and configuration file mainly adds some basic concepts of OCR model and the content and role of configuration file to have a better experience in the subsequent parameter adjustment and training of the model. - -This chapter contains three parts. Firstly, [PP-OCR Model Download](./models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then in [Yml Configuration](./config_en.md) details the parameters needed to fine-tune the PP-OCR models. The final [Python Inference for PP-OCR Model Library](./inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library in the first section, which can quickly utilize the rich model library models to obtain test results through the Python inference engine. 
- ------- - -Let's first understand some basic concepts. - -- [INTRODUCTION ABOUT OCR](#introduction-about-ocr) - * [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model) - * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model) - * [PP-OCR model](#pp-ocr-model) - * [And a table of contents](#and-a-table-of-contents) - * [On the right](#on-the-right) - - -## 1. INTRODUCTION ABOUT OCR - -This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model. - -OCR (Optical Character Recognition, Optical Character Recognition) is currently the general term for text recognition. It is not limited to document or book text recognition, but also includes recognizing text in natural scenes. It can also be called STR (Scene Text Recognition). - -OCR text recognition generally includes two parts, text detection and text recognition. The text detection module first uses detection algorithms to detect text lines in the image. And then the recognition algorithm to identify the specific text in the text line. - - -### 1.1 BASIC CONCEPTS OF OCR DETECTION MODEL - -Text detection can locate the text area in the image, and then usually mark the word or text line in the form of a bounding box. Traditional text detection algorithms mostly extract features manually, which are characterized by fast speed and good effect in simple scenes, but the effect will be greatly reduced when faced with natural scenes. Currently, deep learning methods are mostly used. - -Text detection algorithms based on deep learning can be roughly divided into the following categories: -1. Method based on target detection. Generally, after the text box is predicted, the final text box is filtered through NMS, which is mostly four-point text box, which is not ideal for curved text scenes. Typical algorithms are methods such as EAST and Text Box. -2. Method based on text segmentation. 
The text line is regarded as the segmentation target, and then the external text box is constructed through the segmentation result, which can handle curved text, and the effect is not ideal for the text cross scene problem. Typical algorithms are DB, PSENet and other methods. -3. Hybrid target detection and segmentation method. - - -### 1.2 Basic concepts of OCR recognition model - -The input of the OCR recognition algorithm is generally text lines images which has less background information, and the text information occupies the main part. The recognition algorithm can be divided into two types of algorithms: -1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on. -2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention. - - -### 1.3 PP-OCR model - -PaddleOCR integrates many OCR algorithms, text detection algorithms include DB, EAST, SAST, etc., text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and other algorithms. - -Among them, PaddleOCR has released the PP-OCR series model for the general OCR in Chinese and English natural scenes. The PP-OCR model is composed of the DB+CRNN algorithm. It uses massive Chinese data training and model tuning methods to have high text detection and recognition capabilities in Chinese scenes. And PaddleOCR has launched a high-precision and ultra-lightweight PP-OCRv2 model. The detection model is only 3M, and the recognition model is only 8.5M. Using [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)'s model quantification method, the detection model can be compressed to 0.8M without reducing the accuracy. The recognition is compressed to 3M, which is more suitable for mobile deployment scenarios. 
diff --git a/doc/doc_en/pgnet_en.md b/doc/doc_en/pgnet_en.md index c7cb3221ccfd897e2fd9062a828c2fe0ceb42024..69df605c3848d23868d7be21610dc8f8c12487e8 100644 --- a/doc/doc_en/pgnet_en.md +++ b/doc/doc_en/pgnet_en.md @@ -92,7 +92,7 @@ rgb/img11.jpg [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, ``` The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries. -The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. +The `points` in the dictionary represent the multi-point coordinates (such as: 4 points, 8 points and 14 points, etc.) of the text box, arranged clockwise from the point at the upper left corner. `transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.** diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 240a4ba11f3b7df0c518c841d9acee0ae88fcfa8..e44345a8e65f6efc94f83604590d980e052f2abd 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -1,7 +1,9 @@ # PaddleOCR Quick Start -+ [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package) ++ [1. Installation](#1installation) + + [1.1 Install PaddlePaddle](#11-install-paddlepaddle) + + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package) * [2. Easy-to-Use](#2-easy-to-use) + [2.1 Use by Command Line](#21-use-by-command-line) - [2.1.1 English and Chinese Model](#211-english-and-chinese-model) @@ -10,12 +12,35 @@ + [2.2 Use by Code](#22-use-by-code) - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model) - [2.2.2 Layout Analysis](#222-layoutAnalysis) +* [3. Summary](#3) + +## 1. Installation - + -## 1. 
Install PaddleOCR Whl Package +### 1.1 Install PaddlePaddle + +> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md). + +- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install the GPU version + + ```bash + python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + ``` + +- If you have no available GPU on your machine, please run the following command to install the CPU version + + ```bash + python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + ``` + +For other version requirements, please follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick). + + + +### 1.2 Install PaddleOCR Whl Package ```bash pip install "paddleocr>=2.0.1" # Version 2.0.1+ is recommended @@ -248,3 +273,11 @@ im_show = draw_structure_result(image, result,font_path=font_path) im_show = Image.fromarray(im_show) im_show.save('result.jpg') ``` + + + +## 3. Summary + +After this section, you should be familiar with using the PaddleOCR whl package and have obtained initial results. + +PaddleOCR is a rich and practical OCR tool library that covers the whole pipeline of data, model training, compression, and inference deployment. In the [next section](./paddleOCR_overview_en.md) we will first give an overview of PaddleOCR, and then clone the PaddleOCR project to start applying it. 
diff --git a/ppocr/losses/det_pse_loss.py b/ppocr/losses/det_pse_loss.py index 9b8ac4b5a5dfac176c398dd0a9e490e5ca67ad5f..6b31343ed4d1687ee8ca44592fba0331b0b287dc 100644 --- a/ppocr/losses/det_pse_loss.py +++ b/ppocr/losses/det_pse_loss.py @@ -121,9 +121,9 @@ class PSELoss(nn.Layer): if neg_num == 0: selected_mask = training_mask - selected_mask = selected_mask.view( - 1, selected_mask.shape[0], - selected_mask.shape[1]).astype('float32') + selected_mask = selected_mask.reshape( + [1, selected_mask.shape[0], selected_mask.shape[1]]).astype( + 'float32') return selected_mask neg_score = paddle.masked_select(score, gt_text <= 0.5)
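
For reference, the pgnet annotation format touched by this diff (an image file name, a tab, then a `json.dumps`-encoded list of dictionaries with `transcription` and `points`) can be read back with the standard library alone. The sketch below is illustrative only: `parse_label_line` is a hypothetical helper, not part of PaddleOCR, and the sample line is made up and shortened to 4-point boxes.

```python
import json


def parse_label_line(line):
    """Split one label line into (image_path, valid_boxes).

    Hypothetical helper, not part of PaddleOCR. Boxes whose
    transcription is "###" are dropped, mirroring the documented
    rule that such boxes are skipped during training.
    """
    # The file name and the annotation are separated by a single tab.
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(encoded)
    valid = [box for box in boxes if box["transcription"] != "###"]
    return image_path, valid


# Made-up sample: one valid 4-point box and one box marked invalid.
sample = "rgb/img11.jpg\t" + json.dumps([
    {"transcription": "ASRAMA",
     "points": [[214.0, 325.0], [235.0, 308.0], [362.0, 320.0], [349.0, 347.0]]},
    {"transcription": "###",
     "points": [[0.0, 0.0], [10.0, 0.0], [10.0, 5.0], [0.0, 5.0]]},
])

path, boxes = parse_label_line(sample)
print(path, len(boxes), len(boxes[0]["points"]))
```

Because `points` is an ordinary list of `[x, y]` pairs, the same helper handles 4-point, 8-point, or 14-point boxes without change; any check on the number of points would be an extra assumption layered on top.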