diff --git a/README.md b/README.md index 251d51c0080fe5f2cd9fec76479526d142de368f..80b1f92b2419d2c67096cbc64a6fba05648272cc 100644 --- a/README.md +++ b/README.md @@ -104,7 +104,7 @@ For a new language request, please refer to [Guideline for new language_requests - [Quick Start](./doc/doc_en/quickstart_en.md) - [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md) - PP-OCR Industry Landing: from Training to Deployment - - [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md) + - [PP-OCR Model Zoo](./doc/doc_en/models_en.md) - [PP-OCR Model Download](./doc/doc_en/models_list_en.md) - [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md) - [PP-OCR Training](./doc/doc_en/training_en.md) @@ -112,6 +112,10 @@ For a new language request, please refer to [Guideline for new language_requests - [Text Recognition](./doc/doc_en/recognition_en.md) - [Text Direction Classification](./doc/doc_en/angle_class_en.md) - [Yml Configuration](./doc/doc_en/config_en.md) + - PP-OCR Models Compression + - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md) + - [Model Quantization](./deploy/slim/quantization/README_en.md) + - [Model Pruning](./deploy/slim/prune/README_en.md) - Inference and Deployment - [C++ Inference](./deploy/cpp_infer/readme_en.md) - [Serving](./deploy/pdserving/README.md) diff --git a/README_ch.md b/README_ch.md index cf3cde61bc10e9d8ccea0d838853ec27ef37e20d..df713816faef83ffede1c6bb2a718afcb1c2bb3a 100755 --- a/README_ch.md +++ b/README_ch.md @@ -79,22 +79,26 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 文档教程 - [运行环境准备](./doc/doc_ch/environment.md) -- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md) +- [快速开始(中英文/多语言/版面分析)](./doc/doc_ch/quickstart.md) - [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md) - PP-OCR产业落地:从训练到部署 - - [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md) + - [PP-OCR模型库](./doc/doc_ch/models.md) - [PP-OCR模型下载](./doc/doc_ch/models_list.md) - - 
[PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md) + - [Python引擎的PP-OCR模型库推理](./doc/doc_ch/inference_ppocr.md) - [PP-OCR模型训练](./doc/doc_ch/training.md) - [文本检测](./doc/doc_ch/detection.md) - [文本识别](./doc/doc_ch/recognition.md) - [文本方向分类器](./doc/doc_ch/angle_class.md) - - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md) - [配置文件内容与生成](./doc/doc_ch/config.md) + - PP-OCR模型压缩 + - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md) + - [模型量化](./deploy/slim/quantization/README.md) + - [模型裁剪](./deploy/slim/prune/README.md) - PP-OCR模型推理部署 - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [服务化部署](./deploy/pdserving/README_CN.md) - [端侧部署](./deploy/lite/readme.md) + - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md) - [Benchmark](./doc/doc_ch/benchmark.md) - [PP-Structure信息提取](./ppstructure/README_ch.md) - [版面分析](./ppstructure/layout/README_ch.md) diff --git a/deploy/paddle2onnx/readme.md b/deploy/paddle2onnx/readme.md index e08f2adee5d315cecba703ecdf515c09cd1569d2..8e821892142d65caddd6fa3bd8ff24a372fe9a5d 100644 --- a/deploy/paddle2onnx/readme.md +++ b/deploy/paddle2onnx/readme.md @@ -1,4 +1,4 @@ -# paddle2onnx 模型转化与预测 +# Paddle2ONNX模型转化与预测 本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型,并基于 ONNXRuntime 引擎预测。 diff --git a/deploy/pdserving/README_CN.md b/deploy/pdserving/README_CN.md index d50aff4810d7421450a696e243b8e796f26793d7..0ac0f770c5b41616d66382c87ad9f6a123aebfa1 100644 --- a/deploy/pdserving/README_CN.md +++ b/deploy/pdserving/README_CN.md @@ -8,8 +8,7 @@ PaddleOCR提供2种服务部署方式: # 基于PaddleServing的服务部署 -本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PPOCR -动态图模型的pipeline在线服务。 +本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。 相比较于hubserving部署,PaddleServing具备以下优点: - 支持客户端和服务端之间高并发和高效通信 @@ -59,7 +58,7 @@ pip3 install paddle_serving_app-0.7.0-py3-none-any.whl 使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。 
-首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th) +首先,下载PP-OCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th) ```bash # 下载并解压 OCR 文本检测模型 @@ -107,7 +106,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \ 1. 下载PaddleOCR代码,若已下载可跳过此步骤 ``` git clone https://github.com/PaddlePaddle/PaddleOCR - + # 进入到工作目录 cd PaddleOCR/deploy/pdserving/ ``` diff --git a/deploy/slim/prune/README.md b/deploy/slim/prune/README.md index 7b8dd169c5fa9d01421070f1ccc2bd4e8ed543a2..6d04f1648705071d70c1e9f17cd30d6825f92467 100644 --- a/deploy/slim/prune/README.md +++ b/deploy/slim/prune/README.md @@ -1,5 +1,5 @@ -## 介绍 +# PP-OCR模型裁剪 复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 @@ -7,13 +7,13 @@ 在开始本教程之前,建议先了解: -1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md) +1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md) 2. [模型裁剪教程](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md) - ## 快速开始 模型裁剪主要包括四个步骤: + 1. 安装 PaddleSlim 2. 准备训练好的模型 3. 
敏感度分析、裁剪训练 @@ -35,16 +35,19 @@ python3 setup.py install 加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sen.pickle,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/9b01b195f0c4bc34a1ab434751cb260e13d64d9e/paddleslim/dygraph/prune/filter_pruner.py#L75)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。 敏感度文件内容格式: - sen.pickle(Dict){ +``` +sen.pickle(Dict){ 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} } - 例子: +例子: { 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} } +``` + 加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) 进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练: diff --git a/deploy/slim/prune/README_en.md b/deploy/slim/prune/README_en.md index f0d652f249686c1d462cd2aa71f4766cf39e763e..aca8d79290016d4602a86ef04fd4e8fa24a39ad7 100644 --- a/deploy/slim/prune/README_en.md +++ b/deploy/slim/prune/README_en.md @@ -1,9 +1,9 @@ -## Introduction +# PP-OCR Models Pruning Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. 
Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance. -This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model. +This example uses PaddleSlim provided [APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry. It is recommended that you could understand following pages before reading this example: @@ -37,25 +37,26 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en. After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. 
For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md) The data format of sensitivity file: - sen.pickle(Dict){ + +``` +sen.pickle(Dict){ 'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} 'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss} } - - example: +example: { 'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594} 'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405} } +``` + The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. 
The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86) Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command: ```bash - python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model" Global.save_model_dir=./output/prune_model/ - ``` diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md index 62bc408f5eeda6d8366834200e8d8a20d1dc82cd..8d3f779e0028a62d8396601166283f0ee54d43a7 100644 --- a/deploy/slim/quantization/README.md +++ b/deploy/slim/quantization/README.md @@ -1,12 +1,12 @@ -## 介绍 +# PP-OCR模型量化 复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。 本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。 -在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) +在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) ## 快速开始 diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md index 4cafe5f44e48a479cf5b0e4209b8e335a7e4917d..e9e0933d353afca13619aff61b19a0c4242b5653 100644 --- a/deploy/slim/quantization/README_en.md +++ b/deploy/slim/quantization/README_en.md @@ -1,5 +1,5 @@ -## Introduction +# PP-OCR Models Quantization Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model. 
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number, diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md index 3a266c4bb8fe5516f844bea9f0aa21359d51660e..23bec4b978ab34f144a2ec7256e09412f5440646 100644 --- a/doc/doc_ch/environment.md +++ b/doc/doc_ch/environment.md @@ -1,20 +1,19 @@ # 运行环境准备 -Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。 +Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建Python环境。 推荐环境: -- PaddlePaddle >= 2.0.0 (2.1.2) -- python3.7 +- PaddlePaddle >= 2.1.2 +- Python 3.7 - CUDA10.1 / CUDA10.2 - CUDNN 7.6 -如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。 +> 如果您已经安装Python环境,可以直接参考[PaddleOCR快速开始](./quickstart.md) * [1. Python环境搭建](#1) + [1.1 Windows](#1.1) + [1.2 Mac](#1.2) + [1.3 Linux](#1.3) -* [2. 安装PaddlePaddle](#2) @@ -212,7 +211,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh # 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本 - ``` + ``` - 安装Anaconda: @@ -311,21 +310,3 @@ sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=hos # ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令 sudo docker container exec -it ppocr /bin/bash ``` - - - -## 2. 
安装PaddlePaddle - -- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple -``` - -- 如果您的机器是CPU,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple -``` - -更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 diff --git a/doc/doc_ch/models_and_config.md b/doc/doc_ch/models_and_config.md deleted file mode 100644 index 89afc89a99bed364fd2abe247946dfe9e552ae86..0000000000000000000000000000000000000000 --- a/doc/doc_ch/models_and_config.md +++ /dev/null @@ -1,47 +0,0 @@ - -# PP-OCR模型与配置文件 -PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验。 - -本章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。 - ------- - -下面我们首先了解一些OCR相关的基本概念: - -- [1. OCR 简要介绍](#1-ocr-----) - * [1.1 OCR 检测模型基本概念](#11-ocr---------) - * [1.2 OCR 识别模型基本概念](#12-ocr---------) - * [1.3 PP-OCR模型](#13-pp-ocr--) - - -## 1. OCR 简要介绍 -本节简要介绍OCR检测模型、识别模型的基本概念,并介绍PaddleOCR的PP-OCR模型。 - -OCR(Optical Character Recognition,光学字符识别)目前是文字识别的统称,已不限于文档或书本文字识别,更包括识别自然场景下的文字,又可以称为STR(Scene Text Recognition)。 - -OCR文字识别一般包括两个部分,文本检测和文本识别;文本检测首先利用检测算法检测到图像中的文本行;然后检测到的文本行用识别算法去识别到具体文字。 - - -### 1.1 OCR 检测模型基本概念 - -文本检测就是要定位图像中的文字区域,然后通常以边界框的形式将单词或文本行标记出来。传统的文字检测算法多是通过手工提取特征的方式,特点是速度快,简单场景效果好,但是面对自然场景,效果会大打折扣。当前多是采用深度学习方法来做。 - -基于深度学习的文本检测算法可以大致分为以下几类: -1. 基于目标检测的方法;一般是预测得到文本框后,通过NMS筛选得到最终文本框,多是四点文本框,对弯曲文本场景效果不理想。典型算法为EAST、Text Box等方法。 -2. 基于分割的方法;将文本行当成分割目标,然后通过分割结果构建外接文本框,可以处理弯曲文本,对于文本交叉场景问题效果不理想。典型算法为DB、PSENet等方法。 -3. 混合目标检测和分割的方法; - - -### 1.2 OCR 识别模型基本概念 - -OCR识别算法的输入数据一般是文本行,背景信息不多,文字占据主要部分,识别算法目前可以分为两类算法: -1. 基于CTC的方法;即识别算法的文字预测模块是基于CTC的,常用的算法组合为CNN+RNN+CTC。目前也有一些算法尝试在网络中加入transformer模块等等。 -2. 
基于Attention的方法;即识别算法的文字预测模块是基于Attention的,常用算法组合是CNN+RNN+Attention。 - - -### 1.3 PP-OCR模型 - -PaddleOCR 中集成了很多OCR算法,文本检测算法有DB、EAST、SAST等等,文本识别算法有CRNN、RARE、StarNet、Rosetta、SRN等算法。 - -其中PaddleOCR针对中英文自然场景通用OCR,推出了PP-OCR系列模型,PP-OCR模型由DB+CRNN算法组成,利用海量中文数据训练加上模型调优方法,在中文场景上具备较高的文本检测识别能力。并且PaddleOCR推出了高精度超轻量PP-OCRv2模型,检测模型仅3M,识别模型仅8.5M,利用[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)的模型量化方法,可以在保持精度不降低的情况下,将检测模型压缩到0.8M,识别压缩到3M,更加适用于移动端部署场景。 - diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 1e0d914140072416710a1b37d72ea88a038793ba..d2126192764fa32c7c7a3651b463b8b23240ea6c 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -1,6 +1,9 @@ # PaddleOCR快速开始 -- [1. 安装PaddleOCR whl包](#1) +- [1. 安装](#1) + - [1.1 安装PaddlePaddle](#11) + - [1.2 安装PaddleOCR whl包](#12) + - [2. 便捷使用](#2) - [2.1 命令行使用](#21) - [2.1.1 中英文模型](#211) @@ -9,10 +12,35 @@ - [2.2 Python脚本使用](#22) - [2.2.1 中英文与多语言使用](#221) - [2.2.2 版面分析](#222) +- [3.小结](#3) -## 1. 安装PaddleOCR whl包 +## 1. 安装 + + + +### 1.1 安装PaddlePaddle + +> 如果您没有基础的Python运行环境,请参考[运行环境准备](./environment.md)。 + +- 您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 + + ```bash + python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + ``` + +- 您的机器是CPU,请运行以下命令安装 + + ```bash + python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + ``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 1.2 安装PaddleOCR whl包 ```bash pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本 @@ -257,3 +285,11 @@ im_show = draw_structure_result(image, result,font_path=font_path) im_show = Image.fromarray(im_show) im_show.save('result.jpg') ``` + + + +## 3. 
小结 + +通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。 + +PaddleOCR是一套丰富领先实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图,然后克隆PaddleOCR项目,正式开启PaddleOCR的应用之旅。 diff --git a/doc/doc_en/environment_en.md b/doc/doc_en/environment_en.md index fc87f10c104628df0268bc6f8910c5914aeba225..6521d3c4144aa579be2075d14826e9dcb9ad9dd6 100644 --- a/doc/doc_en/environment_en.md +++ b/doc/doc_en/environment_en.md @@ -1,18 +1,19 @@ # Environment Preparation -Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle. +Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. Recommended working environment: -- PaddlePaddle >= 2.0.0 (2.1.2) +- PaddlePaddle >= 2.1.2 - Python 3.7 - CUDA 10.1 / CUDA 10.2 - cuDNN 7.6 +> If you already have a Python environment installed, you can skip to [PaddleOCR Quick Start](./quickstart_en.md). + * [1. Python Environment Setup](#1) + [1.1 Windows](#1.1) + [1.2 Mac](#1.2) + [1.3 Linux](#1.3) -* [2. Install PaddlePaddle 2.0](#2) @@ -330,21 +331,3 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags # ctrl+P+Q to exit docker, to re-enter docker using the following command: sudo docker container exec -it ppocr /bin/bash ``` - - - -## 2. 
Install PaddlePaddle 2.0 - -- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install - -```bash -python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple -``` - -- If you have no available GPU on your machine, please run the following command to install the CPU version - -```bash -python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple -``` - -For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. diff --git a/doc/doc_en/models_and_config_en.md b/doc/doc_en/models_and_config_en.md deleted file mode 100644 index c47fb5597eb56c823dff4c6d52cf3b114f3d9c0e..0000000000000000000000000000000000000000 --- a/doc/doc_en/models_and_config_en.md +++ /dev/null @@ -1,48 +0,0 @@ -# PP-OCR Model and Configuration -The chapter on PP-OCR model and configuration file mainly adds some basic concepts of OCR model and the content and role of configuration file to have a better experience in the subsequent parameter adjustment and training of the model. - -This chapter contains three parts. Firstly, [PP-OCR Model Download](./models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then in [Yml Configuration](./config_en.md) details the parameters needed to fine-tune the PP-OCR models. The final [Python Inference for PP-OCR Model Library](./inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library in the first section, which can quickly utilize the rich model library models to obtain test results through the Python inference engine. - ------- - -Let's first understand some basic concepts. 
- -- [INTRODUCTION ABOUT OCR](#introduction-about-ocr) - * [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model) - * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model) - * [PP-OCR model](#pp-ocr-model) - * [And a table of contents](#and-a-table-of-contents) - * [On the right](#on-the-right) - - -## 1. INTRODUCTION ABOUT OCR - -This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model. - -OCR (Optical Character Recognition, Optical Character Recognition) is currently the general term for text recognition. It is not limited to document or book text recognition, but also includes recognizing text in natural scenes. It can also be called STR (Scene Text Recognition). - -OCR text recognition generally includes two parts, text detection and text recognition. The text detection module first uses detection algorithms to detect text lines in the image. And then the recognition algorithm to identify the specific text in the text line. - - -### 1.1 BASIC CONCEPTS OF OCR DETECTION MODEL - -Text detection can locate the text area in the image, and then usually mark the word or text line in the form of a bounding box. Traditional text detection algorithms mostly extract features manually, which are characterized by fast speed and good effect in simple scenes, but the effect will be greatly reduced when faced with natural scenes. Currently, deep learning methods are mostly used. - -Text detection algorithms based on deep learning can be roughly divided into the following categories: -1. Method based on target detection. Generally, after the text box is predicted, the final text box is filtered through NMS, which is mostly four-point text box, which is not ideal for curved text scenes. Typical algorithms are methods such as EAST and Text Box. -2. Method based on text segmentation. 
The text line is regarded as the segmentation target, and then the external text box is constructed through the segmentation result, which can handle curved text, and the effect is not ideal for the text cross scene problem. Typical algorithms are DB, PSENet and other methods. -3. Hybrid target detection and segmentation method. - - -### 1.2 Basic concepts of OCR recognition model - -The input of the OCR recognition algorithm is generally text lines images which has less background information, and the text information occupies the main part. The recognition algorithm can be divided into two types of algorithms: -1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on. -2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention. - - -### 1.3 PP-OCR model - -PaddleOCR integrates many OCR algorithms, text detection algorithms include DB, EAST, SAST, etc., text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and other algorithms. - -Among them, PaddleOCR has released the PP-OCR series model for the general OCR in Chinese and English natural scenes. The PP-OCR model is composed of the DB+CRNN algorithm. It uses massive Chinese data training and model tuning methods to have high text detection and recognition capabilities in Chinese scenes. And PaddleOCR has launched a high-precision and ultra-lightweight PP-OCRv2 model. The detection model is only 3M, and the recognition model is only 8.5M. Using [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)'s model quantification method, the detection model can be compressed to 0.8M without reducing the accuracy. The recognition is compressed to 3M, which is more suitable for mobile deployment scenarios. 
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 240a4ba11f3b7df0c518c841d9acee0ae88fcfa8..e44345a8e65f6efc94f83604590d980e052f2abd 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -1,7 +1,9 @@ # PaddleOCR Quick Start -+ [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package) ++ [1. Installation](#1installation) + + [1.1 Install PaddlePaddle](#11-install-paddlepaddle) + + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package) * [2. Easy-to-Use](#2-easy-to-use) + [2.1 Use by Command Line](#21-use-by-command-line) - [2.1.1 English and Chinese Model](#211-english-and-chinese-model) @@ -10,12 +12,35 @@ + [2.2 Use by Code](#22-use-by-code) - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model) - [2.2.2 Layout Analysis](#222-layoutAnalysis) +* [3. Summary](#3) + +## 1. Installation - + -## 1. Install PaddleOCR Whl Package +### 1.1 Install PaddlePaddle + +> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md). + +- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install + + ```bash + python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + ``` + +- If you have no available GPU on your machine, please run the following command to install the CPU version + + ```bash + python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + ``` + +For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. + + + +### 1.2 Install PaddleOCR Whl Package ```bash pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+ @@ -248,3 +273,11 @@ im_show = draw_structure_result(image, result,font_path=font_path) im_show = Image.fromarray(im_show) im_show.save('result.jpg') ``` + + + +## 3. 
Summary + +In this section, you have learned how to use the PaddleOCR whl package and obtained initial results. + +PaddleOCR is a rich and practical OCR tool library that covers the whole workflow of data, model training, compression, and inference deployment. The [next section](./paddleOCR_overview_en.md) first gives an overview of PaddleOCR and then clones the PaddleOCR project to start using it in practice.
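
Note on the sensitivity file documented in the `deploy/slim/prune/README_en.md` hunk above: the README describes `sen.pickle` as a dict that maps each layer weight name to a dict of `pruning_ratio -> acc_loss`. The sketch below shows one way such a dict could be turned into per-layer pruning ratios. It is an illustration only. It assumes the file can be read with Python's standard `pickle` module, and the helper names `load_sensitivities` and `pick_ratios` and the `max_loss` threshold are hypothetical, not part of PaddleSlim or PaddleOCR. The sample values are a subset of the example shown in the README.

```python
import pickle
from typing import Dict

# Layer weight name -> {pruning ratio -> accuracy loss}, as described in the README.
Sensitivities = Dict[str, Dict[float, float]]


def load_sensitivities(path: str) -> Sensitivities:
    """Read a sensitivity file (assumed here to be a plain pickle of a dict)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def pick_ratios(sens: Sensitivities, max_loss: float) -> Dict[str, float]:
    """For each layer, pick the largest ratio whose measured loss is <= max_loss.

    Layers with no acceptable ratio get 0.0, meaning "do not prune".
    """
    ratios = {}
    for layer, loss_by_ratio in sens.items():
        acceptable = [r for r, loss in loss_by_ratio.items() if loss <= max_loss]
        ratios[layer] = max(acceptable, default=0.0)
    return ratios


# Subset of the example values from the README.
example: Sensitivities = {
    "conv10_expand_weights": {
        0.1: 0.006509952684312718,
        0.2: 0.01827734339798862,
        0.3: 0.014528405644659832,
        0.4: 0.030615754498018757,
    },
    "conv10_linear_weights": {
        0.1: 0.05113190831455035,
        0.2: 0.07705573833558801,
        0.3: 0.12096721757739311,
        0.4: 0.1819252083008504,
    },
}

if __name__ == "__main__":
    # Tolerate at most a 3% accuracy loss per layer (hypothetical threshold).
    print(pick_ratios(example, max_loss=0.03))
```

With a 3% tolerance, `conv10_expand_weights` can be pruned more aggressively than `conv10_linear_weights`, which exceeds the tolerance at every measured ratio and is therefore left unpruned. This matches the README's point that the sensitivity file reveals how much redundancy each layer carries.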