diff --git a/README.md b/README.md index 4bb69766e5d1bdb9fa845efed90f43c2645ec95c..1ba1cd14f3ba68fd599aa50cfd5a19298006a284 100644 --- a/README.md +++ b/README.md @@ -102,6 +102,7 @@ For more model downloads (including multiple languages), please refer to [PP-OCR For a new language request, please refer to [Guideline for new language_requests](#language_requests). ## Tutorials +- [Environment Preparation](./doc/doc_en/environment_en.md) - [Quick Start](./doc/doc_en/quickstart_en.md) - [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md) - PP-OCR Industry Landing: from Training to Deployment diff --git a/README_ch.md b/README_ch.md index 8e9f8efc089889ce1e7c069e2dd960cd3a4cdd4d..11e1097250dff6d7384845f5d48fa073a6adf298 100755 --- a/README_ch.md +++ b/README_ch.md @@ -92,6 +92,7 @@ PaddleOCR同时支持动态图与静态图两种编程范式 更多模型下载(包括多语言),可以参考[PP-OCR v2.0 系列模型下载](./doc/doc_ch/models_list.md) ## 文档教程 +- [运行环境准备](./doc/doc_ch/environment.md) - [快速开始](./doc/doc_ch/quickstart.md) - [PaddleOCR全景图与安装](./doc/doc_ch/paddleOCR_overview.md) - PP-OCR产业落地:从训练到部署 @@ -120,7 +121,6 @@ PaddleOCR同时支持动态图与静态图两种编程范式 - OCR学术圈 - [两阶段模型介绍与下载](./doc/doc_ch/algorithm_overview.md) - [端到端PGNet算法](./doc/doc_ch/pgnet.md) - - 模型训练 - 数据集 - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index e70634f857e7b74888bdf6d5f2fc8bb7fa39edd4..29e46c0dd136ee02e7a157cecea4664f693a7af1 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -1,7 +1,5 @@ # 配置文件内容与生成 -[toc] - ## 1. 可选参数列表 以下列表可以通过`--help`查看 @@ -56,7 +54,7 @@ ### Architecture ([ppocr/modeling](../../ppocr/modeling)) -在ppocr中,网络被划分为Transform,Backbone,Neck和Head四个阶段 +在PaddleOCR中,网络被划分为Transform,Backbone,Neck和Head四个阶段 | 字段 | 用途 | 默认值 | 备注 | | :---------------------: | :---------------------: | :--------------: | :--------------------: | @@ -202,3 +200,24 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi ... 
``` + +目前PaddleOCR支持的多语言算法有: + +| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | +| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | chinese_cht| +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | EN | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | french | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | german | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | japan | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | korean | +| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 拉丁字母 | latin | +| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 阿拉伯字母 | ar | +| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | cyrillic | +| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | devanagari | + +更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) + +多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以通过下面两种方式下载。 +* [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA)。提取码:frgi。 +* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index 6fc85992c04123a10ad937f2694b513b50a37876..57bfdc01e28042e70e42f0dfecb6f8c81d92d8f1 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -1,10 +1,32 @@ -# 文字检测 -本节以icdar2015数据集为例,介绍PaddleOCR中检测模型的训练、评估与测试。 +# 目录 +- [1. 
文字检测](#1-----)
+ * [1.1 数据准备](#11-----)
+ * [1.2 下载预训练模型](#12--------)
+ * [1.3 启动训练](#13-----)
+ * [1.4 断点训练](#14-----)
+ * [1.5 更换Backbone 训练](#15---backbone---)
+ * [1.6 指标评估](#16-----)
+ * [1.7 测试检测效果](#17-------)
+ * [1.8 转inference模型测试](#18--inference----)
+- [2. FAQ](#2-faq)

-## 数据准备
+
+
+# 1. 文字检测
+
+本节以icdar2015数据集为例,介绍PaddleOCR中检测模型训练、评估、测试的使用方式。
+
+
+## 1.1 数据准备

 icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。

+注册并登录后,下载下图中红色框标出的部分。其中,`Training Set Images`下载的内容保存在`icdar_c4_train_imgs`文件夹下,`Test Set Images`下载的内容保存在`ch4_test_images`文件夹下。
+
+

+ +

+ 将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/ 下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件 ,您可以通过wget的方式进行下载。 ```shell @@ -23,7 +45,7 @@ python gen_label.py --mode="det" --root_path="/path/to/icdar_c4_train_imgs/" \ --output_label="/path/to/train_icdar2015_label.txt" ``` -解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,分别是: +解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,按照如下方式组织icdar2015数据集: ``` /PaddleOCR/train_data/icdar2015/text_localization/ └─ icdar_c4_train_imgs/ icdar数据集的训练数据 @@ -42,11 +64,13 @@ json.dumps编码前的图像标注信息是包含多个字典的list,字典中 如果您想在其他数据集上训练,可以按照上述形式构建标注文件。 -## 快速启动训练 + +## 1.2 下载预训练模型 首先下载模型backbone的pretrain model,PaddleOCR的检测模型目前支持两种backbone,分别是MobileNetV3、ResNet_vd系列, -您可以根据需求使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/develop/ppcls/modeling/architectures)中的模型更换backbone, -对应的backbone预训练模型可以从[PaddleClas repo 主页中找到下载链接](https://github.com/PaddlePaddle/PaddleClas#mobile-series)。 +您可以根据需求使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures)中的模型更换backbone, +对应的backbone预训练模型可以从[PaddleClas repo 主页中找到下载链接](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97)。 + ```shell cd PaddleOCR/ # 根据backbone的不同选择下载对应的预训练模型 @@ -56,23 +80,23 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams # 或,下载ResNet50_vd的预训练模型 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams - ``` -#### 启动训练 + +## 1.3 启动训练 *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* ```shell # 单机单卡训练 mv3_db 模型 python3 tools/train.py -c configs/det/det_mv3_db.yml \ - -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ + -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained + # 单机多卡训练,通过 --gpus 参数设置使用的GPU ID python3 -m 
paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \ - -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ + -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained ``` - 上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。 有关配置文件的详细解释,请参考[链接](./config.md)。 @@ -81,46 +105,122 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 ``` -#### 断点训练 + +## 1.4 断点训练 如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径: ```shell python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model +``` +**注意**:`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。 + + +## 1.5 更换Backbone 训练 + +PaddleOCR将网络划分为四部分,分别在[ppocr/modeling](../../ppocr/modeling)下。 进入网络的数据将按照顺序(transforms->backbones-> +necks->heads)依次通过这四个部分。 +```bash +├── architectures # 网络的组网代码 +├── transforms # 网络的图像变换模块 +├── backbones # 网络的特征提取模块 +├── necks # 网络的特征增强模块 +└── heads # 网络的输出模块 ``` +如果要更换的Backbone 在PaddleOCR中有对应实现,直接修改配置yml文件中`Backbone`部分的参数即可。 -**注意**:`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。 +如果要使用新的Backbone,更换backbones的例子如下: + +1. 在 [ppocr/modeling/backbones](../../ppocr/modeling/backbones) 文件夹下新建文件,如my_backbone.py。 +2. 在 my_backbone.py 文件内添加相关代码,示例代码如下: + +```python +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +class MyBackbone(nn.Layer): + def __init__(self, *args, **kwargs): + super(MyBackbone, self).__init__() + # your init code + self.conv = nn.xxxx + + def forward(self, inputs): + # your network forward + y = self.conv(inputs) + return y +``` + +3. 
在 [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py)文件内导入添加的`MyBackbone`模块,然后修改配置文件中Backbone进行配置即可使用,格式如下:

-## 指标评估
+```yaml
+Backbone:
+  name: MyBackbone
+  args1: args1
+```

-PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。
+**注意**:如果要更换网络的其他模块,可以参考[文档](./add_new_algorithm.md)。

-运行如下代码,根据配置文件`det_db_mv3.yml`中`save_res_path`指定的测试集检测结果文件,计算评估指标。
+
+## 1.6 指标评估
+
+PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean(F-Score)。

-评估时设置后处理参数`box_thresh=0.5`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
 训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。
+
 ```shell
-python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.5 PostProcess.unclip_ratio=1.5
+python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy"
 ```
- * 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置

-## 测试检测效果
+
+## 1.7 测试检测效果

 测试单张图像的检测效果
 ```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
 ```
-测试DB模型时,调整后处理阈值,
+测试DB模型时,可调整后处理阈值:
 ```shell
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
 ```
-
 测试文件夹下所有图像的检测效果
 ```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
 ```
+
+
+## 1.8 转inference模型测试
+
+inference 模型(`paddle.jit.save`保存的模型)
+一般是模型训练完成后保存的固化模型,将模型结构和模型参数保存在文件中,多用于预测部署场景。
+训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。
+与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。 + +检测模型转inference 模型方式: +```shell +# 加载配置文件`det_mv3_db.yml`,从`output/det_db`目录下加载`best_accuracy`模型,inference模型保存在`./output/det_db_inference`目录下 +python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/" +``` + +DB检测模型inference 模型预测: +```shell +python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True +``` +如果是其他检测,比如EAST模型,det_algorithm参数需要修改为EAST,默认为DB算法: +```shell +python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True +``` + + +# 2. FAQ + +Q1: 训练模型转inference 模型之后预测效果不一致? +**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。以det_mv3_db.yml配置文件训练的模型为例,训练模型、inference模型预测结果不一致问题解决方式如下: +- 检查[trained model预处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116),和[inference model的预测预处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42)函数是否一致。算法在评估的时候,输入图像大小会影响精度,为了和论文保持一致,训练icdar15配置文件中将图像resize到[736, 1280],但是在inference model预测的时候只有一套默认参数,会考虑到预测速度问题,默认限制图像最长边为960做resize的。训练模型预处理和inference模型的预处理函数位于[ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147) +- 检查[trained model后处理](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51),和[inference 后处理参数](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50)是否一致。 diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md index 
b53f5542d5d8670d557f15e297ff3b57f273203c..4f2acc29d9f70e75a0ed18ea358b747f77cd4a9e 100644 --- a/doc/doc_ch/environment.md +++ b/doc/doc_ch/environment.md @@ -1,8 +1,22 @@ -# 零基础Python环境搭建 +# 运行环境准备 -## Windows +[运行环境准备](#运行环境准备) -### 第1步:安装Anaconda +* [1. Python环境搭建](#1) + + [1.1 Windows](#1.1) + + [1.2 Mac](#1.2) + + [1.3 Linux](#1.3) +* [2. 安装PaddlePaddle](#2) + + + +## 1. Python环境搭建 + + + +### 1.1 Windows + +#### 1.1.1 安装Anaconda - 说明:使用paddlepaddle需要先安装python环境,这里我们选择python集成环境Anaconda工具包 - Anaconda是1个常用的python包管理程序 @@ -11,20 +25,20 @@ - 地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - 大部分win10电脑均为64位操作系统,选择x86_64版本;若电脑为32位操作系统,则选择x86.exe - anaconda download + anaconda download - 下载完成后,双击安装程序进入图形界面 - 默认安装位置为C盘,建议将安装位置更改到D盘: - install config + install config - 勾选conda加入环境变量,忽略警告: - add conda to path + add conda to path -### 第2步:打开终端并创建conda环境 +#### 1.1.2 打开终端并创建conda环境 - 打开Anaconda Prompt终端:左下角Windows Start Menu -> Anaconda3 -> Anaconda Prompt启动控制台 - anaconda download + anaconda download - 创建新的conda环境 @@ -39,7 +53,7 @@ 之后命令行中会输出提示信息,输入y并回车继续安装 - conda create + conda create - 激活刚创建的conda环境,在命令行中输入以下命令: @@ -50,21 +64,18 @@ where python ``` - create environment - - - + create environment 以上anaconda环境和python环境安装完毕 + +### 1.2 Mac -## Mac - -### 第1步:安装Anaconda +#### 1.2.1 安装Anaconda - 说明:使用paddlepaddle需要先安装python环境,这里我们选择python集成环境Anaconda工具包 - Anaconda是1个常用的python包管理程序 @@ -72,14 +83,14 @@ - Anaconda下载: - 地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - anaconda download + anaconda download - 选择最下方的`Anaconda3-2021.05-MacOSX-x86_64.pkg`下载 - 下载完成后,双击.pkg文件进入图形界面 - 按默认设置即可,安装需要花费一段时间 - 建议安装vscode或pycharm等代码编辑器 -### 第2步:打开终端并创建conda环境 +#### 1.2.2 打开终端并创建conda环境 - 打开终端 @@ -142,7 +153,7 @@ - 之后命令行中会输出提示信息,输入y并回车继续安装 - - conda_create + - conda_create - 激活刚创建的conda环境,在命令行中输入以下命令: @@ -153,15 +164,17 @@ where python ``` - conda_actviate + conda_actviate 以上anaconda环境和python环境安装完毕 + +### 1.3 Linux -## Linux 
+Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker且需要训练PaddleOCR模型,推荐使用Docker环境,PaddleOCR的开发流程均在Docker环境下运行。如果你不熟悉Docker,也可以使用Anaconda来运行项目。 -### 第1步:安装Anaconda +#### 1.3.1 Anaconda环境配置 - 说明:使用paddlepaddle需要先安装python环境,这里我们选择python集成环境Anaconda工具包 - Anaconda是1个常用的python包管理程序 @@ -170,43 +183,27 @@ - **下载Anaconda**: - 下载地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - - - - - - - - - - - - - - - - - + - 选择适合您操作系统的版本 - - 可在终端输入`uname -m`查询系统所用的指令集 - - - 下载法1:本地下载,再将安装包传到linux服务器上 - - - 下载法2:直接使用linux命令行下载 - - ```shell + - 可在终端输入`uname -m`查询系统所用的指令集 + +- 下载法1:本地下载,再将安装包传到linux服务器上 + +- 下载法2:直接使用linux命令行下载 + + ```shell # 首先安装wget - sudo apt-get install wget # Ubuntu + sudo apt-get install wget # Ubuntu sudo yum install wget # CentOS - ``` - - ```shell + ``` + + ```shell # 然后使用wget从清华源上下载 - # 如要下载Anaconda3-2021.05-Linux-x86_64.sh,则下载命令如下: + # 如要下载Anaconda3-2021.05-Linux-x86_64.sh,则下载命令如下: wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh - # 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本 + # 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本 ``` - 安装Anaconda: @@ -262,28 +259,71 @@ - 在终端中输入`source ~/.bash_profile`以更新环境变量 - 再在终端输入`conda info --envs`,若能显示当前有base环境,则conda已加入环境变量 -### 第2步:创建conda环境 - - 创建新的conda环境 ```shell - # 在命令行输入以下命令,创建名为paddle_env的环境 - # 此处为加速下载,使用清华源 - conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ + # 在命令行输入以下命令,创建名为paddle_env的环境 + # 此处为加速下载,使用清华源 + conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ ``` - 该命令会创建1个名为paddle_env、python版本为3.8的可执行环境,根据网络状态,需要花费一段时间 - 之后命令行中会输出提示信息,输入y并回车继续安装 - conda_create + conda_create - 激活刚创建的conda环境,在命令行中输入以下命令: ```shell - # 激活paddle_env环境 - conda activate paddle_env + # 激活paddle_env环境 + conda activate paddle_env ``` -以上anaconda环境和python环境安装完毕 \ No newline at end of file +以上anaconda环境和python环境安装完毕 + +#### 1.3.2 Docker环境配置 + +**注意:第一次使用这个镜像,会自动下载该镜像,请耐心等待。** + +```bash +# 
切换到工作目录下
+cd /home/Projects
+# 首次运行需创建一个docker容器,再次运行时不需要运行当前命令
+# 创建一个名字为ppocr的docker容器,并将当前目录映射到容器的/paddle目录下
+
+# 如果您希望在CPU环境下使用docker,请使用docker而不是nvidia-docker创建容器
+sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
+
+# 如果使用CUDA10,请运行以下命令创建容器,--shm-size设置docker容器共享内存为64G,建议设置为32G以上
+sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
+
+# 您也可以访问DockerHub(https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像
+
+# ctrl+P+Q可退出docker容器,重新进入docker容器使用如下命令
+sudo docker container exec -it ppocr /bin/bash
+```
+
+
+
+## 2. 安装PaddlePaddle
+
+- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 如果您的机器是CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+
+
+
diff --git a/doc/doc_ch/models_and_config.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..167b7ec2cb039a5b7943cda98474d809019a57b7 100644
--- a/doc/doc_ch/models_and_config.md
+++ b/doc/doc_ch/models_and_config.md
@@ -0,0 +1,38 @@
+
+# 目录
+- [1. OCR 简要介绍](#1-ocr-----)
+ * [1.1 OCR 检测模型基本概念](#11-ocr---------)
+ * [1.2 OCR 识别模型基本概念](#12-ocr---------)
+ * [1.3 PP-OCR模型](#13-pp-ocr--)
+
+
+# 1. OCR 简要介绍
+本节简要介绍OCR检测模型、识别模型的基本概念,并介绍PaddleOCR的PP-OCR模型。
+
+OCR(Optical Character Recognition,光学字符识别)目前是文字识别的统称,已不限于文档或书本文字识别,更包括识别自然场景下的文字,又可以称为STR(Scene Text Recognition)。
+
+OCR文字识别一般包括两个部分,文本检测和文本识别;文本检测首先利用检测算法检测到图像中的文本行,然后对检测到的文本行使用识别算法识别出具体文字。
+
+
+## 1.1 OCR 检测模型基本概念
+
+文本检测就是要定位图像中的文字区域,然后通常以边界框的形式将单词或文本行标记出来。传统的文字检测算法多是通过手工提取特征的方式,特点是速度快,简单场景效果好,但是面对自然场景,效果会大打折扣。当前多是采用深度学习方法来做。
+
+基于深度学习的文本检测算法可以大致分为以下几类:
+1. 
基于目标检测的方法;一般是预测得到文本框后,通过NMS筛选得到最终文本框,多是四点文本框,对弯曲文本场景效果不理想。典型算法为EAST、Text Box等方法。 +2. 基于分割的方法;将文本行当成分割目标,然后通过分割结果构建外接文本框,可以处理弯曲文本,对于文本交叉场景问题效果不理想。典型算法为DB、PSENet等方法。 +3. 混合目标检测和分割的方法; + + +## 1.2 OCR 识别模型基本概念 + +OCR识别算法的输入数据一般是文本行,背景信息不多,文字占据主要部分,识别算法目前可以分为两类算法: +1. 基于CTC的方法;即识别算法的文字预测模块是基于CTC的,常用的算法组合为CNN+RNN+CTC。目前也有一些算法尝试在网络中加入transformer模块等等。 +2. 基于Attention的方法;即识别算法的文字预测模块是基于Attention的,常用算法组合是CNN+RNN+Attention。 + + +## 1.3 PP-OCR模型 + +PaddleOCR 中集成了很多OCR算法,文本检测算法有DB、EAST、SAST等等,文本识别算法有CRNN、RARE、StarNet、Rosetta、SRN等算法。 + +其中PaddleOCR针对中英文自然场景通用OCR,推出了PP-OCR系列模型,PP-OCR模型由DB+CRNN算法组成,利用海量中文数据训练加上模型调优方法,在中文场景上具备较高的文本检测识别能力。并且PaddleOCR推出了高精度超轻量PP-OCRv2模型,检测模型仅3M,识别模型仅8.5M,利用[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)的模型量化方法,可以在保持精度不降低的情况下,将检测模型压缩到0.8M,识别压缩到3M,更加适用于移动端部署场景。 diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 35713ae67f797618e043697eb93642208c3df865..43671bf5b051b85a7d0728253bfeab069cd82642 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -32,6 +32,8 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | +|ch_ppocr_mobile_slim_v2.1_det|slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)| +|ch_ppocr_mobile_v2.1_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_lite_train_cml_v2.1.ym](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)| |ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.6M 
|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -45,6 +47,8 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | +|ch_ppocr_mobile_slim_v2.1_rec|slim量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) | +|ch_ppocr_mobile_v2.1_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | 
|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -125,13 +129,15 @@ python3 generate_multi_language_configs.py -l it \ |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) | -|ch_ppocr_mobile_v2.0_cls|原始模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | +|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) | 
+|ch_ppocr_mobile_v2.0_cls|原始分类器模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | ### 四、Paddle-Lite 模型 |模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| |---|---|---|---|---|---|---| -|V2.0|超轻量中文OCR 移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| -|V2.0(slim)|超轻量中文OCR 移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| +|V2.1|ppocr_v2.1蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9| +|V2.1(slim)|ppocr_v2.1蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9| 
+|V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| +|V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/doc/doc_ch/paddleOCR_overview.md b/doc/doc_ch/paddleOCR_overview.md index 9c16f8f62b87240d111dd13f05ef06f81bb58a92..f49c1ae302607ff6629da2462f91a36793b4db3a 100644 --- a/doc/doc_ch/paddleOCR_overview.md +++ b/doc/doc_ch/paddleOCR_overview.md @@ -1,2 +1,33 @@ # PaddleOCR全景图与项目克隆 +## 1. PaddleOCR全景图 + +PaddleOCR包含丰富的文本检测、文本识别以及端到端算法。结合实际测试与产业经验,PaddleOCR选择DB和CRNN作为基础的检测和识别模型,经过一系列优化策略提出面向产业应用的PP-OCR模型。PP-OCR模型针对通用场景,根据不同语种形成了PP-OCR模型库。基于PP-OCR的能力,PaddleOCR针对文档场景任务发布PP-Structure工具库,包含版面分析和表格识别两大任务。为了打通产业落地的全流程,PaddleOCR提供了规模化的数据生产工具和多种预测部署工具,助力开发者快速落地。 + +

+ +
+ +## 2. 项目克隆 + +### **2.1 克隆PaddleOCR repo代码** + +``` +【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR +``` + +如果因为网络问题无法pull成功,也可选择使用码云上的托管: + +``` +git clone https://gitee.com/paddlepaddle/PaddleOCR +``` + +注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。 + +### **2.2 安装第三方库** + +``` +cd PaddleOCR +pip3 install -r requirements.txt +``` + diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index a524c38b6eb23c5e846f244d56bfc0edafac67b3..9df686501de48234dbc1821d7d645d7f12bda21a 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -1,9 +1,6 @@ # PaddleOCR快速开始 - [PaddleOCR快速开始](#paddleocr) - * [1. 轻量安装](#1) - + [1.0 运行环境准备](#10) - + [1.1 安装PaddlePaddle2.0](#11) - + [1.2 安装PaddleOCR whl包](#12) + + [1. 安装PaddleOCR whl包](#1) * [2. 便捷使用](#2) + [2.1 命令行使用](#21) - [2.1.1 中英文模型](#211) @@ -13,31 +10,9 @@ - [2.2.1 中英文与多语言使用](#221) - [2.2.2 版面分析使用](#222) - -## 1. 轻量安装 - -### 1.0 运行环境准备 - -如果您未搭建过Python环境,可以通过[零基础Python环境搭建文档](./environment.)进行环境搭建 - -### 1.1 安装PaddlePaddle2.0 - -- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple -``` -- 如果您的机器是CPU,请运行以下命令安装 - -```bash -python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple -``` - -更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 - -### 1.2 安装PaddleOCR whl包 +## 1. 
安装PaddleOCR whl包 ```bash pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本 @@ -59,7 +34,7 @@ pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本 ### 2.1 命令行使用 -PaddleOCR提供了一系列测试图片,点击xx下载,然后在终端中切换到相应目录 +PaddleOCR提供了一系列测试图片,点击[这里](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip)下载并解压,然后在终端中切换到相应目录 ``` cd /path/to/ppocr_img @@ -203,6 +178,7 @@ paddleocr --image_dir=./table/1.png --type=structure 大部分参数和paddleocr whl包保持一致,见 [whl包文档](../doc/doc_ch/whl.md) + ### 2.2 Python脚本使用 diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index 6780bc8b48c51a83c680dee7cff699f8fd24c274..4ac6758ff642a58e265e12a0be8308d1fb8251c0 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -51,7 +51,7 @@ Take rec_chinese_lite_train_v2.0.yml as an example ### Architecture ([ppocr/modeling](../../ppocr/modeling)) -In ppocr, the network is divided into four stages: Transform, Backbone, Neck and Head +In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head | Parameter | Use | Defaults | Note | | :---------------------: | :---------------------: | :--------------: | :--------------------: | @@ -122,14 +122,14 @@ In ppocr, the network is divided into four stages: Transform, Backbone, Neck and | num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ | -## MULTILINGUAL CONFIG FILE GENERATION +## 3. MULTILINGUAL CONFIG FILE GENERATION PaddleOCR currently supports 80 (except Chinese) language recognition. A multi-language configuration file template is provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。 There are two ways to create the required configuration file:: -### Automatically generated by script +1. 
Automatically generated by script [generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) Can help you generate configuration files for multi-language models @@ -176,7 +176,7 @@ There are two ways to create the required configuration file:: ``` Italian is made up of Latin letters, so after executing the command, you will get the rec_latin_lite_train.yml. -### Manually modify the configuration file +2. Manually modify the configuration file You can also manually modify the following fields in the template: @@ -203,3 +203,25 @@ Italian is made up of Latin letters, so after executing the command, you will ge ... ``` + + +Currently, the multi-language algorithms supported by PaddleOCR are: + +| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | +| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht| +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean | +| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin | +| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar | +| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic | +| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | 
devanagari |

For more supported languages, please refer to: [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)

The multi-language models are trained in the same way as the Chinese model; each training set consists of 1 million synthesized images. A small number of fonts and test data can be downloaded in either of the following two ways.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index b736beb55d79db02bf4d4301a74c685537fce249..8f12d42fe798de7d330f1d3ef1950325887525cb 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -1,10 +1,31 @@
-# TEXT DETECTION
+# CONTENT
+
+- [1. TEXT DETECTION](#1-text-detection)
+  * [1.1 DATA PREPARATION](#11-data-preparation)
+  * [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
+  * [1.3 START TRAINING](#13-start-training)
+  * [1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING](#14-load-trained-model-and-continue-training)
+  * [1.5 TRAINING WITH NEW BACKBONE](#15-training-with-new-backbone)
+  * [1.6 EVALUATION](#16-evaluation)
+  * [1.7 TEST](#17-test)
+  * [1.8 INFERENCE MODEL PREDICTION](#18-inference-model-prediction)
+- [2. FAQ](#2-faq)
+
+
+# 1. TEXT DETECTION
 This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
 
-## DATA PREPARATION
+## 1.1 DATA PREPARATION
 The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
+
+After registering and logging in, download the part marked in the red box in the figure below.
Save the content downloaded from `Training Set Images` as the folder `icdar_c4_train_imgs`, and save the content downloaded from `Test Set Images` as the folder `ch4_test_images`.
+
+

+ +

+
 Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
```shell
# Under the PaddleOCR path
@@ -36,10 +58,11 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin
 If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
 
-## TRAINING
+## 1.2 DOWNLOAD PRETRAINED MODEL
+
+First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace the backbone according to your needs.
+The corresponding download links for the backbone pretrained weights can be found [here](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
-First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/develop/ppcls/modeling/architectures) to replace backbone according to your needs.
-And the responding download link of backbone pretrain weights can be found in [PaddleClas repo](https://github.com/PaddlePaddle/PaddleClas#mobile-series).
```shell cd PaddleOCR/ # Download the pre-trained model of MobileNetV3 @@ -49,11 +72,13 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg # or, download the pre-trained model of ResNet50_vd wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams +``` -#### START TRAINING +## 1.3 START TRAINING *If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.* ```shell -python3 tools/train.py -c configs/det/det_mv3_db.yml +python3 tools/train.py -c configs/det/det_mv3_db.yml \ + -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained ``` In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file. @@ -62,16 +87,17 @@ For a detailed explanation of the configuration file, please refer to [config](. You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 ```shell # single GPU training -python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 +python3 tools/train.py -c configs/det/det_mv3_db.yml -o \ + Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained \ + Optimizer.base_lr=0.0001 # multi-GPU training # Set the GPU ID used by the '--gpus' parameter. -python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 - +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained ``` -#### load trained model and continue training +## 1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded. 
For example:
@@ -82,9 +108,59 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
 
 **Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when the two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
 
-## EVALUATION
+## 1.5 TRAINING WITH NEW BACKBONE
+
+PaddleOCR builds its networks from four parts, which live under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).
+
+```bash
+├── architectures # Code for building the network
+├── transforms    # Image transformation module
+├── backbones     # Feature extraction module
+├── necks         # Feature enhancement module
+└── heads         # Output module
+```
+
+If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.
+
+However, if you want to use a new Backbone, you can replace the backbone as follows:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add code to the my_backbone.py file; sample code is as follows:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the four-part modules of the network, you only need to configure them in the configuration file to use them, for example:
+
+```yaml
+  Backbone:
+    name: MyBackbone
+    args1: args1
+```
+
+**NOTE**: More details about replacing the Backbone and other modules can be found in the [doc](add_new_algorithm_en.md).
+
+## 1.6 EVALUATION
 
-PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
+PaddleOCR calculates three indicators for evaluating the performance of the OCR detection task: Precision, Recall, and Hmean (F-Score).
 
 Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`
 
@@ -95,10 +171,9 @@ The model parameters during training are saved in the `Global.save_model_dir` di
 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```
+* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST and SAST models.
-* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model.
-
-## TEST
+## 1.7 TEST
 
 Test the detection result on a single image:
```shell
@@ -107,7 +182,7 @@ python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./
 When testing the DB model, adjust the post-processing threshold:
```shell
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
```
@@ -115,3 +190,33 @@ Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```
+
+## 1.8 INFERENCE MODEL PREDICTION
+
+The inference model (the model saved by `paddle.jit.save`) is a frozen model saved after training is completed, and is mainly used for prediction in deployment.
+
+The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mainly used to resume training.
+
+Compared with the checkpoints model, the inference model additionally saves the structural information of the model. It is therefore easier to deploy, because the model structure and model parameters are already frozen in the inference model file, and it is suitable for integration with actual systems.
+
+First, convert the trained DB model to an inference model:
+```shell
+python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
+```
+
+Run prediction with the detection inference model:
+```shell
+python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
+```
+
+For other detection algorithms, such as EAST, set the `det_algorithm` parameter accordingly (the default is DB):
+```shell
+python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
+```
+
+# 2. FAQ
+
+Q1: Why are the prediction results of the trained model and the inference model inconsistent?
+**A**: Most such problems are caused by a mismatch between the pre-processing and post-processing parameters used when predicting with the trained model and those used when predicting with the inference model. Taking the model trained with the det_mv3_db.yml configuration file as an example, the solution is as follows:
+- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy.
To be consistent with the paper, the image is resized to [736, 1280] in the icdar15 training configuration file, but the inference model predicts with a single set of default parameters: for prediction speed, the longest side of the image is limited to 960 when resizing by default. The preprocessing functions of both the trained model and the inference model are located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147)
+- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
diff --git a/doc/doc_en/environment_en.md b/doc/doc_en/environment_en.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..96a46cce3010934689e8d95985ca434f49d18886 100644
--- a/doc/doc_en/environment_en.md
+++ b/doc/doc_en/environment_en.md
@@ -0,0 +1,332 @@
+# Environment Preparation
+
+* [1. Python Environment Setup](#1)
+  + [1.1 Windows](#1.1)
+  + [1.2 Mac](#1.2)
+  + [1.3 Linux](#1.3)
+* [2. Install PaddlePaddle 2.0](#2)
+
+
+
+## 1. Python Environment Setup
+
+
+
+### 1.1 Windows
+
+#### 1.1.1 Install Anaconda
+
+- Note: To use PaddlePaddle you need to install a Python environment first; here we choose the Python integrated environment Anaconda toolkit
+
+  - Anaconda is a common Python package manager
+  - After installing Anaconda, you can install the Python environment, as well as numpy and other required toolkits
+
+- Download Anaconda:
+
+  - Address: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
+
+  - Most Win10 computers are 64-bit operating systems, so choose the x86_64 version; if the computer is a 32-bit operating system, choose the x86.exe
+
+    anaconda download
+
+  - After the download is complete, double-click the installer to enter the graphical interface
+
+  - The default installation location is the C drive; it is recommended to change the installation location to the D drive
+
+    install config
+
+  - Check the option to add conda to the environment variables, and ignore the warning
+
+    add conda to path
+
+
+#### 1.1.2 Opening the terminal and creating the conda environment
+
+- Open the Anaconda Prompt terminal: bottom left Windows Start Menu -> Anaconda3 -> Anaconda Prompt
+
+  anaconda download
+
+
+- Create a new conda environment
+
+  ```shell
+  # Enter the following command at the command line to create an environment named paddle_env
+  # Here, to speed up the download, use the Tsinghua mirror
+  conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ # This is a one-line command
+  ```
+
+  This command will create an executable environment named paddle_env with Python version 3.8; it will take a while depending on the network status
+
+  The command line will then output a prompt; type y and press Enter to continue the installation
+
+  conda create
+
+- To activate the conda environment you just created, enter the following command at the command line:
+ + ```shell + # Activate the paddle_env environment + conda activate paddle_env + # View the current location of python + where python + ``` + + create environment + +The above anaconda environment and python environment are installed + + + + + +### 1.2 Mac + +#### 1.2.1 Installing Anaconda + +- Note: To use paddlepaddle you need to install the python environment first, here we choose the python integrated environment Anaconda toolkit + + - Anaconda is a common python package manager + - After installing Anaconda, you can install the python environment, as well as numpy and other required toolkit environment + +- Anaconda download:. + + - Address: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D + + anaconda download + + - Select `Anaconda3-2021.05-MacOSX-x86_64.pkg` at the bottom to download + +- After downloading, double click on the .pkg file to enter the graphical interface + + - Just follow the default settings, it will take a while to install + +- It is recommended to install a code editor such as vscode or pycharm + +#### 1.2.2 Open a terminal and create a conda environment + +- Open the terminal + + - Press command and spacebar at the same time, type "terminal" in the focus search, double click to enter terminal + +- **Add conda to the environment variables** + + - Environment variables are added so that the system can recognize the conda command + + - Open `~/.bash_profile` in the terminal by typing the following command. + + ```shell + vim ~/.bash_profile + ``` + + - Add conda as an environment variable in `~/.bash_profile`. + + ```shell + # Press i first to enter edit mode + # In the first line type. 
+ export PATH="~/opt/anaconda3/bin:$PATH" + # If you customized the installation location during installation, change ~/opt/anaconda3/bin to the bin folder in the customized installation directory + ``` + + ```shell + # The modified ~/.bash_profile file should look like this (where xxx is the username) + export PATH="~/opt/anaconda3/bin:$PATH" + # >>> conda initialize >>> + # !!! Contents within this block are managed by 'conda init' !!! + __conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)" + if [ $? -eq 0 ]; then + eval "$__conda_setup" + else + if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then + . "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" + else + export PATH="/Users/xxx/opt/anaconda3/bin:$PATH" + fi + fi + unset __conda_setup + # <<< conda initialize <<< + ``` + + - When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit + + - Verify that the conda command is recognized. + + - Enter `source ~/.bash_profile` in the terminal to update the environment variables + - Enter `conda info --envs` in the terminal again, if it shows that there is a base environment, then conda has been added to the environment variables + +- Create a new conda environment + + ```shell + # Enter the following command at the command line to create an environment called paddle_env + # Here to speed up the download, use Tsinghua source + conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ + ``` + + - This command will create an executable environment named paddle_env with python version 3.8, which will take a while depending on the network status + + - The command line will then output a prompt, type y and enter to continue the installation + + - conda_create + +- To activate the conda environment you just created, enter the following command at the command line. 
+ + ```shell + # Activate the paddle_env environment + conda activate paddle_env + # View the current location of python + where python + ``` + + conda_actviate + +The above anaconda environment and python environment are installed + + + + + +### 1.3 Linux + +Linux users can choose to run either Anaconda or Docker. If you are familiar with Docker and need to train the PaddleOCR model, it is recommended to use the Docker environment, where the development process of PaddleOCR is run. If you are not familiar with Docker, you can also use Anaconda to run the project. + +#### 1.3.1 Anaconda environment configuration + +- Note: To use paddlepaddle you need to install the python environment first, here we choose the python integrated environment Anaconda toolkit + + - Anaconda is a common python package manager + - After installing Anaconda, you can install the python environment, as well as numpy and other required toolkit environment + +- **Download Anaconda**. + + - Download at: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D + + + + + + + - Select the appropriate version for your operating system + - Type `uname -m` in the terminal to check the command set used by your system + + - Download method 1: Download locally, then transfer the installation package to the linux server + + - Download method 2: Directly use linux command line to download + + ```shell + # First install wget + sudo apt-get install wget # Ubuntu + sudo yum install wget # CentOS + ``` + ```bash + # Then use wget to download from Tsinghua source + # If you want to download Anaconda3-2021.05-Linux-x86_64.sh, the download command is as follows + wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh + # If you want to download another version, you need to change the file name after the last 1 / to the version you want to download + ``` + +- To install Anaconda. 
+
+  - Type `sh Anaconda3-2021.05-Linux-x86_64.sh` at the command line
+    - If you downloaded a different version, replace the file name in the command with the name of the file you downloaded
+    - Just follow the installation instructions
+    - You can exit by typing q when viewing the license
+
+- **Add conda to the environment variables**
+
+  - If you have already added conda to the environment variable path during the installation, you can skip this step
+
+  - Open `~/.bashrc` in a terminal.
+
+    ```shell
+    # Enter the following command in the terminal.
+    vim ~/.bashrc
+    ```
+
+  - Add conda as an environment variable in `~/.bashrc`.
+
+    ```shell
+    # Press i first to enter edit mode
+    # In the first line enter:
+    export PATH="~/anaconda3/bin:$PATH"
+    # If you customized the installation location during installation, change ~/anaconda3/bin to the bin folder in the customized installation directory
+    ```
+
+    ```shell
+    # The modified ~/.bashrc file should look like this (where xxx is the username)
+    export PATH="~/anaconda3/bin:$PATH"
+    # >>> conda initialize >>>
+    # !!! Contents within this block are managed by 'conda init' !!!
+    __conda_setup="$('/home/xxx/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
+    if [ $? -eq 0 ]; then
+        eval "$__conda_setup"
+    else
+        if [ -f "/home/xxx/anaconda3/etc/profile.d/conda.sh" ]; then
+            . "/home/xxx/anaconda3/etc/profile.d/conda.sh"
+        else
+            export PATH="/home/xxx/anaconda3/bin:$PATH"
+        fi
+    fi
+    unset __conda_setup
+    # <<< conda initialize <<<
+    ```
+
+  - When you are done, press `esc` to exit edit mode, then type `:wq!` and press Enter to save and exit
+
+  - Verify that the conda command is recognized.
+ + - Enter `source ~/.bash_profile` in the terminal to update the environment variables + - Enter `conda info --envs` in the terminal again, if it shows that there is a base environment, then conda has been added to the environment variables + +- Create a new conda environment + + ```shell + # Enter the following command at the command line to create an environment called paddle_env + # Here to speed up the download, use Tsinghua source + conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ + ``` + + - This command will create an executable environment named paddle_env with python version 3.8, which will take a while depending on the network status + + - The command line will then output a prompt, type y and enter to continue the installation + + conda_create + +- To activate the conda environment you just created, enter the following command at the command line. + + ```shell + # Activate the paddle_env environment + conda activate paddle_env + ``` + +The above anaconda environment and python environment are installed + + +#### 1.3.2 Docker environment preparation + +**The first time you use this docker image, it will be downloaded automatically. Please be patient.** + +```bash +# Switch to the working directory +cd /home/Projects +# You need to create a docker container for the first run, and do not need to run the current command when you run it again +# Create a docker container named ppocr and map the current directory to the /paddle directory of the container + +# If using CPU, use docker instead of nvidia-docker to create docker +sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash +``` + + + +## 2. 
Install PaddlePaddle 2.0
+
+- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- If you only have a CPU on your machine, please run the following command to install
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+For more software version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
+
diff --git a/doc/doc_en/models_and_config_en.md b/doc/doc_en/models_and_config_en.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c88120b5531347304976919cc2175aa54c9f5597 100644
--- a/doc/doc_en/models_and_config_en.md
+++ b/doc/doc_en/models_and_config_en.md
@@ -0,0 +1,38 @@
+# CONTENT
+- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
+  * [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model)
+  * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
+  * [PP-OCR model](#pp-ocr-model)
+
+
+# INTRODUCTION ABOUT OCR
+
+This section briefly introduces the basic concepts of the OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
+
+OCR (Optical Character Recognition) is currently the general term for text recognition. It is not limited to document or book text recognition, but also includes recognizing text in natural scenes, which can also be called STR (Scene Text Recognition).
+
+OCR generally includes two parts, text detection and text recognition. The text detection module first uses a detection algorithm to detect text lines in the image, and then a recognition algorithm identifies the specific text in each text line.
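The two-stage flow just described can be sketched in a few lines of plain Python. Note that `detect_text_lines` and `recognize_text_line` below are hypothetical stubs standing in for real detection and recognition models (for example DB and CRNN); they are not part of PaddleOCR's API:

```python
def detect_text_lines(image):
    """Stage 1 (detection): locate text lines in the image.
    A stub standing in for a detection model such as DB; it returns
    bounding boxes as (x, y, width, height) tuples."""
    return [(0, 0, 120, 20), (0, 30, 90, 20)]


def crop(image, box):
    """Cut one text-line region out of the image (a list of pixel rows)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_text_line(line_image):
    """Stage 2 (recognition): transcribe one cropped line.
    A stub standing in for a recognition model such as CRNN."""
    return "text"


def ocr(image):
    """Detection first, then recognition on every detected line."""
    return [(box, recognize_text_line(crop(image, box)))
            for box in detect_text_lines(image)]
```

Running `ocr` on an image yields one (box, text) pair per detected text line, which is exactly the shape of result the pipeline above produces.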
+
+
+## BASIC CONCEPTS OF OCR DETECTION MODEL
+
+Text detection locates the text regions in an image, usually marking each word or text line with a bounding box. Traditional text detection algorithms mostly extract features manually; they are fast and work well in simple scenes, but their performance drops sharply in natural scenes. Currently, deep learning methods are mostly used.
+
+Text detection algorithms based on deep learning can be roughly divided into the following categories:
+1. Methods based on object detection. Generally, after text boxes are predicted, the final boxes are filtered through NMS. These are mostly four-point boxes, which are not ideal for curved text. Typical algorithms are EAST and TextBoxes.
+2. Methods based on text segmentation. The text line is treated as the segmentation target, and the enclosing text box is then constructed from the segmentation result. These can handle curved text, but are not ideal for scenes where text lines cross. Typical algorithms are DB and PSENet.
+3. Hybrid methods combining object detection and segmentation.
+
+
+## Basic concepts of OCR recognition model
+
+The input of an OCR recognition algorithm is generally a text-line image with little background information, where the text occupies most of the image. Recognition algorithms can be divided into two types:
+1. CTC-based methods. The text prediction module is based on CTC, and the commonly used combination is CNN+RNN+CTC. Some algorithms also try to add Transformer modules to the network.
+2. Attention-based methods. The text prediction module is based on Attention, and the commonly used combination is CNN+RNN+Attention.
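The CTC decoding step used by the first family of methods can be illustrated with a short, self-contained sketch of best-path decoding (collapse repeated per-frame labels, then drop blanks). The charset and frame labels here are made up for illustration and do not come from any real model output:

```python
BLANK = 0  # index reserved for the CTC blank label


def ctc_greedy_decode(frame_labels, charset):
    """Best-path CTC decoding: merge consecutive duplicates, remove blanks.

    frame_labels: per-frame argmax label indices from the network output.
    charset: maps a non-blank label index to its character.
    """
    chars = []
    prev = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # (so repeats collapse) and is not the blank.
        if label != prev and label != BLANK:
            chars.append(charset[label])
        prev = label
    return "".join(chars)
```

For example, the per-frame labels [1, 1, 0, 1, 2, 2] over the charset {1: 'a', 2: 'b'} decode to "aab": the repeated 1s collapse, the blank separates the two a's, and the repeated 2s collapse to one b.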
+
+
+## PP-OCR model
+
+PaddleOCR integrates many OCR algorithms: text detection algorithms include DB, EAST and SAST, and text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and others.
+
+Among them, PaddleOCR has released the PP-OCR series of models for general OCR in Chinese and English natural scenes. The PP-OCR model is composed of the DB and CRNN algorithms. It is trained on massive Chinese data and tuned to achieve strong text detection and recognition in Chinese scenes. PaddleOCR has also launched the high-precision, ultra-lightweight PP-OCRv2 model: the detection model is only 3M and the recognition model only 8.5M. Using [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)'s model quantization method, the detection model can be compressed to 0.8M without reducing accuracy, and the recognition model to 3M, which makes it better suited to mobile deployment scenarios.
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md
index 9bee4aef5121b1964a9bdbdeeaad4e81dd9ff6d4..1f9ee1489a87e5814f672a1615920ded41d41e03 100644
--- a/doc/doc_en/models_list_en.md
+++ b/doc/doc_en/models_list_en.md
@@ -28,6 +28,8 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
+|ch_ppocr_mobile_slim_v2.1_det|Slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)|
+|ch_ppocr_mobile_v2.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)|
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieves better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
@@ -40,6 +42,8 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
+|ch_ppocr_mobile_slim_v2.1_rec|Slim quantization with distillation lightweight model, supporting Chinese, English and multilingual text recognition|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) |
+|ch_ppocr_mobile_v2.1_rec|Original lightweight model, supporting Chinese, English and multilingual text recognition|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -120,12 +124,14 @@ For more supported languages, please refer to : [Multi-language model](./multi_l |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_train.tar) | -|ch_ppocr_mobile_v2.0_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | +|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_train.tar) | +|ch_ppocr_mobile_v2.0_cls|Original model for text angle 
classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | ### 4. Paddle-Lite Model |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |---|---|---|---|---|---|---| -|V2.0|extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| -|V2.0(slim)|extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| +|V2.1|ppocr_v2.1 extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9| +|V2.1(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9| +|V2.0|ppocr_v2.0 extra-lightweight chinese 
OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| +|V2.0(slim)|ppocr_v2.0 extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/doc/doc_en/paddleOCR_overview_en.md b/doc/doc_en/paddleOCR_overview_en.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..403cd99415e08de198270fb5bfe1a43f297c5156 100644 --- a/doc/doc_en/paddleOCR_overview_en.md +++ b/doc/doc_en/paddleOCR_overview_en.md @@ -0,0 +1,39 @@ +# PaddleOCR Overview and Project Clone + +## 1. PaddleOCR Overview + +PaddleOCR provides a rich collection of text detection, text recognition and end-to-end algorithms. Combining actual testing and industrial experience, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and after a series of optimization strategies proposes a family of models, named PP-OCR, for industrial applications. The PP-OCR models target general scenarios and form a model library organized by language. Building on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure tool library for document-scene tasks, covering two major tasks: layout analysis and table recognition. To support the entire process of putting OCR into production, PaddleOCR also provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality. + +

+ +
+ + + +## 2. Project Clone + +### **2.1 Clone PaddleOCR repo** + +``` +# Recommended +git clone https://github.com/PaddlePaddle/PaddleOCR + +# If you cannot pull successfully due to network problems, you can also use the code hosted on the cloud: + +git clone https://gitee.com/paddlepaddle/PaddleOCR + +# Note: The cloud-hosted code may not be synchronized with this GitHub project in real time; there might be a delay of 3-5 days. Please give priority to the recommended method. +``` + +### **2.2 Install third-party libraries** + +``` +cd PaddleOCR +pip3 install -r requirements.txt +``` + +If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows: + +Please try to download the Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely). + +Reference: [Solve Shapely installation on Windows]( \ No newline at end of file diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 4aad3f1f7bad9baba1048691698d389279345c47..637e9407ccddfbc27b941a99ec5404ba5173e7e8 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -3,9 +3,7 @@ [PaddleOCR Quick Start](#paddleocr-quick-start) -* [1. Light Installation](#1-light-installation) - + [1.1 Install PaddlePaddle2.0](#11-install-paddlepaddle20) - + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package) ++ [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package) * [2. Easy-to-Use](#2-easy-to-use) + [2.1 Use by command line](#21-use-by-command-line) - [2.1.1 English and Chinese Model](#211-english-and-chinese-model) @@ -15,27 +13,11 @@ - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model) - [2.2.2 LayoutParser](#222-layoutparser) - -## 1.
Light Installation - + -### 1.1 Install PaddlePaddle2.0 - -```bash -# If you have cuda9 or cuda10 installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple - -# If you only have cpu on your machine, please run the following command to install -python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple -``` - -For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. - - - -### 1.2 Install PaddleOCR Whl Package +## 1. Install PaddleOCR Whl Package ```bash pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+ @@ -59,7 +41,7 @@ pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+ ### 2.1 Use by command line -PaddleOCR provides a series of test images, click xx to download, and then switch to the corresponding directory in the terminal +PaddleOCR provides a series of test images, click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download, and then switch to the corresponding directory in the terminal ```bash cd /path/to/ppocr_img diff --git a/doc/ic15_location_download.png b/doc/ic15_location_download.png new file mode 100644 index 0000000000000000000000000000000000000000..7cb8540e5e51b77aa8b480069841fc51c0d907b7 Binary files /dev/null and b/doc/ic15_location_download.png differ diff --git a/doc/overview.png b/doc/overview.png new file mode 100644 index 0000000000000000000000000000000000000000..c5c4e09d6730bb0b1ca2c0b5442079ceb41ecdfa Binary files /dev/null and b/doc/overview.png differ diff --git a/doc/overview_en.png b/doc/overview_en.png new file mode 100644 index 0000000000000000000000000000000000000000..b44da4e9874d6a2162a8bb05ff1b479875bd65f3 Binary files /dev/null and b/doc/overview_en.png differ
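The quickstart hunks above install the `paddleocr` whl package and drive it from the command line; the same flow is available from Python. A minimal sketch, assuming `pip install "paddleocr>=2.0.1"` has been run and that a test image such as `./ppocr_img/imgs_en/img_12.jpg` (a hypothetical path inside the `ppocr_img.zip` archive linked from the quickstart) has been extracted; neither assumption comes from this diff:

```python
# Hedged sketch of using the paddleocr whl package from Python.
# Guarded so it degrades gracefully when the package or image is absent.
import os

try:
    from paddleocr import PaddleOCR
    HAVE_PADDLEOCR = True
except ImportError:  # package not installed; the quickstart's pip step covers this
    HAVE_PADDLEOCR = False

IMG = "./ppocr_img/imgs_en/img_12.jpg"  # assumed test-image path

if HAVE_PADDLEOCR and os.path.exists(IMG):
    # use_angle_cls enables the text-direction classifier; models download on first run
    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    for line in ocr.ocr(IMG, cls=True):
        print(line)  # each line pairs a bounding box with (text, confidence)
```

The `lang` argument selects among the per-language model libraries listed in the tables above (e.g. `"ch"` for the Chinese models).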