diff --git a/README.md b/README.md index bb3418d925c938dd7e21ca9314ee5607f47c85e0..7eb030bdc686b42115146b41ec27e81dab6a1b82 100644 --- a/README.md +++ b/README.md @@ -25,7 +25,7 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools **Recent updates** -- PaddleOCR R&D team would like to share the released tools with developers, at 20:15 pm on September 8th, [Live Address](https://live.bilibili.com/21689802). +- PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Live Address](https://live.bilibili.com/21689802). - 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile. - 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files). - 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized. @@ -86,7 +86,7 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | | ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_ppocrv2_xx |Mobile&Server|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar)| +| Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_PP-OCRv2_xx |Mobile&Server|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/ch/ch_PP-OCRv2_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)| | Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | | Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_traingit.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -103,13 +103,12 @@ For a new language request, please refer to [Guideline for new language_requests - [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md) - [PP-OCR Model Download](./doc/doc_en/models_list_en.md) - [Yml Configuration](./doc/doc_en/config_en.md) - - [Python Inference](./doc/doc_en/inference_en.md) + - [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md) - [PP-OCR Training](./doc/doc_en/training_en.md) - [Text Detection](./doc/doc_en/detection_en.md) - [Text Recognition](./doc/doc_en/recognition_en.md) - [Direction Classification](./doc/doc_en/angle_class_en.md) - Inference and Deployment - - [Python Inference](./doc/doc_en/inference_en.md) - [C++ Inference](./deploy/cpp_infer/readme_en.md) - [Serving](./deploy/pdserving/README.md) - [Mobile](./deploy/lite/readme_en.md) @@ -120,6 +119,7 @@ For a new language request, please refer to [Guideline for new language_requests - Academic Circles - [Two-stage Algorithm](./doc/doc_en/algorithm_overview_en.md) - [PGNet Algorithm](./doc/doc_en/algorithm_overview_en.md) + - [Python Inference](./doc/doc_en/inference_en.md) - Data Annotation and Synthesis - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md) - [Data Synthesis Tool: Style-Text](./StyleText/README.md) @@ -146,7 +146,7 @@ For a new language request, please refer to [Guideline for new language_requests [1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). -[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy; The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (arXiv link is coming soon). +[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (arXiv link is coming soon). diff --git a/README_ch.md b/README_ch.md index 3a9c7567d47dd7ef0d481174977ae958081bb235..7e8a8e241be1e22ddcc74bcd99d78225b32a91fa 100755 --- a/README_ch.md +++ b/README_ch.md @@ -81,7 +81,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 | 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | | ------------ | --------------- | ----------------|---- | ---------- | -------- | -| 中英文超轻量PP-OCRv2模型(11.6M) | ch_ppocrv2_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)| [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar)| +| 中英文超轻量PP-OCRv2模型(13.0M) | ch_PP-OCRv2_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/chinese/ch_PP-OCRv2_det_distill_train.tar)| [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)| | 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | | 中英文通用PP-OCR server模型(143.4M) |ch_ppocr_server_v2.0_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -95,13 +95,12 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md) - [PP-OCR模型下载](./doc/doc_ch/models_list.md) - [配置文件内容与生成](./doc/doc_ch/config.md) - - [模型库快速使用](./doc/doc_ch/inference.md) + - [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md) - [PP-OCR模型训练](./doc/doc_ch/training.md) - [文本检测](./doc/doc_ch/detection.md) - [文本识别](./doc/doc_ch/recognition.md) - [方向分类器](./doc/doc_ch/angle_class.md) - PP-OCR模型推理部署 - - [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md) - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [服务化部署](./deploy/pdserving/README_CN.md) - [端侧部署](./deploy/lite/readme.md) @@ -117,6 +116,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - OCR学术圈 - [两阶段模型介绍与下载](./doc/doc_ch/algorithm_overview.md) - [端到端PGNet算法](./doc/doc_ch/pgnet.md) + - [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md) - 数据集 - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) diff --git a/configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml similarity index 98% rename from configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml rename to configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml index c8962ffafe7ea782377b1988dee9efdd01d56d5a..0f08909add17d8c73ad6e1b00e17d4c351def7e5 100644 --- a/configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml +++ b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml @@ -8,7 +8,7 @@ Global: # evaluation is run every 5000 iterations after the 4000th iteration eval_batch_step: [3000, 2000] cal_metric_during_train: False - pretrained_model: ./pretrain_models/ch_ppocr_mobile_v2.1_det_distill_train/best_accuracy + pretrained_model: ./pretrain_models/ch_PP-OCRv2_det_distill_train/best_accuracy checkpoints: save_inference_dir: use_visualdl: False diff --git a/configs/det/ch_ppocr_v2.1/ch_det_lite_train_distill_v2.1.yml b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_distill.yml similarity index 100% rename from configs/det/ch_ppocr_v2.1/ch_det_lite_train_distill_v2.1.yml rename to configs/det/ch_PP-OCRv2/ch_PP-OCR_det_distill.yml diff --git a/configs/det/ch_ppocr_v2.1/ch_det_lite_train_dml_v2.1.yml b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_dml.yml similarity index 100% rename from configs/det/ch_ppocr_v2.1/ch_det_lite_train_dml_v2.1.yml rename to configs/det/ch_PP-OCRv2/ch_PP-OCR_det_dml.yml diff --git a/configs/det/ch_ppocr_v2.1/ch_det_mv3_db_v2.1_student.yml b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_student.yml similarity index 100% rename from configs/det/ch_ppocr_v2.1/ch_det_mv3_db_v2.1_student.yml rename to configs/det/ch_PP-OCRv2/ch_PP-OCR_det_student.yml diff --git a/configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml b/configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml similarity index 100% rename from configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml rename to configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml diff --git a/doc/ic15_location_download.png b/doc/datasets/ic15_location_download.png similarity index 100% rename from doc/ic15_location_download.png rename to doc/datasets/ic15_location_download.png diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index 57bfdc01e28042e70e42f0dfecb6f8c81d92d8f1..66295b25252e3906b4d3e6ffb30b135f0c6bdf6c 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -19,15 +19,16 @@ ## 1.1 数据准备 -icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。 +icdar2015 TextLocalization数据集是文本检测的数据集,包含1000张训练图像和500张测试图像。 +icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。 注册完成登陆后,下载下图中红色框标出的部分,其中, `Training Set Images`下载的内容保存为`icdar_c4_train_imgs`文件夹下,`Test Set Images` 下载的内容保存为`ch4_test_images`文件夹下

- +

-将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/ 下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件 +将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件 ,您可以通过wget的方式进行下载。 ```shell # 在PaddleOCR路径下 diff --git a/doc/doc_ch/environment.md b/doc/doc_ch/environment.md index 8efc31983a9d7cee50f922a3f84a2b1a2de23889..a55008100312f50e0786dca49eaa186afc49b3b2 100644 --- a/doc/doc_ch/environment.md +++ b/doc/doc_ch/environment.md @@ -1,4 +1,7 @@ # 运行环境准备 +Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。 + +如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。 * [1. Python环境搭建](#1) + [1.1 Windows](#1.1) @@ -63,9 +66,9 @@ ``` create environment - - - + + + 以上anaconda环境和python环境安装完毕 @@ -80,9 +83,9 @@ - 安装完Anaconda后,可以安装python环境,以及numpy等所需的工具包环境 - Anaconda下载: - 地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - + anaconda download - + - 选择最下方的`Anaconda3-2021.05-MacOSX-x86_64.pkg`下载 - 下载完成后,双击.pkg文件进入图形界面 - 按默认设置即可,安装需要花费一段时间 @@ -177,7 +180,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker - 说明:使用paddlepaddle需要先安装python环境,这里我们选择python集成环境Anaconda工具包 - Anaconda是1个常用的python包管理程序 - 安装完Anaconda后,可以安装python环境,以及numpy等所需的工具包环境 - + - **下载Anaconda**: - 下载地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D @@ -185,22 +188,22 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker - 选择适合您操作系统的版本 - 可在终端输入`uname -m`查询系统所用的指令集 - + - 下载法1:本地下载,再将安装包传到linux服务器上 - + - 下载法2:直接使用linux命令行下载 - + ```shell # 首先安装wget sudo apt-get install wget # Ubuntu sudo yum install wget # CentOS ``` - + ```shell # 然后使用wget从清华源上下载 # 如要下载Anaconda3-2021.05-Linux-x86_64.sh,则下载命令如下: wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh - + # 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本 ``` @@ -210,7 +213,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker - 若您下载的是其它版本,则将该命令的文件名替换为您下载的文件名 - 按照安装提示安装即可 - 查看许可时可输入q来退出 - + - **将conda加入环境变量** - 加入环境变量是为了让系统能识别conda命令,若您在安装时已将conda加入环境变量path,则可跳过本步 @@ -277,13 +280,13 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker # 激活paddle_env环境 conda activate paddle_env ``` - + 以上anaconda环境和python环境安装完毕 #### 1.3.2 Docker环境配置 -**注意:第一次使用这个镜像,会自动下载该镜像,请耐心等待。** +**注意:第一次使用这个镜像,会自动下载该镜像,请耐心等待。您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。** ```bash # 切换到工作目录下 @@ -297,8 +300,6 @@ sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/pad 如果使用CUDA10,请运行以下命令创建容器,设置docker容器共享内存shm-size为64G,建议设置32G以上 sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash -您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。 - # ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令 sudo docker container exec -it ppocr /bin/bash ``` diff --git a/doc/doc_ch/knowledge_distillation.md b/doc/doc_ch/knowledge_distillation.md index b561f718491011e8dddcd44e66bfd6da62101ba6..5827f48c81d51a674011e2df40c798e0548fb0a1 100644 --- a/doc/doc_ch/knowledge_distillation.md +++ b/doc/doc_ch/knowledge_distillation.md @@ -39,7 +39,7 @@ PaddleOCR中集成了知识蒸馏的算法,具体地,有以下几个主要 ### 2.1 识别配置文件解析 -配置文件在[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)。 +配置文件在[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)。 #### 2.1.1 模型结构 diff --git a/doc/doc_ch/models_and_config.md b/doc/doc_ch/models_and_config.md index cc425947238e12961d84be3878e178135b996c43..89afc89a99bed364fd2abe247946dfe9e552ae86 100644 --- a/doc/doc_ch/models_and_config.md +++ b/doc/doc_ch/models_and_config.md @@ -2,7 +2,7 @@ # PP-OCR模型与配置文件 PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验。 -本节包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。 +本章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。 ------ diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 59a36b578ef1ad99ae62c4a09db78fb4562538eb..5e78795bcda96cf005f24b97cfcc0a8580b2ae1e 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -33,8 +33,8 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.1_det|slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)| -|ch_ppocr_mobile_v2.1_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_lite_train_cml_v2.1.ym](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)| +|ch_PP-OCRv2_det_slim|slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| +|ch_PP-OCRv2_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| |ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -48,8 +48,8 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.1_rec|slim量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) | -|ch_ppocr_mobile_v2.1_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) | +|ch_PP-OCRv2_rec_slim|slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec|原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -93,12 +93,13 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) | |ch_ppocr_mobile_v2.0_cls|原始分类器模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | + ### 四、Paddle-Lite 模型 |模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| |---|---|---|---|---|---|---| -|V2.1|ppocr_v2.1蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9| -|V2.1(slim)|ppocr_v2.1蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9| +|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| +|PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9| |V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 74d7004b35b06a79dd0acfe7d443ed128d63df88..8f57489059e7f8ac1fde11d9e5c382e2f7e85a18 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -2,7 +2,7 @@ - [PaddleOCR快速开始](#paddleocr) - + + [1. 安装PaddleOCR whl包](#1) * [2. 便捷使用](#2) + [2.1 命令行使用](#21) @@ -90,7 +90,7 @@ cd /path/to/ppocr_img ``` -更多whl包使用可参考[whl包文档](./whl.md) +如需使用2.0模型,请指定参数`--version 2.0`,paddleocr默认使用2.1模型。更多whl包使用可参考[whl包文档](./whl.md) @@ -166,7 +166,7 @@ paddleocr --image_dir=./table/1.png --type=structure ``` /output/table/1/ - └─ res.txt + └─ res.txt └─ [454, 360, 824, 658].xlsx 表格识别结果 └─ [16, 2, 828, 305].jpg 被裁剪出的图片区域 └─ [17, 361, 404, 711].xlsx 表格识别结果 @@ -183,7 +183,7 @@ paddleocr --image_dir=./table/1.png --type=structure 大部分参数和paddleocr whl包保持一致,见 [whl包文档](./whl.md) - + ### 2.2 Python脚本使用 @@ -232,6 +232,7 @@ im_show.save('result.jpg') + #### 2.2.2 版面分析 ```python diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index cd1a64bd7b2aa6d645d59bb9ab1cc1a741f38adc..d57d6bb152fb3de80909cb04f1b9b10a70ebd0b7 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -7,15 +7,13 @@ - [1.2 数据下载](#数据下载) - [1.3 字典](#字典) - [1.4 支持空格](#支持空格) - - [2 启动训练](#启动训练) - [2.1 数据增强](#数据增强) - [2.2 通用模型训练](#通用模型训练) - [2.3 多语言模型训练](#多语言模型训练) - - [3 评估](#评估) - - [4 预测](#预测) +- [5 转Inference模型测试](#Inference) @@ -424,3 +422,39 @@ python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v infer_img: doc/imgs_words/ch/word_1.jpg result: ('韩国小馆', 0.997218) ``` + + + +## 5. 转Inference模型测试 + +识别模型转inference模型与检测的方式相同,如下: + +``` +# -c 后面设置训练算法的yml配置文件 +# -o 配置可选参数 +# Global.pretrained_model 参数设置待转换的训练模型地址,不用添加文件后缀 .pdmodel,.pdopt或.pdparams。 +# Global.save_inference_dir参数设置转换的模型将保存的地址。 + +python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy Global.save_inference_dir=./inference/rec_crnn/ +``` + +**注意:**如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的`character_dict_path`是否是所需要的字典文件。 + +转换成功后,在目录下有三个文件: + +``` +/inference/rec_crnn/ + ├── inference.pdiparams # 识别inference模型的参数文件 + ├── inference.pdiparams.info # 识别inference模型的参数信息,可忽略 + └── inference.pdmodel # 识别inference模型的program文件 +``` + +- 自定义模型推理 + + 如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch` + + ``` + python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" + ``` + + diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 8f12d42fe798de7d330f1d3ef1950325887525cb..d3f6f3da102d06c53e4e179a0bd89670536e1af7 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -18,13 +18,14 @@ This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. ## 1.1 DATA PREPARATION -The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. + +The icdar2015 dataset contains train set which has 1000 images obtained with wearable cameras and test set which has 500 images obtained with wearable cameras. The icdar2015 can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. After registering and logging in, download the part marked in the red box in the figure below. And, the content downloaded by `Training Set Images` should be saved as the folder `icdar_c4_train_imgs`, and the content downloaded by `Test Set Images` is saved as the folder `ch4_test_images`

- +

Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget: diff --git a/doc/doc_en/inference_ppocr_en.md b/doc/doc_en/inference_ppocr_en.md index 5442d0c578027c33890fbb063a0d2c78ecd226c5..fa3b1c88713f01e8e411cf95d107b4b58dd7f4e1 100755 --- a/doc/doc_en/inference_ppocr_en.md +++ b/doc/doc_en/inference_ppocr_en.md @@ -1,22 +1,22 @@ -# Reasoning based on Python prediction engine +# Python Inference for PP-OCR Model Library This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU. -- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE) +- [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE) -- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE) - - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION) - - [2. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE) +- [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE) + - [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION) + - [2. Multilingaul Model Inference](#MULTILINGUAL_MODEL_INFERENCE) -- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) +- [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE) -- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION) +- [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION) -## TEXT DETECTION MODEL INFERENCE +## Text Detection Model Inference The default configuration is based on the inference setting of the DB text detection model. For lightweight Chinese detection model inference, you can execute the following commands: @@ -52,11 +52,11 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_di -## TEXT RECOGNITION MODEL INFERENCE +## Text Recognition Model Inference -### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE +### 1. Lightweight Chinese Recognition Model Inference For lightweight Chinese recognition model inference, you can execute the following commands: @@ -77,7 +77,7 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658) -### 2. MULTILINGAUL MODEL INFERENCE +### 2. Multilingaul Model Inference If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition: @@ -94,7 +94,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904) -## ANGLE CLASSIFICATION MODEL INFERENCE +## Angle Classification Model Inference For angle classification model inference, you can execute the following commands: @@ -114,7 +114,7 @@ After executing the command, the prediction results (classification angle and sc ``` -## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION +## Text Detection Angle Classification and Recognition Inference Concatenation When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model. The parameter `use_mp` specifies whether to use multi-process to infer `total_process_num` specifies process number when using multi-process. The parameter . The visualized recognition results are saved to the `./inference_results` folder by default. diff --git a/doc/doc_en/models_and_config_en.md b/doc/doc_en/models_and_config_en.md index f80c00715f148974db616f411740580c281659ed..414d844d63d51a2b53feea035c1f735594d73fe0 100644 --- a/doc/doc_en/models_and_config_en.md +++ b/doc/doc_en/models_and_config_en.md @@ -1,4 +1,12 @@ # PP-OCR Model and Configuration +The chapter on PP-OCR model and configuration file mainly adds some basic concepts of OCR model and the content and role of configuration file to have a better experience in the subsequent parameter adjustment and training of the model. + +This chapter contains three parts. Firstly, [PP-OCR Model Download](. /models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then in [Yml Configuration](. /config_en.md) details the parameters needed to fine-tune the PP-OCR models. The final [Python Inference for PP-OCR Model Library](. /inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library in the first section, which can quickly utilize the rich model library models to obtain test results through the Python inference engine. + +------ + +Let's first understand some basic concepts. + - [INTRODUCTION ABOUT OCR](#introduction-about-ocr) * [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model) * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model) diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md index d699678e64963d8949c559e422f48b814ec3b921..3b9b5518701f052079af1398a4fa3e3770eb12a1 100644 --- a/doc/doc_en/models_list_en.md +++ b/doc/doc_en/models_list_en.md @@ -29,8 +29,8 @@ Relationship of the above models is as follows. |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.1_det|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)| -|ch_ppocr_mobile_v2.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.ym](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)| +|ch_PP-OCRv2_det_slim|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| +|ch_PP-OCRv2_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -43,8 +43,8 @@ Relationship of the above models is as follows. |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.1_rec|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) | -|ch_ppocr_mobile_v2.1_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) | +|ch_PP-OCRv2_rec_slim|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -93,7 +93,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l ### 4. Paddle-Lite Model |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |---|---|---|---|---|---|---| -|V2.1|ppocr_v2.1 extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9| -|V2.1(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9| +|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| +|PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9| |V2.0|ppocr_v2.0 extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.0(slim)|ppovr_v2.0 extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 19d6f327a64134be4ef5763ded7ce3fd5d8590ef..c4fb5068197c8fb655c1e3ddf4aa6143e7d558e2 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -95,7 +95,7 @@ If you do not use the provided test image, you can replace the following `--imag ['PAIN', 0.990372] ``` -More whl package usage can be found in [whl package](./whl_en.md) +If you need to use the 2.0 model, please specify the parameter `--version 2.0`, paddleocr uses the 2.1 model by default. More whl package usage can be found in [whl package](./whl_en.md) #### 2.1.2 Multi-language Model diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 21233c80a298fd32c889b24c47acdfaa885fc682..7ee0cb5fc084e10658fe02b03910431a074e84ce 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -15,6 +15,7 @@ - [4 PREDICTION](#PREDICTION) - [4.1 Training engine prediction](#Training_engine_prediction) +- [5 CONVERT TO INFERENCE MODEL](#Inference) ## 1 DATA PREPARATION @@ -361,6 +362,7 @@ Eval: ``` + ## 3 EVALUATION The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file. @@ -432,3 +434,40 @@ Get the prediction result of the input image: infer_img: doc/imgs_words/ch/word_1.jpg result: ('韩国小馆', 0.997218) ``` + + + +## 5 CONVERT TO INFERENCE MODEL + +The recognition model is converted to the inference model in the same way as the detection, as follows: + +``` +# -c Set the training algorithm yml configuration file +# -o Set optional parameters +# Global.pretrained_model parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# Global.save_inference_dir Set the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy Global.save_inference_dir=./inference/rec_crnn/ +``` + +If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path. + +After the conversion is successful, there are three files in the model save directory: + +``` +inference/det_db/ + ├── inference.pdiparams # The parameter file of recognition inference model + ├── inference.pdiparams.info # The parameter information of recognition inference model, which can be ignored + └── inference.pdmodel # The program file of recognition model +``` + +- Text recognition model Inference using custom characters dictionary + + If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch` + + ``` + python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" + ``` + + + diff --git a/paddleocr.py b/paddleocr.py index de712442450aaba1176d2cf754de8a429042f84d..a98efd34088701d5eb5602743cf75b7d5e80157f 100644 --- a/paddleocr.py +++ b/paddleocr.py @@ -49,13 +49,13 @@ MODEL_URLS = { 'det': { 'ch': { 'url': - 'https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar', + 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar', }, }, 'rec': { 'ch': { 'url': - 'https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer.tar', + 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' } }