Commit 8f77e68f, authored by Khanh Tran

raw english words from gg translate

Parent ddefd24d
## Introduction
PaddleOCR aims to provide a rich, leading, and practical OCR tool library that helps users train better models and put them into practice.
**Recent updates**
- 2020.5.30: Model prediction and training now support Windows, and the display of recognition results has been improved.
- 2020.5.30: Open-sourced the general Chinese OCR model.
- 2020.5.30: An online demo of the ultra-lightweight Chinese OCR model is now available.
## Features
- Ultra-lightweight Chinese OCR model; the total model size is only 8.6M
- A single model supports recognition of mixed Chinese, English, and digit text, vertical text, and long text
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Multiple text detection algorithms: EAST, DB
- Multiple text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
### Supported Chinese models:
|Model name|Description|Detection model link|Recognition model link|
|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
To try the ultra-lightweight Chinese OCR model online: https://www.paddlepaddle.org.cn/hub/scene/ocr
**You can also quickly try the ultra-lightweight and general Chinese OCR models by following the steps below.**
## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**
![](doc/imgs_results/11.jpg)
The picture above shows the result of the ultra-lightweight Chinese OCR model. For more results, see [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示) at the end of this document.
#### 1. Environment configuration
Please see [Quick installation](./doc/installation.md) to set up the PaddleOCR runtime environment.
#### 2. Download inference models
#### (1) Download ultra-lightweight Chinese OCR models
```
mkdir -p inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2) Download general Chinese OCR models
```
# -p keeps this step from failing if the inference directory already exists
mkdir -p inference && cd inference
# Download the detection model of the general Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition model of the general Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
```
#### 3. Single image and batch image prediction
The following commands run text detection and recognition in sequence. When running prediction, use the parameter `image_dir` to specify the path of a single image or an image folder, `det_model_dir` to specify the path of the detection inference model, and `rec_model_dir` to specify the path of the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
# Set the PYTHONPATH environment variable
export PYTHONPATH=.
# Predict a single image by passing its path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Predict a batch of images by passing a folder path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# To run prediction on the CPU, set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
To run inference with the general Chinese OCR model, download the corresponding models as described above and update the relevant parameters. For example:
```
# Predict a single image by passing its path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
For more ways to chain text detection and recognition at inference time, please refer to [Inference](./doc/inference.md)
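When switching between the ultra-lightweight and general models, only the two model directories change. The sketch below assembles the same command line from Python so the paths can be swapped in one place. The helper name `build_predict_command` is hypothetical and not part of PaddleOCR; the script path and flags are the ones shown in the commands above.

```python
import shlex
import sys


def build_predict_command(image_dir, det_model_dir, rec_model_dir, use_gpu=True):
    """Return the argument list for tools/infer/predict_system.py.

    Hypothetical helper: it only mirrors the flags used in this README.
    """
    cmd = [
        sys.executable,
        "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]
    if not use_gpu:
        # The README passes --use_gpu=False to run prediction on the CPU.
        cmd.append("--use_gpu=False")
    return cmd


# General Chinese OCR model on the CPU, using the directories from the example above.
command = build_predict_command(
    "./doc/imgs/11.jpg",
    "./inference/ch_det_r50_vd_db/",
    "./inference/ch_rec_r34_vd_crnn/",
    use_gpu=False,
)
# Print a copy-pasteable command line; nothing is executed here.
print(" ".join(shlex.quote(part) for part in command))
```

To actually run it, the list can be passed to `subprocess.run(command, check=True)` from the repository root with `PYTHONPATH=.` set, as in the shell commands above.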
## Documentation
- [Quick installation](./doc/installation.md)
- [Text detection model training/evaluation/prediction](./doc/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
- [Inference](./doc/inference.md)
## Text detection algorithm
Open-source text detection algorithms in PaddleOCR:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498)) (developed by Baidu, coming soon)
On the ICDAR2015 public text detection dataset, the detection results are as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
* Note: For training and evaluating the DB models above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be tuned for better results.
For training and usage of the PaddleOCR text detection algorithms, please refer to [Text detection model training/evaluation/prediction](./doc/detection.md)
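Hmean in the table above is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the rounded precision and recall columns. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (same unit for both inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported Hmean %) copied from the table above.
rows = {
    ("EAST", "ResNet50_vd"): (88.18, 85.51, 86.82),
    ("EAST", "MobileNetV3"): (81.67, 79.83, 80.74),
    ("DB", "ResNet50_vd"): (83.79, 80.65, 82.19),
    ("DB", "MobileNetV3"): (75.92, 73.18, 74.53),
}

for (model, backbone), (p, r, reported) in rows.items():
    # Show the recomputed value next to the reported one.
    print(f"{model:5s} {backbone:12s} recomputed={hmean(p, r):.2f} reported={reported:.2f}")
```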
## Text recognition algorithm
Open-source text recognition algorithms in PaddleOCR:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) (developed by Baidu, coming soon)
Following the [DTRB](https://arxiv.org/abs/1904.01906) training and evaluation protocol for text recognition, the models are trained on the MJSynth and SynthText datasets and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|ResNet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|ResNet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|ResNet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|ResNet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
For training and usage of the PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/recognition.md)
## End-to-end OCR algorithm
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808) (developed by Baidu, coming soon)
<a name="超轻量级中文OCR效果展示"></a>
## Ultra-lightweight Chinese OCR results
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
@@ -154,7 +155,7 @@
## FAQ
1. Prediction error: got an unexpected keyword argument 'gradient_clip'
The installed paddle version is incorrect. This project currently supports only paddle 1.7; support for 1.8 is planned for the near future.
2. Error when converting an attention recognition model: KeyError: 'predict'
@@ -172,15 +173,11 @@
The Baidu-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned.
## Welcome to the PaddleOCR technical exchange group
Add paddlehelp on WeChat with the note "OCR", and the assistant will invite you to the group.
## References
```
1. EAST:
@inproceedings{zhou2017east,
@@ -235,8 +232,8 @@
}
```
## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
## Contribution
We welcome your contributions to PaddleOCR and appreciate your feedback.
# Optional parameters list
The following options can be viewed via `--help`
| FLAG | Supported script | Use | Default | Note |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Specify the configuration file | None | **For details on the configuration modules, see the parameter introduction** |
| -o | ALL | Set parameters in the configuration file | None | Options set with -o take priority over the configuration file selected with -c. E.g.: `-o Global.use_gpu=false` |
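The priority rule for `-o` can be pictured as a merge of dotted `Section.key=value` pairs over the dictionary loaded from the `-c` file. The sketch below is one possible way to do that merge and is not taken from the PaddleOCR source; `apply_overrides` and `parse_value` are hypothetical names, and the value-parsing rules are an assumption.

```python
import copy


def parse_value(text):
    """Turn a command-line string into bool, int, or float where possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of config with 'Section.key=value' overrides applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Create intermediate sections that the file did not define.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return merged


# Stand-in for the dictionary loaded from the file passed with -c.
file_config = {"Global": {"use_gpu": True, "epoch_num": 3000}}
merged = apply_overrides(
    file_config, ["Global.use_gpu=false", "Optimizer.base_lr=0.0001"]
)
print(merged)
```

The override wins wherever both sources define a key, and keys that appear only in the file are left untouched.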
## Introduction to the Global parameters of the configuration file
Take `rec_chinese_lite_train.yml` as an example
| Parameter | Use | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| algorithm | Select the algorithm to use | Same as in the configuration file | Selects the model; for supported models, see the [introduction](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
| use_gpu | Whether to use the GPU | true | \ |
| epoch_num | Maximum number of training epochs | 3000 | \ |
| log_smooth_window | Sliding window size | 20 | \ |
| print_batch_step | Log printing interval | 10 | \ |
| save_model_dir | Model save path | output/{algorithm_name} | \ |
| save_epoch_step | Model save interval | 3 | \ |
| eval_batch_step | Model evaluation interval | 2000 | \ |
| train_batch_size_per_card | Per-card batch size during training | 256 | \ |
| test_batch_size_per_card | Per-card batch size during evaluation | 256 | \ |
| image_shape | Input image size | [3, 32, 100] | \ |
| max_text_length | Maximum text length | 25 | \ |
| character_type | Character type | ch | en/ch; the default dict is used for en, a custom dict for ch |
| character_dict_path | Dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | Loss type | ctc | Two loss types are supported: ctc / attention |
| reader_yml | Reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Path of the pre-trained model to load | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Path of the model parameters to load | None | Used to load parameters and resume training after an interruption |
| save_inference_dir | Path for saving the inference model | None | Used to save the inference model |
## Introduction to the Reader parameters of the configuration file
Take `rec_chinese_reader.yml` as an example:
| Parameter | Use | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| reader_function | Select the data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Two data reading methods are supported: SimpleReader / LMDBReader |
| num_workers | Number of data reading threads | 8 | \ |
| img_set_dir | Dataset path | ./train_data | \ |
| label_file_path | Label file path | ./train_data/rec_gt_train.txt | \ |
| infer_img | Path of the folder of images to predict | ./infer_img | \ |
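The default value of `reader_function` has the shape `module.path,ClassName`. One plausible reading is that the part before the comma is imported as a module and the part after it is looked up as an attribute. The sketch below shows that pattern with the standard library only. How PaddleOCR itself resolves the string is an assumption here, `load_reader` is a hypothetical name, and a standard-library class stands in for `SimpleReader` so the example runs without PaddleOCR installed.

```python
import importlib


def load_reader(spec):
    """Resolve a 'package.module,Attribute' string to the named object."""
    module_name, _, attr_name = spec.partition(",")
    module = importlib.import_module(module_name.strip())
    return getattr(module, attr_name.strip())


# Stand-in spec; with PaddleOCR on PYTHONPATH the value from the table,
# "ppocr.data.rec.dataset_traversal,SimpleReader", would be passed instead.
reader_cls = load_reader("collections,OrderedDict")
print(reader_cls.__name__)
```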
# How to make your own ultra-lightweight OCR models?
Making a custom ultra-lightweight model takes three steps: training a text detection model, training a text recognition model, and chaining the trained models for prediction.
## step1: Train text detection model
PaddleOCR provides two text detection algorithms, EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone (the configuration used by the ultra-lightweight model):
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
For more details on data preparation and training, refer to [Text detection model training/evaluation/prediction](./detection.md)
## step2: Train text recognition model
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. All of them support the MobileNetV3 and ResNet34_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a CRNN recognition model with MobileNetV3 as the backbone (the configuration used by the ultra-lightweight model):
```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```
For more details on data preparation and training, refer to [Text recognition model training/evaluation/prediction](./recognition.md)
## step3: Make prediction
PaddleOCR provides a tool for chaining detection and recognition models, which can combine any trained detection model with any trained recognition model into a two-stage text recognition system. The input image passes through four main stages (text detection, detection box rectification, text recognition, and score filtering) to produce the text positions and recognition results. The results can optionally be visualized.
When running prediction, use the parameter `image_dir` to specify the path of a single image or an image folder, `det_model_dir` to specify the path of the detection inference model, and `rec_model_dir` to specify the path of the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
```
For more ways to chain text detection and recognition at inference time, please refer to [Inference](./inference.md)
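The four stages of the chained system (detection, box rectification, recognition, score filtering) can be sketched as a small loop. This is a toy illustration and not the code in `tools/infer/predict_system.py`: the function `run_two_stage_ocr`, the `drop_score` threshold, and the stand-in detector, cropper, and recognizer are all hypothetical.

```python
def run_two_stage_ocr(image, detect, crop, recognize, drop_score=0.5):
    """Chain a detector and a recognizer, keeping only confident results."""
    results = []
    for box in detect(image):              # stage 1: text detection
        patch = crop(image, box)           # stage 2: box rectification / cropping
        text, score = recognize(patch)     # stage 3: text recognition
        if score >= drop_score:            # stage 4: score filtering
            results.append((box, text, score))
    return results


# Toy stand-ins: the "image" is a list of rows, and each text row already
# holds the (text, score) pair a real recognizer would produce.
toy_image = [("PaddleOCR", 0.98), None, ("???", 0.12), None]


def toy_detect(image):
    # Pretend two text boxes were found, at rows 0 and 2.
    return [
        [(0, 0), (4, 0), (4, 1), (0, 1)],
        [(0, 2), (4, 2), (4, 3), (0, 3)],
    ]


def toy_crop(image, box):
    top_row = box[0][1]
    return image[top_row]


def toy_recognize(patch):
    return patch


for box, text, score in run_two_stage_ocr(toy_image, toy_detect, toy_crop, toy_recognize):
    print(box[0], text, score)
```

Because the detector and recognizer are passed in as callables, any trained detection model can be paired with any trained recognition model, which is the property the paragraph above describes.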
# Text detection
This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
## Data preparation
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for the first download.
Decompress the downloaded dataset into the working directory, assumed here to be PaddleOCR/train_data/. In addition, PaddleOCR has consolidated the scattered annotation files into single label files, which you can download with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```
After decompressing the dataset and downloading the annotation files, PaddleOCR/train_data/ contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of the icdar dataset
  └─ ch4_test_images/             Testing data of the icdar dataset
  └─ train_icdar2015_label.txt    Training annotations of the icdar dataset
  └─ test_icdar2015_label.txt     Test annotations of the icdar dataset
```
The provided label file format is:
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before json.dumps encoding, the image annotation information is a list of dictionaries. In each dictionary, `points` holds the (x, y) coordinates of the four corners of the text box, arranged clockwise starting from the top-left corner.
`transcription` is the text in the current text box; this information is not needed for the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file in the same format.
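Reading one line of such a label file takes a split and a `json.loads` call. The sketch below assumes the image file name and the JSON payload are separated by a single tab character, which the rendered example above does not make explicit. `parse_label_line` is a hypothetical helper name, and the sample line leaves out the fields elided with `...` in the original.

```python
import json


def parse_label_line(line):
    """Split a label line into (image_path, list of annotation dicts).

    Assumes a tab separates the file name from the JSON payload.
    """
    image_path, _, encoded = line.rstrip("\n").partition("\t")
    annotations = json.loads(encoded)
    return image_path, annotations


sample = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", '
    '"points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
)

path, annotations = parse_label_line(sample)
for annotation in annotations:
    # points runs clockwise from the top-left corner of the text box.
    top_left = annotation["points"][0]
    print(path, annotation["transcription"], top_left)
```

Writing a label file for another dataset is the reverse operation: build the list of dictionaries, encode it with `json.dumps`, and join it to the image path with the same separator.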
## Quickstart training
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, MobileNetV3 and ResNet50_vd. You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) according to your needs.
```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
```
**Start training**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
In the command above, -c selects configs/det/det_mv3_db.yml as the configuration file for training.
For a detailed explanation of the configuration file, please refer to [link](./doc/config.md).
You can also use the -o parameter to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```
## Evaluation metrics
PaddleOCR calculates three metrics for OCR detection: Precision, Recall, and Hmean.
Run the following command to compute the evaluation metrics from the test-set detection result file specified by save_res_path in the configuration file det_mv3_db.yml.
When evaluating, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be tuned for better results.
```
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
During training, model parameters are saved in the Global.save_model_dir directory by default. When evaluating, set Global.checkpoints to point to the saved parameter file.
For example:
```
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
* Note: box_thresh and unclip_ratio are parameters required for DB post-processing and do not need to be set when evaluating the EAST model.
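To give a feel for what `unclip_ratio` controls: the DB paper describes expanding each shrunk text region back outward by an offset D = A × r / L, where A is the polygon area, L its perimeter, and r the unclip ratio. The sketch below computes only that offset distance for a four-point box. It follows the formula from the paper, and whether PaddleOCR's post-processing computes it in exactly this way is an assumption. The function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the polygon."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance D = area * ratio / perimeter used to expand a box."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# Box taken from the label-file example in the data preparation section.
box = [(310, 104), (416, 141), (418, 216), (312, 179)]
for ratio in (1.5, 2.0):
    print(f"unclip_ratio={ratio}: expand by about {unclip_distance(box, ratio):.1f} px")
```

A larger `unclip_ratio` pushes the box edges further out, which helps when predicted regions cut into the text and hurts when neighbouring lines start to merge.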
## Test detection result
Test the detection result on a single image:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```
When testing the DB model, adjust the post-processing thresholds:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
Test the detection result on all images in a folder:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```
# Inference with the prediction engine
An inference model (a model saved by `fluid.io.save_inference_model`) is generally the frozen model saved after training is complete, and is mostly used for prediction and deployment.
The model saved during training is a checkpoints model, which stores the model parameters and is mostly used to resume training.
Compared with a checkpoints model, an inference model additionally saves the structure of the model. It performs well in deployment and accelerated inference, is flexible and convenient, and is suitable for integration with real systems. For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
Next, we first describe how to convert a trained model into an inference model, and then cover text detection, text recognition, and the two chained together, all based on the prediction engine.
## Convert a training model to an inference model
### Convert a detection model to an inference model
Download the ultra-lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
```
The above model is trained with the DB algorithm using MobileNetV3 as the backbone. To convert the trained model into an inference model, run the following command:
```
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/
```
When converting to an inference model, use the same configuration file that was used for training. In addition, you need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters.
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model will be saved.
After the conversion succeeds, there are two files in the `save_inference_dir` directory:
```
inference/det_db/
  └─ model     program file of the detection inference model
  └─ params    parameter file of the detection inference model
```
### Convert a recognition model to an inference model
Download the ultra-lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
```
The recognition model is converted to an inference model in the same way as the detection model:
```
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
        Global.save_inference_dir=./inference/rec_crnn/
```
If you trained the model on your own dataset and changed the Chinese character dictionary file, make sure that `character_dict_path` in the configuration file points to the dictionary file you need.
After the conversion succeeds, there are two files in the directory:
```
/inference/rec_crnn/
  └─ model     program file of the recognition inference model
  └─ params    parameter file of the recognition inference model
```
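A quick way to confirm that an export produced the two files listed above is sketched below. The helper name `missing_inference_files` is hypothetical and not part of PaddleOCR; the sketch only assumes that the export directory should contain files named `model` and `params`, as stated in this section.

```python
from pathlib import Path


def missing_inference_files(inference_dir, expected=("model", "params")):
    """Return the expected file names that are not present in inference_dir."""
    root = Path(inference_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # The directory names below are the ones used in the export commands above.
    for directory in ("./inference/det_db/", "./inference/rec_crnn/"):
        missing = missing_inference_files(directory)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(directory, status)
```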
## Text detection model inference
This section covers inference with the ultra-lightweight Chinese detection model, the DB text detection model, and the EAST text detection model. The default configuration is set for the DB text detection model. Because the EAST and DB algorithms differ considerably, you need to pass the corresponding parameters at inference time to adapt to the EAST algorithm.
### 1. Ultra-lightweight Chinese detection model inference
To run inference with the ultra-lightweight Chinese detection model, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
![](imgs_results/det_res_2.jpg)
The parameter `det_max_side_len` sets the maximum side length to which the detection algorithm normalizes an image. If both the height and the width of the image are smaller than `det_max_side_len`, the original image is used for prediction; otherwise the image is scaled proportionally to that maximum before prediction. The default is `det_max_side_len=960`. If the input image has a high resolution and you want to predict at a larger resolution, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200
```
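The `det_max_side_len` rule described above can be sketched as a small size calculation. This is an illustration under assumptions, not the code used by `predict_det.py`: the function name is made up, and any further adjustment the real pre-processing may apply (for example rounding sizes to a network-friendly multiple) is left out.

```python
def target_size(height, width, max_side_len=960):
    """Return the (height, width) an image would be resized to.

    Images whose longer side fits within max_side_len keep their size;
    larger images are scaled proportionally so the longer side equals it.
    """
    longest = max(height, width)
    if longest <= max_side_len:
        return height, width
    ratio = max_side_len / longest
    # Keep at least one pixel per side after rounding.
    return max(1, round(height * ratio)), max(1, round(width * ratio))


if __name__ == "__main__":
    print(target_size(480, 640))          # small image, unchanged
    print(target_size(1080, 1920))        # scaled with the default limit
    print(target_size(1080, 1920, 1200))  # scaled with a larger limit
```

Raising the limit keeps more detail in large images at the cost of more computation per image.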
If you want to use the CPU for prediction, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```
### 2. DB text detection model inference
First, convert the model saved during DB text detection training into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can convert it with the following command:
```
# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.
python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db"
```
To run inference with the DB text detection model, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
![](imgs_results/det_res_img_10_db.jpg)
**Note**: Because the ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, the above model performs very poorly on Chinese text images.
### 3. EAST text detection model inference
First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can convert it with the following command:
```
# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
```
For EAST text detection model inference, set the parameter `det_algorithm` to specify EAST as the detection algorithm, then execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
![](imgs_results/det_res_img_10_east.jpg)
**Note**: The NMS step in EAST post-processing is implemented in Python in this codebase, so prediction is relatively slow. Using a C++ implementation would give a significant speedup.
## Text recognition model inference
This section covers inference with the ultra-lightweight Chinese recognition model and with CTC-loss-based recognition models. **Inference with Attention-loss-based recognition models is still being debugged**. For Chinese text recognition, we recommend CTC-loss-based recognition models first; in practice, Attention-loss-based models have also been found to perform worse than CTC-loss-based ones.
### 1. Ultra-lightweight Chinese recognition model inference
To run inference with the ultra-lightweight Chinese recognition model, execute the following command:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
```
![](imgs_words/ch/word_4.jpg)
After executing the command, the prediction result (recognized text and score) for the above image is printed to the screen, for example:
Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]
### 2. CTC-loss-based recognition model inference
Taking STAR-Net as an example, we describe inference with a CTC-loss-based recognition model. CRNN and Rosetta are used in a similar way and do not require setting the recognition algorithm parameter `rec_algorithm`.
First, convert the model saved during STAR-Net text recognition training into an inference model. Taking the model based on the Resnet34_vd backbone network and trained on the two synthetic English text recognition datasets MJSynth and SynthText as an example ([model download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)), you can convert it with the following command:
```
# Set the yml configuration file of the training algorithm after -c
# Global.checkpoints sets the path of the trained model to convert; do not add the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir sets the path where the converted model will be saved.
python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet"
```
To run inference with the STAR-Net text recognition model, execute the following command:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
![](imgs_words_en/word_336.png)
After executing the command, the recognition result for the above image is as follows:
Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]
**Note**: Because the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation procedure, it differs from the training of the ultra-lightweight Chinese recognition model in two ways:
- Image resolution: the above model is trained at an image resolution of [3, 32, 100], whereas the Chinese model is trained at [3, 32, 320] to preserve recognition quality on long text. The default shape parameter of the inference program is the resolution used for Chinese training, that is [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter `rec_image_shape`.
- Character list: the experiments in the DTRB paper cover only the 26 lowercase English letters and 10 digits, 36 characters in total. All upper-case characters are converted to lower case, and characters not in this list are ignored and treated as spaces. Therefore no character dictionary is supplied here; instead, the dictionary is generated by the code below. For this reason, the parameter `rec_char_type` must be set to "en" (English) during inference.
```
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```
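The character handling described in the note above can be sketched as follows. This is one possible reading, offered as an assumption rather than the project's actual label encoder: the names `CHAR_TO_INDEX` and `encode_label` are made up, unsupported characters are simply dropped, and any extra index a CTC decoder may reserve (for example a blank symbol) is not modelled.

```python
CHARACTER_STR = "0123456789abcdefghijklmnopqrstuvwxyz"

# Map each of the 36 supported characters to its position in the list.
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARACTER_STR)}


def encode_label(text):
    """Lower-case text and return the indices of its supported characters.

    Characters outside the 36-character list are skipped.
    """
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


if __name__ == "__main__":
    print(encode_label("Super!"))
```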
## Chained text detection and recognition inference
### 1. Ultra-lightweight Chinese OCR model inference
When running prediction, use the parameter `image_dir` to specify the path of a single image or a collection of images, `det_model_dir` to specify the path of the detection inference model, and `rec_model_dir` to specify the path of the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"  --rec_model_dir="./inference/rec_crnn/"
```
After executing the command, the recognition result image is as follows:
![](imgs_results/2.jpg)
### 2. Other model inference
If you want to try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above and update the corresponding configuration and models. The following command runs EAST text detection together with STAR-Net text recognition:
```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
After executing the command, the recognition result image is as follows:
![](imgs_results/img_10.jpg)
## Quick installation
PaddleOCR has been tested on glibc 2.23. You can also try other glibc versions or install glibc 2.23.
PaddleOCR working environment
- PaddlePaddle1.7
- python3
- glibc 2.23
We recommend running PaddleOCR in the docker environment we provide. For docker usage, please refer to this [link](https://docs.docker.com/get-started/).
1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient.
```
# Switch to the working directory
cd /home/Projects
# You need to create a docker container on the first run; this command is not needed on later runs
# Create a docker container named ppocr and map the current directory to the /paddle directory of the container
# If you want to use docker in a CPU environment, use docker instead of nvidia-docker to create the container
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash
```
If CUDA9 is installed on your machine, run the following command to create the container:
```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash
```
If CUDA10 is installed on your machine, run the following command to create the container:
```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get an image that fits your machine.
```
# Press ctrl+P+Q to exit the container; use the following command to re-enter it
sudo docker container exec -it ppocr /bin/bash
```
Note: if `docker pull` is too slow, you can download the image manually and load it with the following steps. The CUDA9 docker image is used as an example; for the CUDA10 image, just change cuda9 to cuda10.
```
# Download the compressed CUDA9 docker file
wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz
# To reduce download time, the uploaded docker image is compressed and must be decompressed before use
tar zxf docker_pdocr_cuda9.tar.gz
# Create the image
docker load < docker_pdocr_cuda9.tar
# After completing the steps above, use docker images to check whether the downloaded image has been loaded
docker images
# If docker images shows the following output, you can follow step 1 to create the docker environment.
hub.baidubce.com/paddlepaddle/paddle   latest-gpu-cuda9.0-cudnn7-dev    f56310dcc829
```
2. Install PaddlePaddle Fluid v1.7 (higher versions are not supported yet; adaptation work is in progress)
```
pip3 install --upgrade pip
# If CUDA9 is installed on your machine, run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
# If CUDA10 is installed on your machine, run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
For other version requirements, please follow the instructions in the [installation document](https://www.paddlepaddle.org.cn/install/quick).
3. Clone the PaddleOCR repo
```
# Recommended
git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully because of network problems, you can also use the copy hosted on Gitee:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: the code hosted on Gitee may not be synchronized with this GitHub project in real time; there is a delay of 3~5 days, so please prefer the recommended method.
```
4. Install third-party libraries
```
cd PaddleOCR
pip3 install -r requirments.txt
```
......
## Text recognition
### Data preparation
PaddleOCR supports two data formats: `lmdb`, used to train on public data and debug algorithms, and `General Data`, used to train on your own data.
Please set up the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:
```
ln -sf <path/to/dataset> <path/to/paddle_detection>/train_data/dataset
```
* Data download
If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick validation. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required for the benchmark.
* Use your own dataset:
If you want to train on your own data, please organize it as described below.
- Training set
First, put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to record the image paths and labels.
* Note: by default, separate the image path and the image label with \t; any other separator will cause a training error.
```
" Image file name                 Image annotation "
train_data/train_0001.jpg	简单可依赖
train_data/train_0002.jpg	用科技让复杂的世界更简单
```
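The tab-separated label format described above can be checked with a few lines of code before training. The following is a hedged sketch: `parse_label_line` is a hypothetical helper, not a PaddleOCR function, and it only assumes one image path and one label per line, separated by a single tab.

```python
def parse_label_line(line):
    """Split one label-file line into (image_path, label).

    Raises ValueError when the tab separator, the path, or the label is missing.
    """
    path, sep, label = line.rstrip("\r\n").partition("\t")
    if not sep or not path or not label:
        raise ValueError("expected '<image path>\\t<label>', got: %r" % line)
    return path, label


if __name__ == "__main__":
    print(parse_label_line("train_data/train_0001.jpg\t简单可依赖\n"))
```

Running such a check over every line of rec_gt_train.txt is a cheap way to catch space-separated entries before they cause a training error.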
PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:
```
# Training set labels
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set labels
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
The final training set should have the following file structure:
```
|-train_data
......
    | ...
```
- Test set
Similar to the training set, the test set also needs a folder containing all the images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:
```
|-train_data
......
    | ...
```
- Dictionary
Finally, a dictionary ({word_dict_name}.txt) must be provided so that, during training, every character that appears can be mapped to a dictionary index.
The dictionary therefore needs to contain all the characters you want to be recognized correctly. {word_dict_name}.txt must be written in the following format and saved with `utf-8` encoding:
```
l
......
n
```
word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,“and” 将被映射成 [2 5 1] word_dict.txt There is a single word in each line, which maps characters and numeric indexes together, and "and" will be mapped to [2 5 1]
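
The character-to-index mapping described above can be sketched as follows. This is an illustration under stated assumptions, not PaddleOCR's own loader: indexes are taken to be 0-based line numbers, the first occurrence of a repeated character wins, characters missing from the dictionary are skipped, and `hypothetical_dict` is a made-up dictionary chosen so that "and" encodes to [2, 5, 1].

```python
def build_char_index(dict_lines):
    """Map each character to its 0-based line index (first occurrence wins)."""
    char_to_idx = {}
    for idx, line in enumerate(dict_lines):
        ch = line.rstrip("\n")
        if ch:
            char_to_idx.setdefault(ch, idx)
    return char_to_idx


def encode(text, char_to_idx):
    """Encode text as a list of indexes, skipping characters not in the dictionary."""
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]


# Hypothetical dictionary contents, one character per line.
hypothetical_dict = ["x", "d", "a", "y", "z", "n"]
mapping = build_char_index(hypothetical_dict)
encoded = encode("and", mapping)
print(encoded)
```

With a real dictionary, `dict_lines` would come from reading the file with `open(path, encoding="utf-8")`.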
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters, and `ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters. Use whichever fits your data.

To use a custom dictionary file, modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.

### Start training

PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example.

First download the pre-trained model, which you can then fine-tune on the icdar2015 data:
```
cd PaddleOCR/
# Download the pre-trained MobileNetV3 model
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar
# Extract the model parameters
cd pretrain_models
tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
```
Start training:
```
# Add the current directory to PYTHONPATH
export PYTHONPATH=$PYTHONPATH:.
# GPU training supports single-card and multi-card training; select the cards with CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Train on the icdar15 English data
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```
PaddleOCR supports alternating training and evaluation. Set the evaluation frequency with `eval_batch_step` in `configs/rec/rec_icdar15_train.yml`; by default the model is evaluated every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` by default.

If the validation set is large, evaluation will be time-consuming. In that case, evaluate less often, or evaluate only after training has finished.
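
To get a rough feel for how `eval_batch_step` affects total evaluation time, a back-of-the-envelope calculation can help. The sketch below assumes one evaluation pass at every multiple of `eval_batch_step`; the iteration count and the minutes per pass are hypothetical numbers, not measurements.

```python
def count_evaluations(total_iters, eval_batch_step=500):
    """Number of evaluation passes, assuming one pass at every multiple of eval_batch_step."""
    if eval_batch_step <= 0:
        raise ValueError("eval_batch_step must be positive")
    return total_iters // eval_batch_step


def evaluation_overhead_minutes(total_iters, eval_batch_step, minutes_per_eval):
    """Total time spent evaluating, given a fixed cost per evaluation pass."""
    return count_evaluations(total_iters, eval_batch_step) * minutes_per_eval


total_iters = 20000      # hypothetical training length
minutes_per_eval = 3.0   # hypothetical cost of one pass over the validation set
for step in (500, 2000):
    passes = count_evaluations(total_iters, step)
    overhead = evaluation_overhead_minutes(total_iters, step, minutes_per_eval)
    print(f"eval_batch_step={step}: {passes} passes, {overhead} minutes")
```

Under these assumptions, raising `eval_batch_step` from 500 to 2000 cuts the number of passes from 40 to 10.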
* Tip: Use the `-c` parameter to select one of the model configurations under `configs/rec/` for training. The recognition algorithms supported by PaddleOCR are:

| Configuration file | Algorithm name | backbone | trans | seq | pred |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| ... | ... | ... | ... | ... | ... |
| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
For training on Chinese data, `rec_chinese_lite_train.yml` is recommended. To try other algorithms on a Chinese dataset, modify the configuration file as described below.

Take `rec_mv3_none_none_ctc.yml` as an example:
```
Global:
  ...
  # Modify image_shape to fit long text
  image_shape: [3, 32, 320]
  ...
  # Modify the character type
  character_type: ch
  # Add a custom dictionary; to change the dictionary, point this path to the new file
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  ...
...
```
**Note: the configuration file used for prediction/evaluation must be the same as the one used for training.**
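
One way to catch a mismatch early is to compare the relevant settings before running prediction. The sketch below is only an illustration: plain dictionaries stand in for the parsed `Global` section of each configuration file, and the list of keys to compare is an assumption based on the fields modified in the example above.

```python
# Keys assumed to matter for consistency, taken from the example configuration.
KEYS_TO_MATCH = ("image_shape", "character_type", "character_dict_path")


def config_mismatches(train_global, infer_global, keys=KEYS_TO_MATCH):
    """Return {key: (train_value, infer_value)} for every key whose values differ."""
    return {
        key: (train_global.get(key), infer_global.get(key))
        for key in keys
        if train_global.get(key) != infer_global.get(key)
    }


# Hypothetical parsed settings for a training run and a prediction run.
train_cfg = {
    "image_shape": [3, 32, 320],
    "character_type": "ch",
    "character_dict_path": "./ppocr/utils/ppocr_keys_v1.txt",
}
infer_cfg = dict(train_cfg, character_type="en")

print(config_mismatches(train_cfg, infer_cfg))
```

An empty result means the compared settings agree; any entry points at a field to fix before predicting.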
### Evaluation

To change the evaluation dataset, edit the `label_file_path` setting under EvalReader in `configs/rec/rec_icdar15_reader.yml`.
```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation; Global.checkpoints is the weight file to evaluate
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
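The `-o Global.checkpoints=...` argument overrides a nested configuration field from the command line. As a rough mental model, and explicitly not PaddleOCR's actual parser, a dotted `Key.sub=value` option could be merged into a nested dictionary as shown below; the function name `apply_override` is hypothetical, and values are kept as plain strings.

```python
def apply_override(config, option):
    """Merge one 'A.b.c=value' option into a nested dict; the value stays a string."""
    key_path, sep, value = option.partition("=")
    if not sep or not key_path:
        raise ValueError(f"expected KEY=VALUE, got {option!r}")
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {"Global": {"checkpoints": None}}
apply_override(config, "Global.checkpoints=./output/rec_CRNN/best_accuracy")
print(config)
```

The same model covers `TestReader.infer_img=...`, which sets `infer_img` inside a `TestReader` section.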
### Prediction

* Training engine prediction

A model trained with PaddleOCR can be used for quick prediction with the following script.

The default prediction image is specified in `infer_img`, and the weights are specified via `-o Global.checkpoints`:
```
# Predict an English sample
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.png
```
Input image:

![](./imgs_words/en/word_1.png)

The prediction result for the input image:
```
infer_img: doc/imgs_words/en/word_1.png
...
word : joint
```
The configuration file used for prediction must be the same as the one used for training. For example, if you trained the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to run prediction with the Chinese model:
```
# Predict a Chinese sample
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
```
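When several images need to be predicted with the same configuration, the command line can be assembled programmatically. The helper below is a sketch: `build_infer_command` is a hypothetical name, the weights path reuses the default `output/rec_CRNN/best_accuracy` location mentioned earlier as an assumed value, and the commands are only printed, not executed.

```python
import shlex


def build_infer_command(config, weights, image):
    """Assemble the tools/infer_rec.py command line as an argument list."""
    return [
        "python3", "tools/infer_rec.py",
        "-c", config,
        "-o", f"Global.checkpoints={weights}",
        f"TestReader.infer_img={image}",
    ]


config = "configs/rec/rec_chinese_lite_train.yml"
weights = "./output/rec_CRNN/best_accuracy"  # assumed location of the trained weights
images = ["doc/imgs_words/ch/word_1.jpg"]

for image in images:
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(build_infer_command(config, weights, image)))
```

Using one configuration for every image keeps prediction consistent with training, as required above.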
Input image:

![](./imgs_words/ch/word_1.jpg)

The prediction result for the input image:
```
infer_img: doc/imgs_words/ch/word_1.jpg
...
```