diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md index 440b392227182f50396f0e66ca8250bc6bfc1c0d..c61a9234bb29d1562112c47b5ca34118c12331d6 100644 --- a/doc/doc_ch/algorithm_overview.md +++ b/doc/doc_ch/algorithm_overview.md @@ -1,6 +1,6 @@ ## 算法介绍 -本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)。 +本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)。 - [1.文本检测算法](#文本检测算法) - [2.文本识别算法](#文本识别算法) @@ -9,25 +9,25 @@ ### 1.文本检测算法 PaddleOCR开源的文本检测算法列表: -- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐) +- [x] DB([paper]( https://arxiv.org/abs/1911.08947) )(ppocr推荐) - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] SAST([paper](https://arxiv.org/abs/1908.05498)) 在ICDAR2015文本检测公开数据集上,算法效果如下: |模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](link)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](link)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](link)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](link)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](link))| +| --- | --- | --- | --- | --- | --- | +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接 (coming soon)](link)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接 (coming soon)](coming soon)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接 (coming soon)](link)| 在Total-text文本检测公开数据集上,算法效果如下: |模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](link)| +| --- | --- | --- | --- | --- | --- | +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接 (coming soon)](link)| **说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) @@ -38,22 +38,22 @@ PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训 ### 2.文本识别算法 PaddleOCR基于动态图开源的文本识别算法列表: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐) +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) )(ppocr推荐) - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) - [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon - [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon 参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: |模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](link)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](link)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](link)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](link)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](link)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](link)| +| --- | --- | --- | --- | --- | +|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)| +|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)| +|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)| +|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接 (coming soon )]()| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接 (coming soon )]()| PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。 diff --git a/doc/doc_ch/angle_class.md b/doc/doc_ch/angle_class.md index d6a36b86b476f15b7b34f67e888ceb781b2ed7a0..3f2027b9ddff331b3259ed62c7c7b43e686efcce 100644 --- a/doc/doc_ch/angle_class.md +++ b/doc/doc_ch/angle_class.md @@ -62,9 +62,9 @@ PaddleOCR提供了训练脚本、评估脚本和预测脚本。 *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* ``` -# GPU训练 支持单卡,多卡训练,通过selected_gpus指定卡号 +# GPU训练 支持单卡,多卡训练,通过 '--gpus' 指定卡号,如果使用的paddle版本小于2.0rc1,请使用'--select_gpus'参数选择要使用的GPU # 启动训练,下面的命令已经写入train.sh文件中,只需修改文件里的配置文件路径即可 -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml +python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml ``` - 数据增强 @@ -74,7 +74,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入 默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gasuss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse),随机数据增强(RandAugment)。 训练过程中除随机数据增强外每种扰动方式以50%的概率被选择,具体代码实现请参考: -[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) +[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) [randaugment.py](../../ppocr/data/imaug/randaugment.py) *由于OpenCV的兼容性问题,扰动操作暂时只支持linux* diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index a31907015de3c6a119764917893ade29a0ff5493..ec3cb2766071d4c1ff6927de3e79c3e3c0c51131 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -76,8 +76,8 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model # 单机单卡训练 mv3_db 模型 python3 tools/train.py -c configs/det/det_mv3_db.yml \ -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ -# 单机多卡训练,通过 --select_gpus 参数设置使用的GPU ID; -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \ +# 单机多卡训练,通过 --gpus 参数设置使用的GPU ID;如果使用的paddle版本小于2.0rc1,请使用'--select_gpus'参数选择要使用的GPU +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \ -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ ``` diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index 663533c492ab5dc0bd22cc79bd95c9d1d194d854..f9437d0772c0612d78938781b265da35253e21cd 100644 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -22,9 +22,8 @@ inference 模型(`paddle.jit.save`保存的模型) - [三、文本识别模型推理](#文本识别模型推理) - [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理) - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理) - - [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理) - - [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理) - - [5. 多语言模型的推理](#多语言模型的推理) + - [3. 自定义文本识别字典的推理](#自定义文本识别字典的推理) + - [4. 多语言模型的推理](#多语言模型的推理) - [四、方向分类模型推理](#方向识别模型推理) - [1. 方向分类模型推理](#方向分类模型推理) @@ -268,16 +267,6 @@ CRNN 文本识别模型推理,可以执行如下命令: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` - -### 3. 基于Attention损失的识别模型推理 - -基于Attention损失的识别模型与ctc不同,需要额外设置识别算法参数 --rec_algorithm="RARE" -RARE 文本识别模型推理,可以执行如下命令: -``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE" - -``` - ![](../imgs_words_en/word_336.png) 执行命令后,上面图像的识别结果如下: @@ -297,7 +286,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` -### 4. 自定义文本识别字典的推理 +### 3. 自定义文本识别字典的推理 如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch` ``` @@ -305,7 +294,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ``` -### 5. 多语言模型的推理 +### 4. 多语言模型的推理 如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, 需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别: diff --git a/doc/doc_ch/installation.md b/doc/doc_ch/installation.md index eb9a5c9f1323feadc3bea3017586d1004ee75cc2..0dddfec0a6e17e26a73d284ac98c9c95e449c378 100644 --- a/doc/doc_ch/installation.md +++ b/doc/doc_ch/installation.md @@ -2,7 +2,7 @@ 经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23 PaddleOCR 工作环境 -- PaddlePaddle 2.0rc0+ ,推荐使用 PaddlePaddle 2.0rc0 +- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0rc1 - python3.7 - glibc 2.23 - cuDNN 7.6+ (GPU) @@ -35,11 +35,11 @@ sudo docker container exec -it ppocr /bin/bash pip3 install --upgrade pip 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 -python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle-gpu==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple 如果您的机器是CPU,请运行以下命令安装 -python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple 更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 ``` diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index d67179e026f6dc5b0f2baaea482f6b8cee337dc5..12c73d3de714a699cb01b8e4592261ee24fe41da 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -200,11 +200,8 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | -| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | -| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | -| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | 训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md index 7888bd96476e21aa3c12edba283742db425c0219..847b9288b96adf8c32b6486cb2798b838745c85f 100644 --- a/doc/doc_en/algorithm_overview_en.md +++ b/doc/doc_en/algorithm_overview_en.md @@ -13,23 +13,23 @@ This tutorial lists the text detection algorithms and text recognition algorithm PaddleOCR open source text detection algorithms list: - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498) )(Baidu Self-Research) On the ICDAR2015 dataset, the text detection result is as follows: |Model|Backbone|precision|recall|Hmean|Download link| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)| +| --- | --- | --- | --- | --- | --- | +|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[download link (coming soon)](link)| +|EAST|MobileNetV3|81.67%|79.83%|80.74%|[download link (coming soon)](coming soon)| +|DB|ResNet50_vd|83.79%|80.65%|82.19%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| +|DB|MobileNetV3|75.92%|73.18%|74.53%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| +|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[download link (coming soon)](link)| On Total-Text dataset, the text detection result is as follows: |Model|Backbone|precision|recall|Hmean|Download link| -|-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)| +| --- | --- | --- | --- | --- | --- | +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[download link (coming soon)](link)| **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). @@ -41,20 +41,21 @@ For the training guide and use of PaddleOCR text detection algorithms, please re PaddleOCR open-source text recognition algorithms list: - [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) - [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon -- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) coming soon +- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: |Model|Backbone|Avg Accuracy|Module combination|Download link| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| -|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +| --- | --- | --- | --- | --- | +|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)| +|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)| +|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)| +|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)| +|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[download link (coming soon )]()| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[download link (coming soon )]()| + Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) diff --git a/doc/doc_en/angle_class_en.md b/doc/doc_en/angle_class_en.md index defdff3ccbbad9d0201305529073bdc80abd5d29..4c479e7b22e7caea6bf5f864d32b57197b925dd9 100644 --- a/doc/doc_en/angle_class_en.md +++ b/doc/doc_en/angle_class_en.md @@ -65,9 +65,9 @@ Start training: ``` # Set PYTHONPATH path export PYTHONPATH=$PYTHONPATH:. -# GPU training Support single card and multi-card training, specify the card number through selected_gpus +# GPU training Support single card and multi-card training, specify the card number through --gpus. If your paddle version is less than 2.0rc1, please use '--selected_gpus' # Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml +python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml ``` - Data Augmentation @@ -77,7 +77,7 @@ PaddleOCR provides a variety of data augmentation methods. If you want to add di The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, RandAugment. Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: -[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) +[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) [randaugment.py](../../ppocr/data/imaug/randaugment.py) diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 83e949344f1821aae2dcb57911aff7173246076f..6a2bda6b497df5d9a6ccb914976f53b2e27ce9b0 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -76,8 +76,10 @@ You can also use `-o` to change the training parameters without modifying the ym python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 # multi-GPU training -# Set the GPU ID used by the '--select_gpus' parameter; -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 +# Set the GPU ID used by the '--gpus' parameter; If your paddle version is less than 2.0rc1, please use '--selected_gpus' +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 + + ``` #### load trained model and continue training diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index 411a733dd062cf347d7a2e5d5d067739bda36819..826aad692a31567708609b142c5f3c9337a36dd3 100644 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -25,9 +25,8 @@ Next, we first introduce how to convert a trained model into an inference model, - [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE) - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION) - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION) - - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION) - - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS) - - [5. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE) + - [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS) + - [4. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE) - [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE) @@ -275,15 +274,6 @@ For CRNN text recognition model inference, execute the following commands: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` - -### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE - -The recognition model based on Attention loss is different from ctc, and additional recognition algorithm parameters need to be set --rec_algorithm="RARE" -After executing the command, the recognition result of the above image is as follows: -```bash -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE" -``` - ![](../imgs_words_en/word_336.png) After executing the command, the recognition result of the above image is as follows: @@ -303,7 +293,7 @@ dict_character = list(self.character_str) ``` -### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY +### 3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch` ``` @@ -311,7 +301,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ``` -### 5. MULTILINGAUL MODEL INFERENCE +### 4. MULTILINGAUL MODEL INFERENCE If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition: diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md index b49c3f02851eb540c3b8e78beb0dfa1a4b08ce09..073b67b04d10cc2ae4b20f0ca38b604ab95bc09f 100644 --- a/doc/doc_en/installation_en.md +++ b/doc/doc_en/installation_en.md @@ -3,7 +3,7 @@ After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. PaddleOCR working environment: -- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0rc0 +- PaddlePaddle 1.8+, Recommend PaddlePaddle 2.0rc1 - python3.7 - glibc 2.23 @@ -38,10 +38,10 @@ sudo docker container exec -it ppocr /bin/bash pip3 install --upgrade pip # If you have cuda9 or cuda10 installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple # If you only have cpu on your machine, please run the following command to install -python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle==2.0rc1 -i https://mirror.baidu.com/pypi/simple ``` For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 1539b288da2518bf5441adea7983135f3c46619f..3be0e1627f0fe09bc9c01da5871321d519273833 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -193,11 +193,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | -| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | -| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | -| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | For training Chinese data, it is recommended to use [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file: diff --git a/ppocr/modeling/transforms/tps.py b/ppocr/modeling/transforms/tps.py index 74bec7416bb1fd970ad00aecfdafc4173827a145..86665bedfff726c174e676cb544000a37ada0dad 100644 --- a/ppocr/modeling/transforms/tps.py +++ b/ppocr/modeling/transforms/tps.py @@ -180,7 +180,6 @@ class GridGenerator(nn.Layer): P = self.build_P_paddle(I_r_size) inv_delta_C_tensor = self.build_inv_delta_C_paddle(C).astype('float32') - # inv_delta_C_tensor = paddle.zeros((23,23)).astype('float32') P_hat_tensor = self.build_P_hat_paddle( C, paddle.to_tensor(P)).astype('float32') diff --git a/train.sh b/train.sh index a0483e4dc81a4530822850f5a0a20ec9b4c94764..c511c51600cc2d939f0bc8c7f52a3f3c6ce52d58 100644 --- a/train.sh +++ b/train.sh @@ -1 +1,5 @@ - python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml \ No newline at end of file +# for paddle.__version__ >= 2.0rc1 +python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml + +# for paddle.__version__ < 2.0rc1 +# python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml