diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md
index c4a3b3255f7367aec672387272d47b64a02658ee..d047959dbacc53da596d567eb9fd054f7772e4a5 100644
--- a/doc/doc_ch/algorithm_overview.md
+++ b/doc/doc_ch/algorithm_overview.md
@@ -17,17 +17,17 @@ PaddleOCR开源的文本检测算法列表:
 |模型|骨干网络|precision|recall|Hmean|下载链接|
 |-|-|-|-|-|-|
-|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
-|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
-|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
-|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
-|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
+|EAST|ResNet50_vd||||[敬请期待]()|
+|EAST|MobileNetV3||||[敬请期待]()|
+|DB|ResNet50_vd||||[敬请期待]()|
+|DB|MobileNetV3||||[敬请期待]()|
+|SAST|ResNet50_vd||||[敬请期待]()|
 在Total-text文本检测公开数据集上,算法效果如下:
 |模型|骨干网络|precision|recall|Hmean|下载链接|
 |-|-|-|-|-|-|
-|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
+|SAST|ResNet50_vd||||[敬请期待]()|
 **说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
@@ -48,17 +48,12 @@ PaddleOCR开源的文本识别算法列表:
 |模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
 |-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
-|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
-|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
-|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
-
-**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。
-原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。
+|Rosetta|Resnet34_vd||rec_r34_vd_none_none_ctc|[敬请期待]()|
+|Rosetta|MobileNetV3||rec_mv3_none_none_ctc|[敬请期待]()|
+|CRNN|Resnet34_vd||rec_r34_vd_none_bilstm_ctc|[敬请期待]()|
+|CRNN|MobileNetV3||rec_mv3_none_bilstm_ctc|[敬请期待]()|
+|STAR-Net|Resnet34_vd||rec_r34_vd_tps_bilstm_ctc|[敬请期待]()|
+|STAR-Net|MobileNetV3||rec_mv3_tps_bilstm_ctc|[敬请期待]()|
+
 PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
index 71be1e89dc561630315337ef11c52289e0756c00..91b1af78835707fc37494c847a74f7c1f30191bf 100644
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -142,9 +142,8 @@ word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,
 - 添加空格类别
-如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `true`。
+如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True`。
-**注意:`use_space_char` 仅在 `character_type=ch` 时生效**
 ### 启动训练
@@ -167,10 +166,9 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
 *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
 ```
-# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+# GPU训练 支持单卡,多卡训练,通过--gpus参数指定卡号
 # 训练icdar15英文数据 并将训练日志保存为 tain_rec.log
-python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
 ```
 - 数据增强
@@ -195,8 +193,8 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
 | 配置文件 | 算法名称 | backbone | trans | seq | pred |
 | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
-| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
-| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
+| [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
+| [rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
 | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
 | rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
 | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
@@ -210,39 +208,69 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
 | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
 | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
-训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
+训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
-以 `rec_mv3_none_none_ctc.yml` 为例:
+以 `rec_chinese_lite_train_v2.0.yml` 为例:
 ```
 Global:
   ...
-  # 修改 image_shape 以适应长文本
-  image_shape: [3, 32, 320]
-  ...
+  # 添加自定义字典,如修改字典请将路径指向新字典
+  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
   # 修改字符类型
   character_type: ch
-  # 添加自定义字典,如修改字典请将路径指向新字典
-  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
-  # 训练时添加数据增强
-  distort: true
-  # 识别空格
-  use_space_char: true
-  ...
-  # 修改reader类型
-  reader_yml: ./configs/rec/rec_chinese_reader.yml
   ...
+  # 识别空格
+  use_space_char: True
-...
 Optimizer:
   ...
   # 添加学习率衰减策略
-  decay:
-    function: cosine_decay
-    # 每个 epoch 包含 iter 数
-    step_each_epoch: 20
-    # 总共训练epoch数
-    total_epoch: 1000
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+    ...
+
+...
+
+Train:
+  dataset:
+    # 数据集格式,支持LMDBDateSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./train_data/
+    # 训练集标签文件
+    label_file_list: ["./train_data/train_list.txt"]
+    transforms:
+      ...
+      - RecResizeImg:
+          # 修改 image_shape 以适应长文本
+          image_shape: [3, 32, 320]
+      ...
+  loader:
+    ...
+    # 单卡训练的batch_size
+    batch_size_per_card: 256
+    ...
+
+Eval:
+  dataset:
+    # 数据集格式,支持LMDBDateSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./train_data
+    # 验证集标签文件
+    label_file_list: ["./train_data/val_list.txt"]
+    transforms:
+      ...
+      - RecResizeImg:
+          # 修改 image_shape 以适应长文本
+          image_shape: [3, 32, 320]
+      ...
+  loader:
+    # 单卡验证的batch_size
+    batch_size_per_card: 256
+    ...
 ```
 **注意,预测/评估时的配置文件请务必与训练一致。**
@@ -270,39 +298,41 @@ Global:
   ...
   # 添加自定义字典,如修改字典请将路径指向新字典
   character_dict_path: ./ppocr/utils/dict/french_dict.txt
-  # 训练时添加数据增强
-  distort: true
-  # 识别空格
-  use_space_char: true
-  ...
-  # 修改reader类型
-  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
-  ...
-...
-```
-
-同时需要修改数据读取文件 `rec_french_reader.yml`:
-
-```
-TrainReader:
   ...
-  # 修改训练数据存放的目录名
-  img_set_dir: ./train_data
-  # 修改 label 文件名称
-  label_file_path: ./train_data/french_train.txt
+  # 识别空格
+  use_space_char: True
   ...
+
+Train:
+  dataset:
+    # 数据集格式,支持LMDBDateSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./train_data/
+    # 训练集标签文件
+    label_file_list: ["./train_data/french_train.txt"]
+    ...
+
+Eval:
+  dataset:
+    # 数据集格式,支持LMDBDateSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./train_data
+    # 验证集标签文件
+    label_file_list: ["./train_data/french_val.txt"]
+    ...
 ```
 ### 评估
-评估数据集可以通过 `configs/rec/rec_icdar15_reader.yml` 修改EvalReader中的 `label_file_path` 设置。
+评估数据集可以通过 `configs/rec/rec_icdar15_train.yml` 修改Eval中的 `label_file_path` 设置。
 *注意* 评估时必须确保配置文件中 infer_img 字段为空
 ```
-export CUDA_VISIBLE_DEVICES=0
 # GPU 评估, Global.checkpoints 为待测权重
-python3 tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
 ```
@@ -332,12 +362,12 @@ infer_img: doc/imgs_words/en/word_1.png
 word : joint
 ```
-预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml` 完成了中文模型的训练,
+预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml` 完成了中文模型的训练,
 您可以使用如下命令进行中文模型预测。
 ```
 # 预测中文结果
-python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg
+python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg
 ```
 预测图片:
diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md
index 2e21fd621971e062384a9323e79a8cf4498d7495..60c44865a74d08f6f77af98812266370e2f68309 100644
--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@@ -19,17 +19,17 @@ On the ICDAR2015 dataset, the text detection result is as follows:
 |Model|Backbone|precision|recall|Hmean|Download link|
 |-|-|-|-|-|-|
-|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
-|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
-|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
-|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
-|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
+|EAST|ResNet50_vd||||[Coming soon]()|
+|EAST|MobileNetV3||||[Coming soon]()|
+|DB|ResNet50_vd||||[Coming soon]()|
+|DB|MobileNetV3||||[Coming soon]()|
+|SAST|ResNet50_vd||||[Coming soon]()|
 On Total-Text dataset, the text detection result is as follows:
 |Model|Backbone|precision|recall|Hmean|Download link|
 |-|-|-|-|-|-|
-|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
+|SAST|ResNet50_vd||||[Coming soon]()|
 **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
@@ -49,18 +49,12 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |Model|Backbone|Avg Accuracy|Module combination|Download link|
 |-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
-|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
-|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
-|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
-
-**Note:** SRN model uses data expansion method to expand the two training sets mentioned above, and the expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry).
-
-The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).
+|Rosetta|Resnet34_vd||rec_r34_vd_none_none_ctc|[Coming soon]()|
+|Rosetta|MobileNetV3||rec_mv3_none_none_ctc|[Coming soon]()|
+|CRNN|Resnet34_vd||rec_r34_vd_none_bilstm_ctc|[Coming soon]()|
+|CRNN|MobileNetV3||rec_mv3_none_bilstm_ctc|[Coming soon]()|
+|STAR-Net|Resnet34_vd||rec_r34_vd_tps_bilstm_ctc|[Coming soon]()|
+|STAR-Net|MobileNetV3||rec_mv3_tps_bilstm_ctc|[Coming soon]()|
+
 Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 41b00c52a7780d02c144c251553f427e5b875e5e..f9849321d3ac792accf02cac260f16daf030e755 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -135,7 +135,7 @@ If you need to customize dic file, please add character_dict_path field in confi
 - Add space category
-If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`.
+If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
 **Note: use_space_char only takes effect when character_type=ch**
@@ -158,10 +158,9 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
 Start training:
 ```
-# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+# GPU training Support single card and multi-card training, specify the card number through --gpus
 # Training icdar15 English data and saving the log as train_rec.log
-python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
 ```
 - Data Augmentation
@@ -184,8 +183,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | Configuration file | Algorithm | backbone | trans | seq | pred |
 | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
-| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
-| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
+| [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
+| [rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
 | rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
 | rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
 | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
@@ -199,39 +198,69 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
 For training Chinese data, it is recommended to use
-训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
+[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
 co
-Take `rec_mv3_none_none_ctc.yml` as an example:
+Take `rec_chinese_lite_train_v2.0.yml` as an example:
 ```
 Global:
   ...
-  # Modify image_shape to fit long text
-  image_shape: [3, 32, 320]
-  ...
+  # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
+  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
   # Modify character type
   character_type: ch
-  # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
-  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
   ...
-  # Modify reader type
-  reader_yml: ./configs/rec/rec_chinese_reader.yml
-  # Whether to use data augmentation
-  distort: true
   # Whether to recognize spaces
-  use_space_char: true
-  ...
+  use_space_char: True
-...
 Optimizer:
   ...
   # Add learning rate decay strategy
-  decay:
-    function: cosine_decay
-    # Each epoch contains iter number
-    step_each_epoch: 20
-    # Total epoch number
-    total_epoch: 1000
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+    ...
+
+...
+
+Train:
+  dataset:
+    # Type of dataset,we support LMDBDateSet and SimpleDataSet
+    name: SimpleDataSet
+    # Path of dataset
+    data_dir: ./train_data/
+    # Path of train list
+    label_file_list: ["./train_data/train_list.txt"]
+    transforms:
+      ...
+      - RecResizeImg:
+          # Modify image_shape to fit long text
+          image_shape: [3, 32, 320]
+      ...
+  loader:
+    ...
+    # Train batch_size for Single card
+    batch_size_per_card: 256
+    ...
+
+Eval:
+  dataset:
+    # Type of dataset,we support LMDBDateSet and SimpleDataSet
+    name: SimpleDataSet
+    # Path of dataset
+    data_dir: ./train_data
+    # Path of eval list
+    label_file_list: ["./train_data/val_list.txt"]
+    transforms:
+      ...
+      - RecResizeImg:
+          # Modify image_shape to fit long text
+          image_shape: [3, 32, 320]
+      ...
+  loader:
+    # Eval batch_size for Single card
+    batch_size_per_card: 256
+    ...
 ```
 **Note that the configuration file for prediction/evaluation must be consistent with the training.**
@@ -257,18 +286,33 @@ Take `rec_french_lite_train` as an example:
 ```
 Global:
   ...
-  # Add a custom dictionary, if you modify the dictionary
-  # please point the path to the new dictionary
+  # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
   character_dict_path: ./ppocr/utils/dict/french_dict.txt
-  # Add data augmentation during training
-  distort: true
-  # Identify spaces
-  use_space_char: true
-  ...
-  # Modify reader type
-  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
   ...
+  # Whether to recognize spaces
+  use_space_char: True
+  ...
+
+Train:
+  dataset:
+    # Type of dataset,we support LMDBDateSet and SimpleDataSet
+    name: SimpleDataSet
+    # Path of dataset
+    data_dir: ./train_data/
+    # Path of train list
+    label_file_list: ["./train_data/french_train.txt"]
+    ...
+
+Eval:
+  dataset:
+    # Type of dataset,we support LMDBDateSet and SimpleDataSet
+    name: SimpleDataSet
+    # Path of dataset
+    data_dir: ./train_data
+    # Path of eval list
+    label_file_list: ["./train_data/french_val.txt"]
+    ...
 ```
@@ -277,9 +321,8 @@ Global:
 The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader.
 ```
-export CUDA_VISIBLE_DEVICES=0
 # GPU evaluation, Global.checkpoints is the weight to be tested
-python3 tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy
 ```
@@ -294,7 +337,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci
 ```
 # Predict English results
-python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
+python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
 ```
 Input image:
@@ -309,11 +352,11 @@ infer_img: doc/imgs_words/en/word_1.png
 word : joint
 ```
-The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml`, you can use the following command to predict the Chinese model:
+The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml`, you can use the following command to predict the Chinese model:
 ```
 # Predict Chinese results
-python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
+python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
 ```
 Input image: