diff --git a/configs/rec/rec_r31_sar.yml b/configs/rec/rec_r31_sar.yml
index b761d28bd2d90c1ebb8a49025127be2eaa3a233a..c19bcdee099339adfedb76b047d691758f4264cb 100644
--- a/configs/rec/rec_r31_sar.yml
+++ b/configs/rec/rec_r31_sar.yml
@@ -56,7 +56,7 @@ Train:
   dataset:
     name: SimpleDataSet
     delimiter: ' '
-    label_file_list: ['/paddle/data/concat_data/icdar_2013_train20.txt', '/paddle/data/concat_data/icdar_2015_train20.txt', '/paddle/data/concat_data/coco_text_train20.txt', '/paddle/data/concat_data/IIIt5k_train20.txt', '/paddle/data/concat_data/SynthAdd_train.txt', '/paddle/data/concat_data/SynthText_train.txt', '/paddle/data/concat_data/Syn90k_train.txt']
+    label_file_list: ['/paddle/data/concat_data/train_list.txt']
     data_dir: /paddle/data/concat_data/
     ratio_list: 1.0
     transforms:
@@ -71,7 +71,7 @@ Train:
           keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
   loader:
     shuffle: True
-    batch_size_per_card: 64 # 32
+    batch_size_per_card: 64
     drop_last: True
     num_workers: 8
     use_shared_memory: False
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
index 0ac6da8715e6a7350d62a32c6baae7adf8f87747..bcad6b865484f2defdcf53b6f8f1ba1b53dd0086 100644
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -90,6 +90,8 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
 
 如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。
 
+如果希望复现SAR的论文指标,需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码:627x。此外,真实数据集icdar2013, icdar2015, cocotext, IIIT5也作为训练数据的一部分。具体数据细节可以参考论文SAR。
+
 ```
 # 训练集标签
 wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 91f81a6aac7cea1ef799b1fe2a89741dee43699a..65030b4a16e6520e96d440271b7eb545b8177bbb 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -90,6 +90,8 @@ If you do not have a dataset locally, you can download it on the official websit
 
 If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
 
+If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR.
+
 PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
 
 ```
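
The first hunk replaces seven per-dataset label files with a single `train_list.txt`, but the diff does not show how that file is produced. A minimal sketch, assuming it is simply the concatenation of the original label files in their existing one-sample-per-line format (the helper name `merge_label_files` is hypothetical and not part of PaddleOCR):

```python
from pathlib import Path
from typing import Iterable


def merge_label_files(sources: Iterable[Path], target: Path) -> int:
    """Concatenate label files into one list file.

    Assumption: every source file already uses the line format the
    dataset expects, so lines are copied unchanged. Blank lines are
    skipped. Returns the number of lines written.
    """
    written = 0
    with target.open("w", encoding="utf-8") as out:
        for source in sources:
            with source.open("r", encoding="utf-8") as src:
                for line in src:
                    line = line.rstrip("\r\n")
                    if not line.strip():
                        continue  # skip blank lines between samples
                    out.write(line + "\n")
                    written += 1
    return written
```

Calling it with the seven paths from the removed `label_file_list` entry and a target of `train_list.txt` would produce one combined list; whether the real file was built this way is not stated in the diff.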