diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index e8634ef8c06feae1f0adffb22c5694084dab78cd..c5e1097a09a3076a8f5eaf19f8fbc5c6ba48b39f 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -196,18 +196,28 @@ For some data that are difficult to recognize, the recognition results will not
 
   ```
   cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
-  python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --labelRootPath ../train_data/label --detRootPath ../train_data/det --recRootPath ../train_data/rec
+  python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ../train_data
   ```
 
   Parameter Description:
 
   - `trainValTestRatio` is the division ratio of the number of images in the training set, validation set, and test set, set according to your actual situation, the default is `6:2:2`
 
-  - `labelRootPath` is the storage path of the dataset labeled by PPOCRLabel, the default is `../train_data/label`
-
-  - `detRootPath` is the path where the text detection dataset is divided according to the dataset marked by PPOCRLabel. The default is `../train_data/det`
-
-  - `recRootPath` is the path where the character recognition dataset is divided according to the dataset marked by PPOCRLabel. The default is `../train_data/rec`
+  - `datasetRootPath` is the storage path of the complete dataset labeled by PPOCRLabel. The default path is PaddleOCR/train_data.
+  ```
+  |-train_data
+    |-crop_img
+      |- word_001_crop_0.png
+      |- word_002_crop_0.jpg
+      |- word_003_crop_0.jpg
+      | ...
+    | Label.txt
+    | rec_gt.txt
+    |- word_001.png
+    |- word_002.jpg
+    |- word_003.jpg
+    | ...
+  ```
 
 ### 3.6 Error message
 
diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md
index e1c391bc8637baa4adfa8852d805ed0f4bf04d6d..ac0ccb98fa3284142ff5754677025ca1061e210b 100644
--- a/PPOCRLabel/README_ch.md
+++ b/PPOCRLabel/README_ch.md
@@ -181,18 +181,28 @@ PPOCRLabel supports three export methods:
 
 ```
 cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
-python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --labelRootPath ../train_data/label --detRootPath ../train_data/det --recRootPath ../train_data/rec
+python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ../train_data
 ```
 
 Parameter description:
 
 - `trainValTestRatio` is the ratio by which the images are divided among the training, validation, and test sets; set it according to your actual situation. The default is `6:2:2`
 
-- `labelRootPath` is the storage path of the dataset labeled by PPOCRLabel. The default is `../train_data/label`
-
-- `detRootPath` is the storage path of the text detection dataset obtained by splitting the dataset labeled by PPOCRLabel. The default is `../train_data/det `
-
-- `recRootPath` is the storage path of the character recognition dataset obtained by splitting the dataset labeled by PPOCRLabel. The default is `../train_data/rec`
+- `datasetRootPath` is the storage path of the complete dataset labeled by PPOCRLabel. The default path is PaddleOCR/train_data. Before splitting, the dataset should have the following structure:
+  ```
+  |-train_data
+    |-crop_img
+      |- word_001_crop_0.png
+      |- word_002_crop_0.jpg
+      |- word_003_crop_0.jpg
+      | ...
+    | Label.txt
+    | rec_gt.txt
+    |- word_001.png
+    |- word_002.jpg
+    |- word_003.jpg
+    | ...
+  ```
 
 ### 3.6 Error message
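
Review note: the README text describes `trainValTestRatio` only as a `train:val:test` ratio over the number of images. The sketch below illustrates one way such a ratio could be turned into three subsets. It is not taken from `gen_ocr_train_val_test.py`; the function names `parse_ratio` and `split_items`, the `seed` parameter, and the choice to give rounding leftovers to the test set are all assumptions made for illustration.

```python
import random


def parse_ratio(ratio_text):
    """Parse a string such as "6:2:2" into three non-negative integers.

    Hypothetical helper: the real script may parse its argument differently.
    """
    parts = [int(p) for p in ratio_text.split(":")]
    if len(parts) != 3 or any(p < 0 for p in parts) or sum(parts) == 0:
        raise ValueError(f"expected train:val:test, got {ratio_text!r}")
    return parts


def split_items(items, ratio_text="6:2:2", seed=0):
    """Shuffle items and cut them into train, val, and test lists.

    Assumption: leftovers from integer rounding go to the test list.
    """
    train_part, val_part, test_part = parse_ratio(ratio_text)
    total = train_part + val_part + test_part

    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    train_end = n * train_part // total
    val_end = train_end + n * val_part // total
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


if __name__ == "__main__":
    # File names follow the pattern shown in the directory tree above.
    images = [f"word_{i:03d}.png" for i in range(1, 11)]
    train, val, test = split_items(images, "6:2:2")
    print(len(train), len(val), len(test))
```

Integer arithmetic is used for the cut points so that the subset sizes do not depend on floating-point rounding.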