diff --git a/doc/doc_en/dataset/ocr_datasets_en.md b/doc/doc_en/dataset/ocr_datasets_en.md
index c05fb87d5a3ed61ad38c97b84321819e8981d436..0b9abd529ddb6d0cf0bc294d74e3249215c8fd45 100644
--- a/doc/doc_en/dataset/ocr_datasets_en.md
+++ b/doc/doc_en/dataset/ocr_datasets_en.md
@@ -73,7 +73,7 @@ After decompressing the data set and downloading the annotation file, PaddleOCR/
 The text recognition algorithm in PaddleOCR supports two data formats:
 
  - `lmdb` is used to train data sets stored in lmdb format, use [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) to load;
- - `通用数据` is used to train data sets stored in text files, use [simple_dataset.py](../../../ppocr/data/simple_dataset.py) to load.
+ - `common dataset` is used to train data sets stored in text files, use [simple_dataset.py](../../../ppocr/data/simple_dataset.py) to load.
 
 If you want to use your own data for training, please refer to the following to organize your data.
 