diff --git a/doc/doc_en/dataset/ocr_datasets_en.md b/doc/doc_en/dataset/ocr_datasets_en.md
index c05fb87d5a3ed61ad38c97b84321819e8981d436..0b9abd529ddb6d0cf0bc294d74e3249215c8fd45 100644
--- a/doc/doc_en/dataset/ocr_datasets_en.md
+++ b/doc/doc_en/dataset/ocr_datasets_en.md
@@ -73,7 +73,7 @@ After decompressing the data set and downloading the annotation file, PaddleOCR/
 
 The text recognition algorithm in PaddleOCR supports two data formats:
  - `lmdb` is used to train data sets stored in lmdb format, use [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) to load;
- - `通用数据` is used to train data sets stored in text files, use [simple_dataset.py](../../../ppocr/data/simple_dataset.py) to load.
+ - `common dataset` is used to train on data sets stored in text files; use [simple_dataset.py](../../../ppocr/data/simple_dataset.py) to load it.
 
 
 If you want to use your own data for training, please refer to the following to organize your data.