diff --git a/StyleText/README.md b/StyleText/README.md
index bbce7c5eba280e95452933f713524b19ff5778e2..f978f43875f3994c7016fbf6cff1a1c0f2de4588 100644
--- a/StyleText/README.md
+++ b/StyleText/README.md
@@ -116,9 +116,17 @@ In actual application scenarios, it is often necessary to synthesize pictures in
 * `CorpusGenerator`:
   * `method`:Method of CorpusGenerator,supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used,No other configuration is needed,otherwise you need to set `corpus_file` and `language`.
   * `language`:Language of the corpus.
-  * `corpus_file`: Filepath of the corpus.
+  * `corpus_file`: Filepath of the corpus. The corpus file should be a plain-text file. It is split on line endings (`\n`), and the corpus generator randomly samples one line each time.
 
+Example corpus file:
+```
+PaddleOCR
+飞桨文字识别
+StyleText
+风格文本图像数据合成
+```
+
 We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)), some examples are given below :
diff --git a/StyleText/README_ch.md b/StyleText/README_ch.md
index eb557ff24547f228610ffa2cbbaf993e2b4569c3..a8ab933bd38792bcd6aecd76c85fc3fd1f5f204f 100644
--- a/StyleText/README_ch.md
+++ b/StyleText/README_ch.md
@@ -102,7 +102,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
 * `CorpusGenerator`:
   * `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`;
   * `language`:语料的语种;
-  * `corpus_file`: 语料文件路径。
+  * `corpus_file`: 语料文件路径。语料文件应使用文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。
+
+  语料文件格式示例:
+  ```
+  PaddleOCR
+  飞桨文字识别
+  StyleText
+  风格文本图像数据合成
+  ...
+  ```
 
 Style-Text也提供了一批中英韩5万张通用场景数据用作文本风格图像,便于合成场景丰富的文本图像,下图给出了一些示例。
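The corpus behavior documented in both READMEs (split the file on `\n`, then randomly pick one line per sample) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not StyleText's actual `FileCorpus` code: the function names `parse_corpus`, `load_corpus`, and `sample_corpus_line` are hypothetical, and dropping empty lines (for example, the one a trailing newline would produce) is an assumption of this sketch.

```python
import os
import random
import tempfile
from typing import List


def parse_corpus(text: str) -> List[str]:
    """Split corpus text on newline characters, one entry per non-empty line."""
    # Assumption (not from the README): empty lines, e.g. from a trailing
    # newline, are dropped so they can never be sampled as an empty string.
    return [line for line in text.split("\n") if line]


def load_corpus(path: str) -> List[str]:
    """Read a UTF-8 corpus file and split it into lines."""
    # Text mode with default newline handling converts '\r\n' to '\n'.
    with open(path, "r", encoding="utf-8") as f:
        return parse_corpus(f.read())


def sample_corpus_line(corpus: List[str], rng: random.Random) -> str:
    """Randomly pick one line from the corpus (one sample per call)."""
    return rng.choice(corpus)


if __name__ == "__main__":
    # Write the README's example corpus to a temporary file.
    example = "PaddleOCR\n飞桨文字识别\nStyleText\n风格文本图像数据合成\n"
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write(example)
        path = tmp.name
    try:
        corpus = load_corpus(path)
        rng = random.Random(0)
        for _ in range(3):
            print(sample_corpus_line(corpus, rng))
    finally:
        os.remove(path)
```

Seeding `random.Random` makes the sampled sequence reproducible across runs; omit the seed to get a different sequence each time.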