Commit 181b2933, authored by weishengyu

change config; add doc

Parent d3dcaed4
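### Quick Start
Style-Text improves on the SRNet network proposed in "Editing Text in the Wild", a text editing algorithm developed at Baidu. Unlike common GAN-based methods, which use only a single branch, the tool splits the text synthesis task into three sub-modules (a text style transfer module, a background extraction module, and a foreground-background fusion module) to improve the quality of the synthesized data. The figure below shows some example results.
In addition, the synthesis tool was validated in a real-world nameplate text recognition scenario and a Korean text recognition scenario, as detailed below.
#### Environment Setup
1. Follow [Quick Installation](./installation.md) to install PaddlePaddle and prepare the environment.
2. Enter the `style_text_rec` directory, download the models, and unzip them: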
```bash
cd tools/style_text_rec
wget /path/to/style_text_models.zip
unzip style_text_models.zip
```
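You can download the model files [here](). If you store them in a different location, update the model paths in `configs/config.yml`; all three of the following settings must be changed together: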
```
bg_generator:
  pretrain: style_text_models/bg_generator
...
text_generator:
  pretrain: style_text_models/text_generator
...
fusion_generator:
  pretrain: style_text_models/fusion_generator
```
#### Synthesizing a Single Image
1. Run `tools/synth_image` to generate an example image:
```bash
python -m tools.synth_image -c configs/config.yml
```
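2. After it runs, `fake_fusion.jpg` is generated as the final result. The program also generates and saves the intermediate results:
    * `fake_bg.jpg`: the background of the style reference image with the text removed;
    * `fake_text.jpg`: an image of the supplied string on a gray background, rendered in imitation of the text style in the style reference image.
3. To try other style images and text, edit `tools/synth_image.py`:
    * `img = cv2.imread("examples/style_images/1.jpg")`: change the path of the style image here;
    * `corpus = "PaddleOCR"`: change the corpus text here;
    * Note: set the language option (`language = "en"`) to match the corpus. English, Simplified Chinese, and Korean are currently supported.
4. `tools/synth_image.py` also provides a `batch_synth_images` method, which pairs up corpus entries and images to generate a batch of data.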
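### Advanced Usage
#### Components
`Style Text Rec` consists mainly of the following components:
* `style_samplers`: style image samplers, which return style images. Currently `DatasetSampler` is provided, which samples from a labeled dataset.
* `corpus_generators`: corpus generators, which produce the corpus. Two are currently provided:
    * `EnNumCorpus`: generates a random string of a given length; characters may be uppercase or lowercase English letters, digits, or spaces.
    * `FileCorpus`: reads the specified text file and returns words from it at random.
* `text_drawers`: standard-font image generators, which render the input corpus as an image in a standard font. Note that when using this component you must set the language to match the corpus, otherwise rendering may fail.
* `predictors`: predictors, which call the deep learning models to generate new data from a given style image and standard-font image. The `predictor` is the core module of the whole algorithm.
* `writers`: file writers, which write the synthesized images and label files to disk.
* `synthesisers`: synthesizers, which call the other modules to complete data synthesis.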
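The way a synthesiser ties the other components together can be sketched roughly as below. This is only an illustration of the data flow described in the list: the function name `synthesise_one` and the plain callables standing in for each component are hypothetical, since the real class and method names are not given here.

```python
def synthesise_one(style_sampler, corpus_generator, text_drawer, predictor, writer):
    """Run one synthesis step using stand-in callables for each component.

    All five parameters are hypothetical stand-ins, not the tool's real API.
    """
    style_image = style_sampler()                # pick a style reference image
    language, text = corpus_generator()          # pick the text to render
    text_image = text_drawer(text, language)     # render it in a standard font
    result = predictor(style_image, text_image)  # run the generator models
    writer(result, text)                         # save the image and its label
    return text, result


if __name__ == "__main__":
    saved = []
    text, result = synthesise_one(
        style_sampler=lambda: "style.jpg",
        corpus_generator=lambda: ("en", "PaddleOCR"),
        text_drawer=lambda text, language: f"std[{text}]",
        predictor=lambda style, text_img: f"fused({style},{text_img})",
        writer=lambda image, label: saved.append((image, label)),
    )
    print(text, result, saved)
```

Passing the components in as arguments keeps each one replaceable, which matches the idea that samplers and corpus generators are selected by `method` in the configuration.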
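### Synthesizing a Dataset
Before synthesizing a dataset, you need to prepare some material.
First, you need style images to serve as references for the synthesized images; a dataset used for training an OCR recognition model works for this. This example uses a dataset with a label file as the style images.
1. Configure the input data paths in `configs/dataset_config.yml`:
    * `StyleSampler`:
        * `method`: the style image sampling method to use;
        * `image_home`: the style image directory;
        * `label_file`: the file listing the style image paths; if the dataset has labels, `label_file` is the path of the label file;
        * `with_label`: whether `label_file` is a label file.
    * `CorpusGenerator`:
        * `method`: the corpus generation method; `FileCorpus` and `EnNumCorpus` are currently available. With `EnNumCorpus` no other settings are needed; otherwise set `corpus_file` and `language`;
        * `language`: the language of the corpus;
        * `corpus_file`: the corpus file path.
2. Run `tools/synth_dataset` to synthesize the data: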
``` bash
python -m tools.synth_dataset -c configs/dataset_config.yml
```
3. To synthesize data faster in parallel, start multiple processes, giving each a different `tag` (`-t`) at startup, as shown below:
```bash
python -m tools.synth_dataset -t 0 -c configs/dataset_config.yml
python -m tools.synth_dataset -t 1 -c configs/dataset_config.yml
```
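Before launching a long synthesis run, it can help to sanity-check the configuration against the rules described in step 1. The sketch below assumes the YAML file has already been parsed into a dictionary with `StyleSampler` and `CorpusGenerator` at the top level; the function name `check_dataset_config` is hypothetical and is not part of the tool.

```python
def check_dataset_config(config):
    """Return a list of problems found in a parsed dataset config dict."""
    problems = []

    sampler = config.get("StyleSampler", {})
    for key in ("method", "image_home", "label_file", "with_label"):
        if key not in sampler:
            problems.append(f"StyleSampler.{key} is missing")

    corpus = config.get("CorpusGenerator", {})
    method = corpus.get("method")
    if method not in ("FileCorpus", "EnNumCorpus"):
        problems.append(f"CorpusGenerator.method is not a known method: {method!r}")
    elif method == "FileCorpus":
        # FileCorpus needs both the language and the corpus file.
        for key in ("language", "corpus_file"):
            if not corpus.get(key):
                problems.append(f"CorpusGenerator.{key} is required for FileCorpus")
    return problems
```

An empty list means the checks passed; each string otherwise names the setting that needs attention.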
### Training OCR Recognition with the Synthetic Dataset
After the steps above, you have a synthetic dataset for OCR recognition. Next, follow the [OCR recognition documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md#%E5%90%AF%E5%8A%A8%E8%AE%AD%E7%BB%83) to complete training.
### Quick Start
`Style-Text` improves on the SRNet network proposed in "Editing Text in the Wild", a text editing algorithm developed at Baidu. Unlike common GAN-based methods, which use only a single branch, this tool decomposes the text synthesis task into three sub-modules to improve the quality of the synthetic data: a text style transfer module, a background extraction module, and a fusion module.
The following figure shows some example results. In addition, the synthesis tool was validated in a real-world nameplate text recognition scenario and a Korean text recognition scenario, as follows.
#### Preparation
1. Please refer to [QUICK INSTALLATION](./installation_en.md) to install PaddlePaddle.
2. Download the pretrained models and unzip them:
```bash
cd tools/style_text_rec
wget /path/to/style_text_models.zip
unzip style_text_models.zip
```
You can download the models [here](). If you save the model files in another folder, please edit the three model paths in `configs/config.yml`:
```
bg_generator:
  pretrain: style_text_models/bg_generator
...
text_generator:
  pretrain: style_text_models/text_generator
...
fusion_generator:
  pretrain: style_text_models/fusion_generator
```
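Because all three paths have to change together, it may be convenient to update them in one place. The following is a minimal sketch, assuming the configuration has already been parsed into a dictionary, that the three generator sections sit under a `Predictor` key, and that each model lives in a subfolder named after its section. The helper name `set_model_root` is hypothetical.

```python
import copy

# The three generator sections whose `pretrain` paths must stay in sync.
GENERATOR_KEYS = ("bg_generator", "text_generator", "fusion_generator")


def set_model_root(config, model_root):
    """Return a copy of `config` with all three `pretrain` paths under `model_root`."""
    updated = copy.deepcopy(config)
    predictor = updated["Predictor"]  # assumed parent section of the generators
    root = model_root.rstrip("/")
    for key in GENERATOR_KEYS:
        # Assumes each model is stored in a subfolder named after its section.
        predictor[key]["pretrain"] = f"{root}/{key}"
    return updated


if __name__ == "__main__":
    config = {
        "Predictor": {key: {"pretrain": f"style_text_models/{key}"} for key in GENERATOR_KEYS}
    }
    print(set_model_root(config, "/data/models")["Predictor"]["bg_generator"]["pretrain"])
```

Working on a copy leaves the original dictionary untouched, so the default paths can still be compared against the overridden ones.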
#### Demo
1. You can use the following commands to run a demo:
```bash
python -m tools.synth_image -c configs/config.yml
```
2. The results are `fake_bg.jpg`, `fake_text.jpg` and `fake_fusion.jpg`, as shown in the figure above. Among them:
    * `fake_text.jpg` is the generated image with the same font style as `Style Input`;
    * `fake_bg.jpg` is the image generated from `Style Input` by removing the foreground;
    * `fake_fusion.jpg` is the final result, synthesized from `fake_text.jpg` and `fake_bg.jpg`.
3. To generate images from another `Style Input` or `Text Input`, modify `tools/synth_image.py`:
    * `img = cv2.imread("examples/style_images/1.jpg")`: the path of `Style Input`;
    * `corpus = "PaddleOCR"`: the `Text Input`;
    * Note: set the language option (`language = "en"`) to match `Text Input`. Supported values are `en`, `ch`, and `ko`.
4. We also provide a `batch_synth_images` method, which combines corpus entries and pictures in pairs to generate a batch of data.
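The pairwise combination mentioned in the last step can be illustrated with a short sketch. It assumes that "in pairs" means every corpus entry is combined with every style image; the helper `pair_corpus_with_styles` is a hypothetical name and does not reproduce the signature of `batch_synth_images`.

```python
from itertools import product


def pair_corpus_with_styles(corpus_list, style_image_paths):
    """Return every (text, style image path) combination as a list of tuples."""
    return list(product(corpus_list, style_image_paths))


if __name__ == "__main__":
    pairs = pair_corpus_with_styles(
        ["PaddleOCR", "Style-Text"],
        ["examples/style_images/1.jpg", "examples/style_images/2.jpg"],
    )
    for text, image_path in pairs:
        print(text, image_path)
```

Under this assumption, N corpus entries and M style images yield N x M synthesized samples.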
### Advanced Usage
#### Components
`Style Text Rec` mainly contains the following components:
* `style_samplers`: samples `Style Input` from a dataset. Currently only `DatasetSampler` is provided.
* `corpus_generators`: generates the corpus. Currently two `corpus_generators` are provided:
    * `EnNumCorpus`: generates a random string of a given length, drawn from uppercase and lowercase English letters, digits, and spaces.
    * `FileCorpus`: reads a text file and randomly returns words from it.
* `text_drawers`: generates `Text Input` (a picture of the input corpus rendered in a standard font). Note that you must set the language information to match the corpus when using it.
* `predictors`: calls the deep learning model to generate new data from `Style Input` and `Text Input`.
* `writers`: writes the generated pictures (`fake_bg.jpg`, `fake_text.jpg` and `fake_fusion.jpg`) and label information to disk.
* `synthesisers`: calls all the modules to complete the work.
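The behaviour described for the two corpus generators can be approximated as follows. This is a sketch under assumptions, not the tool's implementation: the function names are hypothetical, and "words" are taken to be whitespace-separated tokens, which may differ from how `FileCorpus` actually splits its input.

```python
import random
import string

# Characters an EnNumCorpus-style generator may draw from:
# uppercase and lowercase English letters, digits, and the space character.
ALPHABET = string.ascii_letters + string.digits + " "


def random_en_num_string(length, rng=random):
    """Return a random string of `length` characters drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_word_from_text(text, rng=random):
    """Return one randomly chosen whitespace-separated word from `text`."""
    words = text.split()
    if not words:
        raise ValueError("corpus text contains no words")
    return rng.choice(words)


if __name__ == "__main__":
    print(repr(random_en_num_string(10)))
    print(random_word_from_text("PaddleOCR makes text recognition easier"))
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is useful when regenerating a dataset.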
### Generate Dataset
Before you start, you need to prepare some data as material.
First, you need style reference data for the synthesis task; a dataset used for OCR recognition tasks generally serves this purpose.
1. The reference dataset can be specified in `configs/dataset_config.yml`:
    * `StyleSampler`:
        * `method`: the sampling method of `StyleSampler`.
        * `image_home`: the directory of pictures.
        * `label_file`: the list of picture paths if `with_label` is `false`; otherwise, the label file path.
        * `with_label`: whether `label_file` is a label file.
    * `CorpusGenerator`:
        * `method`: the method of `CorpusGenerator`. With `FileCorpus`, you need to set `corpus_file` and `language` accordingly; with `EnNumCorpus`, no other configuration is needed.
        * `language`: the language of the corpus. Needed if `method` is not `EnNumCorpus`.
        * `corpus_file`: the corpus file path. Needed if `method` is not `EnNumCorpus`.
2. Run the following command to start the synthesis task:
``` bash
python -m tools.synth_dataset -c configs/dataset_config.yml
```
3. You can start multiple synthesis tasks in parallel as separate processes; each one needs a different tag, specified with `-t`:
```bash
python -m tools.synth_dataset -t 0 -c configs/dataset_config.yml
python -m tools.synth_dataset -t 1 -c configs/dataset_config.yml
```
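When more than a couple of workers are needed, the tagged commands can be generated and launched from a small script instead of being typed by hand. The sketch below only relies on the `-t` and `-c` options shown above; the helper names `build_synth_commands` and `launch` are hypothetical, and the script is assumed to run from the directory where `tools.synth_dataset` is importable.

```python
import subprocess
import sys


def build_synth_commands(num_workers, config_path="configs/dataset_config.yml"):
    """Build one command per worker, each with its own tag (0, 1, 2, ...)."""
    return [
        [sys.executable, "-m", "tools.synth_dataset", "-t", str(tag), "-c", config_path]
        for tag in range(num_workers)
    ]


def launch(commands):
    """Start every command as its own process and return their exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# Example (not executed here): exit_codes = launch(build_synth_commands(4))
```

Each worker is an independent operating-system process, so the distinct tags are what distinguish one worker's run from another's.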
### OCR Recognition Training
After completing the above operations, you will have a synthetic dataset for OCR recognition. Next, please complete the training by referring to the [OCR Recognition Document](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md#%E5%90%AF%E5%8A%A8%E8%AE%AD%E7%BB%83).
@@ -23,7 +23,7 @@ Predictor:
     - 0.5
   expand_result: false
   bg_generator:
-    pretrain: models/style_text_rec/bg_generator
+    pretrain: style_text_models/bg_generator
     module_name: bg_generator
     generator_type: BgGeneratorWithMask
     encode_dim: 64
@@ -33,7 +33,7 @@ Predictor:
     conv_block_dilation: true
     output_factor: 1.05
   text_generator:
-    pretrain: models/style_text_rec/text_generator
+    pretrain: style_text_models/text_generator
     module_name: text_generator
     generator_type: TextGenerator
     encode_dim: 64
@@ -42,7 +42,7 @@ Predictor:
     conv_block_dropout: false
     conv_block_dilation: true
   fusion_generator:
-    pretrain: models/style_text_rec/fusion_generator
+    pretrain: style_text_models/fusion_generator
     module_name: fusion_generator
     generator_type: FusionGeneratorSimple
     encode_dim: 64
......