English | [简体中文](README_ch.md)

### Contents
- [1. Introduction](#Introduction)
- [2. Preparation](#Preparation)
- [3. Demo](#Demo)
- [4. Advanced Usage](#Advanced_Usage)
- [5. Code Structure](#Code_structure)


<a name="Introduction"></a>
### Introduction

<div align="center">
    <img src="doc/images/3.png" width="800">
</div>

<div align="center">
    <img src="doc/images/1.png" width="600">
</div>


Style-Text is a data synthesis tool based on "Editing Text in the Wild" ([https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047)), a text editing algorithm developed by Baidu.

Unlike commonly used GAN-based data synthesis tools, the main framework of Style-Text consists of three modules:
* (1) Text foreground style transfer module.
* (2) Background extraction module.
* (3) Fusion module.

With these three steps, you can quickly transfer the text style of an image. The following figure shows some results of the data synthesis tool.

<div align="center">
    <img src="doc/images/2.png" width="1000">
</div>
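
The data flow between the three modules can be sketched roughly as follows. This is an illustrative outline only: every function name here is a hypothetical placeholder rather than part of the StyleText API, and each body merely records which inputs a stage consumes. The real modules are neural networks configured in `configs/config.yml`.

```python
# Illustrative data-flow sketch. All function names are hypothetical
# placeholders, not the StyleText API; each stage only records its inputs.

def transfer_text_style(style_image, text):
    """Stage 1: render `text` in the font style of `style_image`."""
    return {"stage": "text", "inputs": (style_image, text)}


def extract_background(style_image):
    """Stage 2: remove the foreground text from `style_image`."""
    return {"stage": "background", "inputs": (style_image,)}


def fuse(styled_text, background):
    """Stage 3: combine the styled text with the extracted background."""
    return {"stage": "fusion", "inputs": (styled_text["stage"], background["stage"])}


def synthesize(style_image, text):
    """Run the three stages in order and return the fused result."""
    styled_text = transfer_text_style(style_image, text)
    background = extract_background(style_image)
    return fuse(styled_text, background)


result = synthesize("examples/style_images/1.jpg", "PaddleOCR")
print(result)
```

Stages 1 and 2 both read the style image and are independent of each other; only the fusion stage depends on both outputs.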


<a name="Preparation"></a>
### Preparation

1. Please refer to the [QUICK INSTALLATION](../doc/doc_en/installation_en.md) guide to install PaddlePaddle. A Python 3 environment is strongly recommended.
2. Download the pretrained models and unzip:

```bash
cd StyleText
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
```

If you save the models in another location, update the model paths in `configs/config.yml`. All three of the following settings must be changed together:

```
bg_generator:
  pretrain: style_text_models/bg_generator
...
text_generator:
  pretrain: style_text_models/text_generator
...
fusion_generator:
  pretrain: style_text_models/fusion_generator
```
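
If the models live somewhere else, a small helper can print the three `pretrain` values to paste into `configs/config.yml` and report which generators cannot be found. This is a minimal sketch, assuming all three generators are stored under a single model directory; the helper names and the example path `/data/models/style_text_models` are illustrative and not part of StyleText.

```python
from pathlib import Path

# Generator names are taken from the config keys shown above.
GENERATORS = ("bg_generator", "text_generator", "fusion_generator")


def pretrain_paths(model_root):
    """Return the `pretrain` value each generator would use under `model_root`."""
    root = Path(model_root)
    return {name: (root / name).as_posix() for name in GENERATORS}


def missing_generators(model_root):
    """List generators with no matching file or directory under `model_root`."""
    root = Path(model_root)
    missing = []
    for name in GENERATORS:
        # Weights may be a directory or files sharing this prefix; accept
        # either, since the archive layout is an assumption in this sketch.
        if not any(root.glob(name + "*")):
            missing.append(name)
    return missing


# Hypothetical location of the unzipped models.
paths = pretrain_paths("/data/models/style_text_models")
for name, path in paths.items():
    print(f"{name}:\n  pretrain: {path}")
```

Calling `missing_generators(...)` on the same directory before running the tool gives an early warning if one of the three paths was not updated.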

<a name="Demo"></a>
### Demo

#### Synthesize a single image

1. Run `tools/synth_image` to generate the demo image:

```bash
python3 -m tools.synth_image -c configs/config.yml
```

2. The results are `fake_bg.jpg`, `fake_text.jpg` and `fake_fusion.jpg`, as shown in the figure above. Among them:
   * `fake_text.jpg` is the generated image with the same font style as `Style Input`;
   * `fake_bg.jpg` is the `Style Input` image with the foreground text removed;
   * `fake_fusion.jpg` is the final result, synthesized from `fake_text.jpg` and `fake_bg.jpg`.

3. To generate an image from a different `Style Input` or `Text Input`, modify `tools/synth_image.py`:
   * `img = cv2.imread("examples/style_images/1.jpg")`: the path of the `Style Input`;
   * `corpus = "PaddleOCR"`: the `Text Input`;
   * Note: set the language option (`language = "en"`) to match the `Text Input`. The supported values are `en`, `ch` and `ko`.

4. We also provide a `batch_synth_images` method, which combines corpus entries and pictures in pairs to generate a batch of data.
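
One plausible reading of "in pairs" is that every corpus entry is combined with every style image. The sketch below shows that pairing with `itertools.product`. It is an assumption made for illustration, not the actual `batch_synth_images` implementation, and the helper name `pair_inputs` is hypothetical; the tool may instead pair entries one-to-one.

```python
from itertools import product


def pair_inputs(corpus_list, style_image_paths):
    """Yield (index, corpus, style_image_path) for every combination."""
    combos = product(style_image_paths, corpus_list)
    for index, (style_path, corpus) in enumerate(combos):
        yield index, corpus, style_path


jobs = list(
    pair_inputs(
        ["PaddleOCR", "StyleText"],
        ["examples/style_images/1.jpg", "examples/style_images/2.jpg"],
    )
)
for index, corpus, style_path in jobs:
    print(index, corpus, style_path)
```

With two corpus entries and two style images this yields four jobs, which is why even a small set of style images can produce a large synthetic dataset.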


#### Batch synthesis

Before you start, prepare some source data.
First, you need style reference images for the synthesis task; these are typically taken from datasets used for OCR recognition tasks.

1. The reference dataset can be specified in `configs/dataset_config.yml`:
   * `StyleSampler`:
     * `method`: The method of `StyleSampler`.
     * `image_home`: The directory containing the pictures.
     * `label_file`: The path to the list of picture paths if `with_label` is `false`; otherwise, the path to the label file.
     * `with_label`: Whether `label_file` is a label file.

   * `CorpusGenerator`:
     * `method`: The method of `CorpusGenerator`. If `FileCorpus` is used, you need to set `corpus_file` and `language` accordingly; if `EnNumCorpus` is used, no other configuration is needed.
     * `language`: The language of the corpus. Required if the method is not `EnNumCorpus`.
     * `corpus_file`: The path to the corpus file. Required if the method is not `EnNumCorpus`.
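
The rules above can be checked before starting a long synthesis run. The following is a minimal sketch that validates a config already loaded into a Python dictionary. The function name is hypothetical, the `StyleSampler` method value is a placeholder, and treating all four `StyleSampler` keys as mandatory is an assumption.

```python
def validate_dataset_config(config):
    """Return a list of problems found in a dataset config dictionary."""
    problems = []

    # Assumption: all four StyleSampler keys listed above must be present.
    sampler = config.get("StyleSampler", {})
    for key in ("method", "image_home", "label_file", "with_label"):
        if key not in sampler:
            problems.append(f"StyleSampler.{key} is missing")

    corpus = config.get("CorpusGenerator", {})
    method = corpus.get("method")
    if method is None:
        problems.append("CorpusGenerator.method is missing")
    elif method != "EnNumCorpus":
        # Every method other than EnNumCorpus needs language and corpus_file.
        for key in ("language", "corpus_file"):
            if not corpus.get(key):
                problems.append(f"CorpusGenerator.{key} is required for {method}")
    return problems


example = {
    "StyleSampler": {
        "method": "PlaceholderSampler",  # placeholder, not a confirmed method name
        "image_home": "examples",
        "label_file": "examples/image_list.txt",
        "with_label": False,
    },
    "CorpusGenerator": {"method": "FileCorpus", "language": "en"},
}
print(validate_dataset_config(example))
```

Here the example deliberately omits `corpus_file`, so the check reports exactly one problem for the `FileCorpus` method.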

We provide a general dataset containing Chinese, English and Korean text (50,000 images in total) for you to try ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)). Some examples are shown below:

<div align="center">
     <img src="doc/images/5.png" width="800">
</div>

2. Run the following command to start the synthesis task:

   ```bash
   python3 -m tools.synth_dataset -c configs/dataset_config.yml
   ```

3. Use the following commands to run multiple synthesis tasks in parallel. Each task must be given its own tag with `-t`:

   ```bash
   python3 -m tools.synth_dataset -t 0 -c configs/dataset_config.yml
   python3 -m tools.synth_dataset -t 1 -c configs/dataset_config.yml
   ```
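
When more than a couple of tagged tasks are needed, the commands can be generated and launched from a short script. This is a minimal sketch using the standard `subprocess` module: the helper names are hypothetical, the tags are assumed to be the integers `0 .. n-1`, and the module is named without the `.py` suffix because `python -m` expects a module name.

```python
import subprocess
import sys


def build_commands(num_tasks, config="configs/dataset_config.yml"):
    """Build one synth_dataset command per tag (0 .. num_tasks - 1)."""
    return [
        [sys.executable, "-m", "tools.synth_dataset", "-t", str(tag), "-c", config]
        for tag in range(num_tasks)
    ]


def run_all(commands):
    """Start every command as its own process, then wait for all of them."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]


commands = build_commands(2)
for cmd in commands:
    # Print the arguments only; the interpreter path differs per machine.
    print(" ".join(cmd[1:]))
# Call run_all(commands) from the StyleText directory to launch the tasks.
```

Each task runs as a separate process, so `run_all` returns one exit code per tag once all of them have finished.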




<a name="Advanced_Usage"></a>
### Advanced Usage
We take two scenarios as examples, English and digit recognition on metal surfaces and general Korean recognition, to illustrate how data synthesized with StyleText can improve text recognition. The following figure shows some real scene images and synthesized images:

<div align="center">
    <img src="doc/images/6.png" width="800">
</div>


After adding the above synthetic data to the training set, the accuracy of the recognition model improves, as shown in the following table:

| Scenario | Characters | Raw Data | Test Data | Recognition Accuracy<br>(Raw Data Only) | New Synthetic Data | Recognition Accuracy<br>(Raw + Synthetic Data) | Accuracy Improvement |
| -------- | ---------- | -------- | --------- | -------- | -------- | -------- | -------- |
| Metal surface | English and numbers | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% |
| Random background | Korean | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% |
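
The improvement column can be reproduced from the two accuracy columns as an absolute gain in percentage points. The short sketch below does that arithmetic; the accuracy values are copied from the table and the function name is illustrative.

```python
# (scenario, accuracy with raw data only, accuracy with raw + synthetic data)
results = [
    ("Metal surface", 0.5938, 0.7546),
    ("Random background", 0.3012, 0.5057),
]


def absolute_gain_points(before, after):
    """Accuracy gain expressed in percentage points."""
    return (after - before) * 100


for scenario, before, after in results:
    gain = absolute_gain_points(before, after)
    print(f"{scenario}: +{gain:.2f} points")
```

The gains come out at about 16.08 and 20.45 points, which the table rounds to 16% and 20%.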


<a name="Code_structure"></a>
### Code Structure
```
style_text_rec
|-- arch
|   |-- base_module.py
|   |-- decoder.py
|   |-- encoder.py
|   |-- spectral_norm.py
|   `-- style_text_rec.py
|-- configs
|   |-- config.yml
|   `-- dataset_config.yml
|-- engine
|   |-- corpus_generators.py
|   |-- predictors.py
|   |-- style_samplers.py
|   |-- synthesisers.py
|   |-- text_drawers.py
|   `-- writers.py
|-- examples
|   |-- corpus
|   |   `-- example.txt
|   |-- image_list.txt
|   `-- style_images
|       |-- 1.jpg
|       `-- 2.jpg
|-- fonts
|   |-- ch_standard.ttf
|   |-- en_standard.ttf
|   `-- ko_standard.ttf
|-- tools
|   |-- __init__.py
|   |-- synth_dataset.py
|   `-- synth_image.py
`-- utils
    |-- config.py
    |-- load_params.py
    |-- logging.py
    |-- math_functions.py
    `-- sys_funcs.py
```