# Data Augmentation Pipeline

Data augmentation is often a highly effective technique for improving deep learning performance. We augment our speech data by synthesizing new audio clips, adding small random perturbations (label-invariant transformations) to the raw audio. You do not have to run the synthesis yourself: it is built into the data provider and performed on the fly, with fresh randomness in each epoch during training.

The following optional augmentation components can be selected, configured, and inserted into the processing pipeline.

* Audio
  - Volume Perturbation
  - Speed Perturbation
  - Shifting Perturbation
  - Online Bayesian normalization
  - Noise Perturbation (requires background noise audio files)
  - Impulse Response (requires impulse response audio files)

* Feature
  - SpecAugment
  - Adaptive SpecAugment

To tell the trainer which augmentation components are needed and in what order to apply them, prepare an *augmentation configuration file* in [JSON](http://www.json.org/) format in advance. For example:

```
[{
    "type": "speed",
    "params": {"min_speed_rate": 0.95,
               "max_speed_rate": 1.05},
    "prob": 0.6
},
{
    "type": "shift",
    "params": {"min_shift_ms": -5,
               "max_shift_ms": 5},
    "prob": 0.8
}]
```
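
A malformed configuration file is easier to catch before training starts. The following is a minimal sketch of a sanity check, assuming only the three keys shown in the example above (`type`, `params`, `prob`); the function name `validate_augmentation_config` is hypothetical and not part of the toolkit.

```python
import json


def validate_augmentation_config(text):
    """Parse a JSON augmentation config and check its basic shape.

    Hypothetical helper: it only checks the keys used in the example above.
    """
    config = json.loads(text)
    if not isinstance(config, list):
        raise ValueError("config must be a JSON list of augmentation steps")
    for index, step in enumerate(config):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} must be a JSON object")
        missing = {"type", "params", "prob"} - set(step)
        if missing:
            raise ValueError(f"step {index} is missing keys: {sorted(missing)}")
        if not 0.0 <= step["prob"] <= 1.0:
            raise ValueError(f"step {index} has prob outside [0, 1]")
    return config
```

The check returns the parsed list so the caller can reuse it, and raises `ValueError` with the index of the offending step otherwise.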

When the `augment_conf_file` argument is set to the path of the above example configuration file, every audio clip in every epoch is processed as follows: with 60% probability it is first speed-perturbed with a speed rate sampled uniformly between 0.95 and 1.05, and then, with 80% probability, it is shifted in time by a randomly sampled offset between -5 ms and 5 ms. Finally, the newly synthesized audio clip is fed into the feature extractor for further training.
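
The control flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the toolkit's implementation: clips are lists of float samples at an assumed 16 kHz, the two perturbation functions are simplified stand-ins, and all names (`speed_perturb`, `shift_perturb`, `AUGMENTORS`, `apply_pipeline`) are hypothetical.

```python
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def speed_perturb(samples, rng, min_speed_rate, max_speed_rate):
    """Toy speed change: resample by a uniformly drawn rate (linear interpolation)."""
    rate = rng.uniform(min_speed_rate, max_speed_rate)
    new_len = max(1, int(round(len(samples) / rate)))
    out = []
    for i in range(new_len):
        pos = min(i * rate, len(samples) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def shift_perturb(samples, rng, min_shift_ms, max_shift_ms):
    """Toy time shift: move the clip by a uniformly drawn offset, zero-padding the gap."""
    shift_ms = rng.uniform(min_shift_ms, max_shift_ms)
    n = int(shift_ms * SAMPLE_RATE / 1000)
    if n == 0 or abs(n) >= len(samples):
        return list(samples)  # assumption: leave the clip unchanged in these cases
    if n > 0:
        return [0.0] * n + list(samples[:-n])
    return list(samples[-n:]) + [0.0] * (-n)


# Maps the "type" field of a config entry to a perturbation function.
AUGMENTORS = {"speed": speed_perturb, "shift": shift_perturb}


def apply_pipeline(samples, config, rng):
    """Apply each configured step in order, each with its own probability."""
    for step in config:
        if rng.random() < step["prob"]:
            samples = AUGMENTORS[step["type"]](samples, rng, **step["params"])
    return samples
```

Each step draws its own random number, so the decision to shift a clip is independent of whether it was speed-perturbed. Passing a seeded `random.Random` instance makes a run reproducible.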

For other configuration examples, please refer to `examples/conf/augmentation.example.json`.

Use data augmentation with care: improper augmentation harms training because it enlarges the train-test gap.