# TransformerTTS
PaddlePaddle dynamic-graph implementation of TransformerTTS, a neural text-to-speech (TTS) model based on the Transformer architecture. The implementation follows [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).

## Dataset

We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
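
Before training, it can help to confirm the archive unpacked correctly. The helper below is a sketch, not part of this repo; it only checks for the `metadata.csv` file and `wavs/` directory that LJSpeech 1.1 ships with.

```bash
# Sanity-check helper (a sketch, not part of this repo): verify the dataset
# layout before pointing --data_path at it.
check_ljspeech() {
  dir=$1  # path to the unpacked LJSpeech-1.1 directory
  [ -f "$dir/metadata.csv" ] || { echo "missing metadata.csv"; return 1; }
  [ -d "$dir/wavs" ]         || { echo "missing wavs/"; return 1; }
  # LJSpeech 1.1 ships 13,100 clips; far fewer suggests a broken download.
  echo "ok: $(ls "$dir/wavs" | wc -l | tr -d ' ') wav files"
}
```

Running `check_ljspeech ./LJSpeech-1.1` on a complete download should report roughly 13,100 wav files.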
## Model Architecture

![TransformerTTS model architecture](./images/model_architecture.jpg)
The model replaces the RNN structures, as well as the original attention mechanism of [Tacotron2](https://arxiv.org/abs/1712.05884), with multi-head attention. It consists of two main parts, an encoder and a decoder. We also implement the CBHG model from Tacotron as the vocoder and convert the spectrogram into a raw waveform using the Griffin-Lim algorithm.

## Project Structure
```text
├── config                 # yaml configuration files
├── data.py                # dataset and dataloader settings for LJSpeech
├── synthesis.py           # script to synthesize waveform from text
├── train_transformer.py   # script for transformer model training
├── train_vocoder.py       # script for vocoder model training
```

## Train Transformer

The TransformerTTS model can be trained with ``train_transformer.py``.
```bash
python train_transformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
Or you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```

If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--transformer_step``.
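
The step to pass can be read off the newest file in the checkpoint directory. The snippet below is a sketch that ASSUMES checkpoint filenames embed the global step (e.g. a hypothetical ``transformer_120000.pdparams``); adjust the pattern to however your run actually names its files.

```bash
# Pick the largest step number found in any filename under the checkpoint
# directory. Filename convention is an assumption; adapt as needed.
latest_step() {
  dir=$1  # directory passed to --checkpoint_path
  ls "$dir" | grep -o '[0-9]\{1,\}' | sort -n | tail -n 1
}
```

It can then be used as ``--transformer_step=$(latest_step ./checkpoint)``; the same helper applies to ``--vocoder_step`` below.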

For more help on arguments:
``python train_transformer.py --help``.

## Train Vocoder
The vocoder model can be trained with ``train_vocoder.py``.
```bash
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
Or you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--vocoder_step``.

For more help on arguments:
``python train_vocoder.py --help``.

## Synthesis
After training the TransformerTTS and vocoder models, audio can be synthesized with ``synthesis.py``.
```bash
python synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--vocoder_step=70000 \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--sample_path='./sample' \
--config_path='config/synthesis.yaml'
```

Or you can run the script file directly.
```bash
sh synthesis.sh
```

The synthesized audio files will be saved under the directory given by ``--sample_path``.
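
As an optional sanity check (a sketch, not part of this repo), you can confirm the saved files are valid WAVE audio: every wav file begins with the four ASCII bytes ``RIFF``.

```bash
# Check that each .wav in the sample directory starts with the RIFF magic
# bytes; anything else points at a failed or truncated synthesis run.
check_wavs() {
  dir=$1  # the directory passed as --sample_path
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || { echo "no wav files in $dir"; return 1; }
    if [ "$(head -c 4 "$f")" = "RIFF" ]; then
      echo "ok: $f"
    else
      echo "not a wav: $f"
    fi
  done
}
```

For example, ``check_wavs ./sample`` after a synthesis run.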

For more help on arguments:
``python synthesis.py --help``.