PaddlePaddle dynamic graph implementation of FastSpeech, a feed-forward network based on Transformer. The implementation is based on [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263).
## Dataset
...
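The elided text covers obtaining the LJSpeech dataset; a sketch of the download and extraction step, assuming the standard LJSpeech-1.1 release URL:

```bash
# Download and unpack LJSpeech-1.1; the URL is the dataset's public release
# location and is an assumption here, not copied from the elided section.
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```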


FastSpeech is a feed-forward structure based on Transformer, instead of using the encoder-attention-decoder based architecture. The model extracts attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence for parallel mel-spectrogram generation. We use TransformerTTS as the teacher model. The model consists of three parts: an encoder, a decoder and a length regulator.
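In essence, the length regulator repeats each phoneme's encoder output as many times as its predicted duration, so the expanded sequence has one position per target mel-spectrogram frame. A minimal NumPy sketch of this expansion (the function name and shapes are illustrative, not the repo's API):

```python
import numpy as np

def length_regulator(phoneme_encodings, durations):
    """Expand phoneme-level encodings to frame level.

    phoneme_encodings: (num_phonemes, hidden_dim) float array
    durations: (num_phonemes,) int array of predicted frames per phoneme
    """
    # Repeat each phoneme's hidden vector durations[i] times along the time axis.
    return np.repeat(phoneme_encodings, durations, axis=0)

# Three phonemes with durations [2, 3, 1] expand to 6 frame positions.
enc = np.random.randn(3, 8).astype("float32")
print(length_regulator(enc, np.array([2, 3, 1])).shape)  # (6, 8)
```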
...
## Train FastSpeech
The FastSpeech model can be trained with ``train.py``.
```bash
python train.py \
--use_gpu=1 \
...
--transformer_step=160000 \
--config_path='config/fastspeech.yaml'
```
Or you can run the script file directly.
```bash
sh train.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
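The launch command itself is not shown in this section; a sketch using PaddlePaddle's ``paddle.distributed.launch`` module is below. The GPU ids, the ``--log_dir`` path and the exact set of flags carried over from the single-GPU command are assumptions:

```bash
# Illustrative multi-GPU launch; GPU ids and --log_dir are assumptions.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py \
--use_data_parallel=1 \
--use_gpu=1 \
--transformer_step=160000 \
--config_path='config/fastspeech.yaml'
```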