The model consists of three parts: an encoder, a decoder and a length regulator.
├── train.py # script for model training
```
## Saving & Loading
`train.py` has 3 common arguments: `--checkpoint`, `--iteration` and `--output`.
1. `--output` is the directory for saving results.
During training, checkpoints are saved in `checkpoints/` inside `output`, and the tensorboard log is saved in `log/` inside `output`.
During synthesis, results are saved in `samples/` inside `output`, and the tensorboard log is saved in `log/` inside `output`.
2. `--checkpoint` and `--iteration` are used for loading an existing checkpoint. Loading follows these rules:
If `--checkpoint` is provided, the checkpoint specified by `--checkpoint` is loaded.
If `--checkpoint` is not provided, we try to load the checkpoint specified by `--iteration` from the checkpoint directory. If `--iteration` is not provided either, we try to load the latest checkpoint from the checkpoint directory.
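The loading rules above can be sketched in Python. This is a hypothetical helper, not the repo's actual implementation: the `step-<N>` checkpoint file naming is an assumption made for illustration.

```python
import os
import re


def resolve_checkpoint(checkpoint, iteration, checkpoint_dir):
    """Pick which checkpoint to load, following the rules above.

    Hypothetical sketch: assumes checkpoints are files named 'step-<N>'
    inside checkpoint_dir; the real file layout may differ.
    """
    if checkpoint is not None:
        return checkpoint  # an explicit --checkpoint path always wins
    # Collect the iteration numbers of all checkpoints in the directory.
    steps = []
    for name in os.listdir(checkpoint_dir):
        m = re.match(r"step-(\d+)$", name)
        if m:
            steps.append(int(m.group(1)))
    if iteration is not None:
        # --iteration given: load that exact step, or nothing.
        if iteration in steps:
            return os.path.join(checkpoint_dir, "step-{}".format(iteration))
        return None
    if not steps:
        return None  # nothing to resume from: start fresh
    # Neither flag given: fall back to the latest checkpoint.
    return os.path.join(checkpoint_dir, "step-{}".format(max(steps)))
```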
## Compute Alignment
Before training the FastSpeech model, you need alignment information. We use the alignments obtained from a pre-trained TransformerTTS model; you can run `alignments/get_alignments.py` to get them.
```bash
cd alignments
python get_alignments.py \
    --use_gpu=1 \
    --output='./alignments' \
    --data=${DATAPATH} \
    --config=${CONFIG} \
    --checkpoint_transformer=${CHECKPOINT}
```
where `${DATAPATH}` is the path where the LJSpeech data is saved, `${CHECKPOINT}` is the path of a pre-trained TransformerTTS model, and `${CONFIG}` is the config yaml file of the TransformerTTS checkpoint. You need to prepare a pre-trained TransformerTTS checkpoint first.
For more help on arguments:
``python train.py --help``.
Or you can use your own alignment information; in that case, process the data into the following format:
```python
{'fname1': alignment1,
'fname2': alignment2,
...}
```
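A minimal sketch of producing such a file yourself. The file names and duration values here are made up for illustration, and storing the dict with pickle is an assumption: the text above specifies only the in-memory dict shape, not the on-disk format.

```python
import os
import pickle
import tempfile

# Hypothetical alignment data: maps each utterance file name to a
# per-phoneme duration sequence. The values are illustrative only.
alignments = {
    "LJ001-0001": [3, 5, 2, 7],
    "LJ001-0002": [4, 4, 6],
}

# Assumption: the dict is serialized with pickle.
path = os.path.join(tempfile.mkdtemp(), "alignments.pkl")
with open(path, "wb") as f:
    pickle.dump(alignments, f)

# Round-trip check: loading gives back the same mapping.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```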
## Train FastSpeech
FastSpeech model can be trained with ``train.py``.
```bash
python train.py \
    --use_gpu=1 \
    --data=${DATAPATH} \
    --alignments_path=${ALIGNMENTS_PATH} \
    --output='./experiment' \
    --config='configs/ljspeech.yaml'
```
Or you can run the script file directly.
```bash
sh train.sh
```
If you want to train on multiple GPUs, start training as follows:
We use Clarinet to synthesize wav files, so you need to prepare a pre-trained [Clarinet checkpoint](https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_ckpt_1.0.zip).
The model adopts the multi-head attention mechanism to replace the RNN structure.
├── train_transformer.py # script for transformer model training
├── train_vocoder.py # script for vocoder model training
```
## Saving & Loading
`train_transformer.py` and `train_vocoder.py` have 3 arguments in common: `--checkpoint`, `--iteration` and `--output`.
1. `--output` is the directory for saving results.
During training, checkpoints are saved in `checkpoints/` inside `output`, and the tensorboard log is saved in `log/` inside `output`.
During synthesis, results are saved in `samples/` inside `output`, and the tensorboard log is saved in `log/` inside `output`.
2. `--checkpoint` and `--iteration` are used for loading an existing checkpoint. Loading follows these rules:
If `--checkpoint` is provided, the checkpoint specified by `--checkpoint` is loaded.
If `--checkpoint` is not provided, we try to load the checkpoint specified by `--iteration` from the checkpoint directory. If `--iteration` is not provided either, we try to load the latest checkpoint from the checkpoint directory.
## Train Transformer
...
...
TransformerTTS model can be trained with ``train_transformer.py``.
```bash
python train_transformer.py \
    --use_gpu=1 \
    --data=${DATAPATH} \
    --output='./experiment' \
    --config='configs/ljspeech.yaml'
```
Or you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, start training as follows:
If you wish to resume from an existing model, see [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
**Note: In order to ensure the training effect, we recommend using multi-GPU training to enlarge the batch size, and at least 16 samples in single batch per GPU.**
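As a quick sanity check on the note above, the effective batch size grows with the number of GPUs under data-parallel training. The GPU count below is a made-up example; only the 16-samples-per-GPU minimum comes from the note.

```python
# Illustrative arithmetic: effective batch size under data-parallel training.
per_gpu_batch = 16   # recommended minimum samples per GPU (from the note above)
num_gpus = 4         # hypothetical number of devices

effective_batch = per_gpu_batch * num_gpus
print(effective_batch)  # 64
```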
...
...
The vocoder model can be trained with ``train_vocoder.py``.
```bash
python train_vocoder.py \
    --use_gpu=1 \
    --data=${DATAPATH} \
    --output='./vocoder' \
    --config='configs/ljspeech.yaml'
```
Or you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, start training as follows: