## Pre-trained models and audio samples
Parakeet also releases some well-trained parameters for the example models, which can be accessed in the following tables. Each column of these tables lists resources for one model, including the URL link to the pre-trained model, the dataset that the model is trained on, and synthesized audio samples based on the pre-trained model.
#### Vocoders
We provide model checkpoints for WaveFlow (with 64 and 128 residual channels), ClariNet, and WaveNet.
**Note:** The input mel spectrograms are from the validation dataset, which are not seen during training.
#### TTS models
<div align="center">
<table>
    <thead>
        <tr>
            <th style="width: 250px">Deep Voice 3</th>
            <th style="width: 250px">Transformer TTS</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th>LJSpeech</th>
            <th>LJSpeech</th>
        </tr>
        <tr>
            <th style="height: 150px">To be added soon</th>
            <th>To be added soon</th>
        </tr>
    </tbody>
</table>
</div>
Click each link to download; you will get a compressed package containing the pre-trained model and the `yaml` config that describes how to train the model.
## Saving & Loading
`train.py` and `synthesis.py` have 3 arguments in common: `--checkpoint`, `--iteration` and `output`.

1. `output` is the directory for saving results.
During training, checkpoints are saved in `checkpoints/` in `output` and the tensorboard log is saved in `log/` in `output`. Other possible outputs are saved in `states/` in `output`.
During synthesizing, audio files and other possible outputs are saved in `synthesis/` in `output`.
So after training and synthesizing with the same output directory, the file structure of the output directory looks like this.
```text
├── checkpoints/ # checkpoint directory (including *.pdparams, *.pdopt and a text file `checkpoint` that records the latest checkpoint)
├── states/ # audio files generated at validation and other possible outputs
├── log/ # tensorboard log
└── synthesis/ # synthesized audio files and other possible outputs
```
2. `--checkpoint` and `--iteration` are used to load from an existing checkpoint. Loading an existing checkpoint follows this rule:
If `--checkpoint` is provided, the checkpoint specified by `--checkpoint` is loaded.
If `--checkpoint` is not provided, we try to load the checkpoint specified by `--iteration` from the checkpoint directory. If `--iteration` is not provided either, we try to load the latest checkpoint from the checkpoint directory.
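To make the rule concrete, here is a sketch (the config name, experiment directory and the `step-500000` base name are illustrative, not part of the repository):

```bash
# Resume from an explicit checkpoint (highest priority); pass the base
# name of the parameter files, without the .pdparams / .pdopt extension.
python train.py --config=config.yaml --device=0 \
    --checkpoint=experiment/checkpoints/step-500000 experiment

# Resume from a given iteration in experiment/checkpoints/; omit both
# flags to load the latest checkpoint recorded there.
python train.py --config=config.yaml --device=0 --iteration=500000 experiment
```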
## Train
Train the model using `train.py`. For help on usage, run `python train.py --help`.
```text
Train a ClariNet model with LJspeech and a trained WaveNet model.

positional arguments:
  output                path to save experiment results

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       path of the config file
  --device DEVICE       device to use
  --data DATA           path of LJspeech dataset
  --checkpoint CHECKPOINT
                        checkpoint to resume from
  --iteration ITERATION
                        the iteration of the checkpoint to load from output
                        directory
  --wavenet WAVENET     wavenet checkpoint to use
```
- `--config` is the configuration file to use. The provided configurations can be used directly, or you can change some values in the configuration file and train the model with a different config.
- `--device` is the device (gpu id) to use for training. `-1` means CPU.
- `--data` is the path of the LJSpeech dataset, the extracted folder from the downloaded archive (the folder which contains `metadata.txt`).
- `--checkpoint` is the path of the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory.
- `--wavenet` is the path of the wavenet checkpoint to load. When you start training a ClariNet model without loading from a ClariNet checkpoint, you should have trained a WaveNet model with single Gaussian output distribution. Make sure the config of the teacher model matches that of the trained wavenet model.
- `output` is the directory to save results; all results are saved in this directory.

See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
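For instance, a training run might look like the sketch below (the config filename, experiment directories and WaveNet checkpoint path are assumptions for illustration):

```bash
python train.py \
    --config=./configs/clarinet_ljspeech.yaml \
    --data=./LJSpeech-1.1/ \
    --device=0 \
    --wavenet=./wavenet_exp/checkpoints/step-1000000 \
    ./clarinet_exp
```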
## Synthesis
Synthesize audio using `synthesis.py`. For help on usage, run `python synthesis.py --help`.
```text
Synthesize audio files from mel spectrogram in the validation set.

positional arguments:
  output                path to save the synthesized audio

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       path of the config file
  --device DEVICE       device to use
  --data DATA           path of LJspeech dataset
  --checkpoint CHECKPOINT
                        checkpoint to resume from
  --iteration ITERATION
                        the iteration of the checkpoint to load from output
                        directory
```
- `--config` is the configuration file to use. You should use the same configuration with which you trained your model.
- `--device` is the device (gpu id) to use. `-1` means CPU.
- `--data` is the path of the LJspeech dataset. In principle, a dataset is not needed for synthesis, but since the input is mel spectrogram, we need to get mel spectrogram from audio files.
- `--checkpoint` is the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory.
- `output` is the directory to save synthesized audio. Audio files are saved in `synthesis/` in the `output` directory.

See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
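A synthesis run under the same illustrative assumptions:

```bash
python synthesis.py \
    --config=./configs/clarinet_ljspeech.yaml \
    --data=./LJSpeech-1.1/ \
    --device=0 \
    --iteration=500000 \
    ./clarinet_exp
# synthesized audio is written to ./clarinet_exp/synthesis/
```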
The Deep Voice 3 model consists of an encoder, a decoder and a converter (and a speaker embedding).
## Train
Train the model using `train.py`. For help on usage, run `python train.py --help`.
```text
optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       experiment config
  --data DATA           the path of the LJSpeech dataset
  --device DEVICE       device to use
  --checkpoint CHECKPOINT
                        checkpoint to resume from
  --iteration ITERATION
                        the iteration of the checkpoint to load from output
                        directory
```
- `--config` is the configuration file to use. The provided `ljspeech.yaml` can be used directly, or you can change some values in the configuration file and train the model with a different config.
- `--data` is the path of the LJSpeech dataset, the extracted folder from the downloaded archive (the folder which contains `metadata.txt`).
- `--device` is the device (gpu id) to use for training. `-1` means CPU.
- `--checkpoint` is the path of the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory. See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
- `output` is the directory to save results; all results are saved in this directory. The structure of the output directory is shown below.
```text
├── checkpoints    # checkpoint
...
```
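For example, a training run might look like this sketch (the `configs/ljspeech.yaml` path and experiment directory are assumptions; only the filename `ljspeech.yaml` is given above):

```bash
python train.py \
    --config=./configs/ljspeech.yaml \
    --data=./LJSpeech-1.1/ \
    --device=0 \
    ./dv3_exp
```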
## Synthesis
Synthesize audio from text using `synthesis.py`. For help on usage, run `python synthesis.py --help`.
- `--config` is the configuration file to use. You should use the same configuration with which you trained your model.
- `--device` is the device (gpu id) to use. `-1` means CPU.
- `--checkpoint` is the path of the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory. See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
- `text` is the text file to synthesize.
- `output` is the directory to save results. The generated audio files (`*.wav`) and attention plots (`*.png`) are saved in `synthesis/` in the `output` directory.
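A synthesis sketch under the same assumptions (`sentences.txt` is an illustrative text file):

```bash
python synthesis.py \
    --config=./configs/ljspeech.yaml \
    --device=0 \
    --checkpoint=./dv3_exp/checkpoints/step-500000 \
    sentences.txt ./dv3_exp
# wav files and attention plots are written to ./dv3_exp/synthesis/
```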
PaddlePaddle dynamic graph implementation of [WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219).

```text
├── data.py                # dataset and dataloader settings for LJSpeech
├── synthesis.py           # script for speech synthesis
├── train.py               # script for model training
├── utils.py               # helper functions for e.g., model checkpointing
├── waveflow.py            # WaveFlow model high level APIs
└── waveflow_modules.py    # WaveFlow model implementation
```
...
...
```bash
python -u train.py \
    --config=./configs/waveflow_ljspeech.yaml \
    --root=./data/LJSpeech-1.1 \
    --name=${ModelName} --batch_size=4 \
    --use_gpu=true
```
#### Save and Load checkpoints
Our model will save model parameters as checkpoints in `./runs/waveflow/${ModelName}/checkpoint/` every 10000 iterations by default, where `${ModelName}` is the name of one single experiment and can be whatever you like.
The saved checkpoint will have the format of `step-${iteration_number}.pdparams` for model parameters and `step-${iteration_number}.pdopt` for optimizer parameters.
There are three ways to load a checkpoint and resume training (suppose you want to load a 500000-iteration checkpoint):
1. Use `--checkpoint=./runs/waveflow/${ModelName}/checkpoint/step-500000` to load a specific checkpoint; only the base name is needed, without the `.pdparams` or `.pdopt` extension.
2. Use `--iteration=500000` to load the checkpoint saved at that iteration.
3. If you specify neither `--checkpoint` nor `--iteration`, the model automatically loads the latest checkpoint in `./runs/waveflow/${ModelName}/checkpoint`.
#### Train on multiple GPUs
Use `export CUDA_VISIBLE_DEVICES=0,1,2,3` to set the GPUs that you want to use to be visible. Then the `paddle.distributed.launch` module will use these visible GPUs to do data parallel training in multiprocessing mode.
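A sketch of a multi-GPU run, assuming the same flags as the single-GPU command above:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u -m paddle.distributed.launch train.py \
    --config=./configs/waveflow_ljspeech.yaml \
    --root=./data/LJSpeech-1.1 \
    --name=${ModelName} --batch_size=4 \
    --use_gpu=true
```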
## Train
Train the model using `train.py`. For help on usage, run `python train.py --help`.
```text
...
  --iteration ITERATION
                        the iteration of the checkpoint to load from output
                        directory
```
- `--config` is the configuration file to use. The provided configurations can be used directly, or you can change some values in the configuration file and train the model with a different config.
- `--data` is the path of the LJSpeech dataset, the extracted folder from the downloaded archive (the folder which contains `metadata.txt`).
- `--device` is the device (gpu id) to use for training. `-1` means CPU.
- `--checkpoint` is the path of the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory. See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
- `output` is the directory to save results; all results are saved in this directory.
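For example (the config filename and experiment directory are assumptions for illustration; the ClariNet section above assumes a WaveNet with single Gaussian output distribution):

```bash
python train.py \
    --config=./configs/wavenet_single_gaussian.yaml \
    --data=./LJSpeech-1.1/ \
    --device=0 \
    ./wavenet_exp
```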
## Synthesis
Synthesize audio using `synthesis.py`. For help on usage, run `python synthesis.py --help`.
```text
Synthesize valid data from LJspeech with a WaveNet model.

positional arguments:
  output                path to save the synthesized audio

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           path of the LJspeech dataset
  --config CONFIG       path of the config file
  --device DEVICE       device to use
  --checkpoint CHECKPOINT
                        checkpoint to resume from
  --iteration ITERATION
                        the iteration of the checkpoint to load from output
                        directory
```
- `--config` is the configuration file to use. You should use the same configuration with which you trained your model.
- `--data` is the path of the LJspeech dataset. In principle, a dataset is not needed for synthesis, but since the input is mel spectrogram, we need to get mel spectrogram from audio files.
- `--device` is the device (gpu id) to use. `-1` means CPU.
- `--checkpoint` is the checkpoint to load.
- `--iteration` is the iteration of the checkpoint to load from the output directory. See [Saving-&-Loading](#Saving-&-Loading) for details of checkpoint loading.
- `output` is the directory to save synthesized audio. Audio files are saved in `synthesis/` in the `output` directory.
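A synthesis run under the same illustrative assumptions:

```bash
python synthesis.py \
    --config=./configs/wavenet_single_gaussian.yaml \
    --data=./LJSpeech-1.1/ \
    --device=0 \
    --iteration=1000000 \
    ./wavenet_exp
# synthesized audio is written to ./wavenet_exp/synthesis/
```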
`Conv1DTranspose` defaults `groups` to `1`:

```python
class Conv1DTranspose(dg.Conv2DTranspose):
    def __init__(self,
                 # ...
                 padding=0,
                 stride=1,
                 dilation=1,
                 groups=1,
                 param_attr=None,
                 bias_attr=None,
                 use_cudnn=True,
                 # ...
                 ):
```
...
...
`Conv1DCell` likewise defaults `groups` to `1`:

```python
class Conv1DCell(Conv1D):
    def __init__(self,
                 # ...
                 filter_size,
                 dilation=1,
                 causal=False,
                 groups=1,
                 param_attr=None,
                 bias_attr=None,
                 use_cudnn=True,
                 # ...
                 ):
```
...
...
`Conv1DCell.start_sequence` prepares the cell to generate a new sequence and must be called before calling `add_input` multiple times; its docstring warns about the interaction with weight normalization:

```python
class Conv1DCell(Conv1D):
    # ...
    def start_sequence(self):
        """Prepare the Conv1DCell to generate a new sequence. This method
        should be called before calling add_input multiple times.

        WARNING:
            This method accesses `self.weight` directly. If a `Conv1DCell`
            object is wrapped in a `WeightNormWrapper`, make sure this
            method is called only after the `WeightNormWrapper`'s hook is
            called.

            `WeightNormWrapper` removes the wrapped layer's `weight` and
            adds `weight_v` and `weight_g`, which re-compute the wrapped
            layer's weight as `weight = weight_g * weight_v / ||weight_v||`.
            (Re-computing the `weight` is a hook run before calling the
            wrapped layer's `forward` method.)

            Whenever a `WeightNormWrapper`'s `forward` method is called, the
            wrapped layer's weight is updated. But when loading from a
            checkpoint, `weight_v` and `weight_g` are updated while the
            wrapped layer's weight is not, since it is no longer a
            `Parameter`. You should manually call `remove_weight_norm` or
            `hook` to re-compute the wrapped layer's weight before calling
            this method if you don't call `forward` first.

            So when loading a model which uses `Conv1DCell` objects wrapped
            in `WeightNormWrapper`s, remember to call `remove_weight_norm`
            for all `WeightNormWrapper`s before synthesizing. Also, removing
            weight norm speeds up computation.
        """
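A minimal Python sketch of that cleanup, assuming `model` is a dygraph `Layer`; the import path below is an assumption, while `WeightNormWrapper` and `remove_weight_norm` are the names used above:

```python
from parakeet.modules.weight_norm import WeightNormWrapper  # assumed path

def remove_all_weight_norms(model):
    # Walk all sublayers and re-materialize each wrapped layer's weight
    # from (weight_g, weight_v) once, before synthesis.
    for layer in model.sublayers():
        if isinstance(layer, WeightNormWrapper):
            layer.remove_weight_norm()
```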