Commit c7176a87 authored by Kexin Zhao

note on conv queue

Parent d16abc49
...@@ -22,11 +22,13 @@ PaddlePaddle dynamic graph implementation of [WaveFlow: A Compact Flow-based Mod
There are many hyperparameters to be tuned depending on the specification of model and dataset you are working on.
We provide `wavenet_ljspeech.yaml` as a hyperparameter set that works well on the LJSpeech dataset.
Note that we use a [convolutional queue](https://arxiv.org/abs/1611.09482) during audio synthesis to cache the intermediate hidden states, which speeds up the autoregressive inference over the height dimension. The current implementation only supports a height dimension of 8 or 16, i.e., where there is no dilation on the height dimension. Therefore, you can only set the value of the `n_group` key in the yaml config file to either 8 or 16.
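The caching idea behind the convolutional queue can be sketched as follows. This is an illustrative toy in NumPy, not the repository's implementation; the `ConvQueue` class and its method names are assumptions made for the example. Each autoregressive step pushes the newest hidden state into a fixed-length FIFO buffer, so the causal convolution reads its full input window from the cache instead of recomputing earlier states:

```python
import numpy as np

class ConvQueue:
    """Toy FIFO cache of the last (kernel_size - 1) hidden states,
    in the spirit of the fast-generation queues of Paine et al.
    (https://arxiv.org/abs/1611.09482)."""

    def __init__(self, kernel_size, channels):
        # Older states sit at the front; start zero-initialized.
        self.buffer = np.zeros((kernel_size - 1, channels))

    def push(self, state):
        # Drop the oldest cached state and append the newest one.
        self.buffer = np.concatenate([self.buffer[1:], state[None, :]])

    def window(self, state):
        # Input window for the causal conv at this step:
        # cached past states followed by the current state.
        return np.concatenate([self.buffer, state[None, :]])
```

With a dilation-free stack over the height dimension, one such queue per layer is enough, which is why the implementation restricts `n_group` to 8 or 16.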
Also note that `train.py`, `synthesis.py`, and `benchmark.py` all accept a `--config` parameter. To ensure consistency, you should use the same config yaml file for training, synthesis, and benchmarking. You can also overwrite these preset hyperparameters from the command line by adding parameters after `--config`.
For example, `--config=${yaml} --batch_size=8` can overwrite the corresponding hyperparameters in the `${yaml}` config file. For more details about these hyperparameters, check `utils.add_config_options_to_parser`.
Additionally, you need to specify some extra parameters for `train.py`, `synthesis.py`, and `benchmark.py`; the details can be found in `train.add_options_to_parser`, `synthesis.add_options_to_parser`, and `benchmark.add_options_to_parser`, respectively.
### Dataset
......
...@@ -391,6 +391,12 @@ class WaveFlowModule(dg.Layer):
These hidden states, along with an initial random Gaussian latent variable,
are passed to a stack of Flow modules to obtain the audio output.
Note that we use a convolutional queue (https://arxiv.org/abs/1611.09482)
to cache the intermediate hidden states, which speeds up the
autoregressive inference over the height dimension. The current
implementation only supports a height dimension (self.n_group) of
8 or 16, i.e., where there is no dilation on the height dimension.
Args:
    mel (obj): mel spectrograms.
    sigma (float, optional): standard deviation of the Gaussian latent
......