note on conv queue

c7176a87 · Kexin Zhao · d16abc49

Showing 2 changed files with 10 additions and 2 deletions

examples/waveflow/README.md +4 -2

parakeet/models/waveflow/waveflow_modules.py +6 -0

--- a/examples/waveflow/README.md
+++ b/examples/waveflow/README.md
@@ -22,11 +22,13 @@ PaddlePaddle dynamic graph implementation of [WaveFlow: A Compact Flow-based Mod

 There are many hyperparameters to be tuned depending on the specification of model and dataset you are working on.
 We provide `wavenet_ljspeech.yaml` as a hyperparameter set that works well on the LJSpeech dataset.
+Note that we use a [convolutional queue](https://arxiv.org/abs/1611.09482) during audio synthesis to cache the intermediate hidden states, which speeds up autoregressive inference over the height dimension. The current implementation only supports a height dimension of 8 or 16, i.e., the settings in which there is no dilation on the height dimension. Therefore, the `n_group` key in the yaml config file can only be set to 8 or 16.

-Note that `train.py`, `synthesis.py`, and `benchmark.py` all accept a `--config` parameter. To ensure consistency, you should use the same config yaml file for both training, synthesizing and benchmarking. You can also overwrite these preset hyperparameters with command line by updating parameters after `--config`.
+
+Also note that `train.py`, `synthesis.py`, and `benchmark.py` all accept a `--config` parameter. To ensure consistency, you should use the same config yaml file for training, synthesis, and benchmarking. You can also override these preset hyperparameters on the command line by passing parameters after `--config`.
 For example `--config=${yaml} --batch_size=8` can overwrite the corresponding hyperparameters in the `${yaml}` config file. For more details about these hyperparameters, check `utils.add_config_options_to_parser`.

-Note that you also need to specify some additional parameters for `train.py`, `synthesis.py`, and `benchmark.py`, and the details can be found in `train.add_options_to_parser`, `synthesis.add_options_to_parser`, and `benchmark.add_options_to_parser`, respectively.
+Additionally, you need to specify some other parameters for `train.py`, `synthesis.py`, and `benchmark.py`; the details can be found in `train.add_options_to_parser`, `synthesis.add_options_to_parser`, and `benchmark.add_options_to_parser`, respectively.

 ### Dataset

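The README change above describes two things: presets that can be overridden after `--config`, and the restriction of `n_group` to 8 or 16. The following is only a minimal sketch of that behaviour, not the project's actual parser. The `presets` dict stands in for values loaded from the yaml file, and `build_parser` and `check_n_group` are hypothetical helpers introduced for illustration.

```python
import argparse

# Hypothetical presets standing in for values loaded from the yaml config.
presets = {"batch_size": 4, "n_group": 16, "learning_rate": 0.0002}


def build_parser(presets):
    """Expose every preset as an optional flag that defaults to the preset."""
    parser = argparse.ArgumentParser()
    for key, value in presets.items():
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return parser


def check_n_group(n_group):
    """Reject height dimensions that the note above says are unsupported."""
    if n_group not in (8, 16):
        raise ValueError(f"n_group must be 8 or 16, got {n_group}")
    return n_group


# Flags given on the command line win over the presets.
args = build_parser(presets).parse_args(["--batch_size=8"])
check_n_group(args.n_group)
```

Here `args.batch_size` comes from the command line, while `args.n_group` and `args.learning_rate` keep their preset values.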

--- a/parakeet/models/waveflow/waveflow_modules.py
+++ b/parakeet/models/waveflow/waveflow_modules.py
@@ -391,6 +391,12 @@ class WaveFlowModule(dg.Layer):
        These hidden states along with initial random gaussian latent variable
        are passed to a stack of Flow modules to obtain the audio output.

+        Note that we use a convolutional queue (https://arxiv.org/abs/1611.09482)
+        to cache the intermediate hidden states, which speeds up the
+        autoregressive inference over the height dimension. The current
+        implementation only supports a height dimension (self.n_group) of
+        8 or 16, i.e., where there is no dilation on the height dimension.
+
        Args:
            mel (obj): mel spectrograms.
            sigma (float, optional): standard deviation of the gaussian latent
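
The docstring above refers to caching hidden states in a convolutional queue. As a rough illustration of the idea only, the sketch below uses a single-channel causal 1-D convolution with no dilation and checks that step-by-step inference with a queue reproduces the full convolution. `ConvQueue` and `causal_conv_full` are hypothetical names; the real module works on multi-channel Paddle tensors and is not shown here.

```python
from collections import deque


def causal_conv_full(x, weights):
    """Reference: causal 1-D convolution over the whole sequence (no dilation)."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(x)  # left-pad so output[h] sees only x[:h + 1]
    return [
        sum(w * padded[h + j] for j, w in enumerate(weights))
        for h in range(len(x))
    ]


class ConvQueue:
    """Hypothetical helper: keeps the last k inputs for step-by-step inference."""

    def __init__(self, weights):
        self.weights = list(weights)
        k = len(self.weights)
        # Zeros in the initial queue play the role of the left padding.
        self.queue = deque([0.0] * k, maxlen=k)

    def step(self, x_new):
        # Appending to a full deque drops the oldest cached value.
        self.queue.append(x_new)
        return sum(w * v for w, v in zip(self.weights, self.queue))


weights = [0.5, -1.0, 2.0]
signal = [1.0, 2.0, 3.0, 4.0, 5.0]

queue = ConvQueue(weights)
incremental = [queue.step(v) for v in signal]
full = causal_conv_full(signal, weights)
```

Each call to `step` touches only the `k` cached values, so generating one new row does not require recomputing the convolution over all earlier rows.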