Unverified commit cae023eb, authored by Feiyu Chan, committed by GitHub

update details about checkpoint in README (#3673)

* initial commit for deepvoice, add tensorboard to requirements

* fix urls for code we adapted from

* fix makedirs for python2, fix README

* fix open with encoding for python2 compatibility

* fix python2's str(), use encode for unicode, and str() for int

* fix python2 encoding issue, add model architecture and project structure for README

* add model structure, add explanation for hyperparameter priority order.

* fix repetition in README_cn, and reorder that in README

* README update; fix integer division issues for python2 compatibility

* fix data type for input data, specify the int type as np.int64 to be platform agnostic

* fix README for preprocess.py, use io.open instead of open for python2 compatibility.

* update command line options, use new save/load API

* fix IO conflict bug for data parallel training

* only construct summary writer in process 0 to further avoid conflict

* fix typos and update details for checkpoints
Parent: 044f19e7
@@ -88,12 +88,12 @@ python preprocess.py \
Now `${name}` only supports `ljspeech`. Support for other datasets is pending.
Assuming that you use `presets/deepvoice3_ljspeech.json` for LJSpeech and the path of the unzipped dataset is `./data/LJSpeech-1.1`, you can preprocess the data with the following command.
```bash
python preprocess.py \
    --preset=presets/deepvoice3_ljspeech.json \
    ljspeech ./data/LJSpeech-1.1/ ./data/ljspeech
```
When this is done, you will see extracted features in `./data/ljspeech` including:
@@ -123,7 +123,7 @@ You can load saved checkpoint and resume training with `--checkpoint`, if you wa
You can also train parts of the model while freezing other parts, by passing `--train-seq2seq-only` or `--train-postnet-only`. When training only parts of the model, the other parts should be loaded from a saved checkpoint.
To train only the `seq2seq` or `postnet`, you should load from a whole model with `--checkpoint` and keep the same configurations with which the checkpoint was trained. Note that when training only the `postnet`, you should set `use_decoder_state_for_postnet_input=false`, because in this case the postnet takes the ground truth mel-spectrogram as input; the default value for `use_decoder_state_for_postnet_input` is `True`.
Example:
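The following is a minimal sketch of such a command (the checkpoint path and the exact set of required arguments are assumptions, not the repository's canonical invocation):

```bash
# Sketch: load a full checkpoint but update only the postnet.
# ${path_to_saved_checkpoint} is a placeholder for a checkpoint saved earlier.
python train.py --data-root=${data_root} --use-gpu \
    --preset=presets/deepvoice3_ljspeech.json \
    --checkpoint=${path_to_saved_checkpoint} \
    --train-postnet-only \
    --hparams="use_decoder_state_for_postnet_input=false"
```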
@@ -148,7 +148,7 @@ python -m paddle.distributed.launch \
    training_script ...
```
`paddle.distributed.launch` parallelizes training in multiprocessing mode. `--selected_gpus` means the logical ids of the selected GPUs, and `started_port` means the port used by the first worker. Outputs of each process are saved in `--log_dir`. The command below is the same as the one for training on a single GPU, except that you should also pass `--use-data-parallel`.
```bash
export CUDA_VISIBLE_DEVICES=2,3,4,5   # The IDs of visible physical devices
@@ -160,11 +160,11 @@ python -m paddle.distributed.launch \
    --hparams="parameters you may want to override"
```
In the example above, we set only GPUs `2, 3, 4, 5` to be visible. Then `--selected_gpus="0, 1, 2, 3"` means the logical ids of the selected GPUs, which correspond to GPUs `2, 3, 4, 5`.
Model checkpoints (`*.pdparams` for the model and `*.pdopt` for the optimizer) are saved in `${directory_to_save_results}/checkpoints` every 10000 steps by default. Layer-wise averaged attention alignments (`.png`) are saved in `${directory_to_save_results}/checkpoints/alignment_ave`, and alignments for each attention layer are saved in `${directory_to_save_results}/checkpoints/alignment_layer{attention_layer_num}` every 10000 steps for inspection.
Synthesis results of 6 sentences (hardcoded in `eval_model.py`) are saved in `${directory_to_save_results}/checkpoints/eval`, including `step{step_num}_text{text_id}_single_alignment.png` for averaged alignments and `step{step_num}_text{text_id}_single_predicted.wav` for the predicted waveforms.
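For orientation, a hypothetical layout of the results directory (the checkpoint file names are placeholders; only the directory names and the eval file patterns come from the description above):

```
${directory_to_save_results}/checkpoints/
├── <checkpoint_name>.pdparams              # model parameters (placeholder name)
├── <checkpoint_name>.pdopt                 # optimizer state (placeholder name)
├── alignment_ave/                          # layer-wise averaged alignments (.png)
├── alignment_layer{attention_layer_num}/   # per-layer alignments (.png)
└── eval/
    ├── step{step_num}_text{text_id}_single_alignment.png
    └── step{step_num}_text{text_id}_single_predicted.wav
```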
### Monitor with Tensorboard
@@ -199,7 +199,7 @@ generated waveform files and alignment files are saved in `${dst_dir}`.
According to [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654), the position rate is different for different datasets. There are 2 position rates, one for the query and the other for the key, which are referred to as $\omega_1$ and $\omega_2$ in the paper, and the corresponding names in the preset json are `query_position_rate` and `key_position_rate`.
For example, the `query_position_rate` and `key_position_rate` for LJSpeech are `1.0` and `1.385`, respectively. With `query_position_rate` fixed at `1.0`, the `key_position_rate` can be computed with `compute_timestamp_ratio.py`. Run the command below, where `${data_root}` means the path of the preprocessed dataset.
```bash
python compute_timestamp_ratio.py --preset=${preset_json_path} \
```
@@ -217,4 +217,4 @@ Then set the `key_position_rate=1.385` and `query_position_rate=1.0` in the pres
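For reference, a sketch of how these two keys might appear in a preset json (the flat key-value structure is an assumption about the preset format, not an excerpt from the shipped presets):

```json
{
    "query_position_rate": 1.0,
    "key_position_rate": 1.385
}
```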
## Acknowledgement
We thankfully included and adapted some files from r9y9's [deepvoice3_pytorch](https://github.com/r9y9/deepvoice3_pytorch).
@@ -64,7 +64,7 @@ nltk.download("cmudict")
You can override the preset hyperparameters with the `--hparams` option, whose value is a comma-separated list of `${key}=${value}` pairs, for example `--hparams="batch_size=8, nepochs=500"`.
Some hyperparameters only affect training, such as `batch_size` and `checkpoint_interval`, and you may use different values for them at training time. But some hyperparameters are related to data preprocessing, such as `num_mels` and `ref_level_db`; these should be kept consistent between preprocessing and training.
For more details about the hyperparameters, see `hparams.py`, where the hparams are defined. The priority order is: values passed via the command-line option `--hparams` override those in the json preset passed via `--preset`, which in turn override the defaults defined in `hparams.py`.
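For instance, a sketch of combining `--preset` with an `--hparams` override (the script and flags are those used elsewhere in this README; the overridden values are illustrative):

```bash
# Values from --hparams take priority over the json preset,
# which in turn takes priority over the defaults in hparams.py.
python train.py --data-root=${data_root} --use-gpu \
    --preset=presets/deepvoice3_ljspeech.json \
    --hparams="batch_size=8, nepochs=500"
```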
@@ -88,12 +88,12 @@ python preprocess.py \
Currently `${name}` only supports `ljspeech`. Support for more datasets will be added in the future.
Assuming that you use `presets/deepvoice3_ljspeech.json` as the preset for LJSpeech and the unzipped dataset is located at `./data/LJSpeech-1.1`, preprocess the data with the following command.
```bash
python preprocess.py \
    --preset=presets/deepvoice3_ljspeech.json \
    ljspeech ./data/LJSpeech-1.1/ ./data/ljspeech
```
When preprocessing is done, you will see the extracted features in `./data/ljspeech`, including the following files.
@@ -138,7 +138,7 @@ python train.py --data-root=${data-root} --use-gpu \
### Training with Multiple GPUs
This model supports data-parallel training with multiple GPUs, by launching `train.py` with the `paddle.distributed.launch` module.
```bash
python -m paddle.distributed.launch \
```
@@ -163,9 +163,9 @@ python -m paddle.distributed.launch \
In the example above, GPUs `2, 3, 4, 5` are set as visible. Then `--selected_gpus=0,1,2,3` selects the logical ids of the GPUs, which correspond to GPUs `2, 3, 4, 5`, respectively.
Model checkpoints (model parameters saved as `*.pdparams` files and optimizer states saved as `*.pdopt` files) are saved in `${directory_to_save_results}/checkpoints`. Layer-wise averaged attention alignments are saved as `.png` images, by default in `${directory_to_save_results}/checkpoints/alignment_ave`, and alignments for each attention layer are saved in `${directory_to_save_results}/checkpoints/alignment_layer{attention_layer_num}`. By default they are saved every 10000 steps for inspection.
Synthesis results for 6 given sentences are saved in `${directory_to_save_results}/checkpoints/eval`, including layer-wise averaged attention alignments saved as images named `step{step_num}_text{text_id}_single_alignment.png`, and the synthesized audio saved as files named `step{step_num}_text{text_id}_single_predicted.wav`.
### Monitoring Training with Tensorboard
@@ -200,7 +200,7 @@ A text-to-speech synthesis system typically consists of multiple stages, such as
According to [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654), the position rate differs across datasets. There are two position rates, one for the query and one for the key, referred to as $\omega_1$ and $\omega_2$ in the paper; their names in the preset json are `query_position_rate` and `key_position_rate`, respectively.
For example, the `query_position_rate` and `key_position_rate` for the LJSpeech dataset are `1.0` and `1.385`, respectively. With `query_position_rate` fixed at `1.0`, the `key_position_rate` can be computed with `compute_timestamp_ratio.py` using the command below, where `${data_root}` is the path of the preprocessed dataset.
```bash
python compute_timestamp_ratio.py --preset=${preset_json_path} \
```