[TTS]Cantonese TTS checkpoint for e2e, test=tts (#2932)

25530223 · HuangLiangJie · GitHub · 1af9bd47

Showing 2 changed files with 40 additions and 2 deletions

examples/canton/tts3/README.md +38 -0

examples/canton/tts3/run.sh +2 -2

--- a/examples/canton/tts3/README.md
+++ b/examples/canton/tts3/README.md
@@ -75,3 +75,41 @@ Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file tha
 ### For training details, refer to the script in [examples/aishell3/tts3](../../aishell3/tts3).

 ## Pretrained Model
+Pretrained FastSpeech2 model with no silence at the edges of the audio:
+- [fastspeech2_canton_ckpt_1.4.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_canton_ckpt_1.4.0.zip)
+
+The FastSpeech2 checkpoint contains the files listed below.
+
+```text
+fastspeech2_canton_ckpt_1.4.0
+├── default.yaml            # default config used to train fastspeech2
+├── energy_stats.npy        # statistics used to normalize energy when training fastspeech2
+├── phone_id_map.txt        # phone vocabulary file when training fastspeech2
+├── pitch_stats.npy         # statistics used to normalize pitch when training fastspeech2
+├── snapshot_iter_140000.pdz # model parameters and optimizer states
+├── speaker_id_map.txt      # speaker id map file when training a multi-speaker fastspeech2
+└── speech_stats.npy        # statistics used to normalize spectrogram when training fastspeech2
+```
+You can use the following script to synthesize speech for the sentences in `${BIN_DIR}/../sentences_canton.txt` using the pretrained FastSpeech2 and Parallel WaveGAN models.
+```bash
+source path.sh
+
+FLAGS_allocator_strategy=naive_best_fit \
+FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+python3 ${BIN_DIR}/../synthesize_e2e.py \
+  --am=fastspeech2_aishell3 \
+  --am_config=fastspeech2_canton_ckpt_1.4.0/default.yaml \
+  --am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz \
+  --am_stat=fastspeech2_canton_ckpt_1.4.0/speech_stats.npy \
+  --voc=pwgan_aishell3 \
+  --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
+  --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
+  --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
+  --lang=canton \
+  --text=${BIN_DIR}/../sentences_canton.txt \
+  --output_dir=exp/default/test_e2e \
+  --phones_dict=fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt \
+  --speaker_dict=fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt \
+  --spk_id=0 \
+  --inference_dir=exp/default/inference
+```
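
Before running the synthesis command in the README diff above, it can help to confirm that the unzipped checkpoint directory holds every file the README lists. The following is a minimal sketch and not part of PaddleSpeech: the helper name `missing_checkpoint_files` is hypothetical, and the expected file list is copied from the checkpoint tree in the diff.

```python
from pathlib import Path

# File names copied from the checkpoint tree in the README diff above.
EXPECTED_FILES = (
    "default.yaml",
    "energy_stats.npy",
    "phone_id_map.txt",
    "pitch_stats.npy",
    "snapshot_iter_140000.pdz",
    "speaker_id_map.txt",
    "speech_stats.npy",
)


def missing_checkpoint_files(ckpt_dir):
    """Return the expected file names absent from ckpt_dir (hypothetical helper)."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the zip was extracted into the current working directory.
    missing = missing_checkpoint_files("fastspeech2_canton_ckpt_1.4.0")
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("checkpoint directory looks complete")
```

An empty result means all seven files are present. The check only looks at file names, so it does not validate the contents of the `.npy` or `.pdz` files.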
--- a/examples/canton/tts3/run.sh
+++ b/examples/canton/tts3/run.sh
@@ -3,14 +3,14 @@
 set -e
 source path.sh

-gpus=0
+gpus=0,1
 stage=0
 stop_stage=100

 conf_path=conf/default.yaml
 train_output_path=exp/default

-ckpt_name=snapshot_iter_280000.pdz
+ckpt_name=snapshot_iter_140000.pdz

 # with the following command, you can choose the stage range you want to run
 # such as `./run.sh --stage 0 --stop-stage 0`
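
The `ckpt_name` change in `run.sh` brings the script in line with the snapshot that the README passes to `--am_ckpt`. A reviewer could check that agreement mechanically. The sketch below is an illustration only: both helper names are hypothetical, and the two input strings are shortened excerpts of the diff above rather than the full files.

```python
import re
from pathlib import PurePosixPath


def ckpt_name_from_run_sh(script_text):
    """Return the value assigned to ckpt_name in a run.sh-style script, or None."""
    match = re.search(r"^ckpt_name=(\S+)$", script_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def am_ckpt_basename(command_text):
    """Return the file name passed via --am_ckpt=... in a command, or None."""
    match = re.search(r"--am_ckpt=(\S+)", command_text)
    return PurePosixPath(match.group(1)).name if match else None


# Shortened excerpts of run.sh and the README command after this change.
run_sh = "gpus=0,1\nstage=0\nckpt_name=snapshot_iter_140000.pdz\n"
command = "--am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz \\"

# True when run.sh and the README point at the same snapshot file.
print(ckpt_name_from_run_sh(run_sh) == am_ckpt_basename(command))
```

With the previous value `snapshot_iter_280000.pdz` the comparison would be false, which is the mismatch this part of the diff removes.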