add scripts for tts code switch

83d93da8 · Hui Zhang · bae905c9 · 83d93da8 · 83d93da8 · 83d93da8
5 changed file
--- a/examples/zh_en_tts/tts3/.gitignore
+++ b/examples/zh_en_tts/tts3/.gitignore
+data
+exp
--- a/examples/zh_en_tts/tts3/README.md
+++ b/examples/zh_en_tts/tts3/README.md
@@ -6,11 +6,11 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2
 ## Dataset
 ### Download and Extract
-Download all datasets and extract it to `~/datasets`:
+Download all datasets and extract it to `./data`:
- The CSMSC dataset is in the directory `~/datasets/BZNSYP`
+- The CSMSC dataset is in the directory `./data/BZNSYP`
- The Ljspeech dataset is in the directory `~/datasets/LJSpeech-1.1`
+- The Ljspeech dataset is in the directory `./data/LJSpeech-1.1`
- The aishell3 dataset is in the directory `~/datasets/data_aishell3`
+- The aishell3 dataset is in the directory `./data/data_aishell3`
- The vctk dataset is in the directory `~/datasets/VCTK-Corpus-0.92`
+- The vctk dataset is in the directory `./data/VCTK-Corpus-0.92`
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for the fastspeech2 training.
@@ -24,16 +24,16 @@ Or train your MFA model reference to [mfa example](https://github.com/PaddlePadd
 ## Get Started
 Assume the paths to the datasets are:
- `~/datasets/BZNSYP`
+- `./data/BZNSYP`
- `~/datasets/LJSpeech-1.1`
+- `./data/LJSpeech-1.1`
- `~/datasets/data_aishell3` 
+- `./data/data_aishell3` 
- `~/datasets/VCTK-Corpus-0.92`
+- `./data/VCTK-Corpus-0.92`
 Assume the path to the MFA results of the datasets are:
- `./mfa_results/baker_alignment_tone`
+- `./data/mfa/baker_alignment_tone`
- `./mfa_results/ljspeech_alignment`
+- `./data/mfa/ljspeech_alignment`
- `./mfa_results/aishell3_alignment_tone`
+- `./data/mfa/aishell3_alignment_tone`
- `./mfa_results/vctk_alignment`
+- `./data/mfa/vctk_alignment`
 Run the command below to
 1. **source path**.
@@ -288,6 +288,9 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
  --am_config=fastspeech2_mix_ckpt_1.2.0/default.yaml \
  --am_ckpt=fastspeech2_mix_ckpt_1.2.0/snapshot_iter_99200.pdz \
  --am_stat=fastspeech2_mix_ckpt_1.2.0/speech_stats.npy \
+  --phones_dict=fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt \
+  --speaker_dict=fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt \
+  --spk_id=174 \
  --voc=pwgan_aishell3 \
  --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
  --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
@@ -295,8 +298,5 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
  --lang=mix \
  --text=${BIN_DIR}/../sentences_mix.txt \
  --output_dir=exp/default/test_e2e \
-  --phones_dict=fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt \
-  --speaker_dict=fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt \
-  --spk_id=174 \
  --inference_dir=exp/default/inference
 ```
--- a/examples/zh_en_tts/tts3/local/mfa_download.sh
+++ b/examples/zh_en_tts/tts3/local/mfa_download.sh
+#!/bin/bash
+exp=exp
+mfa=$exp/mfa
+mkdir -p $mfa
+pushd $mfa
+wget -c https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz &
+wget -c https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz &
+wget -c https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz &
+wget -c https://paddlespeech.bj.bcebos.com/MFA/VCTK-Corpus-0.92/vctk_alignment.tar.gz &
+wait
+popd
--- a/examples/zh_en_tts/tts3/local/model_download.sh
+++ b/examples/zh_en_tts/tts3/local/model_download.sh
+#!/bin/bash
+exp=exp
+pretrain=$exp/pretrain
+mkdir -p $pretrain
+pushd $pretrain
+wget -c https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_ckpt_1.2.0.zip &
+wait
+popd
--- a/examples/zh_en_tts/tts3/run.sh
+++ b/examples/zh_en_tts/tts3/run.sh
@@ -7,8 +7,8 @@ gpus=0,1
 stage=0
 stop_stage=100
-datasets_root_dir=~/datasets
+datasets_root_dir=./data
-mfa_root_dir=./mfa_results/
+mfa_root_dir=./data/mfa
 conf_path=conf/default.yaml
 train_output_path=exp/default
 ckpt_name=snapshot_iter_99200.pdz