diff --git a/demos/README.md b/demos/README.md
index 2a306df6b1e1b3648b7306506adc01d5e0ffcdf2..72b70b237944516ad53048212ecdb3504342e359 100644
--- a/demos/README.md
+++ b/demos/README.md
@@ -12,6 +12,7 @@ This directory contains many speech applications in multiple scenarios.
 * speech recognition - recognize text of an audio file
 * speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * streaming asr server - receive audio stream from websocket, and recognize to transcript.
+* streaming tts server - receive text from http or websocket, and stream back the synthesized audio data.
 * speech translation - end to end speech translation
 * story talker - book reader based on OCR and TTS
 * style_fs2 - multi style control for FastSpeech2 model
diff --git a/demos/README_cn.md b/demos/README_cn.md
index 471342127f4e6e49522714d5926f5c185fbdb92b..04fc1fa7d4a4fa5c4df6229dc92b4df9ad6ceda5 100644
--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -10,8 +10,9 @@
 * 元宇宙 - 基于语音合成的 2D 增强现实。
 * 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。
 * 语音识别 - 识别一段音频中包含的语音文字。
-* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等
-* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字
+* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等。
+* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字。
+* 流式语音合成服务 - 根据待合成文本流式生成合成音频数据流。
 * 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。
 * 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
 * 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。
diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md
index 25b28ede8c9ed7a891527bb934b072c1c3692476..a7390fd68949f2e74de1b82658d9e85d79513b50 100644
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Aishell
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Aishell dataset](http://www.openslr.org/resources/33)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Aishell dataset](http://www.openslr.org/resources/33).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
diff --git a/examples/callcenter/README.md b/examples/callcenter/README.md
index 1c715cb694dfde06bd9e7d53fafb401e49b8025b..6d5211461b9820d275a247043665ba53814b46da 100644
--- a/examples/callcenter/README.md
+++ b/examples/callcenter/README.md
@@ -1,20 +1,3 @@
 # Callcenter 8k sample rate
 
-Data distribution:
-
-```
-676048 utts
-491.4004722221223 h
-4357792.0 text
-2.4633630739178654 text/sec
-2.6167397877068495 sec/utt
-```
-
-train/dev/test partition:
-
-```
- 33802 manifest.dev
- 67606 manifest.test
- 574640 manifest.train
- 676048 total
-```
+This recipe only provides the model and data configs for 8k ASR; users need to prepare the data and generate the manifest metafiles themselves. See the Aishell or Librispeech recipes for reference.
diff --git a/examples/csmsc/vits/README.md b/examples/csmsc/vits/README.md
index 5ca57e3a3603eb53fe4bf7c16fc1ba51bbc14147..8f223e07b017e9287aa7e4be823ae9e20c50cd03 100644
--- a/examples/csmsc/vits/README.md
+++ b/examples/csmsc/vits/README.md
@@ -154,7 +154,7 @@ VITS checkpoint contains files listed below.
 vits_csmsc_ckpt_1.1.0
 ├── default.yaml # default config used to train vitx
 ├── phone_id_map.txt # phone vocabulary file when training vits
-└── snapshot_iter_350000.pdz # model parameters and optimizer states
+└── snapshot_iter_333000.pdz # model parameters and optimizer states
 ```
 
 ps: This ckpt is not good enough, a better result is training
@@ -169,7 +169,7 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/synthesize_e2e.py \
   --config=vits_csmsc_ckpt_1.1.0/default.yaml \
-  --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_350000.pdz \
+  --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_333000.pdz \
   --phones_dict=vits_csmsc_ckpt_1.1.0/phone_id_map.txt \
   --output_dir=exp/default/test_e2e \
   --text=${BIN_DIR}/../sentences.txt \
diff --git a/examples/csmsc/vits/conf/default.yaml b/examples/csmsc/vits/conf/default.yaml
index 32f995cc9489359bc91bb951442c5fde78286724..a2aef998d2843bc4330339120304bf8741601bf3 100644
--- a/examples/csmsc/vits/conf/default.yaml
+++ b/examples/csmsc/vits/conf/default.yaml
@@ -179,7 +179,7 @@ generator_first: False # whether to start updating generator first
 # OTHER TRAINING SETTING #
 ##########################################################
 num_snapshots: 10 # max number of snapshots to keep while training
-train_max_steps: 250000 # Number of training steps. == total_iters / ngpus, total_iters = 1000000
+train_max_steps: 350000 # Number of training steps. == total_iters / ngpus, total_iters = 1000000
 save_interval_steps: 1000 # Interval steps to save checkpoint.
 eval_interval_steps: 250 # Interval steps to evaluate the network.
 seed: 777 # random seed number
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index ae252a58b58b2bf2602b73c8477e38fa5670a831..ca0081444c7566ffdb47711fa4d4539e70ab9017 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Librispeech
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Librispeech dataset](http://www.openslr.org/resources/12).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md
index 5bc7185a9b0cf4a79dad60ecaeefb1e65b3f8bb1..26978520da25c4542b4d93f53c9945f790bff06f 100644
--- a/examples/librispeech/asr2/README.md
+++ b/examples/librispeech/asr2/README.md
@@ -1,6 +1,6 @@
 # Transformer/Conformer ASR with Librispeech ASR2
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Librispeech dataset](http://www.openslr.org/resources/12), and it uses some functions from Kaldi.
 
 To use this example, you need to install Kaldi first.
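The `streaming tts server` entry added to `demos/README.md` above describes a text-in, audio-stream-out protocol. As a rough orientation, here is a minimal websocket client sketch; the endpoint URI, frame types, and end-of-stream convention are illustrative assumptions, not the server's documented protocol.

```python
# Minimal streaming-TTS websocket client sketch. Assumptions (not taken from
# this repo): the endpoint URI, that the server accepts one text frame and
# streams binary audio frames back, and that a non-binary frame ends the stream.
import asyncio

import websockets


async def stream_tts(text: str,
                     uri: str = "ws://127.0.0.1:8092/paddlespeech/tts/streaming") -> bytes:
    audio = bytearray()
    async with websockets.connect(uri) as ws:
        await ws.send(text)          # send the sentence to synthesize
        async for frame in ws:       # consume audio chunks as they are produced
            if isinstance(frame, bytes):
                audio.extend(frame)  # accumulate raw audio bytes
            else:
                break                # assume a text frame marks end-of-stream
    return bytes(audio)


if __name__ == "__main__":
    pcm = asyncio.run(stream_tts("Hello, streaming TTS."))
    print(f"received {len(pcm)} bytes of audio")
```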
diff --git a/examples/tiny/asr1/README.md b/examples/tiny/asr1/README.md
index 6a4999aa649f336a51a927f87965458cd7bdaa96..cfa266704519194fe9062752f4058dcc705988fb 100644
--- a/examples/tiny/asr1/README.md
+++ b/examples/tiny/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Tiny
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model Tiny dataset(a part of [[Librispeech dataset](http://www.openslr.org/resources/12)](http://www.openslr.org/resources/33))
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the Tiny dataset (a part of the [Librispeech dataset](http://www.openslr.org/resources/12)).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
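The recipe intros above all link to the [u2](https://arxiv.org/pdf/2012.05481.pdf) paper. For readers following those links: a U2 model attaches both a CTC head and an attention-based decoder (AED) to a shared Transformer or Conformer encoder, and the paper trains the two branches jointly with an interpolated objective, where $\lambda$ weights the CTC branch:

$$\mathcal{L}(\mathbf{x}, \mathbf{y}) = \lambda\,\mathcal{L}_{\mathrm{CTC}}(\mathbf{x}, \mathbf{y}) + (1 - \lambda)\,\mathcal{L}_{\mathrm{AED}}(\mathbf{x}, \mathbf{y})$$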