diff --git a/README_cn.md b/README_cn.md
index 27cd6089064b6d526cdee2b19a036a324c6686a2..8c018a08e3d99a3e33d9750c3eacebb41cc17d80 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -691,7 +691,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- 非常感谢 [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) 基于 PaddleSpeech 的 TTS 模型搭建带 GUI 操作界面的配音工具。
- 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。
-- 非常感谢 [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) 基于 PaddleSpeech 的 ASR与TTS 设计的可听、说对话机器人
+- 非常感谢 [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) 基于 PaddleSpeech 的 ASR 与 TTS 设计的可听、说对话机器人。
此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。
diff --git a/docs/source/install_cn.md b/docs/source/install_cn.md
index 345e79bb5dd8fd4d089c873cd42f7de8444088ba..52eb5dadac443a6826ef1f696c624dcc956cbf3d 100644
--- a/docs/source/install_cn.md
+++ b/docs/source/install_cn.md
@@ -116,7 +116,7 @@ conda install -y -c gcc_linux-64=8.4.0 gxx_linux-64=8.4.0
python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple
```
### 安装 PaddleSpeech
-最后安装 `paddlespeech`,这样你就可以使用 `paddlespeech`中已有的 examples:
+最后安装 `paddlespeech`,这样你就可以使用 `paddlespeech` 中已有的 examples:
```bash
# 部分用户系统由于默认源的问题,安装中会出现kaldiio安转出错的问题,建议首先安装pytest-runner:
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
@@ -137,7 +137,7 @@ Docker 是一种开源工具,用于在和系统本身环境相隔离的环境
在 [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddlespeech) 中获取这些镜像及相应的使用指南,包括 CPU、GPU、ROCm 版本。
-如果您对自动化制作docker镜像感兴趣,或有自定义需求,请访问 [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) 做进一步了解。
+如果您对自动化制作 docker 镜像感兴趣,或有自定义需求,请访问 [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) 做进一步了解。
完成这些以后,你就可以在 docker 容器中执行训练、推理和超参 fine-tune。
### 选择2: 使用有 root 权限的 Ubuntu
- 使用apt安装 `build-essential`
@@ -173,7 +173,7 @@ conda install -y -c conda-forge sox libsndfile swig bzip2 libflac bc
python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple
```
### 用开发者模式安装 PaddleSpeech
-部分用户系统由于默认源的问题,安装中会出现kaldiio安转出错的问题,建议首先安装pytest-runner:
+部分用户系统由于默认源的问题,安装中会出现 kaldiio 安装出错的问题,建议首先安装 pytest-runner:
```bash
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```
diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index 80d6b44b476191b3d4c81ad5fa65f92d02c13d0c..551a86ef0bd013120597be512f6a78242314f59f 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -1,4 +1,3 @@
-
# Released Models
## Speech-to-Text Models
@@ -34,32 +33,33 @@ Language Model | Training Data | Token-based | Size | Descriptions
## Text-to-Speech Models
### Acoustic Models
-Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size (static)
+Model Type | Dataset | Example Link | Pretrained Models | Static/ONNX Models | Size (static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.2.0.zip)|||
Tacotron2|CSMSC|[tacotron2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts0)|[tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)|[tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)|103MB|
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)|||
-SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2)|[speedyspeech_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_ckpt_0.2.0.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)|12MB|
-FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)|157MB|
+SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2)|[speedyspeech_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_ckpt_0.2.0.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip) [speedyspeech_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_onnx_0.2.0.zip)|13MB|
+FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip) [fastspeech2_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_onnx_0.2.0.zip)|157MB|
FastSpeech2-Conformer| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_conformer_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip)|||
-FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|||
-FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|||
-FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|||
+FastSpeech2-CNNDecoder| CSMSC| [fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)| [fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip) | [fastspeech2_cnndecoder_csmsc_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_static_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip) | 84MB|
+FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|[fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip) [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)|147MB|
+FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|[fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip) [fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)|145MB|
+FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|[fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip) [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip) | 145MB|
### Vocoders
-Model Type | Dataset| Example Link | Pretrained Models| Static Models|Size (static)
+Model Type | Dataset | Example Link | Pretrained Models | Static/ONNX Models | Size (static)
:-----:| :-----:| :-----: | :-----:| :-----:| :-----:
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip)|||
-Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)|5.1MB|
-Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)|||
-Parallel WaveGAN| AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)|||
-Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.5.zip)|||
-|Multi Band MelGAN | CSMSC |[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3) | [mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)
-[mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)|[mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip) |8.2MB|
+Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip) [pwgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_csmsc_onnx_0.2.0.zip)|4.8MB|
+Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)|[pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip) [pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)|4.8MB|
+Parallel WaveGAN| AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)| [pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip) [pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip)|4.8MB|
+Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.5.zip)|[pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip) [pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)|4.8MB|
+|Multi Band MelGAN | CSMSC |[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3) | [mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)
+[mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)|[mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip) [mb_melgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip)|7.6MB|
Style MelGAN | CSMSC |[Style MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc4)|[style_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/style_melgan/style_melgan_csmsc_ckpt_0.1.1.zip)| | |
-HiFiGAN | CSMSC |[HiFiGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5)|[hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)|[hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip)|50MB|
-HiFiGAN | LJSpeech |[HiFiGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc5)|[hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)|||
-HiFiGAN | AISHELL-3 |[HiFiGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc5)|[hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)|||
-HiFiGAN | VCTK |[HiFiGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc5)|[hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)|||
+HiFiGAN | CSMSC |[HiFiGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5)|[hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)|[hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip) [hifigan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_onnx_0.2.0.zip)|46MB|
+HiFiGAN | LJSpeech |[HiFiGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc5)|[hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)|[hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip) [hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip) |49MB|
+HiFiGAN | AISHELL-3 |[HiFiGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc5)|[hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)|[hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip) [hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip)|46MB|
+HiFiGAN | VCTK |[HiFiGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc5)|[hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)|[hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip) [hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)|46MB|
WaveRNN | CSMSC |[WaveRNN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc6)|[wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip)|[wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip)|18MB|
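The tables above now list ONNX exports alongside the static Paddle graphs. As a quick sanity check, any of the listed `*_onnx_*.zip` vocoders can be loaded with onnxruntime; the sketch below is illustrative only — the file name inside the zip and the (frames, 80) mel layout are assumptions, not part of this patch.

```python
import numpy as np
import onnxruntime as ort

# Load a released ONNX vocoder (the file name inside the zip is assumed here).
sess = ort.InferenceSession("hifigan_csmsc.onnx", providers=["CPUExecutionProvider"])

# Discover the expected input instead of hard-coding its name.
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape, inp.type)

# Feed a dummy log-mel; 80 mel bins is the usual CSMSC setup, but verify
# against the shape printed above.
mel = np.random.randn(100, 80).astype(np.float32)
wav = sess.run(None, {inp.name: mel})[0]
print("wav:", wav.shape)
```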
diff --git a/examples/aishell3/tts3/README.md b/examples/aishell3/tts3/README.md
index 31c99898ccc7839b835a0fbd7daec550a36de340..21bad51ecfbbce58acc1ef14c82dadbc6b09634c 100644
--- a/examples/aishell3/tts3/README.md
+++ b/examples/aishell3/tts3/README.md
@@ -220,6 +220,12 @@ Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
- [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution)
+The static model can be downloaded here:
+- [fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)
+
FastSpeech2 checkpoint contains files listed below.
```text
diff --git a/examples/aishell3/tts3/local/inference.sh b/examples/aishell3/tts3/local/inference.sh
index 3b03b53ce1c49d58dd6d89fbd1d09c115e231925..dc05ec59218533631fe9fa16f5ec4ca1c4f4b3db 100755
--- a/examples/aishell3/tts3/local/inference.sh
+++ b/examples/aishell3/tts3/local/inference.sh
@@ -17,3 +17,14 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
--spk_id=0
fi
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../inference.py \
+ --inference_dir=${train_output_path}/inference \
+ --am=fastspeech2_aishell3 \
+ --voc=hifigan_aishell3 \
+ --text=${BIN_DIR}/../sentences.txt \
+ --output_dir=${train_output_path}/pd_infer_out \
+ --phones_dict=dump/phone_id_map.txt \
+ --speaker_dict=dump/speaker_id_map.txt \
+ --spk_id=0
+fi
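The new stage above runs `inference.py` against the exported static graph through Paddle's inference API. A minimal standalone sketch of that path follows; the `.pdmodel`/`.pdiparams` file names and the input order (phone ids, then speaker id) are assumptions based on the feeds used elsewhere in this patch.

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Load the static graph exported for the multi-speaker acoustic model
# (file names are placeholders for the contents of the *_static_*.zip).
config = Config("fastspeech2_aishell3.pdmodel", "fastspeech2_aishell3.pdiparams")
config.disable_gpu()  # run on CPU
predictor = create_predictor(config)

# Dummy phone ids plus a speaker id; the input order is an assumption.
feeds = [np.random.randint(1, 266, size=(30,), dtype="int64"),
         np.array([0], dtype="int64")]
for name, value in zip(predictor.get_input_names(), feeds):
    predictor.get_input_handle(name).copy_from_cpu(value)

predictor.run()
mel = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(mel.shape)
```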
diff --git a/examples/aishell3/tts3/local/ort_predict.sh b/examples/aishell3/tts3/local/ort_predict.sh
new file mode 100755
index 0000000000000000000000000000000000000000..24e66f689603519e894fecceae22c644000719f9
--- /dev/null
+++ b/examples/aishell3/tts3/local/ort_predict.sh
@@ -0,0 +1,32 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# e2e, synthesize from text
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_aishell3 \
+ --voc=pwgan_aishell3 \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --spk_id=0
+
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_aishell3 \
+ --voc=hifigan_aishell3 \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --spk_id=0
+fi
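The `--device=cpu --cpu_threads=2` flags passed above correspond, in onnxruntime terms, to the CPU execution provider with pinned intra-op parallelism. A plausible equivalent of the session setup (the actual code lives in `ort_predict_e2e.py`; the model paths here are placeholders):

```python
import onnxruntime as ort

# Pin intra-op parallelism, the knob a --cpu_threads flag typically maps to.
so = ort.SessionOptions()
so.intra_op_num_threads = 2

am_sess = ort.InferenceSession("fastspeech2_aishell3.onnx",
                               sess_options=so,
                               providers=["CPUExecutionProvider"])
voc_sess = ort.InferenceSession("pwgan_aishell3.onnx",
                                sess_options=so,
                                providers=["CPUExecutionProvider"])
```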
diff --git a/examples/aishell3/tts3/local/paddle2onnx.sh b/examples/aishell3/tts3/local/paddle2onnx.sh
new file mode 120000
index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e
--- /dev/null
+++ b/examples/aishell3/tts3/local/paddle2onnx.sh
@@ -0,0 +1 @@
+../../../csmsc/tts3/local/paddle2onnx.sh
\ No newline at end of file
diff --git a/examples/aishell3/tts3/run.sh b/examples/aishell3/tts3/run.sh
index b375f215984e92ff8acd7ad5f91da67e16863716..868087a01d47d9673231d430cf860a870e1f4f66 100755
--- a/examples/aishell3/tts3/run.sh
+++ b/examples/aishell3/tts3/run.sh
@@ -27,11 +27,34 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # synthesize, vocoder is pwgan
+ # synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ # inference with static model, vocoder is pwgan by default
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # install paddle2onnx
+ version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
+ fi
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_aishell3
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_aishell3
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_aishell3
+
+fi
+
+# inference with onnxruntime, use fastspeech2 + pwgan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+ ./local/ort_predict.sh ${train_output_path}
+fi
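Stage 5 above pins `paddle2onnx==0.9.8` by parsing `pip list` with grep/awk. For reference, the same check can be expressed programmatically with the standard library; this is a sketch, not part of the patch:

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version

def ensure(pkg: str, wanted: str) -> None:
    """Install pkg==wanted unless that exact version is already present."""
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = None
    if have != wanted:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", f"{pkg}=={wanted}"])

ensure("paddle2onnx", "0.9.8")
```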
diff --git a/examples/aishell3/voc1/README.md b/examples/aishell3/voc1/README.md
index a3daf3dfd6889e60f9d71be396bc9b3d2404fe54..bc25f43cf78fdfeec1152bdcb436422c0d2b8fc6 100644
--- a/examples/aishell3/voc1/README.md
+++ b/examples/aishell3/voc1/README.md
@@ -133,6 +133,12 @@ optional arguments:
Pretrained models can be downloaded here:
- [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)
+The static model can be downloaded here:
+- [pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip)
+
Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss:| eval/spectral_convergence_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
default| 1(gpu) x 400000|1.968762|0.759008|0.218524
diff --git a/examples/aishell3/voc5/README.md b/examples/aishell3/voc5/README.md
index c3e3197d63c0e871dabd61e67992335fc8d5d1f9..7f99a52e3f1cb79d98489ab56eb60f757afd6bc9 100644
--- a/examples/aishell3/voc5/README.md
+++ b/examples/aishell3/voc5/README.md
@@ -116,6 +116,11 @@ optional arguments:
The pretrained model can be downloaded here:
- [hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)
+The static model can be downloaded here:
+- [hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip)
+
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/csmsc/tts2/local/ort_predict.sh b/examples/csmsc/tts2/local/ort_predict.sh
index 46b0409b81946a18cf0969fbca0d8cf3f43c7abc..8ca4c0e9bd06ea5ff836ff0ceb61bae28554ca07 100755
--- a/examples/csmsc/tts2/local/ort_predict.sh
+++ b/examples/csmsc/tts2/local/ort_predict.sh
@@ -3,22 +3,34 @@ train_output_path=$1
stage=0
stop_stage=0
-# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now!
-
-# synthesize from metadata
+# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
- python3 ${BIN_DIR}/../ort_predict.py \
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
--inference_dir=${train_output_path}/inference_onnx \
--am=speedyspeech_csmsc \
- --voc=hifigan_csmsc \
- --test_metadata=dump/test/norm/metadata.jsonl \
- --output_dir=${train_output_path}/onnx_infer_out \
+ --voc=pwgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --tones_dict=dump/tone_id_map.txt \
--device=cpu \
--cpu_threads=2
fi
-# e2e, synthesize from text
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=speedyspeech_csmsc \
+ --voc=mb_melgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --tones_dict=dump/tone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
python3 ${BIN_DIR}/../ort_predict_e2e.py \
--inference_dir=${train_output_path}/inference_onnx \
--am=speedyspeech_csmsc \
@@ -30,3 +42,15 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
--device=cpu \
--cpu_threads=2
fi
+
+# synthesize from metadata
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+ python3 ${BIN_DIR}/../ort_predict.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=speedyspeech_csmsc \
+ --voc=hifigan_csmsc \
+ --test_metadata=dump/test/norm/metadata.jsonl \
+ --output_dir=${train_output_path}/onnx_infer_out \
+ --device=cpu \
+ --cpu_threads=2
+fi
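Unlike FastSpeech2, the SpeedySpeech export consumes separate phone and tone id streams, matching the `{'phones': ..., 'tones': ...}` feed built in `ort_predict_e2e.py`. A dummy-input sketch (the model path is a placeholder; the id ranges follow the warmup code in this patch):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("speedyspeech_csmsc.onnx",
                            providers=["CPUExecutionProvider"])

# SpeedySpeech takes phone ids and tone ids of the same length.
T = 30
feed = {
    "phones": np.random.randint(1, 92, size=(T,), dtype="int64"),
    "tones": np.random.randint(1, 5, size=(T,), dtype="int64"),
}
mel = sess.run(None, feed)[0]
print(mel.shape)
```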
diff --git a/examples/csmsc/tts2/run.sh b/examples/csmsc/tts2/run.sh
index 1d67a5c9101ee88d6d181837d601bed287c4e62f..e51913496f0a41a298eb02d2a6ef0f3a180143fa 100755
--- a/examples/csmsc/tts2/run.sh
+++ b/examples/csmsc/tts2/run.sh
@@ -27,12 +27,12 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # synthesize, vocoder is pwgan
+ # synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
@@ -46,19 +46,17 @@ fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
# install paddle2onnx
version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
- if [[ -z "$version" || ${version} != '0.9.5' ]]; then
- pip install paddle2onnx==0.9.5
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
fi
./local/paddle2onnx.sh ${train_output_path} inference inference_onnx speedyspeech_csmsc
- ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
fi
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
- # install onnxruntime
- version=$(echo `pip list |grep "onnxruntime"` |awk -F" " '{print $2}')
- if [[ -z "$version" || ${version} != '1.10.0' ]]; then
- pip install onnxruntime==1.10.0
- fi
./local/ort_predict.sh ${train_output_path}
fi
diff --git a/examples/csmsc/tts3/local/ort_predict.sh b/examples/csmsc/tts3/local/ort_predict.sh
index 96350c06c846f6fd4d49e942b50bdaa35f7fc131..e16c7bd0533436c53a076e3c5a0e434481926fda 100755
--- a/examples/csmsc/tts3/local/ort_predict.sh
+++ b/examples/csmsc/tts3/local/ort_predict.sh
@@ -3,22 +3,32 @@ train_output_path=$1
stage=0
stop_stage=0
-# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now!
-
-# synthesize from metadata
+# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
- python3 ${BIN_DIR}/../ort_predict.py \
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
--inference_dir=${train_output_path}/inference_onnx \
--am=fastspeech2_csmsc \
- --voc=hifigan_csmsc \
- --test_metadata=dump/test/norm/metadata.jsonl \
- --output_dir=${train_output_path}/onnx_infer_out \
+ --voc=pwgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
--device=cpu \
--cpu_threads=2
fi
-# e2e, synthesize from text
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_csmsc \
+ --voc=mb_melgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
python3 ${BIN_DIR}/../ort_predict_e2e.py \
--inference_dir=${train_output_path}/inference_onnx \
--am=fastspeech2_csmsc \
@@ -29,3 +39,15 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
--device=cpu \
--cpu_threads=2
fi
+
+# synthesize from metadata, take hifigan as an example
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+ python3 ${BIN_DIR}/../ort_predict.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_csmsc \
+ --voc=hifigan_csmsc \
+ --test_metadata=dump/test/norm/metadata.jsonl \
+ --output_dir=${train_output_path}/onnx_infer_out \
+ --device=cpu \
+ --cpu_threads=2
+fi
\ No newline at end of file
diff --git a/examples/csmsc/tts3/local/ort_predict_streaming.sh b/examples/csmsc/tts3/local/ort_predict_streaming.sh
index 502ec912a236ee361cf7adb9aa3adaffe11fba3d..743935816509b4eeab386df254b67432cc34d2e6 100755
--- a/examples/csmsc/tts3/local/ort_predict_streaming.sh
+++ b/examples/csmsc/tts3/local/ort_predict_streaming.sh
@@ -5,6 +5,34 @@ stop_stage=0
# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../ort_predict_streaming.py \
+ --inference_dir=${train_output_path}/inference_onnx_streaming \
+ --am=fastspeech2_csmsc \
+ --am_stat=dump/train/speech_stats.npy \
+ --voc=pwgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_streaming \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --am_streaming=True
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_streaming.py \
+ --inference_dir=${train_output_path}/inference_onnx_streaming \
+ --am=fastspeech2_csmsc \
+ --am_stat=dump/train/speech_stats.npy \
+ --voc=mb_melgan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_streaming \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --am_streaming=True
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
python3 ${BIN_DIR}/../ort_predict_streaming.py \
--inference_dir=${train_output_path}/inference_onnx_streaming \
--am=fastspeech2_csmsc \
diff --git a/examples/csmsc/tts3/local/synthesize_streaming.sh b/examples/csmsc/tts3/local/synthesize_streaming.sh
index b135db76d42e90a922b2d07d2b664c7c24690a0f..366a88db969950c76efc9a53362d4e35e1eb8602 100755
--- a/examples/csmsc/tts3/local/synthesize_streaming.sh
+++ b/examples/csmsc/tts3/local/synthesize_streaming.sh
@@ -24,7 +24,8 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
--text=${BIN_DIR}/../sentences.txt \
--output_dir=${train_output_path}/test_e2e_streaming \
--phones_dict=dump/phone_id_map.txt \
- --am_streaming=True
+ --am_streaming=True \
+ --inference_dir=${train_output_path}/inference_streaming
fi
# for more GAN Vocoders
@@ -45,7 +46,8 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
--text=${BIN_DIR}/../sentences.txt \
--output_dir=${train_output_path}/test_e2e_streaming \
--phones_dict=dump/phone_id_map.txt \
- --am_streaming=True
+ --am_streaming=True \
+ --inference_dir=${train_output_path}/inference_streaming
fi
# the pretrained models haven't release now
diff --git a/examples/csmsc/tts3/run.sh b/examples/csmsc/tts3/run.sh
index f0afcc8958cf8cef7df5901c7b9fb935d74290f2..2662b58115dcd122e63f0dd5ff902dff73f7ec4b 100755
--- a/examples/csmsc/tts3/run.sh
+++ b/examples/csmsc/tts3/run.sh
@@ -27,17 +27,17 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # synthesize, vocoder is pwgan
+ # synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
- # inference with static model
+ # inference with static model, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
fi
@@ -46,15 +46,18 @@ fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
# install paddle2onnx
version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
- if [[ -z "$version" || ${version} != '0.9.5' ]]; then
- pip install paddle2onnx==0.9.5
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
fi
./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
- ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
- ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+
fi
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
./local/ort_predict.sh ${train_output_path}
fi
diff --git a/examples/csmsc/tts3/run_cnndecoder.sh b/examples/csmsc/tts3/run_cnndecoder.sh
index c8dd8545b4cd48915990f6c5c5a7fb9ebcd866bf..c5ce41a9c6607806b9d7350142c251909d76ed44 100755
--- a/examples/csmsc/tts3/run_cnndecoder.sh
+++ b/examples/csmsc/tts3/run_cnndecoder.sh
@@ -33,25 +33,25 @@ fi
# synthesize_e2e non-streaming
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
# inference non-streaming
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
- # inference with static model
+ # inference with static model, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
fi
# synthesize_e2e streaming
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_streaming.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
# inference streaming
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
- # inference with static model
+ # inference with static model, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/inference_streaming.sh ${train_output_path} || exit -1
fi
@@ -59,32 +59,37 @@ fi
if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
# install paddle2onnx
version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
- if [[ -z "$version" || ${version} != '0.9.5' ]]; then
- pip install paddle2onnx==0.9.5
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
fi
./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
- ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
fi
# onnxruntime non streaming
-# inference with onnxruntime, use fastspeech2 + hifigan by default
if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
./local/ort_predict.sh ${train_output_path}
fi
# paddle2onnx streaming
+
if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ]; then
# install paddle2onnx
version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
- if [[ -z "$version" || ${version} != '0.9.5' ]]; then
- pip install paddle2onnx==0.9.5
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
fi
# streaming acoustic model
./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_encoder_infer
./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_decoder
./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_postnet
- # vocoder
- ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming pwgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming mb_melgan_csmsc
+ # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
fi
# onnxruntime streaming
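Stage 9 exports the streaming acoustic model as three separate ONNX graphs (encoder_infer, decoder, postnet) plus a vocoder. Before wiring up a chunked decode loop, it can help to inspect what each part expects; the sketch below only loads and introspects the sessions (the `.onnx` file names are assumed from the stems used in the stage above):

```python
import onnxruntime as ort

parts = [
    "fastspeech2_csmsc_am_encoder_infer.onnx",
    "fastspeech2_csmsc_am_decoder.onnx",
    "fastspeech2_csmsc_am_postnet.onnx",
]
for path in parts:
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    ins = [(i.name, i.shape) for i in sess.get_inputs()]
    outs = [(o.name, o.shape) for o in sess.get_outputs()]
    print(path, "\n  inputs:", ins, "\n  outputs:", outs)
```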
diff --git a/examples/ljspeech/tts3/README.md b/examples/ljspeech/tts3/README.md
index 81a0580c0a7d658e87ec09a436cb3e48de00fea5..d786c1571918a4923a5db9aff8a1fafd1617d757 100644
--- a/examples/ljspeech/tts3/README.md
+++ b/examples/ljspeech/tts3/README.md
@@ -215,6 +215,13 @@ optional arguments:
Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
+The static model can be downloaded here:
+- [fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)
+
Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss
:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
default| 2(gpu) x 100000| 1.505682|0.612104| 0.045505| 0.62792| 0.220147
diff --git a/examples/ljspeech/tts3/local/inference.sh b/examples/ljspeech/tts3/local/inference.sh
new file mode 100755
index 0000000000000000000000000000000000000000..ff192f3e3c7571699cdf18a39c257d927ca2e11c
--- /dev/null
+++ b/examples/ljspeech/tts3/local/inference.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# pwgan
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../inference.py \
+ --inference_dir=${train_output_path}/inference \
+ --am=fastspeech2_ljspeech \
+ --voc=pwgan_ljspeech \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --output_dir=${train_output_path}/pd_infer_out \
+ --phones_dict=dump/phone_id_map.txt \
+ --lang=en
+fi
+
+# hifigan
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../inference.py \
+ --inference_dir=${train_output_path}/inference \
+ --am=fastspeech2_ljspeech \
+ --voc=hifigan_ljspeech \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --output_dir=${train_output_path}/pd_infer_out \
+ --phones_dict=dump/phone_id_map.txt \
+ --lang=en
+fi
diff --git a/examples/ljspeech/tts3/local/ort_predict.sh b/examples/ljspeech/tts3/local/ort_predict.sh
new file mode 100755
index 0000000000000000000000000000000000000000..b4716f70e92fb24199312bfb89e300a6f3ffbee3
--- /dev/null
+++ b/examples/ljspeech/tts3/local/ort_predict.sh
@@ -0,0 +1,32 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# e2e, synthesize from text
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_ljspeech \
+    --voc=pwgan_ljspeech \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --lang=en
+
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_ljspeech \
+ --voc=hifigan_ljspeech \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --lang=en
+fi
diff --git a/examples/ljspeech/tts3/local/paddle2onnx.sh b/examples/ljspeech/tts3/local/paddle2onnx.sh
new file mode 120000
index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e
--- /dev/null
+++ b/examples/ljspeech/tts3/local/paddle2onnx.sh
@@ -0,0 +1 @@
+../../../csmsc/tts3/local/paddle2onnx.sh
\ No newline at end of file
diff --git a/examples/ljspeech/tts3/run.sh b/examples/ljspeech/tts3/run.sh
index c64fa8883220db1b019d56056fe7c06033176573..c4a5963862ee7ff7d5f064be8cccede5bb429783 100755
--- a/examples/ljspeech/tts3/run.sh
+++ b/examples/ljspeech/tts3/run.sh
@@ -27,11 +27,35 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # synthesize, vocoder is pwgan
+ # synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ # inference with static model, vocoder is pwgan by default
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
+# paddle2onnx, please make sure the static models are in ${train_output_path}/inference first
+# we have only tested the following models so far
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # install paddle2onnx
+ version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
+ fi
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_ljspeech
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_ljspeech
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_ljspeech
+fi
+
+# inference with onnxruntime, use fastspeech2 + pwgan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+ ./local/ort_predict.sh ${train_output_path}
+fi
diff --git a/examples/ljspeech/voc1/README.md b/examples/ljspeech/voc1/README.md
index d16c0e35fb2fcf13e1ec52ef608db925dc945f51..ad6cd29824ab46610306c4d93ad2801004bbff6a 100644
--- a/examples/ljspeech/voc1/README.md
+++ b/examples/ljspeech/voc1/README.md
@@ -130,6 +130,13 @@ optional arguments:
Pretrained models can be downloaded here:
- [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)
+The static model can be downloaded here:
+- [pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)
+
Parallel WaveGAN checkpoint contains files listed below.
```text
diff --git a/examples/ljspeech/voc5/README.md b/examples/ljspeech/voc5/README.md
index d856cfecfdebed32cd33b6ef90285c3c1ec5299a..eaa51e50783d07577769ecee72f7ea2e65c8d8f2 100644
--- a/examples/ljspeech/voc5/README.md
+++ b/examples/ljspeech/voc5/README.md
@@ -115,6 +115,12 @@ optional arguments:
The pretrained model can be downloaded here:
- [hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)
+The static model can be downloaded here:
+- [hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip)
+
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/vctk/tts3/README.md b/examples/vctk/tts3/README.md
index 0b0ce09349dbf11e0823392e7ace9aeb9c1033cc..9c0d75616c3e77b0f058e5efa120fc213e08382b 100644
--- a/examples/vctk/tts3/README.md
+++ b/examples/vctk/tts3/README.md
@@ -218,6 +218,12 @@ optional arguments:
Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
+The static model can be downloaded here:
+- [fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip)
+
FastSpeech2 checkpoint contains files listed below.
```text
fastspeech2_nosil_vctk_ckpt_0.5
diff --git a/examples/vctk/tts3/local/inference.sh b/examples/vctk/tts3/local/inference.sh
index caef89d8b10495cb82d7cdc44d93366371b8cd45..9c4426146ff5fd1bd75eaa4921920feaf106f478 100755
--- a/examples/vctk/tts3/local/inference.sh
+++ b/examples/vctk/tts3/local/inference.sh
@@ -18,3 +18,15 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
--lang=en
fi
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../inference.py \
+ --inference_dir=${train_output_path}/inference \
+ --am=fastspeech2_vctk \
+ --voc=hifigan_vctk \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --output_dir=${train_output_path}/pd_infer_out \
+ --phones_dict=dump/phone_id_map.txt \
+ --speaker_dict=dump/speaker_id_map.txt \
+ --spk_id=0 \
+ --lang=en
+fi
diff --git a/examples/vctk/tts3/local/ort_predict.sh b/examples/vctk/tts3/local/ort_predict.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4019e17fa935c3955e13b5d63c5fa8414661f4f8
--- /dev/null
+++ b/examples/vctk/tts3/local/ort_predict.sh
@@ -0,0 +1,34 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# e2e, synthesize from text
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_vctk \
+ --voc=pwgan_vctk \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --spk_id=0 \
+ --lang=en
+
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_vctk \
+ --voc=hifigan_vctk \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../sentences_en.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2 \
+ --spk_id=0 \
+ --lang=en
+fi
diff --git a/examples/vctk/tts3/local/paddle2onnx.sh b/examples/vctk/tts3/local/paddle2onnx.sh
new file mode 120000
index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e
--- /dev/null
+++ b/examples/vctk/tts3/local/paddle2onnx.sh
@@ -0,0 +1 @@
+../../../csmsc/tts3/local/paddle2onnx.sh
\ No newline at end of file
diff --git a/examples/vctk/tts3/run.sh b/examples/vctk/tts3/run.sh
index a2b849bc8999bc72f5b6c12d79e44ef2d63005d9..3d2a4a9476edc8a90d4676636320bd5d3159bcf4 100755
--- a/examples/vctk/tts3/run.sh
+++ b/examples/vctk/tts3/run.sh
@@ -27,11 +27,34 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # synthesize, vocoder is pwgan
+ # synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # synthesize_e2e, vocoder is pwgan
+ # synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ # inference with static model, vocoder is pwgan by default
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # install paddle2onnx
+ version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+ if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+ pip install paddle2onnx==0.9.8
+ fi
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_vctk
+    # considering the balance between speed and quality, we recommend using hifigan as the vocoder
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_vctk
+ # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_vctk
+
+fi
+
+# inference with onnxruntime, use fastspeech2 + pwgan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+ ./local/ort_predict.sh ${train_output_path}
+fi
diff --git a/examples/vctk/voc1/README.md b/examples/vctk/voc1/README.md
index a0e06a4206d8c214119b187164396fe9a0b1711b..2d80e7563304b968da20e08ed2b39d4b942e5717 100644
--- a/examples/vctk/voc1/README.md
+++ b/examples/vctk/voc1/README.md
@@ -135,6 +135,13 @@ optional arguments:
Pretrained models can be downloaded here:
- [pwg_vctk_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.1.1.zip)
+The static model can be downloaded here:
+- [pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)
+
Parallel WaveGAN checkpoint contains files listed below.
```text
diff --git a/examples/vctk/voc5/README.md b/examples/vctk/voc5/README.md
index f2cbf27d21706d0702e46a20ff57aabf737de6a2..e937679b53dfed16851920c05b03ad8f9210a0f2 100644
--- a/examples/vctk/voc5/README.md
+++ b/examples/vctk/voc5/README.md
@@ -121,6 +121,12 @@ optional arguments:
The pretrained model can be downloaded here:
- [hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)
+The static model can be downloaded here:
+- [hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)
+
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/paddlespeech/t2s/exps/inference.py b/paddlespeech/t2s/exps/inference.py
index 98e73e10269824816bbfcc56a8cc5a9a79ac28e1..ba951182df01b751c09cb0377d6536aadade8cce 100644
--- a/paddlespeech/t2s/exps/inference.py
+++ b/paddlespeech/t2s/exps/inference.py
@@ -35,8 +35,12 @@ def parse_args():
type=str,
default='fastspeech2_csmsc',
choices=[
- 'speedyspeech_csmsc', 'fastspeech2_csmsc', 'fastspeech2_aishell3',
- 'fastspeech2_vctk', 'tacotron2_csmsc'
+ 'speedyspeech_csmsc',
+ 'fastspeech2_csmsc',
+ 'fastspeech2_aishell3',
+ 'fastspeech2_ljspeech',
+ 'fastspeech2_vctk',
+ 'tacotron2_csmsc',
],
help='Choose acoustic model type of tts task.')
parser.add_argument(
@@ -56,8 +60,16 @@ def parse_args():
type=str,
default='pwgan_csmsc',
choices=[
- 'pwgan_csmsc', 'mb_melgan_csmsc', 'hifigan_csmsc', 'pwgan_aishell3',
- 'pwgan_vctk', 'wavernn_csmsc'
+ 'pwgan_csmsc',
+ 'pwgan_aishell3',
+ 'pwgan_ljspeech',
+ 'pwgan_vctk',
+ 'mb_melgan_csmsc',
+ 'hifigan_csmsc',
+ 'hifigan_aishell3',
+ 'hifigan_ljspeech',
+ 'hifigan_vctk',
+ 'wavernn_csmsc',
],
help='Choose vocoder type of tts task.')
# other
diff --git a/paddlespeech/t2s/exps/ort_predict_e2e.py b/paddlespeech/t2s/exps/ort_predict_e2e.py
index a2ef8e4c6da5c9eabf77b50c2b0153077d9426a5..f33fc41288d1feba2cb587b6b98a92acdc8f56ea 100644
--- a/paddlespeech/t2s/exps/ort_predict_e2e.py
+++ b/paddlespeech/t2s/exps/ort_predict_e2e.py
@@ -54,19 +54,31 @@ def ort_predict(args):
device=args.device,
cpu_threads=args.cpu_threads)
+ merge_sentences = True
+
# frontend warmup
# Loading model cost 0.5+ seconds
if args.lang == 'zh':
- frontend.get_input_ids("你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=True)
+ frontend.get_input_ids(
+ "你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=merge_sentences)
else:
- print("lang should in be 'zh' here!")
+ frontend.get_input_ids(
+ "hello, thank you, thank you very much",
+ merge_sentences=merge_sentences)
# am warmup
+ spk_id = [args.spk_id]
for T in [27, 38, 54]:
am_input_feed = {}
if am_name == 'fastspeech2':
- phone_ids = np.random.randint(1, 266, size=(T, ))
+ if args.lang == 'en':
+ phone_ids = np.random.randint(1, 78, size=(T, ))
+ else:
+ phone_ids = np.random.randint(1, 266, size=(T, ))
am_input_feed.update({'text': phone_ids})
+ if am_dataset in {"aishell3", "vctk"}:
+ am_input_feed.update({'spk_id': spk_id})
+
elif am_name == 'speedyspeech':
phone_ids = np.random.randint(1, 92, size=(T, ))
tone_ids = np.random.randint(1, 5, size=(T, ))
@@ -96,12 +108,18 @@ def ort_predict(args):
phone_ids = input_ids["phone_ids"]
if get_tone_ids:
tone_ids = input_ids["tone_ids"]
+ elif args.lang == 'en':
+ input_ids = frontend.get_input_ids(
+ sentence, merge_sentences=merge_sentences)
+ phone_ids = input_ids["phone_ids"]
else:
- print("lang should in be 'zh' here!")
+ print("lang should in {'zh', 'en'}!")
# merge_sentences=True here, so we only use the first item of phone_ids
phone_ids = phone_ids[0].numpy()
if am_name == 'fastspeech2':
am_input_feed.update({'text': phone_ids})
+ if am_dataset in {"aishell3", "vctk"}:
+ am_input_feed.update({'spk_id': spk_id})
elif am_name == 'speedyspeech':
tone_ids = tone_ids[0].numpy()
am_input_feed.update({'phones': phone_ids, 'tones': tone_ids})
@@ -130,19 +148,40 @@ def parse_args():
'--am',
type=str,
default='fastspeech2_csmsc',
- choices=['fastspeech2_csmsc', 'speedyspeech_csmsc'],
+ choices=[
+ 'fastspeech2_csmsc',
+ 'fastspeech2_aishell3',
+ 'fastspeech2_ljspeech',
+ 'fastspeech2_vctk',
+ 'speedyspeech_csmsc',
+ ],
help='Choose acoustic model type of tts task.')
parser.add_argument(
"--phones_dict", type=str, default=None, help="phone vocabulary file.")
parser.add_argument(
"--tones_dict", type=str, default=None, help="tone vocabulary file.")
+ parser.add_argument(
+ '--spk_id',
+ type=int,
+ default=0,
+ help='spk id for multi speaker acoustic model')
# voc
parser.add_argument(
'--voc',
type=str,
default='hifigan_csmsc',
- choices=['hifigan_csmsc', 'mb_melgan_csmsc', 'pwgan_csmsc'],
+ choices=[
+ 'pwgan_csmsc',
+ 'pwgan_aishell3',
+ 'pwgan_ljspeech',
+ 'pwgan_vctk',
+ 'hifigan_csmsc',
+ 'hifigan_aishell3',
+ 'hifigan_ljspeech',
+ 'hifigan_vctk',
+ 'mb_melgan_csmsc',
+ ],
help='Choose vocoder type of tts task.')
# other
parser.add_argument(
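The net effect of the `ort_predict_e2e.py` changes for multi-speaker models: when the dataset is aishell3 or vctk, a `'spk_id'` entry is fed next to `'text'`. A condensed sketch of that feed (the model path is a placeholder; the dtypes are assumptions):

```python
import numpy as np
import onnxruntime as ort

am_sess = ort.InferenceSession("fastspeech2_aishell3.onnx",
                               providers=["CPUExecutionProvider"])

# 'text' carries phone ids; multi-speaker exports additionally take 'spk_id'.
phone_ids = np.random.randint(1, 266, size=(38,), dtype="int64")
am_input_feed = {"text": phone_ids, "spk_id": np.array([0], dtype="int64")}
mel = am_sess.run(None, am_input_feed)[0]
print(mel.shape)
```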