diff --git a/README_cn.md b/README_cn.md index 27cd6089064b6d526cdee2b19a036a324c6686a2..8c018a08e3d99a3e33d9750c3eacebb41cc17d80 100644 --- a/README_cn.md +++ b/README_cn.md @@ -691,7 +691,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - 非常感谢 [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) 基于 PaddleSpeech 的 TTS 模型搭建带 GUI 操作界面的配音工具。 - 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。 -- 非常感谢 [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) 基于 PaddleSpeech 的 ASR与TTS 设计的可听、说对话机器人 +- 非常感谢 [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) 基于 PaddleSpeech 的 ASR 与 TTS 设计的可听、说对话机器人。 此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。 diff --git a/docs/source/install_cn.md b/docs/source/install_cn.md index 345e79bb5dd8fd4d089c873cd42f7de8444088ba..52eb5dadac443a6826ef1f696c624dcc956cbf3d 100644 --- a/docs/source/install_cn.md +++ b/docs/source/install_cn.md @@ -116,7 +116,7 @@ conda install -y -c gcc_linux-64=8.4.0 gxx_linux-64=8.4.0 python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple ``` ### 安装 PaddleSpeech -最后安装 `paddlespeech`,这样你就可以使用 `paddlespeech`中已有的 examples: +最后安装 `paddlespeech`,这样你就可以使用 `paddlespeech` 中已有的 examples: ```bash # 部分用户系统由于默认源的问题,安装中会出现kaldiio安转出错的问题,建议首先安装pytest-runner: pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple @@ -137,7 +137,7 @@ Docker 是一种开源工具,用于在和系统本身环境相隔离的环境 在 [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddlespeech) 中获取这些镜像及相应的使用指南,包括 CPU、GPU、ROCm 版本。 -如果您对自动化制作docker镜像感兴趣,或有自定义需求,请访问 [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) 做进一步了解。 +如果您对自动化制作 docker 镜像感兴趣,或有自定义需求,请访问 [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) 做进一步了解。 
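The install hunks above repeat one recipe: point pip at a fast mirror and install `pytest-runner` before `paddlespeech`, because kaldiio's build can fail without it. A dry-run sketch of that order — the `run` wrapper here is hypothetical and only prints each command; drop it to actually install:

```shell
# Install-order sketch from the docs above: pytest-runner must precede
# paddlespeech, otherwise kaldiio may fail to build on some systems.
MIRROR="https://pypi.tuna.tsinghua.edu.cn/simple"

# Dry-run wrapper (hypothetical): prints each command instead of running it.
run() { printf '%s\n' "$*"; }

run pip install pytest-runner -i "$MIRROR"
run pip install paddlespeech -i "$MIRROR"
```

Replacing `run` with direct invocation reproduces the exact commands shown in the install docs.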
完成这些以后,你就可以在 docker 容器中执行训练、推理和超参 fine-tune。 ### 选择2: 使用有 root 权限的 Ubuntu - 使用apt安装 `build-essential` @@ -173,7 +173,7 @@ conda install -y -c conda-forge sox libsndfile swig bzip2 libflac bc python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple ``` ### 用开发者模式安装 PaddleSpeech -部分用户系统由于默认源的问题,安装中会出现kaldiio安转出错的问题,建议首先安装pytest-runner: +部分用户系统由于默认源的问题,安装中会出现 kaldiio 安装出错的问题,建议首先安装 pytest-runner: ```bash pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple ``` diff --git a/docs/source/released_model.md b/docs/source/released_model.md index 80d6b44b476191b3d4c81ad5fa65f92d02c13d0c..551a86ef0bd013120597be512f6a78242314f59f 100644 --- a/docs/source/released_model.md +++ b/docs/source/released_model.md @@ -1,4 +1,3 @@ - # Released Models ## Speech-to-Text Models @@ -34,32 +33,33 @@ Language Model | Training Data | Token-based | Size | Descriptions ## Text-to-Speech Models ### Acoustic Models -Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size (static) +Model Type | Dataset| Example Link | Pretrained Models|Static/ONNX Models|Size (static) :-------------:| :------------:| :-----: | :-----:| :-----:| :-----: Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.2.0.zip)||| Tacotron2|CSMSC|[tacotron2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts0)|[tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)|[tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)|103MB| TransformerTTS| LJSpeech|
[transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)||| -SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2)|[speedyspeech_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_ckpt_0.2.0.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)|12MB| -FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)|157MB| +SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2)|[speedyspeech_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_ckpt_0.2.0.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)
[speedyspeech_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_onnx_0.2.0.zip)|13MB| +FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)
[fastspeech2_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_onnx_0.2.0.zip)|157MB| FastSpeech2-Conformer| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_conformer_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip)||| -FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)||| -FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)||| -FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)||| +FastSpeech2-CNNDecoder| CSMSC| [fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)| [fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip) | [fastspeech2_cnndecoder_csmsc_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_static_1.0.0.zip)
[fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip)
[fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip)
[fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip) | 84MB| +FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|[fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip)
[fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)|147MB| +FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|[fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip)
[fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)|145MB| +FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|[fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)
[fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip) | 145MB| ### Vocoders -Model Type | Dataset| Example Link | Pretrained Models| Static Models|Size (static) +Model Type | Dataset| Example Link | Pretrained Models| Static/ONNX Models|Size (static) :-----:| :-----:| :-----: | :-----:| :-----:| :-----: WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip)||| -Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)|5.1MB| -Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)||| -Parallel WaveGAN| AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)||| -Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.5.zip)||| -|Multi Band MelGAN | CSMSC |[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3) | 
[mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)
[mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)|[mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip) |8.2MB| +Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)
[pwgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_csmsc_onnx_0.2.0.zip)|4.8MB| +Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)|[pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip)
[pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)|4.8MB| +Parallel WaveGAN| AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)| [pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip)
[pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip)|4.8MB| +Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.5.zip)|[pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip)
[pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)|4.8MB| +|Multi Band MelGAN | CSMSC |[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3) | [mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)
[mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)|[mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip)
[mb_melgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip)|7.6MB| Style MelGAN | CSMSC |[Style MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc4)|[style_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/style_melgan/style_melgan_csmsc_ckpt_0.1.1.zip)| | | -HiFiGAN | CSMSC |[HiFiGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5)|[hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)|[hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip)|50MB| -HiFiGAN | LJSpeech |[HiFiGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc5)|[hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)||| -HiFiGAN | AISHELL-3 |[HiFiGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc5)|[hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)||| -HiFiGAN | VCTK |[HiFiGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc5)|[hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)||| +HiFiGAN | CSMSC |[HiFiGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5)|[hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)|[hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip)
[hifigan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_onnx_0.2.0.zip)|46MB| +HiFiGAN | LJSpeech |[HiFiGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc5)|[hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)|[hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip)
[hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip) |49MB| +HiFiGAN | AISHELL-3 |[HiFiGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc5)|[hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)|[hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip)
[hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip)|46MB| +HiFiGAN | VCTK |[HiFiGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc5)|[hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)|[hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip)
[hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)|46MB| WaveRNN | CSMSC |[WaveRNN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc6)|[wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip)|[wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip)|18MB| diff --git a/examples/aishell3/tts3/README.md b/examples/aishell3/tts3/README.md index 31c99898ccc7839b835a0fbd7daec550a36de340..21bad51ecfbbce58acc1ef14c82dadbc6b09634c 100644 --- a/examples/aishell3/tts3/README.md +++ b/examples/aishell3/tts3/README.md @@ -220,6 +220,12 @@ Pretrained FastSpeech2 model with no silence in the edge of audios: - [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip) - [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution) +The static model can be downloaded here: +- [fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip) + +The ONNX model can be downloaded here: +- [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip) + FastSpeech2 checkpoint contains files listed below. 
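The static and ONNX download entries added above are plain zip archives; a sketch of fetching and unpacking one, using the AISHELL-3 FastSpeech2 ONNX URL from the table (network access assumed; the download step is guarded so the sketch degrades gracefully offline):

```shell
# Fetch and unpack one of the ONNX releases listed above.
URL="https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip"
ZIP="$(basename "$URL")"    # archive name derived from the URL

# Only attempt the download when wget is available; tolerate offline runs.
if command -v wget >/dev/null; then
    wget -nc "$URL" && unzip -o "$ZIP" -d pretrained_models || echo "download skipped"
fi
```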
```text diff --git a/examples/aishell3/tts3/local/inference.sh b/examples/aishell3/tts3/local/inference.sh index 3b03b53ce1c49d58dd6d89fbd1d09c115e231925..dc05ec59218533631fe9fa16f5ec4ca1c4f4b3db 100755 --- a/examples/aishell3/tts3/local/inference.sh +++ b/examples/aishell3/tts3/local/inference.sh @@ -17,3 +17,14 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then --spk_id=0 fi +if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then + python3 ${BIN_DIR}/../inference.py \ + --inference_dir=${train_output_path}/inference \ + --am=fastspeech2_aishell3 \ + --voc=hifigan_aishell3 \ + --text=${BIN_DIR}/../sentences.txt \ + --output_dir=${train_output_path}/pd_infer_out \ + --phones_dict=dump/phone_id_map.txt \ + --speaker_dict=dump/speaker_id_map.txt \ + --spk_id=0 +fi diff --git a/examples/aishell3/tts3/local/ort_predict.sh b/examples/aishell3/tts3/local/ort_predict.sh new file mode 100755 index 0000000000000000000000000000000000000000..24e66f689603519e894fecceae22c644000719f9 --- /dev/null +++ b/examples/aishell3/tts3/local/ort_predict.sh @@ -0,0 +1,32 @@ +train_output_path=$1 + +stage=0 +stop_stage=0 + +# e2e, synthesize from text +if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then + python3 ${BIN_DIR}/../ort_predict_e2e.py \ + --inference_dir=${train_output_path}/inference_onnx \ + --am=fastspeech2_aishell3 \ + --voc=pwgan_aishell3 \ + --output_dir=${train_output_path}/onnx_infer_out_e2e \ + --text=${BIN_DIR}/../csmsc_test.txt \ + --phones_dict=dump/phone_id_map.txt \ + --device=cpu \ + --cpu_threads=2 \ + --spk_id=0 + +fi + +if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then + python3 ${BIN_DIR}/../ort_predict_e2e.py \ + --inference_dir=${train_output_path}/inference_onnx \ + --am=fastspeech2_aishell3 \ + --voc=hifigan_aishell3 \ + --output_dir=${train_output_path}/onnx_infer_out_e2e \ + --text=${BIN_DIR}/../csmsc_test.txt \ + --phones_dict=dump/phone_id_map.txt \ + --device=cpu \ + --cpu_threads=2 \ + --spk_id=0 +fi diff --git 
a/examples/aishell3/tts3/local/paddle2onnx.sh b/examples/aishell3/tts3/local/paddle2onnx.sh new file mode 120000 index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e --- /dev/null +++ b/examples/aishell3/tts3/local/paddle2onnx.sh @@ -0,0 +1 @@ +../../../csmsc/tts3/local/paddle2onnx.sh \ No newline at end of file diff --git a/examples/aishell3/tts3/run.sh b/examples/aishell3/tts3/run.sh index b375f215984e92ff8acd7ad5f91da67e16863716..868087a01d47d9673231d430cf860a870e1f4f66 100755 --- a/examples/aishell3/tts3/run.sh +++ b/examples/aishell3/tts3/run.sh @@ -27,11 +27,34 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then fi if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then - # synthesize, vocoder is pwgan + # synthesize, vocoder is pwgan by default CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1 fi if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then - # synthesize_e2e, vocoder is pwgan + # synthesize_e2e, vocoder is pwgan by default CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1 fi + +if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then + # inference with static model, vocoder is pwgan by default + CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1 +fi + +if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then + # install paddle2onnx + version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}') + if [[ -z "$version" || ${version} != '0.9.8' ]]; then + pip install paddle2onnx==0.9.8 + fi + ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_aishell3 + # considering the balance between speed and quality, we recommend that you use hifigan as vocoder + ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_aishell3 + # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_aishell3 + 
+fi + +# inference with onnxruntime, use fastspeech2 + pwgan by default +if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then + ./local/ort_predict.sh ${train_output_path} +fi diff --git a/examples/aishell3/voc1/README.md b/examples/aishell3/voc1/README.md index a3daf3dfd6889e60f9d71be396bc9b3d2404fe54..bc25f43cf78fdfeec1152bdcb436422c0d2b8fc6 100644 --- a/examples/aishell3/voc1/README.md +++ b/examples/aishell3/voc1/README.md @@ -133,6 +133,12 @@ optional arguments: Pretrained models can be downloaded here: - [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip) +The static model can be downloaded here: +- [pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip) + +The ONNX model can be downloaded here: +- [pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip) + Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss:| eval/spectral_convergence_loss :-------------:| :------------:| :-----: | :-----: | :--------: default| 1(gpu) x 400000|1.968762|0.759008|0.218524 diff --git a/examples/aishell3/voc5/README.md b/examples/aishell3/voc5/README.md index c3e3197d63c0e871dabd61e67992335fc8d5d1f9..7f99a52e3f1cb79d98489ab56eb60f757afd6bc9 100644 --- a/examples/aishell3/voc5/README.md +++ b/examples/aishell3/voc5/README.md @@ -116,6 +116,11 @@ optional arguments: The pretrained model can be downloaded here: - [hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip) +The static model can be downloaded here: +- [hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip) + +The ONNX model can be downloaded here: +-
[hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip) Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss :-------------:| :------------:| :-----: | :-----: | :--------: diff --git a/examples/csmsc/tts2/local/ort_predict.sh b/examples/csmsc/tts2/local/ort_predict.sh index 46b0409b81946a18cf0969fbca0d8cf3f43c7abc..8ca4c0e9bd06ea5ff836ff0ceb61bae28554ca07 100755 --- a/examples/csmsc/tts2/local/ort_predict.sh +++ b/examples/csmsc/tts2/local/ort_predict.sh @@ -3,22 +3,34 @@ train_output_path=$1 stage=0 stop_stage=0 -# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now! - -# synthesize from metadata +# e2e, synthesize from text if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then - python3 ${BIN_DIR}/../ort_predict.py \ + python3 ${BIN_DIR}/../ort_predict_e2e.py \ --inference_dir=${train_output_path}/inference_onnx \ --am=speedyspeech_csmsc \ - --voc=hifigan_csmsc \ - --test_metadata=dump/test/norm/metadata.jsonl \ - --output_dir=${train_output_path}/onnx_infer_out \ + --voc=pwgan_csmsc \ + --output_dir=${train_output_path}/onnx_infer_out_e2e \ + --text=${BIN_DIR}/../csmsc_test.txt \ + --phones_dict=dump/phone_id_map.txt \ + --tones_dict=dump/tone_id_map.txt \ --device=cpu \ --cpu_threads=2 fi -# e2e, synthesize from text if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then + python3 ${BIN_DIR}/../ort_predict_e2e.py \ + --inference_dir=${train_output_path}/inference_onnx \ + --am=speedyspeech_csmsc \ + --voc=mb_melgan_csmsc \ + --output_dir=${train_output_path}/onnx_infer_out_e2e \ + --text=${BIN_DIR}/../csmsc_test.txt \ + --phones_dict=dump/phone_id_map.txt \ + --tones_dict=dump/tone_id_map.txt \ + --device=cpu \ + --cpu_threads=2 +fi + +if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then python3 ${BIN_DIR}/../ort_predict_e2e.py \ --inference_dir=${train_output_path}/inference_onnx \ --am=speedyspeech_csmsc \ @@ -30,3 +42,15 @@ if [ 
${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then --device=cpu \ --cpu_threads=2 fi + +# synthesize from metadata +if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then + python3 ${BIN_DIR}/../ort_predict.py \ + --inference_dir=${train_output_path}/inference_onnx \ + --am=speedyspeech_csmsc \ + --voc=hifigan_csmsc \ + --test_metadata=dump/test/norm/metadata.jsonl \ + --output_dir=${train_output_path}/onnx_infer_out \ + --device=cpu \ + --cpu_threads=2 +fi diff --git a/examples/csmsc/tts2/run.sh b/examples/csmsc/tts2/run.sh index 1d67a5c9101ee88d6d181837d601bed287c4e62f..e51913496f0a41a298eb02d2a6ef0f3a180143fa 100755 --- a/examples/csmsc/tts2/run.sh +++ b/examples/csmsc/tts2/run.sh @@ -27,12 +27,12 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then fi if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then - # synthesize, vocoder is pwgan + # synthesize, vocoder is pwgan by default CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1 fi if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then - # synthesize_e2e, vocoder is pwgan + # synthesize_e2e, vocoder is pwgan by default CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1 fi @@ -46,19 +46,17 @@ fi if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then # install paddle2onnx version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}') - if [[ -z "$version" || ${version} != '0.9.5' ]]; then - pip install paddle2onnx==0.9.5 + if [[ -z "$version" || ${version} != '0.9.8' ]]; then + pip install paddle2onnx==0.9.8 fi ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx speedyspeech_csmsc - ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc + # considering the balance between speed and quality, we recommend that you use hifigan as vocoder + ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc + # 
./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
 fi
 
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
-    # install onnxruntime
-    version=$(echo `pip list |grep "onnxruntime"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '1.10.0' ]]; then
-        pip install onnxruntime==1.10.0
-    fi
     ./local/ort_predict.sh ${train_output_path}
 fi
diff --git a/examples/csmsc/tts3/local/ort_predict.sh b/examples/csmsc/tts3/local/ort_predict.sh
index 96350c06c846f6fd4d49e942b50bdaa35f7fc131..e16c7bd0533436c53a076e3c5a0e434481926fda 100755
--- a/examples/csmsc/tts3/local/ort_predict.sh
+++ b/examples/csmsc/tts3/local/ort_predict.sh
@@ -3,22 +3,32 @@ train_output_path=$1
 stage=0
 stop_stage=0
 
-# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now!
-
-# synthesize from metadata
+# e2e, synthesize from text
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
-    python3 ${BIN_DIR}/../ort_predict.py \
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
         --inference_dir=${train_output_path}/inference_onnx \
         --am=fastspeech2_csmsc \
-        --voc=hifigan_csmsc \
-        --test_metadata=dump/test/norm/metadata.jsonl \
-        --output_dir=${train_output_path}/onnx_infer_out \
+        --voc=pwgan_csmsc \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../csmsc_test.txt \
+        --phones_dict=dump/phone_id_map.txt \
         --device=cpu \
         --cpu_threads=2
 fi
 
-# e2e, synthesize from text
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_csmsc \
+        --voc=mb_melgan_csmsc \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../csmsc_test.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
     python3 ${BIN_DIR}/../ort_predict_e2e.py \
         --inference_dir=${train_output_path}/inference_onnx \
         --am=fastspeech2_csmsc \
@@ -29,3 +39,15 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --device=cpu \
         --cpu_threads=2
 fi
+
+# synthesize from metadata, take hifigan as an example
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+    python3 ${BIN_DIR}/../ort_predict.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_csmsc \
+        --voc=hifigan_csmsc \
+        --test_metadata=dump/test/norm/metadata.jsonl \
+        --output_dir=${train_output_path}/onnx_infer_out \
+        --device=cpu \
+        --cpu_threads=2
+fi
\ No newline at end of file
diff --git a/examples/csmsc/tts3/local/ort_predict_streaming.sh b/examples/csmsc/tts3/local/ort_predict_streaming.sh
index 502ec912a236ee361cf7adb9aa3adaffe11fba3d..743935816509b4eeab386df254b67432cc34d2e6 100755
--- a/examples/csmsc/tts3/local/ort_predict_streaming.sh
+++ b/examples/csmsc/tts3/local/ort_predict_streaming.sh
@@ -5,6 +5,34 @@ stop_stage=0
 
 # e2e, synthesize from text
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    python3 ${BIN_DIR}/../ort_predict_streaming.py \
+        --inference_dir=${train_output_path}/inference_onnx_streaming \
+        --am=fastspeech2_csmsc \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=pwgan_csmsc \
+        --output_dir=${train_output_path}/onnx_infer_out_streaming \
+        --text=${BIN_DIR}/../csmsc_test.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --am_streaming=True
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../ort_predict_streaming.py \
+        --inference_dir=${train_output_path}/inference_onnx_streaming \
+        --am=fastspeech2_csmsc \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=mb_melgan_csmsc \
+        --output_dir=${train_output_path}/onnx_infer_out_streaming \
+        --text=${BIN_DIR}/../csmsc_test.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --am_streaming=True
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
     python3 ${BIN_DIR}/../ort_predict_streaming.py \
         --inference_dir=${train_output_path}/inference_onnx_streaming \
         --am=fastspeech2_csmsc \
diff --git a/examples/csmsc/tts3/local/synthesize_streaming.sh b/examples/csmsc/tts3/local/synthesize_streaming.sh
index b135db76d42e90a922b2d07d2b664c7c24690a0f..366a88db969950c76efc9a53362d4e35e1eb8602 100755
--- a/examples/csmsc/tts3/local/synthesize_streaming.sh
+++ b/examples/csmsc/tts3/local/synthesize_streaming.sh
@@ -24,7 +24,8 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
         --text=${BIN_DIR}/../sentences.txt \
         --output_dir=${train_output_path}/test_e2e_streaming \
         --phones_dict=dump/phone_id_map.txt \
-        --am_streaming=True
+        --am_streaming=True \
+        --inference_dir=${train_output_path}/inference_streaming
 fi
 
 # for more GAN Vocoders
@@ -45,7 +46,8 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --text=${BIN_DIR}/../sentences.txt \
         --output_dir=${train_output_path}/test_e2e_streaming \
         --phones_dict=dump/phone_id_map.txt \
-        --am_streaming=True
+        --am_streaming=True \
+        --inference_dir=${train_output_path}/inference_streaming
 fi
 
 # the pretrained models haven't release now
diff --git a/examples/csmsc/tts3/run.sh b/examples/csmsc/tts3/run.sh
index f0afcc8958cf8cef7df5901c7b9fb935d74290f2..2662b58115dcd122e63f0dd5ff902dff73f7ec4b 100755
--- a/examples/csmsc/tts3/run.sh
+++ b/examples/csmsc/tts3/run.sh
@@ -27,17 +27,17 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
 fi
 
@@ -46,15 +46,18 @@ fi
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # pwgan is converted by default; considering the balance between speed and quality, we recommend hifigan (uncomment it below)
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+
 fi
 
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
     ./local/ort_predict.sh ${train_output_path}
 fi
diff --git a/examples/csmsc/tts3/run_cnndecoder.sh b/examples/csmsc/tts3/run_cnndecoder.sh
index c8dd8545b4cd48915990f6c5c5a7fb9ebcd866bf..c5ce41a9c6607806b9d7350142c251909d76ed44 100755
--- a/examples/csmsc/tts3/run_cnndecoder.sh
+++ b/examples/csmsc/tts3/run_cnndecoder.sh
@@ -33,25 +33,25 @@ fi
 
 # synthesize_e2e non-streaming
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 # inference non-streaming
 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
 fi
 
 # synthesize_e2e streaming
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_streaming.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 # inference streaming
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference_streaming.sh ${train_output_path} || exit -1
 fi
@@ -59,32 +59,37 @@ fi
 if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+    # pwgan is converted by default; considering the balance between speed and quality, we recommend hifigan (uncomment it below)
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
 fi
 
 # onnxruntime non streaming
-# inference with onnxruntime, use fastspeech2 + hifigan by default
 if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
     ./local/ort_predict.sh ${train_output_path}
 fi
 
 # paddle2onnx streaming
+
 if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     # streaming acoustic model
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_encoder_infer
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_decoder
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_postnet
-    # vocoder
-    ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
+    # pwgan is converted by default; considering the balance between speed and quality, we recommend hifigan (uncomment it below)
+    ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
 fi
 
 # onnxruntime streaming
diff --git a/examples/ljspeech/tts3/README.md b/examples/ljspeech/tts3/README.md
index 81a0580c0a7d658e87ec09a436cb3e48de00fea5..d786c1571918a4923a5db9aff8a1fafd1617d757 100644
--- a/examples/ljspeech/tts3/README.md
+++ b/examples/ljspeech/tts3/README.md
@@ -215,6 +215,13 @@ optional arguments:
 Pretrained FastSpeech2 model with no silence in the edge of audios:
 - [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
 
+The static model can be downloaded here:
+- [fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)
+
+
 Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss
 :-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
 default| 2(gpu) x 100000| 1.505682|0.612104| 0.045505| 0.62792| 0.220147
diff --git a/examples/ljspeech/tts3/local/inference.sh b/examples/ljspeech/tts3/local/inference.sh
new file mode 100755
index 0000000000000000000000000000000000000000..ff192f3e3c7571699cdf18a39c257d927ca2e11c
--- /dev/null
+++ b/examples/ljspeech/tts3/local/inference.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# pwgan
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    python3 ${BIN_DIR}/../inference.py \
+        --inference_dir=${train_output_path}/inference \
+        --am=fastspeech2_ljspeech \
+        --voc=pwgan_ljspeech \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --output_dir=${train_output_path}/pd_infer_out \
+        --phones_dict=dump/phone_id_map.txt \
+        --lang=en
+fi
+
+# hifigan
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../inference.py \
+        --inference_dir=${train_output_path}/inference \
+        --am=fastspeech2_ljspeech \
+        --voc=hifigan_ljspeech \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --output_dir=${train_output_path}/pd_infer_out \
+        --phones_dict=dump/phone_id_map.txt \
+        --lang=en
+fi
diff --git a/examples/ljspeech/tts3/local/ort_predict.sh b/examples/ljspeech/tts3/local/ort_predict.sh
new file mode 100755
index 0000000000000000000000000000000000000000..b4716f70e92fb24199312bfb89e300a6f3ffbee3
--- /dev/null
+++ b/examples/ljspeech/tts3/local/ort_predict.sh
@@ -0,0 +1,32 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# e2e, synthesize from text
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_ljspeech \
+        --voc=pwgan_ljspeech \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --lang=en
+
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_ljspeech \
+        --voc=hifigan_ljspeech \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --lang=en
+fi
diff --git a/examples/ljspeech/tts3/local/paddle2onnx.sh b/examples/ljspeech/tts3/local/paddle2onnx.sh
new file mode 120000
index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e
--- /dev/null
+++ b/examples/ljspeech/tts3/local/paddle2onnx.sh
@@ -0,0 +1 @@
+../../../csmsc/tts3/local/paddle2onnx.sh
\ No newline at end of file
diff --git a/examples/ljspeech/tts3/run.sh b/examples/ljspeech/tts3/run.sh
index c64fa8883220db1b019d56056fe7c06033176573..c4a5963862ee7ff7d5f064be8cccede5bb429783 100755
--- a/examples/ljspeech/tts3/run.sh
+++ b/examples/ljspeech/tts3/run.sh
@@ -27,11 +27,35 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
+# paddle2onnx, please make sure the static models are in ${train_output_path}/inference first
+# we have only tested the following models so far
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+    # install paddle2onnx
+    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
+    fi
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_ljspeech
+    # pwgan is converted by default; considering the balance between speed and quality, we recommend hifigan (uncomment it below)
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_ljspeech
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_ljspeech
+fi
+
+# inference with onnxruntime, use fastspeech2 + pwgan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+    ./local/ort_predict.sh ${train_output_path}
+fi
diff --git a/examples/ljspeech/voc1/README.md b/examples/ljspeech/voc1/README.md
index d16c0e35fb2fcf13e1ec52ef608db925dc945f51..ad6cd29824ab46610306c4d93ad2801004bbff6a 100644
--- a/examples/ljspeech/voc1/README.md
+++ b/examples/ljspeech/voc1/README.md
@@ -130,6 +130,13 @@ optional arguments:
 Pretrained models can be downloaded here:
 - [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)
 
+The static model can be downloaded here:
+- [pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)
+
+
 Parallel WaveGAN checkpoint contains files listed below.
 
 ```text
diff --git a/examples/ljspeech/voc5/README.md b/examples/ljspeech/voc5/README.md
index d856cfecfdebed32cd33b6ef90285c3c1ec5299a..eaa51e50783d07577769ecee72f7ea2e65c8d8f2 100644
--- a/examples/ljspeech/voc5/README.md
+++ b/examples/ljspeech/voc5/README.md
@@ -115,6 +115,12 @@ optional arguments:
 The pretrained model can be downloaded here:
 - [hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)
 
+The static model can be downloaded here:
+- [hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip)
+
 Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
 :-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/vctk/tts3/README.md b/examples/vctk/tts3/README.md
index 0b0ce09349dbf11e0823392e7ace9aeb9c1033cc..9c0d75616c3e77b0f058e5efa120fc213e08382b 100644
--- a/examples/vctk/tts3/README.md
+++ b/examples/vctk/tts3/README.md
@@ -218,6 +218,12 @@ optional arguments:
 Pretrained FastSpeech2 model with no silence in the edge of audios:
 - [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
 
+The static model can be downloaded here:
+- [fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip)
+
 FastSpeech2 checkpoint contains files listed below.
 ```text
 fastspeech2_nosil_vctk_ckpt_0.5
diff --git a/examples/vctk/tts3/local/inference.sh b/examples/vctk/tts3/local/inference.sh
index caef89d8b10495cb82d7cdc44d93366371b8cd45..9c4426146ff5fd1bd75eaa4921920feaf106f478 100755
--- a/examples/vctk/tts3/local/inference.sh
+++ b/examples/vctk/tts3/local/inference.sh
@@ -18,3 +18,15 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
         --lang=en
 fi
 
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../inference.py \
+        --inference_dir=${train_output_path}/inference \
+        --am=fastspeech2_vctk \
+        --voc=hifigan_vctk \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --output_dir=${train_output_path}/pd_infer_out \
+        --phones_dict=dump/phone_id_map.txt \
+        --speaker_dict=dump/speaker_id_map.txt \
+        --spk_id=0 \
+        --lang=en
+fi
diff --git a/examples/vctk/tts3/local/ort_predict.sh b/examples/vctk/tts3/local/ort_predict.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4019e17fa935c3955e13b5d63c5fa8414661f4f8
--- /dev/null
+++ b/examples/vctk/tts3/local/ort_predict.sh
@@ -0,0 +1,34 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# e2e, synthesize from text
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_vctk \
+        --voc=pwgan_vctk \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --spk_id=0 \
+        --lang=en
+
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_vctk \
+        --voc=hifigan_vctk \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --device=cpu \
+        --cpu_threads=2 \
+        --spk_id=0 \
+        --lang=en
+fi
diff --git a/examples/vctk/tts3/local/paddle2onnx.sh b/examples/vctk/tts3/local/paddle2onnx.sh
new file mode 120000
index 0000000000000000000000000000000000000000..8d5dbef4ca64b96b1d90d8fb812efccbe7ab3f3e
--- /dev/null
+++ b/examples/vctk/tts3/local/paddle2onnx.sh
@@ -0,0 +1 @@
+../../../csmsc/tts3/local/paddle2onnx.sh
\ No newline at end of file
diff --git a/examples/vctk/tts3/run.sh b/examples/vctk/tts3/run.sh
index a2b849bc8999bc72f5b6c12d79e44ef2d63005d9..3d2a4a9476edc8a90d4676636320bd5d3159bcf4 100755
--- a/examples/vctk/tts3/run.sh
+++ b/examples/vctk/tts3/run.sh
@@ -27,11 +27,34 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+    # install paddle2onnx
+    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
+    fi
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_vctk
+    # pwgan is converted by default; considering the balance between speed and quality, we recommend hifigan (uncomment it below)
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_vctk
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_vctk
+
+fi
+
+# inference with onnxruntime, use fastspeech2 + pwgan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+    ./local/ort_predict.sh ${train_output_path}
+fi
diff --git a/examples/vctk/voc1/README.md b/examples/vctk/voc1/README.md
index a0e06a4206d8c214119b187164396fe9a0b1711b..2d80e7563304b968da20e08ed2b39d4b942e5717 100644
--- a/examples/vctk/voc1/README.md
+++ b/examples/vctk/voc1/README.md
@@ -135,6 +135,13 @@ optional arguments:
 Pretrained models can be downloaded here:
 - [pwg_vctk_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.1.1.zip)
 
+The static model can be downloaded here:
+- [pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)
+
+
 Parallel WaveGAN checkpoint contains files listed below.
 
 ```text
diff --git a/examples/vctk/voc5/README.md b/examples/vctk/voc5/README.md
index f2cbf27d21706d0702e46a20ff57aabf737de6a2..e937679b53dfed16851920c05b03ad8f9210a0f2 100644
--- a/examples/vctk/voc5/README.md
+++ b/examples/vctk/voc5/README.md
@@ -121,6 +121,12 @@ optional arguments:
 The pretrained model can be downloaded here:
 - [hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)
 
+The static model can be downloaded here:
+- [hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)
+
 Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
 :-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/paddlespeech/t2s/exps/inference.py b/paddlespeech/t2s/exps/inference.py
index 98e73e10269824816bbfcc56a8cc5a9a79ac28e1..ba951182df01b751c09cb0377d6536aadade8cce 100644
--- a/paddlespeech/t2s/exps/inference.py
+++ b/paddlespeech/t2s/exps/inference.py
@@ -35,8 +35,12 @@ def parse_args():
         type=str,
         default='fastspeech2_csmsc',
         choices=[
-            'speedyspeech_csmsc', 'fastspeech2_csmsc', 'fastspeech2_aishell3',
-            'fastspeech2_vctk', 'tacotron2_csmsc'
+            'speedyspeech_csmsc',
+            'fastspeech2_csmsc',
+            'fastspeech2_aishell3',
+            'fastspeech2_ljspeech',
+            'fastspeech2_vctk',
+            'tacotron2_csmsc',
         ],
         help='Choose acoustic model type of tts task.')
     parser.add_argument(
@@ -56,8 +60,16 @@ def parse_args():
         type=str,
         default='pwgan_csmsc',
         choices=[
-            'pwgan_csmsc', 'mb_melgan_csmsc', 'hifigan_csmsc', 'pwgan_aishell3',
-            'pwgan_vctk', 'wavernn_csmsc'
+            'pwgan_csmsc',
+            'pwgan_aishell3',
+            'pwgan_ljspeech',
+            'pwgan_vctk',
+            'mb_melgan_csmsc',
+            'hifigan_csmsc',
+            'hifigan_aishell3',
+            'hifigan_ljspeech',
+            'hifigan_vctk',
+            'wavernn_csmsc',
         ],
         help='Choose vocoder type of tts task.')
     # other
diff --git a/paddlespeech/t2s/exps/ort_predict_e2e.py b/paddlespeech/t2s/exps/ort_predict_e2e.py
index a2ef8e4c6da5c9eabf77b50c2b0153077d9426a5..f33fc41288d1feba2cb587b6b98a92acdc8f56ea 100644
--- a/paddlespeech/t2s/exps/ort_predict_e2e.py
+++ b/paddlespeech/t2s/exps/ort_predict_e2e.py
@@ -54,19 +54,31 @@ def ort_predict(args):
         device=args.device,
         cpu_threads=args.cpu_threads)
 
+    merge_sentences = True
+
     # frontend warmup
     # Loading model cost 0.5+ seconds
     if args.lang == 'zh':
-        frontend.get_input_ids("你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=True)
+        frontend.get_input_ids(
+            "你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=merge_sentences)
     else:
-        print("lang should in be 'zh' here!")
+        frontend.get_input_ids(
+            "hello, thank you, thank you very much",
+            merge_sentences=merge_sentences)
 
     # am warmup
+    spk_id = [args.spk_id]
     for T in [27, 38, 54]:
         am_input_feed = {}
         if am_name == 'fastspeech2':
-            phone_ids = np.random.randint(1, 266, size=(T, ))
+            if args.lang == 'en':
+                phone_ids = np.random.randint(1, 78, size=(T, ))
+            else:
+                phone_ids = np.random.randint(1, 266, size=(T, ))
             am_input_feed.update({'text': phone_ids})
+            if am_dataset in {"aishell3", "vctk"}:
+                am_input_feed.update({'spk_id': spk_id})
+
         elif am_name == 'speedyspeech':
             phone_ids = np.random.randint(1, 92, size=(T, ))
             tone_ids = np.random.randint(1, 5, size=(T, ))
@@ -96,12 +108,18 @@ def ort_predict(args):
             phone_ids = input_ids["phone_ids"]
             if get_tone_ids:
                 tone_ids = input_ids["tone_ids"]
+        elif args.lang == 'en':
+            input_ids = frontend.get_input_ids(
+                sentence, merge_sentences=merge_sentences)
+            phone_ids = input_ids["phone_ids"]
         else:
-            print("lang should in be 'zh' here!")
+            print("lang should be in {'zh', 'en'}!")
         # merge_sentences=True here, so we only use the first item of phone_ids
         phone_ids = phone_ids[0].numpy()
         if am_name == 'fastspeech2':
             am_input_feed.update({'text': phone_ids})
+            if am_dataset in {"aishell3", "vctk"}:
+                am_input_feed.update({'spk_id': spk_id})
         elif am_name == 'speedyspeech':
             tone_ids = tone_ids[0].numpy()
             am_input_feed.update({'phones': phone_ids, 'tones': tone_ids})
@@ -130,19 +148,40 @@ def parse_args():
         '--am',
         type=str,
         default='fastspeech2_csmsc',
-        choices=['fastspeech2_csmsc', 'speedyspeech_csmsc'],
+        choices=[
+            'fastspeech2_csmsc',
+            'fastspeech2_aishell3',
+            'fastspeech2_ljspeech',
+            'fastspeech2_vctk',
+            'speedyspeech_csmsc',
+        ],
         help='Choose acoustic model type of tts task.')
     parser.add_argument(
         "--phones_dict", type=str, default=None, help="phone vocabulary file.")
     parser.add_argument(
         "--tones_dict", type=str, default=None, help="tone vocabulary file.")
+    parser.add_argument(
+        '--spk_id',
+        type=int,
+        default=0,
+        help='spk id for multi speaker acoustic model')
     # voc
     parser.add_argument(
         '--voc',
         type=str,
         default='hifigan_csmsc',
-        choices=['hifigan_csmsc', 'mb_melgan_csmsc', 'pwgan_csmsc'],
+        choices=[
+            'pwgan_csmsc',
+            'pwgan_aishell3',
+            'pwgan_ljspeech',
+            'pwgan_vctk',
+            'hifigan_csmsc',
+            'hifigan_aishell3',
+            'hifigan_ljspeech',
+            'hifigan_vctk',
+            'mb_melgan_csmsc',
+        ],
         help='Choose vocoder type of tts task.')
     # other
     parser.add_argument(