PaddlePaddle / DeepSpeech · commit 0fa3fdb9
Unverified commit 0fa3fdb9
Authored on Jun 23, 2022 by u010070587; committed via GitHub on Jun 23, 2022

Merge pull request #2068 from yt605155624/p_norm

[TTS]add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5

Parents: ec759094, 6a45c5c3
Showing 33 changed files, with 499 additions and 86 deletions.
README_cn.md (+1 -1)
docs/source/install_cn.md (+3 -3)
docs/source/released_model.md (+17 -17)
examples/aishell3/tts3/README.md (+6 -0)
examples/aishell3/tts3/local/inference.sh (+11 -0)
examples/aishell3/tts3/local/ort_predict.sh (+32 -0)
examples/aishell3/tts3/local/paddle2onnx.sh (+1 -0)
examples/aishell3/tts3/run.sh (+25 -2)
examples/aishell3/voc1/README.md (+6 -0)
examples/aishell3/voc5/README.md (+5 -0)
examples/csmsc/tts2/local/ort_predict.sh (+32 -8)
examples/csmsc/tts2/run.sh (+9 -11)
examples/csmsc/tts3/local/ort_predict.sh (+30 -8)
examples/csmsc/tts3/local/ort_predict_streaming.sh (+28 -0)
examples/csmsc/tts3/local/synthesize_streaming.sh (+4 -2)
examples/csmsc/tts3/run.sh (+11 -8)
examples/csmsc/tts3/run_cnndecoder.sh (+17 -12)
examples/ljspeech/tts3/README.md (+7 -0)
examples/ljspeech/tts3/local/inference.sh (+30 -0)
examples/ljspeech/tts3/local/ort_predict.sh (+32 -0)
examples/ljspeech/tts3/local/paddle2onnx.sh (+1 -0)
examples/ljspeech/tts3/run.sh (+26 -2)
examples/ljspeech/voc1/README.md (+7 -0)
examples/ljspeech/voc5/README.md (+6 -0)
examples/vctk/tts3/README.md (+6 -0)
examples/vctk/tts3/local/inference.sh (+12 -0)
examples/vctk/tts3/local/ort_predict.sh (+34 -0)
examples/vctk/tts3/local/paddle2onnx.sh (+1 -0)
examples/vctk/tts3/run.sh (+25 -2)
examples/vctk/voc1/README.md (+7 -0)
examples/vctk/voc5/README.md (+6 -0)
paddlespeech/t2s/exps/inference.py (+16 -4)
paddlespeech/t2s/exps/ort_predict_e2e.py (+45 -6)
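As a quick sanity check, the per-file counts in the change list sum to the totals reported for the commit. A minimal sketch (the two lists transcribe the per-file +/- numbers above, in order):

```python
# Per-file addition/deletion counts, in the order of the change list above.
additions = [1, 3, 17, 6, 11, 32, 1, 25, 6, 5, 32, 9, 30, 28, 4, 11, 17,
             7, 30, 32, 1, 26, 7, 6, 6, 12, 34, 1, 25, 7, 6, 16, 45]
deletions = [1, 3, 17, 0, 0, 0, 0, 2, 0, 0, 8, 11, 8, 0, 2, 8, 12, 0, 0,
             0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 4, 6]

assert len(additions) == len(deletions) == 33  # 33 changed files
print(sum(additions), sum(deletions))  # 499 86
```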
README_cn.md

@@ -691,7 +691,7 @@ PaddleSpeech's **speech synthesis** mainly consists of three modules: text frontend, acoustic model, vocoder. (This hunk adds a missing space before "TTS" and a trailing full stop in the last acknowledgement item.)

- Many thanks to [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) for building a dubbing tool with a GUI on top of PaddleSpeech TTS models.
- Many thanks to [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) for a PaddleSpeech-based TTS GUI and related code for building datasets with ASR.
- Many thanks to [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) for a voice chatbot that can listen and speak, built with PaddleSpeech ASR and TTS.

In addition, PaddleSpeech depends on many open-source repositories; see [references](./docs/source/reference.md) for more information.
docs/source/install_cn.md

@@ -116,7 +116,7 @@ conda install -y -c gcc_linux-64=8.4.0 gxx_linux-64=8.4.0

```bash
python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple
```
### Install PaddleSpeech
Finally, install `paddlespeech`, so that you can use the examples shipped with `paddlespeech`:
```bash
# On some systems the default package index makes kaldiio fail to install; install pytest-runner first:
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```

@@ -137,7 +137,7 @@ (Docker is an open-source tool for running applications in an environment isolated from the host system.)

You can get these images, together with usage guides for the CPU, GPU and ROCm builds, from [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddlespeech). If you are interested in automating Docker image builds, or have custom requirements, see [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) for more. (The hunk only adds a space after "docker" in the Chinese text.)
Once that is done, you can run training, inference and hyper-parameter fine-tuning inside the Docker container.
### Option 2: Ubuntu with root privileges
- Install `build-essential` with apt

@@ -173,7 +173,7 @@ conda install -y -c conda-forge sox libsndfile swig bzip2 libflac bc

```bash
python3 -m pip install paddlepaddle-gpu==2.2.0 -i https://mirror.baidu.com/pypi/simple
```
### Install PaddleSpeech in developer mode
On some systems the default package index makes kaldiio fail to install; install pytest-runner first. (The hunk only adds a space after "kaldiio"; "安转" in the original is a typo for "安装", "install".)
```bash
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```
docs/source/released_model.md

# Released Models

## Speech-to-Text Models
(ASR model tables unchanged; collapsed in the page view)

@@ -34,32 +33,33 @@ Language Model | Training Data | Token-based | Size | Descriptions
The hunk renames the "Static Models" column to "Static/ONNX Models" and adds ONNX download links next to the existing static models. The updated tables:

## Text-to-Speech Models
### Acoustic Models
Model Type | Dataset| Example Link | Pretrained Models|Static/ONNX Models|Size (static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.2.0.zip)|||
Tacotron2|CSMSC|[tacotron2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts0)|[tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)|[tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)|103MB|
TransformerTTS| LJSpeech|[transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)|||
SpeedySpeech| CSMSC |[speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2)|[speedyspeech_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_ckpt_0.2.0.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)<br>[speedyspeech_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_onnx_0.2.0.zip)|13MB|
FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)<br>[fastspeech2_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_onnx_0.2.0.zip)|157MB|
FastSpeech2-Conformer| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_conformer_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip)|||
FastSpeech2-CNNDecoder| CSMSC|[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip)|[fastspeech2_cnndecoder_csmsc_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_static_1.0.0.zip)<br>[fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip)<br>[fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip)<br>[fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip)|84MB|
FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|[fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip)<br>[fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)|147MB|
FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|[fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip)<br>[fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)|145MB|
FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|[fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)<br>[fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip)|145MB|

### Vocoders
Model Type | Dataset| Example Link | Pretrained Models| Static/ONNX Models|Size (static)
:-----:| :-----:| :-----: | :-----:| :-----:| :-----:
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip)|||
Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)<br>[pwgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_csmsc_onnx_0.2.0.zip)|4.8MB|
Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)|[pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip)<br>[pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)|4.8MB|
Parallel WaveGAN| AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)|[pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip)<br>[pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip)|4.8MB|
Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.5.zip)|[pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip)<br>[pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)|4.8MB|
Multi Band MelGAN | CSMSC |[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3)|[mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)<br>[mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)|[mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip)<br>[mb_melgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip)|7.6MB|
Style MelGAN | CSMSC |[Style MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc4)|[style_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/style_melgan/style_melgan_csmsc_ckpt_0.1.1.zip)| | |
HiFiGAN | CSMSC |[HiFiGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5)|[hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)|[hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip)<br>[hifigan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_onnx_0.2.0.zip)|46MB|
HiFiGAN | LJSpeech |[HiFiGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc5)|[hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)|[hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip)<br>[hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip)|49MB|
HiFiGAN | AISHELL-3 |[HiFiGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc5)|[hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)|[hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip)<br>[hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip)|46MB|
HiFiGAN | VCTK |[HiFiGAN-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/voc5)|[hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)|[hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip)<br>[hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)|46MB|
WaveRNN | CSMSC |[WaveRNN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc6)|[wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip)|[wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip)|18MB|
examples/aishell3/tts3/README.md

@@ -220,6 +220,12 @@ Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
- [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution)

The static model can be downloaded here:
- [fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)

FastSpeech2 checkpoint contains files listed below.
examples/aishell3/tts3/local/inference.sh

@@ -17,3 +17,14 @@ The hunk appends a second stage (FastSpeech2 + HiFiGAN) after the existing stage 0, which ends with `--spk_id=0` and `fi`:

```bash
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../inference.py \
        --inference_dir=${train_output_path}/inference \
        --am=fastspeech2_aishell3 \
        --voc=hifigan_aishell3 \
        --text=${BIN_DIR}/../sentences.txt \
        --output_dir=${train_output_path}/pd_infer_out \
        --phones_dict=dump/phone_id_map.txt \
        --speaker_dict=dump/speaker_id_map.txt \
        --spk_id=0
fi
```
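All of these example scripts share the same `stage`/`stop_stage` gating: a numbered stage N runs when `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]`, i.e. stage <= N <= stop_stage. A hedged Python sketch of the same selection logic (names are illustrative, not from the repo):

```python
def stages_to_run(stage, stop_stage, all_stages):
    """Mirror the shell pattern `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]`:
    a numbered stage N executes when stage <= N <= stop_stage."""
    return [n for n in all_stages if stage <= n <= stop_stage]

# With stage=0 and stop_stage=1, both stages of a two-stage script run.
print(stages_to_run(0, 1, [0, 1]))  # [0, 1]
# With stage=1, stage 0 is skipped.
print(stages_to_run(1, 1, [0, 1]))  # [1]
```

Setting `stage` and `stop_stage` to the same value is how a single step (e.g. only the ONNX export) is run in isolation.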
examples/aishell3/tts3/local/ort_predict.sh (new file, mode 100755)

```bash
train_output_path=$1

stage=0
stop_stage=0

# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_aishell3 \
        --voc=pwgan_aishell3 \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --spk_id=0
fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_aishell3 \
        --voc=hifigan_aishell3 \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --spk_id=0
fi
```
examples/aishell3/tts3/local/paddle2onnx.sh (new file, mode 120000: a symlink)

Target: ../../../csmsc/tts3/local/paddle2onnx.sh (no newline at end of file)
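The git mode 120000 above means the new file is stored as a symbolic link whose content is the relative target path. A small sketch of where that link resolves, using the paths shown in the diff:

```python
import posixpath

# The link lives in examples/aishell3/tts3/local/ and points three directories
# up into the csmsc example, so aishell3 reuses csmsc's paddle2onnx.sh verbatim.
link_dir = "examples/aishell3/tts3/local"
target = "../../../csmsc/tts3/local/paddle2onnx.sh"

resolved = posixpath.normpath(posixpath.join(link_dir, target))
print(resolved)  # examples/csmsc/tts3/local/paddle2onnx.sh
```

Inside a checkout, the equivalent link could be recreated with `os.symlink(target, link_path)`.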
examples/aishell3/tts3/run.sh

@@ -27,11 +27,34 @@ The hunk extends the stage-2/3 comments with "by default" and appends stages 4-6 (static-model inference, Paddle-to-ONNX export, and onnxruntime inference):

```bash
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # synthesize, vocoder is pwgan by default
    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # synthesize_e2e, vocoder is pwgan by default
    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi

if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    # inference with static model, vocoder is pwgan by default
    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
fi

if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    # install paddle2onnx
    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
        pip install paddle2onnx==0.9.8
    fi
    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_aishell3
    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_aishell3
    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_aishell3
fi

# inference with onnxruntime, use fastspeech2 + hifigan by default
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
    ./local/ort_predict.sh ${train_output_path}
fi
```
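Stage 5 above pins paddle2onnx to 0.9.8 by parsing `pip list` with grep/awk and reinstalling on mismatch. The same check can be sketched with the Python standard library (a hedged alternative, not what the script itself does):

```python
from importlib import metadata

def needs_install(package, wanted):
    """True when the package is missing or at a different version,
    mirroring the `pip list | grep | awk` version check in run.sh."""
    try:
        return metadata.version(package) != wanted
    except metadata.PackageNotFoundError:
        return True

# A distribution name that certainly is not installed always needs installing.
print(needs_install("definitely-not-a-real-package", "0.9.8"))  # True
```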
examples/aishell3/voc1/README.md

@@ -133,6 +133,12 @@ optional arguments:
Pretrained models can be downloaded here:
- [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)

The static model can be downloaded here:
- [pwgan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [pwgan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_aishell3_onnx_1.1.0.zip)

Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss | eval/spectral_convergence_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
default| 1(gpu) x 400000|1.968762|0.759008|0.218524
examples/aishell3/voc5/README.md

@@ -116,6 +116,11 @@ optional arguments:
The pretrained model can be downloaded here:
- [hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)

The static model can be downloaded here:
- [hifigan_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [hifigan_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_onnx_1.1.0.zip)

Model | Step | eval/generator_loss | eval/mel_loss | eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
examples/csmsc/tts2/local/ort_predict.sh

@@ -3,22 +3,34 @@ train_output_path=$1
The hunk drops the old comment "# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now!" and the old metadata-based stage 0, making text-to-speech (e2e) the first stages and moving metadata synthesis to a new stage 3:

```bash
stage=0
stop_stage=0

# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=speedyspeech_csmsc \
        --voc=pwgan_csmsc \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --tones_dict=dump/tone_id_map.txt \
        --device=cpu \
        --cpu_threads=2
fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=speedyspeech_csmsc \
        --voc=mb_melgan_csmsc \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --tones_dict=dump/tone_id_map.txt \
        --device=cpu \
        --cpu_threads=2
fi

if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=speedyspeech_csmsc \
        # [... middle arguments collapsed in the page view ...]
        --device=cpu \
        --cpu_threads=2
fi
```

@@ -30,3 +42,15 @@

```bash
# synthesize from metadata
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    python3 ${BIN_DIR}/../ort_predict.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=speedyspeech_csmsc \
        --voc=hifigan_csmsc \
        --test_metadata=dump/test/norm/metadata.jsonl \
        --output_dir=${train_output_path}/onnx_infer_out \
        --device=cpu \
        --cpu_threads=2
fi
```
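The metadata stage reads `dump/test/norm/metadata.jsonl`, a JSON-Lines file with one utterance record per line. A minimal sketch of reading such a file (the field name and values here are illustrative, not taken from the dump format):

```python
import io
import json

def read_jsonl(fp):
    """Parse one JSON object per non-empty line, the JSON-Lines convention
    used by metadata.jsonl."""
    return [json.loads(line) for line in fp if line.strip()]

# Illustrative records; the real dump files also carry normalized feature paths.
sample = io.StringIO('{"utt_id": "009901"}\n{"utt_id": "009902"}\n')
records = read_jsonl(sample)
print([r["utt_id"] for r in records])  # ['009901', '009902']
```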
examples/csmsc/tts2/run.sh (view file @ 0fa3fdb9)

```diff
@@ -27,12 +27,12 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
@@ -46,19 +46,17 @@ fi
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx speedyspeech_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
 fi
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
     # install onnxruntime
     version=$(echo `pip list |grep "onnxruntime"` |awk -F" " '{print $2}')
     if [[ -z "$version" || ${version} != '1.10.0' ]]; then
         pip install onnxruntime==1.10.0
     fi
     ./local/ort_predict.sh ${train_output_path}
 fi
```
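The pin-or-install pattern repeated in these `run.sh` stages (query `pip list`, compare against a pinned version, install on mismatch) can be exercised in isolation. A minimal sketch; the `needs_install` helper name is ours, not part of the repo:

```shell
#!/bin/bash
# needs_install <installed-version> <wanted-version>
# Mirrors the scripts' check: install when the package is missing (empty
# string) or when the installed version differs from the pinned one.
needs_install() {
    local have="$1" want="$2"
    if [[ -z "${have}" || "${have}" != "${want}" ]]; then
        echo "install"
    else
        echo "skip"
    fi
}

needs_install ""      "0.9.8"   # missing
needs_install "0.9.5" "0.9.8"   # wrong pin
needs_install "0.9.8" "0.9.8"   # already pinned
```

The same decision could also be made with `pip install paddle2onnx==0.9.8` unconditionally; the scripts guard it only to skip the network round-trip when the pin is already satisfied.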
examples/csmsc/tts3/local/ort_predict.sh

```diff
@@ -3,22 +3,32 @@ train_output_path=$1
 stage=0
 stop_stage=0
-# only support default_fastspeech2/speedyspeech + hifigan/mb_melgan now!
-# synthesize from metadata
+# e2e, synthesize from text
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
-    python3 ${BIN_DIR}/../ort_predict.py \
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
         --inference_dir=${train_output_path}/inference_onnx \
         --am=fastspeech2_csmsc \
-        --voc=hifigan_csmsc \
-        --test_metadata=dump/test/norm/metadata.jsonl \
-        --output_dir=${train_output_path}/onnx_infer_out \
+        --voc=pwgan_csmsc \
+        --output_dir=${train_output_path}/onnx_infer_out_e2e \
+        --text=${BIN_DIR}/../csmsc_test.txt \
+        --phones_dict=dump/phone_id_map.txt \
         --device=cpu \
         --cpu_threads=2
 fi
-# e2e, synthesize from text
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
     python3 ${BIN_DIR}/../ort_predict_e2e.py \
         --inference_dir=${train_output_path}/inference_onnx \
         --am=fastspeech2_csmsc \
         --voc=mb_melgan_csmsc \
         --output_dir=${train_output_path}/onnx_infer_out_e2e \
         --text=${BIN_DIR}/../csmsc_test.txt \
         --phones_dict=dump/phone_id_map.txt \
         --device=cpu \
         --cpu_threads=2
 fi
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+    python3 ${BIN_DIR}/../ort_predict_e2e.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_csmsc \
@@ -29,3 +39,15 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --device=cpu \
         --cpu_threads=2
 fi
+# synthesize from metadata, take hifigan as an example
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+    python3 ${BIN_DIR}/../ort_predict.py \
+        --inference_dir=${train_output_path}/inference_onnx \
+        --am=fastspeech2_csmsc \
+        --voc=hifigan_csmsc \
+        --test_metadata=dump/test/norm/metadata.jsonl \
+        --output_dir=${train_output_path}/onnx_infer_out \
+        --device=cpu \
+        --cpu_threads=2
+fi
\ No newline at end of file
```
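Every script in this change gates its blocks on the same pair of tests: stage `N` runs iff `stage <= N <= stop_stage`. A tiny standalone illustration of that guard (the `runs_at` wrapper and the loop are ours; only the `-le`/`-ge` test comes from the scripts):

```shell
#!/bin/bash
# Same guard each block in run.sh / ort_predict.sh uses:
# stage N executes iff stage <= N <= stop_stage.
runs_at() {
    local n=$1
    [ ${stage} -le ${n} ] && [ ${stop_stage} -ge ${n} ]
}

stage=1
stop_stage=2
for n in 0 1 2 3; do
    if runs_at ${n}; then
        echo "stage ${n} runs"
    fi
done
```

Running the loop above prints only `stage 1 runs` and `stage 2 runs`, which is why callers can resume a pipeline from any point by passing `--stage`/`--stop_stage`.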
examples/csmsc/tts3/local/ort_predict_streaming.sh

@@ -5,6 +5,34 @@ stop_stage=0

```bash
# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../ort_predict_streaming.py \
        --inference_dir=${train_output_path}/inference_onnx_streaming \
        --am=fastspeech2_csmsc \
        --am_stat=dump/train/speech_stats.npy \
        --voc=pwgan_csmsc \
        --output_dir=${train_output_path}/onnx_infer_out_streaming \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --am_streaming=True
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../ort_predict_streaming.py \
        --inference_dir=${train_output_path}/inference_onnx_streaming \
        --am=fastspeech2_csmsc \
        --am_stat=dump/train/speech_stats.npy \
        --voc=mb_melgan_csmsc \
        --output_dir=${train_output_path}/onnx_infer_out_streaming \
        --text=${BIN_DIR}/../csmsc_test.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --am_streaming=True
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    python3 ${BIN_DIR}/../ort_predict_streaming.py \
        --inference_dir=${train_output_path}/inference_onnx_streaming \
        --am=fastspeech2_csmsc \
...
```
examples/csmsc/tts3/local/synthesize_streaming.sh

```diff
@@ -24,7 +24,8 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
         --text=${BIN_DIR}/../sentences.txt \
         --output_dir=${train_output_path}/test_e2e_streaming \
         --phones_dict=dump/phone_id_map.txt \
-        --am_streaming=True
+        --am_streaming=True \
+        --inference_dir=${train_output_path}/inference_streaming
 fi
 # for more GAN Vocoders
@@ -45,7 +46,8 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --text=${BIN_DIR}/../sentences.txt \
         --output_dir=${train_output_path}/test_e2e_streaming \
         --phones_dict=dump/phone_id_map.txt \
-        --am_streaming=True
+        --am_streaming=True \
+        --inference_dir=${train_output_path}/inference_streaming
 fi
 # the pretrained models haven't release now
```
examples/csmsc/tts3/run.sh

```diff
@@ -27,17 +27,17 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
 fi
@@ -46,15 +46,18 @@ fi
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
 fi
-# inference with onnxruntime, use fastspeech2 + hifigan by default
+# inference with onnxruntime
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
     ./local/ort_predict.sh ${train_output_path}
 fi
```
examples/csmsc/tts3/run_cnndecoder.sh

```diff
@@ -33,25 +33,25 @@ fi
 # synthesize_e2e non-streaming
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 # inference non-streaming
 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
 fi
 # synthesize_e2e streaming
 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_streaming.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 # inference streaming
 if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
-    # inference with static model
+    # inference with static model, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/inference_streaming.sh ${train_output_path} || exit -1
 fi
@@ -59,32 +59,37 @@ fi
 if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
 fi
-# onnxruntime non streaming
+# inference with onnxruntime, use fastspeech2 + hifigan by default
 if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
     ./local/ort_predict.sh ${train_output_path}
 fi
 # paddle2onnx streaming
 if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ]; then
     # install paddle2onnx
     version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '0.9.5' ]]; then
-        pip install paddle2onnx==0.9.5
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
     fi
     # streaming acoustic model
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_encoder_infer
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_decoder
     ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming fastspeech2_csmsc_am_postnet
     # vocoder
-    ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming pwgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming mb_melgan_csmsc
+    # ./local/paddle2onnx.sh ${train_output_path} inference_streaming inference_onnx_streaming hifigan_csmsc
 fi
 # onnxruntime streaming
```
examples/ljspeech/tts3/README.md

@@ -215,6 +215,13 @@ optional arguments:

Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)

The static model can be downloaded here:
- [fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)

Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss | eval/energy_loss
:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
default| 2(gpu) x 100000| 1.505682| 0.612104| 0.045505| 0.62792| 0.220147
examples/ljspeech/tts3/local/inference.sh (new file, mode 100755)

```bash
#!/bin/bash
train_output_path=$1
stage=0
stop_stage=0
# pwgan
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../inference.py \
        --inference_dir=${train_output_path}/inference \
        --am=fastspeech2_ljspeech \
        --voc=pwgan_ljspeech \
        --text=${BIN_DIR}/../sentences_en.txt \
        --output_dir=${train_output_path}/pd_infer_out \
        --phones_dict=dump/phone_id_map.txt \
        --lang=en
fi
# hifigan
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../inference.py \
        --inference_dir=${train_output_path}/inference \
        --am=fastspeech2_ljspeech \
        --voc=hifigan_ljspeech \
        --text=${BIN_DIR}/../sentences_en.txt \
        --output_dir=${train_output_path}/pd_infer_out \
        --phones_dict=dump/phone_id_map.txt \
        --lang=en
fi
```
examples/ljspeech/tts3/local/ort_predict.sh (new file, mode 100755)

```bash
train_output_path=$1
stage=0
stop_stage=0
# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_ljspeech \
        --voc=pwgan_ljspeech \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../sentences_en.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --lang=en
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_ljspeech \
        --voc=hifigan_ljspeech \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../sentences_en.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --lang=en
fi
```
examples/ljspeech/tts3/local/paddle2onnx.sh (new symlink, mode 120000)

Target: `../../../csmsc/tts3/local/paddle2onnx.sh` (no newline at end of file)
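Mode `120000` in the diff marks these new files as symlinks, so the ljspeech and vctk recipes reuse the csmsc conversion script instead of copying it. The layout can be reproduced under a scratch directory (paths below are illustrative, not repo commands):

```shell
#!/bin/bash
# Recreate the symlink arrangement mode 120000 denotes, in a temp tree.
root=$(mktemp -d)
mkdir -p "${root}/examples/csmsc/tts3/local" \
         "${root}/examples/ljspeech/tts3/local"
touch "${root}/examples/csmsc/tts3/local/paddle2onnx.sh"
# Relative target, exactly as stored in the repo:
ln -s ../../../csmsc/tts3/local/paddle2onnx.sh \
    "${root}/examples/ljspeech/tts3/local/paddle2onnx.sh"
readlink "${root}/examples/ljspeech/tts3/local/paddle2onnx.sh"
```

A relative target keeps the link valid wherever the repository is checked out, which an absolute path would not.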
examples/ljspeech/tts3/run.sh

```diff
@@ -27,11 +27,35 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+# paddle2onnx, please make sure the static models are in ${train_output_path}/inference first
+# we have only tested the following models so far
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+    # install paddle2onnx
+    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
+    fi
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_ljspeech
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_ljspeech
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_ljspeech
+fi
+# inference with onnxruntime, use fastspeech2 + hifigan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+    ./local/ort_predict.sh ${train_output_path}
+fi
```
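The version probe used throughout these `run.sh` scripts relies on `echo` re-joining the `pip list` line before `awk` picks the second field. The extraction step behaves the same on a canned line, so it can be checked without `pip` at all (the `line` value below is a made-up sample, not real `pip` output):

```shell
#!/bin/bash
# Simulate one line of `pip list` output; columns are space-padded.
line="paddle2onnx            0.9.8"
# The scripts' extraction: unquoted echo collapses runs of spaces, and
# awk -F" " (default whitespace splitting) takes the second field.
version=$(echo ${line} | awk -F" " '{print $2}')
echo "${version}"
```

Note that `grep "paddle2onnx"` in the real scripts also matches package names that merely contain the substring; anchoring the pattern (`grep "^paddle2onnx "`) would be stricter, but the scripts as merged use the substring match.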
examples/ljspeech/voc1/README.md

@@ -130,6 +130,13 @@ optional arguments:

Pretrained models can be downloaded here:
- [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)

The static model can be downloaded here:
- [pwgan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [pwgan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_ljspeech_onnx_1.1.0.zip)

Parallel WaveGAN checkpoint contains files listed below.
examples/ljspeech/voc5/README.md

@@ -115,6 +115,12 @@ optional arguments:

The pretrained model can be downloaded here:
- [hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)

The static model can be downloaded here:
- [hifigan_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [hifigan_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_onnx_1.1.0.zip)

Model | Step | eval/generator_loss | eval/mel_loss | eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
examples/vctk/tts3/README.md

@@ -218,6 +218,12 @@ optional arguments:

Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)

The static model can be downloaded here:
- [fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip)

FastSpeech2 checkpoint contains files listed below.
```text
fastspeech2_nosil_vctk_ckpt_0.5
...
```
examples/vctk/tts3/local/inference.sh

```diff
@@ -18,3 +18,15 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
         --lang=en
 fi
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    python3 ${BIN_DIR}/../inference.py \
+        --inference_dir=${train_output_path}/inference \
+        --am=fastspeech2_vctk \
+        --voc=hifigan_vctk \
+        --text=${BIN_DIR}/../sentences_en.txt \
+        --output_dir=${train_output_path}/pd_infer_out \
+        --phones_dict=dump/phone_id_map.txt \
+        --speaker_dict=dump/speaker_id_map.txt \
+        --spk_id=0 \
+        --lang=en
+fi
```
examples/vctk/tts3/local/ort_predict.sh (new file, mode 100755)

```bash
train_output_path=$1
stage=0
stop_stage=0
# e2e, synthesize from text
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_vctk \
        --voc=pwgan_vctk \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../sentences_en.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --spk_id=0 \
        --lang=en
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    python3 ${BIN_DIR}/../ort_predict_e2e.py \
        --inference_dir=${train_output_path}/inference_onnx \
        --am=fastspeech2_vctk \
        --voc=hifigan_vctk \
        --output_dir=${train_output_path}/onnx_infer_out_e2e \
        --text=${BIN_DIR}/../sentences_en.txt \
        --phones_dict=dump/phone_id_map.txt \
        --device=cpu \
        --cpu_threads=2 \
        --spk_id=0 \
        --lang=en
fi
```
examples/vctk/tts3/local/paddle2onnx.sh (new symlink, mode 120000)

Target: `../../../csmsc/tts3/local/paddle2onnx.sh` (no newline at end of file)
examples/vctk/tts3/run.sh

```diff
@@ -27,11 +27,34 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    # synthesize, vocoder is pwgan
+    # synthesize, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    # synthesize_e2e, vocoder is pwgan
+    # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+    # install paddle2onnx
+    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+    if [[ -z "$version" || ${version} != '0.9.8' ]]; then
+        pip install paddle2onnx==0.9.8
+    fi
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_vctk
+    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
+    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_vctk
+    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_vctk
+fi
+# inference with onnxruntime, use fastspeech2 + hifigan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+    ./local/ort_predict.sh ${train_output_path}
+fi
```
examples/vctk/voc1/README.md

@@ -135,6 +135,13 @@ optional arguments:

Pretrained models can be downloaded here:
- [pwg_vctk_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.1.1.zip)

The static model can be downloaded here:
- [pwgan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [pwgan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwgan_vctk_onnx_1.1.0.zip)

Parallel WaveGAN checkpoint contains files listed below.
examples/vctk/voc5/README.md

@@ -121,6 +121,12 @@ optional arguments:

The pretrained model can be downloaded here:
- [hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)

The static model can be downloaded here:
- [hifigan_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_static_1.1.0.zip)

The ONNX model can be downloaded here:
- [hifigan_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_onnx_1.1.0.zip)

Model | Step | eval/generator_loss | eval/mel_loss | eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
paddlespeech/t2s/exps/inference.py

```diff
@@ -35,8 +35,12 @@ def parse_args():
         type=str,
         default='fastspeech2_csmsc',
         choices=[
-            'speedyspeech_csmsc', 'fastspeech2_csmsc', 'fastspeech2_aishell3',
-            'fastspeech2_vctk', 'tacotron2_csmsc'
+            'speedyspeech_csmsc',
+            'fastspeech2_csmsc',
+            'fastspeech2_aishell3',
+            'fastspeech2_ljspeech',
+            'fastspeech2_vctk',
+            'tacotron2_csmsc',
         ],
         help='Choose acoustic model type of tts task.')
     parser.add_argument(
@@ -56,8 +60,16 @@ def parse_args():
         type=str,
         default='pwgan_csmsc',
         choices=[
-            'pwgan_csmsc', 'mb_melgan_csmsc', 'hifigan_csmsc', 'pwgan_aishell3',
-            'pwgan_vctk', 'wavernn_csmsc'
+            'pwgan_csmsc',
+            'pwgan_aishell3',
+            'pwgan_ljspeech',
+            'pwgan_vctk',
+            'mb_melgan_csmsc',
+            'hifigan_csmsc',
+            'hifigan_aishell3',
+            'hifigan_ljspeech',
+            'hifigan_vctk',
+            'wavernn_csmsc',
         ],
         help='Choose vocoder type of tts task.')
     # other
```
paddlespeech/t2s/exps/ort_predict_e2e.py

```diff
@@ -54,19 +54,31 @@ def ort_predict(args):
         device=args.device,
         cpu_threads=args.cpu_threads)

+    merge_sentences = True
     # frontend warmup
     # Loading model cost 0.5+ seconds
     if args.lang == 'zh':
-        frontend.get_input_ids("你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=True)
+        frontend.get_input_ids(
+            "你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=merge_sentences)
     else:
-        print("lang should in be 'zh' here!")
+        frontend.get_input_ids(
+            "hello, thank you, thank you very much",
+            merge_sentences=merge_sentences)

     # am warmup
+    spk_id = [args.spk_id]
     for T in [27, 38, 54]:
         am_input_feed = {}
         if am_name == 'fastspeech2':
-            phone_ids = np.random.randint(1, 266, size=(T, ))
+            if args.lang == 'en':
+                phone_ids = np.random.randint(1, 78, size=(T, ))
+            else:
+                phone_ids = np.random.randint(1, 266, size=(T, ))
             am_input_feed.update({'text': phone_ids})
+            if am_dataset in {"aishell3", "vctk"}:
+                am_input_feed.update({'spk_id': spk_id})
         elif am_name == 'speedyspeech':
             phone_ids = np.random.randint(1, 92, size=(T, ))
             tone_ids = np.random.randint(1, 5, size=(T, ))
@@ -96,12 +108,18 @@ def ort_predict(args):
             phone_ids = input_ids["phone_ids"]
             if get_tone_ids:
                 tone_ids = input_ids["tone_ids"]
+        elif args.lang == 'en':
+            input_ids = frontend.get_input_ids(
+                sentence, merge_sentences=merge_sentences)
+            phone_ids = input_ids["phone_ids"]
         else:
-            print("lang should in be 'zh' here!")
+            print("lang should in {'zh', 'en'}!")
         # merge_sentences=True here, so we only use the first item of phone_ids
         phone_ids = phone_ids[0].numpy()
         if am_name == 'fastspeech2':
             am_input_feed.update({'text': phone_ids})
+            if am_dataset in {"aishell3", "vctk"}:
+                am_input_feed.update({'spk_id': spk_id})
         elif am_name == 'speedyspeech':
             tone_ids = tone_ids[0].numpy()
             am_input_feed.update({'phones': phone_ids, 'tones': tone_ids})
@@ -130,19 +148,40 @@ def parse_args():
         '--am',
         type=str,
         default='fastspeech2_csmsc',
-        choices=['fastspeech2_csmsc', 'speedyspeech_csmsc'],
+        choices=[
+            'fastspeech2_csmsc',
+            'fastspeech2_aishell3',
+            'fastspeech2_ljspeech',
+            'fastspeech2_vctk',
+            'speedyspeech_csmsc',
+        ],
         help='Choose acoustic model type of tts task.')
     parser.add_argument(
         "--phones_dict", type=str, default=None, help="phone vocabulary file.")
     parser.add_argument(
         "--tones_dict", type=str, default=None, help="tone vocabulary file.")
+    parser.add_argument(
+        '--spk_id',
+        type=int,
+        default=0,
+        help='spk id for multi speaker acoustic model')
     # voc
     parser.add_argument(
         '--voc',
         type=str,
         default='hifigan_csmsc',
-        choices=['hifigan_csmsc', 'mb_melgan_csmsc', 'pwgan_csmsc'],
+        choices=[
+            'pwgan_csmsc',
+            'pwgan_aishell3',
+            'pwgan_ljspeech',
+            'pwgan_vctk',
+            'hifigan_csmsc',
+            'hifigan_aishell3',
+            'hifigan_ljspeech',
+            'hifigan_vctk',
+            'mb_melgan_csmsc',
+        ],
         help='Choose vocoder type of tts task.')
     # other
     parser.add_argument(
```