diff --git a/README.md b/README.md
index 5c62925e20ae1e2f1cc14909fc45b127201f0e3f..2fb281e7f83637a5cbc90bd1d75358870c3c41eb 100644
--- a/README.md
+++ b/README.md
@@ -159,15 +159,20 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
-- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md)、[PP-TTS](./docs/source/tts/PPTTS.md)、[PP-VPR](docs/source/vpr/PPVPR.md)
-- 👏🏻 2022.05.06: `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp`.
-- 👏🏻 2022.05.06: `Server` is available for `Speaker Verification`, and `Punctuation Restoration`.
-- 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.
-- 👏🏻 2022.03.28: `Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`.
-- 👏🏻 2022.03.28: `CLI` is available for `Speaker Verification`.
+- ⚡ 2022.08.25: Release the TTS [finetune](./examples/other/tts_finetune/tts3) example.
+- 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat), [ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat), [ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat).
+- 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) to the TTS Chinese Text Frontend.
+- 🔥 2022.08.09: Release [Chinese-English mixed TTS](./examples/zh_en_tts/tts3).
+- ⚡ 2022.08.03: Add ONNXRuntime inference for the TTS CLI.
+- 🎉 2022.07.18: Release VITS models: [VITS-csmsc](./examples/csmsc/vits), [VITS-aishell3](./examples/aishell3/vits), [VITS-VC](./examples/aishell3/vits-vc).
+- 🎉 2022.06.22: All TTS models now support the ONNX format.
+- 🍀 2022.06.17: Add the [PaddleSpeech Web Demo](./demos/speech_web).
+- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md), [PP-TTS](./docs/source/tts/PPTTS.md) and [PP-VPR](docs/source/vpr/PPVPR.md).
+- 👏🏻 2022.05.06: `PaddleSpeech Streaming Server` is available for `Streaming ASR` (with `Punctuation Restoration` and `Token Timestamp`) and `Text-to-Speech`.
+- 👏🏻 2022.05.06: `PaddleSpeech Server` is available for `Audio Classification`, `Automatic Speech Recognition`, `Text-to-Speech`, `Speaker Verification` and `Punctuation Restoration`.
+- 👏🏻 2022.03.28: `PaddleSpeech CLI` is available for `Speaker Verification`.
- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
-- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
-
+- 👏🏻 2021.12.10: `PaddleSpeech CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
### Community
- Scan the QR code below with your Wechat, you can access to official technical exchange group and get the bonus ( more than 20GB learning materials, such as papers, codes and videos ) and the live link of the lessons. Look forward to your participation.
@@ -599,49 +604,56 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
- HiFiGAN |
- LJSpeech / VCTK / CSMSC / AISHELL-3 |
+ HiFiGAN |
+ LJSpeech / VCTK / CSMSC / AISHELL-3 |
HiFiGAN-ljspeech / HiFiGAN-vctk / HiFiGAN-csmsc / HiFiGAN-aishell3
|
- WaveRNN |
- CSMSC |
+ WaveRNN |
+ CSMSC |
WaveRNN-csmsc
|
- Voice Cloning |
+ Voice Cloning |
GE2E |
Librispeech, etc. |
- ge2e
+ GE2E
+ |
+
+
+ SV2TTS (GE2E + Tacotron2) |
+ AISHELL-3 |
+
+ VC0
|
- GE2E + Tacotron2 |
+ SV2TTS (GE2E + FastSpeech2) |
AISHELL-3 |
- ge2e-tacotron2-aishell3
+ VC1
|
- GE2E + FastSpeech2 |
+ SV2TTS (ECAPA-TDNN + FastSpeech2) |
AISHELL-3 |
- ge2e-fastspeech2-aishell3
+ VC2
|
GE2E + VITS |
AISHELL-3 |
- ge2e-vits-aishell3
+ VITS-VC
|
-
+
End-to-End |
VITS |
CSMSC / AISHELL-3 |
@@ -876,8 +888,9 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
## Acknowledgement
-- Many thanks to [david-95](https://github.com/david-95) improved TTS, fixed multi-punctuation bug, and contributed to multiple program and data.
-- Many thanks to [BarryKCL](https://github.com/BarryKCL) improved TTS Chinses frontend based on [G2PW](https://github.com/GitYCC/g2pW)
+- Many thanks to [HighCWu](https://github.com/HighCWu) for adding the [VITS-aishell3](./examples/aishell3/vits) and [VITS-VC](./examples/aishell3/vits-vc) examples.
+- Many thanks to [david-95](https://github.com/david-95) for improving TTS, fixing a multi-punctuation bug, and contributing multiple programs and data.
+- Many thanks to [BarryKCL](https://github.com/BarryKCL) for improving the TTS Chinese frontend based on [G2PW](https://github.com/GitYCC/g2pW).
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
diff --git a/README_cn.md b/README_cn.md
index 21cd00a99a7b85103ac85de63c6fa6ae8a9ac2ba..590124648bc6a133931b6248ebb9d07064781317 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -181,12 +181,20 @@
### 近期更新
-
+- ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
+- 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。
+- 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。
+- 🔥 2022.08.09: 发布[中英文混合 TTS](./examples/zh_en_tts/tts3)。
+- ⚡ 2022.08.03: TTS CLI 新增 ONNXRuntime 推理方式。
+- 🎉 2022.07.18: 发布 VITS 模型: [VITS-csmsc](./examples/csmsc/vits)、[VITS-aishell3](./examples/aishell3/vits)、[VITS-VC](./examples/aishell3/vits-vc)。
+- 🎉 2022.06.22: 所有 TTS 模型支持了 ONNX 格式。
+- 🍀 2022.06.17: 新增 [PaddleSpeech 网页应用](./demos/speech_web)。
- 👑 2022.05.13: PaddleSpeech 发布 [PP-ASR](./docs/source/asr/PPASR_cn.md) 流式语音识别系统、[PP-TTS](./docs/source/tts/PPTTS_cn.md) 流式语音合成系统、[PP-VPR](docs/source/vpr/PPVPR_cn.md) 全链路声纹识别系统
-- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳),和语音合成。
-- 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别,标点恢复。
-- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成,声纹验证。
-- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线!覆盖了语音识别(标点恢复、时间戳)和语音合成。
+- 👏🏻 2022.05.06: PaddleSpeech Server 上线!覆盖了声音分类、语音识别、语音合成、声纹识别,标点恢复。
+- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成和声纹验证。
+- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) 和 [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) 可在 Hugging Face Spaces 上体验!
+- 👏🏻 2021.12.10: PaddleSpeech CLI 支持声音分类、语音识别、语音翻译(英译中)和语音合成。
### 🔥 加入技术交流群获取入群福利
@@ -237,7 +245,6 @@ pip install .
## 快速开始
-
安装完成后,开发者可以通过命令行或者 Python 快速开始,命令行模式下改变 `--input` 可以尝试用自己的音频或文本测试,支持 16k wav 格式音频。
你也可以在 `aistudio` 中快速体验 👉🏻[一键预测,快速上手 Speech 开发任务](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660878142250)。
@@ -624,34 +631,40 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- 声音克隆 |
+ 声音克隆 |
GE2E |
Librispeech, etc. |
- ge2e
+ GE2E
|
- GE2E + Tacotron2 |
+ SV2TTS (GE2E + Tacotron2) |
AISHELL-3 |
- ge2e-tacotron2-aishell3
+ VC0
|
- GE2E + FastSpeech2 |
+ SV2TTS (GE2E + FastSpeech2) |
AISHELL-3 |
- ge2e-fastspeech2-aishell3
+ VC1
|
- GE2E + VITS |
+ SV2TTS (ECAPA-TDNN + FastSpeech2) |
AISHELL-3 |
- ge2e-vits-aishell3
+ VC2
|
+
+ GE2E + VITS |
+ AISHELL-3 |
+
+ VITS-VC
+ |
端到端 |
@@ -896,8 +909,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
## 致谢
-- 非常感谢 [david-95](https://github.com/david-95)修复句尾多标点符号出错的问题,补充frontend语音polyphonic 数据,贡献补充多条程序和数据
-- 非常感谢 [BarryKCL](https://github.com/BarryKCL)基于[G2PW](https://github.com/GitYCC/g2pW)对TTS中文文本前端的优化。
+- 非常感谢 [HighCWu](https://github.com/HighCWu) 新增 [VITS-aishell3](./examples/aishell3/vits) 和 [VITS-VC](./examples/aishell3/vits-vc) 代码示例。
+- 非常感谢 [david-95](https://github.com/david-95) 修复句尾多标点符号出错的问题,并贡献多条程序和数据。
+- 非常感谢 [BarryKCL](https://github.com/BarryKCL) 基于 [G2PW](https://github.com/GitYCC/g2pW) 对 TTS 中文文本前端的优化。
- 非常感谢 [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) 多年来的关注和建议,以及在诸多问题上的帮助。
- 非常感谢 [mymagicpower](https://github.com/mymagicpower) 采用PaddleSpeech 对 ASR 的[短语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk)及[长语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk)进行 Java 实现。
- 非常感谢 [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) 采用 PaddleSpeech 语音合成功能实现 Virtual Uploader(VUP)/Virtual YouTuber(VTuber) 虚拟主播。
diff --git a/demos/text_to_speech/README.md b/demos/text_to_speech/README.md
index 3288ecf2f07ddeb028a83b2f33b1f13a62975928..41dcf820b08cfbe894a8b68f16ae552ea73609c9 100644
--- a/demos/text_to_speech/README.md
+++ b/demos/text_to_speech/README.md
@@ -16,8 +16,8 @@ You can choose one way from easy, medium and hard to install paddlespeech.
The input of this demo should be a text of the specific language that can be passed via argument.
### 3. Usage
- Command Line (Recommended)
+ The default acoustic model is `Fastspeech2`, the default vocoder is `HiFiGAN`, and the default inference method is dygraph inference.
- Chinese
- The default acoustic model is `Fastspeech2`, and the default vocoder is `Parallel WaveGAN`.
```bash
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
```
@@ -58,6 +58,20 @@ The input of this demo should be a text of the specific language that can be pas
paddlespeech tts --am fastspeech2_mix --voc pwgan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175_pwgan.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175.wav
```
+ - Use ONNXRuntime infer:
+ ```bash
+ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output default.wav --use_onnx True
+ paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output ss.wav --use_onnx True
+ paddlespeech tts --voc mb_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output mb.wav --use_onnx True
+ paddlespeech tts --voc pwgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_aishell3 --voc hifigan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_hifigan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_ljspeech --voc hifigan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_hifigan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_vctk --voc hifigan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_hifigan.wav --use_onnx True
+ ```
+
Usage:
```bash
@@ -80,6 +94,8 @@ The input of this demo should be a text of the specific language that can be pas
- `lang`: Language of tts task. Default: `zh`.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
- `output`: Output wave filepath. Default: `output.wav`.
+ - `use_onnx`: Whether to use ONNXRuntime inference.
+ - `fs`: Sample rate for ONNX models when using user-specified model files.
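  The `--use_onnx` and `--fs` flags compose with the model-selection flags above. As a rough sketch of how the documented options combine into one invocation (the `build_tts_cmd` helper below is purely illustrative and not part of the PaddleSpeech CLI):

  ```python
  import shlex

  def build_tts_cmd(text, output, am=None, voc=None, lang=None,
                    spk_id=None, use_onnx=False, fs=None):
      # Assemble a `paddlespeech tts` command line from the options
      # documented above. Hypothetical helper, for illustration only.
      args = ["paddlespeech", "tts", "--input", text, "--output", output]
      if am:
          args += ["--am", am]
      if voc:
          args += ["--voc", voc]
      if lang:
          args += ["--lang", lang]
      if spk_id is not None:
          args += ["--spk_id", str(spk_id)]
      if use_onnx:
          # The CLI examples pass an explicit boolean value.
          args += ["--use_onnx", "True"]
      if fs:
          # Only meaningful together with user-specified ONNX model files.
          args += ["--fs", str(fs)]
      return " ".join(shlex.quote(a) for a in args)

  print(build_tts_cmd("你好,欢迎使用百度飞桨深度学习框架!", "default.wav",
                      voc="hifigan_csmsc", use_onnx=True))
  ```

  Omitted options fall back to the CLI defaults listed above, so only the flags that differ from the defaults need to be passed.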
Output:
```bash
@@ -87,38 +103,50 @@ The input of this demo should be a text of the specific language that can be pas
```
- Python API
- ```python
- import paddle
- from paddlespeech.cli.tts import TTSExecutor
-
- tts_executor = TTSExecutor()
- wav_file = tts_executor(
- text='今天的天气不错啊',
- output='output.wav',
- am='fastspeech2_csmsc',
- am_config=None,
- am_ckpt=None,
- am_stat=None,
- spk_id=0,
- phones_dict=None,
- tones_dict=None,
- speaker_dict=None,
- voc='pwgan_csmsc',
- voc_config=None,
- voc_ckpt=None,
- voc_stat=None,
- lang='zh',
- device=paddle.get_device())
- print('Wave file has been generated: {}'.format(wav_file))
- ```
-
+ - Dygraph infer:
+ ```python
+ import paddle
+ from paddlespeech.cli.tts import TTSExecutor
+ tts_executor = TTSExecutor()
+ wav_file = tts_executor(
+ text='今天的天气不错啊',
+ output='output.wav',
+ am='fastspeech2_csmsc',
+ am_config=None,
+ am_ckpt=None,
+ am_stat=None,
+ spk_id=0,
+ phones_dict=None,
+ tones_dict=None,
+ speaker_dict=None,
+ voc='pwgan_csmsc',
+ voc_config=None,
+ voc_ckpt=None,
+ voc_stat=None,
+ lang='zh',
+ device=paddle.get_device())
+ print('Wave file has been generated: {}'.format(wav_file))
+ ```
+ - ONNXRuntime infer:
+ ```python
+ from paddlespeech.cli.tts import TTSExecutor
+ tts_executor = TTSExecutor()
+ wav_file = tts_executor(
+ text='对数据集进行预处理',
+ output='output.wav',
+ am='fastspeech2_csmsc',
+ voc='hifigan_csmsc',
+ lang='zh',
+ use_onnx=True,
+ cpu_threads=2)
+ ```
+
Output:
```bash
Wave file has been generated: output.wav
```
### 4. Pretrained Models
-
Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
- Acoustic model
diff --git a/demos/text_to_speech/README_cn.md b/demos/text_to_speech/README_cn.md
index ec5eb5ae92d421c6fb3790e9df9ddd9480ae9026..4a4132238f63feb5fe86aa2f0821cc481cd99e6a 100644
--- a/demos/text_to_speech/README_cn.md
+++ b/demos/text_to_speech/README_cn.md
@@ -1,26 +1,23 @@
(简体中文|[English](./README.md))
# 语音合成
-
## 介绍
语音合成是一种自然语言建模过程,其将文本转换为语音以进行音频演示。
这个 demo 是一个从给定文本生成音频的实现,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
-
## 使用方法
### 1. 安装
请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
-你可以从 easy,medium,hard 三中方式中选择一种方式安装。
+你可以从 easy,medium,hard 三种方式中选择一种方式安装。
### 2. 准备输入
这个 demo 的输入是通过参数传递的特定语言的文本。
### 3. 使用方法
- 命令行 (推荐使用)
+ 默认的声学模型是 `Fastspeech2`,默认的声码器是 `HiFiGAN`,默认推理方式是动态图推理。
- 中文
-
- 默认的声学模型是 `Fastspeech2`,默认的声码器是 `Parallel WaveGAN`.
```bash
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
```
@@ -61,6 +58,19 @@
paddlespeech tts --am fastspeech2_mix --voc pwgan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175_pwgan.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175.wav
```
+ - 使用 ONNXRuntime 推理:
+ ```bash
+ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output default.wav --use_onnx True
+ paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output ss.wav --use_onnx True
+ paddlespeech tts --voc mb_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output mb.wav --use_onnx True
+ paddlespeech tts --voc pwgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_aishell3 --voc hifigan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_hifigan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_ljspeech --voc hifigan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_hifigan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_pwgan.wav --use_onnx True
+ paddlespeech tts --am fastspeech2_vctk --voc hifigan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_hifigan.wav --use_onnx True
+ ```
使用方法:
@@ -84,6 +94,8 @@
- `lang`:TTS 任务的语言, 默认值:`zh`。
- `device`:执行预测的设备, 默认值:当前系统下 paddlepaddle 的默认 device。
- `output`:输出音频的路径, 默认值:`output.wav`。
+ - `use_onnx`:是否使用 ONNXRuntime 进行推理。
+ - `fs`:使用特定 ONNX 模型时的采样率。
输出:
```bash
@@ -91,31 +103,44 @@
```
- Python API
- ```python
- import paddle
- from paddlespeech.cli.tts import TTSExecutor
-
- tts_executor = TTSExecutor()
- wav_file = tts_executor(
- text='今天的天气不错啊',
- output='output.wav',
- am='fastspeech2_csmsc',
- am_config=None,
- am_ckpt=None,
- am_stat=None,
- spk_id=0,
- phones_dict=None,
- tones_dict=None,
- speaker_dict=None,
- voc='pwgan_csmsc',
- voc_config=None,
- voc_ckpt=None,
- voc_stat=None,
- lang='zh',
- device=paddle.get_device())
- print('Wave file has been generated: {}'.format(wav_file))
- ```
-
+ - 动态图推理:
+ ```python
+ import paddle
+ from paddlespeech.cli.tts import TTSExecutor
+ tts_executor = TTSExecutor()
+ wav_file = tts_executor(
+ text='今天的天气不错啊',
+ output='output.wav',
+ am='fastspeech2_csmsc',
+ am_config=None,
+ am_ckpt=None,
+ am_stat=None,
+ spk_id=0,
+ phones_dict=None,
+ tones_dict=None,
+ speaker_dict=None,
+ voc='pwgan_csmsc',
+ voc_config=None,
+ voc_ckpt=None,
+ voc_stat=None,
+ lang='zh',
+ device=paddle.get_device())
+ print('Wave file has been generated: {}'.format(wav_file))
+ ```
+ - ONNXRuntime 推理:
+ ```python
+ from paddlespeech.cli.tts import TTSExecutor
+ tts_executor = TTSExecutor()
+ wav_file = tts_executor(
+ text='对数据集进行预处理',
+ output='output.wav',
+ am='fastspeech2_csmsc',
+ voc='hifigan_csmsc',
+ lang='zh',
+ use_onnx=True,
+ cpu_threads=2)
+ ```
+
输出:
```bash
Wave file has been generated: output.wav