paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**Batch Process**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
```
**Shell Pipeline**
- ASR + Punctuation Restoration
```
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
```
For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
...
...
@@ -561,6 +578,9 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
- Many thanks to [745165806](https://github.com/745165806)/[PaddleSpeechTask](https://github.com/745165806/PaddleSpeechTask) for contributing Punctuation Restoration model.
- Many thanks to [kslz](https://github.com/745165806) for supplementary Chinese documents.
- Many thanks to [awmmmm](https://github.com/awmmmm) for contributing fastspeech2 aishell3 conformer pretrained model.
- Many thanks to [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) for developing a dubbing tool with GUI based on PaddleSpeech TTS model.
- Many thanks to [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) for developing a GUI tool based on PaddleSpeech TTS and code for making datasets from videos based on PaddleSpeech ASR.
Besides, PaddleSpeech depends on a lot of open source repositories. See [references](./docs/source/reference.md) for more information.
@@ -10,11 +10,23 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose one way from easy, meduim and hard to install paddlespeech.
### 2. Prepare config File
The configuration file contains the service-related configuration files and the model configuration related to the voice tasks contained in the service. They are all under the `conf` folder.
**Note: The configuration of `engine_backend` in `application.yaml` represents all speech tasks included in the started service.**
If the service you want to start contains only a certain speech task, then you need to comment out the speech tasks that do not need to be included. For example, if you only want to use the speech recognition (ASR) service, then you can comment out the speech synthesis (TTS) service, as in the following example:
```bash
engine_backend:
asr: 'conf/asr/asr.yaml'
#tts: 'conf/tts/tts.yaml'
```
**Note: The configuration file of `engine_backend` in `application.yaml` needs to match the configuration type of `engine_type`.**
When the configuration file of `engine_backend` is `XXX.yaml`, the configuration type of `engine_type` needs to be set to `python`; when the configuration file of `engine_backend` is `XXX_pd.yaml`, the configuration of `engine_type` needs to be set type is `inference`;
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for thisASR client demo that can be downloaded:
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
Response time: 0.388317 s.
RTF: 0.107493
```
## Pretrained Models
## Models supported by the service
### ASR model
Here is a list of [ASR pretrained models](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README.md#4pretrained-models) released by PaddleSpeech, both command line and python interfaces are available:
| Model | Language | Sample Rate
| :--- | :---: | :---: |
| conformer_wenetspeech| zh| 16000
| transformer_librispeech| en| 16000
Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for paddle inference inference.
### TTS model
Here is a list of [TTS pretrained models](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/text_to_speech/README.md#4-pretrained-models) released by PaddleSpeech, both command line and python interfaces are available:
- Acoustic model
| Model | Language
| :--- | :---: |
| speedyspeech_csmsc| zh
| fastspeech2_csmsc| zh
| fastspeech2_aishell3| zh
| fastspeech2_ljspeech| en
| fastspeech2_vctk| en
- Vocoder
| Model | Language
| :--- | :---: |
| pwgan_csmsc| zh
| pwgan_aishell3| zh
| pwgan_ljspeech| en
| pwgan_vctk| en
| mb_melgan_csmsc| zh
Here is a list of **TTS pretrained static models** released by PaddleSpeech, both command line and python interfaces are available:
- Acoustic model
| Model | Language
| :--- | :---: |
| speedyspeech_csmsc| zh
| fastspeech2_csmsc| zh
- Vocoder
| Model | Language
| :--- | :---: |
| pwgan_csmsc| zh
| mb_melgan_csmsc| zh
| hifigan_csmsc| zh
Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for paddle inference inference.
"2022-02-25 11:34:34.143 | INFO | paddlespeech.s2t.modules.loss:__init__:41 - CTCLoss Loss reduction: sum, div-bs: True\n",
"2022-02-25 11:34:34.143 | INFO | paddlespeech.s2t.modules.loss:__init__:42 - CTCLoss Grad Norm Type: instance\n",
"2022-02-25 11:34:34.144 | INFO | paddlespeech.s2t.modules.loss:__init__:73 - CTCLoss() kwargs:{'norm_by_times': True}, not support: {'norm_by_batchsize': False, 'norm_by_total_logits_len': False}\n",
"loss 159.17205810546875\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:253: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int32, the right dtype will convert to paddle.float32\n",
9.`--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
Pretrained FastSpeech2 model with no silence in the edge of audios. [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
Pretrained FastSpeech2 model with no silence in the edge of audios:
-[fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip)(Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution)