@@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
...
@@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.
You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
### 2. Prepare config File
...
@@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
...
@@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference)
Currently the engine type supports two forms: python and inference (Paddle Inference)
**Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
@@ -10,7 +10,7 @@ This demo is an implementation of starting the streaming speech synthesis servic
...
@@ -10,7 +10,7 @@ This demo is an implementation of starting the streaming speech synthesis servic
### 1. Installation
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.
You can choose one way from meduim and hard to install paddlespeech.
...
@@ -29,6 +29,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
...
@@ -29,6 +29,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- When the voc model is mb_melgan, when voc_pad=14, the synthetic audio for streaming inference is consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio has no abnormal hearing. If the voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is mb_melgan, when voc_pad=14, the synthetic audio for streaming inference is consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio has no abnormal hearing. If the voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is hifigan, when voc_pad=20, the streaming inference synthetic audio is consistent with the non-streaming synthetic audio; when voc_pad=14, the synthetic audio has no abnormal hearing.
- When the voc model is hifigan, when voc_pad=20, the streaming inference synthetic audio is consistent with the non-streaming synthetic audio; when voc_pad=14, the synthetic audio has no abnormal hearing.
-**Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.
### 3. Streaming speech synthesis server and client using http protocol
### 3. Streaming speech synthesis server and client using http protocol
...
@@ -120,6 +122,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
...
@@ -120,6 +122,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` do not take effect in streaming speech synthesis service temporarily.
Output:
Output:
```bash
```bash
...
@@ -254,6 +257,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
...
@@ -254,6 +257,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` do not take effect in streaming speech synthesis service temporarily.
First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started
First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started.
**Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.