未验证 提交 663cfc01 编写于 作者: 小湉湉's avatar 小湉湉 提交者: GitHub

Merge pull request #2189 from yt605155624/fix_name_bug

[doc]update server readme
此差异已折叠。
此差异已折叠。
...@@ -8,7 +8,7 @@ http://0.0.0.0:8010/docs ...@@ -8,7 +8,7 @@ http://0.0.0.0:8010/docs
### 【POST】/asr/offline ### 【POST】/asr/offline
说明:上传16k,16bit wav文件,返回 offline 语音识别模型识别结果 说明:上传 16k, 16bit wav 文件,返回 offline 语音识别模型识别结果
返回: JSON 返回: JSON
...@@ -26,11 +26,11 @@ http://0.0.0.0:8010/docs ...@@ -26,11 +26,11 @@ http://0.0.0.0:8010/docs
### 【POST】/asr/offlinefile ### 【POST】/asr/offlinefile
说明:上传16k,16bit wav文件,返回 offline 语音识别模型识别结果 + wav数据的base64 说明:上传16k,16bit wav文件,返回 offline 语音识别模型识别结果 + wav 数据的 base64
返回: JSON 返回: JSON
前端接口: 音频文件识别(播放这段base64还原后记得添加wav头,采样率16k, int16,添加后才能播放) 前端接口: 音频文件识别(播放这段base64还原后记得添加 wav 头,采样率 16k, int16,添加后才能播放)
示例: 示例:
...@@ -48,7 +48,7 @@ http://0.0.0.0:8010/docs ...@@ -48,7 +48,7 @@ http://0.0.0.0:8010/docs
### 【POST】/asr/collectEnv ### 【POST】/asr/collectEnv
说明: 通过采集环境噪音,上传16k, int16 wav文件,来生成后台VAD的能量阈值, 返回阈值结果 说明: 通过采集环境噪音,上传 16k, int16 wav 文件,来生成后台 VAD 的能量阈值, 返回阈值结果
前端接口:ASR-环境采样 前端接口:ASR-环境采样
...@@ -64,9 +64,9 @@ http://0.0.0.0:8010/docs ...@@ -64,9 +64,9 @@ http://0.0.0.0:8010/docs
### 【GET】/asr/stopRecord ### 【GET】/asr/stopRecord
说明:通过 GET 请求 /asr/stopRecord, 后台停止接收 offlineStream 中通过 WS协议 上传的数据 说明:通过 GET 请求 /asr/stopRecord, 后台停止接收 offlineStream 中通过 WS 协议 上传的数据
前端接口:语音聊天-暂停录音(获取NLP,播放TTS时暂停) 前端接口:语音聊天-暂停录音(获取 NLP,播放 TTS 时暂停)
返回: JSON 返回: JSON
...@@ -80,9 +80,9 @@ http://0.0.0.0:8010/docs ...@@ -80,9 +80,9 @@ http://0.0.0.0:8010/docs
### 【GET】/asr/resumeRecord ### 【GET】/asr/resumeRecord
说明:通过 GET 请求 /asr/resumeRecord, 后台停止接收 offlineStream 中通过 WS协议 上传的数据 说明:通过 GET 请求 /asr/resumeRecord, 后台停止接收 offlineStream 中通过 WS 协议 上传的数据
前端接口:语音聊天-恢复录音(TTS播放完毕时,告诉后台恢复录音) 前端接口:语音聊天-恢复录音( TTS 播放完毕时,告诉后台恢复录音)
返回: JSON 返回: JSON
...@@ -100,16 +100,16 @@ http://0.0.0.0:8010/docs ...@@ -100,16 +100,16 @@ http://0.0.0.0:8010/docs
前端接口:语音聊天-开始录音,持续将麦克风语音传给后端,后端推送语音识别结果 前端接口:语音聊天-开始录音,持续将麦克风语音传给后端,后端推送语音识别结果
返回:后端返回识别结果,offline模型识别结果, 由WS推送 返回:后端返回识别结果,offline 模型识别结果, 由WS推送
### 【Websocket】/ws/asr/onlineStream ### 【Websocket】/ws/asr/onlineStream
说明:通过 WS 协议,将前端音频持续上传到后台,前端采集 16k,Int16 类型的PCM片段,持续上传到后端 说明:通过 WS 协议,将前端音频持续上传到后台,前端采集 16k,Int16 类型的 PCM 片段,持续上传到后端
前端接口:ASR-流式识别开始录音,持续将麦克风语音传给后端,后端推送语音识别结果 前端接口:ASR-流式识别开始录音,持续将麦克风语音传给后端,后端推送语音识别结果
返回:后端返回识别结果,online模型识别结果, 由WS推送 返回:后端返回识别结果,online 模型识别结果, 由 WS 推送
## NLP ## NLP
...@@ -202,7 +202,7 @@ http://0.0.0.0:8010/docs ...@@ -202,7 +202,7 @@ http://0.0.0.0:8010/docs
### 【POST】/tts/offline ### 【POST】/tts/offline
说明:获取TTS离线模型音频 说明:获取 TTS 离线模型音频
前端接口:TTS-端到端合成 前端接口:TTS-端到端合成
...@@ -272,7 +272,7 @@ curl -X 'POST' \ ...@@ -272,7 +272,7 @@ curl -X 'POST' \
### 【POST】/vpr/recog ### 【POST】/vpr/recog
说明:声纹识别,识别文件,提取文件的声纹信息做比对 音频 16k, int 16 wav格式 说明:声纹识别,识别文件,提取文件的声纹信息做比对 音频 16k, int 16 wav 格式
前端接口:声纹识别-上传音频,返回声纹识别结果 前端接口:声纹识别-上传音频,返回声纹识别结果
...@@ -383,9 +383,9 @@ curl -X 'GET' \ ...@@ -383,9 +383,9 @@ curl -X 'GET' \
### 【GET】/vpr/database64 ### 【GET】/vpr/database64
说明: 根据 vpr_id 获取用户vpr时注册使用音频转换成 16k, int16 类型的数组,返回base64编码 说明: 根据 vpr_id 获取用户 vpr 时注册使用音频转换成 16k, int16 类型的数组,返回 base64 编码
前端接口:声纹识别-获取vpr对应的音频(注意:播放时需要添加 wav头,16k,int16, 可参考tts播放时添加wav的方式,注意更改采样率) 前端接口:声纹识别-获取 vpr 对应的音频(注意:播放时需要添加 wav头,16k,int16, 可参考 tts 播放时添加 wav 的方式,注意更改采样率)
访问示例: 访问示例:
...@@ -401,6 +401,4 @@ curl -X 'GET' \ ...@@ -401,6 +401,4 @@ curl -X 'GET' \
"code": 0, "code": 0,
"result":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", "result":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
"message": "ok" "message": "ok"
``` ```
\ No newline at end of file
# Paddle Speech Demo # Paddle Speech Demo
PaddleSpeechDemo是一个以PaddleSpeech的语音交互功能为主体开发的Demo展示项目,用于帮助大家更好的上手PaddleSpeech以及使用PaddleSpeech构建自己的应用。 PaddleSpeechDemo 是一个以 PaddleSpeech 的语音交互功能为主体开发的 Demo 展示项目,用于帮助大家更好的上手 PaddleSpeech 以及使用 PaddleSpeech 构建自己的应用。
智能语音交互部分使用PaddleSpeech,对话以及信息抽取部分使用PaddleNLP,网页前端展示部分基于Vue3进行开发 智能语音交互部分使用 PaddleSpeech,对话以及信息抽取部分使用 PaddleNLP,网页前端展示部分基于 Vue3 进行开发
主要功能: 主要功能:
+ 语音聊天:PaddleSpeech的语音识别能力+语音合成能力,对话部分基于PaddleNLP的闲聊功能 + 语音聊天:PaddleSpeech 的语音识别能力+语音合成能力,对话部分基于 PaddleNLP 的闲聊功能
+ 声纹识别:PaddleSpeech的声纹识别功能展示 + 声纹识别:PaddleSpeech 的声纹识别功能展示
+ 语音识别:支持【实时语音识别】,【端到端识别】,【音频文件识别】三种模式 + 语音识别:支持【实时语音识别】,【端到端识别】,【音频文件识别】三种模式
+ 语音合成:支持【流式合成】与【端到端合成】两种方式 + 语音合成:支持【流式合成】与【端到端合成】两种方式
+ 语音指令:基于PaddleSpeech的语音识别能力与PaddleNLP的信息抽取,实现交通费的智能报销 + 语音指令:基于 PaddleSpeech 的语音识别能力与 PaddleNLP 的信息抽取,实现交通费的智能报销
运行效果: 运行效果:
...@@ -32,23 +32,21 @@ cd model ...@@ -32,23 +32,21 @@ cd model
wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
``` ```
### 前端环境安装 ### 前端环境安装
前端依赖node.js ,需要提前安装,确保npm可用,npm测试版本8.3.1,建议下载[官网](https://nodejs.org/en/)稳定版的node.js 前端依赖 `node.js` ,需要提前安装,确保 `npm` 可用,`npm` 测试版本 `8.3.1`,建议下载[官网](https://nodejs.org/en/)稳定版的 `node.js`
``` ```
# 进入前端目录 # 进入前端目录
cd web_client cd web_client
# 安装yarn,已经安装可跳过 # 安装 `yarn`,已经安装可跳过
npm install -g yarn npm install -g yarn
# 使用yarn安装前端依赖 # 使用yarn安装前端依赖
yarn install yarn install
``` ```
## 启动服务 ## 启动服务
### 开启后端服务 ### 开启后端服务
...@@ -66,18 +64,18 @@ cd web_client ...@@ -66,18 +64,18 @@ cd web_client
yarn dev --port 8011 yarn dev --port 8011
``` ```
默认配置下,前端中配置的后台地址信息是localhost,确保后端服务器和打开页面的游览器在同一台机器上,不在一台机器的配置方式见下方的FAQ:【后端如果部署在其它机器或者别的端口如何修改】 默认配置下,前端中配置的后台地址信息是 localhost,确保后端服务器和打开页面的游览器在同一台机器上,不在一台机器的配置方式见下方的 FAQ:【后端如果部署在其它机器或者别的端口如何修改】
## FAQ ## FAQ
#### Q: 如何安装node.js #### Q: 如何安装node.js
A: node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nodejs/nodejs-install-setup.html), 确保npm可用 A: node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nodejs/nodejs-install-setup.html), 确保 npm 可用
#### Q:后端如果部署在其它机器或者别的端口如何修改 #### Q:后端如果部署在其它机器或者别的端口如何修改
A:后端的配置地址有分散在两个文件中 A:后端的配置地址有分散在两个文件中
修改第一个文件`PaddleSpeechWebClient/vite.config.js` 修改第一个文件 `PaddleSpeechWebClient/vite.config.js`
``` ```
server: { server: {
...@@ -92,7 +90,7 @@ server: { ...@@ -92,7 +90,7 @@ server: {
} }
``` ```
修改第二个文件`PaddleSpeechWebClient/src/api/API.js`(Websocket代理配置失败,所以需要在这个文件中修改) 修改第二个文件 `PaddleSpeechWebClient/src/api/API.js`( Websocket 代理配置失败,所以需要在这个文件中修改)
``` ```
// websocket (这里改成后端所在的接口) // websocket (这里改成后端所在的接口)
...@@ -107,9 +105,6 @@ A:这里主要是游览器安全策略的限制,需要配置游览器后重 ...@@ -107,9 +105,6 @@ A:这里主要是游览器安全策略的限制,需要配置游览器后重
chrome设置地址: chrome://flags/#unsafely-treat-insecure-origin-as-secure chrome设置地址: chrome://flags/#unsafely-treat-insecure-origin-as-secure
## 参考资料 ## 参考资料
vue实现录音参考资料:https://blog.csdn.net/qq_41619796/article/details/107865602#t1 vue实现录音参考资料:https://blog.csdn.net/qq_41619796/article/details/107865602#t1
......
...@@ -5,15 +5,19 @@ ...@@ -5,15 +5,19 @@
## Introduction ## Introduction
This demo is an implementation of starting the streaming speech synthesis service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python. This demo is an implementation of starting the streaming speech synthesis service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
For service interface definition, please check:
- [PaddleSpeech Server RESTful API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-RESTful-API)
- [PaddleSpeech Streaming Server WebSocket API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-WebSocket-API)
## Usage ## Usage
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.2** or above. It is recommended to use **paddlepaddle 2.3.1** or above.
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, meduim and hard to install paddlespeech.
**If you install in simple mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.**
**If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.**
### 2. Prepare config File ### 2. Prepare config File
The configuration file can be found in `conf/tts_online_application.yaml`. The configuration file can be found in `conf/tts_online_application.yaml`.
...@@ -29,11 +33,10 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -29,11 +33,10 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- Both hifigan and mb_melgan support streaming voc inference. - Both hifigan and mb_melgan support streaming voc inference.
- When the voc model is mb_melgan, when voc_pad=14, the synthetic audio for streaming inference is consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio has no abnormal hearing. If the voc_pad is less than 7, the synthetic audio sounds abnormal. - When the voc model is mb_melgan, when voc_pad=14, the synthetic audio for streaming inference is consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio has no abnormal hearing. If the voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is hifigan, when voc_pad=19, the streaming inference synthetic audio is consistent with the non-streaming synthetic audio; when voc_pad=14, the synthetic audio has no abnormal hearing. - When the voc model is hifigan, when voc_pad=19, the streaming inference synthetic audio is consistent with the non-streaming synthetic audio; when voc_pad=14, the synthetic audio has no abnormal hearing.
- Pad calculation method of streaming vocoder in PaddleSpeech: [AIStudio tutorial](https://aistudio.baidu.com/aistudio/projectdetail/4151335)
- Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan - Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
- **Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address. - **Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.
### 3. Streaming speech synthesis server and client using http protocol ### 3. Streaming speech synthesis server and client using http protocol
#### 3.1 Server Usage #### 3.1 Server Usage
- Command Line (Recommended) - Command Line (Recommended)
...@@ -53,7 +56,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -53,7 +56,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `log_file`: log file. Default: ./log/paddlespeech.log - `log_file`: log file. Default: ./log/paddlespeech.log
Output: Output:
```bash ```text
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s [2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s [2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s [2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
...@@ -79,8 +82,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -79,8 +82,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
log_file="./log/paddlespeech.log") log_file="./log/paddlespeech.log")
``` ```
Output: Output:
```bash ```text
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s [2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s [2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s [2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
...@@ -93,8 +96,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -93,8 +96,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
#### 3.2 Streaming TTS client Usage #### 3.2 Streaming TTS client Usage
...@@ -125,7 +126,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -125,7 +126,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- Currently, only the single-speaker model is supported in the code, so `spk_id` does not take effect. Streaming TTS does not support changing sample rate, variable speed and volume. - Currently, only the single-speaker model is supported in the code, so `spk_id` does not take effect. Streaming TTS does not support changing sample rate, variable speed and volume.
Output: Output:
```bash ```text
[2022-04-24 21:08:18,559] [ INFO] - tts http client start [2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s [2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
...@@ -154,7 +155,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -154,7 +155,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
``` ```
Output: Output:
```bash ```text
[2022-04-24 21:11:13,798] [ INFO] - tts http client start [2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s [2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
...@@ -164,7 +165,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -164,7 +165,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav [2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
``` ```
### 4. Streaming speech synthesis server and client using websocket protocol ### 4. Streaming speech synthesis server and client using websocket protocol
#### 4.1 Server Usage #### 4.1 Server Usage
- Command Line (Recommended) - Command Line (Recommended)
...@@ -184,21 +184,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -184,21 +184,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `log_file`: log file. Default: ./log/paddlespeech.log - `log_file`: log file. Default: ./log/paddlespeech.log
Output: Output:
```bash ```text
[2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s [2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
[2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s [2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
[2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s [2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
[2022-04-27 10:18:09,325] [ INFO] - ********************************************************************** [2022-04-27 10:18:09,325] [ INFO] - **********************************************************************
INFO: Started server process [17600] INFO: Started server process [17600]
[2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600] [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
- Python API - Python API
...@@ -212,20 +210,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -212,20 +210,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
``` ```
Output: Output:
```bash ```text
[2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s [2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
[2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s [2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
[2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s [2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
[2022-04-27 10:20:16,878] [ INFO] - ********************************************************************** [2022-04-27 10:20:16,878] [ INFO] - **********************************************************************
INFO: Started server process [23466] INFO: Started server process [23466]
[2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466] [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
#### 4.2 Streaming TTS client Usage #### 4.2 Streaming TTS client Usage
...@@ -258,7 +255,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -258,7 +255,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
Output: Output:
```bash ```text
[2022-04-27 10:21:04,262] [ INFO] - tts websocket client start [2022-04-27 10:21:04,262] [ INFO] - tts websocket client start
[2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s [2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s
...@@ -266,7 +263,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -266,7 +263,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s [2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812 [2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812
[2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav [2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav
``` ```
- Python API - Python API
...@@ -283,21 +279,15 @@ The configuration file can be found in `conf/tts_online_application.yaml`. ...@@ -283,21 +279,15 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
spk_id=0, spk_id=0,
output="./output.wav", output="./output.wav",
play=False) play=False)
``` ```
Output: Output:
```bash ```text
[2022-04-27 10:22:48,852] [ INFO] - tts websocket client start [2022-04-27 10:22:48,852] [ INFO] - tts websocket client start
[2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s [2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s
[2022-04-27 10:22:52,100] [ INFO] - 尾包响应:3.2304444313049316 s [2022-04-27 10:22:52,100] [ INFO] - 尾包响应:3.2304444313049316 s
[2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s [2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762 [2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762
[2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav [2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav
``` ```
...@@ -3,15 +3,19 @@ ...@@ -3,15 +3,19 @@
# 流式语音合成服务 # 流式语音合成服务
## 介绍 ## 介绍
这个demo是一个启动流式语音合成服务和访问该服务的实现。 它可以通过使用`paddlespeech_server``paddlespeech_client`的单个命令或 python 的几行代码来实现。 这个 demo 是一个启动流式语音合成服务和访问该服务的实现。 它可以通过使用 `paddlespeech_server``paddlespeech_client` 的单个命令或 python 的几行代码来实现。
服务接口定义请参考:
- [PaddleSpeech Server RESTful API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-RESTful-API)
- [PaddleSpeech Streaming Server WebSocket API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-WebSocket-API)
## 使用方法 ## 使用方法
### 1. 安装 ### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). 请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
推荐使用 **paddlepaddle 2.2.2** 或以上版本。 推荐使用 **paddlepaddle 2.3.1** 或以上版本。
你可以从简单,中等,困难几种方式中选择一种方式安装 PaddleSpeech。
你可以从简单,中等,困难 几种方式中选择一种方式安装 PaddleSpeech。
**如果使用简单模式安装,需要自行准备 yaml 文件,可参考 conf 目录下的 yaml 文件。** **如果使用简单模式安装,需要自行准备 yaml 文件,可参考 conf 目录下的 yaml 文件。**
### 2. 准备配置文件 ### 2. 准备配置文件
...@@ -20,19 +24,20 @@ ...@@ -20,19 +24,20 @@
- `engine_list` 表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型> - `engine_list` 表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>
- 该 demo 主要介绍流式语音合成服务,因此语音任务应设置为 tts。 - 该 demo 主要介绍流式语音合成服务,因此语音任务应设置为 tts。
- 目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用 onnxruntime 进行推理的引擎。其中,online-onnx 的推理速度更快。 - 目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用 onnxruntime 进行推理的引擎。其中,online-onnx 的推理速度更快。
- 流式 TTS 引擎的 AM 模型支持:**fastspeech2 以及fastspeech2_cnndecoder**; Voc 模型支持:**hifigan, mb_melgan** - 流式 TTS 引擎的 AM 模型支持:**fastspeech2 以及 fastspeech2_cnndecoder**; Voc 模型支持:**hifigan, mb_melgan**
- 流式 am 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `am_block` 表示 chunk 中的有效帧数,`am_pad` 表示一个 chunk 中 am_block 前后各加的帧数。am_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。 - 流式 am 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `am_block` 表示 chunk 中的有效帧数,`am_pad` 表示一个 chunk 中 am_block 前后各加的帧数。am_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- fastspeech2 不支持流式 am 推理,因此 am_pad 与 m_block 对它无效 - fastspeech2 不支持流式 am 推理,因此 am_pad 与 am_block 对它无效
- fastspeech2_cnndecoder 支持流式推理,当 am_pad=12 时,流式推理合成音频与非流式合成音频一致 - fastspeech2_cnndecoder 支持流式推理,当 am_pad=12 时,流式推理合成音频与非流式合成音频一致
- 流式 voc 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `voc_block` 表示chunk中的有效帧数,`voc_pad` 表示一个 chunk 中 voc_block 前后各加的帧数。voc_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。 - 流式 voc 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `voc_block` 表示 chunk 中的有效帧数,`voc_pad` 表示一个 chunk 中 voc_block 前后各加的帧数。voc_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- hifigan, mb_melgan 均支持流式 voc 推理 - hifigan, mb_melgan 均支持流式 voc 推理
- 当 voc 模型为 mb_melgan,当 voc_pad=14 时,流式推理合成音频与非流式合成音频一致;voc_pad 最小可以设置为7,合成音频听感上没有异常,若 voc_pad 小于7,合成音频听感上存在异常。 - 当 voc 模型为 mb_melgan,当 voc_pad=14 时,流式推理合成音频与非流式合成音频一致;voc_pad 最小可以设置为7,合成音频听感上没有异常,若 voc_pad 小于7,合成音频听感上存在异常。
- 当 voc 模型为 hifigan,当 voc_pad=19 时,流式推理合成音频与非流式合成音频一致;当 voc_pad=14 时,合成音频听感上没有异常。 - 当 voc 模型为 hifigan,当 voc_pad=19 时,流式推理合成音频与非流式合成音频一致;当 voc_pad=14 时,合成音频听感上没有异常。
- PaddleSpeech 中流式声码器 Pad 计算方法: [AIStudio 教程](https://aistudio.baidu.com/aistudio/projectdetail/4151335)
- 推理速度:mb_melgan > hifigan; 音频质量:mb_melgan < hifigan - 推理速度:mb_melgan > hifigan; 音频质量:mb_melgan < hifigan
- **注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。 - **注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。
### 3. 使用http协议的流式语音合成服务端及客户端使用方法 ### 3. 使用 http 协议的流式语音合成服务端及客户端使用方法
#### 3.1 服务端使用方法 #### 3.1 服务端使用方法
- 命令行 (推荐使用) - 命令行 (推荐使用)
...@@ -51,7 +56,7 @@ ...@@ -51,7 +56,7 @@
- `log_file`: log 文件. 默认:./log/paddlespeech.log - `log_file`: log 文件. 默认:./log/paddlespeech.log
输出: 输出:
```bash ```text
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s [2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s [2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s [2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
...@@ -64,7 +69,6 @@ ...@@ -64,7 +69,6 @@
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
- Python API - Python API
...@@ -77,8 +81,8 @@ ...@@ -77,8 +81,8 @@
log_file="./log/paddlespeech.log") log_file="./log/paddlespeech.log")
``` ```
输出 输出:
```bash ```text
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s [2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s [2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s [2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
...@@ -91,8 +95,6 @@ ...@@ -91,8 +95,6 @@
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
#### 3.2 客户端使用方法 #### 3.2 客户端使用方法
...@@ -124,7 +126,7 @@ ...@@ -124,7 +126,7 @@
输出: 输出:
```bash ```text
[2022-04-24 21:08:18,559] [ INFO] - tts http client start [2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s [2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
...@@ -162,9 +164,8 @@ ...@@ -162,9 +164,8 @@
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238 [2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav [2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
``` ```
### 4. 使用websocket协议的流式语音合成服务端及客户端使用方法 ### 4. 使用 websocket 协议的流式语音合成服务端及客户端使用方法
#### 4.1 服务端使用方法 #### 4.1 服务端使用方法
- 命令行 (推荐使用) - 命令行 (推荐使用)
首先修改配置文件 `conf/tts_online_application.yaml`**将 `protocol` 设置为 `websocket`** 首先修改配置文件 `conf/tts_online_application.yaml`**将 `protocol` 设置为 `websocket`**
...@@ -183,21 +184,19 @@ ...@@ -183,21 +184,19 @@
- `log_file`: log 文件. 默认:./log/paddlespeech.log - `log_file`: log 文件. 默认:./log/paddlespeech.log
输出: 输出:
```bash ```text
[2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s [2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
[2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s [2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
[2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s [2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
[2022-04-27 10:18:09,325] [ INFO] - ********************************************************************** [2022-04-27 10:18:09,325] [ INFO] - **********************************************************************
INFO: Started server process [17600] INFO: Started server process [17600]
[2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600] [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
- Python API - Python API
...@@ -210,27 +209,26 @@ ...@@ -210,27 +209,26 @@
log_file="./log/paddlespeech.log") log_file="./log/paddlespeech.log")
``` ```
输出: 输出:
```bash ```text
[2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s [2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
[2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s [2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
[2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s [2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
[2022-04-27 10:20:16,878] [ INFO] - ********************************************************************** [2022-04-27 10:20:16,878] [ INFO] - **********************************************************************
INFO: Started server process [23466] INFO: Started server process [23466]
[2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466] [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
#### 4.2 客户端使用方法 #### 4.2 客户端使用方法
- 命令行 (推荐使用) - 命令行 (推荐使用)
访问 websocket 流式TTS服务: 访问 websocket 流式 TTS 服务:
若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
...@@ -255,9 +253,8 @@ ...@@ -255,9 +253,8 @@
- 目前代码中只支持单说话人的模型,因此 spk_id 的选择并不生效。流式 TTS 不支持更换采样率,变速和变音量等功能。 - 目前代码中只支持单说话人的模型,因此 spk_id 的选择并不生效。流式 TTS 不支持更换采样率,变速和变音量等功能。
输出: 输出:
```bash ```text
[2022-04-27 10:21:04,262] [ INFO] - tts websocket client start [2022-04-27 10:21:04,262] [ INFO] - tts websocket client start
[2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s [2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s
...@@ -265,7 +262,6 @@ ...@@ -265,7 +262,6 @@
[2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s [2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812 [2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812
[2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav [2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav
``` ```
- Python API - Python API
...@@ -282,11 +278,10 @@ ...@@ -282,11 +278,10 @@
spk_id=0, spk_id=0,
output="./output.wav", output="./output.wav",
play=False) play=False)
``` ```
输出: 输出:
```bash ```text
[2022-04-27 10:22:48,852] [ INFO] - tts websocket client start [2022-04-27 10:22:48,852] [ INFO] - tts websocket client start
[2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 [2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s [2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s
...@@ -294,8 +289,4 @@ ...@@ -294,8 +289,4 @@
[2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s [2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762 [2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762
[2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav [2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav
``` ```
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册