diff --git a/README.md b/README.md
index 379550cee4ea66b9ff7b48ed5d74a266731cd55e..2ade8a69ce0ad8cc9b9af77b76e87c9ba5e90b7b 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,10 @@
([简体中文](./README_cn.md)|English)
+
+
-
-
-------------------------------------------------------------------------------------
-
@@ -28,6 +19,20 @@
+
+
+
**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
@@ -142,47 +147,40 @@ For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech sample
-### ⭐ Examples
-- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
-
-
-
-- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
-- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
-
-
-

-
-
-### 🔥 Hot Activities
-
-- 2021.12.21~12.24
-
- 4 Days Live Courses: Depth interpretation of PaddleSpeech!
-
- **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Features
Via the easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial application and academic research, covering training, inference & testing modules, and the deployment process. More specifically, this toolkit features:
-- 📦 **Ease of Use**: low barriers to install, and [CLI](#quick-start) is available to quick-start your journey.
+- 📦 **Ease of Use**: low barriers to install; the [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) are available to quick-start your journey.
- 🏆 **Align to the State-of-the-Art**: we provide high-speed and ultra-lightweight models, and also cutting-edge technology.
+- 🏆 **Streaming ASR and TTS System**: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
-- **Varieties of Functions that Vitalize both Industrial and Academia**:
- - 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
+- 📦 **Varieties of Functions that Vitalize both Industry and Academia**:
+  - 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
- 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
+- 👑 2022.05.13: Released [PP-ASR](./docs/source/asr/PPASR.md), [PP-TTS](./docs/source/tts/PPTTS.md), and [PP-VPR](docs/source/vpr/PPVPR.md).
+- 👏🏻 2022.05.06: `Streaming ASR` now supports `Punctuation Restoration` and `Token Timestamp`.
+- 👏🏻 2022.05.06: `Server` is available for `Speaker Verification` and `Punctuation Restoration`.
+- 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻 2022.03.28: `Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻 2022.03.28: `CLI` is available for `Speaker Verification`.
+- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
+
+### 🔥 Hot Activities
-- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
-- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
-- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
-- 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.
+
+- 2021.12.21~12.24
+
+  4-Day Live Courses: In-depth interpretation of PaddleSpeech!
+
+ **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Community
- Scan the QR code below with your WeChat (reply 【语音】 after your friend request is approved) to join the official technical exchange group. We look forward to your participation.
@@ -196,6 +194,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
We strongly recommend that users install PaddleSpeech on **Linux** with *python>=3.7*.
So far, **Linux** supports the CLI for all of our tasks, while **Mac OSX** and **Windows** only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text, and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
+
## Quick Start
@@ -238,7 +237,7 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou
**Batch Process**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
-```
+```
**Shell Pipeline**
- ASR + Punctuation Restoration
@@ -257,16 +256,19 @@ If you want to try more functions like training and tuning, please have a look a
Developers can try our speech server with the [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
**Start server**
+
```shell
paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
```
**Access Speech Recognition Services**
+
```shell
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
**Access Text to Speech Services**
+
```shell
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
@@ -280,6 +282,37 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
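
A rough Python counterpart of the client commands above, for reference. The executor names and keyword arguments are assumptions modeled on the `ACSClientExecutor` usage added later in this PR (`paddlespeech.server.bin.paddlespeech_client`); ports and file names are placeholders.

```python
# Sketch only: assumed client executors from paddlespeech.server.bin.paddlespeech_client.
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor, TTSClientExecutor

asr_client = ASRClientExecutor()
res = asr_client(input="./input_16k.wav", server_ip="127.0.0.1", port=8090)  # speech -> text
print(res)

tts_client = TTSClientExecutor()
tts_client(input="您好,欢迎使用百度飞桨语音合成服务。", server_ip="127.0.0.1",
           port=8090, output="output.wav")  # text -> speech, saved to output.wav
```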
+
+## Quick Start Streaming Server
+
+Developers can try the [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md) servers.
+
+**Start Streaming Speech Recognition Server**
+
+```shell
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**Access Streaming Speech Recognition Services**
+
+```shell
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**Start Streaming Text to Speech Server**
+
+```shell
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**Access Streaming Text to Speech Services**
+
+```shell
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+For more information, please see: [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md).
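
The streaming services have analogous Python clients. A minimal sketch follows; `ASROnlineClientExecutor` and its keyword arguments are assumptions modeled on the HTTP clients above and are not verified against this PR.

```python
# Sketch only: assumed streaming ASR client executor.
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor

asr_online = ASROnlineClientExecutor()
res = asr_online(input="./input_16k.wav", server_ip="127.0.0.1", port=8090)  # streaming recognition
print(res)
```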
+
## Model List
@@ -296,7 +329,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Speech-to-Text Module Type |
Dataset |
Model Type |
- Link |
+ Example |
@@ -371,7 +404,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Text-to-Speech Module Type |
Model Type |
Dataset |
- Link |
+ Example |
@@ -489,7 +522,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -514,7 +547,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -539,7 +572,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -589,6 +622,21 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components.
+
+## ⭐ Examples
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
+
+
+
+- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
+
+
+

+
+
+
## Citation
To cite PaddleSpeech for research, please use the following format.
@@ -655,7 +703,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
## Acknowledgement
-
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
diff --git a/README_cn.md b/README_cn.md
index 228d5d783dcddf0bf491c2a1334a7c0922e7c5f0..f5ba93629d897b793ffb45a145dc8aa37dcde8bb 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -2,26 +2,45 @@
-
-------------------------------------------------------------------------------------
-
+
+
+
+
+
+
+
+------------------------------------------------------------------------------------
+
+
+
+
+
+
**PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型,一些典型的应用示例如下:
##### 语音识别
@@ -57,7 +78,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
我认为跑步最重要的就是给我带来了身体健康。 |
-
@@ -143,47 +163,39 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
-### ⭐ 应用案例
-- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
-
-
-
-- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
-
-- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
-
-

-
-
-### 🔥 热门活动
-
-- 2021.12.21~12.24
-
- 4 日直播课: 深度解读 PaddleSpeech 语音技术!
-
- **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### 特性
本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括
- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。
- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。
+- 🏆 **流式ASR和TTS系统**:工业级的端到端流式识别、流式合成系统。
- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换(G2P)。此外,我们使用自定义语言规则来适应中文语境。
- **多种工业界以及学术界主流功能支持**:
- - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。
+ - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成、声纹识别、KWS等任务的实现。
- 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC,详情请见 [模型列表](#model-list)。
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。
+
### 近期更新
-- 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。
-- 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。
-- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
-- 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。
+- 👑 2022.05.13: PaddleSpeech 发布 [PP-ASR](./docs/source/asr/PPASR_cn.md)、[PP-TTS](./docs/source/tts/PPTTS_cn.md)、[PP-VPR](docs/source/vpr/PPVPR_cn.md)
+- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳)和语音合成。
+- 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别、标点恢复。
+- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成、声纹验证。
+- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+
+### 🔥 热门活动
+
+- 2021.12.21~12.24
+
+ 4 日直播课: 深度解读 PaddleSpeech 语音技术!
+
+ **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
+
### 技术交流群
微信扫描二维码(好友申请通过后回复【语音】)加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
@@ -192,11 +204,13 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
+
## 安装
我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
目前为止,**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能,**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节,可以参考[安装文档](./docs/source/install_cn.md)。
+
## 快速开始
安装完成后,开发者可以通过命令行快速开始,改变 `--input` 可以尝试用自己的音频或文本测试。
@@ -232,7 +246,7 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!
**批处理**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
-```
+```
**Shell管道**
ASR + Punc:
@@ -269,6 +283,38 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+## 快速使用流式服务
+
+开发者可以尝试 [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md) 服务。
+
+**启动流式ASR服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**访问流式ASR服务**
+
+```
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**启动流式TTS服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**访问流式TTS服务**
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+更多信息参看: [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md)
+
+
## 模型列表
PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。
@@ -282,8 +328,8 @@ PaddleSpeech 的 **语音转文本** 包含语音识别声学模型、语音识
语音转文本模块类型 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -356,9 +402,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
语音合成模块类型 |
- 模型种类 |
+ 模型类型 |
数据集 |
- 链接 |
+ 脚本 |
@@ -474,8 +520,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
任务 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -498,10 +544,10 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- Task |
- Dataset |
- Model Type |
- Link |
+ 任务 |
+ 数据集 |
+ 模型类型 |
+ 脚本 |
@@ -525,8 +571,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
任务 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -582,6 +628,21 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet),现在与此仓库合并。如果您对该任务的学术研究感兴趣,请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外,[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。
+## ⭐ 应用案例
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
+
+
+
+- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
+
+
+

+
+
+
## 引用
要引用 PaddleSpeech 进行研究,请使用以下格式进行引用。
@@ -658,6 +719,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。
+
此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。
## License
diff --git a/paddleaudio/.gitignore b/audio/.gitignore
similarity index 100%
rename from paddleaudio/.gitignore
rename to audio/.gitignore
diff --git a/paddleaudio/CHANGELOG.md b/audio/CHANGELOG.md
similarity index 100%
rename from paddleaudio/CHANGELOG.md
rename to audio/CHANGELOG.md
diff --git a/paddleaudio/README.md b/audio/README.md
similarity index 100%
rename from paddleaudio/README.md
rename to audio/README.md
diff --git a/paddleaudio/docs/Makefile b/audio/docs/Makefile
similarity index 100%
rename from paddleaudio/docs/Makefile
rename to audio/docs/Makefile
diff --git a/paddleaudio/docs/README.md b/audio/docs/README.md
similarity index 100%
rename from paddleaudio/docs/README.md
rename to audio/docs/README.md
diff --git a/paddleaudio/docs/images/paddle.png b/audio/docs/images/paddle.png
similarity index 100%
rename from paddleaudio/docs/images/paddle.png
rename to audio/docs/images/paddle.png
diff --git a/paddleaudio/docs/make.bat b/audio/docs/make.bat
similarity index 100%
rename from paddleaudio/docs/make.bat
rename to audio/docs/make.bat
diff --git a/paddleaudio/docs/source/_static/custom.css b/audio/docs/source/_static/custom.css
similarity index 100%
rename from paddleaudio/docs/source/_static/custom.css
rename to audio/docs/source/_static/custom.css
diff --git a/paddleaudio/docs/source/_templates/module.rst_t b/audio/docs/source/_templates/module.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/module.rst_t
rename to audio/docs/source/_templates/module.rst_t
diff --git a/paddleaudio/docs/source/_templates/package.rst_t b/audio/docs/source/_templates/package.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/package.rst_t
rename to audio/docs/source/_templates/package.rst_t
diff --git a/paddleaudio/docs/source/_templates/toc.rst_t b/audio/docs/source/_templates/toc.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/toc.rst_t
rename to audio/docs/source/_templates/toc.rst_t
diff --git a/paddleaudio/docs/source/conf.py b/audio/docs/source/conf.py
similarity index 100%
rename from paddleaudio/docs/source/conf.py
rename to audio/docs/source/conf.py
diff --git a/paddleaudio/docs/source/index.rst b/audio/docs/source/index.rst
similarity index 100%
rename from paddleaudio/docs/source/index.rst
rename to audio/docs/source/index.rst
diff --git a/paddleaudio/paddleaudio/__init__.py b/audio/paddleaudio/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/__init__.py
rename to audio/paddleaudio/__init__.py
diff --git a/paddleaudio/paddleaudio/backends/__init__.py b/audio/paddleaudio/backends/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/__init__.py
rename to audio/paddleaudio/backends/__init__.py
diff --git a/paddleaudio/paddleaudio/backends/soundfile_backend.py b/audio/paddleaudio/backends/soundfile_backend.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/soundfile_backend.py
rename to audio/paddleaudio/backends/soundfile_backend.py
diff --git a/paddleaudio/paddleaudio/backends/sox_backend.py b/audio/paddleaudio/backends/sox_backend.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/sox_backend.py
rename to audio/paddleaudio/backends/sox_backend.py
diff --git a/paddleaudio/paddleaudio/compliance/__init__.py b/audio/paddleaudio/compliance/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/__init__.py
rename to audio/paddleaudio/compliance/__init__.py
diff --git a/paddleaudio/paddleaudio/compliance/kaldi.py b/audio/paddleaudio/compliance/kaldi.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/kaldi.py
rename to audio/paddleaudio/compliance/kaldi.py
diff --git a/paddleaudio/paddleaudio/compliance/librosa.py b/audio/paddleaudio/compliance/librosa.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/librosa.py
rename to audio/paddleaudio/compliance/librosa.py
diff --git a/paddleaudio/paddleaudio/datasets/__init__.py b/audio/paddleaudio/datasets/__init__.py
similarity index 96%
rename from paddleaudio/paddleaudio/datasets/__init__.py
rename to audio/paddleaudio/datasets/__init__.py
index ebd4af984f697a8fe73c7a87f4d8362a95915c42..f95fad3054de8d19f24f881b69b682ae6def5b5b 100644
--- a/paddleaudio/paddleaudio/datasets/__init__.py
+++ b/audio/paddleaudio/datasets/__init__.py
@@ -13,6 +13,7 @@
# limitations under the License.
from .esc50 import ESC50
from .gtzan import GTZAN
+from .hey_snips import HeySnips
from .rirs_noises import OpenRIRNoise
from .tess import TESS
from .urban_sound import UrbanSound8K
diff --git a/paddleaudio/paddleaudio/datasets/dataset.py b/audio/paddleaudio/datasets/dataset.py
similarity index 76%
rename from paddleaudio/paddleaudio/datasets/dataset.py
rename to audio/paddleaudio/datasets/dataset.py
index 06e2df6d0efac865baece7f0fd446fbf41f35c32..488187a69de54aed7af2b038bea6f3bcb73c57f6 100644
--- a/paddleaudio/paddleaudio/datasets/dataset.py
+++ b/audio/paddleaudio/datasets/dataset.py
@@ -17,6 +17,8 @@ import numpy as np
import paddle
from ..backends import load as load_audio
+from ..compliance.kaldi import fbank as kaldi_fbank
+from ..compliance.kaldi import mfcc as kaldi_mfcc
from ..compliance.librosa import melspectrogram
from ..compliance.librosa import mfcc
@@ -24,6 +26,8 @@ feat_funcs = {
'raw': None,
'melspectrogram': melspectrogram,
'mfcc': mfcc,
+ 'kaldi_fbank': kaldi_fbank,
+ 'kaldi_mfcc': kaldi_mfcc,
}
@@ -73,16 +77,24 @@ class AudioClassificationDataset(paddle.io.Dataset):
feat_func = feat_funcs[self.feat_type]
record = {}
- record['feat'] = feat_func(
- waveform, sample_rate,
- **self.feat_config) if feat_func else waveform
+ if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
+ waveform = paddle.to_tensor(waveform).unsqueeze(0) # (C, T)
+ record['feat'] = feat_func(
+ waveform=waveform, sr=self.sample_rate, **self.feat_config)
+ else:
+ record['feat'] = feat_func(
+ waveform, sample_rate,
+ **self.feat_config) if feat_func else waveform
record['label'] = label
return record
def __getitem__(self, idx):
record = self._convert_to_record(idx)
- return np.array(record['feat']).transpose(), np.array(
- record['label'], dtype=np.int64)
+ if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
+ return self.keys[idx], record['feat'], record['label']
+ else:
+ return np.array(record['feat']).transpose(), np.array(
+ record['label'], dtype=np.int64)
def __len__(self):
return len(self.files)
diff --git a/paddleaudio/paddleaudio/datasets/esc50.py b/audio/paddleaudio/datasets/esc50.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/esc50.py
rename to audio/paddleaudio/datasets/esc50.py
diff --git a/paddleaudio/paddleaudio/datasets/gtzan.py b/audio/paddleaudio/datasets/gtzan.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/gtzan.py
rename to audio/paddleaudio/datasets/gtzan.py
diff --git a/audio/paddleaudio/datasets/hey_snips.py b/audio/paddleaudio/datasets/hey_snips.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a67b843bb4dca8bea4f49c69cd7dd2105e2618d
--- /dev/null
+++ b/audio/paddleaudio/datasets/hey_snips.py
@@ -0,0 +1,74 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import collections
+import json
+import os
+from typing import List
+from typing import Tuple
+
+from .dataset import AudioClassificationDataset
+
+__all__ = ['HeySnips']
+
+
+class HeySnips(AudioClassificationDataset):
+ meta_info = collections.namedtuple('META_INFO',
+ ('key', 'label', 'duration', 'wav'))
+
+ def __init__(self,
+ data_dir: os.PathLike,
+ mode: str='train',
+ feat_type: str='kaldi_fbank',
+ sample_rate: int=16000,
+ **kwargs):
+ self.data_dir = data_dir
+ files, labels = self._get_data(mode)
+ super(HeySnips, self).__init__(
+ files=files,
+ labels=labels,
+ feat_type=feat_type,
+ sample_rate=sample_rate,
+ **kwargs)
+
+ def _get_meta_info(self, mode) -> List[collections.namedtuple]:
+ ret = []
+ with open(os.path.join(self.data_dir, '{}.json'.format(mode)),
+ 'r') as f:
+ data = json.load(f)
+ for item in data:
+ sample = collections.OrderedDict()
+ if item['duration'] > 0:
+ sample['key'] = item['id']
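+                # keyword clips (is_hotword == 1) get label 0; all other clips get label -1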
+ sample['label'] = 0 if item['is_hotword'] == 1 else -1
+ sample['duration'] = item['duration']
+ sample['wav'] = os.path.join(self.data_dir,
+ item['audio_file_path'])
+ ret.append(self.meta_info(*sample.values()))
+ return ret
+
+ def _get_data(self, mode: str) -> Tuple[List[str], List[int]]:
+ meta_info = self._get_meta_info(mode)
+
+ files = []
+ labels = []
+ self.keys = []
+ self.durations = []
+ for sample in meta_info:
+ key, target, duration, wav = sample
+ files.append(wav)
+ labels.append(int(target))
+ self.keys.append(key)
+ self.durations.append(float(duration))
+
+ return files, labels
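
A hypothetical usage sketch of the dataset added above; `data_dir` is a placeholder, and the `(key, feat, label)` return shape follows the `kaldi_fbank` branch of `AudioClassificationDataset.__getitem__` in this PR.

```python
# Hypothetical usage of the new HeySnips dataset (data_dir is a placeholder path).
from paddleaudio.datasets import HeySnips

train_ds = HeySnips(data_dir='./hey_snips_data', mode='train',
                    feat_type='kaldi_fbank', sample_rate=16000)
key, feat, label = train_ds[0]  # kaldi feat types return (utterance key, feature, label)
print(key, feat.shape, label)
```
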
diff --git a/paddleaudio/paddleaudio/datasets/rirs_noises.py b/audio/paddleaudio/datasets/rirs_noises.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/rirs_noises.py
rename to audio/paddleaudio/datasets/rirs_noises.py
diff --git a/paddleaudio/paddleaudio/datasets/tess.py b/audio/paddleaudio/datasets/tess.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/tess.py
rename to audio/paddleaudio/datasets/tess.py
diff --git a/paddleaudio/paddleaudio/datasets/urban_sound.py b/audio/paddleaudio/datasets/urban_sound.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/urban_sound.py
rename to audio/paddleaudio/datasets/urban_sound.py
diff --git a/paddleaudio/paddleaudio/datasets/voxceleb.py b/audio/paddleaudio/datasets/voxceleb.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/voxceleb.py
rename to audio/paddleaudio/datasets/voxceleb.py
diff --git a/paddleaudio/paddleaudio/features/__init__.py b/audio/paddleaudio/features/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/features/__init__.py
rename to audio/paddleaudio/features/__init__.py
diff --git a/paddleaudio/paddleaudio/features/layers.py b/audio/paddleaudio/features/layers.py
similarity index 100%
rename from paddleaudio/paddleaudio/features/layers.py
rename to audio/paddleaudio/features/layers.py
diff --git a/paddleaudio/paddleaudio/functional/__init__.py b/audio/paddleaudio/functional/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/__init__.py
rename to audio/paddleaudio/functional/__init__.py
diff --git a/paddleaudio/paddleaudio/functional/functional.py b/audio/paddleaudio/functional/functional.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/functional.py
rename to audio/paddleaudio/functional/functional.py
diff --git a/paddleaudio/paddleaudio/functional/window.py b/audio/paddleaudio/functional/window.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/window.py
rename to audio/paddleaudio/functional/window.py
diff --git a/paddleaudio/paddleaudio/io/__init__.py b/audio/paddleaudio/io/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/io/__init__.py
rename to audio/paddleaudio/io/__init__.py
diff --git a/paddleaudio/paddleaudio/metric/__init__.py b/audio/paddleaudio/metric/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/__init__.py
rename to audio/paddleaudio/metric/__init__.py
diff --git a/paddleaudio/paddleaudio/metric/dtw.py b/audio/paddleaudio/metric/dtw.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/dtw.py
rename to audio/paddleaudio/metric/dtw.py
diff --git a/paddleaudio/paddleaudio/metric/eer.py b/audio/paddleaudio/metric/eer.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/eer.py
rename to audio/paddleaudio/metric/eer.py
diff --git a/paddleaudio/paddleaudio/sox_effects/__init__.py b/audio/paddleaudio/sox_effects/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/sox_effects/__init__.py
rename to audio/paddleaudio/sox_effects/__init__.py
diff --git a/paddleaudio/paddleaudio/utils/__init__.py b/audio/paddleaudio/utils/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/__init__.py
rename to audio/paddleaudio/utils/__init__.py
diff --git a/paddleaudio/paddleaudio/utils/download.py b/audio/paddleaudio/utils/download.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/download.py
rename to audio/paddleaudio/utils/download.py
diff --git a/paddleaudio/paddleaudio/utils/env.py b/audio/paddleaudio/utils/env.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/env.py
rename to audio/paddleaudio/utils/env.py
diff --git a/paddleaudio/paddleaudio/utils/error.py b/audio/paddleaudio/utils/error.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/error.py
rename to audio/paddleaudio/utils/error.py
diff --git a/paddleaudio/paddleaudio/utils/log.py b/audio/paddleaudio/utils/log.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/log.py
rename to audio/paddleaudio/utils/log.py
diff --git a/paddleaudio/paddleaudio/utils/numeric.py b/audio/paddleaudio/utils/numeric.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/numeric.py
rename to audio/paddleaudio/utils/numeric.py
diff --git a/paddleaudio/paddleaudio/utils/time.py b/audio/paddleaudio/utils/time.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/time.py
rename to audio/paddleaudio/utils/time.py
diff --git a/paddleaudio/setup.py b/audio/setup.py
similarity index 99%
rename from paddleaudio/setup.py
rename to audio/setup.py
index aac38930295aac345c0a5746e4dadfec98ef9dc7..ec67c81def776d25e86800ef3606093e91e4c2ef 100644
--- a/paddleaudio/setup.py
+++ b/audio/setup.py
@@ -19,7 +19,7 @@ from setuptools.command.install import install
from setuptools.command.test import test
# set the version here
-VERSION = '0.2.1'
+VERSION = '0.0.0'
# Inspired by the example at https://pytest.org/latest/goodpractises.html
diff --git a/paddleaudio/tests/.gitkeep b/audio/tests/.gitkeep
similarity index 100%
rename from paddleaudio/tests/.gitkeep
rename to audio/tests/.gitkeep
diff --git a/paddleaudio/tests/backends/__init__.py b/audio/tests/backends/__init__.py
similarity index 100%
rename from paddleaudio/tests/backends/__init__.py
rename to audio/tests/backends/__init__.py
diff --git a/paddleaudio/tests/backends/base.py b/audio/tests/backends/base.py
similarity index 100%
rename from paddleaudio/tests/backends/base.py
rename to audio/tests/backends/base.py
diff --git a/paddleaudio/tests/backends/soundfile/__init__.py b/audio/tests/backends/soundfile/__init__.py
similarity index 100%
rename from paddleaudio/tests/backends/soundfile/__init__.py
rename to audio/tests/backends/soundfile/__init__.py
diff --git a/paddleaudio/tests/backends/soundfile/test_io.py b/audio/tests/backends/soundfile/test_io.py
similarity index 100%
rename from paddleaudio/tests/backends/soundfile/test_io.py
rename to audio/tests/backends/soundfile/test_io.py
index 0f7580a40d386c048e88e6e3f75c6451917c9d68..9d092902da49e4651574201fa6d050d2a12b9c92 100644
--- a/paddleaudio/tests/backends/soundfile/test_io.py
+++ b/audio/tests/backends/soundfile/test_io.py
@@ -16,9 +16,9 @@ import os
import unittest
import numpy as np
+import paddleaudio
import soundfile as sf
-import paddleaudio
from ..base import BackendTest
diff --git a/paddleaudio/tests/benchmark/README.md b/audio/tests/benchmark/README.md
similarity index 100%
rename from paddleaudio/tests/benchmark/README.md
rename to audio/tests/benchmark/README.md
diff --git a/paddleaudio/tests/benchmark/log_melspectrogram.py b/audio/tests/benchmark/log_melspectrogram.py
similarity index 99%
rename from paddleaudio/tests/benchmark/log_melspectrogram.py
rename to audio/tests/benchmark/log_melspectrogram.py
index 5230acd424e27b22dfc3e656410f5f74f5a1b2d0..9832aed4d1b80a4565efac8a551946feb7a7a117 100644
--- a/paddleaudio/tests/benchmark/log_melspectrogram.py
+++ b/audio/tests/benchmark/log_melspectrogram.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/benchmark/melspectrogram.py b/audio/tests/benchmark/melspectrogram.py
similarity index 99%
rename from paddleaudio/tests/benchmark/melspectrogram.py
rename to audio/tests/benchmark/melspectrogram.py
index e0b79b45a71a83ee5791ab97a633018c1d377ee1..5fe3f2481820810a394350b56bdd3c315e08cb46 100644
--- a/paddleaudio/tests/benchmark/melspectrogram.py
+++ b/audio/tests/benchmark/melspectrogram.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/benchmark/mfcc.py b/audio/tests/benchmark/mfcc.py
similarity index 99%
rename from paddleaudio/tests/benchmark/mfcc.py
rename to audio/tests/benchmark/mfcc.py
index 2572ff33dd1cd80ba41ac1f0e35ec1df5e04e757..c6a8c85f90905442a8c2ee19ac52b1f0727aa50a 100644
--- a/paddleaudio/tests/benchmark/mfcc.py
+++ b/audio/tests/benchmark/mfcc.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/features/__init__.py b/audio/tests/features/__init__.py
similarity index 100%
rename from paddleaudio/tests/features/__init__.py
rename to audio/tests/features/__init__.py
diff --git a/paddleaudio/tests/features/base.py b/audio/tests/features/base.py
similarity index 99%
rename from paddleaudio/tests/features/base.py
rename to audio/tests/features/base.py
index 725e1e2e70bdacca0e067e371dfb8e71130e0170..476f6b8eeb7f14247fa00fd0943741c2eca53e66 100644
--- a/paddleaudio/tests/features/base.py
+++ b/audio/tests/features/base.py
@@ -17,7 +17,6 @@ import urllib.request
import numpy as np
import paddle
-
from paddleaudio import load
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
diff --git a/paddleaudio/tests/features/test_istft.py b/audio/tests/features/test_istft.py
similarity index 100%
rename from paddleaudio/tests/features/test_istft.py
rename to audio/tests/features/test_istft.py
index 23371200b6209a300e1205d1db02d3f6542f473e..9cf8cdd65582c0300d59749db621155eebd3faee 100644
--- a/paddleaudio/tests/features/test_istft.py
+++ b/audio/tests/features/test_istft.py
@@ -15,9 +15,9 @@ import unittest
import numpy as np
import paddle
+from paddleaudio.functional.window import get_window
from .base import FeatTest
-from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import IStft
from paddlespeech.s2t.transform.spectrogram import Stft
diff --git a/paddleaudio/tests/features/test_kaldi.py b/audio/tests/features/test_kaldi.py
similarity index 100%
rename from paddleaudio/tests/features/test_kaldi.py
rename to audio/tests/features/test_kaldi.py
index 6e826aaa75b751127548cba4d600195ad7094d00..00a576f6f48ee71405f5942ff961ae8f6e8edf55 100644
--- a/paddleaudio/tests/features/test_kaldi.py
+++ b/audio/tests/features/test_kaldi.py
@@ -15,10 +15,10 @@ import unittest
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
from .base import FeatTest
diff --git a/paddleaudio/tests/features/test_librosa.py b/audio/tests/features/test_librosa.py
similarity index 100%
rename from paddleaudio/tests/features/test_librosa.py
rename to audio/tests/features/test_librosa.py
index cf0c98c7295d6a7c2cdc7739455900d28ec02ef4..a1d3e8400dbc62924b68a1519605231d5da70bd8 100644
--- a/paddleaudio/tests/features/test_librosa.py
+++ b/audio/tests/features/test_librosa.py
@@ -16,11 +16,11 @@ import unittest
import librosa
import numpy as np
import paddle
-
import paddleaudio
-from .base import FeatTest
from paddleaudio.functional.window import get_window
+from .base import FeatTest
+
class TestLibrosa(FeatTest):
def initParmas(self):
diff --git a/paddleaudio/tests/features/test_log_melspectrogram.py b/audio/tests/features/test_log_melspectrogram.py
similarity index 100%
rename from paddleaudio/tests/features/test_log_melspectrogram.py
rename to audio/tests/features/test_log_melspectrogram.py
index 6bae2df3f564da16cb511541f8bbc714ad0b087e..0383c2b8b200a261cbb3e9a8a354f432e28e10a2 100644
--- a/paddleaudio/tests/features/test_log_melspectrogram.py
+++ b/audio/tests/features/test_log_melspectrogram.py
@@ -15,8 +15,8 @@ import unittest
import numpy as np
import paddle
-
import paddleaudio
+
from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram
diff --git a/paddleaudio/tests/features/test_spectrogram.py b/audio/tests/features/test_spectrogram.py
similarity index 100%
rename from paddleaudio/tests/features/test_spectrogram.py
rename to audio/tests/features/test_spectrogram.py
index 50b21403b4fb8187587edae0222a09996b384aec..1774fe61975c4b4ae11b7ff2c9200a4d67499efe 100644
--- a/paddleaudio/tests/features/test_spectrogram.py
+++ b/audio/tests/features/test_spectrogram.py
@@ -15,8 +15,8 @@ import unittest
import numpy as np
import paddle
-
import paddleaudio
+
from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Spectrogram
diff --git a/paddleaudio/tests/features/test_stft.py b/audio/tests/features/test_stft.py
similarity index 100%
rename from paddleaudio/tests/features/test_stft.py
rename to audio/tests/features/test_stft.py
index c64b5ebe6b497b5d9c40af0c14d2785afa2e7504..58792ffe2477058958a4e31ed122263306e83388 100644
--- a/paddleaudio/tests/features/test_stft.py
+++ b/audio/tests/features/test_stft.py
@@ -15,9 +15,9 @@ import unittest
import numpy as np
import paddle
+from paddleaudio.functional.window import get_window
from .base import FeatTest
-from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import Stft
diff --git a/demos/README.md b/demos/README.md
index 84f4de41f0514cb31bf114e00f5622c771a56348..8abd67249d7ad939db6d79d7b8160b8efa7cb8ba 100644
--- a/demos/README.md
+++ b/demos/README.md
@@ -11,6 +11,7 @@ The directory containes many speech applications in multi scenarios.
* punctuation_restoration - restore punctuation from raw text
* speech recognition - recognize text of an audio file
* speech server - server for speech tasks, e.g. ASR, TTS, CLS
+* streaming asr server - receive an audio stream over websocket and recognize it into a transcript
* speech translation - end to end speech translation
* story talker - book reader based on OCR and TTS
* style_fs2 - multi style control for FastSpeech2 model
diff --git a/demos/README_cn.md b/demos/README_cn.md
index 692b8468fc0fc5d5c36b959f49bf73f830fd9e2b..471342127f4e6e49522714d5926f5c185fbdb92b 100644
--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -11,6 +11,7 @@
* 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。
* 语音识别 - 识别一段音频中包含的语音文字。
* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等
+* 流式语音识别服务 - 接收流式输入的语音数据流,实时识别音频中的文字。
* 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。
* 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
* 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。
diff --git a/demos/audio_content_search/README.md b/demos/audio_content_search/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d73d6a59d71f7973b88f3cc9cee2834b49e5fe59
--- /dev/null
+++ b/demos/audio_content_search/README.md
@@ -0,0 +1,74 @@
+([简体中文](./README_cn.md)|English)
+# ACS (Audio Content Search)
+
+## Introduction
+ACS, or Audio Content Search, refers to the problem of getting the timestamps of keywords from automatically transcribed spoken language (speech-to-text).
+
+This demo is an implementation of obtaining the timestamps of keywords in the transcript of a given audio file. It can be done with a single command or a few lines in Python using `PaddleSpeech`.
+The search words in this demo are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+You can choose either the medium or hard way to install paddlespeech.
+
+The dependencies are listed in requirements.txt.
+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must be the same as the model's.
+
+Here are sample files for this demo that can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+
+### 3. Usage
+- Command Line(Recommended)
+ ```bash
+ # Chinese
+ paddlespeech_client acs --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
+
+ Usage:
+ ```bash
+  paddlespeech_client acs --help
+ ```
+ Arguments:
+ - `input`(required): Audio file to recognize.
+ - `server_ip`: the server ip.
+ - `port`: the server port.
+ - `lang`: the language type of the model. Default: `zh`.
+ - `sample_rate`: Sample rate of the model. Default: `16000`.
+ - `audio_format`: The audio format.
+
+ Output:
+ ```bash
+ [2022-05-15 15:00:58,185] [ INFO] - acs http client start
+ [2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:01:03,220] [ INFO] - acs http client finished
+ [2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ [2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+ acs_executor = ACSClientExecutor()
+ res = acs_executor(
+ input='./zh.wav',
+ server_ip="127.0.0.1",
+ port=8490,)
+ print(res)
+ ```
+
+ Output:
+ ```bash
+ [2022-05-15 15:08:13,955] [ INFO] - acs http client start
+ [2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:08:19,026] [ INFO] - acs http client finished
+ {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ ```
diff --git a/demos/audio_content_search/README_cn.md b/demos/audio_content_search/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..c74af4cf1f1e1a70470bf176cd1821dfdd02ac74
--- /dev/null
+++ b/demos/audio_content_search/README_cn.md
@@ -0,0 +1,74 @@
+(简体中文|[English](./README.md))
+
+# 语音内容搜索
+## 介绍
+语音内容搜索是一项用计算机程序获取转录语音内容关键词时间戳的技术。
+
+这个 demo 是一个从给定音频文件获取其文本中关键词时间戳的实现,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+
+当前示例中检索词是
+```
+我
+康
+```
+## 使用方法
+### 1. 安装
+请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
+
+你可以从 medium、hard 两种方式中选择一种方式安装。
+依赖参见 requirements.txt。
+
+### 2. 准备输入
+这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+
+可以下载此 demo 的示例音频:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+### 3. 使用方法
+- 命令行 (推荐使用)
+ ```bash
+ # 中文
+ paddlespeech_client acs --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
+
+ 使用方法:
+ ```bash
+  paddlespeech_client acs --help
+ ```
+ 参数:
+ - `input`(必须输入):用于识别的音频文件。
+ - `server_ip`: 服务的ip。
+ - `port`:服务的端口。
+ - `lang`:模型语言,默认值:`zh`。
+ - `sample_rate`:音频采样率,默认值:`16000`。
+ - `audio_format`: 音频的格式。
+
+ 输出:
+ ```bash
+ [2022-05-15 15:00:58,185] [ INFO] - acs http client start
+ [2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:01:03,220] [ INFO] - acs http client finished
+ [2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ [2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+ acs_executor = ACSClientExecutor()
+ res = acs_executor(
+ input='./zh.wav',
+ server_ip="127.0.0.1",
+ port=8490,)
+ print(res)
+ ```
+
+ 输出:
+ ```bash
+ [2022-05-15 15:08:13,955] [ INFO] - acs http client start
+ [2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:08:19,026] [ INFO] - acs http client finished
+ {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ ```
diff --git a/demos/audio_content_search/acs_clinet.py b/demos/audio_content_search/acs_clinet.py
new file mode 100644
index 0000000000000000000000000000000000000000..11f99aca7aa74b2b9fca8544939a0f7267878b21
--- /dev/null
+++ b/demos/audio_content_search/acs_clinet.py
@@ -0,0 +1,49 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_handler import ASRHttpHandler
+
+
+def main(args):
+ logger.info("asr http client start")
+ audio_format = "wav"
+ sample_rate = 16000
+ lang = "zh"
+ handler = ASRHttpHandler(
+ server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
+ res = handler.run(args.wavfile, audio_format, sample_rate, lang)
+ # res = res['result']
+ logger.info(f"the final result: {res}")
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description="audio content search client")
+ parser.add_argument(
+ '--server_ip', type=str, default='127.0.0.1', help='server ip')
+ parser.add_argument('--port', type=int, default=8090, help='server port')
+ parser.add_argument(
+ "--wavfile",
+ action="store",
+ help="wav file path ",
+ default="./16_audio.wav")
+ parser.add_argument(
+ '--endpoint',
+ type=str,
+ default='/paddlespeech/asr/search',
+ help='server endpoint')
+ args = parser.parse_args()
+
+ main(args)
diff --git a/demos/audio_content_search/conf/acs_application.yaml b/demos/audio_content_search/conf/acs_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d3c5e3039945ffe23ba6dd2de717d9b6ab8a433f
--- /dev/null
+++ b/demos/audio_content_search/conf/acs_application.yaml
@@ -0,0 +1,34 @@
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8490
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['acs_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports offline engine types.
+protocol: 'http'
+engine_list: ['acs_python']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ACS #########################################
+################### acs task: engine_type: python ###############################
+acs_python:
+ task: acs
+ asr_protocol: 'websocket' # 'websocket'
+ offset: 1.0 # second
+ asr_server_ip: 127.0.0.1
+ asr_server_port: 8390
+ lang: 'zh'
+ word_list: "./conf/words.txt"
+ sample_rate: 16000
+ device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
diff --git a/demos/audio_content_search/conf/words.txt b/demos/audio_content_search/conf/words.txt
new file mode 100644
index 0000000000000000000000000000000000000000..25510eb424fbe48ba81f51a3ce10d6ff9facad63
--- /dev/null
+++ b/demos/audio_content_search/conf/words.txt
@@ -0,0 +1,2 @@
+我
+康
\ No newline at end of file
diff --git a/demos/audio_content_search/conf/ws_conformer_application.yaml b/demos/audio_content_search/conf/ws_conformer_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..97201382f57e12e3fccb600f98ee3b0b26dc889c
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
@@ -0,0 +1,43 @@
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine types.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_multicn'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ decode_method: 'attention_rescoring'
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
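
As a side note, the chunk sizes implied by `chunk_buffer_conf` above work out as below; treating `window_n`/`shift_n` as counts of feature frames is an assumption.

```python
# Back-of-the-envelope arithmetic for the chunk_buffer_conf values above.
sample_rate = 16000
window_ms, shift_ms = 25, 10   # per-frame analysis window and hop (ms)
window_n, shift_n = 7, 4       # frames per decoding chunk and chunk advance (assumed meaning)

frame_len = sample_rate * window_ms // 1000        # 400 samples per frame
frame_hop = sample_rate * shift_ms // 1000         # 160 samples between frames
chunk_ms = window_ms + (window_n - 1) * shift_ms   # 85 ms of audio per decoding chunk
chunk_hop_ms = shift_n * shift_ms                  # the chunk advances 40 ms at a time
print(frame_len, frame_hop, chunk_ms, chunk_hop_ms)
```
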
diff --git a/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c23680bd59d5286ea0854efd46a7479485784f27
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine types.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
diff --git a/demos/audio_content_search/run.sh b/demos/audio_content_search/run.sh
new file mode 100755
index 0000000000000000000000000000000000000000..e322a37c5fcb98f1d5410f736e69646414af5f0f
--- /dev/null
+++ b/demos/audio_content_search/run.sh
@@ -0,0 +1,7 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# start the streaming asr server first (the acs engine connects to it)
+nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
+
+# start the acs server
+nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
+
diff --git a/demos/audio_searching/README.md b/demos/audio_searching/README.md
index 87a1956b9fd22b1d5f71d33794839e5d2817d5c1..e829d991aa9863259d20b07c9dc6af664eb8dc27 100644
--- a/demos/audio_searching/README.md
+++ b/demos/audio_searching/README.md
@@ -167,8 +167,8 @@ Then to start the system server, and it provides HTTP backend services.
[2022-03-26 22:54:08,633] [ INFO] - embedding size: (192,)
Extracting feature from audio No. 2 , 20 audios in total
...
- 2022-03-26 22:54:15,892 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
- 2022-03-26 22:54:15,908 | INFO | main.py | count_audio | 148 | Successfully count the number of data!
+ 2022-03-26 22:54:15,892 | INFO | audio_search.py | load_audios | 85 | Successfully loaded data, total count: 20
+ 2022-03-26 22:54:15,908 | INFO | audio_search.py | count_audio | 148 | Successfully count the number of data!
[2022-03-26 22:54:15,916] [ INFO] - checking the aduio file format......
[2022-03-26 22:54:15,916] [ INFO] - The sample rate is 16000
[2022-03-26 22:54:15,916] [ INFO] - The audio file format is right
@@ -183,12 +183,12 @@ Then to start the system server, and it provides HTTP backend services.
[2022-03-26 22:54:15,924] [ INFO] - feats shape:[1, 80, 53], lengths shape: [1]
[2022-03-26 22:54:16,051] [ INFO] - embedding size: (192,)
...
- 2022-03-26 22:54:16,086 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
+ 2022-03-26 22:54:16,086 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
...
- 2022-03-26 22:54:16,088 | INFO | main.py | search_local_audio | 136 | Successfully searched similar audio!
- 2022-03-26 22:54:17,164 | INFO | main.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
+ 2022-03-26 22:54:16,088 | INFO | audio_search.py | search_local_audio | 136 | Successfully searched similar audio!
+ 2022-03-26 22:54:17,164 | INFO | audio_search.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
```
- GUI test (Optional)
diff --git a/demos/audio_searching/README_cn.md b/demos/audio_searching/README_cn.md
index a93dbdc1f4585c35a86121b8a2629f7854cbed46..c13742af7a1613a089e1e14c069ec7a3340dd669 100644
--- a/demos/audio_searching/README_cn.md
+++ b/demos/audio_searching/README_cn.md
@@ -169,8 +169,8 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
[2022-03-26 22:54:08,633] [ INFO] - embedding size: (192,)
Extracting feature from audio No. 2 , 20 audios in total
...
- 2022-03-26 22:54:15,892 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
- 2022-03-26 22:54:15,908 | INFO | main.py | count_audio | 148 | Successfully count the number of data!
+ 2022-03-26 22:54:15,892 | INFO | audio_search.py | load_audios | 85 | Successfully loaded data, total count: 20
+ 2022-03-26 22:54:15,908 | INFO | audio_search.py | count_audio | 148 | Successfully count the number of data!
[2022-03-26 22:54:15,916] [ INFO] - checking the aduio file format......
[2022-03-26 22:54:15,916] [ INFO] - The sample rate is 16000
[2022-03-26 22:54:15,916] [ INFO] - The audio file format is right
@@ -185,12 +185,12 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
[2022-03-26 22:54:15,924] [ INFO] - feats shape:[1, 80, 53], lengths shape: [1]
[2022-03-26 22:54:16,051] [ INFO] - embedding size: (192,)
...
- 2022-03-26 22:54:16,086 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
+ 2022-03-26 22:54:16,086 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
...
- 2022-03-26 22:54:16,088 | INFO | main.py | search_local_audio | 136 | Successfully searched similar audio!
- 2022-03-26 22:54:17,164 | INFO | main.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
+ 2022-03-26 22:54:16,088 | INFO | audio_search.py | search_local_audio | 136 | Successfully searched similar audio!
+ 2022-03-26 22:54:17,164 | INFO | audio_search.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
```
- 前端测试(可选)
diff --git a/demos/audio_searching/src/operations/load.py b/demos/audio_searching/src/operations/load.py
index 0d9edb7846198f4e0e9111b642c0c55d6ff2dbb9..d1ea00576ec4fff72e9d1554ca2514e284ec9169 100644
--- a/demos/audio_searching/src/operations/load.py
+++ b/demos/audio_searching/src/operations/load.py
@@ -26,8 +26,9 @@ def get_audios(path):
"""
supported_formats = [".wav", ".mp3", ".ogg", ".flac", ".m4a"]
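+    # Recursively walk `path` and keep only the files whose extension is in supported_formats.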
return [
- item for sublist in [[os.path.join(dir, file) for file in files]
- for dir, _, files in list(os.walk(path))]
+ item
+ for sublist in [[os.path.join(dir, file) for file in files]
+ for dir, _, files in list(os.walk(path))]
for item in sublist if os.path.splitext(item)[1] in supported_formats
]
diff --git a/demos/audio_searching/src/test_vpr_search.py b/demos/audio_searching/src/test_vpr_search.py
index 8cc8dc8412e76d92ac04219c52cb940643df62b9..298e12ebaf2b4408f67df3e9fe16f6fd59cb6219 100644
--- a/demos/audio_searching/src/test_vpr_search.py
+++ b/demos/audio_searching/src/test_vpr_search.py
@@ -73,7 +73,9 @@ def test_data(spk: str):
"""
Get the audio file by spk_id in MySQL
"""
- response = client.get("/vpr/data?spk_id=" + spk)
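+    # spk_id is now sent in the request's JSON body, matching the updated /vpr/data handler in vpr_search.py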
+ response = client.get(
+ "/vpr/data",
+ json={"spk_id": spk}, )
assert response.status_code == 200
@@ -81,7 +83,9 @@ def test_del(spk: str):
"""
Delete the record in MySQL by spk_id
"""
- response = client.post("/vpr/del?spk_id=" + spk)
+ response = client.post(
+ "/vpr/del",
+ json={"spk_id": spk}, )
assert response.status_code == 200
diff --git a/demos/audio_searching/src/vpr_search.py b/demos/audio_searching/src/vpr_search.py
index 8e702221c8bc23213533654aa7d6545e91f5631b..2780dfb3bf2b8630bd1b2b5975f42d01056e1133 100644
--- a/demos/audio_searching/src/vpr_search.py
+++ b/demos/audio_searching/src/vpr_search.py
@@ -17,6 +17,7 @@ import uvicorn
from config import UPLOAD_PATH
from fastapi import FastAPI
from fastapi import File
+from fastapi import Form
from fastapi import UploadFile
from logs import LOGGER
from mysql_helpers import MySQLHelper
@@ -49,10 +50,12 @@ if not os.path.exists(UPLOAD_PATH):
@app.post('/vpr/enroll')
async def vpr_enroll(table_name: str=None,
- spk_id: str=None,
+ spk_id: str=Form(...),
audio: UploadFile=File(...)):
# Enroll the uploaded audio with spk-id into MySQL
try:
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
# Save the upload data to server.
content = await audio.read()
audio_path = os.path.join(UPLOAD_PATH, audio.filename)
@@ -63,7 +66,7 @@ async def vpr_enroll(table_name: str=None,
return {'status': True, 'msg': "Successfully enroll data!"}
except Exception as e:
LOGGER.error(e)
- return {'status': False, 'msg': e}, 400
+ return {'status': False, 'msg': e}
@app.post('/vpr/enroll/local')
@@ -128,9 +131,12 @@ async def vpr_recog_local(request: Request,
@app.post('/vpr/del')
-async def vpr_del(table_name: str=None, spk_id: str=None):
+async def vpr_del(table_name: str=None, spk_id: dict=None):
# Delete a record by spk_id in MySQL
try:
+ spk_id = spk_id['spk_id']
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
do_delete(table_name, spk_id, MYSQL_CLI)
LOGGER.info("Successfully delete a record by spk_id in MySQL")
return {'status': True, 'msg': "Successfully delete data!"}
@@ -156,9 +162,12 @@ async def vpr_list(table_name: str=None):
@app.get('/vpr/data')
async def vpr_data(
table_name: str=None,
- spk_id: str=None, ):
+ spk_id: dict=None, ):
# Get the audio file from path by spk_id in MySQL
try:
+ spk_id = spk_id['spk_id']
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
audio_path = do_get(table_name, spk_id, MYSQL_CLI)
LOGGER.info(f"Successfully get audio path {audio_path}!")
return FileResponse(audio_path)
diff --git a/demos/custom_streaming_asr/README.md b/demos/custom_streaming_asr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa28d502f9da451b0279c224523160ad22f0b97a
--- /dev/null
+++ b/demos/custom_streaming_asr/README.md
@@ -0,0 +1,65 @@
+([简体中文](./README_cn.md)|English)
+
+# Customized Auto Speech Recognition
+
+## Introduction
+In some cases, we need to recognize specific rare words with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve those issues.
+
+This demo is customized for taxi expense reimbursement, which needs to recognize rare addresses.
+
+* G with slot: 打车到 "address_slot"。
+
+
+* This is the address slot WFST; you can add the addresses you want to recognize.
+
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
+
+
+## Usage
+### 1. Installation
+Install the paddle:2.2.2 docker image.
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. demo
+* Run `websocket_server.sh`. This script downloads the resources and libs, and launches the service.
+```
+cd /paddle
+bash websocket_server.sh
+```
+This script runs in two steps:
+1. Download resource.tar.gz; after extraction, the following directories can be found under the resource directory:
+model: acoustic model
+graph: the decoder graph (TLG.fst)
+lib: dependent libraries
+bin: binaries
+data: audio and wav.scp
+
+2. websocket_server_main launches the service.
+Some parameters:
+port: the service port
+graph_path: the decoder graph path
+model_path: acoustic model path
+For the other parameters, please refer to these files:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* In other terminal, run script websocket_client.sh, the client will send data and get the results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the server's service port.
+
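+A minimal sketch of invoking the client binary directly (these are the same flags used in `websocket_client.sh`; adjust the wav list and port to match your setup):
+```
+. path.sh
+export GLOG_logtostderr=1
+websocket_client_main \
+    --wav_rspecifier=scp:$PWD/data/wav.scp \
+    --streaming_chunk=0.36 \
+    --port=8881
+```
+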
+* Result:
+In the client log, you will see a message like the one below:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
\ No newline at end of file
diff --git a/demos/custom_streaming_asr/README_cn.md b/demos/custom_streaming_asr/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffbf682fb362394289083658364cb4bc0616682a
--- /dev/null
+++ b/demos/custom_streaming_asr/README_cn.md
@@ -0,0 +1,63 @@
+(简体中文|[English](./README.md))
+
+# 定制化语音识别演示
+## 介绍
+在一些场景中,识别系统需要高精度的识别一些稀有词,例如导航软件中地名识别。而通过定制化识别可以满足这一需求。
+
+这个 demo 是打车报销单的场景识别,需要识别一些稀有的地名,可以通过如下操作实现。
+
+* G with slot: 打车到 "address_slot"。
+
+
+* 这是 address slot wfst, 可以添加一些需要识别的地名.
+
+
+* 通过 replace 操作, G = fstreplace(G_with_slot, address_slot), 最终可以得到定制化的解码图。
+
+
+## 使用方法
+### 1. 配置环境
+安装paddle:2.2.2 docker镜像。
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. 演示
+* 运行如下命令,完成相关资源和库的下载和服务启动。
+```
+cd /paddle
+bash websocket_server.sh
+```
+上面脚本完成了如下两个功能:
+1. 完成 resource.tar.gz 下载,解压后,会在 resource 中发现如下目录:
+model: 声学模型
+graph: 解码构图
+lib: 相关库
+bin: 运行程序
+data: 语音数据
+
+2. 通过 websocket_server_main 来启动服务。
+这里简单的介绍几个参数:
+port 是服务端口,
+graph_path 用来指定解码图文件,
+其他参数说明可参见代码:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* 在另一个终端中, 通过 client 发送数据,得到结果。运行如下命令:
+```
+bash websocket_client.sh
+```
+通过 websocket_client_main 来启动 client 服务,其中 wav_scp 是发送的语音句子集合,port 为服务端口。
+
+* 结果:
+client 的 log 中可以看到如下类似的结果
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
diff --git a/demos/custom_streaming_asr/path.sh b/demos/custom_streaming_asr/path.sh
new file mode 100644
index 0000000000000000000000000000000000000000..47462324d739e7cc5dbd16097d5ca5b5cbdacbf3
--- /dev/null
+++ b/demos/custom_streaming_asr/path.sh
@@ -0,0 +1,2 @@
+export LD_LIBRARY_PATH=$PWD/resource/lib
+export PATH=$PATH:$PWD/resource/bin
diff --git a/demos/custom_streaming_asr/setup_docker.sh b/demos/custom_streaming_asr/setup_docker.sh
new file mode 100644
index 0000000000000000000000000000000000000000..329a75db0ef34c8cb4e3a54d9663f027d1919a14
--- /dev/null
+++ b/demos/custom_streaming_asr/setup_docker.sh
@@ -0,0 +1 @@
+sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
diff --git a/demos/custom_streaming_asr/websocket_client.sh b/demos/custom_streaming_asr/websocket_client.sh
new file mode 100755
index 0000000000000000000000000000000000000000..ede076cafa2529c89bc79dee211a8cf962cf960d
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_client.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+# input data directory
+data=$PWD/data
+
+# wav list file under $data
+wav_scp=wav.scp
+
+export GLOG_logtostderr=1
+
+# websocket client
+websocket_client_main \
+ --wav_rspecifier=scp:$data/$wav_scp \
+ --streaming_chunk=0.36 \
+ --port=8881
diff --git a/demos/custom_streaming_asr/websocket_server.sh b/demos/custom_streaming_asr/websocket_server.sh
new file mode 100755
index 0000000000000000000000000000000000000000..041c345be79722c882d50e828f2a2438c0eb9a24
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_server.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+set +x
+set -e
+
+export GLOG_logtostderr=1
+
+. path.sh
+#test websocket server
+
+model_dir=./resource/model
+graph_dir=./resource/graph
+cmvn=./data/cmvn.ark
+
+
+#paddle_asr_online/resource.tar.gz
+if [ ! -f $cmvn ]; then
+ wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
+ tar xzfv resource.tar.gz
+ ln -s ./resource/data .
+fi
+
+websocket_server_main \
+ --cmvn_file=$cmvn \
+ --streaming_chunk=0.1 \
+ --use_fbank=true \
+ --model_path=$model_dir/avg_10.jit.pdmodel \
+ --param_path=$model_dir/avg_10.jit.pdiparams \
+ --model_cache_shapes="5-1-2048,5-1-2048" \
+ --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+ --word_symbol_table=$graph_dir/words.txt \
+ --graph_path=$graph_dir/TLG.fst --max_active=7500 \
+ --port=8881 \
+ --acoustic_scale=12
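+
+# The server now listens on --port (8881 above). From another terminal, run
+# websocket_client.sh to stream data/wav.scp to it and print the recognition result.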
diff --git a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md
index b79f3f7a1660bda40695147b1177f512055f2702..b6a1d9bcc26058c2789f82444b2aa9eced26e0d0 100644
--- a/demos/speaker_verification/README.md
+++ b/demos/speaker_verification/README.md
@@ -14,7 +14,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
You can choose one way from easy, meduim and hard to install paddlespeech.
### 2. Prepare Input File
-The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
+The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
diff --git a/demos/speaker_verification/README_cn.md b/demos/speaker_verification/README_cn.md
index db382f298df74c73ef5fcbd5a3fb64fb2fa1c44f..90bba38acf2d176092d224c5c1112418bbac353a 100644
--- a/demos/speaker_verification/README_cn.md
+++ b/demos/speaker_verification/README_cn.md
@@ -4,16 +4,16 @@
## 介绍
声纹识别是一项用计算机程序自动提取说话人特征的技术。
-这个 demo 是一个从给定音频文件提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+这个 demo 是从一个给定音频文件中提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
## 使用方法
### 1. 安装
请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
-你可以从 easy,medium,hard 三中方式中选择一种方式安装。
+你可以从 easy,medium,hard 三种方式中选择一种方式安装。
### 2. 准备输入
-这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+声纹 cli demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
可以下载此 demo 的示例音频:
```bash
diff --git a/demos/speech_recognition/README.md b/demos/speech_recognition/README.md
index 636548801b40a1485c28d77ca97a3c87265b95a7..6493e8e613800ea163b8669842c93a7dd82d68ac 100644
--- a/demos/speech_recognition/README.md
+++ b/demos/speech_recognition/README.md
@@ -24,13 +24,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Command Line(Recommended)
```bash
# Chinese
- paddlespeech asr --input ./zh.wav
+ paddlespeech asr --input ./zh.wav -v
# English
- paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+ paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Chinese ASR + Punctuation Restoration
- paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+ paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
- (It doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.)
+ (If you don't want to see the log information, you can remove `-v`. Besides, it doesn't matter if the package `paddlespeech-ctcdecoders` is not found; this package is optional.)
Usage:
```bash
@@ -45,6 +45,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
- `yes`: No additional parameters required. Once set this parameter, it means accepting the request of the program by default, which includes transforming the audio sample rate. Default: `False`.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
+ - `verbose`: Show the log information.
Output:
```bash
@@ -84,8 +85,12 @@ Here is a list of pretrained models released by PaddleSpeech that can be used by
| Model | Language | Sample Rate
| :--- | :---: | :---: |
-| conformer_wenetspeech| zh| 16k
-| transformer_librispeech| en| 16k
+| conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
+| transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
-|deepspeech2offline_librispeech|en| 16k
+| deepspeech2offline_librispeech | en | 16k
diff --git a/demos/speech_recognition/README_cn.md b/demos/speech_recognition/README_cn.md
index 8033dbd8130e5f282bceae286f6df7662f1deff8..8d631d89ca1d61196cbf167b3f263cfd478fb571 100644
--- a/demos/speech_recognition/README_cn.md
+++ b/demos/speech_recognition/README_cn.md
@@ -22,13 +22,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- 命令行 (推荐使用)
```bash
# 中文
- paddlespeech asr --input ./zh.wav
+ paddlespeech asr --input ./zh.wav -v
# 英文
- paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+ paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# 中文 + 标点恢复
- paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+ paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
- (如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)
+ (如果不想显示 log 信息,可以不使用 `-v` 参数。另外,如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)
使用方法:
```bash
@@ -43,6 +43,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
- `yes`;不需要设置额外的参数,一旦设置了该参数,说明你默认同意程序的所有请求,其中包括自动转换输入音频的采样率。默认值:`False`。
- `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。
+ - `verbose`: 如果使用,显示 logger 信息。
输出:
```bash
@@ -82,7 +83,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
| 模型 | 语言 | 采样率
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md
index 0323d3983ab58f40285f81f135dedf2f9f019b7e..a03a43dffa6464e2c517e4bac9c1af58fe0dd2d6 100644
--- a/demos/speech_server/README.md
+++ b/demos/speech_server/README.md
@@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
-It is recommended to use **paddlepaddle 2.2.1** or above.
+It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
@@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `_`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference)
+**Note:** If the service starts normally inside a container but the client cannot reach the access IP, try replacing the `host` address in the configuration file with the local IP address.
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
@@ -83,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. ASR Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
@@ -131,6 +135,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address
+
```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
@@ -191,6 +198,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 6. CLS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
@@ -235,6 +245,172 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
+### 7. Speaker Verification Client Usage
+
+#### 7.1 Extract speaker embedding
+**Note:** The response time will be slightly longer when using the client for the first time
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+ ```
+
+ Usage:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ Arguments:
+ * server_ip: server ip. Default: 127.0.0.1
+ * port: server port. Default: 8090
+ * input(required): the audio file to extract the speaker embedding from.
+ * task: the task of vector, either 'spk' or 'score'. Default: 'spk'.
+ * enroll: enroll audio file (used by the 'score' task).
+ * test: test audio file (used by the 'score' task).
+
+ Output:
+
+ ```bash
+ [2022-05-08 00:18:44,249] [ INFO] - vector http client start
+ [2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav
+ [2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector
+ [2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector
+ [2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 
0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ [2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input="85236145389.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="spk")
+ print(res)
+ ```
+
+ Output:
+
+ ``` bash
+ {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, 
-1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ ```
+
+#### 7.2 Get the score between two speaker audio embeddings
+
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+ ```
+
+ Usage:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ Arguments:
+ * server_ip: server ip. Default: 127.0.0.1
+ * port: server port. Default: 8090
+ * input: not used for the 'score' task; specify the audio files via 'enroll' and 'test' instead.
+ * task: the task of vector, either 'spk' or 'score'. To get the similarity score, it must be set to 'score'.
+ * enroll: enroll audio file.
+ * test: test audio file.
+
+ Output:
+
+ ``` bash
+ [2022-05-09 10:28:40,556] [ INFO] - vector score http client start
+ [2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ [2022-05-09 10:28:40,731] [ INFO] - The vector: None
+ [2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input=None,
+ enroll_audio="85236145389.wav",
+ test_audio="123456789.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="score")
+ print(res)
+ ```
+
+ Output:
+
+ ``` bash
+ [2022-05-09 10:34:54,769] [ INFO] - vector score http client start
+ [2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ ```
+
+### 8. Punctuation prediction
+
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ Usage:
+
+ ```bash
+ paddlespeech_client text --help
+ ```
+ Arguments:
+ - `server_ip`: server ip. Default: 127.0.0.1
+ - `port`: server port. Default: 8090
+ - `input`(required): Input text to get punctuation.
+
+ Output:
+ ```bash
+ [2022-05-09 18:19:04,397] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-09 18:19:04,397] [ INFO] - Response time 0.092407 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8090,)
+ print(res)
+
+ ```
+
+ Output:
+ ```bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
+
+
## Models supported by the service
### ASR model
Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for paddle inference inference.
@@ -244,3 +420,9 @@ Get all models supported by the TTS service via `paddlespeech_server stats --tas
### CLS model
Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for paddle inference inference.
+
+### Vector model
+Get all models supported by the Vector service via `paddlespeech_server stats --task vector`.
+
+### Text model
+Get all models supported by the Text service via `paddlespeech_server stats --task text`.
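+
+To list the supported models for every task in one go, here is a minimal sketch that only reuses the `stats` subcommand shown above:
+
+```bash
+for task in asr tts cls vector text; do
+    paddlespeech_server stats --task $task
+done
+```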
diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md
index 4a7c7447e0c0bf897fc272069a9e474f82836181..4895b182b7ae401da9e3030662d55bbd6b874818 100644
--- a/demos/speech_server/README_cn.md
+++ b/demos/speech_server/README_cn.md
@@ -1,29 +1,30 @@
-([简体中文](./README_cn.md)|English)
+(简体中文|[English](./README.md))
# 语音服务
## 介绍
-这个demo是一个启动语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
+这个 demo 是一个启动离线语音服务和访问服务的实现。它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
-推荐使用 **paddlepaddle 2.2.1** 或以上版本。
-你可以从 medium,hard 三中方式中选择一种方式安装 PaddleSpeech。
+推荐使用 **paddlepaddle 2.2.2** 或以上版本。
+你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
配置文件可参见 `conf/application.yaml` 。
其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
-目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)以及cls(音频分类)。
+目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)、cls(音频分类)、vector(声纹识别)以及text(文本处理)。
目前引擎类型支持两种形式:python 及 inference (Paddle Inference)
+**注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。
-这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+ASR client 的输入是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
-可以下载此 ASR client的示例音频:
+可以下载此 ASR client 的示例音频:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
@@ -83,31 +84,34 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. ASR 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
- ```
- paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
- 使用帮助:
-
- ```bash
- paddlespeech_client asr --help
- ```
+ ```
+ paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- 参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- - `port`: 服务端口,默认: 8090。
- - `input`(必须输入): 用于识别的音频文件。
- - `sample_rate`: 音频采样率,默认值:16000。
- - `lang`: 模型语言,默认值:zh_cn。
- - `audio_format`: 音频格式,默认值:wav。
+ ```
- 输出:
+ 使用帮助:
- ```bash
- [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
- [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
- ```
+ ```bash
+ paddlespeech_client asr --help
+ ```
+
+ 参数:
+ - `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 用于识别的音频文件。
+ - `sample_rate`: 音频采样率,默认值:16000。
+ - `lang`: 模型语言,默认值:zh_cn。
+ - `audio_format`: 音频格式,默认值:wav。
+
+ 输出:
+
+ ```bash
+ [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
+ [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
+ ```
- Python API
```python
@@ -134,33 +138,35 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
-
- ```bash
- paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
- ```
- 使用帮助:
- ```bash
- paddlespeech_client tts --help
- ```
-
- 参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- - `port`: 服务端口,默认: 8090。
- - `input`(必须输入): 待合成的文本。
- - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
- - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0
- - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
-
- 输出:
- ```bash
- [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
- [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
- [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
- [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
- ```
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ```bash
+ paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+ ```
+ 使用帮助:
+
+ ```bash
+ paddlespeech_client tts --help
+ ```
+
+ 参数:
+ - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 待合成的文本。
+ - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
+ - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
+ - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
+ - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0
+ - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
+
+ 输出:
+ ```bash
+ [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
+ [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
+ [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
+ [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
+ ```
- Python API
```python
@@ -192,12 +198,17 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
- ### 6. CLS 客户端使用方法
- **注意:** 初次使用客户端时响应时间会略长
- - 命令行 (推荐使用)
- ```
- paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+### 6. CLS 客户端使用方法
+
+**注意:** 初次使用客户端时响应时间会略长
+
+- 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ```
+ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
使用帮助:
@@ -205,7 +216,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech_client cls --help
```
参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于分类的音频文件。
- `topk`: 分类结果的topk。
@@ -239,13 +250,180 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
+### 7. 声纹客户端使用方法
+
+#### 7.1 提取声纹特征
+注意: 初次使用客户端时响应时间会略长
+* 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+ ```
+
+ 使用帮助:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+ 参数:
+ * server_ip: 服务端ip地址,默认: 127.0.0.1。
+ * port: 服务端口,默认: 8090。
+ * input(必须输入): 用于识别的音频文件。
+ * task: vector 的任务,可选 spk 或者 score。默认是 spk。
+ * enroll: 注册音频。
+ * test: 测试音频。
+ 输出:
+
+ ``` bash
+ [2022-05-08 00:18:44,249] [ INFO] - vector http client start
+ [2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav
+ [2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector
+ [2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector
+ [2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 
0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ [2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input="85236145389.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="spk")
+ print(res)
+ ```
+
+ 输出:
+
+ ``` bash
+ {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, 
-1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ ```
+
+#### 7.2 音频声纹打分
+
+注意: 初次使用客户端时响应时间会略长
+* 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+ ```
+
+ 使用帮助:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ 参数:
+ * server_ip: 服务端ip地址,默认: 127.0.0.1。
+ * port: 服务端口,默认: 8090。
+ * input: score 任务不使用该参数,请通过 enroll 和 test 指定音频。
+ * task: vector 的任务,可选 spk 或者 score。若要计算打分,需设置为 score。
+ * enroll: 注册音频。
+ * test: 测试音频。
+
+ 输出:
+
+ ``` bash
+ [2022-05-09 10:28:40,556] [ INFO] - vector score http client start
+ [2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ [2022-05-09 10:28:40,731] [ INFO] - The vector: None
+ [2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input=None,
+ enroll_audio="85236145389.wav",
+ test_audio="123456789.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="score")
+ print(res)
+ ```
+
+ 输出:
+
+ ``` bash
+ [2022-05-09 10:34:54,769] [ INFO] - vector score http client start
+ [2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8590/paddlespeech/vector/score
+ [2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ ```
+
+
+### 8. 标点预测
+
+ **注意:** 初次使用客户端时响应时间会略长
+- 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ 使用帮助:
+
+ ```bash
+ paddlespeech_client text --help
+ ```
+ 参数:
+ - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 用于标点预测的文本内容。
+
+ 输出:
+ ```bash
+ [2022-05-09 18:19:04,397] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-09 18:19:04,397] [ INFO] - Response time 0.092407 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8090,)
+ print(res)
+
+ ```
+
+ 输出:
+ ```bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
## 服务支持的模型
-### ASR支持的模型
-通过 `paddlespeech_server stats --task asr` 获取ASR服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### ASR 支持的模型
+通过 `paddlespeech_server stats --task asr` 获取 ASR 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+
+### TTS 支持的模型
+通过 `paddlespeech_server stats --task tts` 获取 TTS 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+
+### CLS 支持的模型
+通过 `paddlespeech_server stats --task cls` 获取 CLS 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
-### TTS支持的模型
-通过 `paddlespeech_server stats --task tts` 获取TTS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### Vector 支持的模型
+通过 `paddlespeech_server stats --task vector` 获取 Vector 服务支持的所有模型。
-### CLS支持的模型
-通过 `paddlespeech_server stats --task cls` 获取CLS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### Text 支持的模型
+通过 `paddlespeech_server stats --task text` 获取 Text 服务支持的所有模型。
diff --git a/demos/speech_server/asr_client.sh b/demos/speech_server/asr_client.sh
index afe2f82181aeab08194963d126f7621bc59b8b63..37a7ab0b02e8afd6bb7d412314e804c56a2ac254 100644
--- a/demos/speech_server/asr_client.sh
+++ b/demos/speech_server/asr_client.sh
@@ -1,4 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
diff --git a/demos/speech_server/cls_client.sh b/demos/speech_server/cls_client.sh
index 5797aa204f6ba2cb260440e8709d7905134ddf53..67012648c7ec9ce3be6aa5f4da234116864fb503 100644
--- a/demos/speech_server/cls_client.sh
+++ b/demos/speech_server/cls_client.sh
@@ -1,4 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
diff --git a/demos/speech_server/conf/application.yaml b/demos/speech_server/conf/application.yaml
index 2b1a05998083e08377d63ee02bc77323a7c4dce5..c6588ce802caa2419425fd5b94170a1e75d16568 100644
--- a/demos/speech_server/conf/application.yaml
+++ b/demos/speech_server/conf/application.yaml
@@ -1,15 +1,15 @@
-# This is the parameter configuration file for PaddleSpeech Serving.
+# This is the parameter configuration file for PaddleSpeech Offline Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
port: 8090
# The task format in the engin_list is: _
-# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference']
-
-engine_list: ['asr_python', 'tts_python', 'cls_python']
+# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
+protocol: 'http'
+engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']
#################################################################################
@@ -135,3 +135,26 @@ cls_inference:
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+ task: punc
+ model_type: 'ernie_linear_p3_wudao'
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ vocab_file: # [optional]
+ device: # set 'gpu:id' or 'cpu'
+
+
+################################### Vector ######################################
+################### Vector task: spk; engine_type: python #######################
+vector_python:
+ task: spk
+ model_type: 'ecapatdnn_voxceleb12'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ device: # set 'gpu:id' or 'cpu'
diff --git a/demos/speech_server/tts_client.sh b/demos/speech_server/tts_client.sh
index a756dfd3ef555f0b74e845d1b7754bed1d826e19..a443a0a94a6a6e19f0a0cf40708ebca3e8137624 100644
--- a/demos/speech_server/tts_client.sh
+++ b/demos/speech_server/tts_client.sh
@@ -1,3 +1,4 @@
#!/bin/bash
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
diff --git a/demos/streaming_asr_server/README.md b/demos/streaming_asr_server/README.md
index 0eed8e5615f5185af884e372bf25d27b09a93936..4824da6281bc883f393dc16c9e43ba38c6bdcf6e 100644
--- a/demos/streaming_asr_server/README.md
+++ b/demos/streaming_asr_server/README.md
@@ -1,10 +1,11 @@
([简体中文](./README_cn.md)|English)
-# Speech Server
+# Streaming ASR Server
## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
+The streaming ASR server only supports the `websocket` protocol; it does not support the `http` protocol.
## Usage
### 1. Installation
@@ -14,7 +15,7 @@ It is recommended to use **paddlepaddle 2.2.1** or above.
 You can choose one way from medium and hard to install paddlespeech.
### 2. Prepare config File
-The configuration file can be found in `conf/ws_application.yaml` 和 `conf/ws_conformer_application.yaml`.
+The configuration file can be found in `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
 At present, the models integrated into the service include: DeepSpeech2 and conformer.
@@ -28,10 +29,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
### 3. Server Usage
- Command Line (Recommended)
-
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
```bash
- # start the service
- paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
+  # start the service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
```
Usage:
@@ -40,156 +41,82 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
paddlespeech_server start --help
```
Arguments:
- - `config_file`: yaml file of the app, defalut: ./conf/ws_conformer_application.yaml
- - `log_file`: log file. Default: ./log/paddlespeech.log
+  - `config_file`: yaml file of the app, default: `./conf/application.yaml`
+ - `log_file`: log file. Default: `./log/paddlespeech.log`
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
```python
+ # in PaddleSpeech/demos/streaming_asr_server directory
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
- config_file="./conf/ws_conformer_application.yaml",
+ config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
+
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
- ```
- paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
Usage:
@@ -203,81 +130,86 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
  - `sample_rate`: Audio sampling rate, default: 16000.
- `lang`: Language. Default: "zh_cn".
- `audio_format`: Audio format. Default: "wav".
+  - `punc.server_ip`: punctuation server IP address. Default: None.
+ - `punc.server_port`: punctuation server port. Default: None.
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
- [2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
+ [2022-05-06 21:10:35,598] [ INFO] - Start to do streaming asr client
+ [2022-05-06 21:10:35,600] [ INFO] - asr websocket client start
+ [2022-05-06 21:10:35,600] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:10:35,600] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-06 21:10:35,670] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:10:35,699] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,726] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,738] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,750] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,762] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,774] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,786] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,387] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,398] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,407] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,416] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,425] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,434] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,442] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,930] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,938] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,946] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,954] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,962] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,970] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,977] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,985] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:37,484] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,492] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,500] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,508] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,517] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,525] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,532] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:38,050] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,058] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,066] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,073] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,081] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,089] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,097] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,105] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,630] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,639] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,647] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,655] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,663] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,671] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,679] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:39,216] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,224] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,232] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,240] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,248] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,256] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,264] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,272] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,885] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,896] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,905] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,915] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,924] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,934] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:44,827] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:10:44,827] [ INFO] - audio duration: 4.9968125, elapsed time: 9.225094079971313, RTF=1.846195765794957
+ [2022-05-06 21:10:44,828] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
```
- Python API
```python
- from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
- import json
+ from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
- asrclient_executor = ASRClientExecutor()
+ asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
@@ -285,71 +217,359 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
- print(res.json())
+ print(res)
```
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
+ [2022-05-06 21:14:03,137] [ INFO] - asr websocket client start
+ [2022-05-06 21:14:03,137] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:14:03,149] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:14:03,167] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,181] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,194] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,207] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,219] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,230] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,241] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,252] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,768] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,776] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,784] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,792] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,800] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,807] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,815] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:04,301] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,309] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,317] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,325] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,333] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,341] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,349] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,356] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,855] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,864] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,871] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,879] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,887] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,894] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,902] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:05,418] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,426] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,434] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,442] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,449] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,457] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,465] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,473] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,996] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,006] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,029] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,037] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,045] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,581] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,589] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,597] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,605] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,613] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,621] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,628] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,636] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:07,188] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,196] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,203] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,211] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,219] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,226] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:12,158] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:14:12,159] [ INFO] - audio duration: 4.9968125, elapsed time: 9.019973039627075, RTF=1.8051453881103354
+ [2022-05-06 21:14:12,160] [ INFO] - asr websocket client finished
+ ```
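+
+- Raw websocket protocol (optional sketch)
+
+  Since the streaming service only speaks the `websocket` protocol, a custom client has to open a websocket connection and stream the audio itself. The sketch below uses the third-party `websockets` package; the handshake fields (`name`, `signal`, `nbest`), the chunk size, and the one-reply-per-chunk behaviour are assumptions modelled on the demo clients above, so prefer `paddlespeech_client asr_online` or `websocket_client.py` for real use.
+
+  ```python
+  # ws_client_sketch.py: illustrative only; the message fields below are assumptions
+  import asyncio
+  import json
+  import wave
+
+  import websockets  # third-party package: pip install websockets
+
+
+  async def run(url="ws://127.0.0.1:8090/paddlespeech/asr/streaming",
+                wav_path="./zh.wav"):
+      async with websockets.connect(url) as ws:
+          # assumed start handshake
+          await ws.send(json.dumps({"name": wav_path, "signal": "start", "nbest": 1}))
+          print(await ws.recv())  # e.g. {"status": "ok", "signal": "server_ready"}
+
+          with wave.open(wav_path, "rb") as f:
+              chunk = f.readframes(1600)  # ~0.1 s of 16 kHz, 16-bit mono audio
+              while chunk:
+                  await ws.send(chunk)    # raw PCM bytes
+                  print(await ws.recv())  # partial recognition result
+                  chunk = f.readframes(1600)
+
+          # assumed end handshake; the final message carries the full result
+          await ws.send(json.dumps({"name": wav_path, "signal": "end", "nbest": 1}))
+          print(await ws.recv())
+
+
+  asyncio.run(run())
+  ```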
+
+
+## Punctuation service
+
+### 1. Server usage
+
+- Command Line
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
+ ``` bash
+  # launch the punctuation service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file conf/punc_application.yaml
+ ```
+
+
+ Usage:
+ ```bash
+ paddlespeech_server start --help
+ ```
+
+ Arguments:
+ - `config_file`: configuration file.
+ - `log_file`: log file.
+
+
+ Output:
+ ``` bash
+ [2022-05-02 17:59:26,285] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 17:59:26,285] [ INFO] - Init the text engine
+ [2022-05-02 17:59:26,285] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 17:59:26,286] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 17:59:30,810] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 17:59:31.486552 9595 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 17:59:31.491360 9595 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 17:59:34,688] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 17:59:34,701] [ INFO] - Init the text engine successfully
+ INFO: Started server process [9595]
+ [2022-05-02 17:59:34] [INFO] [server.py:75] Started server process [9595]
+ INFO: Waiting for application startup.
+ [2022-05-02 17:59:34] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 17:59:34] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 17:59:34] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+- Python API
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
+ ```python
+  # in PaddleSpeech/demos/streaming_asr_server directory
+ from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+ server_executor = ServerExecutor()
+ server_executor(
+ config_file="./conf/punc_application.yaml",
+ log_file="./log/paddlespeech.log")
+ ```
+
+ Output:
+ ```
+ [2022-05-02 18:09:02,542] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 18:09:02,543] [ INFO] - Init the text engine
+ [2022-05-02 18:09:02,543] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 18:09:02,545] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 18:09:06,919] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 18:09:07.523002 22615 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 18:09:07.527882 22615 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 18:09:10,900] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 18:09:10,913] [ INFO] - Init the text engine successfully
+ INFO: Started server process [22615]
+ [2022-05-02 18:09:10] [INFO] [server.py:75] Started server process [22615]
+ INFO: Waiting for application startup.
+ [2022-05-02 18:09:10] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 18:09:10] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 18:09:10] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+### 2. Client usage
+**Note:** The response time will be slightly longer when using the client for the first time.
+
+- Command line:
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ Output
+ ```
+ [2022-05-02 18:12:29,767] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-02 18:12:29,767] [ INFO] - Response time 0.096548 s.
+ ```
+
+- Python3 API
+
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8190,)
+ print(res)
+ ```
+
+ Output:
+ ``` bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
+
+
+## Join streaming ASR and punctuation server
+
+By default, each server is deployed on the 'CPU' device. Speech recognition and punctuation prediction can be deployed on different 'GPU' devices by modifying the 'device' parameter in each service's configuration file.
+
+We use the `streaming_asr_server.py` and `punc_server.py` scripts to launch the streaming speech recognition and punctuation prediction services respectively. The `websocket_client.py` script can be used to call both services at the same time.
+
+### 1. Start the two servers
+
+``` bash
+# Note: streaming speech recognition and punctuation prediction are configured on different graphics cards through their configuration files
+bash server.sh
+```
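+
+If you prefer not to use `server.sh`, the two services can also be started directly from Python. The sketch below is only an illustration: it assumes that `punc_server.py` and `streaming_asr_server.py` accept a `--config_file` flag and that the two configuration files assign different GPUs; check the scripts in this directory for the exact options.
+
+```python
+# launch_servers.py: a rough Python stand-in for server.sh (flag names are assumptions)
+import subprocess
+
+procs = [
+    subprocess.Popen(["python3", "punc_server.py",
+                      "--config_file", "conf/punc_application.yaml"]),
+    subprocess.Popen(["python3", "streaming_asr_server.py",
+                      "--config_file", "conf/ws_conformer_wenetspeech_application.yaml"]),
+]
+for p in procs:
+    p.wait()  # keep the launcher alive while both servers run
+```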
+
+### 2. Call client
+- Command line
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
+ ```
+ Output:
+ ```
+ [2022-05-07 11:21:47,060] [ INFO] - asr websocket client start
+ [2022-05-07 11:21:47,060] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:21:47,080] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:21:47,096] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,108] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,120] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,131] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,142] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,152] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,163] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,173] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,705] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,721] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,728] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,736] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,743] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,751] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:48,459] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,572] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,681] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,790] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,898] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,005] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,112] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,219] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,935] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,062] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,310] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,435] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,560] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,686] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:51,444] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,606] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,744] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,882] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,020] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,159] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,437] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:53,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,450] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,589] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,728] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,867] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,007] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,146] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:55,002] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,148] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,292] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,437] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,584] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,731] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,877] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,842] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,174] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,336] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,497] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,659] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:22:03,035] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:22:03,035] [ INFO] - audio duration: 4.9968125, elapsed time: 15.974023818969727, RTF=3.1968427510477384
+ [2022-05-07 11:22:03,037] [ INFO] - asr websocket client finished
+ [2022-05-07 11:22:03,037] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-07 11:22:03,037] [ INFO] - Response time 15.977116 s.
```
+
+- Use script
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+ ```
+ Output:
+ ```
+ [2022-05-07 11:11:02,984] [ INFO] - Start to do streaming asr client
+ [2022-05-07 11:11:02,985] [ INFO] - asr websocket client start
+ [2022-05-07 11:11:02,985] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:11:02,986] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-07 11:11:03,006] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:11:03,021] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,034] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,046] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,058] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,070] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,081] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,092] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,102] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,629] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,638] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,645] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,653] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,661] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,668] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,676] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:04,402] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,510] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,619] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,743] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,849] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,956] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,063] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,170] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,876] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,019] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,184] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,342] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,537] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,727] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,871] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:07,617] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,769] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,905] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,043] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,326] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,466] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,611] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:09,431] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,571] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,714] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,853] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,992] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,129] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,266] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:11,113] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,296] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,439] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,582] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,727] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,869] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,011] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,153] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,969] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,137] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,297] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,456] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,615] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,776] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:18,915] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:11:18,915] [ INFO] - audio duration: 4.9968125, elapsed time: 15.928460597991943, RTF=3.187724293835709
+ [2022-05-07 11:11:18,916] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
+ ```
+
+
diff --git a/demos/streaming_asr_server/README_cn.md b/demos/streaming_asr_server/README_cn.md
index bf122bb3afe845d76a6327c378917169c4dbf3ff..4ed15e17e4d2189e1579ca5a528f2072b41af320 100644
--- a/demos/streaming_asr_server/README_cn.md
+++ b/demos/streaming_asr_server/README_cn.md
@@ -1,22 +1,30 @@
([English](./README.md)|中文)
-# Speech Server
+# Streaming ASR Server
## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be done with a single command of `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.
+**The streaming ASR service only supports the `websocket` protocol, not the `http` protocol.**
## Usage
### 1. Installation
-See the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+For the detailed installation steps of PaddleSpeech, see the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
-You can choose one of the medium and hard ways to install PaddleSpeech.
+You can choose either the medium or the hard way to install PaddleSpeech.
### 2. Prepare the configuration file
-The configuration files are `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml`.
-The models currently integrated into the service are the DeepSpeech2 and conformer models.
+
+The startup script and the test script of the streaming ASR service are stored in the `PaddleSpeech/demos/streaming_asr_server` directory.
+After downloading `PaddleSpeech`, change into the `PaddleSpeech/demos/streaming_asr_server` directory.
+The configuration files in that directory are `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
+
+The models currently integrated into the service are the DeepSpeech2 and conformer models, with the corresponding configuration files:
+* DeepSpeech2: `conf/ws_application.yaml`
+* conformer: `conf/ws_conformer_wenetspeech_application.yaml`
+
The input of this ASR client should be a WAV file (`.wav`), and its sample rate must be the same as the model's sample rate.
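+
+A quick way to check this before sending audio is to read the WAV header with Python's standard `wave` module (a minimal sketch; 16000 Hz matches the 16 kHz models used in this demo):
+
+```python
+import wave
+
+# inspect the test file and make sure it matches the model's 16 kHz sample rate
+with wave.open("./zh.wav", "rb") as f:
+    print(f.getframerate(), f.getnchannels(), f.getsampwidth())
+    assert f.getframerate() == 16000, "resample the audio to 16 kHz first"
+```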
@@ -28,10 +36,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
### 3. Server Usage
- Command line (recommended)
-
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
```bash
- # start the service
- paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
+ # start the service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
```
Usage:
@@ -40,155 +48,80 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
paddlespeech_server start --help
```
Arguments:
- - `config_file`: configuration file of the service; default: ./conf/ws_conformer_application.yaml
- - `log_file`: log file; default: ./log/paddlespeech.log
+ - `config_file`: configuration file of the service; default: `./conf/application.yaml`
+ - `log_file`: log file; default: `./log/paddlespeech.log`
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
```python
+ # run in the PaddleSpeech/demos/streaming_asr_server directory
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
- config_file="./conf/ws_conformer_application.yaml",
+          config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
+
**Note:** The response time will be a little longer when the client is used for the first time.
- Command line (recommended)
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
-
```
Help:
@@ -204,79 +137,84 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
- `sample_rate`: audio sample rate; default: 16000.
- `lang`: model language; default: zh_cn.
- `audio_format`: audio format; default: wav.
+ - `punc.server_ip`: IP address of the punctuation prediction service; default: None.
+ - `punc.server_port`: port of the punctuation prediction service; default: None.
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
- [2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
+ [2022-05-06 21:10:35,598] [ INFO] - Start to do streaming asr client
+ [2022-05-06 21:10:35,600] [ INFO] - asr websocket client start
+ [2022-05-06 21:10:35,600] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:10:35,600] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-06 21:10:35,670] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:10:35,699] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,726] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,738] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,750] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,762] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,774] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,786] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,387] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,398] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,407] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,416] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,425] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,434] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,442] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,930] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,938] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,946] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,954] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,962] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,970] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,977] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,985] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:37,484] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,492] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,500] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,508] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,517] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,525] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,532] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:38,050] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,058] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,066] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,073] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,081] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,089] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,097] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,105] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,630] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,639] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,647] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,655] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,663] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,671] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,679] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:39,216] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,224] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,232] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,240] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,248] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,256] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,264] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,272] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,885] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,896] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,905] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,915] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,924] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,934] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:44,827] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:10:44,827] [ INFO] - audio duration: 4.9968125, elapsed time: 9.225094079971313, RTF=1.846195765794957
+ [2022-05-06 21:10:44,828] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
- import json
asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
@@ -286,71 +224,360 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
- print(res.json())
+ print(res)
```
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
+ [2022-05-06 21:14:03,137] [ INFO] - asr websocket client start
+ [2022-05-06 21:14:03,137] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:14:03,149] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:14:03,167] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,181] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,194] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,207] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,219] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,230] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,241] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,252] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,768] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,776] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,784] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,792] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,800] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,807] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,815] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:04,301] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,309] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,317] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,325] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,333] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,341] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,349] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,356] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,855] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,864] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,871] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,879] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,887] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,894] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,902] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:05,418] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,426] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,434] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,442] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,449] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,457] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,465] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,473] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,996] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,006] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,029] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,037] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,045] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,581] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,589] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,597] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,605] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,613] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,621] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,628] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,636] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:07,188] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,196] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,203] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,211] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,219] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,226] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:12,158] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:14:12,159] [ INFO] - audio duration: 4.9968125, elapsed time: 9.019973039627075, RTF=1.8051453881103354
+ [2022-05-06 21:14:12,160] [ INFO] - asr websocket client finished
+ ```
+
+
+
+## Punctuation Prediction
+
+### 1. Server Usage
+
+- Command line
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
+ ``` bash
+ # start the punctuation prediction service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file conf/punc_application.yaml
+ ```
+
+
+ Usage:
+
+ ```bash
+ paddlespeech_server start --help
+ ```
+
+ Arguments:
+ - `config_file`: configuration file of the service.
+ - `log_file`: log file.
+
+
+ Output:
+ ``` bash
+ [2022-05-02 17:59:26,285] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 17:59:26,285] [ INFO] - Init the text engine
+ [2022-05-02 17:59:26,285] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 17:59:26,286] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 17:59:30,810] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 17:59:31.486552 9595 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 17:59:31.491360 9595 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 17:59:34,688] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 17:59:34,701] [ INFO] - Init the text engine successfully
+ INFO: Started server process [9595]
+ [2022-05-02 17:59:34] [INFO] [server.py:75] Started server process [9595]
+ INFO: Waiting for application startup.
+ [2022-05-02 17:59:34] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 17:59:34] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 17:59:34] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+- Python API
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
+ ```python
+ # run in the PaddleSpeech/demos/streaming_asr_server directory
+ from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+ server_executor = ServerExecutor()
+ server_executor(
+ config_file="./conf/punc_application.yaml",
+ log_file="./log/paddlespeech.log")
```
+
+  Output:
+ ```
+ [2022-05-02 18:09:02,542] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 18:09:02,543] [ INFO] - Init the text engine
+ [2022-05-02 18:09:02,543] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 18:09:02,545] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 18:09:06,919] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 18:09:07.523002 22615 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 18:09:07.527882 22615 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 18:09:10,900] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 18:09:10,913] [ INFO] - Init the text engine successfully
+ INFO: Started server process [22615]
+ [2022-05-02 18:09:10] [INFO] [server.py:75] Started server process [22615]
+ INFO: Waiting for application startup.
+ [2022-05-02 18:09:10] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 18:09:10] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 18:09:10] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+### 2. Punctuation Prediction Client Usage
+**Note:** The response time will be a little longer when the client is used for the first time.
+
+- Command line (recommended)
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+  Output:
+ ```
+ [2022-05-02 18:12:29,767] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-02 18:12:29,767] [ INFO] - Response time 0.096548 s.
+ ```
+
+- Python3 API
+
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8190,)
+ print(res)
+ ```
+
+  Output:
+ ``` bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
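+
+  Since the punctuation service uses the `http` protocol, each call is an independent request, so the executor can be reused to punctuate several recognition results in a row. A small illustrative loop (the input sentences are examples only):
+
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+  textclient_executor = TextClientExecutor()
+  # example raw ASR outputs (illustration only)
+  raw_results = [
+      "我认为跑步最重要的就是给我带来了身体健康",
+      "今天天气真不错",
+  ]
+  for text in raw_results:
+      # each call returns the input text with predicted punctuation added
+      print(textclient_executor(input=text, server_ip="127.0.0.1", port=8190))
+  ```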
+
+
+## Joint Streaming ASR and Punctuation Prediction
+**Note:** The services are deployed on the `cpu` device by default; by changing the `device` parameter in the service configuration files, speech recognition and punctuation prediction can be deployed on different `gpu` devices.
+
+Use the two service scripts `streaming_asr_server.py` and `punc_server.py` to start the streaming ASR service and the punctuation prediction service respectively. The `websocket_client.py` script then calls both services at the same time. A Python sketch that starts the two services is given after the command below.
+
+### 1. Start the services
+
+``` bash
+# note: streaming ASR and punctuation prediction are assigned to different GPUs through their configuration files
+bash server.sh
+```
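+
+As an alternative to `server.sh`, the two services can also be started from Python with the `ServerExecutor` API shown above. The sketch below is an illustration only: it reuses the standalone configuration files from the previous sections, while `server.sh` may use its own configuration files and ports.
+
+```python
+# illustrative sketch: run the streaming ASR server and the punctuation server
+# in two separate processes (each ServerExecutor call serves until interrupted)
+from multiprocessing import Process
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+def run_server(config_file, log_file):
+    ServerExecutor()(config_file=config_file, log_file=log_file)
+
+if __name__ == "__main__":
+    servers = [
+        Process(target=run_server,
+                args=("./conf/ws_conformer_wenetspeech_application.yaml", "./log/asr.log")),
+        Process(target=run_server,
+                args=("./conf/punc_application.yaml", "./log/punc.log")),
+    ]
+    for p in servers:
+        p.start()
+    for p in servers:
+        p.join()
+```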
+
+### 2. Call the services
+- Using the command line:
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
+ ```
+  Output:
+ ```
+ [2022-05-07 11:21:47,060] [ INFO] - asr websocket client start
+ [2022-05-07 11:21:47,060] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:21:47,080] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:21:47,096] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,108] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,120] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,131] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,142] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,152] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,163] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,173] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,705] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,721] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,728] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,736] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,743] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,751] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:48,459] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,572] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,681] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,790] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,898] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,005] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,112] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,219] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,935] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,062] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,310] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,435] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,560] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,686] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:51,444] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,606] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,744] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,882] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,020] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,159] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,437] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:53,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,450] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,589] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,728] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,867] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,007] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,146] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:55,002] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,148] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,292] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,437] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,584] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,731] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,877] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,842] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,174] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,336] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,497] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,659] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:22:03,035] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:22:03,035] [ INFO] - audio duration: 4.9968125, elapsed time: 15.974023818969727, RTF=3.1968427510477384
+ [2022-05-07 11:22:03,037] [ INFO] - asr websocket client finished
+ [2022-05-07 11:22:03,037] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-07 11:22:03,037] [ INFO] - Response time 15.977116 s.
+ ```
+
+- Using the script
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+ ```
+  Output:
+ ```
+ [2022-05-07 11:11:02,984] [ INFO] - Start to do streaming asr client
+ [2022-05-07 11:11:02,985] [ INFO] - asr websocket client start
+ [2022-05-07 11:11:02,985] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:11:02,986] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-07 11:11:03,006] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:11:03,021] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,034] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,046] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,058] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,070] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,081] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,092] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,102] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,629] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,638] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,645] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,653] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,661] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,668] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,676] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:04,402] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,510] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,619] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,743] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,849] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,956] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,063] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,170] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,876] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,019] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,184] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,342] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,537] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,727] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,871] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:07,617] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,769] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,905] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,043] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,326] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,466] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,611] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:09,431] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,571] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,714] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,853] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,992] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,129] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,266] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:11,113] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,296] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,439] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,582] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,727] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,869] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,011] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,153] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,969] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,137] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,297] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,456] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,615] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,776] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:18,915] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:11:18,915] [ INFO] - audio duration: 4.9968125, elapsed time: 15.928460597991943, RTF=3.187724293835709
+ [2022-05-07 11:11:18,916] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
+ ```
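+
+- Python API (sketch)
+
+  The joint call should also be reachable from Python through `ASROnlineClientExecutor`. The snippet below is only a sketch: it assumes the executor accepts `punc_server_ip` and `punc_server_port` keyword arguments that mirror the `--punc.server_ip` / `--punc.port` options above, which this document does not show explicitly.
+
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
+
+  asrclient_executor = ASROnlineClientExecutor()
+  res = asrclient_executor(
+      input="./zh.wav",
+      server_ip="127.0.0.1",
+      port=8290,
+      sample_rate=16000,
+      lang="zh_cn",
+      audio_format="wav",
+      punc_server_ip="127.0.0.1",  # assumption: keyword form of --punc.server_ip
+      punc_server_port=8190)       # assumption: keyword form of --punc.port
+  print(res)
+  ```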
+
+
diff --git a/demos/streaming_asr_server/conf/application.yaml b/demos/streaming_asr_server/conf/application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e9a89c19d2ad08db9a6c41ec94bdf21be95125b0
--- /dev/null
+++ b/demos/streaming_asr_server/conf/application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
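+
+# Note (illustrative interpretation of the settings above): with window_ms=25 and
+# shift_ms=10, a decoding window of window_n=7 frames spans roughly 25 + 6 * 10 = 85 ms
+# of audio, and each shift of shift_n=4 frames advances the stream by 4 * 10 = 40 ms;
+# sample_width=2 corresponds to 16-bit (2-byte) PCM samples.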
diff --git a/demos/streaming_asr_server/conf/punc_application.yaml b/demos/streaming_asr_server/conf/punc_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f947525e16478cbbf739c0281cb2234467b82972
--- /dev/null
+++ b/demos/streaming_asr_server/conf/punc_application.yaml
@@ -0,0 +1,35 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8190
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['text_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports the offline engine type.
+protocol: 'http'
+engine_list: ['text_python']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+ task: punc
+ model_type: 'ernie_linear_p3_wudao'
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ vocab_file: # [optional]
+ device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
diff --git a/demos/streaming_asr_server/conf/ws_application.yaml b/demos/streaming_asr_server/conf/ws_application.yaml
index dee8d78baa933f4447ea1a5afffc157fd70bfa7c..f2ea6330f690801182f457ba1170207a12e14b18 100644
--- a/demos/streaming_asr_server/conf/ws_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_application.yaml
@@ -7,8 +7,8 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
@@ -29,6 +29,7 @@ asr_online:
cfg_path:
decode_method:
force_yes: True
+ device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
diff --git a/demos/streaming_asr_server/conf/ws_conformer_application.yaml b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
index 8f01148590697d2b0fec9141ca2ec09c8b946d00..2affde0739ff5873a88cbe621ebf907ab0663dcb 100644
--- a/demos/streaming_asr_server/conf/ws_conformer_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
@@ -7,8 +7,8 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
@@ -29,7 +29,7 @@ asr_online:
cfg_path:
decode_method:
force_yes: True
- device: # cpu or gpu:id
+ device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
@@ -42,4 +42,4 @@ asr_online:
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
- sample_width: 2
\ No newline at end of file
+ sample_width: 2
diff --git a/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e9a89c19d2ad08db9a6c41ec94bdf21be95125b0
--- /dev/null
+++ b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
diff --git a/demos/streaming_asr_server/punc_server.py b/demos/streaming_asr_server/punc_server.py
new file mode 100644
index 0000000000000000000000000000000000000000..eefa0fb407f44c5f9e2d6f8ac282a64c85ff5d3d
--- /dev/null
+++ b/demos/streaming_asr_server/punc_server.py
@@ -0,0 +1,38 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ prog='paddlespeech_server.start', add_help=True)
+ parser.add_argument(
+ "--config_file",
+ action="store",
+ help="yaml file of the app",
+ default=None,
+ required=True)
+
+ parser.add_argument(
+ "--log_file",
+ action="store",
+ help="log file",
+ default="./log/paddlespeech.log")
+ logger.info("start to parse the args")
+ args = parser.parse_args()
+
+ logger.info("start to launch the punctuation server")
+ punc_server = ServerExecutor()
+ punc_server(config_file=args.config_file, log_file=args.log_file)
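+
+# Example invocation (mirrors the commented-out command in server.sh):
+#   python3 punc_server.py --config_file conf/punc_application.yaml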
diff --git a/demos/streaming_asr_server/server.sh b/demos/streaming_asr_server/server.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4266f8c642c83ece8dc4a2dd29812acfad4d6f8a
--- /dev/null
+++ b/demos/streaming_asr_server/server.sh
@@ -0,0 +1,8 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+
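+# start the punctuation restoration service (conf/punc_application.yaml listens on port 8190)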
+# nohup python3 punc_server.py --config_file conf/punc_application.yaml > punc.log 2>&1 &
+paddlespeech_server start --config_file conf/punc_application.yaml &> punc.log &
+
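+# start the streaming ASR service (conf/ws_conformer_application.yaml listens on port 8090)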
+# nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
+paddlespeech_server start --config_file conf/ws_conformer_application.yaml &> streaming_asr.log &
\ No newline at end of file
diff --git a/demos/streaming_asr_server/streaming_asr_server.py b/demos/streaming_asr_server/streaming_asr_server.py
new file mode 100644
index 0000000000000000000000000000000000000000..011b009aaf8b6736e5910ddca76df5f1ecdd56e0
--- /dev/null
+++ b/demos/streaming_asr_server/streaming_asr_server.py
@@ -0,0 +1,38 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ prog='paddlespeech_server.start', add_help=True)
+ parser.add_argument(
+ "--config_file",
+ action="store",
+ help="yaml file of the app",
+ default=None,
+ required=True)
+
+ parser.add_argument(
+ "--log_file",
+ action="store",
+ help="log file",
+ default="./log/paddlespeech.log")
+ logger.info("start to parse the args")
+ args = parser.parse_args()
+
+ logger.info("start to launch the streaming asr server")
+ streaming_asr_server = ServerExecutor()
+ streaming_asr_server(config_file=args.config_file, log_file=args.log_file)
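+
+# Example invocation (mirrors the commented-out command in server.sh):
+#   python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml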
diff --git a/demos/streaming_asr_server/test.sh b/demos/streaming_asr_server/test.sh
old mode 100644
new mode 100755
index fe8155cf347ead91a5956e3e575e9bf52d99af9a..4f43c6534f078683329a287bb87a1c79cff15b8f
--- a/demos/streaming_asr_server/test.sh
+++ b/demos/streaming_asr_server/test.sh
@@ -1,5 +1,12 @@
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
-# read the wav and pass it to service
-python3 websocket_client.py --wavfile ./zh.wav
+# read the wav and pass it to the streaming asr service only
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8090 --wavfile ./zh.wav
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+
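+# the client prints partial transcriptions while the audio is being streamed and the
+# final transcription once the stream ends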
+# read the wav and call both the streaming asr and the punctuation restoration services
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8090 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
\ No newline at end of file
diff --git a/demos/streaming_asr_server/web/templates/index.html b/demos/streaming_asr_server/web/templates/index.html
index 7aa227fb1d946894854131a0fb91305bd319eec0..56c630808567177993dfdb633a60a1d6c1299b4f 100644
--- a/demos/streaming_asr_server/web/templates/index.html
+++ b/demos/streaming_asr_server/web/templates/index.html
@@ -93,7 +93,7 @@
function parseResult(data) {
var data = JSON.parse(data)
- var result = data.asr_results
+ var result = data.result
console.log(result)
$("#resultPanel").html(result)
}
@@ -152,4 +152,4 @@