diff --git a/README.md b/README.md
index d32131c0d7c7d4934cc82b17cb1f7ccfe6499896..2ade8a69ce0ad8cc9b9af77b76e87c9ba5e90b7b 100644
--- a/README.md
+++ b/README.md
@@ -161,6 +161,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md), [PP-TTS](./docs/source/tts/PPTTS.md), [PP-VPR](docs/source/vpr/PPVPR.md)
 - 👏🏻 2022.05.06: `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp`.
 - 👏🏻 2022.05.06: `Server` is available for `Speaker Verification`, and `Punctuation Restoration`.
 - 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.
diff --git a/README_cn.md b/README_cn.md
index ceb9dc187c4cf613fbfafba111c4909e0935e0ee..f5ba93629d897b793ffb45a145dc8aa37dcde8bb 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -182,6 +182,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
+- 👑 2022.05.13: PaddleSpeech 发布 [PP-ASR](./docs/source/asr/PPASR_cn.md)、[PP-TTS](./docs/source/tts/PPTTS_cn.md)、[PP-VPR](docs/source/vpr/PPVPR_cn.md)
 - 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳),和语音合成。
 - 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别,标点恢复。
 - 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成,声纹验证。
diff --git a/demos/audio_content_search/README.md b/demos/audio_content_search/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d73d6a59d71f7973b88f3cc9cee2834b49e5fe59
--- /dev/null
+++ b/demos/audio_content_search/README.md
@@ -0,0 +1,74 @@
+([简体中文](./README_cn.md)|English)
+# ACS (Audio Content Search)
+
+## Introduction
+ACS, or Audio Content Search, refers to the problem of finding the timestamps of given keywords in automatically transcribed spoken language (speech-to-text).
+
+This demo is an implementation of obtaining the timestamps of keywords in the transcription of a given audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`.
+The search words in this demo are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+You can choose one way from medium and hard to install paddlespeech.
+
+The dependencies are listed in requirements.txt.
+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must be the same as the model's.
+
+Here is a sample file for this demo that can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+
+### 3. Usage
+- Command Line (Recommended)
+  ```bash
+  # Chinese
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+
+  Usage:
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  Arguments:
+  - `input` (required): Audio file to recognize.
+  - `server_ip`: the server IP address.
+  - `port`: the server port.
+  - `lang`: the language type of the model. Default: `zh`.
+  - `sample_rate`: sample rate of the model. Default: `16000`.
+  - `audio_format`: the audio format.
+
+  Output:
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
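+
+  In the response, `acs` is a list of keyword hits, each with the word `w` and its begin/end times `bg`/`ed` in seconds. A minimal post-processing sketch (not part of PaddleSpeech; it only assumes the response structure shown above):
+  ```python
+  # Pretty-print the keyword hits from an ACS response like the one above.
+  res = {
+      'transcription': '我认为跑步最重要的就是给我带来了身体健康',
+      'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002},
+              {'w': '我', 'bg': 2.1, 'ed': 4.28},
+              {'w': '康', 'bg': 3.2, 'ed': 4.92}],
+  }
+  for hit in res['acs']:
+      print(f"keyword '{hit['w']}': {hit['bg']:.2f}s -> {hit['ed']:.2f}s")
+  ```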
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+  acs_executor = ACSClientExecutor()
+  res = acs_executor(
+      input='./zh.wav',
+      server_ip="127.0.0.1",
+      port=8490,)
+  print(res)
+  ```
+
+  Output:
+  ```bash
+  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
+  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
+  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  ```
diff --git a/demos/audio_content_search/README_cn.md b/demos/audio_content_search/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..c74af4cf1f1e1a70470bf176cd1821dfdd02ac74
--- /dev/null
+++ b/demos/audio_content_search/README_cn.md
@@ -0,0 +1,74 @@
+(简体中文|[English](./README.md))
+
+# 语音内容搜索
+## 介绍
+语音内容搜索是一项用计算机程序获取转录语音内容中关键词时间戳的技术。
+
+这个 demo 是一个从给定音频文件获取其文本中关键词时间戳的实现,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+
+当前示例中的检索词是:
+```
+我
+康
+```
+## 使用方法
+### 1. 安装
+请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
+
+你可以从 medium 和 hard 两种方式中选择一种方式安装。
+依赖参见 requirements.txt。
+
+### 2. 准备输入
+这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+
+可以下载此 demo 的示例音频:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+### 3. 使用方法
+- 命令行 (推荐使用)
+  ```bash
+  # 中文
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+
+  使用帮助:
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  参数:
+  - `input`(必须输入):用于识别的音频文件。
+  - `server_ip`:服务的 IP 地址。
+  - `port`:服务的端口。
+  - `lang`:模型语言,默认值:`zh`。
+  - `sample_rate`:音频采样率,默认值:`16000`。
+  - `audio_format`:音频的格式。
+
+  输出:
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
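+
+  返回结果中的 `acs` 字段是关键词命中列表,`w` 为关键词,`bg`/`ed` 为起止时间(秒)。下面是一个简单的后处理示例(非 PaddleSpeech 自带,仅基于上面展示的返回结构):
+  ```python
+  # 打印 ACS 返回结果中每个关键词的时间戳
+  res = {'acs': [{'w': '我', 'bg': 0, 'ed': 1.68},
+                 {'w': '我', 'bg': 2.1, 'ed': 4.28},
+                 {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  for hit in res['acs']:
+      print(f"关键词 '{hit['w']}': {hit['bg']:.2f}s -> {hit['ed']:.2f}s")
+  ```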
+ ``` + +- Python API + ```python + from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor + + acs_executor = ACSClientExecutor() + res = acs_executor( + input='./zh.wav', + server_ip="127.0.0.1", + port=8490,) + print(res) + ``` + + 输出: + ```bash + [2022-05-15 15:08:13,955] [ INFO] - acs http client start + [2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search + [2022-05-15 15:08:19,026] [ INFO] - acs http client finished + {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]} + ``` diff --git a/demos/audio_content_search/acs_clinet.py b/demos/audio_content_search/acs_clinet.py new file mode 100644 index 0000000000000000000000000000000000000000..11f99aca7aa74b2b9fca8544939a0f7267878b21 --- /dev/null +++ b/demos/audio_content_search/acs_clinet.py @@ -0,0 +1,49 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse + +from paddlespeech.cli.log import logger +from paddlespeech.server.utils.audio_handler import ASRHttpHandler + + +def main(args): + logger.info("asr http client start") + audio_format = "wav" + sample_rate = 16000 + lang = "zh" + handler = ASRHttpHandler( + server_ip=args.server_ip, port=args.port, endpoint=args.endpoint) + res = handler.run(args.wavfile, audio_format, sample_rate, lang) + # res = res['result'] + logger.info(f"the final result: {res}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="audio content search client") + parser.add_argument( + '--server_ip', type=str, default='127.0.0.1', help='server ip') + parser.add_argument('--port', type=int, default=8090, help='server port') + parser.add_argument( + "--wavfile", + action="store", + help="wav file path ", + default="./16_audio.wav") + parser.add_argument( + '--endpoint', + type=str, + default='/paddlespeech/asr/search', + help='server endpoint') + args = parser.parse_args() + + main(args) diff --git a/demos/audio_content_search/conf/acs_application.yaml b/demos/audio_content_search/conf/acs_application.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d3c5e3039945ffe23ba6dd2de717d9b6ab8a433f --- /dev/null +++ b/demos/audio_content_search/conf/acs_application.yaml @@ -0,0 +1,34 @@ +################################################################################# +# SERVER SETTING # +################################################################################# +host: 0.0.0.0 +port: 8490 + +# The task format in the engin_list is: _ +# task choices = ['acs_python'] +# protocol = ['http'] (only one can be selected). +# http only support offline engine type. 
diff --git a/demos/audio_content_search/conf/acs_application.yaml b/demos/audio_content_search/conf/acs_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d3c5e3039945ffe23ba6dd2de717d9b6ab8a433f
--- /dev/null
+++ b/demos/audio_content_search/conf/acs_application.yaml
@@ -0,0 +1,34 @@
+#################################################################################
+#                                 SERVER SETTING                                #
+#################################################################################
+host: 0.0.0.0
+port: 8490
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['acs_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports offline engine type.
+protocol: 'http'
+engine_list: ['acs_python']
+
+
+#################################################################################
+#                                 ENGINE CONFIG                                 #
+#################################################################################
+
+################################### ACS #########################################
+################### acs task: engine_type: python ###############################
+acs_python:
+    task: acs
+    asr_protocol: 'websocket' # 'websocket'
+    offset: 1.0 # second
+    asr_server_ip: 127.0.0.1
+    asr_server_port: 8390
+    lang: 'zh'
+    word_list: "./conf/words.txt"
+    sample_rate: 16000
+    device: 'cpu' # set 'gpu:id' or 'cpu'
diff --git a/demos/audio_content_search/conf/words.txt b/demos/audio_content_search/conf/words.txt
new file mode 100644
index 0000000000000000000000000000000000000000..25510eb424fbe48ba81f51a3ce10d6ff9facad63
--- /dev/null
+++ b/demos/audio_content_search/conf/words.txt
@@ -0,0 +1,2 @@
+我
+康
\ No newline at end of file
diff --git a/demos/audio_content_search/conf/ws_conformer_application.yaml b/demos/audio_content_search/conf/ws_conformer_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..97201382f57e12e3fccb600f98ee3b0b26dc889c
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
@@ -0,0 +1,43 @@
+#################################################################################
+#                                 SERVER SETTING                                #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                 ENGINE CONFIG                                 #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_multicn'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params: # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path:
+    decode_method: 'attention_rescoring'
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    am_predictor_conf:
+        device: # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False # True -> print glog
+        summary: True # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7 # frame
+        shift_n: 4 # frame
+        window_ms: 25 # ms
+        shift_ms: 10 # ms
+        sample_rate: 16000
+        sample_width: 2
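+
+# Note: this file and conf/ws_conformer_wenetspeech_application.yaml (below) are
+# essentially the same streaming ASR server config; they differ mainly in the
+# model that is loaded ('conformer_online_multicn' here vs. the larger
+# 'conformer_online_wenetspeech').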
diff --git a/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c23680bd59d5286ea0854efd46a7479485784f27
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                                 SERVER SETTING                                #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                 ENGINE CONFIG                                 #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_wenetspeech'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params: # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path:
+    decode_method: "attention_rescoring"
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    am_predictor_conf:
+        device: # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False # True -> print glog
+        summary: True # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7 # frame
+        shift_n: 4 # frame
+        window_ms: 25 # ms
+        shift_ms: 10 # ms
+        sample_rate: 16000
+        sample_width: 2
diff --git a/demos/audio_content_search/run.sh b/demos/audio_content_search/run.sh
new file mode 100755
index 0000000000000000000000000000000000000000..e322a37c5fcb98f1d5410f736e69646414af5f0f
--- /dev/null
+++ b/demos/audio_content_search/run.sh
@@ -0,0 +1,7 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# start the streaming asr server first
+nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
+
+# start the acs server
+nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
+
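+# Ports come from the configs above: the ACS server (conf/acs_application.yaml)
+# listens on 8490 and forwards audio to the streaming ASR server on 8390 over
+# websocket, then searches the transcription for the words in conf/words.txt.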
diff --git a/demos/custom_streaming_asr/README.md b/demos/custom_streaming_asr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa28d502f9da451b0279c224523160ad22f0b97a
--- /dev/null
+++ b/demos/custom_streaming_asr/README.md
@@ -0,0 +1,65 @@
+([简体中文](./README_cn.md)|English)
+
+# Customized Auto Speech Recognition
+
+## Introduction
+In some cases, we need to recognize specific rare words with high accuracy, e.g., address recognition in navigation apps. Customized ASR can solve those issues.
+
+This demo is customized for the expense-account scenario, which needs to recognize rare addresses.
+
+* G with slot: 打车到 "address_slot"。
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* This is the address slot WFST; you can add the addresses you want to recognize.
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
+
+## Usage
+### 1. Installation
+Install the paddle:2.2.2 docker image.
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. Demo
+* Run websocket_server.sh. This script will download resources and libs, and launch the service.
+```
+cd /paddle
+bash websocket_server.sh
+```
+This script runs in two steps:
+1. Download resource.tar.gz; after extraction, the following directories can be found in the resource directory:
+model: acoustic model
+graph: the decoder graph (TLG.fst)
+lib: some libs
+bin: binaries
+data: audio and wav.scp
+
+2. websocket_server_main launches the service.
+Some parameters:
+port: the service port
+graph_path: the decoder graph path
+model_path: the acoustic model path
+Please refer to these files for the other parameters:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* In another terminal, run websocket_client.sh; the client will send the data and get the results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client, where wav_scp is the wav set and port is the server port.
+
+* Result:
+In the client log, you will see messages like the following:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
\ No newline at end of file
diff --git a/demos/custom_streaming_asr/README_cn.md b/demos/custom_streaming_asr/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffbf682fb362394289083658364cb4bc0616682a
--- /dev/null
+++ b/demos/custom_streaming_asr/README_cn.md
@@ -0,0 +1,63 @@
+(简体中文|[English](./README.md))
+
+# 定制化语音识别演示
+## 介绍
+在一些场景中,识别系统需要高精度地识别一些稀有词,例如导航软件中的地名识别。通过定制化识别可以满足这一需求。
+
+这个 demo 是打车报销单场景的识别,需要识别一些稀有的地名,可以通过如下操作实现。
+
+* G with slot: 打车到 "address_slot"。
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* 这是 address slot WFST,可以添加一些需要识别的地名。
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* 通过 replace 操作,G = fstreplace(G_with_slot, address_slot),最终可以得到定制化的解码图。
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
+
+## 使用方法
+### 1. 配置环境
+安装 paddle:2.2.2 docker 镜像。
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. 演示
+* 运行如下命令,完成相关资源和库的下载和服务启动。
+```
+cd /paddle
+bash websocket_server.sh
+```
+上面脚本完成了如下两个功能:
+1. 完成 resource.tar.gz 下载,解压后,会在 resource 中发现如下目录:
+model: 声学模型
+graph: 解码构图
+lib: 相关库
+bin: 运行程序
+data: 语音数据
+
+2.
通过 websocket_server_main 来启动服务。 +这里简单的介绍几个参数: +port 是服务端口, +graph_path 用来指定解码图文件, +其他参数说明可参见代码: +PaddleSpeech/speechx/speechx/decoder/param.h +PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc + +* 在另一个终端中, 通过 client 发送数据,得到结果。运行如下命令: +``` +bash websocket_client.sh +``` +通过 websocket_client_main 来启动 client 服务,其中 wav_scp 是发送的语音句子集合,port 为服务端口。 + +* 结果: +client 的 log 中可以看到如下类似的结果 +``` +0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208 +I0513 10:58:13.884493 41768 feature_cache.h:52] set finished +I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240 +I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240 +LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元 +``` diff --git a/demos/custom_streaming_asr/path.sh b/demos/custom_streaming_asr/path.sh new file mode 100644 index 0000000000000000000000000000000000000000..47462324d739e7cc5dbd16097d5ca5b5cbdacbf3 --- /dev/null +++ b/demos/custom_streaming_asr/path.sh @@ -0,0 +1,2 @@ +export LD_LIBRARY_PATH=$PWD/resource/lib +export PATH=$PATH:$PWD/resource/bin diff --git a/demos/custom_streaming_asr/setup_docker.sh b/demos/custom_streaming_asr/setup_docker.sh new file mode 100644 index 0000000000000000000000000000000000000000..329a75db0ef34c8cb4e3a54d9663f027d1919a14 --- /dev/null +++ b/demos/custom_streaming_asr/setup_docker.sh @@ -0,0 +1 @@ +sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash diff --git a/demos/custom_streaming_asr/websocket_client.sh b/demos/custom_streaming_asr/websocket_client.sh new file mode 100755 index 0000000000000000000000000000000000000000..ede076cafa2529c89bc79dee211a8cf962cf960d --- /dev/null +++ b/demos/custom_streaming_asr/websocket_client.sh @@ -0,0 +1,18 @@ +#!/bin/bash +set +x +set -e + +. path.sh +# input +data=$PWD/data + +# output +wav_scp=wav.scp + +export GLOG_logtostderr=1 + +# websocket client +websocket_client_main \ + --wav_rspecifier=scp:$data/$wav_scp \ + --streaming_chunk=0.36 \ + --port=8881 diff --git a/demos/custom_streaming_asr/websocket_server.sh b/demos/custom_streaming_asr/websocket_server.sh new file mode 100755 index 0000000000000000000000000000000000000000..041c345be79722c882d50e828f2a2438c0eb9a24 --- /dev/null +++ b/demos/custom_streaming_asr/websocket_server.sh @@ -0,0 +1,33 @@ +#!/bin/bash +set +x +set -e + +export GLOG_logtostderr=1 + +. path.sh +#test websocket server + +model_dir=./resource/model +graph_dir=./resource/graph +cmvn=./data/cmvn.ark + + +#paddle_asr_online/resource.tar.gz +if [ ! -f $cmvn ]; then + wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz + tar xzfv resource.tar.gz + ln -s ./resource/data . 
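+    # resource.tar.gz provides model/ (acoustic model), graph/ (decoder graph TLG.fst),
+    # lib/ (shared libraries), bin/ (binaries) and data/ (test audio and wav.scp)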
+fi + +websocket_server_main \ + --cmvn_file=$cmvn \ + --streaming_chunk=0.1 \ + --use_fbank=true \ + --model_path=$model_dir/avg_10.jit.pdmodel \ + --param_path=$model_dir/avg_10.jit.pdiparams \ + --model_cache_shapes="5-1-2048,5-1-2048" \ + --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \ + --word_symbol_table=$graph_dir/words.txt \ + --graph_path=$graph_dir/TLG.fst --max_active=7500 \ + --port=8881 \ + --acoustic_scale=12 diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md index bb974c97a9eeaad7ba2d675b293fde722378c2d3..5a3de0ccdd50bfc6ba20e4d4b13c9bb9a0ca7874 100644 --- a/demos/speech_server/README.md +++ b/demos/speech_server/README.md @@ -52,8 +52,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. INFO: Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) - [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` @@ -75,8 +75,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. INFO: Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) - [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` @@ -84,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ### 4. ASR Client Usage **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ``` paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav ``` @@ -132,6 +135,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ### 5. TTS Client Usage **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) + + If `127.0.0.1` is not accessible, you need to use the actual service IP address + ```bash paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav ``` @@ -192,6 +198,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ### 6. CLS Client Usage **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. 
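+
+  The server listens on all interfaces (`host: 0.0.0.0` in `conf/application.yaml`), so any IP address of the server machine that the client can reach will work.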
+ ``` paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav ``` @@ -242,9 +251,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) -``` bash -paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav -``` + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + + ``` bash + paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav + ``` * Usage: @@ -297,6 +308,8 @@ print(res) - Command Line (Recommended) + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ``` bash paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav ``` @@ -357,6 +370,9 @@ print(res) **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ``` bash paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康" ``` diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md index 8fa67c0dbdd55b48ca766cf388a618ee5fc66118..51b6caa40856d934de362021516186e2c82760df 100644 --- a/demos/speech_server/README_cn.md +++ b/demos/speech_server/README_cn.md @@ -53,8 +53,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. INFO: Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) - [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` @@ -76,39 +76,42 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. INFO: Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) - [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` ### 4. 
ASR 客户端使用方法 **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) - ``` - paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav - ``` + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 - 使用帮助: - - ```bash - paddlespeech_client asr --help - ``` + ``` + paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav - 参数: - - `server_ip`: 服务端ip地址,默认: 127.0.0.1。 - - `port`: 服务端口,默认: 8090。 - - `input`(必须输入): 用于识别的音频文件。 - - `sample_rate`: 音频采样率,默认值:16000。 - - `lang`: 模型语言,默认值:zh_cn。 - - `audio_format`: 音频格式,默认值:wav。 + ``` - 输出: + 使用帮助: + + ```bash + paddlespeech_client asr --help + ``` + + 参数: + - `server_ip`: 服务端ip地址,默认: 127.0.0.1。 + - `port`: 服务端口,默认: 8090。 + - `input`(必须输入): 用于识别的音频文件。 + - `sample_rate`: 音频采样率,默认值:16000。 + - `lang`: 模型语言,默认值:zh_cn。 + - `audio_format`: 音频格式,默认值:wav。 + + 输出: - ```bash - [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}} - [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s. - ``` + ```bash + [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}} + [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s. + ``` - Python API ```python @@ -135,33 +138,35 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ### 5. TTS 客户端使用方法 **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) - - ```bash - paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav - ``` - 使用帮助: - ```bash - paddlespeech_client tts --help - ``` - - 参数: - - `server_ip`: 服务端ip地址,默认: 127.0.0.1。 - - `port`: 服务端口,默认: 8090。 - - `input`(必须输入): 待合成的文本。 - - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。 - - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0 - - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0 - - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0 - - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。 - - 输出: - ```bash - [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'} - [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav. - [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s. - [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s. - ``` + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + + ```bash + paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav + ``` + 使用帮助: + + ```bash + paddlespeech_client tts --help + ``` + + 参数: + - `server_ip`: 服务端ip地址,默认: 127.0.0.1。 + - `port`: 服务端口,默认: 8090。 + - `input`(必须输入): 待合成的文本。 + - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。 + - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0 + - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0 + - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0 + - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。 + + 输出: + ```bash + [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'} + [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav. + [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s. + [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s. 
+ ``` - Python API ```python @@ -197,9 +202,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) - ``` - paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav - ``` + + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + + ``` + paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav + ``` 使用帮助: @@ -247,15 +255,17 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee 注意: 初次使用客户端时响应时间会略长 * 命令行 (推荐使用) -``` bash -paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav -``` + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + + ``` bash + paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav + ``` * 使用帮助: -``` bash -paddlespeech_client vector --help -``` + ``` bash + paddlespeech_client vector --help + ``` * 参数: * server_ip: 服务端ip地址,默认: 127.0.0.1。 * port: 服务端口,默认: 8090。 @@ -299,15 +309,17 @@ print(res) 注意: 初次使用客户端时响应时间会略长 * 命令行 (推荐使用) -``` bash -paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav -``` + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + + ``` bash + paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav + ``` * 使用帮助: -``` bash -paddlespeech_client vector --help -``` + ``` bash + paddlespeech_client vector --help + ``` * 参数: * server_ip: 服务端ip地址,默认: 127.0.0.1。 @@ -357,9 +369,12 @@ print(res) **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) - ``` bash - paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康" - ``` + + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + + ``` bash + paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康" + ``` 使用帮助: @@ -409,4 +424,4 @@ print(res) 通过 `paddlespeech_server stats --task vector` 获取Vector服务支持的所有模型。 ### Text支持的模型 -通过 `paddlespeech_server stats --task text` 获取Text服务支持的所有模型。 \ No newline at end of file +通过 `paddlespeech_server stats --task text` 获取Text服务支持的所有模型。 diff --git a/demos/speech_server/asr_client.sh b/demos/speech_server/asr_client.sh index afe2f82181aeab08194963d126f7621bc59b8b63..37a7ab0b02e8afd6bb7d412314e804c56a2ac254 100644 --- a/demos/speech_server/asr_client.sh +++ b/demos/speech_server/asr_client.sh @@ -1,4 +1,6 @@ #!/bin/bash wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav + +# If `127.0.0.1` is not accessible, you need to use the actual service IP address. paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav diff --git a/demos/speech_server/cls_client.sh b/demos/speech_server/cls_client.sh index 5797aa204f6ba2cb260440e8709d7905134ddf53..67012648c7ec9ce3be6aa5f4da234116864fb503 100644 --- a/demos/speech_server/cls_client.sh +++ b/demos/speech_server/cls_client.sh @@ -1,4 +1,6 @@ #!/bin/bash wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav + +# If `127.0.0.1` is not accessible, you need to use the actual service IP address. 
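+# The server must be running first, e.g.:
+#   paddlespeech_server start --config_file ./conf/application.yaml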
 paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
diff --git a/demos/speech_server/conf/application.yaml b/demos/speech_server/conf/application.yaml
index 14a9195acd925d2f6f6d1a5869d44d81b2bdca06..c6588ce802caa2419425fd5b94170a1e75d16568 100644
--- a/demos/speech_server/conf/application.yaml
+++ b/demos/speech_server/conf/application.yaml
@@ -3,7 +3,7 @@
 #################################################################################
 #                                 SERVER SETTING                                #
 #################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
 port: 8090

 # The task format in the engin_list is: _
@@ -157,4 +157,4 @@ vector_python:
     sample_rate: 16000
     cfg_path: # [optional]
     ckpt_path: # [optional]
-    device: # set 'gpu:id' or 'cpu'
\ No newline at end of file
+    device: # set 'gpu:id' or 'cpu'
diff --git a/demos/speech_server/tts_client.sh b/demos/speech_server/tts_client.sh
index a756dfd3ef555f0b74e845d1b7754bed1d826e19..a443a0a94a6a6e19f0a0cf40708ebca3e8137624 100644
--- a/demos/speech_server/tts_client.sh
+++ b/demos/speech_server/tts_client.sh
@@ -1,3 +1,4 @@
 #!/bin/bash

+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
diff --git a/demos/streaming_asr_server/README.md b/demos/streaming_asr_server/README.md
index 909f5a4c735841a9390404547fb915ee193f53b9..4824da6281bc883f393dc16c9e43ba38c6bdcf6e 100644
--- a/demos/streaming_asr_server/README.md
+++ b/demos/streaming_asr_server/README.md
@@ -1,6 +1,6 @@
 ([简体中文](./README_cn.md)|English)

-# Speech Server
+# Streaming ASR Server

 ## Introduction
 This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
@@ -15,7 +15,7 @@ It is recommended to use **paddlepaddle 2.2.1** or above.
 You can choose one way from meduim and hard to install paddlespeech.

 ### 2. Prepare config File
-The configuration file can be found in `conf/ws_application.yaml` 和 `conf/ws_conformer_application.yaml`.
+The configuration file can be found in `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
 At present, the speech tasks integrated by the model include: DeepSpeech2 and conformer.

@@ -32,7 +32,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 **Note:** The default deployment of the server is on the 'CPU' device, which can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
 ```bash
 # in PaddleSpeech/demos/streaming_asr_server start the service
-    paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
+    paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
 ```

 Usage:
@@ -46,31 +46,27 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

 Output:
 ```bash
-    [2022-04-21 15:52:18,126] [    INFO] - create the online asr engine instance
-    [2022-04-21 15:52:18,127] [    INFO] - paddlespeech_server set the device: cpu
-    [2022-04-21 15:52:18,128] [    INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
-    [2022-04-21 15:52:18,128] [    INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine - [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online - [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success - [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. - INFO: Started server process [11173] - [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] - INFO: Waiting for application startup. - [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. - INFO: Application startup complete. - [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - infos = await tasks.gather(*fs, loop=self) - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - await tasks.sleep(0, loop=self) - INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) - [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance + [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu + [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k + [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking... + [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 
0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine + [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online + [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success + [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully. + INFO: Started server process [4242] + [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242] + INFO: Waiting for application startup. + [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup. + INFO: Application startup complete. + [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete. + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` - Python API @@ -81,37 +77,33 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav server_executor = ServerExecutor() server_executor( - config_file="./conf/ws_conformer_application.yaml", + config_file="./conf/ws_conformer_wenetspeech_application.yaml", log_file="./log/paddlespeech.log") ``` Output: ```bash - [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance - [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu - [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... - [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine - [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online - [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success - [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. - INFO: Started server process [11173] - [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] - INFO: Waiting for application startup. 
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. - INFO: Application startup complete. - [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - infos = await tasks.gather(*fs, loop=self) - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - await tasks.sleep(0, loop=self) - INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) - [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance + [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu + [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k + [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking... + [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine + [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online + [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success + [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully. + INFO: Started server process [4242] + [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242] + INFO: Waiting for application startup. + [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup. + INFO: Application startup complete. + [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete. + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` @@ -119,9 +111,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav **Note:** The response time will be slightly longer when using the client for the first time - Command Line (Recommended) - ``` - paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav - ``` + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. 
+ + ``` + paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav + ``` Usage: @@ -374,10 +369,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav ### 2. Client usage **Note** The response time will be slightly longer when using the client for the first time -- Command line - ``` - paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康" - ``` +- Command line: + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + + ``` + paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康" + ``` Output ``` @@ -419,6 +417,9 @@ bash server.sh ### 2. Call client - Command line + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ``` paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav ``` @@ -494,6 +495,9 @@ bash server.sh ``` - Use script + + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ``` python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav ``` diff --git a/demos/streaming_asr_server/README_cn.md b/demos/streaming_asr_server/README_cn.md index 0f1ae1c1550f14a3028cce3a04c1814798db0004..4ed15e17e4d2189e1579ca5a528f2072b41af320 100644 --- a/demos/streaming_asr_server/README_cn.md +++ b/demos/streaming_asr_server/README_cn.md @@ -1,6 +1,6 @@ ([English](./README.md)|中文) -# 语音服务 +# 流式语音识别服务 ## 介绍 这个demo是一个启动流式语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。 @@ -19,11 +19,11 @@ 流式ASR的服务启动脚本和服务测试脚本存放在 `PaddleSpeech/demos/streaming_asr_server` 目录。 下载好 `PaddleSpeech` 之后,进入到 `PaddleSpeech/demos/streaming_asr_server` 目录。 -配置文件可参见该目录下 `conf/ws_application.yaml` 和 `conf/ws_conformer_application.yaml` 。 +配置文件可参见该目录下 `conf/ws_application.yaml` 和 `conf/ws_conformer_wenetspeech_application.yaml` 。 目前服务集成的模型有: DeepSpeech2和 conformer模型,对应的配置文件如下: * DeepSpeech: `conf/ws_application.yaml` -* conformer: `conf/ws_conformer_application.yaml` +* conformer: `conf/ws_conformer_wenetspeech_application.yaml` @@ -39,7 +39,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav **注意:** 默认部署在 `cpu` 设备上,可以通过修改服务配置文件中 `device` 参数部署在 `gpu` 上。 ```bash # 在 PaddleSpeech/demos/streaming_asr_server 目录启动服务 - paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml + paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml ``` 使用方法: @@ -53,31 +53,27 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav 输出: ```bash - [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance - [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu - [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... 
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams - [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine - [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online - [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success - [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. - INFO: Started server process [11173] - [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] - INFO: Waiting for application startup. - [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. - INFO: Application startup complete. - [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - infos = await tasks.gather(*fs, loop=self) - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - await tasks.sleep(0, loop=self) - INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) - [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance + [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu + [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k + [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking... + [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 
 0.0a.model.tar
+    [2022-05-14 04:56:17,543] [    INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+    [2022-05-14 04:56:17,543] [    INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+    [2022-05-14 04:56:17,543] [    INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
+    [2022-05-14 04:56:17,543] [    INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
+    [2022-05-14 04:56:17,852] [    INFO] - start to create the stream conformer asr engine
+    [2022-05-14 04:56:17,863] [    INFO] - model name: conformer_online
+    [2022-05-14 04:56:22,756] [    INFO] - create the transformer like model success
+    [2022-05-14 04:56:22,758] [    INFO] - Initialize ASR server engine successfully.
+    INFO:     Started server process [4242]
+    [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+    INFO:     Waiting for application startup.
+    [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+    INFO:     Application startup complete.
+    [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+    INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+    [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
   ```

- Python API
  ```python
  from paddlespeech.server.bin.paddlespeech_server import ServerExecutor

  server_executor = ServerExecutor()
  server_executor(
-        config_file="./conf/ws_conformer_application.yaml",
+        config_file="./conf/ws_conformer_wenetspeech_application.yaml",
        log_file="./log/paddlespeech.log")
  ```

  输出:
  ```bash
-    [2022-04-21 15:52:18,126] [    INFO] - create the online asr engine instance
-    [2022-04-21 15:52:18,127] [    INFO] - paddlespeech_server set the device: cpu
-    [2022-04-21 15:52:18,128] [    INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
-    [2022-04-21 15:52:18,128] [    INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
-    [2022-04-21 15:52:18,727] [    INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
-    [2022-04-21 15:52:18,727] [    INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
-    [2022-04-21 15:52:18,727] [    INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
-    [2022-04-21 15:52:18,727] [    INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
-    [2022-04-21 15:52:18,727] [    INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
-    [2022-04-21 15:52:19,446] [    INFO] - start to create the stream conformer asr engine
-    [2022-04-21 15:52:19,473] [    INFO] - model name: conformer_online
-    [2022-04-21 15:52:21,731] [    INFO] - create the transformer like model success
-    [2022-04-21 15:52:21,733] [    INFO] - Initialize ASR server engine successfully.
-    INFO:     Started server process [11173]
-    [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
-    INFO:     Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. - INFO: Application startup complete. - [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - infos = await tasks.gather(*fs, loop=self) - /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. - await tasks.sleep(0, loop=self) - INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) - [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance + [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu + [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k + [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking... + [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams + [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine + [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online + [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success + [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully. + INFO: Started server process [4242] + [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242] + INFO: Waiting for application startup. + [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup. + INFO: Application startup complete. + [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete. + INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) + [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) ``` ### 4. ASR 客户端使用方法 **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) + + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + ``` paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav ``` @@ -384,6 +379,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav **注意:** 初次使用客户端时响应时间会略长 - 命令行 (推荐使用) + + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + ``` paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康" ``` @@ -427,6 +425,9 @@ bash server.sh ### 2. 
- 使用命令行:
+
+   若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
  ```
  paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
  ```
@@ -502,6 +503,9 @@ bash server.sh
  ```
- 使用脚本调用
+
+   若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
  ```
  python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
  ```
diff --git a/demos/streaming_asr_server/conf/application.yaml b/demos/streaming_asr_server/conf/application.yaml
index f576d704ae1ae1a1c95576673aae2bc73eedc6af..e9a89c19d2ad08db9a6c41ec94bdf21be95125b0 100644
--- a/demos/streaming_asr_server/conf/application.yaml
+++ b/demos/streaming_asr_server/conf/application.yaml
@@ -29,7 +29,7 @@ asr_online:
     cfg_path:
-    decode_method:
+    decode_method: "attention_rescoring"
     force_yes: True
-    device: cpu # cpu or gpu:id
+    device: 'cpu' # cpu or gpu:id
     am_predictor_conf:
         device: # set 'gpu:id' or 'cpu'
         switch_ir_optim: True
@@ -42,4 +42,4 @@ asr_online:
     window_ms: 25 # ms
     shift_ms: 10 # ms
     sample_rate: 16000
-    sample_width: 2
\ No newline at end of file
+    sample_width: 2
diff --git a/demos/streaming_asr_server/test.sh b/demos/streaming_asr_server/test.sh
index c7b57e9b3f66fc518b170d9754ddaa15dc7ae038..4f43c6534f078683329a287bb87a1c79cff15b8f 100755
--- a/demos/streaming_asr_server/test.sh
+++ b/demos/streaming_asr_server/test.sh
@@ -2,9 +2,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

 # read the wav and pass it to only streaming asr service
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 # python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --wavfile ./zh.wav
 paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --input ./zh.wav

 # read the wav and call streaming and punc service
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 # python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
 paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
\ No newline at end of file
diff --git a/demos/streaming_asr_server/websocket_client.py b/demos/streaming_asr_server/websocket_client.py
index 3451b8d047429de7c3c8edf4053ac9dfed1df361..8a4fe330ac0e31d199cf2b436c1b4fa18e1d4a06 100644
--- a/demos/streaming_asr_server/websocket_client.py
+++ b/demos/streaming_asr_server/websocket_client.py
@@ -13,6 +13,9 @@
 # limitations under the License.
 #!/usr/bin/python
 # -*- coding: UTF-8 -*-
+
+# script for calc RTF: grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}'
+
 import argparse
 import asyncio
 import codecs
@@ -40,7 +43,7 @@ def main(args):
             result = result["result"]
             logger.info(f"asr websocket client finished : {result}")
-    # support to process batch audios from wav.scp
+    # support to process batch audios from wav.scp
     if args.wavscp and os.path.exists(args.wavscp):
         logging.info(f"start to process the wavscp: {args.wavscp}")
         with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\
diff --git a/demos/streaming_tts_server/README.md b/demos/streaming_tts_server/README.md
index 299aa3d2aff6d46dd7b50e234b2cedf9bc3b2ba4..775cd908603d7447a7587834f11a8b3248ae9f55 100644
--- a/demos/streaming_tts_server/README.md
+++ b/demos/streaming_tts_server/README.md
@@ -63,8 +63,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -90,8 +90,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -101,6 +101,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. Access http streaming TTS service: + If `127.0.0.1` is not accessible, you need to use the actual service IP address. + ```bash paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav ``` @@ -198,8 +200,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -226,8 +228,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -236,6 +238,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`. Access websocket streaming TTS service: + If `127.0.0.1` is not accessible, you need to use the actual service IP address. 
+ ```bash paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav ``` diff --git a/demos/streaming_tts_server/README_cn.md b/demos/streaming_tts_server/README_cn.md index bb159503d2656b43d6b6ecbffab2deadf927ee84..9c2cc50ecca8b8c38d41593eefa5020cc67ab63b 100644 --- a/demos/streaming_tts_server/README_cn.md +++ b/demos/streaming_tts_server/README_cn.md @@ -62,8 +62,8 @@ [2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -89,8 +89,8 @@ [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -100,6 +100,8 @@ 访问 http 流式TTS服务: + 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址 + ```bash paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav ``` @@ -198,8 +200,8 @@ [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. - INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) - [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) + [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit) ``` @@ -226,8 +228,8 @@ [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. INFO: Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. 
-  INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
-  [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
+  INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
+  [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
  ```

@@ -236,6 +238,8 @@
  访问 websocket 流式TTS服务:

+  若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
  ```bash
  paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
  ```
diff --git a/demos/streaming_tts_server/conf/tts_online_application.yaml b/demos/streaming_tts_server/conf/tts_online_application.yaml
index 714f4a68969b2ec196c483692c4f712baeaad3a3..964e85ef95a80db29a35ee9a69e5909c1aef70d8 100644
--- a/demos/streaming_tts_server/conf/tts_online_application.yaml
+++ b/demos/streaming_tts_server/conf/tts_online_application.yaml
@@ -3,7 +3,7 @@
 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
 port: 8092

 # The task format in the engin_list is: <speech task>_<engine type>
diff --git a/demos/streaming_tts_server/test_client.sh b/demos/streaming_tts_server/test_client.sh
index 8698209521d4ed05ca53d4844e7b7bcba06f7cca..bd88f20b1bce760437c5c2bf7ffa85624b5490e2 100644
--- a/demos/streaming_tts_server/test_client.sh
+++ b/demos/streaming_tts_server/test_client.sh
@@ -1,7 +1,9 @@
 #!/bin/bash

 # http client test
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
 paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav

 # websocket client test
-#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
diff --git a/docs/source/asr/PPASR.md b/docs/source/asr/PPASR.md
new file mode 100644
index 0000000000000000000000000000000000000000..3779434e3d5eb05d65fd89dc54c4d2cc329c8b39
--- /dev/null
+++ b/docs/source/asr/PPASR.md
@@ -0,0 +1,96 @@
+([简体中文](./PPASR_cn.md)|English)
+# PP-ASR
+
+## Catalogue
+- [1. Introduction](#1)
+- [2. Characteristics](#2)
+- [3. Tutorials](#3)
+    - [3.1 Pre-trained Models](#31)
+    - [3.2 Training](#32)
+    - [3.3 Inference](#33)
+    - [3.4 Service Deployment](#34)
+    - [3.5 Customized Auto Speech Recognition and Deployment](#35)
+- [4. Quick Start](#4)
+
+## 1. Introduction
+
+PP-ASR is a tool that provides ASR (automatic speech recognition) functionality. It offers a variety of Chinese and English models, supports model training, and supports model inference from the command line. In addition, PP-ASR supports the deployment of streaming models and customized ASR.
+
+## 2. Characteristics
+The basic process of ASR is shown in the figure below:
+
+
+The main characteristics of PP-ASR are shown below:
+- Provides pre-trained models on Chinese/English open-source datasets: aishell (Chinese), wenetspeech (Chinese) and librispeech (English). The models include deepspeech2 and conformer/transformer.
+- Supports model training on Chinese/English datasets.
+- Supports model inference from the command line: run `paddlespeech asr --model xxx --input xxx.wav` to transcribe audio with a pre-trained model.
+- Supports deployment of a streaming ASR server, which provides token timestamps in addition to the recognition result.
+- Supports customized automatic speech recognition and its deployment.
+
+## 3. Tutorials
+
+## 3.1 Pre-trained Models
+The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
+The models with the best results are the Ds2 Online Wenetspeech ASR0 Model and the Conformer Online Wenetspeech ASR1 Model; both support streaming ASR.
+For more information about the model design, you can refer to the AIStudio tutorials:
+- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
+- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)
+
+## 3.2 Training
+The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as `examples/dataset/model`. The supported datasets are mainly aishell and librispeech; the supported models are deepspeech2 and u2 (conformer/transformer).
+The concrete steps are recorded in `run.sh`.
+
+For more information, you can refer to [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1)
+
+## 3.3 Inference
+
+After installing `paddlespeech` via `pip install paddlespeech`, you can run `paddlespeech asr --model xxx --input xxx.wav` to do inference with a pre-trained model.
+
+Specific supported functions include:
+
+- Transcribing a single audio file
+- Transcribing multiple audio files through a pipe
+- RTF calculation
+
+For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md)
+
+## 3.4 Service Deployment
+
+PP-ASR supports service deployment of streaming ASR; speech recognition can be combined with punctuation restoration.
+
+Demo of the ASR server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)
+
+![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)
+
+Display of using the ASR server on a web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)
+
+For more information about service deployment, you can refer to the AIStudio tutorials:
+- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
+- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905)
+
+## 3.5 Customized Auto Speech Recognition and Deployment
+
+For customized automatic speech recognition and deployment, PP-ASR provides a C++ pipeline of feature extraction (fbank) => inference model (scoring library) => TLG (WFST: token, lexicon, grammar).
For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)
+For a quick start, you can refer to [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)
+
+For more information about customized automatic speech recognition and deployment, you can refer to the AIStudio tutorial:
+- [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)
+
+## 4. Quick Start
+
+To use PP-ASR, see [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md), which provides three methods to install `paddlespeech`: **Easy**, **Medium** and **Hard**. To experience the inference function of paddlespeech, the **Easy** installation is sufficient.
diff --git a/docs/source/asr/PPASR_cn.md b/docs/source/asr/PPASR_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..82b1c1d374ae5dac968f6c6f7583cb4ad487cfdf
--- /dev/null
+++ b/docs/source/asr/PPASR_cn.md
@@ -0,0 +1,96 @@
+(简体中文|[English](./PPASR.md))
+# PP-ASR
+
+## 目录
+- [1. 简介](#1)
+- [2. 特点](#2)
+- [3. 使用教程](#3)
+    - [3.1 预训练模型](#31)
+    - [3.2 模型训练](#32)
+    - [3.3 模型推理](#33)
+    - [3.4 服务部署](#34)
+    - [3.5 支持个性化场景部署](#35)
+- [4. 快速开始](#4)
+
+## 1. 简介
+
+PP-ASR 是一个提供 ASR 功能的工具。其提供了多种中文和英文的模型,支持模型的训练,并且支持使用命令行的方式进行模型的推理。PP-ASR 也支持流式模型的部署,以及个性化场景的部署。
+
+## 2. 特点
+语音识别的基本流程如下图所示:
+
+
+PP-ASR 的主要特点如下:
+- 提供在中/英文开源数据集 aishell(中文)、wenetspeech(中文)、librispeech(英文)上的预训练模型。模型包含 deepspeech2 模型以及 conformer/transformer 模型。
+- 支持中/英文的模型训练功能。
+- 支持命令行方式的模型推理,可使用 `paddlespeech asr --model xxx --input xxx.wav` 方式调用各个预训练模型进行推理。
+- 支持流式 ASR 的服务部署,也支持输出时间戳。
+- 支持个性化场景的部署。
+
+## 3. 使用教程
+
+## 3.1 预训练模型
+支持的预训练模型列表:[released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)。
+其中效果较好的模型为 Ds2 Online Wenetspeech ASR0 Model 以及 Conformer Online Wenetspeech ASR1 Model。两个模型都支持流式 ASR。
+更多关于模型设计的部分,可以参考 AIStudio 教程:
+- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
+- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)
+
+## 3.2 模型训练
+
+模型的训练的参考脚本存放在 [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) 中,并按照 `examples/数据集/模型` 存放,数据集主要支持 aishell 和 librispeech,模型支持 deepspeech2 模型和 u2 (conformer/transformer) 模型。
+具体的执行脚本的步骤记录在 `run.sh` 当中。具体可参考:[asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1)
+
+## 3.3 模型推理
+
+PP-ASR 支持在使用 `pip install paddlespeech` 安装后,使用命令行的方式调用预训练模型进行推理。
+
+具体支持的功能包括:
+
+- 对单条音频进行预测
+- 使用管道的方式对多条音频进行预测
+- 支持 RTF 的计算
+
+具体的使用方式可以参考:[speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md)
+
+## 3.4 服务部署
+
+PP-ASR 支持流式 ASR 的服务部署,支持语音识别 + 标点处理两个功能同时使用。
+
+server 的 demo:[streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)
+
+![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)
+
+网页上使用 asr server 的效果展示:[streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)
+
+关于服务部署方面的更多资料,可以参考 AIStudio 教程:
+- [流式服务-模型部分](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
+- [流式服务](https://aistudio.baidu.com/aistudio/projectdetail/4017905)
+
+## 3.5 支持个性化场景部署
+
+针对个性化场景部署,提供了特征提取(fbank)=> 推理模型(打分库)=> TLG(WFST:token、lexicon、grammar)的 C++ 程序。具体参考 [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)。
+如果想快速了解和使用,可以参考:[custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)
+
+关于支持个性化场景部署的更多资料,可以参考 AIStudio 教程:
+- [定制化识别](https://aistudio.baidu.com/aistudio/projectdetail/4021561)
+
+## 4. 快速开始
+
+关于如何使用 PP-ASR,可以看这里的 [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md),其中提供了 **简单**、**中等**、**困难** 三种安装方式。如果想体验 paddlespeech 的推理功能,可以用 **简单** 安装方式。
+
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 7741f17f7627af6b9f69eceb830dfee52edc7e3a..fc1649eb3c7173b63f5c1a036a09f8bf15fe65d2 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -54,6 +54,7 @@ Contents
    :caption: Demos

    demo_video
+   streaming_asr_demo_video
    tts_demo_video
    streaming_tts_demo_video
diff --git a/docs/source/install.md b/docs/source/install.md
index bdeb37cec23de92941cbff2dd971a9f301c702e0..43cc784ccac744a5d9f00266912cdece7bf37f6f 100644
--- a/docs/source/install.md
+++ b/docs/source/install.md
@@ -139,7 +139,7 @@ pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
 To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise, if you work on `Ubuntu` with `root` privilege, you can still complete the installation.
 ### Choice 1: Running in Docker Container (Recommend)
-Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided in [hub.docker.com](https://hub.docker.com) with all the dependencies installed. This Docker image requires the support of NVIDIA GPU, so please make sure its availability and the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed.
+Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided on [hub.docker.com](https://hub.docker.com) with the CUDA and cuDNN dependencies installed. This Docker image requires an NVIDIA GPU, so please make sure one is available and that [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed.

 Take several steps to launch the Docker image:

 - Download the Docker image
diff --git a/docs/source/streaming_asr_demo_video.rst b/docs/source/streaming_asr_demo_video.rst
new file mode 100644
index 0000000000000000000000000000000000000000..6c96fea0427053835a3741d66e011c0a552ca2e1
--- /dev/null
+++ b/docs/source/streaming_asr_demo_video.rst
@@ -0,0 +1,10 @@
+Streaming ASR Demo Video
+========================
+
+.. raw:: html
+
diff --git a/docs/source/tts/PPTTS.md b/docs/source/tts/PPTTS.md
index c8534cd3265e5369909b12f7853b30a947cc66d5..ef0baa07d62f69b4c573a9052a63231dae718df0 100644
--- a/docs/source/tts/PPTTS.md
+++ b/docs/source/tts/PPTTS.md
@@ -1,5 +1,7 @@
 ([简体中文](./PPTTS_cn.md)|English)

+# PP-TTS
+
 - [1. Introduction](#1)
 - [2. Characteristic](#2)
 - [3. Benchmark](#3)
diff --git a/docs/source/vpr/PPVPR.md b/docs/source/vpr/PPVPR.md
new file mode 100644
index 0000000000000000000000000000000000000000..a87dd621b2bf24fb3c481d5f5a256d9a36cc154f
--- /dev/null
+++ b/docs/source/vpr/PPVPR.md
@@ -0,0 +1,78 @@
+([简体中文](./PPVPR_cn.md)|English)
+# PP-VPR
+
+## Catalogue
+- [1. Introduction](#1)
+- [2. Characteristics](#2)
+- [3. Tutorials](#3)
+    - [3.1 Pre-trained Models](#31)
+    - [3.2 Training](#32)
+    - [3.3 Inference](#33)
+    - [3.4 Service Deployment](#34)
+- [4. Quick Start](#4)
+
+## 1. Introduction
+
+PP-VPR is a tool that provides voiceprint feature extraction and retrieval. It offers several near-production solutions that make the hard problems of complex scenarios easier to handle, and it supports model inference from the command line. PP-VPR also supports web-based operation and containerized deployment.
+
+## 2. Characteristics
+The basic process of VPR is shown in the figure below:
+
+
+The main characteristics of PP-VPR are shown below:
+- Provides a pre-trained model, ecapa-tdnn, on the open-source dataset VoxCeleb (English).
+- Supports model training and evaluation.
+- Supports model inference from the command line: run `paddlespeech vector --task spk --input xxx.wav` to extract a speaker embedding with the pre-trained model.
+- Supports web-based operation and containerized deployment.
+
+## 3. Tutorials
+
+## 3.1 Pre-trained Models
+The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
+For more information about the model design, you can refer to the AIStudio tutorial:
+- [ecapa-tdnn](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
+
+## 3.2 Training
+The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as `examples/dataset/model`. The main supported dataset is VoxCeleb; the supported model is ecapa-tdnn.
+The concrete steps are recorded in `run.sh`.
+
+For more information, you can refer to [sv0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0)
+
+## 3.3 Inference
+
+After installing `paddlespeech` via `pip install paddlespeech`, you can run `paddlespeech vector --task spk --input xxx.wav` to do inference with the pre-trained model.
+
+Specific supported functions include:
+
+- Prediction for a single audio file
+- Scoring the similarity between two audio files
+- RTF calculation
+
+For specific usage, please refer to: [speaker_verification](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speaker_verification/README_cn.md)
+
+## 3.4 Service Deployment
+
+PP-VPR supports Docker-based service deployment, with high-performance indexing and retrieval built on Milvus and MySQL.
+
+Demo of the VPR server: [audio_searching](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/audio_searching)
+
+![arch](https://ai-studio-static-online.cdn.bcebos.com/7b32dd0200084866863095677e8b40d3b725b867d2e6439e9cf21514e235dfd5)
+
+For more information about service deployment, you can refer to the AIStudio tutorial:
+- [speaker_recognition](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
+
+## 4. Quick Start
+
+To use PP-VPR, see [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md), which provides three methods to install `paddlespeech`: **Easy**, **Medium** and **Hard**. To experience the inference function of paddlespeech, the **Easy** installation is sufficient.
diff --git a/docs/source/vpr/PPVPR_cn.md b/docs/source/vpr/PPVPR_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..f0e562d1eeaa1522c86db50ea1644ba7bb5b472b
--- /dev/null
+++ b/docs/source/vpr/PPVPR_cn.md
@@ -0,0 +1,79 @@
+(简体中文|[English](./PPVPR.md))
+# PP-VPR
+
+## 目录
+- [1. 简介](#1)
+- [2. 特点](#2)
+- [3. 使用教程](#3)
+    - [3.1 预训练模型](#31)
+    - [3.2 模型训练](#32)
+    - [3.3 模型推理](#33)
+    - [3.4 服务部署](#34)
+- [4. 快速开始](#4)
+
+## 1. 简介
+
+PP-VPR 是一个提供声纹特征提取、检索功能的工具。提供了多种准工业化的方案,轻松搞定复杂场景中的难题,支持使用命令行的方式进行模型的推理。PP-VPR 也支持界面化的操作,容器化的部署。
+
+## 2. 特点
+VPR 的基本流程如下图所示:
+
+
+PP-VPR 的主要特点如下:
+- 提供在英文开源数据集 VoxCeleb(英文)上的预训练模型,ecapa-tdnn。
+- 支持模型训练评估功能。
+- 支持命令行方式的模型推理,可使用 `paddlespeech vector --task spk --input xxx.wav` 方式调用预训练模型进行推理。
+- 支持 VPR 的服务容器化部署,界面化操作。
+
+## 3. 使用教程
+
+## 3.1 预训练模型
+支持的预训练模型列表:[released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)。
+更多关于模型设计的部分,可以参考 AIStudio 教程:
+- [ecapa-tdnn](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
+
+## 3.2 模型训练
+
+模型的训练的参考脚本存放在 [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) 中,并按照 `examples/数据集/模型` 存放,数据集主要支持 VoxCeleb,模型支持 ecapa-tdnn 模型。
+具体的执行脚本的步骤记录在 `run.sh` 当中。具体可参考:[sv0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0)
+
+## 3.3 模型推理
+
+PP-VPR 支持在使用 `pip install paddlespeech` 安装后,使用命令行的方式调用预训练模型进行推理。
+
+具体支持的功能包括:
+
+- 对单条音频进行预测
+- 对两条音频进行打分
+- 支持 RTF 的计算
+
+具体的使用方式可以参考:[speaker_verification](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speaker_verification/README_cn.md)
+
+## 3.4 服务部署
+
+PP-VPR 支持 Docker 容器化服务部署,通过 Milvus、MySQL 进行高性能建库检索。
+
+server 的 demo:[audio_searching](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/audio_searching)
+
+![arch](https://ai-studio-static-online.cdn.bcebos.com/7b32dd0200084866863095677e8b40d3b725b867d2e6439e9cf21514e235dfd5)
+
+关于服务部署方面的更多资料,可以参考 AIStudio 教程:
+- [speaker_recognition](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
+
+## 4. 快速开始
+
+关于如何使用 PP-VPR,可以看这里的 [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md),其中提供了 **简单**、**中等**、**困难** 三种安装方式。如果想体验 paddlespeech 的推理功能,可以用 **简单** 安装方式。
diff --git a/examples/aishell/asr1/RESULTS.md b/examples/aishell/asr1/RESULTS.md
index db188450ac958bc4e93a881e469e847dbae73201..f16d423a2dc11f08aeac2a8061f4532d56e6ebbf 100644
--- a/examples/aishell/asr1/RESULTS.md
+++ b/examples/aishell/asr1/RESULTS.md
@@ -11,7 +11,7 @@ paddlespeech version: 0.2.0
 | conformer | 47.07M | conf/conformer.yaml | spec_aug | test | attention_rescoring | - | 0.0464 |

-## Chunk Conformer
+## Conformer Streaming
 paddle version: 2.2.2
 paddlespeech version: 0.2.0
 Need set `decoding.decoding_chunk_size=16` when decoding.
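The inference sections of the new PP-ASR and PP-VPR documents above describe CLI usage; the same pre-trained models can also be driven from Python. A minimal sketch, assuming the `ASRExecutor`/`VectorExecutor` executor classes behind the CLI and the 16 kHz `zh.wav` sample file used throughout these demos:

```python
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.vector import VectorExecutor

# speech recognition with the default pre-trained model (PP-ASR)
asr = ASRExecutor()
print(asr(audio_file="./zh.wav"))

# speaker embedding extraction with the default ecapa-tdnn model (PP-VPR)
vec = VectorExecutor()
print(vec(audio_file="./zh.wav"))
```

On the first call each executor downloads and checks its pre-trained model, so the initial run is noticeably slower than later ones.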
diff --git a/examples/librispeech/asr0/RESULTS.md b/examples/librispeech/asr0/RESULTS.md
index 77f92a2b7625fbb618b70025ad09d94ac1cc992d..9f6d1cc04177a54024761a1d6f713a088c307ca8 100644
--- a/examples/librispeech/asr0/RESULTS.md
+++ b/examples/librispeech/asr0/RESULTS.md
@@ -1,6 +1,6 @@
 # LibriSpeech

-## Deepspeech2
+## Deepspeech2 Non-Streaming
 | Model | Params | release | Config | Test set | Loss | WER |
 | --- | --- | --- | --- | --- | --- | --- |
 | DeepSpeech2 | 42.96M | 2.2.0 | conf/deepspeech2.yaml + spec_aug | test-clean | 14.49190807 | 0.067283 |
diff --git a/examples/librispeech/asr1/RESULTS.md b/examples/librispeech/asr1/RESULTS.md
index 10f0fe33d97d56483d37a656813c508ffd15c9b1..6f39ae146b6d1a6d36cd379685a6a29f87adc57e 100644
--- a/examples/librispeech/asr1/RESULTS.md
+++ b/examples/librispeech/asr1/RESULTS.md
@@ -11,7 +11,7 @@ train: Epoch 70, 4 V100-32G, best avg: 20
 | conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 |

-## Chunk Conformer
+## Conformer Streaming
 | Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | WER |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
diff --git a/examples/wenetspeech/asr1/RESULTS.md b/examples/wenetspeech/asr1/RESULTS.md
index 2f7ad65937475d206bcc303c9300fe4ed61c3f4a..cc209db754ffd67655ab25dac662cf2ea122a1fc 100644
--- a/examples/wenetspeech/asr1/RESULTS.md
+++ b/examples/wenetspeech/asr1/RESULTS.md
@@ -1,6 +1,6 @@
 # WenetSpeech

-## Conformer online
+## Conformer Streaming
 | Model | Params | Config | Augmentation| Test set | Decode method | Valid Loss | CER |
 | --- | --- | --- | --- | --- | --- | --- | --- |
diff --git a/paddlespeech/cli/asr/infer.py b/paddlespeech/cli/asr/infer.py
index 23029cfb4d853c57d3f1e4fb48bd2230b9d49228..863a933f2a7446b920228fe2f5fa6e0294b50d5d 100644
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
@@ -187,6 +187,7 @@ class ASRExecutor(BaseExecutor):
                 vocab=self.config.vocab_filepath,
                 spm_model_prefix=self.config.spm_model_prefix)
             self.config.decode.decoding_method = decode_method
+
         else:
             raise Exception("wrong type")
         model_name = model_type[:model_type.rindex(
@@ -201,6 +202,21 @@ class ASRExecutor(BaseExecutor):
             model_dict = paddle.load(self.ckpt_path)
             self.model.set_state_dict(model_dict)

+        # compute the max len limit
+        if "conformer" in model_type or "transformer" in model_type or "wenetspeech" in model_type:
+            # transformer-like models use a subsampling CNN in front of the encoder
+            subsample_rate = self.model.subsampling_rate()
+            frame_shift_ms = self.config.preprocess_config.process[0][
+                'n_shift'] / self.config.preprocess_config.process[0]['fs']
+            max_len = self.model.encoder.embed.pos_enc.max_len
+
+            if self.config.encoder_conf.get("max_len", None):
+                max_len = self.config.encoder_conf.max_len
+
+            self.max_len = frame_shift_ms * max_len * subsample_rate
+            logger.info(
+                f"The ASR server's max audio duration limit: {self.max_len} seconds")
+
    def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
        """
        Input preprocess and return paddle.Tensor stored in self.input.
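One note on the duration limit introduced in the hunk above: `n_shift / fs` is a time in seconds (despite the `_ms` suffix in the variable name), so the resulting `self.max_len` is a duration in seconds. A worked sketch of the arithmetic, using illustrative values only (not numbers read from any shipped config):

```python
# Worked example of the limit computed in ASRExecutor._init_from_path above.
# All numbers are assumptions for illustration.
n_shift, fs = 160, 16000      # 160-sample hop at 16 kHz
frame_shift = n_shift / fs    # 0.01 s of audio per feature frame
subsample_rate = 4            # conv2d front-end keeps 1 of every 4 frames
max_len = 5000                # positional-encoding table length

# each positional-encoding slot therefore covers `subsample_rate` feature frames
max_duration = frame_shift * max_len * subsample_rate
print(max_duration)           # -> 200.0 seconds of audio
```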
@@ -352,9 +368,10 @@ class ASRExecutor(BaseExecutor):
             audio, audio_sample_rate = soundfile.read(
                 audio_file, dtype="int16", always_2d=True)
             audio_duration = audio.shape[0] / audio_sample_rate
-            max_duration = 50.0
-            if audio_duration >= max_duration:
-                logger.error("Please input audio file less then 50 seconds.\n")
+            if audio_duration > self.max_len:
+                logger.error(
+                    f"Please input an audio file less than {self.max_len} seconds.\n"
+                )
                 return False
         except Exception as e:
             logger.exception(e)
diff --git a/paddlespeech/s2t/modules/decoder.py b/paddlespeech/s2t/modules/decoder.py
index 3a851ec62c35f633ce07fd0b4380d92b31d67b3b..42ac119b44540a1931408b1b86aa75e8b1413597 100644
--- a/paddlespeech/s2t/modules/decoder.py
+++ b/paddlespeech/s2t/modules/decoder.py
@@ -62,21 +62,21 @@ class TransformerDecoder(BatchScorerInterface, nn.Layer):
         False: x -> x + att(x)
     """

-    def __init__(
-            self,
-            vocab_size: int,
-            encoder_output_size: int,
-            attention_heads: int=4,
-            linear_units: int=2048,
-            num_blocks: int=6,
-            dropout_rate: float=0.1,
-            positional_dropout_rate: float=0.1,
-            self_attention_dropout_rate: float=0.0,
-            src_attention_dropout_rate: float=0.0,
-            input_layer: str="embed",
-            use_output_layer: bool=True,
-            normalize_before: bool=True,
-            concat_after: bool=False, ):
+    def __init__(self,
+                 vocab_size: int,
+                 encoder_output_size: int,
+                 attention_heads: int=4,
+                 linear_units: int=2048,
+                 num_blocks: int=6,
+                 dropout_rate: float=0.1,
+                 positional_dropout_rate: float=0.1,
+                 self_attention_dropout_rate: float=0.0,
+                 src_attention_dropout_rate: float=0.0,
+                 input_layer: str="embed",
+                 use_output_layer: bool=True,
+                 normalize_before: bool=True,
+                 concat_after: bool=False,
+                 max_len: int=5000):

         assert check_argument_types()
@@ -87,7 +87,8 @@ class TransformerDecoder(BatchScorerInterface, nn.Layer):
         if input_layer == "embed":
             self.embed = nn.Sequential(
                 Embedding(vocab_size, attention_dim),
-                PositionalEncoding(attention_dim, positional_dropout_rate), )
+                PositionalEncoding(
+                    attention_dim, positional_dropout_rate, max_len=max_len), )
         else:
             raise ValueError(f"only 'embed' is supported: {input_layer}")
diff --git a/paddlespeech/s2t/modules/embedding.py b/paddlespeech/s2t/modules/embedding.py
index 5d4e91753b38129a9c2c71d706787af9d14a903d..596f61b78a4e449b2998b3544dd4204371aa8a2b 100644
--- a/paddlespeech/s2t/modules/embedding.py
+++ b/paddlespeech/s2t/modules/embedding.py
@@ -112,7 +112,9 @@ class PositionalEncoding(nn.Layer, PositionalEncodingInterface):
             paddle.Tensor: for compatibility to RelPositionalEncoding, (batch=1, time, ...)
         """
         T = x.shape[1]
-        assert offset + x.shape[1] < self.max_len
+        assert offset + x.shape[
+            1] < self.max_len, "offset: {} + x.shape[1]: {} is larger than the max_len: {}".format(
+                offset, x.shape[1], self.max_len)
         #TODO(Hui Zhang): using T = x.size(1), __getitem__ not support Tensor
         pos_emb = self.pe[:, offset:offset + T]
         x = x * self.xscale + pos_emb
@@ -148,6 +150,7 @@ class RelPositionalEncoding(PositionalEncoding):
             max_len (int, optional): [Maximum input length.]. Defaults to 5000.
         """
         super().__init__(d_model, dropout_rate, max_len, reverse=True)
+        logger.info(f"max len: {max_len}")

     def forward(self, x: paddle.Tensor,
                 offset: int=0) -> Tuple[paddle.Tensor, paddle.Tensor]:
         """Compute positional encoding.
         Returns:
             paddle.Tensor: Encoded tensor (batch, time, `*`).
             paddle.Tensor: Positional embedding tensor (1, time, `*`).
""" - assert offset + x.shape[1] < self.max_len + assert offset + x.shape[ + 1] < self.max_len, "offset: {} + x.shape[1]: {} is larger than the max_len: {}".format( + offset, x.shape[1], self.max_len) x = x * self.xscale #TODO(Hui Zhang): using x.size(1), __getitem__ not support Tensor pos_emb = self.pe[:, offset:offset + x.shape[1]] diff --git a/paddlespeech/s2t/modules/encoder.py b/paddlespeech/s2t/modules/encoder.py index c843c0e207054b20a5d3850334198ef6bcb6888c..669a12d656947f0446eba3d228832964e8c1d7b0 100644 --- a/paddlespeech/s2t/modules/encoder.py +++ b/paddlespeech/s2t/modules/encoder.py @@ -47,24 +47,24 @@ __all__ = ["BaseEncoder", 'TransformerEncoder', "ConformerEncoder"] class BaseEncoder(nn.Layer): - def __init__( - self, - input_size: int, - output_size: int=256, - attention_heads: int=4, - linear_units: int=2048, - num_blocks: int=6, - dropout_rate: float=0.1, - positional_dropout_rate: float=0.1, - attention_dropout_rate: float=0.0, - input_layer: str="conv2d", - pos_enc_layer_type: str="abs_pos", - normalize_before: bool=True, - concat_after: bool=False, - static_chunk_size: int=0, - use_dynamic_chunk: bool=False, - global_cmvn: paddle.nn.Layer=None, - use_dynamic_left_chunk: bool=False, ): + def __init__(self, + input_size: int, + output_size: int=256, + attention_heads: int=4, + linear_units: int=2048, + num_blocks: int=6, + dropout_rate: float=0.1, + positional_dropout_rate: float=0.1, + attention_dropout_rate: float=0.0, + input_layer: str="conv2d", + pos_enc_layer_type: str="abs_pos", + normalize_before: bool=True, + concat_after: bool=False, + static_chunk_size: int=0, + use_dynamic_chunk: bool=False, + global_cmvn: paddle.nn.Layer=None, + use_dynamic_left_chunk: bool=False, + max_len: int=5000): """ Args: input_size (int): input dim, d_feature @@ -127,7 +127,9 @@ class BaseEncoder(nn.Layer): odim=output_size, dropout_rate=dropout_rate, pos_enc_class=pos_enc_class( - d_model=output_size, dropout_rate=positional_dropout_rate), ) + d_model=output_size, + dropout_rate=positional_dropout_rate, + max_len=max_len), ) self.normalize_before = normalize_before self.after_norm = LayerNorm(output_size, epsilon=1e-12) @@ -415,32 +417,32 @@ class TransformerEncoder(BaseEncoder): class ConformerEncoder(BaseEncoder): """Conformer encoder module.""" - def __init__( - self, - input_size: int, - output_size: int=256, - attention_heads: int=4, - linear_units: int=2048, - num_blocks: int=6, - dropout_rate: float=0.1, - positional_dropout_rate: float=0.1, - attention_dropout_rate: float=0.0, - input_layer: str="conv2d", - pos_enc_layer_type: str="rel_pos", - normalize_before: bool=True, - concat_after: bool=False, - static_chunk_size: int=0, - use_dynamic_chunk: bool=False, - global_cmvn: nn.Layer=None, - use_dynamic_left_chunk: bool=False, - positionwise_conv_kernel_size: int=1, - macaron_style: bool=True, - selfattention_layer_type: str="rel_selfattn", - activation_type: str="swish", - use_cnn_module: bool=True, - cnn_module_kernel: int=15, - causal: bool=False, - cnn_module_norm: str="batch_norm", ): + def __init__(self, + input_size: int, + output_size: int=256, + attention_heads: int=4, + linear_units: int=2048, + num_blocks: int=6, + dropout_rate: float=0.1, + positional_dropout_rate: float=0.1, + attention_dropout_rate: float=0.0, + input_layer: str="conv2d", + pos_enc_layer_type: str="rel_pos", + normalize_before: bool=True, + concat_after: bool=False, + static_chunk_size: int=0, + use_dynamic_chunk: bool=False, + global_cmvn: nn.Layer=None, + use_dynamic_left_chunk: bool=False, + 
positionwise_conv_kernel_size: int=1,
+                 macaron_style: bool=True,
+                 selfattention_layer_type: str="rel_selfattn",
+                 activation_type: str="swish",
+                 use_cnn_module: bool=True,
+                 cnn_module_kernel: int=15,
+                 causal: bool=False,
+                 cnn_module_norm: str="batch_norm",
+                 max_len: int=5000):
         """Construct ConformerEncoder
         Args:
             input_size to use_dynamic_chunk, see in BaseEncoder
@@ -464,7 +466,7 @@ class ConformerEncoder(BaseEncoder):
             attention_dropout_rate, input_layer, pos_enc_layer_type,
             normalize_before, concat_after, static_chunk_size,
             use_dynamic_chunk, global_cmvn,
-            use_dynamic_left_chunk)
+            use_dynamic_left_chunk, max_len)
         activation = get_activation(activation_type)
         # self-attention module definition
diff --git a/paddlespeech/server/bin/paddlespeech_client.py b/paddlespeech/server/bin/paddlespeech_client.py
index 3adf8015bbceda29fe678ee0f452212b4f36f8cb..74e7ce3fe8c3693d9f3f293a59e9a3c574dd5534 100644
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
@@ -20,6 +20,7 @@ import os
 import random
 import sys
 import time
+import warnings
 from typing import List

 import numpy as np
@@ -34,6 +35,7 @@ from paddlespeech.server.utils.audio_handler import ASRWsAudioHandler
 from paddlespeech.server.utils.audio_process import wav2pcm
 from paddlespeech.server.utils.util import compute_delay
 from paddlespeech.server.utils.util import wav2base64
+warnings.filterwarnings("ignore")

 __all__ = [
     'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
@@ -752,3 +754,88 @@ class VectorClientExecutor(BaseExecutor):
                 logger.info(f"The vector score is: {res}")
             else:
                 logger.error(f"Sorry, we have not support such task {task}")
+
+
+@cli_client_register(
+    name='paddlespeech_client.acs', description='visit acs service')
+class ACSClientExecutor(BaseExecutor):
+    def __init__(self):
+        super(ACSClientExecutor, self).__init__()
+        self.parser = argparse.ArgumentParser(
+            prog='paddlespeech_client.acs', add_help=True)
+        self.parser.add_argument(
+            '--server_ip', type=str, default='127.0.0.1', help='server ip')
+        self.parser.add_argument(
+            '--port', type=int, default=8090, help='server port')
+        self.parser.add_argument(
+            '--input',
+            type=str,
+            default=None,
+            help='Audio file to be recognized',
+            required=True)
+        self.parser.add_argument(
+            '--sample_rate', type=int, default=16000, help='audio sample rate')
+        self.parser.add_argument(
+            '--lang', type=str, default="zh_cn", help='language')
+        self.parser.add_argument(
+            '--audio_format', type=str, default="wav", help='audio format')
+
+    def execute(self, argv: List[str]) -> bool:
+        args = self.parser.parse_args(argv)
+        input_ = args.input
+        server_ip = args.server_ip
+        port = args.port
+        sample_rate = args.sample_rate
+        lang = args.lang
+        audio_format = args.audio_format
+
+        try:
+            time_start = time.time()
+            res = self(
+                input=input_,
+                server_ip=server_ip,
+                port=port,
+                sample_rate=sample_rate,
+                lang=lang,
+                audio_format=audio_format, )
+            time_end = time.time()
+            logger.info(f"ACS result: {res}")
+            logger.info("Response time %f s." % (time_end - time_start))
+            return True
+        except Exception as e:
+            logger.error("Failed to run speech recognition.")
+            logger.error(e)
+            return False
+
+    @stats_wrapper
+    def __call__(
+            self,
+            input: str,
+            server_ip: str="127.0.0.1",
+            port: int=8090,
+            sample_rate: int=16000,
+            lang: str="zh_cn",
+            audio_format: str="wav", ):
+        """Python API to call an executor.
+
+        Args:
+            input (str): The input audio file path
+            server_ip (str, optional): The ASR server ip. Defaults to "127.0.0.1".
Defaults to "127.0.0.1". + port (int, optional): The ASR server port. Defaults to 8090. + sample_rate (int, optional): The audio sample rate. Defaults to 16000. + lang (str, optional): The audio language type. Defaults to "zh_cn". + audio_format (str, optional): The audio format information. Defaults to "wav". + + Returns: + str: The ACS results + """ + # we use the acs server to get the key word time stamp in audio text content + logger.info("acs http client start") + from paddlespeech.server.utils.audio_handler import ASRHttpHandler + handler = ASRHttpHandler( + server_ip=server_ip, port=port, endpoint="/paddlespeech/asr/search") + res = handler.run(input, audio_format, sample_rate, lang) + res = res['result'] + logger.info("acs http client finished") + + return res diff --git a/paddlespeech/server/bin/paddlespeech_server.py b/paddlespeech/server/bin/paddlespeech_server.py index 1922399f3b5898f4f821512637d3f3d4c0cc8a18..578a0a8a86397fff4fe23fca00f9689b8b9017e9 100644 --- a/paddlespeech/server/bin/paddlespeech_server.py +++ b/paddlespeech/server/bin/paddlespeech_server.py @@ -13,12 +13,14 @@ # limitations under the License. import argparse import sys +import warnings from typing import List import uvicorn from fastapi import FastAPI from starlette.middleware.cors import CORSMiddleware from prettytable import PrettyTable +from starlette.middleware.cors import CORSMiddleware from ..executor import BaseExecutor from ..util import cli_server_register @@ -28,6 +30,7 @@ from paddlespeech.server.engine.engine_pool import init_engine_pool from paddlespeech.server.restful.api import setup_router as setup_http_router from paddlespeech.server.utils.config import get_config from paddlespeech.server.ws.api import setup_router as setup_ws_router +warnings.filterwarnings("ignore") __all__ = ['ServerExecutor', 'ServerStatsExecutor'] @@ -40,6 +43,10 @@ app.add_middleware( allow_credentials=True, allow_methods=["*"], allow_headers=["*"]) +<<<<<<< HEAD +======= + +>>>>>>> develop @cli_server_register( name='paddlespeech_server.start', description='Start the service') @@ -79,7 +86,7 @@ class ServerExecutor(BaseExecutor): else: raise Exception("unsupported protocol") app.include_router(api_router) - + logger.info("start to init the engine") if not init_engine_pool(config): return False diff --git a/paddlespeech/server/conf/application.yaml b/paddlespeech/server/conf/application.yaml index 31a37ef04e2dc910314bad88c1e81fdbff07bb4b..8650154e953c9db0dcc4eb03e598cef64f648878 100644 --- a/paddlespeech/server/conf/application.yaml +++ b/paddlespeech/server/conf/application.yaml @@ -3,7 +3,7 @@ ################################################################################# # SERVER SETTING # ################################################################################# -host: 127.0.0.1 +host: 0.0.0.0 port: 8090 # The task format in the engin_list is: _ @@ -157,4 +157,4 @@ vector_python: sample_rate: 16000 cfg_path: # [optional] ckpt_path: # [optional] - device: # set 'gpu:id' or 'cpu' \ No newline at end of file + device: # set 'gpu:id' or 'cpu' diff --git a/paddlespeech/server/conf/tts_online_application.yaml b/paddlespeech/server/conf/tts_online_application.yaml index 714f4a68969b2ec196c483692c4f712baeaad3a3..964e85ef95a80db29a35ee9a69e5909c1aef70d8 100644 --- a/paddlespeech/server/conf/tts_online_application.yaml +++ b/paddlespeech/server/conf/tts_online_application.yaml @@ -3,7 +3,7 @@ ################################################################################# # SERVER SETTING # 
 #################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
 port: 8090

 # The task format in the engin_list is: <speech task>_<engine type>
@@ -157,4 +157,4 @@ vector_python:
     sample_rate: 16000
     cfg_path: # [optional]
     ckpt_path: # [optional]
-    device: # set 'gpu:id' or 'cpu'
\ No newline at end of file
+    device: # set 'gpu:id' or 'cpu'
diff --git a/paddlespeech/server/conf/tts_online_application.yaml b/paddlespeech/server/conf/tts_online_application.yaml
index 714f4a68969b2ec196c483692c4f712baeaad3a3..964e85ef95a80db29a35ee9a69e5909c1aef70d8 100644
--- a/paddlespeech/server/conf/tts_online_application.yaml
+++ b/paddlespeech/server/conf/tts_online_application.yaml
@@ -3,7 +3,7 @@
 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
 port: 8092

 # The task format in the engin_list is: <speech task>_<engine type>
diff --git a/paddlespeech/server/engine/acs/__init__.py b/paddlespeech/server/engine/acs/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/paddlespeech/server/engine/acs/python/__init__.py b/paddlespeech/server/engine/acs/python/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/paddlespeech/server/engine/acs/python/acs_engine.py b/paddlespeech/server/engine/acs/python/acs_engine.py
new file mode 100644
index 0000000000000000000000000000000000000000..30deeeb50519d65fcad2a6f398932718d2bbcd7a
--- /dev/null
+++ b/paddlespeech/server/engine/acs/python/acs_engine.py
@@ -0,0 +1,188 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import io
+import json
+import os
+import re
+
+import paddle
+import soundfile
+import websocket
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.engine.base_engine import BaseEngine
+
+
+class ACSEngine(BaseEngine):
+    def __init__(self):
+        """The ACSEngine Engine
+        """
+        super(ACSEngine, self).__init__()
+        logger.info("Create the ACSEngine Instance")
+        self.word_list = []
+
+    def init(self, config: dict):
+        """Init the ACSEngine Engine
+
+        Args:
+            config (dict): The server configuration
+
+        Returns:
+            bool: The engine instance flag
+        """
+        logger.info("Init the acs engine")
+        try:
+            self.config = config
+            if self.config.device:
+                self.device = self.config.device
+            else:
+                self.device = paddle.get_device()
+
+            paddle.set_device(self.device)
+            logger.info(f"ACS Engine set the device: {self.device}")
+
+        except BaseException as e:
+            logger.error(
+                "Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
+            )
+            logger.error("Initialize ACS server engine failed on device: %s." %
+                         (self.device))
+            return False
+
+        self.read_search_words()
+
+        # init the asr url
+        self.url = "ws://" + self.config.asr_server_ip + ":" + str(
+            self.config.asr_server_port) + "/paddlespeech/asr/streaming"
+
+        logger.info("Init the acs engine successfully")
+        return True
+
+    def read_search_words(self):
+        word_list = self.config.word_list
+        if word_list is None:
+            logger.error(
+                "No word list file in config, please set the word list parameter"
+            )
+            return
+
+        if not os.path.exists(word_list):
+            logger.error("Please input correct word list file")
+            return
+
+        with open(word_list, 'r') as fp:
+            self.word_list = [line.strip() for line in fp.readlines()]
+
+        logger.info(f"word list: {self.word_list}")
+
+    def get_asr_content(self, audio_data):
+        """Get the streaming asr result
+
+        Args:
+            audio_data (io.BytesIO): the audio content to recognize
+
+        Returns:
+            dict: the final asr message, including the transcription and timestamps
+        """
+        logger.info("send a message to the server")
+        if self.url is None:
+            logger.error("No asr server, please input valid ip and port")
+            return ""
+        ws = websocket.WebSocket()
+        ws.connect(self.url)
+        # with websocket.WebSocket.connect(self.url) as ws:
+        audio_info = json.dumps(
+            {
+                "name": "test.wav",
+                "signal": "start",
+                "nbest": 1
+            },
+            sort_keys=True,
+            indent=4,
+            separators=(',', ': '))
+        ws.send(audio_info)
+        msg = ws.recv()
+        logger.info("client receive msg={}".format(msg))
+
+        # send the total audio data
+        samples, sample_rate = soundfile.read(audio_data, dtype='int16')
+        ws.send_binary(samples.tobytes())
+        msg = ws.recv()
+        msg = json.loads(msg)
+        logger.info(f"audio result: {msg}")
+
+        # send the end signal to finish the streaming recognition
+        logger.info("send the end signal")
+        audio_info = json.dumps(
+            {
+                "name": "test.wav",
+                "signal": "end",
+                "nbest": 1
+            },
+            sort_keys=True,
+            indent=4,
+            separators=(',', ': '))
+        ws.send(audio_info)
+        msg = ws.recv()
+        msg = json.loads(msg)
+
+        logger.info(f"the final result: {msg}")
+        ws.close()
+
+        return msg
+
+    def get_matched_word(self, msg):
+        """Get the matched info in msg
+
+        Args:
+            msg (dict): the asr info, including the asr result and time stamp
+
+        Returns:
+            acs_result, asr_result: the acs result and the asr result
+        """
+        asr_result = msg['result']
+        time_stamp = msg['times']
+        acs_result = []
+
+        # search for each word in self.word_list
+        offset = self.config.offset
+        max_ed = time_stamp[-1]['ed']
+        for w in self.word_list:
+            # search the w in asr_result and the index in asr_result
+            for m in re.finditer(w, asr_result):
+                start = max(time_stamp[m.start(0)]['bg'] - offset, 0)
+
+                end = min(time_stamp[m.end(0) - 1]['ed'] + offset, max_ed)
+                logger.info(f'start: {start}, end: {end}')
+                acs_result.append({'w': w, 'bg': start, 'ed': end})
+
+        return acs_result, asr_result
+
+    def run(self, audio_data):
+        """process the audio data in acs engine
+        the engine does not store any data, so all the requests use the self.run api
+
+        Args:
+            audio_data (str): the audio data
+
+        Returns:
+            acs_result, asr_result: the acs result and the asr result
+        """
+        logger.info("start to process the audio content search")
+        msg = self.get_asr_content(io.BytesIO(audio_data))
+
+        acs_result, asr_result = self.get_matched_word(msg)
+        logger.info(f'the asr result {asr_result}')
+        logger.info(f'the acs result: {acs_result}')
+        return acs_result, asr_result
diff --git a/paddlespeech/server/engine/asr/online/asr_engine.py b/paddlespeech/server/engine/asr/online/asr_engine.py
index 79b0ddb70ec3e6349f303ce1ebc062f771508d68..fd57a3d5214107996d4355dbdd6f1a46514b4bb0 100644
--- a/paddlespeech/server/engine/asr/online/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/asr_engine.py
@@ -13,6 +13,7 @@
 # limitations under the License.
 import copy
 import os
+import sys
 from typing import Optional

 import numpy as np
@@ -588,7 +589,7 @@ class ASRServerExecutor(ASRExecutor):
         self.pretrained_models = pretrained_models

     def _init_from_path(self,
-                        model_type: str='deepspeech2online_aishell',
+                        model_type: str=None,
                         am_model: Optional[os.PathLike]=None,
                         am_params: Optional[os.PathLike]=None,
                         lang: str='zh',
@@ -599,6 +600,12 @@ class ASRServerExecutor(ASRExecutor):
         """
         Init model and other resources from a specific path.
         """
+        if not model_type or not lang or not sample_rate:
+            logger.error(
+                "The model type, lang or sample rate is None; please input a valid server parameter yaml"
+            )
+            return False
+
         self.model_type = model_type
         self.sample_rate = sample_rate
         sample_rate_str = '16k' if sample_rate == 16000 else '8k'
@@ -730,6 +737,8 @@ class ASRServerExecutor(ASRExecutor):
             # update the ctc decoding
             self.searcher = CTCPrefixBeamSearch(self.config.decode)
             self.transformer_decode_reset()
+
+        return True

     def reset_decoder_and_chunk(self):
         """reset decoder and chunk state for an new audio
@@ -1028,20 +1037,27 @@ class ASREngine(BaseEngine):
                 self.device = paddle.get_device()
             logger.info(f"paddlespeech_server set the device: {self.device}")
             paddle.set_device(self.device)
-        except BaseException:
+        except BaseException as e:
             logger.error(
-                "Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
+                f"Set device failed, please check if device '{self.device}' is already used and the parameter 'device' in the yaml file"
             )
-
-        self.executor._init_from_path(
-            model_type=self.config.model_type,
-            am_model=self.config.am_model,
-            am_params=self.config.am_params,
-            lang=self.config.lang,
-            sample_rate=self.config.sample_rate,
-            cfg_path=self.config.cfg_path,
-            decode_method=self.config.decode_method,
-            am_predictor_conf=self.config.am_predictor_conf)
+            logger.error(
+                "If all GPUs or XPUs are in use, you can set the server device to 'cpu'")
+            sys.exit(-1)
+
+        if not self.executor._init_from_path(
+                model_type=self.config.model_type,
+                am_model=self.config.am_model,
+                am_params=self.config.am_params,
+                lang=self.config.lang,
+                sample_rate=self.config.sample_rate,
+                cfg_path=self.config.cfg_path,
+                decode_method=self.config.decode_method,
+                am_predictor_conf=self.config.am_predictor_conf):
+            logger.error(
+                "Failed to init the ASR server engine; please check the server configuration yaml"
+            )
+            return False

         logger.info("Initialize ASR server engine successfully.")
         return True
diff --git a/paddlespeech/server/engine/asr/python/asr_engine.py b/paddlespeech/server/engine/asr/python/asr_engine.py
index e76c49a79a66be505f239f9f04b5fdd050701fda..d60a5feaeca6caa5e385f872872104df2a8aa124 100644
--- a/paddlespeech/server/engine/asr/python/asr_engine.py
+++ b/paddlespeech/server/engine/asr/python/asr_engine.py
@@ -78,21 +78,26 @@ class ASREngine(BaseEngine):
         Args:
             audio_data (bytes): base64.b64decode
         """
-        if self.executor._check(
-                io.BytesIO(audio_data), self.config.sample_rate,
-                self.config.force_yes):
-            logger.info("start run asr engine")
-            self.executor.preprocess(self.config.model, io.BytesIO(audio_data))
-            st = time.time()
-            self.executor.infer(self.config.model)
-            infer_time = time.time() - st
-            self.output = self.executor.postprocess()  # Retrieve result of asr.
-        else:
-            logger.info("file check failed!")
-            self.output = None
-
-        logger.info("inference time: {}".format(infer_time))
-        logger.info("asr engine type: python")
+        try:
+            if self.executor._check(
+                    io.BytesIO(audio_data), self.config.sample_rate,
+                    self.config.force_yes):
+                logger.info("start run asr engine")
+                self.executor.preprocess(self.config.model,
+                                         io.BytesIO(audio_data))
+                st = time.time()
+                self.executor.infer(self.config.model)
+                infer_time = time.time() - st
+                self.output = self.executor.postprocess(
+                )  # Retrieve result of asr.
+                logger.info("inference time: {}".format(infer_time))
+            else:
+                logger.info("file check failed!")
+                self.output = None
+
+            logger.info("asr engine type: python")
+        except Exception as e:
+            logger.error(e)

     def postprocess(self):
         """postprocess
diff --git a/paddlespeech/server/engine/engine_factory.py b/paddlespeech/server/engine/engine_factory.py
index 6cf95d756c269d5c8c7806ca9bb38b8927ebb72e..5fdaacceaca3f9d0f0a84ddb82ad7a9426a219ab 100644
--- a/paddlespeech/server/engine/engine_factory.py
+++ b/paddlespeech/server/engine/engine_factory.py
@@ -52,5 +52,8 @@ class EngineFactory(object):
         elif engine_name.lower() == 'vector' and engine_type.lower() == 'python':
             from paddlespeech.server.engine.vector.python.vector_engine import VectorEngine
             return VectorEngine()
+        elif engine_name.lower() == 'acs' and engine_type.lower() == 'python':
+            from paddlespeech.server.engine.acs.python.acs_engine import ACSEngine
+            return ACSEngine()
         else:
             return None
diff --git a/paddlespeech/server/engine/engine_pool.py b/paddlespeech/server/engine/engine_pool.py
index 9de73567e47c8150a7b2807d4bf1cc299e0e1b40..5300303f6b8e991fcade29bfa5aacc421a7003dc 100644
--- a/paddlespeech/server/engine/engine_pool.py
+++ b/paddlespeech/server/engine/engine_pool.py
@@ -34,6 +34,7 @@ def init_engine_pool(config) -> bool:
         engine_type = engine_and_type.split("_")[1]
         ENGINE_POOL[engine] = EngineFactory.get_engine(
             engine_name=engine, engine_type=engine_type)
+
         if not ENGINE_POOL[engine].init(config=config[engine_and_type]):
             return False
diff --git a/paddlespeech/server/restful/acs_api.py b/paddlespeech/server/restful/acs_api.py
new file mode 100644
index 0000000000000000000000000000000000000000..61cb34d9f47ba7dade4bc769cc179b597b7e2a8b
--- /dev/null
+++ b/paddlespeech/server/restful/acs_api.py
@@ -0,0 +1,101 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
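+
+# Routes registered by this module (see the handlers below):
+#   GET  /paddlespeech/asr/search/help  -- describes the expected input and output
+#   POST /paddlespeech/asr/search       -- runs keyword search on a base64-encoded wav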
+import base64
+from typing import Union
+
+from fastapi import APIRouter
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.engine.engine_pool import get_engine_pool
+from paddlespeech.server.restful.request import ASRRequest
+from paddlespeech.server.restful.response import ACSResponse
+from paddlespeech.server.restful.response import ErrorResponse
+from paddlespeech.server.utils.errors import ErrorCode
+from paddlespeech.server.utils.errors import failed_response
+from paddlespeech.server.utils.exception import ServerBaseException
+
+router = APIRouter()
+
+
+@router.get('/paddlespeech/asr/search/help')
+def help():
+    """help
+
+    Returns:
+        json: the usage information of the audio content search api
+    """
+    response = {
+        "success": True,
+        "code": 200,
+        "message": {
+            "global": "success"
+        },
+        "result": {
+            "description": "acs server",
+            "input": "base64 string of wavfile",
+            "output": {
+                "asr_result": "你好",
+                "acs_result": [{
+                    'w': '你',
+                    'bg': 0.0,
+                    'ed': 1.2
+                }]
+            }
+        }
+    }
+    return response
+
+
+@router.post(
+    "/paddlespeech/asr/search",
+    response_model=Union[ACSResponse, ErrorResponse])
+def acs(request_body: ASRRequest):
+    """acs api
+
+    Args:
+        request_body (ASRRequest): the acs request; we reuse the http ASRRequest
+
+    Returns:
+        json: the acs result
+    """
+    try:
+        # 1. get the audio data via base64 decoding
+        audio_data = base64.b64decode(request_body.audio)
+
+        # 2. get a single engine from the engine pool
+        engine_pool = get_engine_pool()
+        acs_engine = engine_pool['acs']
+
+        # 3. no data is stored in acs_engine, so we create another instance to process the data
+        acs_result, asr_result = acs_engine.run(audio_data)
+
+        response = {
+            "success": True,
+            "code": 200,
+            "message": {
+                "description": "success"
+            },
+            "result": {
+                "transcription": asr_result,
+                "acs": acs_result
+            }
+        }
+
+    except ServerBaseException as e:
+        response = failed_response(e.error_code, e.msg)
+    except BaseException as e:
+        response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
+        logger.error(e)
+
+    return response
diff --git a/paddlespeech/server/restful/api.py b/paddlespeech/server/restful/api.py
index 63f865e8a3fe7929d00630af8594afef9694b564..1c2dd28147cce5307dcfac0fa27383f178868eeb 100644
--- a/paddlespeech/server/restful/api.py
+++ b/paddlespeech/server/restful/api.py
@@ -22,6 +22,7 @@ from paddlespeech.server.restful.cls_api import router as cls_router
 from paddlespeech.server.restful.text_api import router as text_router
 from paddlespeech.server.restful.tts_api import router as tts_router
 from paddlespeech.server.restful.vector_api import router as vec_router
+from paddlespeech.server.restful.acs_api import router as acs_router

 _router = APIRouter()

@@ -45,6 +46,8 @@ def setup_router(api_list: List):
             _router.include_router(text_router)
         elif api_name.lower() == 'vector':
             _router.include_router(vec_router)
+        elif api_name.lower() == 'acs':
+            _router.include_router(acs_router)
         else:
             logger.error(
                 f"PaddleSpeech has not support such service: {api_name}")
diff --git a/paddlespeech/server/restful/response.py b/paddlespeech/server/restful/response.py
index c91b38992198339f4c05c9c7f6f56e89d8a72239..3d991de43ed41c62284587c7c9924965068c69a3 100644
--- a/paddlespeech/server/restful/response.py
+++ b/paddlespeech/server/restful/response.py
@@ -17,7 +17,7 @@ from pydantic import BaseModel

 __all__ = [
     'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse',
-    'VectorResponse', 'VectorScoreResponse'
+    'VectorResponse', 'VectorScoreResponse', 'ACSResponse'
 ]

@@ -231,3 +231,32 @@ class ErrorResponse(BaseModel):
     success: bool
     code: int
     message: Message
+
+
+#****************************************************************************************/
+#************************************ ACS response **************************************/
+#****************************************************************************************/
+class AcsResult(BaseModel):
+    transcription: str
+    acs: list
+
+
+class ACSResponse(BaseModel):
+    """
+    response example
+        {
+            "success": true,
+            "code": 0,
+            "message": {
+                "description": "success"
+            },
+            "result": {
+                "transcription": "你好,飞桨",
+                "acs": [{"w": "你好", "bg": 0.0, "ed": 0.45}]
+            }
+        }
+    """
+    success: bool
+    code: int
+    message: Message
+    result: AcsResult
diff --git a/paddlespeech/server/utils/audio_handler.py b/paddlespeech/server/utils/audio_handler.py
index b85cf485dc315f6df98803bce6bb6adcf90c0dd2..baa7b9343c2f0409db444e2061a41a50d96880ad 100644
--- a/paddlespeech/server/utils/audio_handler.py
+++ b/paddlespeech/server/utils/audio_handler.py
@@ -205,7 +205,7 @@ class ASRWsAudioHandler:


 class ASRHttpHandler:
-    def __init__(self, server_ip=None, port=None):
+    def __init__(self, server_ip=None, port=None, endpoint="/paddlespeech/asr"):
         """The ASR client http request

         Args:
@@ -219,7 +219,7 @@ class ASRHttpHandler:
             self.url = None
         else:
             self.url = 'http://' + self.server_ip + ":" + str(
-                self.port) + '/paddlespeech/asr'
+                self.port) + endpoint

         logger.info(f"endpoint: {self.url}")

     def run(self, input, audio_format, sample_rate, lang):
@@ -248,7 +248,7 @@ class ASRHttpHandler:
         }

         res = requests.post(url=self.url, data=json.dumps(data))
-
+
         return res.json()
diff --git a/paddlespeech/server/ws/asr_api.py b/paddlespeech/server/ws/asr_api.py
index 0f7dcddda6c05cb169625e218c305c02d8530aa4..0faa131aaf27e04b535d685da2349b9d1b3268d8 100644
--- a/paddlespeech/server/ws/asr_api.py
+++ b/paddlespeech/server/ws/asr_api.py
@@ -18,9 +18,9 @@
 from fastapi import WebSocket
 from fastapi import WebSocketDisconnect
 from starlette.websockets import WebSocketState as WebSocketState

+from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.engine.engine_pool import get_engine_pool
-
 router = APIRouter()

@@ -106,5 +106,5 @@ async def websocket_endpoint(websocket: WebSocket):
             # if the engine create the vad instance, this connection will have many period results
             resp = {'result': asr_results}
             await websocket.send_json(resp)
-    except WebSocketDisconnect:
-        pass
+    except WebSocketDisconnect as e:
+        logger.error(e)
diff --git a/paddlespeech/t2s/frontend/tone_sandhi.py b/paddlespeech/t2s/frontend/tone_sandhi.py
index 07f7fa2b8f8615af73fd656b0abd381e551179f9..e3102b9bc14ea89d065b5cb26a6339295bb26c66 100644
--- a/paddlespeech/t2s/frontend/tone_sandhi.py
+++ b/paddlespeech/t2s/frontend/tone_sandhi.py
@@ -63,7 +63,8 @@ class ToneSandhi():
             '扫把', '惦记'
         }
         self.must_not_neural_tone_words = {
-            "男子", "女子", "分子", "原子", "量子", "莲子", "石子", "瓜子", "电子", "人人", "虎虎"
+            "男子", "女子", "分子", "原子", "量子", "莲子", "石子", "瓜子", "电子", "人人", "虎虎",
+            "幺幺"
         }
         self.punc = ":,;。?!“”‘’':,;.?!"
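The `endpoint` keyword added to `ASRHttpHandler` above is what lets the same HTTP client target either the plain ASR route or the new ACS route. A minimal usage sketch, assuming a local server with the `acs` engine enabled; the port and file name are illustrative, and the response layout follows the dict built in `acs_api.py`:

```python
# Hedged sketch, not part of this change set: point the reworked
# ASRHttpHandler at the ACS route via its new `endpoint` parameter.
from paddlespeech.server.utils.audio_handler import ASRHttpHandler

handler = ASRHttpHandler(
    server_ip="127.0.0.1",
    port=8490,  # assumed port of a running ACS server
    endpoint="/paddlespeech/asr/search")

# run(input, audio_format, sample_rate, lang) POSTs the audio and
# returns the parsed JSON; acs_api.py nests the transcription and
# the keyword hits under 'result'.
res = handler.run("./zh.wav", "wav", 16000, "zh")
print(res["result"]["transcription"])
print(res["result"]["acs"])  # e.g. [{'w': '你', 'bg': 0.0, 'ed': 1.2}]
```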
diff --git a/paddlespeech/t2s/frontend/zh_normalization/num.py b/paddlespeech/t2s/frontend/zh_normalization/num.py index a83b42a47b70b30452d5908e58d6e7a5b1c2f93c..ec13677367f949b73de74692a941a36ac9acadc0 100644 --- a/paddlespeech/t2s/frontend/zh_normalization/num.py +++ b/paddlespeech/t2s/frontend/zh_normalization/num.py @@ -103,7 +103,7 @@ def replace_default_num(match): str """ number = match.group(0) - return verbalize_digit(number) + return verbalize_digit(number, alt_one=True) # 数字表达式 diff --git a/tests/unit/cli/cacu_rtf_by_aishell.sh b/tests/unit/cli/calc_rtf_by_aishell.sh similarity index 87% rename from tests/unit/cli/cacu_rtf_by_aishell.sh rename to tests/unit/cli/calc_rtf_by_aishell.sh index b9d68352dbab3c86e1211852bae16e2903078ea3..cee79160e0720ae7abb877ff1a39a84bd27a523d 100644 --- a/tests/unit/cli/cacu_rtf_by_aishell.sh +++ b/tests/unit/cli/calc_rtf_by_aishell.sh @@ -1,5 +1,6 @@ #!/bin/bash +source path.sh stage=-1 stop_stage=100 MAIN_ROOT=../../.. @@ -23,5 +24,5 @@ if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then fi if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then - cat data/manifest.test | paddlespeech asr --model conformer_online_aishell --rtf -v + cat data/manifest.test | paddlespeech asr --model conformer_online_aishell --device gpu --decode_method ctc_prefix_beam_search --rtf -v fi diff --git a/tests/unit/cli/path.sh b/tests/unit/cli/path.sh new file mode 100644 index 0000000000000000000000000000000000000000..38a242a4ab3dd01e29873e8f827f9bdb4656fb57 --- /dev/null +++ b/tests/unit/cli/path.sh @@ -0,0 +1,11 @@ +export MAIN_ROOT=`realpath ${PWD}/../../../` + +export PATH=${MAIN_ROOT}:${MAIN_ROOT}/utils:${PATH} +export LC_ALL=C + +export PYTHONDONTWRITEBYTECODE=1 +# Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C +export PYTHONIOENCODING=UTF-8 +export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH} + +export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/ diff --git a/tests/unit/server/offline/conf/application.yaml b/tests/unit/server/offline/conf/application.yaml index 762f4af6e952fad3c671b452899584ffcfe81aeb..ce399e288c9e4c51069538962568dfaa03565f8e 100644 --- a/tests/unit/server/offline/conf/application.yaml +++ b/tests/unit/server/offline/conf/application.yaml @@ -3,7 +3,7 @@ ################################################################################# # SERVER SETTING # ################################################################################# -host: 127.0.0.1 +host: 0.0.0.0 port: 8090 # The task format in the engin_list is: _ diff --git a/tests/unit/server/online/tts/check_server/conf/application.yaml b/tests/unit/server/online/tts/check_server/conf/application.yaml index dd1a7e197875df335b491f5fab971c58bc7d1a23..9bf663964c93c3a3de886664451ea93f2953aa06 100644 --- a/tests/unit/server/online/tts/check_server/conf/application.yaml +++ b/tests/unit/server/online/tts/check_server/conf/application.yaml @@ -3,7 +3,7 @@ ################################################################################# # SERVER SETTING # ################################################################################# -host: 127.0.0.1 +host: 0.0.0.0 port: 8092 # The task format in the engin_list is: _ diff --git a/tests/unit/server/online/tts/check_server/tts_online_application.yaml b/tests/unit/server/online/tts/check_server/tts_online_application.yaml index dd1a7e197875df335b491f5fab971c58bc7d1a23..9bf663964c93c3a3de886664451ea93f2953aa06 100644 --- a/tests/unit/server/online/tts/check_server/tts_online_application.yaml +++ 
b/tests/unit/server/online/tts/check_server/tts_online_application.yaml @@ -3,7 +3,7 @@ ################################################################################# # SERVER SETTING # ################################################################################# -host: 127.0.0.1 +host: 0.0.0.0 port: 8092 # The task format in the engin_list is: _
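Taken together, the engine changes above move server start-up to a fail-fast contract: `_init_from_path` validates its inputs and returns a boolean, the engine's `init` propagates it, and `init_engine_pool` aborts on the first failure instead of serving with a half-initialized engine. A self-contained toy sketch of that contract (all names below are illustrative stand-ins, not PaddleSpeech APIs):

```python
# Illustrative sketch of the bool-returning init contract; ToyEngine
# and init_pool are stand-ins, not PaddleSpeech classes.
from typing import Optional


class ToyEngine:
    def init(self, config: dict) -> bool:
        # Mirror the new _init_from_path guard: reject missing fields
        # up front and report failure instead of raising later.
        for key in ("model_type", "lang", "sample_rate"):
            if not config.get(key):
                print(f"invalid server parameter yaml: '{key}' is unset")
                return False
        return True


def init_pool(configs: dict) -> Optional[dict]:
    pool = {}
    for name, cfg in configs.items():
        engine = ToyEngine()
        if not engine.init(config=cfg):
            return None  # abort start-up, as init_engine_pool now does
        pool[name] = engine
    return pool


# The second pool is missing 'lang', so start-up is refused.
ok = init_pool({"asr": {"model_type": "conformer_online",
                        "lang": "zh", "sample_rate": 16000}})
bad = init_pool({"asr": {"model_type": "conformer_online",
                         "sample_rate": 16000}})
print(ok is not None, bad is None)  # True True
```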