Unverified commit f256bb9c, authored by 小湉湉, committed by GitHub

Merge pull request #1771 from lym0302/add_streaming_cli

[server] add streaming tts demos
([简体中文](./README_cn.md)|English)
# Streaming Speech Synthesis Service
## Introduction
This demo shows how to start a streaming speech synthesis service and access it. This can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.
## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose one way from medium and hard to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/tts_online_application.yaml`
Among them, `protocol` indicates the network protocol used by the streaming TTS service. Currently, both http and websocket are supported.
`engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
Currently, the engine type supports two forms: **online** and **online-onnx**. `online` indicates an engine that uses python for dynamic graph inference; `online-onnx` indicates an engine that uses onnxruntime for inference. The inference speed of online-onnx is faster.
The streaming TTS AM (acoustic model) supports **fastspeech2** and **fastspeech2_cnndecoder**; the Voc (vocoder) supports **hifigan** and **mb_melgan**.
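The `<speech task>_<engine type>` naming convention above can be illustrated with a small, hypothetical parser (`parse_engine` is not part of PaddleSpeech); note that the engine type `online-onnx` itself contains a dash, so only the first underscore separates the task from the engine type:

```python
def parse_engine(entry: str):
    # Split only on the first "_": "tts_online-onnx" -> ("tts", "online-onnx")
    task, engine_type = entry.split("_", 1)
    return task, engine_type

print(parse_engine("tts_online"))       # ('tts', 'online')
print(parse_engine("tts_online-onnx"))  # ('tts', 'online-onnx')
```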
### 3. Server Usage
- Command Line (Recommended)
```bash
# start the service
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app, default: ./conf/tts_online_application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. Streaming TTS client Usage
- Command Line (Recommended)
```bash
# Access http streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# Access websocket streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
paddlespeech_client tts_online --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8092
- `protocol`: Service protocol, choices: [http, websocket], default: http.
- `input`: (required): Input text to generate.
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play the audio while it is being synthesized. Default: False, meaning the audio is not played. **Playing audio requires the pyaudio library**.
Output:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
Output:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
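The latency figures in the logs above can be reproduced from the raw byte stream. Assuming, as the client source does, 16-bit mono PCM at 24 kHz, the audio duration is `bytes / 2 / 24000`, and the RTF (real-time factor) is the last-packet latency divided by that duration. The numbers below are illustrative, not taken from the logs:

```python
def streaming_metrics(total_bytes: int, final_response_s: float,
                      sample_rate: int = 24000, sample_width: int = 2):
    # Duration of 16-bit (2 bytes per sample) mono PCM audio.
    duration_s = total_bytes / sample_width / sample_rate
    rtf = final_response_s / duration_s  # < 1.0 means faster than real time
    return duration_s, rtf

duration, rtf = streaming_metrics(total_bytes=183600, final_response_s=3.06)
print(duration)  # 3.825 — the same audio length as in the logs above
```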
(简体中文|[English](./README.md))
# 流式语音合成服务
## 介绍
这个 demo 是一个启动流式语音合成服务和访问该服务的实现。 它可以通过使用 `paddlespeech_server` 和 `paddlespeech_client` 的单个命令或 python 的几行代码来实现。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
配置文件可参见 `conf/tts_online_application.yaml`
其中,`protocol`表示该流式TTS服务使用的网络协议,目前支持 http 和 websocket 两种。
其中,`engine_list` 表示即将启动的服务将会包含的语音引擎,格式为 `<语音任务>_<引擎类型>`。
该 demo 主要介绍流式语音合成服务,因此语音任务应设置为 `tts`。
目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用onnxruntime进行推理的引擎。其中,online-onnx的推理速度更快。
流式 TTS 的 AM 模型支持:**fastspeech2** 以及 **fastspeech2_cnndecoder**;Voc 模型支持:**hifigan** 以及 **mb_melgan**。
### 3. 服务端使用方法
- 命令行 (推荐使用)
```bash
# 启动服务
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
使用方法:
```bash
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
- `log_file`: log 文件. 默认:./log/paddlespeech.log
输出:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
输出:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. 流式TTS 客户端使用方法
- 命令行 (推荐使用)
```bash
# 访问 http 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# 访问 websocket 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash
paddlespeech_client tts_online --help
```
参数:
- `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8092。
- `protocol`: 服务协议,可选 [http, websocket], 默认: http。
- `input`: (必须输入): 待合成的文本。
- `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值:0,表示与模型采样率相同
- `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值:False,表示不播放。**播放音频需要依赖pyaudio库**。
输出:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
输出:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 127.0.0.1
port: 8092
# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
engine_list: ['tts_online-onnx']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
# am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
# voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc']
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:
# others
lang: 'zh'
device: 'cpu' # set 'gpu:id' or 'cpu'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
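The `am_block`/`am_pad` (and `voc_block`/`voc_pad`) values above control streaming chunking: the frame sequence is cut into blocks of `block` frames, each extended with up to `pad` frames of left and right context. The real implementation is `get_chunks` in `paddlespeech.server.utils`; the following is only a sketch of the block-plus-pad idea, not that function:

```python
def chunk_with_pad(frames, block, pad):
    """Cut `frames` into blocks of `block` items, each padded with up to
    `pad` items of left/right context (a sketch of streaming chunking)."""
    chunks = []
    for start in range(0, len(frames), block):
        left = max(0, start - pad)                     # left context
        right = min(len(frames), start + block + pad)  # block + right context
        chunks.append(frames[left:right])
    return chunks

print(chunk_with_pad(list(range(10)), block=4, pad=2))
```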
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
# am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
am: 'fastspeech2_cnndecoder_csmsc_onnx'
# am_ckpt is a list, if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
# if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
am_ckpt: # list
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
am_sample_rate: 24000
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# others
lang: 'zh'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
voc_upsample: 300
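Given the values above (`voc_upsample: 300`, `voc_sample_rate: 24000`, `voc_block: 14`), each mel frame expands to 300 output samples, i.e. 12.5 ms of audio, so one streamed vocoder chunk yields 0.175 s of audio. A quick arithmetic check (only the config values above are assumed):

```python
voc_upsample = 300        # output samples generated per mel frame
voc_sample_rate = 24000   # Hz
voc_block = 14            # mel frames per streamed vocoder chunk

seconds_per_frame = voc_upsample / voc_sample_rate  # 0.0125 s per frame
chunk_seconds = voc_block * seconds_per_frame
print(round(chunk_seconds, 3))  # 0.175
```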
#!/bin/bash
# start server
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
#!/bin/bash
# http client test
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# websocket client test
#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
...@@ -48,3 +48,16 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
## Online TTS Server
### Launch online tts server
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### Access online tts server
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
...@@ -49,3 +49,17 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input zh.wav
```
## 流式TTS
### 启动流式语音合成服务
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### 访问流式语音合成服务
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
...@@ -35,8 +35,8 @@ from paddlespeech.server.utils.audio_process import wav2pcm
from paddlespeech.server.utils.util import wav2base64
__all__ = [
-    'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor',
-    'CLSClientExecutor'
+    'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
+    'ASROnlineClientExecutor', 'CLSClientExecutor'
]
...@@ -161,6 +161,116 @@ class TTSClientExecutor(BaseExecutor):
return res
@cli_client_register(
name='paddlespeech_client.tts_online',
description='visit tts online service')
class TTSOnlineClientExecutor(BaseExecutor):
def __init__(self):
super(TTSOnlineClientExecutor, self).__init__()
self.parser = argparse.ArgumentParser(
prog='paddlespeech_client.tts_online', add_help=True)
self.parser.add_argument(
'--server_ip', type=str, default='127.0.0.1', help='server ip')
self.parser.add_argument(
'--port', type=int, default=8092, help='server port')
self.parser.add_argument(
'--protocol',
type=str,
default="http",
choices=["http", "websocket"],
help='server protocol')
self.parser.add_argument(
'--input',
type=str,
default=None,
help='Text to be synthesized.',
required=True)
self.parser.add_argument(
'--spk_id', type=int, default=0, help='Speaker id')
self.parser.add_argument(
'--speed',
type=float,
default=1.0,
help='Audio speed, the value should be set between 0 and 3')
self.parser.add_argument(
'--volume',
type=float,
default=1.0,
help='Audio volume, the value should be set between 0 and 3')
self.parser.add_argument(
'--sample_rate',
type=int,
default=0,
choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
self.parser.add_argument(
'--output', type=str, default=None, help='Synthesized audio file')
self.parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
def execute(self, argv: List[str]) -> bool:
args = self.parser.parse_args(argv)
input_ = args.input
server_ip = args.server_ip
port = args.port
protocol = args.protocol
spk_id = args.spk_id
speed = args.speed
volume = args.volume
sample_rate = args.sample_rate
output = args.output
play = args.play
try:
res = self(
input=input_,
server_ip=server_ip,
port=port,
protocol=protocol,
spk_id=spk_id,
speed=speed,
volume=volume,
sample_rate=sample_rate,
output=output,
play=play)
return True
except Exception as e:
logger.error("Failed to synthesize audio.")
return False
@stats_wrapper
def __call__(self,
input: str,
server_ip: str="127.0.0.1",
port: int=8092,
protocol: str="http",
spk_id: int=0,
speed: float=1.0,
volume: float=1.0,
sample_rate: int=0,
output: str=None,
play: bool=False):
"""
Python API to call an executor.
"""
if protocol == "http":
logger.info("tts http client start")
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
handler = TTSHttpHandler(server_ip, port, play)
handler.run(input, spk_id, speed, volume, sample_rate, output)
elif protocol == "websocket":
from paddlespeech.server.utils.audio_handler import TTSWsHandler
logger.info("tts websocket client start")
handler = TTSWsHandler(server_ip, port, play)
loop = asyncio.get_event_loop()
loop.run_until_complete(handler.run(input, output))
else:
logger.error("Please set correct protocol, http or websocket")
@cli_client_register(
name='paddlespeech_client.asr', description='visit asr service')
class ASRClientExecutor(BaseExecutor):
...
...@@ -10,7 +10,7 @@ port: 8092
# task choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
-engine_list: ['tts_online']
+engine_list: ['tts_online-onnx']
#################################################################################
...@@ -67,16 +67,16 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
-cpu_threads: 1
+cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
-voc: 'mb_melgan_csmsc_onnx'
+voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
-cpu_threads: 1
+cpu_threads: 4
# others
lang: 'zh'
...
...@@ -202,7 +202,6 @@ class TTSServerExecutor(TTSExecutor):
"""
Init model and other resources from a specific path.
"""
-#import pdb;pdb.set_trace()
if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'):
logger.info('Models had been initialized.')
return
...@@ -391,8 +390,7 @@ class TTSServerExecutor(TTSExecutor):
# fastspeech2_cnndecoder_csmsc
elif am == "fastspeech2_cnndecoder_csmsc":
# am
-orig_hs, h_masks = self.am_inference.encoder_infer(
-    part_phone_ids)
+orig_hs = self.am_inference.encoder_infer(part_phone_ids)
# streaming voc chunk info
mel_len = orig_hs.shape[1]
...@@ -404,7 +402,7 @@ class TTSServerExecutor(TTSExecutor):
hss = get_chunks(orig_hs, self.am_block, self.am_pad, "am")
am_chunk_num = len(hss)
for i, hs in enumerate(hss):
-before_outs, _ = self.am_inference.decoder(hs)
+before_outs = self.am_inference.decoder(hs)
after_outs = before_outs + self.am_inference.postnet(
before_outs.transpose((0, 2, 1))).transpose((0, 2, 1))
normalized_mel = after_outs[0]
...
-# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -12,75 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import os
import time
import requests
from paddlespeech.server.utils.audio_process import pcm2wav
def save_audio(buffer, audio_path) -> bool:
if args.save_path.endswith("pcm"):
with open(args.save_path, "wb") as f:
f.write(buffer)
elif args.save_path.endswith("wav"):
with open("./tmp.pcm", "wb") as f:
f.write(buffer)
pcm2wav("./tmp.pcm", audio_path, channels=1, bits=16, sample_rate=24000)
os.system("rm ./tmp.pcm")
else:
print("Only supports saved audio format is pcm or wav")
return False
return True
def test(args):
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
buffer = b''
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
chunk = base64.b64decode(chunk) # bytes
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
buffer += chunk
final_response = time.time() - st
duration = len(buffer) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
if args.save_path is not None:
if save_audio(buffer, args.save_path):
print("音频保存至:", args.save_path)
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
-'--text',
+"--text",
type=str,
-default="您好,欢迎使用语音合成服务。",
-help='A sentence to be synthesized')
+help="A sentence to be synthesized",
+default="您好,欢迎使用语音合成服务。")
+parser.add_argument(
+"--server", type=str, help="server ip", default="127.0.0.1")
+parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
...@@ -89,12 +33,15 @@ if __name__ == "__main__":
'--sample_rate',
type=int,
default=0,
+choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
parser.add_argument(
-"--server", type=str, help="server ip", default="127.0.0.1")
+"--output", type=str, help="save audio path", default=None)
-parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
-"--save_path", type=str, help="save audio path", default=None)
+"--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args()
-test(args)
+print("tts http client start")
+handler = TTSHttpHandler(args.server, args.port, args.play)
+handler.run(args.text, args.spk_id, args.speed, args.volume,
+args.sample_rate, args.output)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import threading
import time
import pyaudio
import requests
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
max_fail = 50
def play_audio():
global stream
global buffer
global max_fail
while True:
if not buffer:
max_fail -= 1
time.sleep(0.05)
if max_fail < 0:
break
mutex.acquire()
stream.write(buffer)
buffer = b''
mutex.release()
def test(args):
global mutex
global buffer
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
all_bytes = 0.0
t = threading.Thread(target=play_audio)
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
mutex.acquire()
chunk = base64.b64decode(chunk) # bytes
buffer += chunk
mutex.release()
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
t.start()
all_bytes += len(chunk)
final_response = time.time() - st
duration = all_bytes / 2 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
t.join()
stream.stop_stream()
stream.close()
p.terminate()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--text',
type=str,
default="您好,欢迎使用语音合成服务。",
help='A sentence to be synthesized')
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
'--volume', type=float, default=1.0, help='Audio volume')
parser.add_argument(
'--sample_rate',
type=int,
default=0,
help='Sampling rate, the default is the same as the model')
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
args = parser.parse_args()
test(args)
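The client above hands audio chunks from the network thread to the playback thread through a global buffer guarded by a lock plus a polling retry counter. A `queue.Queue` expresses the same producer/consumer handoff without polling. This is an alternative sketch, not the demo's actual code; the `write_chunk` callback stands in for `stream.write`:

```python
import queue
import threading

def start_player(q, write_chunk):
    # Consumer: block on the queue; a None sentinel ends playback.
    def play():
        while True:
            chunk = q.get()
            if chunk is None:
                break
            write_chunk(chunk)
    t = threading.Thread(target=play)
    t.start()
    return t

# Usage sketch: collect chunks instead of playing them.
played = []
q = queue.Queue()
t = start_player(q, played.append)
for chunk in (b"\x00\x01", b"\x02\x03"):  # network chunks would go here
    q.put(chunk)
q.put(None)  # end of stream
t.join()
print(b"".join(played))
```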
...@@ -11,92 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-import _thread as thread
import argparse
-import base64
+import asyncio
-import json
-import ssl
-import time
-import websocket
flag = 1
st = 0.0
all_bytes = b''
class WsParam(object):
# initialization
def __init__(self, text, server="127.0.0.1", port=8090):
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.text = text
# generate the url
def create_url(self):
return self.url
def on_message(ws, message):
global flag
global st
global all_bytes
try:
message = json.loads(message)
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
status = message["status"]
all_bytes += audio
if status == 0:
print("create successfully.")
elif status == 1:
if flag:
print(f"首包响应:{time.time() - st} s")
flag = 0
elif status == 2:
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
with open("./out.pcm", "wb") as f:
f.write(all_bytes)
print("ws is closed")
ws.close()
else:
print("infer error")
except Exception as e:
print("receive msg,but parse exception:", e)
# handle websocket errors
def on_error(ws, error):
print("### error:", error)
# handle websocket close
def on_close(ws):
print("### closed ###")
# handle websocket connection established
def on_open(ws):
def run(*args):
global st
text_base64 = str(
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
print("Start sending text data")
st = time.time()
ws.send(d)
thread.start_new_thread(run, ())
from paddlespeech.server.utils.audio_handler import TTSWsHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
...@@ -108,19 +26,13 @@ if __name__ == "__main__":
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
"--output", type=str, help="save audio path", default=None)
parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args() args = parser.parse_args()
print("***************************************") print("tts websocket client start")
print("Server ip: ", args.server) handler = TTSWsHandler(args.server, args.port, args.play)
print("Server port: ", args.port) loop = asyncio.get_event_loop()
print("Sentence to be synthesized: ", args.text) loop.run_until_complete(handler.run(args.text, args.output))
print("***************************************")
wsParam = WsParam(text=args.text, server=args.server, port=args.port)
websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = websocket.WebSocketApp(
wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
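Both the old hand-rolled client and the new `TTSWsHandler`-based client send the same request body: the input text is base64-encoded over its UTF-8 bytes and wrapped in a small JSON object. A self-contained sketch of that round trip (the helper names here are illustrative, not part of PaddleSpeech):

```python
import base64
import json


def build_tts_request(text: str) -> str:
    """Encode the input text the way the websocket client does:
    base64 over UTF-8 bytes, wrapped in a JSON object."""
    text_base64 = str(base64.b64encode(text.encode("utf-8")), "utf8")
    return json.dumps({"text": text_base64})


def decode_tts_request(payload: str) -> str:
    """Inverse operation, as the server side would perform it."""
    d = json.loads(payload)
    return base64.b64decode(d["text"]).decode("utf-8")
```

Base64 keeps the JSON payload plain ASCII regardless of the input language, which is why the Chinese default sentence survives transport unchanged.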
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import _thread as thread
import argparse
import base64
import json
import ssl
import threading
import time
import pyaudio
import websocket
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
flag = 1
st = 0.0
all_bytes = 0  # running count of received audio bytes
class WsParam(object):
    # Initialization
    def __init__(self, text, server="127.0.0.1", port=8090):
        self.server = server
        self.port = port
        self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
        self.text = text

    # Generate the websocket url
    def create_url(self):
        return self.url


def play_audio():
    global stream
    global buffer
    while True:
        time.sleep(0.05)
        if not buffer:  # buffer is empty
            break
        mutex.acquire()
        stream.write(buffer)
        buffer = b''
        mutex.release()
t = threading.Thread(target=play_audio)
def on_message(ws, message):
    global flag
    global t
    global buffer
    global st
    global all_bytes

    try:
        message = json.loads(message)
        audio = message["audio"]
        audio = base64.b64decode(audio)  # bytes
        status = message["status"]
        all_bytes += len(audio)
        if status == 0:
            print("create successfully.")
        elif status == 1:
            mutex.acquire()
            buffer += audio
            mutex.release()
            if flag:
                print(f"First packet latency: {time.time() - st} s")
                flag = 0
                print("Start playing audio")
                t.start()
        elif status == 2:
            final_response = time.time() - st
            duration = all_bytes / 2 / 24000
            print(f"Final packet latency: {final_response} s")
            print(f"Audio duration: {duration} s")
            print(f"RTF: {final_response / duration}")
            print("ws is closed")
            ws.close()
        else:
            print("infer error")
    except Exception as e:
        print("received a message, but parsing raised an exception:", e)


# Handle websocket errors
def on_error(ws, error):
    print("### error:", error)


# Handle websocket close
def on_close(ws):
    print("### closed ###")


# Handle websocket connection establishment
def on_open(ws):
    def run(*args):
        global st
        text_base64 = str(
            base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
        d = {"text": text_base64}
        d = json.dumps(d)
        print("Start sending text data")
        st = time.time()
        ws.send(d)

    thread.start_new_thread(run, ())
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--text",
        type=str,
        help="A sentence to be synthesized",
        default="您好,欢迎使用语音合成服务。")
    parser.add_argument(
        "--server", type=str, help="server ip", default="127.0.0.1")
    parser.add_argument("--port", type=int, help="server port", default=8092)
    args = parser.parse_args()

    print("***************************************")
    print("Server ip: ", args.server)
    print("Server port: ", args.port)
    print("Sentence to be synthesized: ", args.text)
    print("***************************************")

    wsParam = WsParam(text=args.text, server=args.server, port=args.port)
    websocket.enableTrace(False)
    wsUrl = wsParam.create_url()
    ws = websocket.WebSocketApp(
        wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
    ws.on_open = on_open
    ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    t.join()
    print("End of playing audio")
    stream.stop_stream()
    stream.close()
    p.terminate()
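The duration and RTF figures printed above follow directly from the PCM format the server streams: 16-bit (2-byte) mono samples at 24 kHz. A minimal, self-contained sketch of that arithmetic (function names are illustrative):

```python
def pcm_duration_seconds(num_bytes: int,
                         sample_width: int = 2,
                         sample_rate: int = 24000,
                         channels: int = 1) -> float:
    """Duration of raw PCM audio: bytes / bytes-per-sample / rate / channels."""
    return num_bytes / sample_width / sample_rate / channels


def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration.
    Values below 1.0 mean faster-than-real-time synthesis."""
    return synthesis_seconds / audio_seconds
```

For example, 48000 received bytes correspond to exactly one second of audio, so a synthesis that takes 0.5 s yields an RTF of 0.5.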
@@ -11,14 +11,19 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import base64
 import json
 import logging
+import threading
+import time

 import numpy as np
+import requests
 import soundfile
 import websockets

 from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_process import save_audio


 class ASRAudioHandler:
@@ -117,3 +122,221 @@ class ASRAudioHandler:
         logger.info("final receive msg={}".format(msg))
         result = msg
         return result
class TTSWsHandler:
    def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
        """PaddleSpeech Online TTS Server Client audio handler.
           The online TTS server uses the websocket protocol.
        Args:
            server (str, optional): the server ip. Defaults to "127.0.0.1".
            port (int, optional): the server port. Defaults to 8092.
            play (bool, optional): whether to play audio. Defaults to False.
        """
        self.server = server
        self.port = port
        self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
        self.play = play
        if self.play:
            import pyaudio
            self.buffer = b''
            self.p = pyaudio.PyAudio()
            self.stream = self.p.open(
                format=self.p.get_format_from_width(2),
                channels=1,
                rate=24000,
                output=True)
            self.mutex = threading.Lock()
            self.start_play = True
            self.t = threading.Thread(target=self.play_audio)
            self.max_fail = 50

    def play_audio(self):
        while True:
            if not self.buffer:
                self.max_fail -= 1
                time.sleep(0.05)
                if self.max_fail < 0:
                    break
            self.mutex.acquire()
            self.stream.write(self.buffer)
            self.buffer = b''
            self.mutex.release()

    async def run(self, text: str, output: str=None):
        """Send a text to the online server
        Args:
            text (str): sentence to be synthesized
            output (str): save audio path
        """
        all_bytes = b''

        # 1. Send the websocket handshake request
        async with websockets.connect(self.url) as ws:
            # 2. The server has received the handshake;
            #    send the text to the engine
            text_base64 = str(base64.b64encode((text).encode('utf-8')), "UTF8")
            d = {"text": text_base64}
            d = json.dumps(d)
            st = time.time()
            await ws.send(d)
            logging.info("send a message to the server")

            # 3. Process the received response
            message = await ws.recv()
            logger.info(f"Sentence: {text}")
            logger.info(f"First packet latency: {time.time() - st} s")
            message = json.loads(message)
            status = message["status"]

            while (status == 1):
                audio = message["audio"]
                audio = base64.b64decode(audio)  # bytes
                all_bytes += audio
                if self.play:
                    self.mutex.acquire()
                    self.buffer += audio
                    self.mutex.release()
                    if self.start_play:
                        self.t.start()
                        self.start_play = False
                message = await ws.recv()
                message = json.loads(message)
                status = message["status"]

            # 4. Last packet, no audio information
            if status == 2:
                final_response = time.time() - st
                duration = len(all_bytes) / 2.0 / 24000
                logger.info(f"Final packet latency: {final_response} s")
                logger.info(f"Audio duration: {duration} s")
                logger.info(f"RTF: {final_response / duration}")
                if output is not None:
                    if save_audio(all_bytes, output):
                        logger.info(f"Audio saved to: {output}")
                    else:
                        logger.error("save audio error")
            else:
                logger.error("infer error")

            if self.play:
                self.t.join()
                self.stream.stop_stream()
                self.stream.close()
                self.p.terminate()
class TTSHttpHandler:
    def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
        """PaddleSpeech Online TTS Server Client audio handler.
           The online TTS server uses the HTTP protocol.
        Args:
            server (str, optional): the server ip. Defaults to "127.0.0.1".
            port (int, optional): the server port. Defaults to 8092.
            play (bool, optional): whether to play audio. Defaults to False.
        """
        self.server = server
        self.port = port
        self.url = "http://" + str(self.server) + ":" + str(
            self.port) + "/paddlespeech/streaming/tts"
        self.play = play

        if self.play:
            import pyaudio
            self.buffer = b''
            self.p = pyaudio.PyAudio()
            self.stream = self.p.open(
                format=self.p.get_format_from_width(2),
                channels=1,
                rate=24000,
                output=True)
            self.mutex = threading.Lock()
            self.start_play = True
            self.t = threading.Thread(target=self.play_audio)
            self.max_fail = 50

    def play_audio(self):
        while True:
            if not self.buffer:
                self.max_fail -= 1
                time.sleep(0.05)
                if self.max_fail < 0:
                    break
            self.mutex.acquire()
            self.stream.write(self.buffer)
            self.buffer = b''
            self.mutex.release()

    def run(self,
            text: str,
            spk_id=0,
            speed=1.0,
            volume=1.0,
            sample_rate=0,
            output: str=None):
        """Send a text to the tts online server
        Args:
            text (str): sentence to be synthesized.
            spk_id (int, optional): speaker id. Defaults to 0.
            speed (float, optional): audio speed. Defaults to 1.0.
            volume (float, optional): audio volume. Defaults to 1.0.
            sample_rate (int, optional): audio sample rate, 0 means the same as the model. Defaults to 0.
            output (str, optional): save audio path. Defaults to None.
        """
        # 1. Create the request
        params = {
            "text": text,
            "spk_id": spk_id,
            "speed": speed,
            "volume": volume,
            "sample_rate": sample_rate,
            "save_path": output
        }

        all_bytes = b''
        first_flag = 1

        # 2. Send the request
        st = time.time()
        html = requests.post(self.url, json.dumps(params), stream=True)

        # 3. Process the received response
        for chunk in html.iter_content(chunk_size=1024):
            audio = base64.b64decode(chunk)  # bytes
            if first_flag:
                first_response = time.time() - st
                first_flag = 0

            if self.play:
                self.mutex.acquire()
                self.buffer += audio
                self.mutex.release()
                if self.start_play:
                    self.t.start()
                    self.start_play = False
            all_bytes += audio

        final_response = time.time() - st
        duration = len(all_bytes) / 2.0 / 24000

        logger.info(f"Sentence: {text}")
        logger.info(f"First packet latency: {first_response} s")
        logger.info(f"Final packet latency: {final_response} s")
        logger.info(f"Audio duration: {duration} s")
        logger.info(f"RTF: {final_response / duration}")

        if output is not None:
            if save_audio(all_bytes, output):
                logger.info(f"Audio saved to: {output}")
            else:
                logger.error("save audio error")

        if self.play:
            self.t.join()
            self.stream.stop_stream()
            self.stream.close()
            self.p.terminate()
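Both handlers share the same producer/consumer playback pattern: the network code appends decoded audio to a shared buffer under a lock, while the playback thread drains it and gives up after `max_fail` consecutive empty polls. A stripped-down, pyaudio-free sketch of that pattern (the class and its names are illustrative, not part of PaddleSpeech):

```python
import threading
import time


class BufferDrainer:
    """Mimics play_audio(): drain a shared byte buffer, stopping after
    max_fail consecutive empty polls (a stand-in for end of stream)."""

    def __init__(self, max_fail: int = 50, poll_interval: float = 0.001):
        self.buffer = b''
        self.mutex = threading.Lock()
        self.max_fail = max_fail
        self.poll_interval = poll_interval
        self.consumed = b''  # stands in for stream.write()

    def feed(self, chunk: bytes) -> None:
        """Producer side: append a chunk under the lock."""
        with self.mutex:
            self.buffer += chunk

    def drain(self) -> None:
        """Consumer side: write out whatever has accumulated,
        exiting once the buffer stays empty for max_fail polls."""
        while True:
            if not self.buffer:
                self.max_fail -= 1
                time.sleep(self.poll_interval)
                if self.max_fail < 0:
                    break
            with self.mutex:
                self.consumed += self.buffer
                self.buffer = b''
```

Note that, as in the original, the empty-poll budget is never replenished once data resumes; a long enough stall mid-stream would end playback early, which the generous default of 50 polls (2.5 s at 0.05 s each) is meant to absorb.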
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
 import wave

 import numpy as np
@@ -140,3 +141,35 @@ def pcm2float(data):
     bits = np.iinfo(np.int16).bits
     data = data / (2**(bits - 1))
     return data
def save_audio(bytes_data, audio_path, sample_rate: int=24000) -> bool:
    """Save bytes to an audio file.
    Args:
        bytes_data (bytes): audio samples, bytes format
        audio_path (str): save audio path
        sample_rate (int, optional): audio sample rate. Defaults to 24000.
    Returns:
        bool: Whether the audio was saved successfully
    """
    if audio_path.endswith("pcm"):
        with open(audio_path, "wb") as f:
            f.write(bytes_data)
    elif audio_path.endswith("wav"):
        with open("./tmp.pcm", "wb") as f:
            f.write(bytes_data)
        pcm2wav(
            "./tmp.pcm",
            audio_path,
            channels=1,
            bits=16,
            sample_rate=sample_rate)
        os.system("rm ./tmp.pcm")
    else:
        print("Only saving audio in pcm or wav format is supported")
        return False
    return True
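`save_audio` delegates WAV output to the repo's `pcm2wav` helper via a temporary file on disk. For reference, the same 16-bit PCM-to-WAV wrapping can be done entirely in memory with the standard library alone (a sketch, not the code used here):

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes,
                       sample_rate: int = 24000,
                       channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

The WAV container only prepends a 44-byte header describing the sample layout; the payload stays byte-for-byte identical to the streamed PCM.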
@@ -67,7 +67,7 @@ tts_online-onnx:
     am_sess_conf:
       device: "cpu" # set 'gpu:id' or 'cpu'
       use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4

     # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
     voc: 'mb_melgan_csmsc_onnx'
@@ -76,7 +76,7 @@ tts_online-onnx:
     voc_sess_conf:
       device: "cpu" # set 'gpu:id' or 'cpu'
       use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4

     # others
     lang: 'zh'
...
@@ -28,7 +28,7 @@ StartService(){
 ClientTest_http(){
     for ((i=1; i<=3;i++))
     do
-        python http_client.py --save_path ./out_http.wav
+        paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。"
         ((http_test_times+=1))
     done
 }
@@ -36,7 +36,7 @@ ClientTest_http(){
 ClientTest_ws(){
     for ((i=1; i<=3;i++))
     do
-        python ws_client.py
+        paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。" --protocol websocket
         ((ws_test_times+=1))
     done
 }
@@ -71,6 +71,7 @@ rm -rf $log/server.log.wf
 rm -rf $log/server.log
 rm -rf $log/test_result.log

+config_file=./conf/application.yaml
 server_ip=$(cat $config_file | grep "host" | awk -F " " '{print $2}')
 port=$(cat $config_file | grep "port" | awk '/port:/ {print $2}')
...
@@ -3,6 +3,8 @@
 log_all_dir=./log

+cp ./tts_online_application.yaml ./conf/application.yaml -rf
+
 bash test.sh tts_online $log_all_dir/log_tts_online_cpu

 python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx
...
@@ -67,7 +67,7 @@ tts_online-onnx:
     am_sess_conf:
       device: "cpu" # set 'gpu:id' or 'cpu'
       use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4

     # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
     voc: 'mb_melgan_csmsc_onnx'
@@ -76,7 +76,7 @@ tts_online-onnx:
     voc_sess_conf:
       device: "cpu" # set 'gpu:id' or 'cpu'
       use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4

     # others
     lang: 'zh'
...