diff --git a/README.md b/README.md
index 379550cee4ea66b9ff7b48ed5d74a266731cd55e..2ade8a69ce0ad8cc9b9af77b76e87c9ba5e90b7b 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,10 @@
([简体中文](./README_cn.md)|English)
+
+
-
-
-------------------------------------------------------------------------------------
-
@@ -28,6 +19,20 @@
+
+
+
**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
@@ -142,47 +147,40 @@ For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech sample
-### ⭐ Examples
-- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
-
-
-
-- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
-- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
-
-
-

-
-
-### 🔥 Hot Activities
-
-- 2021.12.21~12.24
-
- 4 Days Live Courses: Depth interpretation of PaddleSpeech!
-
- **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Features
Via the easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial application and academic research, covering training, inference & testing modules, and the deployment process. More specifically, this toolkit features:
-- 📦 **Ease of Use**: low barriers to install, and [CLI](#quick-start) is available to quick-start your journey.
+- 📦 **Ease of Use**: low barriers to install; the [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) are available to quick-start your journey.
- 🏆 **Align to the State-of-the-Art**: we provide high-speed and ultra-lightweight models, and also cutting-edge technology.
+- 🏆 **Streaming ASR and TTS System**: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
-- **Varieties of Functions that Vitalize both Industrial and Academia**:
- - 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
+- 📦 **Varieties of Functions that Vitalize both Industry and Academia**:
+  - 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
- 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
+- 👑 2022.05.13: Released [PP-ASR](./docs/source/asr/PPASR.md), [PP-TTS](./docs/source/tts/PPTTS.md), and [PP-VPR](docs/source/vpr/PPVPR.md).
+- 👏🏻 2022.05.06: `Streaming ASR` now supports `Punctuation Restoration` and `Token Timestamp`.
+- 👏🏻 2022.05.06: `Server` is available for `Speaker Verification` and `Punctuation Restoration`.
+- 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻 2022.03.28: `Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`.
+- 👏🏻 2022.03.28: `CLI` is available for `Speaker Verification`.
+- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
+
+### 🔥 Hot Activities
-- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
-- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
-- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
-- 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.
+
+- 2021.12.21~12.24
+
+  4-Day Live Courses: In-depth interpretation of PaddleSpeech!
+
+ **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Community
- Scan the QR code below with your WeChat (reply 【语音】 after your friend request is approved) to join the official technical exchange group. We look forward to your participation.
@@ -196,6 +194,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
We strongly recommend that users install PaddleSpeech on **Linux** with *python>=3.7*.
So far, **Linux** supports the CLI for all of our tasks, while **Mac OSX** and **Windows** only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text, and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
+
## Quick Start
@@ -238,7 +237,7 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou
**Batch Process**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
-```
+```
**Shell Pipeline**
- ASR + Punctuation Restoration
@@ -257,16 +256,19 @@ If you want to try more functions like training and tuning, please have a look a
Developers can try our speech server with the [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
**Start server**
+
```shell
paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
```
**Access Speech Recognition Services**
+
```shell
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
**Access Text to Speech Services**
+
```shell
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
@@ -280,6 +282,37 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
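
A rough Python counterpart of the client commands above, for reference. The executor names and keyword arguments are assumptions modeled on the `ACSClientExecutor` usage added later in this PR (`paddlespeech.server.bin.paddlespeech_client`); ports and file names are placeholders.

```python
# Sketch only: assumed client executors from paddlespeech.server.bin.paddlespeech_client.
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor, TTSClientExecutor

asr_client = ASRClientExecutor()
res = asr_client(input="./input_16k.wav", server_ip="127.0.0.1", port=8090)  # speech -> text
print(res)

tts_client = TTSClientExecutor()
tts_client(input="您好,欢迎使用百度飞桨语音合成服务。", server_ip="127.0.0.1",
           port=8090, output="output.wav")  # text -> speech, saved to output.wav
```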
+
+## Quick Start Streaming Server
+
+Developers can try the [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md) servers.
+
+**Start Streaming Speech Recognition Server**
+
+```shell
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**Access Streaming Speech Recognition Services**
+
+```shell
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**Start Streaming Text to Speech Server**
+
+```shell
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**Access Streaming Text to Speech Services**
+
+```shell
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+For more information, please see: [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md).
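
The streaming services have analogous Python clients. A minimal sketch follows; `ASROnlineClientExecutor` and its keyword arguments are assumptions modeled on the HTTP clients above and are not verified against this PR.

```python
# Sketch only: assumed streaming ASR client executor.
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor

asr_online = ASROnlineClientExecutor()
res = asr_online(input="./input_16k.wav", server_ip="127.0.0.1", port=8090)  # streaming recognition
print(res)
```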
+
## Model List
@@ -296,7 +329,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Speech-to-Text Module Type |
Dataset |
Model Type |
- Link |
+ Example |
@@ -371,7 +404,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Text-to-Speech Module Type |
Model Type |
Dataset |
- Link |
+ Example |
@@ -489,7 +522,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -514,7 +547,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -539,7 +572,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Task |
Dataset |
Model Type |
- Link |
+ Example |
@@ -589,6 +622,21 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components.
+
+## ⭐ Examples
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
+
+
+
+- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
+
+
+

+
+
+
## Citation
To cite PaddleSpeech for research, please use the following format.
@@ -655,7 +703,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
## Acknowledgement
-
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
diff --git a/README_cn.md b/README_cn.md
index 228d5d783dcddf0bf491c2a1334a7c0922e7c5f0..f5ba93629d897b793ffb45a145dc8aa37dcde8bb 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -2,26 +2,45 @@
-
-------------------------------------------------------------------------------------
-
+
+
+
+
+
+
+
+------------------------------------------------------------------------------------
+
+
+
+
+
+
**PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型,一些典型的应用示例如下:
##### 语音识别
@@ -57,7 +78,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
我认为跑步最重要的就是给我带来了身体健康。 |
-
@@ -143,47 +163,39 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
-### ⭐ 应用案例
-- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
-
-
-
-- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
-
-
-- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
-
-

-
-
-### 🔥 热门活动
-
-- 2021.12.21~12.24
-
- 4 日直播课: 深度解读 PaddleSpeech 语音技术!
-
- **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### 特性
本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括
- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。
- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。
+- 🏆 **流式ASR和TTS系统**:工业级的端到端流式识别、流式合成系统。
- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换(G2P)。此外,我们使用自定义语言规则来适应中文语境。
- **多种工业界以及学术界主流功能支持**:
- - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。
+ - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成、声纹识别、KWS等任务的实现。
- 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC,详情请见 [模型列表](#model-list)。
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。
+
### 近期更新
-- 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。
-- 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。
-- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
-- 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。
+- 👑 2022.05.13: PaddleSpeech 发布 [PP-ASR](./docs/source/asr/PPASR_cn.md)、[PP-TTS](./docs/source/tts/PPTTS_cn.md)、[PP-VPR](docs/source/vpr/PPVPR_cn.md)
+- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳)和语音合成。
+- 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别、标点恢复。
+- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成、声纹验证。
+- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
+
+### 🔥 热门活动
+
+- 2021.12.21~12.24
+
+ 4 日直播课: 深度解读 PaddleSpeech 语音技术!
+
+ **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
+
### 技术交流群
微信扫描二维码(好友申请通过后回复【语音】)加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
@@ -192,11 +204,13 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
+
## 安装
我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
目前为止,**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能,**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节,可以参考[安装文档](./docs/source/install_cn.md)。
+
## 快速开始
安装完成后,开发者可以通过命令行快速开始,改变 `--input` 可以尝试用自己的音频或文本测试。
@@ -232,7 +246,7 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!
**批处理**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
-```
+```
**Shell管道**
ASR + Punc:
@@ -269,6 +283,38 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+## 快速使用流式服务
+
+开发者可以尝试 [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md) 服务。
+
+**启动流式ASR服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
+```
+
+**访问流式ASR服务**
+
+```
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
+```
+
+**启动流式TTS服务**
+
+```
+paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
+```
+
+**访问流式TTS服务**
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+```
+
+更多信息参看: [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md)
+
+
## 模型列表
PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。
@@ -282,8 +328,8 @@ PaddleSpeech 的 **语音转文本** 包含语音识别声学模型、语音识
语音转文本模块类型 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -356,9 +402,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
语音合成模块类型 |
- 模型种类 |
+ 模型类型 |
数据集 |
- 链接 |
+ 脚本 |
@@ -474,8 +520,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
任务 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -498,10 +544,10 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- Task |
- Dataset |
- Model Type |
- Link |
+ 任务 |
+ 数据集 |
+ 模型类型 |
+ 脚本 |
@@ -525,8 +571,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
任务 |
数据集 |
- 模型种类 |
- 链接 |
+ 模型类型 |
+ 脚本 |
@@ -582,6 +628,21 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet),现在与此仓库合并。如果您对该任务的学术研究感兴趣,请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外,[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。
+## ⭐ 应用案例
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
+
+
+
+- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
+
+
+- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
+
+
+

+
+
+
## 引用
要引用 PaddleSpeech 进行研究,请使用以下格式进行引用。
@@ -658,6 +719,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。
+
此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。
## License
diff --git a/paddleaudio/.gitignore b/audio/.gitignore
similarity index 100%
rename from paddleaudio/.gitignore
rename to audio/.gitignore
diff --git a/paddleaudio/CHANGELOG.md b/audio/CHANGELOG.md
similarity index 100%
rename from paddleaudio/CHANGELOG.md
rename to audio/CHANGELOG.md
diff --git a/paddleaudio/README.md b/audio/README.md
similarity index 100%
rename from paddleaudio/README.md
rename to audio/README.md
diff --git a/paddleaudio/docs/Makefile b/audio/docs/Makefile
similarity index 100%
rename from paddleaudio/docs/Makefile
rename to audio/docs/Makefile
diff --git a/paddleaudio/docs/README.md b/audio/docs/README.md
similarity index 100%
rename from paddleaudio/docs/README.md
rename to audio/docs/README.md
diff --git a/paddleaudio/docs/images/paddle.png b/audio/docs/images/paddle.png
similarity index 100%
rename from paddleaudio/docs/images/paddle.png
rename to audio/docs/images/paddle.png
diff --git a/paddleaudio/docs/make.bat b/audio/docs/make.bat
similarity index 100%
rename from paddleaudio/docs/make.bat
rename to audio/docs/make.bat
diff --git a/paddleaudio/docs/source/_static/custom.css b/audio/docs/source/_static/custom.css
similarity index 100%
rename from paddleaudio/docs/source/_static/custom.css
rename to audio/docs/source/_static/custom.css
diff --git a/paddleaudio/docs/source/_templates/module.rst_t b/audio/docs/source/_templates/module.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/module.rst_t
rename to audio/docs/source/_templates/module.rst_t
diff --git a/paddleaudio/docs/source/_templates/package.rst_t b/audio/docs/source/_templates/package.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/package.rst_t
rename to audio/docs/source/_templates/package.rst_t
diff --git a/paddleaudio/docs/source/_templates/toc.rst_t b/audio/docs/source/_templates/toc.rst_t
similarity index 100%
rename from paddleaudio/docs/source/_templates/toc.rst_t
rename to audio/docs/source/_templates/toc.rst_t
diff --git a/paddleaudio/docs/source/conf.py b/audio/docs/source/conf.py
similarity index 100%
rename from paddleaudio/docs/source/conf.py
rename to audio/docs/source/conf.py
diff --git a/paddleaudio/docs/source/index.rst b/audio/docs/source/index.rst
similarity index 100%
rename from paddleaudio/docs/source/index.rst
rename to audio/docs/source/index.rst
diff --git a/paddleaudio/paddleaudio/__init__.py b/audio/paddleaudio/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/__init__.py
rename to audio/paddleaudio/__init__.py
diff --git a/paddleaudio/paddleaudio/backends/__init__.py b/audio/paddleaudio/backends/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/__init__.py
rename to audio/paddleaudio/backends/__init__.py
diff --git a/paddleaudio/paddleaudio/backends/soundfile_backend.py b/audio/paddleaudio/backends/soundfile_backend.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/soundfile_backend.py
rename to audio/paddleaudio/backends/soundfile_backend.py
diff --git a/paddleaudio/paddleaudio/backends/sox_backend.py b/audio/paddleaudio/backends/sox_backend.py
similarity index 100%
rename from paddleaudio/paddleaudio/backends/sox_backend.py
rename to audio/paddleaudio/backends/sox_backend.py
diff --git a/paddleaudio/paddleaudio/compliance/__init__.py b/audio/paddleaudio/compliance/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/__init__.py
rename to audio/paddleaudio/compliance/__init__.py
diff --git a/paddleaudio/paddleaudio/compliance/kaldi.py b/audio/paddleaudio/compliance/kaldi.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/kaldi.py
rename to audio/paddleaudio/compliance/kaldi.py
diff --git a/paddleaudio/paddleaudio/compliance/librosa.py b/audio/paddleaudio/compliance/librosa.py
similarity index 100%
rename from paddleaudio/paddleaudio/compliance/librosa.py
rename to audio/paddleaudio/compliance/librosa.py
diff --git a/paddleaudio/paddleaudio/datasets/__init__.py b/audio/paddleaudio/datasets/__init__.py
similarity index 96%
rename from paddleaudio/paddleaudio/datasets/__init__.py
rename to audio/paddleaudio/datasets/__init__.py
index ebd4af984f697a8fe73c7a87f4d8362a95915c42..f95fad3054de8d19f24f881b69b682ae6def5b5b 100644
--- a/paddleaudio/paddleaudio/datasets/__init__.py
+++ b/audio/paddleaudio/datasets/__init__.py
@@ -13,6 +13,7 @@
# limitations under the License.
from .esc50 import ESC50
from .gtzan import GTZAN
+from .hey_snips import HeySnips
from .rirs_noises import OpenRIRNoise
from .tess import TESS
from .urban_sound import UrbanSound8K
diff --git a/paddleaudio/paddleaudio/datasets/dataset.py b/audio/paddleaudio/datasets/dataset.py
similarity index 76%
rename from paddleaudio/paddleaudio/datasets/dataset.py
rename to audio/paddleaudio/datasets/dataset.py
index 06e2df6d0efac865baece7f0fd446fbf41f35c32..488187a69de54aed7af2b038bea6f3bcb73c57f6 100644
--- a/paddleaudio/paddleaudio/datasets/dataset.py
+++ b/audio/paddleaudio/datasets/dataset.py
@@ -17,6 +17,8 @@ import numpy as np
import paddle
from ..backends import load as load_audio
+from ..compliance.kaldi import fbank as kaldi_fbank
+from ..compliance.kaldi import mfcc as kaldi_mfcc
from ..compliance.librosa import melspectrogram
from ..compliance.librosa import mfcc
@@ -24,6 +26,8 @@ feat_funcs = {
'raw': None,
'melspectrogram': melspectrogram,
'mfcc': mfcc,
+ 'kaldi_fbank': kaldi_fbank,
+ 'kaldi_mfcc': kaldi_mfcc,
}
@@ -73,16 +77,24 @@ class AudioClassificationDataset(paddle.io.Dataset):
feat_func = feat_funcs[self.feat_type]
record = {}
- record['feat'] = feat_func(
- waveform, sample_rate,
- **self.feat_config) if feat_func else waveform
+ if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
+ waveform = paddle.to_tensor(waveform).unsqueeze(0) # (C, T)
+ record['feat'] = feat_func(
+ waveform=waveform, sr=self.sample_rate, **self.feat_config)
+ else:
+ record['feat'] = feat_func(
+ waveform, sample_rate,
+ **self.feat_config) if feat_func else waveform
record['label'] = label
return record
def __getitem__(self, idx):
record = self._convert_to_record(idx)
- return np.array(record['feat']).transpose(), np.array(
- record['label'], dtype=np.int64)
+ if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
+ return self.keys[idx], record['feat'], record['label']
+ else:
+ return np.array(record['feat']).transpose(), np.array(
+ record['label'], dtype=np.int64)
def __len__(self):
return len(self.files)
diff --git a/paddleaudio/paddleaudio/datasets/esc50.py b/audio/paddleaudio/datasets/esc50.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/esc50.py
rename to audio/paddleaudio/datasets/esc50.py
diff --git a/paddleaudio/paddleaudio/datasets/gtzan.py b/audio/paddleaudio/datasets/gtzan.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/gtzan.py
rename to audio/paddleaudio/datasets/gtzan.py
diff --git a/audio/paddleaudio/datasets/hey_snips.py b/audio/paddleaudio/datasets/hey_snips.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a67b843bb4dca8bea4f49c69cd7dd2105e2618d
--- /dev/null
+++ b/audio/paddleaudio/datasets/hey_snips.py
@@ -0,0 +1,74 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import collections
+import json
+import os
+from typing import List
+from typing import Tuple
+
+from .dataset import AudioClassificationDataset
+
+__all__ = ['HeySnips']
+
+
+class HeySnips(AudioClassificationDataset):
+ meta_info = collections.namedtuple('META_INFO',
+ ('key', 'label', 'duration', 'wav'))
+
+ def __init__(self,
+ data_dir: os.PathLike,
+ mode: str='train',
+ feat_type: str='kaldi_fbank',
+ sample_rate: int=16000,
+ **kwargs):
+ self.data_dir = data_dir
+ files, labels = self._get_data(mode)
+ super(HeySnips, self).__init__(
+ files=files,
+ labels=labels,
+ feat_type=feat_type,
+ sample_rate=sample_rate,
+ **kwargs)
+
+ def _get_meta_info(self, mode) -> List[collections.namedtuple]:
+ ret = []
+ with open(os.path.join(self.data_dir, '{}.json'.format(mode)),
+ 'r') as f:
+ data = json.load(f)
+ for item in data:
+ sample = collections.OrderedDict()
+ if item['duration'] > 0:
+ sample['key'] = item['id']
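+                # keyword clips (is_hotword == 1) get label 0; all other clips get label -1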
+ sample['label'] = 0 if item['is_hotword'] == 1 else -1
+ sample['duration'] = item['duration']
+ sample['wav'] = os.path.join(self.data_dir,
+ item['audio_file_path'])
+ ret.append(self.meta_info(*sample.values()))
+ return ret
+
+ def _get_data(self, mode: str) -> Tuple[List[str], List[int]]:
+ meta_info = self._get_meta_info(mode)
+
+ files = []
+ labels = []
+ self.keys = []
+ self.durations = []
+ for sample in meta_info:
+ key, target, duration, wav = sample
+ files.append(wav)
+ labels.append(int(target))
+ self.keys.append(key)
+ self.durations.append(float(duration))
+
+ return files, labels
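
A hypothetical usage sketch of the dataset added above; `data_dir` is a placeholder, and the `(key, feat, label)` return shape follows the `kaldi_fbank` branch of `AudioClassificationDataset.__getitem__` in this PR.

```python
# Hypothetical usage of the new HeySnips dataset (data_dir is a placeholder path).
from paddleaudio.datasets import HeySnips

train_ds = HeySnips(data_dir='./hey_snips_data', mode='train',
                    feat_type='kaldi_fbank', sample_rate=16000)
key, feat, label = train_ds[0]  # kaldi feat types return (utterance key, feature, label)
print(key, feat.shape, label)
```
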
diff --git a/paddleaudio/paddleaudio/datasets/rirs_noises.py b/audio/paddleaudio/datasets/rirs_noises.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/rirs_noises.py
rename to audio/paddleaudio/datasets/rirs_noises.py
diff --git a/paddleaudio/paddleaudio/datasets/tess.py b/audio/paddleaudio/datasets/tess.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/tess.py
rename to audio/paddleaudio/datasets/tess.py
diff --git a/paddleaudio/paddleaudio/datasets/urban_sound.py b/audio/paddleaudio/datasets/urban_sound.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/urban_sound.py
rename to audio/paddleaudio/datasets/urban_sound.py
diff --git a/paddleaudio/paddleaudio/datasets/voxceleb.py b/audio/paddleaudio/datasets/voxceleb.py
similarity index 100%
rename from paddleaudio/paddleaudio/datasets/voxceleb.py
rename to audio/paddleaudio/datasets/voxceleb.py
diff --git a/paddleaudio/paddleaudio/features/__init__.py b/audio/paddleaudio/features/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/features/__init__.py
rename to audio/paddleaudio/features/__init__.py
diff --git a/paddleaudio/paddleaudio/features/layers.py b/audio/paddleaudio/features/layers.py
similarity index 100%
rename from paddleaudio/paddleaudio/features/layers.py
rename to audio/paddleaudio/features/layers.py
diff --git a/paddleaudio/paddleaudio/functional/__init__.py b/audio/paddleaudio/functional/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/__init__.py
rename to audio/paddleaudio/functional/__init__.py
diff --git a/paddleaudio/paddleaudio/functional/functional.py b/audio/paddleaudio/functional/functional.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/functional.py
rename to audio/paddleaudio/functional/functional.py
diff --git a/paddleaudio/paddleaudio/functional/window.py b/audio/paddleaudio/functional/window.py
similarity index 100%
rename from paddleaudio/paddleaudio/functional/window.py
rename to audio/paddleaudio/functional/window.py
diff --git a/paddleaudio/paddleaudio/io/__init__.py b/audio/paddleaudio/io/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/io/__init__.py
rename to audio/paddleaudio/io/__init__.py
diff --git a/paddleaudio/paddleaudio/metric/__init__.py b/audio/paddleaudio/metric/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/__init__.py
rename to audio/paddleaudio/metric/__init__.py
diff --git a/paddleaudio/paddleaudio/metric/dtw.py b/audio/paddleaudio/metric/dtw.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/dtw.py
rename to audio/paddleaudio/metric/dtw.py
diff --git a/paddleaudio/paddleaudio/metric/eer.py b/audio/paddleaudio/metric/eer.py
similarity index 100%
rename from paddleaudio/paddleaudio/metric/eer.py
rename to audio/paddleaudio/metric/eer.py
diff --git a/paddleaudio/paddleaudio/sox_effects/__init__.py b/audio/paddleaudio/sox_effects/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/sox_effects/__init__.py
rename to audio/paddleaudio/sox_effects/__init__.py
diff --git a/paddleaudio/paddleaudio/utils/__init__.py b/audio/paddleaudio/utils/__init__.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/__init__.py
rename to audio/paddleaudio/utils/__init__.py
diff --git a/paddleaudio/paddleaudio/utils/download.py b/audio/paddleaudio/utils/download.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/download.py
rename to audio/paddleaudio/utils/download.py
diff --git a/paddleaudio/paddleaudio/utils/env.py b/audio/paddleaudio/utils/env.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/env.py
rename to audio/paddleaudio/utils/env.py
diff --git a/paddleaudio/paddleaudio/utils/error.py b/audio/paddleaudio/utils/error.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/error.py
rename to audio/paddleaudio/utils/error.py
diff --git a/paddleaudio/paddleaudio/utils/log.py b/audio/paddleaudio/utils/log.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/log.py
rename to audio/paddleaudio/utils/log.py
diff --git a/paddleaudio/paddleaudio/utils/numeric.py b/audio/paddleaudio/utils/numeric.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/numeric.py
rename to audio/paddleaudio/utils/numeric.py
diff --git a/paddleaudio/paddleaudio/utils/time.py b/audio/paddleaudio/utils/time.py
similarity index 100%
rename from paddleaudio/paddleaudio/utils/time.py
rename to audio/paddleaudio/utils/time.py
diff --git a/paddleaudio/setup.py b/audio/setup.py
similarity index 99%
rename from paddleaudio/setup.py
rename to audio/setup.py
index aac38930295aac345c0a5746e4dadfec98ef9dc7..ec67c81def776d25e86800ef3606093e91e4c2ef 100644
--- a/paddleaudio/setup.py
+++ b/audio/setup.py
@@ -19,7 +19,7 @@ from setuptools.command.install import install
from setuptools.command.test import test
# set the version here
-VERSION = '0.2.1'
+VERSION = '0.0.0'
# Inspired by the example at https://pytest.org/latest/goodpractises.html
diff --git a/paddleaudio/tests/.gitkeep b/audio/tests/.gitkeep
similarity index 100%
rename from paddleaudio/tests/.gitkeep
rename to audio/tests/.gitkeep
diff --git a/paddleaudio/tests/backends/__init__.py b/audio/tests/backends/__init__.py
similarity index 100%
rename from paddleaudio/tests/backends/__init__.py
rename to audio/tests/backends/__init__.py
diff --git a/paddleaudio/tests/backends/base.py b/audio/tests/backends/base.py
similarity index 100%
rename from paddleaudio/tests/backends/base.py
rename to audio/tests/backends/base.py
diff --git a/paddleaudio/tests/backends/soundfile/__init__.py b/audio/tests/backends/soundfile/__init__.py
similarity index 100%
rename from paddleaudio/tests/backends/soundfile/__init__.py
rename to audio/tests/backends/soundfile/__init__.py
diff --git a/paddleaudio/tests/backends/soundfile/test_io.py b/audio/tests/backends/soundfile/test_io.py
similarity index 100%
rename from paddleaudio/tests/backends/soundfile/test_io.py
rename to audio/tests/backends/soundfile/test_io.py
index 0f7580a40d386c048e88e6e3f75c6451917c9d68..9d092902da49e4651574201fa6d050d2a12b9c92 100644
--- a/paddleaudio/tests/backends/soundfile/test_io.py
+++ b/audio/tests/backends/soundfile/test_io.py
@@ -16,9 +16,9 @@ import os
import unittest
import numpy as np
+import paddleaudio
import soundfile as sf
-import paddleaudio
from ..base import BackendTest
diff --git a/paddleaudio/tests/benchmark/README.md b/audio/tests/benchmark/README.md
similarity index 100%
rename from paddleaudio/tests/benchmark/README.md
rename to audio/tests/benchmark/README.md
diff --git a/paddleaudio/tests/benchmark/log_melspectrogram.py b/audio/tests/benchmark/log_melspectrogram.py
similarity index 99%
rename from paddleaudio/tests/benchmark/log_melspectrogram.py
rename to audio/tests/benchmark/log_melspectrogram.py
index 5230acd424e27b22dfc3e656410f5f74f5a1b2d0..9832aed4d1b80a4565efac8a551946feb7a7a117 100644
--- a/paddleaudio/tests/benchmark/log_melspectrogram.py
+++ b/audio/tests/benchmark/log_melspectrogram.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/benchmark/melspectrogram.py b/audio/tests/benchmark/melspectrogram.py
similarity index 99%
rename from paddleaudio/tests/benchmark/melspectrogram.py
rename to audio/tests/benchmark/melspectrogram.py
index e0b79b45a71a83ee5791ab97a633018c1d377ee1..5fe3f2481820810a394350b56bdd3c315e08cb46 100644
--- a/paddleaudio/tests/benchmark/melspectrogram.py
+++ b/audio/tests/benchmark/melspectrogram.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/benchmark/mfcc.py b/audio/tests/benchmark/mfcc.py
similarity index 99%
rename from paddleaudio/tests/benchmark/mfcc.py
rename to audio/tests/benchmark/mfcc.py
index 2572ff33dd1cd80ba41ac1f0e35ec1df5e04e757..c6a8c85f90905442a8c2ee19ac52b1f0727aa50a 100644
--- a/paddleaudio/tests/benchmark/mfcc.py
+++ b/audio/tests/benchmark/mfcc.py
@@ -17,11 +17,10 @@ import urllib.request
import librosa
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
-
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
diff --git a/paddleaudio/tests/features/__init__.py b/audio/tests/features/__init__.py
similarity index 100%
rename from paddleaudio/tests/features/__init__.py
rename to audio/tests/features/__init__.py
diff --git a/paddleaudio/tests/features/base.py b/audio/tests/features/base.py
similarity index 99%
rename from paddleaudio/tests/features/base.py
rename to audio/tests/features/base.py
index 725e1e2e70bdacca0e067e371dfb8e71130e0170..476f6b8eeb7f14247fa00fd0943741c2eca53e66 100644
--- a/paddleaudio/tests/features/base.py
+++ b/audio/tests/features/base.py
@@ -17,7 +17,6 @@ import urllib.request
import numpy as np
import paddle
-
from paddleaudio import load
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
diff --git a/paddleaudio/tests/features/test_istft.py b/audio/tests/features/test_istft.py
similarity index 100%
rename from paddleaudio/tests/features/test_istft.py
rename to audio/tests/features/test_istft.py
index 23371200b6209a300e1205d1db02d3f6542f473e..9cf8cdd65582c0300d59749db621155eebd3faee 100644
--- a/paddleaudio/tests/features/test_istft.py
+++ b/audio/tests/features/test_istft.py
@@ -15,9 +15,9 @@ import unittest
import numpy as np
import paddle
+from paddleaudio.functional.window import get_window
from .base import FeatTest
-from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import IStft
from paddlespeech.s2t.transform.spectrogram import Stft
diff --git a/paddleaudio/tests/features/test_kaldi.py b/audio/tests/features/test_kaldi.py
similarity index 100%
rename from paddleaudio/tests/features/test_kaldi.py
rename to audio/tests/features/test_kaldi.py
index 6e826aaa75b751127548cba4d600195ad7094d00..00a576f6f48ee71405f5942ff961ae8f6e8edf55 100644
--- a/paddleaudio/tests/features/test_kaldi.py
+++ b/audio/tests/features/test_kaldi.py
@@ -15,10 +15,10 @@ import unittest
import numpy as np
import paddle
+import paddleaudio
import torch
import torchaudio
-import paddleaudio
from .base import FeatTest
diff --git a/paddleaudio/tests/features/test_librosa.py b/audio/tests/features/test_librosa.py
similarity index 100%
rename from paddleaudio/tests/features/test_librosa.py
rename to audio/tests/features/test_librosa.py
index cf0c98c7295d6a7c2cdc7739455900d28ec02ef4..a1d3e8400dbc62924b68a1519605231d5da70bd8 100644
--- a/paddleaudio/tests/features/test_librosa.py
+++ b/audio/tests/features/test_librosa.py
@@ -16,11 +16,11 @@ import unittest
import librosa
import numpy as np
import paddle
-
import paddleaudio
-from .base import FeatTest
from paddleaudio.functional.window import get_window
+from .base import FeatTest
+
class TestLibrosa(FeatTest):
def initParmas(self):
diff --git a/paddleaudio/tests/features/test_log_melspectrogram.py b/audio/tests/features/test_log_melspectrogram.py
similarity index 100%
rename from paddleaudio/tests/features/test_log_melspectrogram.py
rename to audio/tests/features/test_log_melspectrogram.py
index 6bae2df3f564da16cb511541f8bbc714ad0b087e..0383c2b8b200a261cbb3e9a8a354f432e28e10a2 100644
--- a/paddleaudio/tests/features/test_log_melspectrogram.py
+++ b/audio/tests/features/test_log_melspectrogram.py
@@ -15,8 +15,8 @@ import unittest
import numpy as np
import paddle
-
import paddleaudio
+
from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram
diff --git a/paddleaudio/tests/features/test_spectrogram.py b/audio/tests/features/test_spectrogram.py
similarity index 100%
rename from paddleaudio/tests/features/test_spectrogram.py
rename to audio/tests/features/test_spectrogram.py
index 50b21403b4fb8187587edae0222a09996b384aec..1774fe61975c4b4ae11b7ff2c9200a4d67499efe 100644
--- a/paddleaudio/tests/features/test_spectrogram.py
+++ b/audio/tests/features/test_spectrogram.py
@@ -15,8 +15,8 @@ import unittest
import numpy as np
import paddle
-
import paddleaudio
+
from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Spectrogram
diff --git a/paddleaudio/tests/features/test_stft.py b/audio/tests/features/test_stft.py
similarity index 100%
rename from paddleaudio/tests/features/test_stft.py
rename to audio/tests/features/test_stft.py
index c64b5ebe6b497b5d9c40af0c14d2785afa2e7504..58792ffe2477058958a4e31ed122263306e83388 100644
--- a/paddleaudio/tests/features/test_stft.py
+++ b/audio/tests/features/test_stft.py
@@ -15,9 +15,9 @@ import unittest
import numpy as np
import paddle
+from paddleaudio.functional.window import get_window
from .base import FeatTest
-from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import Stft
diff --git a/demos/README.md b/demos/README.md
index 84f4de41f0514cb31bf114e00f5622c771a56348..8abd67249d7ad939db6d79d7b8160b8efa7cb8ba 100644
--- a/demos/README.md
+++ b/demos/README.md
@@ -11,6 +11,7 @@ The directory containes many speech applications in multi scenarios.
* punctuation_restoration - restore punctuation from raw text
* speech recognition - recognize text of an audio file
* speech server - server for speech tasks, e.g. ASR, TTS, CLS
+* streaming asr server - receive an audio stream over websocket and recognize it into a transcript
* speech translation - end to end speech translation
* story talker - book reader based on OCR and TTS
* style_fs2 - multi style control for FastSpeech2 model
diff --git a/demos/README_cn.md b/demos/README_cn.md
index 692b8468fc0fc5d5c36b959f49bf73f830fd9e2b..471342127f4e6e49522714d5926f5c185fbdb92b 100644
--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -11,6 +11,7 @@
* 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。
* 语音识别 - 识别一段音频中包含的语音文字。
* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等
+* 流式语音识别服务 - 接收流式输入的语音数据流,实时识别音频中的文字。
* 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。
* 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
* 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。
diff --git a/demos/audio_content_search/README.md b/demos/audio_content_search/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d73d6a59d71f7973b88f3cc9cee2834b49e5fe59
--- /dev/null
+++ b/demos/audio_content_search/README.md
@@ -0,0 +1,74 @@
+([简体中文](./README_cn.md)|English)
+# ACS (Audio Content Search)
+
+## Introduction
+ACS, or Audio Content Search, refers to the problem of getting the timestamps of keywords from automatically transcribed spoken language (speech-to-text).
+
+This demo is an implementation of obtaining the timestamps of keywords in the transcript of a given audio file. It can be done with a single command or a few lines in Python using `PaddleSpeech`.
+The search words in this demo are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+You can choose either the medium or hard way to install paddlespeech.
+
+The dependencies are listed in requirements.txt.
+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must be the same as the model's.
+
+Here are sample files for this demo that can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+
+### 3. Usage
+- Command Line(Recommended)
+ ```bash
+ # Chinese
+ paddlespeech_client acs --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
+
+ Usage:
+ ```bash
+  paddlespeech_client acs --help
+ ```
+ Arguments:
+ - `input`(required): Audio file to recognize.
+ - `server_ip`: the server ip.
+ - `port`: the server port.
+ - `lang`: the language type of the model. Default: `zh`.
+ - `sample_rate`: Sample rate of the model. Default: `16000`.
+ - `audio_format`: The audio format.
+
+ Output:
+ ```bash
+ [2022-05-15 15:00:58,185] [ INFO] - acs http client start
+ [2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:01:03,220] [ INFO] - acs http client finished
+ [2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ [2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+ acs_executor = ACSClientExecutor()
+ res = acs_executor(
+ input='./zh.wav',
+ server_ip="127.0.0.1",
+ port=8490,)
+ print(res)
+ ```
+
+ Output:
+ ```bash
+ [2022-05-15 15:08:13,955] [ INFO] - acs http client start
+ [2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:08:19,026] [ INFO] - acs http client finished
+ {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ ```
diff --git a/demos/audio_content_search/README_cn.md b/demos/audio_content_search/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..c74af4cf1f1e1a70470bf176cd1821dfdd02ac74
--- /dev/null
+++ b/demos/audio_content_search/README_cn.md
@@ -0,0 +1,74 @@
+(简体中文|[English](./README.md))
+
+# 语音内容搜索
+## 介绍
+语音内容搜索是一项用计算机程序获取转录语音内容关键词时间戳的技术。
+
+这个 demo 是一个从给定音频文件获取其文本中关键词时间戳的实现,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+
+当前示例中检索词是
+```
+我
+康
+```
+## 使用方法
+### 1. 安装
+请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
+
+你可以从 medium、hard 两种方式中选择一种方式安装。
+依赖参见 requirements.txt。
+
+### 2. 准备输入
+这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+
+可以下载此 demo 的示例音频:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+### 3. 使用方法
+- 命令行 (推荐使用)
+ ```bash
+ # 中文
+ paddlespeech_client acs --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
+
+ 使用方法:
+ ```bash
+  paddlespeech_client acs --help
+ ```
+ 参数:
+ - `input`(必须输入):用于识别的音频文件。
+ - `server_ip`: 服务的ip。
+ - `port`:服务的端口。
+ - `lang`:模型语言,默认值:`zh`。
+ - `sample_rate`:音频采样率,默认值:`16000`。
+ - `audio_format`: 音频的格式。
+
+ 输出:
+ ```bash
+ [2022-05-15 15:00:58,185] [ INFO] - acs http client start
+ [2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:01:03,220] [ INFO] - acs http client finished
+ [2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ [2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+ acs_executor = ACSClientExecutor()
+ res = acs_executor(
+ input='./zh.wav',
+ server_ip="127.0.0.1",
+ port=8490,)
+ print(res)
+ ```
+
+ 输出:
+ ```bash
+ [2022-05-15 15:08:13,955] [ INFO] - acs http client start
+ [2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+ [2022-05-15 15:08:19,026] [ INFO] - acs http client finished
+ {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+ ```
diff --git a/demos/audio_content_search/acs_clinet.py b/demos/audio_content_search/acs_clinet.py
new file mode 100644
index 0000000000000000000000000000000000000000..11f99aca7aa74b2b9fca8544939a0f7267878b21
--- /dev/null
+++ b/demos/audio_content_search/acs_clinet.py
@@ -0,0 +1,49 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_handler import ASRHttpHandler
+
+
+def main(args):
+ logger.info("asr http client start")
+ audio_format = "wav"
+ sample_rate = 16000
+ lang = "zh"
+ handler = ASRHttpHandler(
+ server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
+ res = handler.run(args.wavfile, audio_format, sample_rate, lang)
+ # res = res['result']
+ logger.info(f"the final result: {res}")
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description="audio content search client")
+ parser.add_argument(
+ '--server_ip', type=str, default='127.0.0.1', help='server ip')
+ parser.add_argument('--port', type=int, default=8090, help='server port')
+ parser.add_argument(
+ "--wavfile",
+ action="store",
+ help="wav file path ",
+ default="./16_audio.wav")
+ parser.add_argument(
+ '--endpoint',
+ type=str,
+ default='/paddlespeech/asr/search',
+ help='server endpoint')
+ args = parser.parse_args()
+
+ main(args)
diff --git a/demos/audio_content_search/conf/acs_application.yaml b/demos/audio_content_search/conf/acs_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d3c5e3039945ffe23ba6dd2de717d9b6ab8a433f
--- /dev/null
+++ b/demos/audio_content_search/conf/acs_application.yaml
@@ -0,0 +1,34 @@
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8490
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['acs_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports offline engine types.
+protocol: 'http'
+engine_list: ['acs_python']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ACS #########################################
+################### acs task: engine_type: python ###############################
+acs_python:
+ task: acs
+ asr_protocol: 'websocket' # 'websocket'
+ offset: 1.0 # second
+ asr_server_ip: 127.0.0.1
+ asr_server_port: 8390
+ lang: 'zh'
+ word_list: "./conf/words.txt"
+ sample_rate: 16000
+ device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
diff --git a/demos/audio_content_search/conf/words.txt b/demos/audio_content_search/conf/words.txt
new file mode 100644
index 0000000000000000000000000000000000000000..25510eb424fbe48ba81f51a3ce10d6ff9facad63
--- /dev/null
+++ b/demos/audio_content_search/conf/words.txt
@@ -0,0 +1,2 @@
+我
+康
\ No newline at end of file
diff --git a/demos/audio_content_search/conf/ws_conformer_application.yaml b/demos/audio_content_search/conf/ws_conformer_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..97201382f57e12e3fccb600f98ee3b0b26dc889c
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
@@ -0,0 +1,43 @@
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine types.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_multicn'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ decode_method: 'attention_rescoring'
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
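
As a side note, the chunk sizes implied by `chunk_buffer_conf` above work out as below; treating `window_n`/`shift_n` as counts of feature frames is an assumption.

```python
# Back-of-the-envelope arithmetic for the chunk_buffer_conf values above.
sample_rate = 16000
window_ms, shift_ms = 25, 10   # per-frame analysis window and hop (ms)
window_n, shift_n = 7, 4       # frames per decoding chunk and chunk advance (assumed meaning)

frame_len = sample_rate * window_ms // 1000        # 400 samples per frame
frame_hop = sample_rate * shift_ms // 1000         # 160 samples between frames
chunk_ms = window_ms + (window_n - 1) * shift_ms   # 85 ms of audio per decoding chunk
chunk_hop_ms = shift_n * shift_ms                  # the chunk advances 40 ms at a time
print(frame_len, frame_hop, chunk_ms, chunk_hop_ms)
```
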
diff --git a/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c23680bd59d5286ea0854efd46a7479485784f27
--- /dev/null
+++ b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports online engine types.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
diff --git a/demos/audio_content_search/run.sh b/demos/audio_content_search/run.sh
new file mode 100755
index 0000000000000000000000000000000000000000..e322a37c5fcb98f1d5410f736e69646414af5f0f
--- /dev/null
+++ b/demos/audio_content_search/run.sh
@@ -0,0 +1,7 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# start the streaming asr server first (the acs engine connects to it)
+nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
+
+# start the acs server
+nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
+
diff --git a/demos/audio_searching/README.md b/demos/audio_searching/README.md
index 87a1956b9fd22b1d5f71d33794839e5d2817d5c1..e829d991aa9863259d20b07c9dc6af664eb8dc27 100644
--- a/demos/audio_searching/README.md
+++ b/demos/audio_searching/README.md
@@ -167,8 +167,8 @@ Then to start the system server, and it provides HTTP backend services.
[2022-03-26 22:54:08,633] [ INFO] - embedding size: (192,)
Extracting feature from audio No. 2 , 20 audios in total
...
- 2022-03-26 22:54:15,892 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
- 2022-03-26 22:54:15,908 | INFO | main.py | count_audio | 148 | Successfully count the number of data!
+ 2022-03-26 22:54:15,892 | INFO | audio_search.py | load_audios | 85 | Successfully loaded data, total count: 20
+ 2022-03-26 22:54:15,908 | INFO | audio_search.py | count_audio | 148 | Successfully count the number of data!
[2022-03-26 22:54:15,916] [ INFO] - checking the aduio file format......
[2022-03-26 22:54:15,916] [ INFO] - The sample rate is 16000
[2022-03-26 22:54:15,916] [ INFO] - The audio file format is right
@@ -183,12 +183,12 @@ Then to start the system server, and it provides HTTP backend services.
[2022-03-26 22:54:15,924] [ INFO] - feats shape:[1, 80, 53], lengths shape: [1]
[2022-03-26 22:54:16,051] [ INFO] - embedding size: (192,)
...
- 2022-03-26 22:54:16,086 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
+ 2022-03-26 22:54:16,086 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
...
- 2022-03-26 22:54:16,088 | INFO | main.py | search_local_audio | 136 | Successfully searched similar audio!
- 2022-03-26 22:54:17,164 | INFO | main.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
+ 2022-03-26 22:54:16,088 | INFO | audio_search.py | search_local_audio | 136 | Successfully searched similar audio!
+ 2022-03-26 22:54:17,164 | INFO | audio_search.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
```
- GUI test (Optional)
diff --git a/demos/audio_searching/README_cn.md b/demos/audio_searching/README_cn.md
index a93dbdc1f4585c35a86121b8a2629f7854cbed46..c13742af7a1613a089e1e14c069ec7a3340dd669 100644
--- a/demos/audio_searching/README_cn.md
+++ b/demos/audio_searching/README_cn.md
@@ -169,8 +169,8 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
[2022-03-26 22:54:08,633] [ INFO] - embedding size: (192,)
Extracting feature from audio No. 2 , 20 audios in total
...
- 2022-03-26 22:54:15,892 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
- 2022-03-26 22:54:15,908 | INFO | main.py | count_audio | 148 | Successfully count the number of data!
+ 2022-03-26 22:54:15,892 | INFO | audio_search.py | load_audios | 85 | Successfully loaded data, total count: 20
+ 2022-03-26 22:54:15,908 | INFO | audio_search.py | count_audio | 148 | Successfully count the number of data!
[2022-03-26 22:54:15,916] [ INFO] - checking the aduio file format......
[2022-03-26 22:54:15,916] [ INFO] - The sample rate is 16000
[2022-03-26 22:54:15,916] [ INFO] - The audio file format is right
@@ -185,12 +185,12 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
[2022-03-26 22:54:15,924] [ INFO] - feats shape:[1, 80, 53], lengths shape: [1]
[2022-03-26 22:54:16,051] [ INFO] - embedding size: (192,)
...
- 2022-03-26 22:54:16,086 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
- 2022-03-26 22:54:16,087 | INFO | main.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
+ 2022-03-26 22:54:16,086 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/test.wav, score 100.0
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, score 29.182177782058716
+ 2022-03-26 22:54:16,087 | INFO | audio_search.py | search_local_audio | 132 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_body.wav, score 22.73637056350708
...
- 2022-03-26 22:54:16,088 | INFO | main.py | search_local_audio | 136 | Successfully searched similar audio!
- 2022-03-26 22:54:17,164 | INFO | main.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
+ 2022-03-26 22:54:16,088 | INFO | audio_search.py | search_local_audio | 136 | Successfully searched similar audio!
+ 2022-03-26 22:54:17,164 | INFO | audio_search.py | drop_tables | 160 | Successfully drop tables in Milvus and MySQL!
```
- 前端测试(可选)
diff --git a/demos/audio_searching/src/operations/load.py b/demos/audio_searching/src/operations/load.py
index 0d9edb7846198f4e0e9111b642c0c55d6ff2dbb9..d1ea00576ec4fff72e9d1554ca2514e284ec9169 100644
--- a/demos/audio_searching/src/operations/load.py
+++ b/demos/audio_searching/src/operations/load.py
@@ -26,8 +26,9 @@ def get_audios(path):
"""
supported_formats = [".wav", ".mp3", ".ogg", ".flac", ".m4a"]
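+    # Recursively walk `path` and keep only the files whose extension is in supported_formats.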
return [
- item for sublist in [[os.path.join(dir, file) for file in files]
- for dir, _, files in list(os.walk(path))]
+ item
+ for sublist in [[os.path.join(dir, file) for file in files]
+ for dir, _, files in list(os.walk(path))]
for item in sublist if os.path.splitext(item)[1] in supported_formats
]
diff --git a/demos/audio_searching/src/test_vpr_search.py b/demos/audio_searching/src/test_vpr_search.py
index 8cc8dc8412e76d92ac04219c52cb940643df62b9..298e12ebaf2b4408f67df3e9fe16f6fd59cb6219 100644
--- a/demos/audio_searching/src/test_vpr_search.py
+++ b/demos/audio_searching/src/test_vpr_search.py
@@ -73,7 +73,9 @@ def test_data(spk: str):
"""
Get the audio file by spk_id in MySQL
"""
- response = client.get("/vpr/data?spk_id=" + spk)
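+    # spk_id is now sent in the request's JSON body, matching the updated /vpr/data handler in vpr_search.py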
+ response = client.get(
+ "/vpr/data",
+ json={"spk_id": spk}, )
assert response.status_code == 200
@@ -81,7 +83,9 @@ def test_del(spk: str):
"""
Delete the record in MySQL by spk_id
"""
- response = client.post("/vpr/del?spk_id=" + spk)
+ response = client.post(
+ "/vpr/del",
+ json={"spk_id": spk}, )
assert response.status_code == 200
diff --git a/demos/audio_searching/src/vpr_search.py b/demos/audio_searching/src/vpr_search.py
index 8e702221c8bc23213533654aa7d6545e91f5631b..2780dfb3bf2b8630bd1b2b5975f42d01056e1133 100644
--- a/demos/audio_searching/src/vpr_search.py
+++ b/demos/audio_searching/src/vpr_search.py
@@ -17,6 +17,7 @@ import uvicorn
from config import UPLOAD_PATH
from fastapi import FastAPI
from fastapi import File
+from fastapi import Form
from fastapi import UploadFile
from logs import LOGGER
from mysql_helpers import MySQLHelper
@@ -49,10 +50,12 @@ if not os.path.exists(UPLOAD_PATH):
@app.post('/vpr/enroll')
async def vpr_enroll(table_name: str=None,
- spk_id: str=None,
+ spk_id: str=Form(...),
audio: UploadFile=File(...)):
# Enroll the uploaded audio with spk-id into MySQL
try:
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
# Save the upload data to server.
content = await audio.read()
audio_path = os.path.join(UPLOAD_PATH, audio.filename)
@@ -63,7 +66,7 @@ async def vpr_enroll(table_name: str=None,
return {'status': True, 'msg': "Successfully enroll data!"}
except Exception as e:
LOGGER.error(e)
- return {'status': False, 'msg': e}, 400
+ return {'status': False, 'msg': e}
@app.post('/vpr/enroll/local')
@@ -128,9 +131,12 @@ async def vpr_recog_local(request: Request,
@app.post('/vpr/del')
-async def vpr_del(table_name: str=None, spk_id: str=None):
+async def vpr_del(table_name: str=None, spk_id: dict=None):
# Delete a record by spk_id in MySQL
try:
+ spk_id = spk_id['spk_id']
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
do_delete(table_name, spk_id, MYSQL_CLI)
LOGGER.info("Successfully delete a record by spk_id in MySQL")
return {'status': True, 'msg': "Successfully delete data!"}
@@ -156,9 +162,12 @@ async def vpr_list(table_name: str=None):
@app.get('/vpr/data')
async def vpr_data(
table_name: str=None,
- spk_id: str=None, ):
+ spk_id: dict=None, ):
# Get the audio file from path by spk_id in MySQL
try:
+ spk_id = spk_id['spk_id']
+ if not spk_id:
+ return {'status': False, 'msg': "spk_id can not be None"}
audio_path = do_get(table_name, spk_id, MYSQL_CLI)
LOGGER.info(f"Successfully get audio path {audio_path}!")
return FileResponse(audio_path)
diff --git a/demos/custom_streaming_asr/README.md b/demos/custom_streaming_asr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa28d502f9da451b0279c224523160ad22f0b97a
--- /dev/null
+++ b/demos/custom_streaming_asr/README.md
@@ -0,0 +1,65 @@
+([简体中文](./README_cn.md)|English)
+
+# Customized Auto Speech Recognition
+
+## Introduction
+In some cases, we need to recognize specific rare words with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve those issues.
+
+This demo is customized for taxi expense reimbursement, which needs to recognize rare addresses.
+
+* G with slot: 打车到 "address_slot"。
+
+
+* This is the address slot WFST; you can add the addresses you want to recognize.
+
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
+
+
+## Usage
+### 1. Installation
+Install the paddle:2.2.2 docker image.
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. demo
+* Run `websocket_server.sh`. This script downloads the resources and libs, and launches the service.
+```
+cd /paddle
+bash websocket_server.sh
+```
+This script runs in two steps:
+1. Download resource.tar.gz; after extraction, the following directories can be found under the resource directory:
+model: acoustic model
+graph: the decoder graph (TLG.fst)
+lib: dependent libraries
+bin: binaries
+data: audio and wav.scp
+
+2. websocket_server_main launches the service.
+Some parameters:
+port: the service port
+graph_path: the decoder graph path
+model_path: acoustic model path
+For the other parameters, please refer to these files:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* In other terminal, run script websocket_client.sh, the client will send data and get the results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the server's service port.
+
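+A minimal sketch of invoking the client binary directly (these are the same flags used in `websocket_client.sh`; adjust the wav list and port to match your setup):
+```
+. path.sh
+export GLOG_logtostderr=1
+websocket_client_main \
+    --wav_rspecifier=scp:$PWD/data/wav.scp \
+    --streaming_chunk=0.36 \
+    --port=8881
+```
+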
+* Result:
+In the client log, you will see a message like the one below:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
\ No newline at end of file
diff --git a/demos/custom_streaming_asr/README_cn.md b/demos/custom_streaming_asr/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffbf682fb362394289083658364cb4bc0616682a
--- /dev/null
+++ b/demos/custom_streaming_asr/README_cn.md
@@ -0,0 +1,63 @@
+(简体中文|[English](./README.md))
+
+# 定制化语音识别演示
+## 介绍
+在一些场景中,识别系统需要高精度的识别一些稀有词,例如导航软件中地名识别。而通过定制化识别可以满足这一需求。
+
+这个 demo 是打车报销单的场景识别,需要识别一些稀有的地名,可以通过如下操作实现。
+
+* G with slot: 打车到 "address_slot"。
+
+
+* 这是 address slot wfst, 可以添加一些需要识别的地名.
+
+
+* 通过 replace 操作, G = fstreplace(G_with_slot, address_slot), 最终可以得到定制化的解码图。
+
+
+## 使用方法
+### 1. 配置环境
+安装paddle:2.2.2 docker镜像。
+```
+sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. 演示
+* 运行如下命令,完成相关资源和库的下载和服务启动。
+```
+cd /paddle
+bash websocket_server.sh
+```
+上面脚本完成了如下两个功能:
+1. 完成 resource.tar.gz 下载,解压后,会在 resource 中发现如下目录:
+model: 声学模型
+graph: 解码构图
+lib: 相关库
+bin: 运行程序
+data: 语音数据
+
+2. 通过 websocket_server_main 来启动服务。
+这里简单的介绍几个参数:
+port 是服务端口,
+graph_path 用来指定解码图文件,
+其他参数说明可参见代码:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* 在另一个终端中, 通过 client 发送数据,得到结果。运行如下命令:
+```
+bash websocket_client.sh
+```
+通过 websocket_client_main 来启动 client 服务,其中 wav_scp 是发送的语音句子集合,port 为服务端口。
+
+* 结果:
+client 的 log 中可以看到如下类似的结果
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
diff --git a/demos/custom_streaming_asr/path.sh b/demos/custom_streaming_asr/path.sh
new file mode 100644
index 0000000000000000000000000000000000000000..47462324d739e7cc5dbd16097d5ca5b5cbdacbf3
--- /dev/null
+++ b/demos/custom_streaming_asr/path.sh
@@ -0,0 +1,2 @@
+export LD_LIBRARY_PATH=$PWD/resource/lib
+export PATH=$PATH:$PWD/resource/bin
diff --git a/demos/custom_streaming_asr/setup_docker.sh b/demos/custom_streaming_asr/setup_docker.sh
new file mode 100644
index 0000000000000000000000000000000000000000..329a75db0ef34c8cb4e3a54d9663f027d1919a14
--- /dev/null
+++ b/demos/custom_streaming_asr/setup_docker.sh
@@ -0,0 +1 @@
+sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
diff --git a/demos/custom_streaming_asr/websocket_client.sh b/demos/custom_streaming_asr/websocket_client.sh
new file mode 100755
index 0000000000000000000000000000000000000000..ede076cafa2529c89bc79dee211a8cf962cf960d
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_client.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+# input data directory
+data=$PWD/data
+
+# wav list file under $data
+wav_scp=wav.scp
+
+export GLOG_logtostderr=1
+
+# websocket client
+websocket_client_main \
+ --wav_rspecifier=scp:$data/$wav_scp \
+ --streaming_chunk=0.36 \
+ --port=8881
diff --git a/demos/custom_streaming_asr/websocket_server.sh b/demos/custom_streaming_asr/websocket_server.sh
new file mode 100755
index 0000000000000000000000000000000000000000..041c345be79722c882d50e828f2a2438c0eb9a24
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_server.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+set +x
+set -e
+
+export GLOG_logtostderr=1
+
+. path.sh
+#test websocket server
+
+model_dir=./resource/model
+graph_dir=./resource/graph
+cmvn=./data/cmvn.ark
+
+
+#paddle_asr_online/resource.tar.gz
+if [ ! -f $cmvn ]; then
+ wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
+ tar xzfv resource.tar.gz
+ ln -s ./resource/data .
+fi
+
+websocket_server_main \
+ --cmvn_file=$cmvn \
+ --streaming_chunk=0.1 \
+ --use_fbank=true \
+ --model_path=$model_dir/avg_10.jit.pdmodel \
+ --param_path=$model_dir/avg_10.jit.pdiparams \
+ --model_cache_shapes="5-1-2048,5-1-2048" \
+ --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+ --word_symbol_table=$graph_dir/words.txt \
+ --graph_path=$graph_dir/TLG.fst --max_active=7500 \
+ --port=8881 \
+ --acoustic_scale=12
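+
+# The server now listens on --port (8881 above). From another terminal, run
+# websocket_client.sh to stream data/wav.scp to it and print the recognition result.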
diff --git a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md
index b79f3f7a1660bda40695147b1177f512055f2702..b6a1d9bcc26058c2789f82444b2aa9eced26e0d0 100644
--- a/demos/speaker_verification/README.md
+++ b/demos/speaker_verification/README.md
@@ -14,7 +14,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
You can choose one way from easy, meduim and hard to install paddlespeech.
### 2. Prepare Input File
-The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
+The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
diff --git a/demos/speaker_verification/README_cn.md b/demos/speaker_verification/README_cn.md
index db382f298df74c73ef5fcbd5a3fb64fb2fa1c44f..90bba38acf2d176092d224c5c1112418bbac353a 100644
--- a/demos/speaker_verification/README_cn.md
+++ b/demos/speaker_verification/README_cn.md
@@ -4,16 +4,16 @@
## 介绍
声纹识别是一项用计算机程序自动提取说话人特征的技术。
-这个 demo 是一个从给定音频文件提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
+这个 demo 是从一个给定音频文件中提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。
## 使用方法
### 1. 安装
请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。
-你可以从 easy,medium,hard 三中方式中选择一种方式安装。
+你可以从 easy,medium,hard 三种方式中选择一种方式安装。
### 2. 准备输入
-这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+声纹 cli demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
可以下载此 demo 的示例音频:
```bash
diff --git a/demos/speech_recognition/README.md b/demos/speech_recognition/README.md
index 636548801b40a1485c28d77ca97a3c87265b95a7..6493e8e613800ea163b8669842c93a7dd82d68ac 100644
--- a/demos/speech_recognition/README.md
+++ b/demos/speech_recognition/README.md
@@ -24,13 +24,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Command Line(Recommended)
```bash
# Chinese
- paddlespeech asr --input ./zh.wav
+ paddlespeech asr --input ./zh.wav -v
# English
- paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+ paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Chinese ASR + Punctuation Restoration
- paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+ paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
- (It doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.)
+ (If you don't want to see the log information, you can remove `-v`. Besides, it doesn't matter if the package `paddlespeech-ctcdecoders` is not found; this package is optional.)
Usage:
```bash
@@ -45,6 +45,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
- `yes`: No additional parameters required. Once set this parameter, it means accepting the request of the program by default, which includes transforming the audio sample rate. Default: `False`.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
+ - `verbose`: Show the log information.
Output:
```bash
@@ -84,8 +85,12 @@ Here is a list of pretrained models released by PaddleSpeech that can be used by
| Model | Language | Sample Rate
| :--- | :---: | :---: |
-| conformer_wenetspeech| zh| 16k
-| transformer_librispeech| en| 16k
+| conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
+| transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
-|deepspeech2offline_librispeech|en| 16k
+| deepspeech2offline_librispeech | en | 16k
diff --git a/demos/speech_recognition/README_cn.md b/demos/speech_recognition/README_cn.md
index 8033dbd8130e5f282bceae286f6df7662f1deff8..8d631d89ca1d61196cbf167b3f263cfd478fb571 100644
--- a/demos/speech_recognition/README_cn.md
+++ b/demos/speech_recognition/README_cn.md
@@ -22,13 +22,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- 命令行 (推荐使用)
```bash
# 中文
- paddlespeech asr --input ./zh.wav
+ paddlespeech asr --input ./zh.wav -v
# 英文
- paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+ paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# 中文 + 标点恢复
- paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+ paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
- (如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)
+ (如果不想显示 log 信息,可以不使用 `-v` 参数。另外,如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)
使用方法:
```bash
@@ -43,6 +43,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
- `yes`;不需要设置额外的参数,一旦设置了该参数,说明你默认同意程序的所有请求,其中包括自动转换输入音频的采样率。默认值:`False`。
- `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。
+ - `verbose`: 如果使用,显示 logger 信息。
输出:
```bash
@@ -82,7 +83,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
| 模型 | 语言 | 采样率
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
+| conformer_online_multicn | zh | 16k
+| conformer_aishell | zh | 16k
+| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
+| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md
index 0323d3983ab58f40285f81f135dedf2f9f019b7e..a03a43dffa6464e2c517e4bac9c1af58fe0dd2d6 100644
--- a/demos/speech_server/README.md
+++ b/demos/speech_server/README.md
@@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
-It is recommended to use **paddlepaddle 2.2.1** or above.
+It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
@@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `_`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference)
+**Note:** If the service starts normally inside a container but the client cannot reach the access IP, try replacing the `host` address in the configuration file with the local IP address.
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
@@ -83,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. ASR Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
@@ -131,6 +135,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address
+
```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
@@ -191,6 +198,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 6. CLS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
@@ -235,6 +245,172 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
+### 7. Speaker Verification Client Usage
+
+#### 7.1 Extract speaker embedding
+**Note:** The response time will be slightly longer when using the client for the first time
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+ ```
+
+ Usage:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ Arguments:
+ * server_ip: server ip. Default: 127.0.0.1
+ * port: server port. Default: 8090
+ * input(required): the audio file to extract the speaker embedding from.
+ * task: the task of vector, either 'spk' or 'score'. Default: 'spk'.
+ * enroll: enroll audio file (used by the 'score' task).
+ * test: test audio file (used by the 'score' task).
+
+ Output:
+
+ ```bash
+ [2022-05-08 00:18:44,249] [ INFO] - vector http client start
+ [2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav
+ [2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector
+ [2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector
+ [2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 
0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ [2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input="85236145389.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="spk")
+ print(res)
+ ```
+
+ Output:
+
+ ``` bash
+ {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, 
-1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ ```
+
+#### 7.2 Get the score between two speaker audio embeddings
+
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+ ```
+
+ Usage:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ Arguments:
+ * server_ip: server ip. Default: 127.0.0.1
+ * port: server port. Default: 8090
+ * input: not used for the 'score' task; specify the audio files via 'enroll' and 'test' instead.
+ * task: the task of vector, either 'spk' or 'score'. To get the similarity score, it must be set to 'score'.
+ * enroll: enroll audio file.
+ * test: test audio file.
+
+ Output:
+
+ ``` bash
+ [2022-05-09 10:28:40,556] [ INFO] - vector score http client start
+ [2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ [2022-05-09 10:28:40,731] [ INFO] - The vector: None
+ [2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input=None,
+ enroll_audio="85236145389.wav",
+ test_audio="123456789.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="score")
+ print(res)
+ ```
+
+ Output:
+
+ ``` bash
+ [2022-05-09 10:34:54,769] [ INFO] - vector score http client start
+ [2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ ```
+
+### 8. Punctuation prediction
+
+**Note:** The response time will be slightly longer when using the client for the first time
+
+- Command Line (Recommended)
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ``` bash
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ Usage:
+
+ ```bash
+ paddlespeech_client text --help
+ ```
+ Arguments:
+ - `server_ip`: server ip. Default: 127.0.0.1
+ - `port`: server port. Default: 8090
+ - `input`(required): Input text to get punctuation.
+
+ Output:
+ ```bash
+ [2022-05-09 18:19:04,397] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-09 18:19:04,397] [ INFO] - Response time 0.092407 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8090,)
+ print(res)
+
+ ```
+
+ Output:
+ ```bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
+
+
## Models supported by the service
### ASR model
Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for paddle inference inference.
@@ -244,3 +420,9 @@ Get all models supported by the TTS service via `paddlespeech_server stats --tas
### CLS model
Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for paddle inference inference.
+
+### Vector model
+Get all models supported by the Vector service via `paddlespeech_server stats --task vector`.
+
+### Text model
+Get all models supported by the Text service via `paddlespeech_server stats --task text`.
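+
+To list the supported models for every task in one go, here is a minimal sketch that only reuses the `stats` subcommand shown above:
+
+```bash
+for task in asr tts cls vector text; do
+    paddlespeech_server stats --task $task
+done
+```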
diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md
index 4a7c7447e0c0bf897fc272069a9e474f82836181..4895b182b7ae401da9e3030662d55bbd6b874818 100644
--- a/demos/speech_server/README_cn.md
+++ b/demos/speech_server/README_cn.md
@@ -1,29 +1,30 @@
-([简体中文](./README_cn.md)|English)
+(简体中文|[English](./README.md))
# 语音服务
## 介绍
-这个demo是一个启动语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
+这个 demo 是一个启动离线语音服务和访问服务的实现。它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
-推荐使用 **paddlepaddle 2.2.1** 或以上版本。
-你可以从 medium,hard 三中方式中选择一种方式安装 PaddleSpeech。
+推荐使用 **paddlepaddle 2.2.2** 或以上版本。
+你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
配置文件可参见 `conf/application.yaml` 。
其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
-目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)以及cls(音频分类)。
+目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)、cls(音频分类)、vector(声纹识别)以及text(文本处理)。
目前引擎类型支持两种形式:python 及 inference (Paddle Inference)
+**注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。
-这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
+ASR client 的输入是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
-可以下载此 ASR client的示例音频:
+可以下载此 ASR client 的示例音频:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
@@ -83,31 +84,34 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. ASR 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
- ```
- paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
- 使用帮助:
-
- ```bash
- paddlespeech_client asr --help
- ```
+ ```
+ paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- 参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- - `port`: 服务端口,默认: 8090。
- - `input`(必须输入): 用于识别的音频文件。
- - `sample_rate`: 音频采样率,默认值:16000。
- - `lang`: 模型语言,默认值:zh_cn。
- - `audio_format`: 音频格式,默认值:wav。
+ ```
- 输出:
+ 使用帮助:
- ```bash
- [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
- [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
- ```
+ ```bash
+ paddlespeech_client asr --help
+ ```
+
+ 参数:
+ - `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 用于识别的音频文件。
+ - `sample_rate`: 音频采样率,默认值:16000。
+ - `lang`: 模型语言,默认值:zh_cn。
+ - `audio_format`: 音频格式,默认值:wav。
+
+ 输出:
+
+ ```bash
+ [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
+ [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
+ ```
- Python API
```python
@@ -134,33 +138,35 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
-
- ```bash
- paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
- ```
- 使用帮助:
- ```bash
- paddlespeech_client tts --help
- ```
-
- 参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- - `port`: 服务端口,默认: 8090。
- - `input`(必须输入): 待合成的文本。
- - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
- - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0
- - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
-
- 输出:
- ```bash
- [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
- [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
- [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
- [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
- ```
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ```bash
+ paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+ ```
+ 使用帮助:
+
+ ```bash
+ paddlespeech_client tts --help
+ ```
+
+ 参数:
+ - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 待合成的文本。
+ - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
+ - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
+ - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
+ - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0
+ - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
+
+ 输出:
+ ```bash
+ [2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
+ [2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
+ [2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
+ [2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
+ ```
- Python API
```python
@@ -192,12 +198,17 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
- ### 6. CLS 客户端使用方法
- **注意:** 初次使用客户端时响应时间会略长
- - 命令行 (推荐使用)
- ```
- paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+### 6. CLS 客户端使用方法
+
+**注意:** 初次使用客户端时响应时间会略长
+
+- 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ```
+ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
使用帮助:
@@ -205,7 +216,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech_client cls --help
```
参数:
- - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于分类的音频文件。
- `topk`: 分类结果的topk。
@@ -239,13 +250,180 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
```
+### 7. 声纹客户端使用方法
+
+#### 7.1 提取声纹特征
+注意: 初次使用客户端时响应时间会略长
+* 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
+ ```
+
+ 使用帮助:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+ 参数:
+ * server_ip: 服务端ip地址,默认: 127.0.0.1。
+ * port: 服务端口,默认: 8090。
+ * input(必须输入): 用于识别的音频文件。
+ * task: vector 的任务,可选 spk 或者 score。默认是 spk。
+ * enroll: 注册音频。
+ * test: 测试音频。
+ 输出:
+
+ ``` bash
+ [2022-05-08 00:18:44,249] [ INFO] - vector http client start
+ [2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav
+ [2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector
+ [2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector
+ [2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 
0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ [2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input="85236145389.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="spk")
+ print(res)
+ ```
+
+ 输出:
+
+ ``` bash
+ {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, 
-1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}}
+ ```
+
+#### 7.2 音频声纹打分
+
+注意: 初次使用客户端时响应时间会略长
+* 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
+ ```
+
+ 使用帮助:
+
+ ``` bash
+ paddlespeech_client vector --help
+ ```
+
+ 参数:
+ * server_ip: 服务端ip地址,默认: 127.0.0.1。
+ * port: 服务端口,默认: 8090。
+ * input: score 任务不使用该参数,请通过 enroll 和 test 指定音频。
+ * task: vector 的任务,可选 spk 或者 score。若要计算打分,需设置为 score。
+ * enroll: 注册音频。
+ * test: 测试音频。
+
+ 输出:
+
+ ``` bash
+ [2022-05-09 10:28:40,556] [ INFO] - vector score http client start
+ [2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score
+ [2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ [2022-05-09 10:28:40,731] [ INFO] - The vector: None
+ [2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s.
+ ```
+
+* Python API
+
+ ``` python
+ from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
+
+ vectorclient_executor = VectorClientExecutor()
+ res = vectorclient_executor(
+ input=None,
+ enroll_audio="85236145389.wav",
+ test_audio="123456789.wav",
+ server_ip="127.0.0.1",
+ port=8090,
+ task="score")
+ print(res)
+ ```
+
+ 输出:
+
+ ``` bash
+ [2022-05-09 10:34:54,769] [ INFO] - vector score http client start
+ [2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
+ [2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8590/paddlespeech/vector/score
+ [2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}}
+ ```
+
+
+### 8. 标点预测
+
+ **注意:** 初次使用客户端时响应时间会略长
+- 命令行 (推荐使用)
+
+ 若 `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
+
+ ``` bash
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ 使用帮助:
+
+ ```bash
+ paddlespeech_client text --help
+ ```
+ 参数:
+ - `server_ip`: 服务端ip地址,默认: 127.0.0.1。
+ - `port`: 服务端口,默认: 8090。
+ - `input`(必须输入): 用于标点预测的文本内容。
+
+ 输出:
+ ```bash
+ [2022-05-09 18:19:04,397] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-09 18:19:04,397] [ INFO] - Response time 0.092407 s.
+ ```
+
+- Python API
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8090,)
+ print(res)
+
+ ```
+
+ 输出:
+ ```bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
## 服务支持的模型
-### ASR支持的模型
-通过 `paddlespeech_server stats --task asr` 获取ASR服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### ASR 支持的模型
+通过 `paddlespeech_server stats --task asr` 获取 ASR 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+
+### TTS 支持的模型
+通过 `paddlespeech_server stats --task tts` 获取 TTS 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+
+### CLS 支持的模型
+通过 `paddlespeech_server stats --task cls` 获取 CLS 服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
-### TTS支持的模型
-通过 `paddlespeech_server stats --task tts` 获取TTS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### Vector 支持的模型
+通过 `paddlespeech_server stats --task vector` 获取 Vector 服务支持的所有模型。
-### CLS支持的模型
-通过 `paddlespeech_server stats --task cls` 获取CLS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
+### Text 支持的模型
+通过 `paddlespeech_server stats --task text` 获取 Text 服务支持的所有模型。
diff --git a/demos/speech_server/asr_client.sh b/demos/speech_server/asr_client.sh
index afe2f82181aeab08194963d126f7621bc59b8b63..37a7ab0b02e8afd6bb7d412314e804c56a2ac254 100644
--- a/demos/speech_server/asr_client.sh
+++ b/demos/speech_server/asr_client.sh
@@ -1,4 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
diff --git a/demos/speech_server/cls_client.sh b/demos/speech_server/cls_client.sh
index 5797aa204f6ba2cb260440e8709d7905134ddf53..67012648c7ec9ce3be6aa5f4da234116864fb503 100644
--- a/demos/speech_server/cls_client.sh
+++ b/demos/speech_server/cls_client.sh
@@ -1,4 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
diff --git a/demos/speech_server/conf/application.yaml b/demos/speech_server/conf/application.yaml
index 2b1a05998083e08377d63ee02bc77323a7c4dce5..c6588ce802caa2419425fd5b94170a1e75d16568 100644
--- a/demos/speech_server/conf/application.yaml
+++ b/demos/speech_server/conf/application.yaml
@@ -1,15 +1,15 @@
-# This is the parameter configuration file for PaddleSpeech Serving.
+# This is the parameter configuration file for PaddleSpeech Offline Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
-host: 127.0.0.1
+host: 0.0.0.0
port: 8090
# The task format in the engin_list is: _
-# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference']
-
-engine_list: ['asr_python', 'tts_python', 'cls_python']
+# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
+protocol: 'http'
+engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']
#################################################################################
@@ -135,3 +135,26 @@ cls_inference:
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+ task: punc
+ model_type: 'ernie_linear_p3_wudao'
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ vocab_file: # [optional]
+ device: # set 'gpu:id' or 'cpu'
+
+
+################################### Vector ######################################
+################### Vector task: spk; engine_type: python #######################
+vector_python:
+ task: spk
+ model_type: 'ecapatdnn_voxceleb12'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ device: # set 'gpu:id' or 'cpu'
diff --git a/demos/speech_server/tts_client.sh b/demos/speech_server/tts_client.sh
index a756dfd3ef555f0b74e845d1b7754bed1d826e19..a443a0a94a6a6e19f0a0cf40708ebca3e8137624 100644
--- a/demos/speech_server/tts_client.sh
+++ b/demos/speech_server/tts_client.sh
@@ -1,3 +1,4 @@
#!/bin/bash
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
diff --git a/demos/streaming_asr_server/README.md b/demos/streaming_asr_server/README.md
index 0eed8e5615f5185af884e372bf25d27b09a93936..4824da6281bc883f393dc16c9e43ba38c6bdcf6e 100644
--- a/demos/streaming_asr_server/README.md
+++ b/demos/streaming_asr_server/README.md
@@ -1,10 +1,11 @@
([简体中文](./README_cn.md)|English)
-# Speech Server
+# Streaming ASR Server
## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
+The streaming ASR server only supports the `websocket` protocol; it does not support the `http` protocol.
## Usage
### 1. Installation
@@ -14,7 +15,7 @@ It is recommended to use **paddlepaddle 2.2.1** or above.
 You can choose one way from medium and hard to install paddlespeech.
### 2. Prepare config File
-The configuration file can be found in `conf/ws_application.yaml` 和 `conf/ws_conformer_application.yaml`.
+The configuration file can be found in `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
 At present, the models integrated into the service include: DeepSpeech2 and conformer.
@@ -28,10 +29,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
### 3. Server Usage
- Command Line (Recommended)
-
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
```bash
- # start the service
- paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
+  # start the service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
```
Usage:
@@ -40,156 +41,82 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
paddlespeech_server start --help
```
Arguments:
- - `config_file`: yaml file of the app, defalut: ./conf/ws_conformer_application.yaml
- - `log_file`: log file. Default: ./log/paddlespeech.log
+  - `config_file`: yaml file of the app, default: `./conf/application.yaml`
+ - `log_file`: log file. Default: `./log/paddlespeech.log`
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
```python
+ # in PaddleSpeech/demos/streaming_asr_server directory
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
- config_file="./conf/ws_conformer_application.yaml",
+ config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
+
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
- ```
- paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
- ```
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+ ```
Usage:
@@ -203,81 +130,86 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
  - `sample_rate`: Audio sampling rate, default: 16000.
- `lang`: Language. Default: "zh_cn".
- `audio_format`: Audio format. Default: "wav".
+  - `punc.server_ip`: punctuation server IP address. Default: None.
+ - `punc.server_port`: punctuation server port. Default: None.
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
- [2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
+ [2022-05-06 21:10:35,598] [ INFO] - Start to do streaming asr client
+ [2022-05-06 21:10:35,600] [ INFO] - asr websocket client start
+ [2022-05-06 21:10:35,600] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:10:35,600] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-06 21:10:35,670] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:10:35,699] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,726] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,738] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,750] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,762] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,774] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,786] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,387] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,398] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,407] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,416] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,425] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,434] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,442] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,930] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,938] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,946] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,954] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,962] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,970] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,977] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,985] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:37,484] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,492] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,500] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,508] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,517] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,525] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,532] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:38,050] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,058] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,066] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,073] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,081] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,089] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,097] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,105] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,630] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,639] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,647] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,655] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,663] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,671] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,679] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:39,216] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,224] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,232] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,240] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,248] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,256] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,264] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,272] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,885] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,896] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,905] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,915] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,924] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,934] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:44,827] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:10:44,827] [ INFO] - audio duration: 4.9968125, elapsed time: 9.225094079971313, RTF=1.846195765794957
+ [2022-05-06 21:10:44,828] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
```
- Python API
```python
- from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
- import json
+ from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
- asrclient_executor = ASRClientExecutor()
+ asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
@@ -285,71 +217,359 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
- print(res.json())
+ print(res)
```
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
+ [2022-05-06 21:14:03,137] [ INFO] - asr websocket client start
+ [2022-05-06 21:14:03,137] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:14:03,149] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:14:03,167] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,181] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,194] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,207] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,219] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,230] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,241] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,252] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,768] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,776] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,784] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,792] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,800] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,807] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,815] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:04,301] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,309] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,317] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,325] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,333] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,341] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,349] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,356] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,855] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,864] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,871] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,879] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,887] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,894] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,902] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:05,418] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,426] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,434] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,442] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,449] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,457] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,465] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,473] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,996] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,006] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,029] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,037] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,045] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,581] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,589] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,597] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,605] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,613] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,621] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,628] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,636] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:07,188] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,196] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,203] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,211] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,219] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,226] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:12,158] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:14:12,159] [ INFO] - audio duration: 4.9968125, elapsed time: 9.019973039627075, RTF=1.8051453881103354
+ [2022-05-06 21:14:12,160] [ INFO] - asr websocket client finished
+ ```
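+
+- Raw websocket protocol (optional sketch)
+
+  Since the streaming service only speaks the `websocket` protocol, a custom client has to open a websocket connection and stream the audio itself. The sketch below uses the third-party `websockets` package; the handshake fields (`name`, `signal`, `nbest`), the chunk size, and the one-reply-per-chunk behaviour are assumptions modelled on the demo clients above, so prefer `paddlespeech_client asr_online` or `websocket_client.py` for real use.
+
+  ```python
+  # ws_client_sketch.py: illustrative only; the message fields below are assumptions
+  import asyncio
+  import json
+  import wave
+
+  import websockets  # third-party package: pip install websockets
+
+
+  async def run(url="ws://127.0.0.1:8090/paddlespeech/asr/streaming",
+                wav_path="./zh.wav"):
+      async with websockets.connect(url) as ws:
+          # assumed start handshake
+          await ws.send(json.dumps({"name": wav_path, "signal": "start", "nbest": 1}))
+          print(await ws.recv())  # e.g. {"status": "ok", "signal": "server_ready"}
+
+          with wave.open(wav_path, "rb") as f:
+              chunk = f.readframes(1600)  # ~0.1 s of 16 kHz, 16-bit mono audio
+              while chunk:
+                  await ws.send(chunk)    # raw PCM bytes
+                  print(await ws.recv())  # partial recognition result
+                  chunk = f.readframes(1600)
+
+          # assumed end handshake; the final message carries the full result
+          await ws.send(json.dumps({"name": wav_path, "signal": "end", "nbest": 1}))
+          print(await ws.recv())
+
+
+  asyncio.run(run())
+  ```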
+
+
+## Punctuation service
+
+### 1. Server usage
+
+- Command Line
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
+ ``` bash
+  # launch the punctuation service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file conf/punc_application.yaml
+ ```
+
+
+ Usage:
+ ```bash
+ paddlespeech_server start --help
+ ```
+
+ Arguments:
+ - `config_file`: configuration file.
+ - `log_file`: log file.
+
+
+ Output:
+ ``` bash
+ [2022-05-02 17:59:26,285] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 17:59:26,285] [ INFO] - Init the text engine
+ [2022-05-02 17:59:26,285] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 17:59:26,286] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 17:59:30,810] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 17:59:31.486552 9595 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 17:59:31.491360 9595 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 17:59:34,688] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 17:59:34,701] [ INFO] - Init the text engine successfully
+ INFO: Started server process [9595]
+ [2022-05-02 17:59:34] [INFO] [server.py:75] Started server process [9595]
+ INFO: Waiting for application startup.
+ [2022-05-02 17:59:34] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 17:59:34] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 17:59:34] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+- Python API
+  **Note:** The server is deployed on the 'CPU' device by default; it can be deployed on the 'GPU' by modifying the 'device' parameter in the service configuration file.
+ ```python
+  # in PaddleSpeech/demos/streaming_asr_server directory
+ from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+ server_executor = ServerExecutor()
+ server_executor(
+ config_file="./conf/punc_application.yaml",
+ log_file="./log/paddlespeech.log")
+ ```
+
+ Output:
+ ```
+ [2022-05-02 18:09:02,542] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 18:09:02,543] [ INFO] - Init the text engine
+ [2022-05-02 18:09:02,543] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 18:09:02,545] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 18:09:06,919] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 18:09:07.523002 22615 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 18:09:07.527882 22615 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 18:09:10,900] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 18:09:10,913] [ INFO] - Init the text engine successfully
+ INFO: Started server process [22615]
+ [2022-05-02 18:09:10] [INFO] [server.py:75] Started server process [22615]
+ INFO: Waiting for application startup.
+ [2022-05-02 18:09:10] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 18:09:10] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 18:09:10] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+### 2. Client usage
+**Note:** The response time will be slightly longer when using the client for the first time.
+
+- Command line:
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+ Output
+ ```
+ [2022-05-02 18:12:29,767] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-02 18:12:29,767] [ INFO] - Response time 0.096548 s.
+ ```
+
+- Python3 API
+
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8190,)
+ print(res)
+ ```
+
+ Output:
+ ``` bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
+
+
+## Join streaming ASR and punctuation server
+
+By default, each server is deployed on the 'CPU' device. Speech recognition and punctuation prediction can be deployed on different 'GPU' devices by modifying the 'device' parameter in each service's configuration file.
+
+We use the `streaming_asr_server.py` and `punc_server.py` scripts to launch the streaming speech recognition and punctuation prediction services respectively. The `websocket_client.py` script can be used to call both services at the same time.
+
+### 1. Start the two servers
+
+``` bash
+# Note: streaming speech recognition and punctuation prediction are configured on different graphics cards through their configuration files
+bash server.sh
+```
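+
+If you prefer not to use `server.sh`, the two services can also be started directly from Python. The sketch below is only an illustration: it assumes that `punc_server.py` and `streaming_asr_server.py` accept a `--config_file` flag and that the two configuration files assign different GPUs; check the scripts in this directory for the exact options.
+
+```python
+# launch_servers.py: a rough Python stand-in for server.sh (flag names are assumptions)
+import subprocess
+
+procs = [
+    subprocess.Popen(["python3", "punc_server.py",
+                      "--config_file", "conf/punc_application.yaml"]),
+    subprocess.Popen(["python3", "streaming_asr_server.py",
+                      "--config_file", "conf/ws_conformer_wenetspeech_application.yaml"]),
+]
+for p in procs:
+    p.wait()  # keep the launcher alive while both servers run
+```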
+
+### 2. Call client
+- Command line
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
+ ```
+ Output:
+ ```
+ [2022-05-07 11:21:47,060] [ INFO] - asr websocket client start
+ [2022-05-07 11:21:47,060] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:21:47,080] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:21:47,096] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,108] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,120] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,131] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,142] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,152] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,163] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,173] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,705] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,721] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,728] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,736] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,743] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,751] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:48,459] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,572] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,681] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,790] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,898] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,005] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,112] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,219] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,935] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,062] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,310] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,435] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,560] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,686] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:51,444] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,606] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,744] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,882] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,020] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,159] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,437] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:53,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,450] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,589] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,728] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,867] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,007] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,146] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:55,002] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,148] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,292] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,437] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,584] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,731] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,877] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,842] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,174] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,336] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,497] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,659] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:22:03,035] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:22:03,035] [ INFO] - audio duration: 4.9968125, elapsed time: 15.974023818969727, RTF=3.1968427510477384
+ [2022-05-07 11:22:03,037] [ INFO] - asr websocket client finished
+ [2022-05-07 11:22:03,037] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-07 11:22:03,037] [ INFO] - Response time 15.977116 s.
```
+
+- Use script
+
+ If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+
+ ```
+ python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+ ```
+ Output:
+ ```
+ [2022-05-07 11:11:02,984] [ INFO] - Start to do streaming asr client
+ [2022-05-07 11:11:02,985] [ INFO] - asr websocket client start
+ [2022-05-07 11:11:02,985] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:11:02,986] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-07 11:11:03,006] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:11:03,021] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,034] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,046] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,058] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,070] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,081] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,092] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,102] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,629] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,638] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,645] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,653] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,661] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,668] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,676] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:04,402] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,510] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,619] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,743] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,849] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,956] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,063] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,170] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,876] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,019] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,184] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,342] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,537] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,727] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,871] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:07,617] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,769] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,905] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,043] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,326] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,466] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,611] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:09,431] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,571] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,714] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,853] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,992] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,129] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,266] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:11,113] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,296] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,439] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,582] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,727] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,869] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,011] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,153] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,969] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,137] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,297] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,456] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,615] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,776] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:18,915] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:11:18,915] [ INFO] - audio duration: 4.9968125, elapsed time: 15.928460597991943, RTF=3.187724293835709
+ [2022-05-07 11:11:18,916] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
+ ```
+
+
diff --git a/demos/streaming_asr_server/README_cn.md b/demos/streaming_asr_server/README_cn.md
index bf122bb3afe845d76a6327c378917169c4dbf3ff..4ed15e17e4d2189e1579ca5a528f2072b41af320 100644
--- a/demos/streaming_asr_server/README_cn.md
+++ b/demos/streaming_asr_server/README_cn.md
@@ -1,22 +1,30 @@
([English](./README.md)|中文)
-# Speech Server
+# Streaming ASR Server
## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be done with a single command of `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.
+**The streaming ASR service only supports the `websocket` protocol, not the `http` protocol.**
## Usage
### 1. Installation
-See the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+For the detailed installation steps of PaddleSpeech, see the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
-You can choose one of the medium and hard ways to install PaddleSpeech.
+You can choose either the medium or the hard way to install PaddleSpeech.
### 2. Prepare the configuration file
-The configuration files are `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml`.
-The models currently integrated into the service are the DeepSpeech2 and conformer models.
+
+The startup script and the test script of the streaming ASR service are stored in the `PaddleSpeech/demos/streaming_asr_server` directory.
+After downloading `PaddleSpeech`, change into the `PaddleSpeech/demos/streaming_asr_server` directory.
+The configuration files in that directory are `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
+
+The models currently integrated into the service are the DeepSpeech2 and conformer models, with the corresponding configuration files:
+* DeepSpeech2: `conf/ws_application.yaml`
+* conformer: `conf/ws_conformer_wenetspeech_application.yaml`
+
The input of this ASR client should be a WAV file (`.wav`), and its sample rate must be the same as the model's sample rate.
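+
+A quick way to check this before sending audio is to read the WAV header with Python's standard `wave` module (a minimal sketch; 16000 Hz matches the 16 kHz models used in this demo):
+
+```python
+import wave
+
+# inspect the test file and make sure it matches the model's 16 kHz sample rate
+with wave.open("./zh.wav", "rb") as f:
+    print(f.getframerate(), f.getnchannels(), f.getsampwidth())
+    assert f.getframerate() == 16000, "resample the audio to 16 kHz first"
+```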
@@ -28,10 +36,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
### 3. Server Usage
- Command line (recommended)
-
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
```bash
- # start the service
- paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
+ # start the service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
```
Usage:
@@ -40,155 +48,80 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
paddlespeech_server start --help
```
Arguments:
- - `config_file`: configuration file of the service; default: ./conf/ws_conformer_application.yaml
- - `log_file`: log file; default: ./log/paddlespeech.log
+ - `config_file`: configuration file of the service; default: `./conf/application.yaml`
+ - `log_file`: log file; default: `./log/paddlespeech.log`
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
```python
+ # run in the PaddleSpeech/demos/streaming_asr_server directory
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
- config_file="./conf/ws_conformer_application.yaml",
+          config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
- [2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
- [2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
- [2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
- [2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
- [2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
- [2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- set kaiming_uniform
- [2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
- [2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
- INFO: Started server process [11173]
- [2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
- INFO: Waiting for application startup.
- [2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
- INFO: Application startup complete.
- [2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- infos = await tasks.gather(*fs, loop=self)
- /home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
- await tasks.sleep(0, loop=self)
- INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- [2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
+ [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
+ [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
+ [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
+ [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/chunk_conformer/checkpoints/avg_10.pdparams
+ [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
+ [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
+ [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
+ [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
+ INFO: Started server process [4242]
+ [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
+ INFO: Waiting for application startup.
+ [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
+ [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
+
**Note:** The response time will be a little longer when the client is used for the first time.
- Command line (recommended)
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
-
```
Help:
@@ -204,79 +137,84 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
- `sample_rate`: audio sample rate; default: 16000.
- `lang`: model language; default: zh_cn.
- `audio_format`: audio format; default: wav.
+ - `punc.server_ip`: IP address of the punctuation prediction service; default: None.
+ - `punc.server_port`: port of the punctuation prediction service; default: None.
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
- [2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
+ [2022-05-06 21:10:35,598] [ INFO] - Start to do streaming asr client
+ [2022-05-06 21:10:35,600] [ INFO] - asr websocket client start
+ [2022-05-06 21:10:35,600] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:10:35,600] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-06 21:10:35,670] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:10:35,699] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,726] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,738] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,750] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,762] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,774] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:35,786] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,387] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,398] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,407] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,416] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,425] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,434] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,442] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:10:36,930] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,938] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,946] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,954] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,962] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,970] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,977] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:36,985] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:10:37,484] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,492] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,500] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,508] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,517] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,525] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:37,532] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:10:38,050] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,058] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,066] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,073] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,081] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,089] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,097] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,105] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:10:38,630] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,639] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,647] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,655] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,663] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,671] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:38,679] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:10:39,216] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,224] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,232] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,240] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,248] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,256] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,264] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,272] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:10:39,885] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,896] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,905] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,915] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,924] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:39,934] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:10:44,827] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:10:44,827] [ INFO] - audio duration: 4.9968125, elapsed time: 9.225094079971313, RTF=1.846195765794957
+ [2022-05-06 21:10:44,828] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
- import json
asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
@@ -286,71 +224,360 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
- print(res.json())
+ print(res)
```
Output:
```bash
- [2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
- [2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
- [2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
- [2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
- [2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
- [2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
- [2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
- [2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
- [2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
+ [2022-05-06 21:14:03,137] [ INFO] - asr websocket client start
+ [2022-05-06 21:14:03,137] [ INFO] - endpoint: ws://127.0.0.1:8390/paddlespeech/asr/streaming
+ [2022-05-06 21:14:03,149] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-06 21:14:03,167] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,181] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,194] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,207] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,219] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,230] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,241] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,252] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,768] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,776] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,784] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,792] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,800] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,807] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:03,815] [ INFO] - client receive msg={'result': ''}
+ [2022-05-06 21:14:04,301] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,309] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,317] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,325] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,333] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,341] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,349] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,356] [ INFO] - client receive msg={'result': '我认为跑'}
+ [2022-05-06 21:14:04,855] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,864] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,871] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,879] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,887] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,894] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:04,902] [ INFO] - client receive msg={'result': '我认为跑步最重要的'}
+ [2022-05-06 21:14:05,418] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,426] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,434] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,442] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,449] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,457] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,465] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,473] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是'}
+ [2022-05-06 21:14:05,996] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,006] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,029] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,037] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,045] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给'}
+ [2022-05-06 21:14:06,581] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,589] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,597] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,605] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,613] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,621] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,628] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:06,636] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了'}
+ [2022-05-06 21:14:07,188] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,196] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,203] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,211] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,219] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:07,226] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康'}
+ [2022-05-06 21:14:12,158] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-06 21:14:12,159] [ INFO] - audio duration: 4.9968125, elapsed time: 9.019973039627075, RTF=1.8051453881103354
+ [2022-05-06 21:14:12,160] [ INFO] - asr websocket client finished
+ ```
+
+
+
+## Punctuation Prediction
+
+### 1. Server Usage
+
+- Command line
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
+ ``` bash
+ # start the punctuation prediction service in the PaddleSpeech/demos/streaming_asr_server directory
+ paddlespeech_server start --config_file conf/punc_application.yaml
+ ```
+
+
+ Usage:
+
+ ```bash
+ paddlespeech_server start --help
+ ```
+
+ Arguments:
+ - `config_file`: configuration file of the service.
+ - `log_file`: log file.
+
+
+ Output:
+ ``` bash
+ [2022-05-02 17:59:26,285] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 17:59:26,285] [ INFO] - Init the text engine
+ [2022-05-02 17:59:26,285] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 17:59:26,286] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 17:59:30,810] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 17:59:31.486552 9595 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 17:59:31.491360 9595 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 17:59:34,688] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 17:59:34,701] [ INFO] - Init the text engine successfully
+ INFO: Started server process [9595]
+ [2022-05-02 17:59:34] [INFO] [server.py:75] Started server process [9595]
+ INFO: Waiting for application startup.
+ [2022-05-02 17:59:34] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 17:59:34] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 17:59:34] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+- Python API
+ **Note:** The service is deployed on the `cpu` device by default; you can deploy it on a `gpu` by changing the `device` parameter in the service configuration file.
+ ```python
+ # run in the PaddleSpeech/demos/streaming_asr_server directory
+ from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+ server_executor = ServerExecutor()
+ server_executor(
+ config_file="./conf/punc_application.yaml",
+ log_file="./log/paddlespeech.log")
```
+
+  Output:
+ ```
+ [2022-05-02 18:09:02,542] [ INFO] - Create the TextEngine Instance
+ [2022-05-02 18:09:02,543] [ INFO] - Init the text engine
+ [2022-05-02 18:09:02,543] [ INFO] - Text Engine set the device: gpu:0
+ [2022-05-02 18:09:02,545] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar.gz md5 checking...
+ [2022-05-02 18:09:06,919] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/ernie_linear_p3_wudao-punc-zh.tar
+ W0502 18:09:07.523002 22615 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
+ W0502 18:09:07.527882 22615 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+ [2022-05-02 18:09:10,900] [ INFO] - Already cached /home/users/xiongxinlei/.paddlenlp/models/ernie-1.0/vocab.txt
+ [2022-05-02 18:09:10,913] [ INFO] - Init the text engine successfully
+ INFO: Started server process [22615]
+ [2022-05-02 18:09:10] [INFO] [server.py:75] Started server process [22615]
+ INFO: Waiting for application startup.
+ [2022-05-02 18:09:10] [INFO] [on.py:45] Waiting for application startup.
+ INFO: Application startup complete.
+ [2022-05-02 18:09:10] [INFO] [on.py:59] Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ [2022-05-02 18:09:10] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8190 (Press CTRL+C to quit)
+ ```
+
+### 2. Punctuation Prediction Client Usage
+**Note:** The response time will be a little longer when the client is used for the first time.
+
+- Command line (recommended)
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
+ ```
+
+  Output:
+ ```
+ [2022-05-02 18:12:29,767] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-02 18:12:29,767] [ INFO] - Response time 0.096548 s.
+ ```
+
+- Python3 API
+
+ ```python
+ from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+ textclient_executor = TextClientExecutor()
+ res = textclient_executor(
+ input="我认为跑步最重要的就是给我带来了身体健康",
+ server_ip="127.0.0.1",
+ port=8190,)
+ print(res)
+ ```
+
+  Output:
+ ``` bash
+ 我认为跑步最重要的就是给我带来了身体健康。
+ ```
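+
+  Since the punctuation service uses the `http` protocol, each call is an independent request, so the executor can be reused to punctuate several recognition results in a row. A small illustrative loop (the input sentences are examples only):
+
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
+
+  textclient_executor = TextClientExecutor()
+  # example raw ASR outputs (illustration only)
+  raw_results = [
+      "我认为跑步最重要的就是给我带来了身体健康",
+      "今天天气真不错",
+  ]
+  for text in raw_results:
+      # each call returns the input text with predicted punctuation added
+      print(textclient_executor(input=text, server_ip="127.0.0.1", port=8190))
+  ```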
+
+
+## Joint Streaming ASR and Punctuation Prediction
+**Note:** The services are deployed on the `cpu` device by default; by changing the `device` parameter in the service configuration files, speech recognition and punctuation prediction can be deployed on different `gpu` devices.
+
+Use the two service scripts `streaming_asr_server.py` and `punc_server.py` to start the streaming ASR service and the punctuation prediction service respectively. The `websocket_client.py` script then calls both services at the same time. A Python sketch that starts the two services is given after the command below.
+
+### 1. Start the services
+
+``` bash
+# note: streaming ASR and punctuation prediction are assigned to different GPUs through their configuration files
+bash server.sh
+```
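+
+As an alternative to `server.sh`, the two services can also be started from Python with the `ServerExecutor` API shown above. The sketch below is an illustration only: it reuses the standalone configuration files from the previous sections, while `server.sh` may use its own configuration files and ports.
+
+```python
+# illustrative sketch: run the streaming ASR server and the punctuation server
+# in two separate processes (each ServerExecutor call serves until interrupted)
+from multiprocessing import Process
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+def run_server(config_file, log_file):
+    ServerExecutor()(config_file=config_file, log_file=log_file)
+
+if __name__ == "__main__":
+    servers = [
+        Process(target=run_server,
+                args=("./conf/ws_conformer_wenetspeech_application.yaml", "./log/asr.log")),
+        Process(target=run_server,
+                args=("./conf/punc_application.yaml", "./log/punc.log")),
+    ]
+    for p in servers:
+        p.start()
+    for p in servers:
+        p.join()
+```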
+
+### 2. Call the services
+- Using the command line:
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
+ ```
+  Output:
+ ```
+ [2022-05-07 11:21:47,060] [ INFO] - asr websocket client start
+ [2022-05-07 11:21:47,060] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:21:47,080] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:21:47,096] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,108] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,120] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,131] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,142] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,152] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,163] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,173] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,705] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,713] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,721] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,728] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,736] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,743] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:47,751] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:21:48,459] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,572] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,681] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,790] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:48,898] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,005] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,112] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,219] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:21:49,935] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,062] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,310] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,435] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,560] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:50,686] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:21:51,444] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,606] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,744] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:51,882] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,020] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,159] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:52,437] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:21:53,298] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,450] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,589] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,728] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:53,867] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,007] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:54,146] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:21:55,002] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,148] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,292] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,437] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,584] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,731] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:55,877] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,021] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:21:56,842] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,013] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,174] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,336] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,497] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:21:57,659] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:22:03,035] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:22:03,035] [ INFO] - audio duration: 4.9968125, elapsed time: 15.974023818969727, RTF=3.1968427510477384
+ [2022-05-07 11:22:03,037] [ INFO] - asr websocket client finished
+ [2022-05-07 11:22:03,037] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康。
+ [2022-05-07 11:22:03,037] [ INFO] - Response time 15.977116 s.
+ ```
+
+- Using the script
+
+ If `127.0.0.1` is not accessible, use the actual IP address of the server instead.
+
+ ```
+ python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+ ```
+  Output:
+ ```
+ [2022-05-07 11:11:02,984] [ INFO] - Start to do streaming asr client
+ [2022-05-07 11:11:02,985] [ INFO] - asr websocket client start
+ [2022-05-07 11:11:02,985] [ INFO] - endpoint: ws://127.0.0.1:8490/paddlespeech/asr/streaming
+ [2022-05-07 11:11:02,986] [ INFO] - start to process the wavscp: ./zh.wav
+ [2022-05-07 11:11:03,006] [ INFO] - client receive msg={"status": "ok", "signal": "server_ready"}
+ [2022-05-07 11:11:03,021] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,034] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,046] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,058] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,070] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,081] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,092] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,102] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,629] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,638] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,645] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,653] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,661] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,668] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:03,676] [ INFO] - client receive msg={'result': ''}
+ [2022-05-07 11:11:04,402] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,510] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,619] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,743] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,849] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:04,956] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,063] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,170] [ INFO] - client receive msg={'result': '我认为,跑'}
+ [2022-05-07 11:11:05,876] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,019] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,184] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,342] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,537] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,727] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:06,871] [ INFO] - client receive msg={'result': '我认为,跑步最重要的。'}
+ [2022-05-07 11:11:07,617] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,769] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:07,905] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,043] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,186] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,326] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,466] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:08,611] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是。'}
+ [2022-05-07 11:11:09,431] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,571] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,714] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,853] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:09,992] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,129] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:10,266] [ INFO] - client receive msg={'result': '我认为,跑步最重要的就是给。'}
+ [2022-05-07 11:11:11,113] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,296] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,439] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,582] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,727] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:11,869] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,011] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,153] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了。'}
+ [2022-05-07 11:11:12,969] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,137] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,297] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,456] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,615] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:13,776] [ INFO] - client receive msg={'result': '我认为跑步最重要的就是给我带来了身体健康。'}
+ [2022-05-07 11:11:18,915] [ INFO] - client final receive msg={'status': 'ok', 'signal': 'finished', 'result': '我认为跑步最重要的就是给我带来了身体健康。', 'times': [{'w': '我', 'bg': 0.0, 'ed': 0.7000000000000001}, {'w': '认', 'bg': 0.7000000000000001, 'ed': 0.84}, {'w': '为', 'bg': 0.84, 'ed': 1.0}, {'w': '跑', 'bg': 1.0, 'ed': 1.18}, {'w': '步', 'bg': 1.18, 'ed': 1.36}, {'w': '最', 'bg': 1.36, 'ed': 1.5}, {'w': '重', 'bg': 1.5, 'ed': 1.6400000000000001}, {'w': '要', 'bg': 1.6400000000000001, 'ed': 1.78}, {'w': '的', 'bg': 1.78, 'ed': 1.9000000000000001}, {'w': '就', 'bg': 1.9000000000000001, 'ed': 2.06}, {'w': '是', 'bg': 2.06, 'ed': 2.62}, {'w': '给', 'bg': 2.62, 'ed': 3.16}, {'w': '我', 'bg': 3.16, 'ed': 3.3200000000000003}, {'w': '带', 'bg': 3.3200000000000003, 'ed': 3.48}, {'w': '来', 'bg': 3.48, 'ed': 3.62}, {'w': '了', 'bg': 3.62, 'ed': 3.7600000000000002}, {'w': '身', 'bg': 3.7600000000000002, 'ed': 3.9}, {'w': '体', 'bg': 3.9, 'ed': 4.0600000000000005}, {'w': '健', 'bg': 4.0600000000000005, 'ed': 4.26}, {'w': '康', 'bg': 4.26, 'ed': 4.96}]}
+ [2022-05-07 11:11:18,915] [ INFO] - audio duration: 4.9968125, elapsed time: 15.928460597991943, RTF=3.187724293835709
+ [2022-05-07 11:11:18,916] [ INFO] - asr websocket client finished : 我认为跑步最重要的就是给我带来了身体健康
+ ```
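+
+- Python API (sketch)
+
+  The joint call should also be reachable from Python through `ASROnlineClientExecutor`. The snippet below is only a sketch: it assumes the executor accepts `punc_server_ip` and `punc_server_port` keyword arguments that mirror the `--punc.server_ip` / `--punc.port` options above, which this document does not show explicitly.
+
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
+
+  asrclient_executor = ASROnlineClientExecutor()
+  res = asrclient_executor(
+      input="./zh.wav",
+      server_ip="127.0.0.1",
+      port=8290,
+      sample_rate=16000,
+      lang="zh_cn",
+      audio_format="wav",
+      punc_server_ip="127.0.0.1",  # assumption: keyword form of --punc.server_ip
+      punc_server_port=8190)       # assumption: keyword form of --punc.port
+  print(res)
+  ```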
+
+
diff --git a/demos/streaming_asr_server/conf/application.yaml b/demos/streaming_asr_server/conf/application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e9a89c19d2ad08db9a6c41ec94bdf21be95125b0
--- /dev/null
+++ b/demos/streaming_asr_server/conf/application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
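+
+# Note (illustrative interpretation of the settings above): with window_ms=25 and
+# shift_ms=10, a decoding window of window_n=7 frames spans roughly 25 + 6 * 10 = 85 ms
+# of audio, and each shift of shift_n=4 frames advances the stream by 4 * 10 = 40 ms;
+# sample_width=2 corresponds to 16-bit (2-byte) PCM samples.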
diff --git a/demos/streaming_asr_server/conf/punc_application.yaml b/demos/streaming_asr_server/conf/punc_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f947525e16478cbbf739c0281cb2234467b82972
--- /dev/null
+++ b/demos/streaming_asr_server/conf/punc_application.yaml
@@ -0,0 +1,35 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8190
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['text_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports the offline engine type.
+protocol: 'http'
+engine_list: ['text_python']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### Text #########################################
+################### text task: punc; engine_type: python #######################
+text_python:
+ task: punc
+ model_type: 'ernie_linear_p3_wudao'
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path: # [optional]
+ ckpt_path: # [optional]
+ vocab_file: # [optional]
+ device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
diff --git a/demos/streaming_asr_server/conf/ws_application.yaml b/demos/streaming_asr_server/conf/ws_application.yaml
index dee8d78baa933f4447ea1a5afffc157fd70bfa7c..f2ea6330f690801182f457ba1170207a12e14b18 100644
--- a/demos/streaming_asr_server/conf/ws_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_application.yaml
@@ -7,8 +7,8 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
@@ -29,6 +29,7 @@ asr_online:
cfg_path:
decode_method:
force_yes: True
+ device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
diff --git a/demos/streaming_asr_server/conf/ws_conformer_application.yaml b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
index 8f01148590697d2b0fec9141ca2ec09c8b946d00..2affde0739ff5873a88cbe621ebf907ab0663dcb 100644
--- a/demos/streaming_asr_server/conf/ws_conformer_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_application.yaml
@@ -7,8 +7,8 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
-# task choices = ['asr_online', 'tts_online']
-# protocol = ['websocket', 'http'] (only one can be selected).
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
@@ -29,7 +29,7 @@ asr_online:
cfg_path:
decode_method:
force_yes: True
- device: # cpu or gpu:id
+ device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
@@ -42,4 +42,4 @@ asr_online:
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
- sample_width: 2
\ No newline at end of file
+ sample_width: 2
diff --git a/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e9a89c19d2ad08db9a6c41ec94bdf21be95125b0
--- /dev/null
+++ b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 0.0.0.0
+port: 8090
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+ model_type: 'conformer_online_wenetspeech'
+ am_model: # the pdmodel file of am static model [optional]
+ am_params: # the pdiparams file of am static model [optional]
+ lang: 'zh'
+ sample_rate: 16000
+ cfg_path:
+ force_yes: True
+ device: 'cpu' # cpu or gpu:id
+ decode_method: "attention_rescoring"
+ am_predictor_conf:
+ device: # set 'gpu:id' or 'cpu'
+ switch_ir_optim: True
+ glog_info: False # True -> print glog
+ summary: True # False -> do not show predictor config
+
+ chunk_buffer_conf:
+ window_n: 7 # frame
+ shift_n: 4 # frame
+ window_ms: 25 # ms
+ shift_ms: 10 # ms
+ sample_rate: 16000
+ sample_width: 2
diff --git a/demos/streaming_asr_server/punc_server.py b/demos/streaming_asr_server/punc_server.py
new file mode 100644
index 0000000000000000000000000000000000000000..eefa0fb407f44c5f9e2d6f8ac282a64c85ff5d3d
--- /dev/null
+++ b/demos/streaming_asr_server/punc_server.py
@@ -0,0 +1,38 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ prog='paddlespeech_server.start', add_help=True)
+ parser.add_argument(
+ "--config_file",
+ action="store",
+ help="yaml file of the app",
+ default=None,
+ required=True)
+
+ parser.add_argument(
+ "--log_file",
+ action="store",
+ help="log file",
+ default="./log/paddlespeech.log")
+ logger.info("start to parse the args")
+ args = parser.parse_args()
+
+ logger.info("start to launch the punctuation server")
+ punc_server = ServerExecutor()
+ punc_server(config_file=args.config_file, log_file=args.log_file)
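+
+# Example invocation (mirrors the commented-out command in server.sh):
+#   python3 punc_server.py --config_file conf/punc_application.yaml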
diff --git a/demos/streaming_asr_server/server.sh b/demos/streaming_asr_server/server.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4266f8c642c83ece8dc4a2dd29812acfad4d6f8a
--- /dev/null
+++ b/demos/streaming_asr_server/server.sh
@@ -0,0 +1,8 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+
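+# start the punctuation restoration service (conf/punc_application.yaml listens on port 8190)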
+# nohup python3 punc_server.py --config_file conf/punc_application.yaml > punc.log 2>&1 &
+paddlespeech_server start --config_file conf/punc_application.yaml &> punc.log &
+
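+# start the streaming ASR service (conf/ws_conformer_application.yaml listens on port 8090)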
+# nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
+paddlespeech_server start --config_file conf/ws_conformer_application.yaml &> streaming_asr.log &
\ No newline at end of file
diff --git a/demos/streaming_asr_server/streaming_asr_server.py b/demos/streaming_asr_server/streaming_asr_server.py
new file mode 100644
index 0000000000000000000000000000000000000000..011b009aaf8b6736e5910ddca76df5f1ecdd56e0
--- /dev/null
+++ b/demos/streaming_asr_server/streaming_asr_server.py
@@ -0,0 +1,38 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ prog='paddlespeech_server.start', add_help=True)
+ parser.add_argument(
+ "--config_file",
+ action="store",
+ help="yaml file of the app",
+ default=None,
+ required=True)
+
+ parser.add_argument(
+ "--log_file",
+ action="store",
+ help="log file",
+ default="./log/paddlespeech.log")
+ logger.info("start to parse the args")
+ args = parser.parse_args()
+
+ logger.info("start to launch the streaming asr server")
+ streaming_asr_server = ServerExecutor()
+ streaming_asr_server(config_file=args.config_file, log_file=args.log_file)
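+
+# Example invocation (mirrors the commented-out command in server.sh):
+#   python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml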
diff --git a/demos/streaming_asr_server/test.sh b/demos/streaming_asr_server/test.sh
old mode 100644
new mode 100755
index fe8155cf347ead91a5956e3e575e9bf52d99af9a..4f43c6534f078683329a287bb87a1c79cff15b8f
--- a/demos/streaming_asr_server/test.sh
+++ b/demos/streaming_asr_server/test.sh
@@ -1,5 +1,12 @@
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
-# read the wav and pass it to service
-python3 websocket_client.py --wavfile ./zh.wav
+# read the wav and pass it to the streaming asr service only
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8090 --wavfile ./zh.wav
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+
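+# the client prints partial transcriptions while the audio is being streamed and the
+# final transcription once the stream ends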
+# read the wav and call both the streaming asr and the punctuation restoration services
+# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
+# python3 websocket_client.py --server_ip 127.0.0.1 --port 8090 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
\ No newline at end of file
diff --git a/demos/streaming_asr_server/web/templates/index.html b/demos/streaming_asr_server/web/templates/index.html
index 7aa227fb1d946894854131a0fb91305bd319eec0..56c630808567177993dfdb633a60a1d6c1299b4f 100644
--- a/demos/streaming_asr_server/web/templates/index.html
+++ b/demos/streaming_asr_server/web/templates/index.html
@@ -93,7 +93,7 @@
function parseResult(data) {
var data = JSON.parse(data)
- var result = data.asr_results
+ var result = data.result
console.log(result)
$("#resultPanel").html(result)
}
@@ -152,4 +152,4 @@